author    Gustaf Lundh <gustaf.lundh@sonymobile.com> 2013-02-01 13:20:31 +0100
committer Gustaf Lundh <gustaf.lundh@sonymobile.com> 2013-02-25 12:36:29 +0100
commit    84afea9179932995d1e59f8fda4e6b11217382ad
tree      6751aab447704342ee96a4d4b864bb0035c4d6c2 /org.eclipse.jgit.packaging/org.eclipse.jgit.junit.feature/pom.xml
parent    51d0e1f26e23d04ae73054958546159e01196a4d
Performance fixes in DateRevQueue
When a lot of commits are added to DateRevQueue, the sort-on-insertion approach is very heavy on CPU cycles. One approach to fix this was made by Dave Borowitz (https://git.eclipse.org/r/#/c/5491/), but using Java's PriorityQueue seems to have brought some extra overhead, and the desired performance could not be reached.

This fix takes another approach to the insertion problem, without changing the expected behaviour or adding extra memory overhead:

If we detect more than 1000 commits in the DateRevQueue, a "seek-index" is rebuilt every 1000th added commit. The index keeps track of every 100th commit in the DateRevQueue. During insertions, it is used for a preliminary scan (binary search) of the queue, to help add() find a good point to start walking from. After finding this starting point, add() steps commit-by-commit until the correct insertion place in the queue is found (as today, the queue is expected to be sorted at all times).

When applied to repositories with many refs, this approach has proven to bring huge performance gains and scales quite well. For instance, in a repository with close to 80000 refs, we could cut the time a typical Gerrit replication of 1 commit takes (just a push from JGit's point of view) from 32 sec down to 3.5 sec.

Below are some typical times to add a specific number of commits (with random commit times) to the DateRevQueue, with and without the preliminary seek-index:

Commits |  Index |  No Index
--------+--------+----------
   1024 |   8 ms |      8 ms
   2048 |  13 ms |      9 ms
   4096 |   5 ms |     59 ms
   8192 |  11 ms |    595 ms
  16384 |  22 ms |   3058 ms
  32768 |  64 ms |  13811 ms
  65536 | 201 ms |  62677 ms
 131072 | 783 ms | 331585 ms

Only one extra reference is needed for every 100 inserted commits (and only when there are more than 1000 commits in the queue), so the memory overhead should be negligible. Various index-stepping values were tested, and 100 seemed to scale very well and be effective from the start.
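The seek-index idea can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration, not JGit's actual DateRevQueue: the class name `SeekIndexedQueue`, its methods, and the use of plain `int` commit times and ids instead of `RevCommit` objects are all assumptions. The thresholds (rebuild once more than 1000 commits are queued, then every 1000th add; index every 100th entry) come from the message, but the exact trigger condition is approximated, and entries with equal commit times are assumed to keep insertion order, as a plain sorted linked-list insert would.

```java
/**
 * Hypothetical sketch of a seek-indexed date queue (NOT JGit's DateRevQueue).
 * Entries live in a singly linked list sorted by descending commit time;
 * entries with equal times keep insertion order (an assumption).
 */
public class SeekIndexedQueue {
    static final int REBUILD_THRESHOLD = 1000; // assumed: queue size / adds between rebuilds
    static final int INDEX_STEP = 100;         // assumed: index every 100th list entry

    public static final class Entry {
        public final int commitTime;
        public final int id;
        Entry next;

        Entry(int commitTime, int id) {
            this.commitTime = commitTime;
            this.id = id;
        }
    }

    private Entry head;
    private int size;
    private int sinceLastIndex;
    private Entry[] index = new Entry[0]; // every INDEX_STEP-th entry, in list order
    private int first;                    // index[first..] are still in the list

    public void add(int commitTime, int id) {
        sinceLastIndex++;
        if (size > REBUILD_THRESHOLD && sinceLastIndex >= REBUILD_THRESHOLD) {
            buildIndex(); // indexes only the entries already in the list
        }
        Entry n = new Entry(commitTime, id);
        Entry q = seek(commitTime);
        if (q == null) {
            // No indexed entry is at least as new as n: fall back to the head.
            if (head == null || head.commitTime < commitTime) {
                n.next = head;
                head = n;
                size++;
                return;
            }
            q = head;
        }
        // Step commit-by-commit past every entry that is not older than n.
        while (q.next != null && q.next.commitTime >= commitTime) {
            q = q.next;
        }
        n.next = q.next;
        q.next = n;
        size++;
    }

    /** Binary search for the last indexed entry with commitTime >= when. */
    private Entry seek(int when) {
        int lo = first, hi = index.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid].commitTime >= when) {
                found = mid; // valid starting point; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found >= 0 ? index[found] : null;
    }

    /** Records every INDEX_STEP-th entry of the list, in list order. */
    private void buildIndex() {
        Entry[] idx = new Entry[(size + INDEX_STEP - 1) / INDEX_STEP];
        int pos = 0;
        for (Entry e = head; e != null; e = e.next, pos++) {
            if (pos % INDEX_STEP == 0) {
                idx[pos / INDEX_STEP] = e;
            }
        }
        index = idx;
        first = 0;
        sinceLastIndex = 0;
    }

    /** Removes and returns the newest entry, or null if the queue is empty. */
    public Entry poll() {
        Entry h = head;
        if (h == null) {
            return null;
        }
        if (first < index.length && index[first] == h) {
            index[first++] = null; // drop the now-stale index reference
        }
        head = h.next;
        size--;
        return h;
    }

    public int size() { return size; }

    /** Number of live seek-index references (roughly size / INDEX_STEP). */
    public int indexSize() { return index.length - first; }
}
```

Because entries are only ever removed from the head, only the first live index reference can go stale, which `poll()` handles by advancing `first`; all other index references stay valid until the next rebuild, and each rebuild is a single pass over the list.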
In the future, the index step should probably be dynamic and based on the number of refs in the queue, but this should serve well as a starting point.

Note: While other, fundamentally different data structures may be more suitable, the DateRevQueue is extremely central to many of the Git core operations. This approach was chosen since the effect of the patch is easy to predict in conjunction with the current implementation. A totally new data structure would make it harder to predict behaviour in many common and uncommon cases (in terms of breaking ties, memory usage, cost when using few elements, object creation/disposal overhead, etc.).

Change-Id: Ie7b99f40eacf6324bfb4716d82073adeda64d10f