Support cutting existing delta chains longer than the max depth
Some packs built by JGit have incredibly long delta chains due to a
long-standing bug in PackWriter. Google has packs created by JGit's
DfsGarbageCollector with chains 6000 objects long, or more.
Inflating an object at the end of such a 6000-object chain cannot be
done within a reasonable time bound. It could take a beefy system
hours to perform, even using the heavily optimized native C
implementation of Git, let alone JGit.
Enable pack.cutDeltaChains to be set in a configuration file to
permit the PackWriter to determine the length of each delta chain
and clip the chain at arbitrary points to fit within pack.depth.
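For illustration, the setting might look like this in a config file
(a sketch; pack.depth is shown only to make the clipping limit
explicit):

    [pack]
      cutDeltaChains = true
      depth = 50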
Delta chain cycles are still possible, but no attempt is made to
detect them. A trivial chain of A->B->A will iterate for the full
configured pack.depth limit (e.g. 50) and then pick an object to
store as a non-delta.
When cutting chains the object list is walked in reverse to try to
take advantage of existing chain computations. The assumption here
is that most deltas are near the end of the list and their bases are
near the front. Walking up from the tail attempts to reuse
chainLength computations by relying on the memoized value in the
delta base.
The chainLength field in ObjectToPack is overloaded into the depth
field normally used by DeltaWindow. This is acceptable because the
chain cut happens before delta search, and the chainLength is reset
to 0 if delta search will follow.
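A minimal sketch of the reverse walk, assuming illustrative accessors
(getDeltaBase() returning the base as an ObjectToPack, and chain
length getters/setters backed by the overloaded depth field; not
JGit's exact API):

    // Walk from a delta toward its base, reusing memoized lengths.
    int findChainLength(ObjectToPack otp, int maxDepth) {
      int n = 0;
      ObjectToPack o = otp;
      while (o.isDeltaRepresentation() && n < maxDepth) {
        ObjectToPack base = o.getDeltaBase(); // illustrative accessor
        if (base.getChainLength() > 0) {
          n += base.getChainLength();         // reuse the memoized value
          break;
        }
        n++;                                  // unknown base, keep walking
        o = base;
      }
      otp.setChainLength(Math.min(n, maxDepth)); // memoize for later objects
      return n;
    }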
This test runs mostly for OBJ_TREE and OBJ_BLOB types, which
typically make up 66% of the objects in a repository. Simplify the
test for these common types by checking the one bit they have in
common and returning early.
Object type 5 is currently undefined. In the old code it would hit
the default case and return true. In the new code it also returns
true. In either implementation 5 should never show up, as it is not
a valid type known to Git.
Object type 6 OFS_DELTA is not permitted to be supplied here.
Object type 7 REF_DELTA is not permitted to be supplied here.
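A plausible shape of the simplified test (a sketch; the method name
and the reuseDeltaCommits option are assumptions about the
surrounding code, not necessarily JGit's exact implementation):

    private boolean reuseDeltaFor(ObjectToPack otp) {
      int type = otp.getType();
      if ((type & 2) != 0)        // OBJ_TREE (2) and OBJ_BLOB (3) share this bit
        return true;              // ~66% of objects return here
      if (type == OBJ_COMMIT)
        return reuseDeltaCommits; // assumed configuration flag
      if (type == OBJ_TAG)
        return false;
      return true;                // undefined type 5 lands here, still true
    }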
Now that WANT_WRITE is gone renumber the flags to move the unused
bit next to the type. Recluster AS_IS and DELTA_ATTEMPTED to be
next to each other since these bits are tested as a pair.
Free up the WANT_WRITE flag in ObjectToPack by switching the test
to use the special offset value of 1. The Git pack file format
calls for the first 4 bytes to be 'PACK', which means any object
must start at an offset >= 4. Current versions require another 8
bytes in the header, placing the first object at offset = 12.
So offset = 1 is an invalid location for an object, and can be
used as a marker signal to indicate the writing loop has tried
to write the object, but recursed into the base first. When an
object is visited with offset == 1 it means there is a cycle in
the delta base path, and the cycle must be broken.
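A sketch of how the sentinel can drive cycle breaking in the writing
loop (names are illustrative; isWritten() here means offset >= 4):

    void writeObject(PackOutputStream out, ObjectToPack otp)
        throws IOException {
      if (otp.isWritten())
        return;                        // already in the output pack
      otp.setOffset(1);                // impossible offset: visit in progress
      ObjectToPack base = otp.getDeltaBase(); // illustrative accessor
      if (base != null && !base.isWritten()) {
        if (base.getOffset() == 1)
          breakDeltaChain(otp);        // cycle: store otp in whole form
        else
          writeObject(out, base);      // recurse into the base first
      }
      otp.setOffset(out.length());     // the real offset, finally
      writeHeaderAndBody(out, otp);    // illustrative helper
    }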
Garbage is randomly ordered and unlikely to delta compress against
other garbage. Disable delta compression, allowing objects to switch
to whole form when moving to the garbage pack.
Because the garbage is not well compressed, assume deltas were not
attempted during a normal GC cycle.
Override the reuse settings: garbage that can be reused should be
reused as-is into the garbage pack, rather than switching something
like the compression level during a GC. Garbage is expected to
eventually be removed from the repository, so expending CPU time on
a compression switch is not worthwhile.
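Expressed with PackConfig setters, the garbage pack settings
described above might look like this (a sketch, not the literal code
in DfsGarbageCollector):

    PackConfig cfg = new PackConfig(repo);
    cfg.setDeltaCompress(false); // random garbage rarely deltas vs garbage
    cfg.setReuseDeltas(true);    // keep any delta already stored as-is
    cfg.setReuseObjects(true);   // no recompression into the garbage pack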
This implementation has been proven to deadlock in production server
loads. Google has been running with it disabled for quite a while,
as the bugs have been difficult to identify and fix.
Instead of suggesting it works and is useful, drop the code. JGit
should not advertise support for functionality that is known to
be broken.
In a few of the places where DfsReader enabled read-ahead there is
more information about which blocks should be loaded, and when.
During object representation selection, size lookup, sending an
object as-is to a PackWriter, or sending an entire pack as-is, the
reader knows exactly which blocks are required in the cache, and it
can also compute when those will be needed. The broken read-ahead
code simply read a fixed amount ahead of the current offset, which
can waste IOs when more precise information is available.
DFS systems are usually slow to respond so read-ahead is still
a desired feature, but it needs to be rebuilt from scratch and
make better use of the offset information.
Rewrite this complicated logic to examine each pack file exactly
once. This reduces thrashing when there are many large pack files
present and the reader needs to locate each object's header.
The intermediate temporary list is now smaller; it is bounded by
the length of the input object list. In the prior version of this
code the list contained one entry for every representation of every
object being packed.
Only one representation object is allocated, reducing the overall
memory footprint to approximately one reference per object found in
the current pack file (the pointer in the BlockList). This saves
considerable working-set memory compared to the prior version, which
made and held onto a new representation for every ObjectToPack.
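Roughly, the loop becomes pack-major instead of object-major (a
sketch with illustrative names for the per-pack lookups):

    for (DfsPackFile pack : sortedPacks) {            // each pack once
      BlockList<DfsObjectToPack> found = new BlockList<>();
      for (DfsObjectToPack otp : input) {
        if (pack.hasObject(reader, otp))              // illustrative test
          found.add(otp);                             // one pointer per hit
      }
      DfsObjectRepresentation rep = new DfsObjectRepresentation(pack);
      for (DfsObjectToPack otp : found) {
        pack.fillRepresentation(rep, otp);            // illustrative fill
        packer.select(otp, rep);                      // same rep, reused
      }
    }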
Clip the configured limit to Integer.MAX_VALUE at the top of the
loop, saving a compare branch per object considered. This can cut
2M branches out of a repacking of the Linux kernel.
Rewrite the logic so the primary path is to match the conditional:
most objects are larger than BLKSZ (16 bytes) and smaller than the
limit. This may help branch prediction on CPUs that assume execution
takes the first side of a branch rather than the second.
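A sketch of both changes together (BLKSZ from the text; the rest of
the names are illustrative):

    final int BLKSZ = 16;
    // Clip once, outside the loop: one less compare per object inside.
    int limit = (int) Math.min(configuredLimit, Integer.MAX_VALUE);
    for (int sz : objectSizes) {
      if (BLKSZ < sz && sz < limit) {
        // common case first: object is a selection candidate
      }
      // rare: tiny object, or too large to consider
    }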
Declare critical exposed methods of ObjectToPack final
There is no reasonable way for a subclass to correctly override and
implement these methods. They depend on internal state that cannot
otherwise be managed.
Most of these methods are also in critical paths of PackWriter.
Declare them final so subclasses do not try to replace them,
and so the JIT knows the smaller ones can be safely inlined.
Declare internal flag accessors of ObjectToPack final
None of these methods should ever be overridden at runtime by an
extension class. Given how small they are, the JIT should perform
inlining where reasonable. Hint that this is possible by marking all
methods final, so it is clear no replacement can be loaded later on.
This flag is never checked on its own. It is only checked as part
of a pair through the doNotAttemptDelta() method. Delete the flag's
standalone accessor so there is less confusion about the flag being
used on its own.
Robin Rosenberg [Tue, 2 Apr 2013 07:00:58 +0000 (09:00 +0200)]
Fix PathFilterGroup not to throw StopWalkException too early
Due to Git's internal sort order a directory is sorted as if its
name ended with a '/'. This means the path filter didn't set the
last possible matching entry to the correct value. In the reported
issue we had the following filters:
org.eclipse.jgit.console
org.eclipse.jgit
As an optimization we throw a StopWalkException when the walked tree
passes the last possible filter, which was this:
org.eclipse.jgit.console
Due to the git sorting order, the tree was processed in this order:
org.eclipse.jgit.console
org.eclipse.jgit.test
org.eclipse.jgit
At org.eclipse.jgit.test we threw the StopWalkException preventing the
walk from completing successfully.
A correct last possible match should be:
org.eclipse.jgit/
For simplicity we define it as:
org/eclipse/jgit/
This filter would be the maximal last possible match if we also had
e.g. org and org.eclipse in the filter, but that would require more
work, so we simply replace all characters lower than '/' by a slash.
We believe the possible extra walking does not warrant the extra
analysis.
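A sketch of the substitution that computes the new last possible
match (illustrative, operating on the raw path bytes):

    // Raise every byte below '/' to '/' so the result sorts at or after
    // any tree entry that could still match; '.' (0x2E) < '/' (0x2F),
    // so "org.eclipse.jgit" becomes "org/eclipse/jgit".
    byte[] max = path.clone();
    for (int i = 0; i < max.length; i++)
      if (max[i] < '/')
        max[i] = '/';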
Robin Rosenberg [Mon, 18 Feb 2013 19:25:00 +0000 (20:25 +0100)]
Speed up clone/fetch with large number of refs
Instead of re-reading all refs after each update, execute
the deletes first, then read all refs once and perform
the check for conflicting ref names in memory.
François Rey [Thu, 13 Sep 2012 22:11:18 +0000 (00:11 +0200)]
New functions to facilitate the writing of CLI test cases
Writing CLI test cases is tedious because of all the formatting and
escaping subtleties needed when comparing actual output with what's
expected. While creating a test case, the two new functions are to
be used instead of the existing execute() to prepare the correct
command and expected output, and to generate the corresponding test
code that can be pasted into the test case function.
Change-Id: Ia66dc449d3f6fb861c300fef8b56fba83a56c94c
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Matthias Sohn [Tue, 26 Mar 2013 20:20:19 +0000 (21:20 +0100)]
When renaming the lock file succeeds the lock isn't held anymore
This wrong book-keeping caused IOExceptions to be thrown because
LockFile.unlock() erroneously tried to delete the non-existing lock
file. These IOExceptions were hidden since they were silently caught.
Change-Id: If42b6192d92c5a2d8f2bf904b16567ef08c32e89
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Tue, 26 Mar 2013 17:57:19 +0000 (13:57 -0400)]
Always add FileExt to DfsPackDescription
Instead of forcing the implementation of the DFS backend to handle
making sure the extension bits are set correctly, have the common
callers in JGit set the extension at the same time they supply the
file sizes to the pack description. This simplifies assumptions for
an implementation of the DFS backend.
Robin Rosenberg [Sun, 24 Mar 2013 00:06:29 +0000 (01:06 +0100)]
Extend FileUtils.rename to common git semantics
Unlike the OS or Java rename, this method will try to replace the
target with the source, provided the target does not exist, the
target exists and is a file, or the target is a directory which
contains only directories. In the latter case the directory
hierarchy will be deleted.
If the initial rename fails (as it will on Windows when the target
exists) and the target is an existing file, the target file will be
deleted first and then the rename is retried.
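A usage sketch (paths are illustrative; the method throws
IOException when the rename ultimately fails):

    File tmp = new File(packDir, "tmp_pack_1");
    File dst = new File(packDir, "pack-1.pack");
    FileUtils.rename(tmp, dst); // replaces dst per the semantics above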
Change-Id: Iae75c49c85445ada7795246a02ce02f7c248d956
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Allow users to provide their own OutputStream (via
Transport#push(monitor, refUpdates, out)) so that server messages
can be written to it (in SideBandInputStream) as they arrive.
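A usage sketch of the new overload (remote name and updates are
illustrative):

    Transport tn = Transport.open(repo, "origin");
    try {
      // Server messages stream to the given OutputStream as they arrive.
      tn.push(NullProgressMonitor.INSTANCE, updates, System.out);
    } finally {
      tn.close();
    }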
CQ: 7065
Bug: 398404
Change-Id: I670782784b38702d52bca98203909aca0496d1c0
Signed-off-by: Andre Dietisheim <andre.dietisheim@gmail.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Edwin Kempin [Tue, 19 Mar 2013 06:28:36 +0000 (07:28 +0100)]
Allow to get repo statistics from GarbageCollectionCommand before gc
When running the garbage collection for a repository it is often
interesting to compare the repository statistics from before and after
the garbage collection to understand the effect of the garbage
collection. This is why it makes sense that the
GarbageCollectionCommand provides a method to retrieve the repository
statistics before running the garbage collection.
So far without running the garbage collection the repository statistics
can only be retrieved by using JGit internal classes. This is what EGit
and Gerrit do at the moment, but it would be better to have an API for
this.
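With such an API the comparison becomes straightforward (a sketch):

    Git git = Git.wrap(repo);
    Properties before = git.gc().getStatistics(); // stats, gc not yet run
    Properties after = git.gc().call();           // run gc, returns stats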
Change-Id: Id7e579157e9fbef5cfd1fc9f97ada45f0ca8c379
Signed-off-by: Edwin Kempin <edwin.kempin@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Only on Windows does the rename operation which renames temporary
packfiles (and index and bitmap files) sometimes fail. This happens
only when renaming a temporary packfile to a packfile which already
exists. Such situations occur if you run GC twice on a repo without
modifying the repo in between.
In such situations there was a bug in GC which led to a corrupted
repo without any packfiles anymore. This commit fixes the problem by
introducing a utility method which renames a file and throws an
IOException if it fails. This method also takes care to repeat a
failing rename if our FS class has found out we are running on a
platform with an unreliable File.renameTo() method.
I am searching for a better solution because even with this utility
method in hand a GC on an already GC'ed repo will fail on Windows.
But at least with this fix we will not produce corrupted repos
anymore.
Bug: 389305
Change-Id: Iac1ab3e0b8c419c90404f2e2f3559672eb8f6d28
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
With JGit it is possible to write reflog entries where both the new
and the old objectid are null. Such reflogs cause FileRepository GC
to crash because it doesn't expect the new objectid to be null. One
case where this happened is Gerrit's allProjects repo. In the same
way as we expect the old objectid to be potentially null, we should
also ignore null values in the new objectid column.
Shawn Pearce [Mon, 18 Mar 2013 14:44:48 +0000 (07:44 -0700)]
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Shawn Pearce [Mon, 18 Mar 2013 14:35:31 +0000 (10:35 -0400)]
Merge changes I2645d482,Ic81fefb1,Id64ab38d
* changes:
Remove cached_packs support in favor of bitmaps
Remove objects before optimization from DfsGarbageCollector
Simplify caching of DfsPackDescription from PackWriter.Statistics
Dave Borowitz [Fri, 15 Mar 2013 15:10:59 +0000 (08:10 -0700)]
NameRevCommand: Don't use merge cost for first parent
Treat first parent traversals as 1 and higher parents as MERGE_COST,
to match git name-rev. Allow overriding the merge cost during tests to
avoid creating 2^16 commits on the fly.
Shawn Pearce [Thu, 14 Mar 2013 23:27:53 +0000 (16:27 -0700)]
Remove cached_packs support in favor of bitmaps
The bitmap code in PackWriter knows exactly when to use a pack as
a "cached pack". It enables cached pack usage only when the pack
has a bitmap and its entire closure of objects needs to be sent.
This is a much simpler code path to maintain, and JGit actually
has a way to write the necessary index.
Shawn Pearce [Thu, 14 Mar 2013 23:14:04 +0000 (16:14 -0700)]
Remove objects before optimization from DfsGarbageCollector
Just counting objects is not sufficient. There are some race
conditions with receive packs and delta base completion that
may confuse such a simple algorithm.
Instead always do the larger set computations, and rely on the
PackWriter having no objects pending as the way to avoid creating
an empty pack file.
Shawn Pearce [Thu, 14 Mar 2013 23:10:44 +0000 (16:10 -0700)]
Simplify caching of DfsPackDescription from PackWriter.Statistics
Let the pack description copy the relevant stats values. This
moves it out of the garbage collector and compactor algorithms,
co-locating with something that might care.
Remove some unnecessary code from the DfsPackCompactor, the stats
tracks the same information and can supply it.
Dave Borowitz [Thu, 7 Mar 2013 17:41:05 +0000 (09:41 -0800)]
Add a NameRevCommand for describing IDs in terms of refnames
The walk logic does not use RevWalk because it needs to walk all paths
to each of the requested commits, keeping track of each path along which
the commit was found in the RevCommit subclass. From these paths, a
single "best" path is chosen based on the total path length, with a
penalty applied for paths that traverse merges.
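The cost rule reduces to something like this (a sketch; MERGE_COST
as described in the earlier NameRevCommand change):

    // Extending a path through the first parent costs 1; any other
    // parent costs MERGE_COST, penalizing paths that traverse merges.
    static long pathCost(long costSoFar, int parentIndex, long mergeCost) {
      return costSoFar + (parentIndex == 0 ? 1 : mergeCost);
    }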
Robin Rosenberg [Mon, 18 Feb 2013 22:59:20 +0000 (23:59 +0100)]
A folder does not constitute a dirty work tree
This fixes two cases:
- A folder without tracked content exists in both the workdir and
the merged commit, as long as the names within that folder do not
conflict.
- An empty folder structure exists with the same name as a file in
the merged commit.
Shawn Pearce [Fri, 8 Mar 2013 20:45:28 +0000 (12:45 -0800)]
Avoid looking at UNREACHABLE_GARBAGE for client have lines
Clients send a bunch of unknown objects to UploadPack on each round
of negotiation. Many of these are not known to the server, which
leads the implementation to look at the indexes of garbage packs.
Disable examining the index of a garbage pack, allowing servers to
avoid reading them from disk during negotiation.
The effect of this change is the server will only ACK a have line
if the object was reachable during the last garbage collection,
or was recently added to the repository. For most repositories
there is no impact in this behavior change.
If a repository rewinds a branch, runs GC, and then resets the
branch back to where it was before, the now current tip is going to
be skipped by this change. A client that has the commit may wind up
getting a slightly larger data transfer from the server as an older
common ancestor will be chosen during negotiation. This is fixable
on the server side by running GC again to correct the layout of
objects in pack files.
Shawn Pearce [Fri, 8 Mar 2013 20:25:12 +0000 (12:25 -0800)]
Simplify UploadPack by parsing wants separately from haves
The DHT backend was very slow at parsing objects. To work around
that performance limitation I obfuscated UploadPack by folding both
the want and have sets together in a single parse queue. Since DHT
was removed the complexity is no longer constructive to JGit.
Doing this refactoring prepares the code for a future change where
the have lines need to be handled specially from the want lines.
Splitting the parsing into two phases makes such a modification
trivial.
Shawn Pearce [Fri, 8 Mar 2013 19:19:44 +0000 (11:19 -0800)]
Cluster UNREACHABLE_GARBAGE packs at the end of the search list
Garbage is unlikely to be used by a reader. Ensure they always
cluster at the end of the search list, no matter what timestamp
was used on the pack files.
Shawn Pearce [Fri, 8 Mar 2013 19:02:04 +0000 (11:02 -0800)]
Avoid repacking unreachable garbage in DfsGarbageCollector
If a repository has significant amounts of unreachable garbage the
final phase to coalesce it can take longer than any other part of the
garbage collection phase. Provide a setting for applications to tweak
the threshold where coalescing ends and files just remain on disk.
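A usage sketch of the new setting (the 10 MiB value is only an
example):

    DfsGarbageCollector gc = new DfsGarbageCollector(repo);
    gc.setCoalesceGarbageLimit(10 << 20); // larger garbage stays on disk
    gc.pack(NullProgressMonitor.INSTANCE);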
Robin Rosenberg [Wed, 20 Feb 2013 23:07:33 +0000 (00:07 +0100)]
Do not cherry-pick merge commits during rebase
Rebase computes the list of commits that are included in
the merges, just like Git does, so do not try to include
the merge commits. Re-creating merges during rebase is
a bit more complicated and might be a useful future extension,
but for now just linearize during rebase.
Change-Id: I61239d265f395e5ead580df2528e46393dc6bdbd
Signed-off-by: Robin Stocker <robin@nibor.org>
Robin Rosenberg [Wed, 20 Feb 2013 21:54:59 +0000 (22:54 +0100)]
Extend FileUtils.delete with option to delete empty directories only
The new option EMPTY_DIRECTORIES_ONLY will make delete() only delete
empty directories. Any attempt to delete files will fail. Can be
combined with RECURSIVE to wipe out entire tree structures and
IGNORE_ERRORS to silently ignore any files or non-empty directories.
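A usage sketch combining the options as described:

    // Wipe empty directory structures under root, silently skipping
    // files and non-empty directories instead of failing the delete.
    FileUtils.delete(root, FileUtils.EMPTY_DIRECTORIES_ONLY
        | FileUtils.RECURSIVE | FileUtils.IGNORE_ERRORS);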
Change-Id: Icaa9a30e5302ee5c0ba23daad11c7b93e26b7445
Signed-off-by: Robin Stocker <robin@nibor.org>
Shawn Pearce [Wed, 6 Mar 2013 20:48:25 +0000 (12:48 -0800)]
Do not attempt to read bitmap from invalid pack
If a pack file has been marked invalid due to a prior IOException
accessing its contents, do not offer its bitmap index to callers.
The pack cannot be used so its bitmap should be off limits from
any reader trying to work from a bitmap.
Enable writing pack indexes with bitmaps in the GC.
Update the dfs and file GC implementations to prepare and write
bitmaps on the packs that contain the full closure of the object
graph. Update the DfsPackDescription to include the index version.
Colby Ranger [Mon, 6 Aug 2012 18:09:53 +0000 (11:09 -0700)]
Support creating pack bitmap indexes in PackWriter.
Update the PackWriter to support writing out pack bitmap indexes,
a parallel ".bitmap" file to the ".pack" file.
Bitmaps are selected at commits every 1 to 5,000 commits for
each unique path from the start. The most recent 100 commits are
all bitmapped. The next 19,000 commits have a bitmap every 100
commits. The remaining commits have a bitmap every 5,000 commits.
Commits with more than one parent are preferred over ones with one
or fewer. Furthermore, previously computed bitmaps are reused if the
previous entry had the reuse flag set, which is set when the bitmap
was placed at the max allowed distance.
Bitmaps are used to speed up the counting phase when packing, for
requests that are not shallow. The PackWriterBitmapWalker uses
a RevFilter to proactively mark commits with RevFlag.SEEN, when
they appear in a bitmap. The walker produces the full closure
of reachable ObjectIds, given the collection of starting ObjectIds.
For fetch requests, two ObjectWalks are executed to compute the
ObjectIds reachable from the haves and from the wants. The ObjectIds
that need to be written are determined by taking all the resulting
wants AND NOT the haves.
For clone requests, we get cached pack support for "free" since
it is possible to determine if all of the ObjectIds in a pack file
are included in the resulting list of ObjectIds to write.
On my machine, the best times for clones and fetches of the linux
kernel repository (with about 2.6M objects and 300K commits) are
tabulated below:
Colby Ranger [Tue, 28 Aug 2012 16:08:20 +0000 (09:08 -0700)]
Added read/write support for pack bitmap index.
A pack bitmap index is an additional index of compressed
bitmaps of the object graph. Furthermore, a logical API of the index
functionality is included, as it is expected to be used by the
PackWriter.
Compressed bitmaps are created using the javaewah library, which is a
word-aligned compressed variant of the Java bitset class based on
run-length encoding. The library only works with positive integer
values. Thus, the maximum number of ObjectIds in a pack file that
this index can currently support is limited to Integer.MAX_VALUE.
Every ObjectId is given an integer mapping. The integer is the
position of the ObjectId in the complete ObjectId list, sorted
by offset, for the pack file. That integer is what the bitmaps
use to reference the ObjectId. Currently, the new index format can
only be used with pack files that contain a complete closure of the
object graph e.g. the result of a garbage collection.
The index file includes four bitmaps for the Git object types i.e.
commits, trees, blobs, and tags. In addition, a collection of
bitmaps keyed by an ObjectId is also included. The bitmap for each entry
in the collection represents the full closure of ObjectIds reachable
from the keyed ObjectId (including the keyed ObjectId itself). The
bitmaps are further compressed by XORing the current bitmaps against
prior bitmaps in the index, and selecting the smallest representation.
The XOR'd bitmap and offset from the current entry to the position
of the bitmap to XOR against is the actual representation of the entry
in the index file. Each entry contains one byte, which is currently
used to note whether the bitmap should be blindly reused.
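A sketch of selecting the smallest representation with javaewah's
xor (the window size and list of prior bitmaps are illustrative):

    EWAHCompressedBitmap best = full;   // the plain representation
    int xorOffset = 0;                  // 0 means: stored without XOR
    for (int i = 1; i <= window && i <= prior.size(); i++) {
      EWAHCompressedBitmap x = full.xor(prior.get(prior.size() - i));
      if (x.sizeInBytes() < best.sizeInBytes()) {
        best = x;                       // smaller when closures overlap
        xorOffset = i;                  // the entry records this offset
      }
    }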
Colby Ranger [Mon, 27 Aug 2012 23:08:42 +0000 (16:08 -0700)]
Break the dependency on RevObject when creating a newObjectToPack().
Update the ObjectReuseAsIs API to support creating new
ObjectToPack with only the AnyObjectId and Git object type. This is
needed to support the future pack index bitmaps, which only contain
this information and do not want the overhead of creating a temporary
object for every ObjectId.
As you can see, the destination ref pattern has a superfluous slash.
It looks like this behaviour has always been the case for CloneCommand,
at least since cc2197ed when code catering to bare-clone fetch refspecs
was added. That was released with JGit v1.0 almost 2 years ago, so
there will probably be some bare repos in the wild which will have been
cloned with JGit and have these corrupted refspecs.
The effect of the corrupted fetch refspec is quite interesting. Up to
and including JGit 2.0, the corrupt refspec was tolerated and fetches
would work as intended with no indication to the user that anything was
amiss. With JGit 2.1, a change was introduced which made JGit less
tolerant, and fetches now attempt to update the non-existing ref
"refs/heads//master". No exception is raised, but the real ref -
"refs/heads/master" - is not updated.
This behaviour was noticed by a user of Agit (which does bare clones by
default and recently updated from JGit v2.0 to v2.2), reported here:
https://github.com/rtyley/agit/issues/92
If you run C-Git fetch on a bare-repo cloned by JGit, it flat-out
rejects the refspec (checked against v1.7.10.4):
Incidentally, C-Git does not create an explicit fetch refspec at all
when performing a bare clone - the full remote config generated by C-Git
looks like this:
Roberto Tyley [Fri, 1 Mar 2013 21:49:58 +0000 (21:49 +0000)]
Fix RefUpdate performance for existing Refs
No longer invoke the expensive RefDatabase.isNameConflicting() check on
updating existing refs, reducing batch ref update time by ~97%.
The RefDirectory implementation of isNameConflicting() is quite
slow (it has to do an expensive loose-ref scan) but it's only necessary
to perform this check on ref update if the ref is being *created* - if
the ref already exists, we can already guarantee that it does not
conflict with any other refs.
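The guard reduces to something like this (a sketch; isNameConflicting
is RefDatabase's check, the rest is illustrative):

    // Only a ref being created can introduce a name conflict such as
    // refs/heads/a vs refs/heads/a/b; an existing ref already passed.
    if (!refExists && refDb.isNameConflicting(name))
      return Result.LOCK_FAILURE;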
C-Git seems to use a similar condition before making the
is_refname_available() check:
The execution time for the run is completely dominated by the batch ref
update at the end. Repeating the experiment with BFG v1.0.2 (using JGit
patched with this change), the refs update is dramatically reduced:
Colby Ranger [Mon, 28 Jan 2013 19:49:01 +0000 (11:49 -0800)]
Include supported extensions in PackFile constructor.
Previously a PackFile class was assumed to only support a .pack and .idx
file. Update the constructor to enumerate the supported extensions for
the pack file. This will allow the bitmap code to only be executed if
the bitmap extension file is known to exist.
Gustaf Lundh [Fri, 1 Feb 2013 12:20:31 +0000 (13:20 +0100)]
Performance fixes in DateRevQueue
When a lot of commits are added to DateRevQueue, the
sort-on-insertion approach is very heavy on CPU cycles.
One approach to fix this was made by Dave Borowitz:
https://git.eclipse.org/r/#/c/5491/
But using Java's PriorityQueue seems to have brought some
extra overhead, and the desired performance could not be
reached.
This fix takes another approach to the insertion problem,
without changing the expected behaviour or bringing extra
memory overhead:
If we detect over 1000 commits in the DateRevQueue, a
"seek-index" is rebuilt every 1000th added commit.
The index keeps track of every 100th commit in the
DateRevQueue. During insertions, it will be used for a
preliminary scanning (binary search) of the queue, with
the intention of helping add() find a good starting point
to start walking from. After finding this starting point,
add() will step commit-by-commit until the correct
insertion place in the queue is found (today, the queue
is expected to be sorted at all times).
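A sketch of the add() fast path (the 1000/100 values are from the
description above; the entry layout is illustrative):

    void add(RevCommit c) {
      Entry q = head;
      if (count > 1000 && index != null)
        q = seek(index, c.getCommitTime()); // binary search, coarse start
      // Walk entry by entry to the exact insertion point; the queue
      // stays sorted by commit time at all times.
      while (q.next != null
          && q.next.commit.getCommitTime() > c.getCommitTime())
        q = q.next;
      insertAfter(q, c);
      if (++count % 1000 == 0)
        rebuildIndex();                     // every 100th entry indexed
    }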
When applied to repositories with many refs, this approach
has proven to bring huge performance gains and scales quite
well.
For instance, in a repository with close to 80000 refs,
we could cut down the time a typical Gerrit replication
of 1 commit would take (just a push from JGit's point of
view) from 32sec down to 3.5sec.
Below you see some typical times to add a specific amount
of commits (with random commit times) to the DateRevQueue
and the difference the preliminary seek-index makes:
Commits |  Index | No Index
   1024 |    8ms |      8ms
   2048 |   13ms |      9ms
   4096 |    5ms |     59ms
   8192 |   11ms |    595ms
  16384 |   22ms |   3058ms
  32768 |   64ms |  13811ms
  65536 |  201ms |  62677ms
 131072 |  783ms | 331585ms
Only one extra reference is needed for every 100 inserted
commits (and only when we see more than 1000 commits in
the queue), so the memory overhead should be negligible.
Various index-stepping values were tested, and 100 seemed to
scale very well and be effective from the start.
In the future, it should probably be dynamic and based on
the number of refs in the queue, but this should serve well
as a starting point.
Note: While other fundamentally different data structures may
be more suitable, the DateRevQueue is extremely central to
many of the Git core operations. This approach was chosen,
since the effect of the patch is easy to predict in conjunction
with the current implementation. A totally new data structure
will make it harder to predict behaviour in many common and
uncommon cases (in terms of breaking ties, memory usage, cost
when using few elements, object creation/disposing overhead,
etc).