Instead of assuming all objects cost the same amount of time to
delta compress, aggregate the byte size of objects in the list
and partition threads with roughly equal total bytes.
Before splitting the list select the N largest paths and assign
each one to its own thread. This allows threads to get through the
worst cases in parallel before attempting smaller paths that are
more likely to be splittable.
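A minimal sketch of the byte-weighted split; the element type and its
size accessor are stand-ins, not JGit's actual names:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.ToLongFunction;

    /** Split list into at most nThreads parts of roughly equal bytes. */
    static <T> List<List<T>> partitionByBytes(List<T> list, int nThreads,
            ToLongFunction<T> sizeOf) {
        long total = 0;
        for (T t : list)
            total += sizeOf.applyAsLong(t);
        long target = Math.max(1, total / nThreads); // bytes per thread

        List<List<T>> parts = new ArrayList<>();
        List<T> cur = new ArrayList<>();
        long bytes = 0;
        for (T t : list) {
            cur.add(t);
            bytes += sizeOf.applyAsLong(t);
            // close this partition once it holds its share of the bytes
            if (bytes >= target && parts.size() < nThreads - 1) {
                parts.add(cur);
                cur = new ArrayList<>();
                bytes = 0;
            }
        }
        parts.add(cur);
        return parts;
    }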
By running the largest path buckets first on each thread the likely
slowest part of compression is done early, while progress is still
reporting a low percentage. This gives users a better impression of
how fast the phase will run. On very complex inputs the slow part
is more likely to happen first, making a user realize it's time to
go grab lunch, or even run it overnight.
If the worst sections are earlier, memory overruns may show up
earlier, giving the user a chance to correct the configuration and
try again before wasting large amounts of time. It also makes it
less likely the delta compression phase reaches 92% in 30 minutes
and then crawls for 10 hours through the remaining 8%.
* changes:
Always attempt delta compression when reuseDeltas is false
Avoid TemporaryBuffer.Heap on very small deltas
Correct distribution of allowed delta size along chain length
Split remaining delta work on path boundaries
Replace DeltaWindow array with circularly linked list
Micro-optimize copy instructions in DeltaEncoder
Micro-optimize DeltaWindow primary loop
Micro-optimize DeltaWindow maxMemory test to be != 0
Mark DeltaWindowEntry methods final
Always attempt delta compression when reuseDeltas is false
If reuseObjects=true but reuseDeltas=false the caller wants to attempt
a delta for every object in the input list. Test for reuseDeltas
to ensure every object passes through the searchInWindow() method.
If no delta is possible for an object and it will be stored whole
(non-delta format), PackWriter may still reuse its content from any
source pack. This avoids an inflate()-deflate() cycle to recompress
the object contents.
TemporaryBuffer is great when the output size is not known, but must
be bound by a relatively large upper limit that fits in memory, e.g.
64 KiB or 20 MiB. The buffer gracefully supports growing storage by
allocating 8 KiB blocks and storing them in an ArrayList.
In a Git repository many deltas are less than 8 KiB. Typical tree
objects are well below this threshold, and their deltas must be
encoded even smaller.
For these much smaller cases avoid the 8 KiB minimum allocation used
by TemporaryBuffer. Instead allocate a very small OutputStream
writing to an array that is sized at the limit.
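A sketch of such a stream; the class name is illustrative, but the idea
is one allocation sized at the delta limit instead of TemporaryBuffer's
8 KiB block list:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.Arrays;

    final class BoundedArrayOutputStream extends OutputStream {
        private final byte[] buf; // one allocation, sized at the limit
        private int len;

        BoundedArrayOutputStream(int limit) {
            buf = new byte[limit];
        }

        @Override
        public void write(int b) throws IOException {
            if (len == buf.length)
                throw new IOException("delta exceeds limit, abandon it");
            buf[len++] = (byte) b;
        }

        byte[] toByteArray() {
            return Arrays.copyOf(buf, len);
        }
    }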
Correct distribution of allowed delta size along chain length
Nicolas Pitre discovered a very simple rule for selecting between two
different delta base candidates:
- if based on a whole object, must be <= 50% of target
- if at end of a chain, must be <= 1/depth * 50% of target
The rule penalizes deltas near the end of the chain, requiring them to
be very small in order to be kept by the packer. This favors deltas
that are based on a shorter chain, where the read-time unpack cost is
much lower. Fewer bytes need to be consulted from the source pack
file, and less copying is required in memory to rebuild the object.
Junio Hamano explained Nico's rule to me today, and this commit fixes
DeltaWindow to implement it as described.
When no base has been chosen the computation is simply the statements
denoted above. However once a base with depth of 9 has been chosen
(e.g. when pack.depth is limited to 10), a non-delta source may
create a new delta that is up to 10x larger than the already selected
base. This reflects the intent of Nico's size distribution rule no
matter what order objects are visited in the DeltaWindow.
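A sketch of the rule exactly as stated above; the method and parameter
names are illustrative. targetSize is the inflated size of the object
being compressed, depth the chain depth of the candidate base:

    // Nico's size distribution rule, as described above.
    static long maxDeltaSize(long targetSize, int depth) {
        long half = targetSize / 2; // a delta must be <= 50% of target
        if (depth == 0)
            return half;            // base is a whole (non-delta) object
        return half / depth;        // deeper bases demand smaller deltas
    }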
With this patch and my other patches applied, repacking JGit with:
  CGit (all)  5,711,735 bytes;  real 0m13.942s  user 0m47.722s  [1]
  JGit heads  5,718,295 bytes;  real 0m11.880s  user 0m38.177s  [2]
        rest      9,809 bytes
The improved JGit result for the head pack is only 6.4 KiB larger than
CGit's resulting pack. This patch allowed JGit to find an additional
39.7 KiB worth of space savings. JGit now also often runs 2s faster
than CGit, despite also creating bitmaps and pruning objects after the
head pack creation.
[1] time git repack -a -d -F --window=250 --depth=50
[2] time java -Xmx128m -jar jgit debug-gc
When an idle thread tries to steal work from a sibling's remaining
toSearch queue, always try to split along a path boundary. This
avoids missing delta opportunities in the current window of the
thread whose work is being taken.
The search order is reversed to walk further down the chain from the
current position, avoiding the risk of splitting the list within
the path the thread is currently processing.
When selecting which thread to split from use an accurate estimate
of the size to be taken. This avoids selecting a thread that has
only one path remaining but may contain more pending entries than
another thread with several paths remaining.
As there is now a race condition where the straggling thread can
start the next path before the split can finish, the stealWork()
loop spins until it is able to acquire a split or there is only
one path remaining in the siblings.
PackWriter generally chooses the order for objects when it builds the
object lists. This ordering already depends on history information to
guide placing more recent objects first and historical objects last.
Allow PackWriter to make the basic ordering decisions, instead of
trying to override them. The old approach of sorting the list caused
DfsReader to override any ordering change PackWriter might have tried
to make when repacking a repository.
This now better matches WindowCursor's implementation, where
PackWriter solely determines the object ordering.
Replace DeltaWindow array with circularly linked list
Typical window sizes are 10 and 250 (although others are accepted).
In either case the pointer overhead of 1 pointer in an array or
2 pointers for a doubly linked list is trivial. A doubly linked
list as used here for window=250 is only another 1024 bytes on a
32 bit machine, or 2048 bytes on a 64 bit machine.
The critical search loops scan through the array in either the
previous direction or the next direction until the cycle is finished,
or some other scan abort condition is reached. Loading the next
object's pointer from a field in the current object avoids the
branch required to test for wrapping around the edge of the array.
It also saves the array bounds check on each access.
When a delta is chosen the window is shuffled to hoist the currently
selected base as an earlier candidate for the next object. Moving
the window entry is easier in a doubly linked list than sliding a
group of array entries.
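A sketch of the ring structure (JGit's DeltaWindowEntry differs in
detail); the scan loops follow prev/next fields instead of computing
wrap-around indices:

    final class WindowEntry {
        WindowEntry prev; // neighbor scanned in the "previous" direction
        WindowEntry next; // neighbor scanned in the "next" direction
        Object object;    // candidate delta base held by this slot

        /** Build a closed cycle of windowSize entries. */
        static WindowEntry createRing(int windowSize) {
            WindowEntry head = new WindowEntry();
            WindowEntry tail = head;
            for (int i = 1; i < windowSize; i++) {
                WindowEntry e = new WindowEntry();
                e.prev = tail;
                tail.next = e;
                tail = e;
            }
            tail.next = head; // close the cycle: no bounds check needed
            head.prev = tail;
            return head;
        }
    }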
The copy instruction formatter should not compute the shifts and
masks twice. Instead compute them once and assume there is a register
available to store the temporary "b" for compare with 0.
javac and the JIT are more likely to understand a boolean being
used as a branch conditional than comparing int against 0 and 1.
Rewrite NEXT_RES and NEXT_SRC constants to be booleans so the
code is clarified for the JIT.
Micro-optimize DeltaWindow maxMemory test to be != 0
Instead of using a compare-with-0 test, use a does-not-equal-0 test.
javac bytecode has a special instruction for this, as it
is very common in software. We can assume the JIT knows
how to efficiently translate the opcode to machine code,
and processors can do != 0 very quickly.
* changes:
Increase PackOutputStream copy buffer to 64 KiB
Tighten object header writing in PackOutputStream
Skip main thread test in ThreadSafeProgressMonitor
Declare members of PackOutputStream final
Always allocate the PackOutputStream copyBuffer
Disable CRC32 computation when no PackIndex will be created
Steal work from delta threads to rebalance CPU load
Most objects are written as OFS_DELTA with the base in the pack,
which is why this case comes first in writeHeader(). Rewrite the
condition to always examine this first and cache the PackWriter's
formatting flag for use of OFS_DELTA headers; in modern Git networks
this is true more often than it is false.
Assume the cost of write() is high, especially due to entering the
MessageDigest to update the pack footer SHA-1 computation. Combine
the OFS_DELTA information as part of the header buffer so that the
entire burst is a single write call, rather than two relatively
small ones. Most OFS_DELTA headers are <= 6 bytes, so this rewrite
transforms 2 writes of 3 bytes each into 1 write of ~6 bytes.
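A sketch of the combined burst, following the pack format's object
header and negative-offset encodings; the method name and buffer
handling are illustrative:

    /** Encode type+size header then the base offset; returns length. */
    static int ofsDeltaHeader(byte[] buf, long inflatedSize, long ofs) {
        int n = 0;
        // 3 type bits plus the size in little-endian 7-bit groups
        buf[n] = (byte) ((6 /* OBJ_OFS_DELTA */ << 4) | (inflatedSize & 0x0F));
        inflatedSize >>>= 4;
        while (inflatedSize != 0) {
            buf[n++] |= 0x80; // continuation bit
            buf[n] = (byte) (inflatedSize & 0x7F);
            inflatedSize >>>= 7;
        }
        n++;
        // negative offset to the base: big-endian base-128, with the
        // "minus one per continuation byte" trick from C Git
        byte[] tmp = new byte[10];
        int p = tmp.length - 1;
        tmp[p] = (byte) (ofs & 0x7F);
        while ((ofs >>= 7) > 0)
            tmp[--p] = (byte) (0x80 | (--ofs & 0x7F));
        System.arraycopy(tmp, p, buf, n, tmp.length - p);
        return n + (tmp.length - p); // one write() flushes all of it
    }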
Try to simplify the objectHeader code to reduce branches and use
more local registers. This shouldn't really be necessary if the
compiler is well optimized, but it isn't very hard to clarify data
usage to either javac or the JIT, which may make it easier for the
JIT to produce better machine code for this method.
Skip main thread test in ThreadSafeProgressMonitor
update(int) is only invoked from a worker thread; in JGit's case
this is DeltaTask. The Javadoc of TSPM suggests update should only
ever be used by a worker thread.
Skip the main thread check, saving some cycles on each run of the
progress monitor.
These methods cannot be sanely overridden anywhere. Most methods
are package visible only, or are private. A few public methods do
exist but there is no useful way to override them since creation
of PackOutputStream is managed by PackWriter and cannot be delegated.
getCopyBuffer() is almost always used during output. All known
implementations of ObjectReuseAsIs rely on the buffer to be present,
and the only sane way to get good performance from PackWriter is to
reuse objects during packing.
Avoid a branch and test when obtaining this buffer by making sure
it is always populated.
Disable CRC32 computation when no PackIndex will be created
If a server is streaming 3GiB worth of pack data to a client there
is no reason to compute the CRC32 checksum on the objects. The
CRC32 code computed by PackWriter is used only in the new index
created by writeIndex(), which is never invoked for the native Git
network protocols.
Object reuse may still compute its own CRC32 to verify the data
being copied from an existing pack has not been corrupted. This
check is done by the ObjectReader that implements ObjectReuseAsIs
and has no relationship to the CRC32 being skipped during output.
Steal work from delta threads to rebalance CPU load
If the configuration wants to run 4 threads the delta search work
is initially split somewhat evenly across the 4 threads. During
execution some threads will finish early due to the work not being
split fairly, as the initial partitions were based on object count
and not cost to inflate or size of DeltaIndex.
When a thread finishes early it now tries to take 50% of the work
remaining on a sibling thread, and executes that before exiting.
This repeats as each thread completes until a thread has only 1
object remaining.
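A sketch of the stealing loop with hypothetical types; the real
DeltaTask splits on path boundaries and uses an accurate size estimate
when picking the victim:

    import java.util.List;

    interface WorkQueue {
        int pathsRemaining();          // estimate of pending work
        List<Runnable> splitOffHalf(); // ~50% of the tail, null if raced
    }

    static void stealWork(List<WorkQueue> siblings) {
        for (;;) {
            WorkQueue max = null;
            for (WorkQueue q : siblings)
                if (max == null || q.pathsRemaining() > max.pathsRemaining())
                    max = q;
            if (max == null || max.pathsRemaining() <= 1)
                return; // siblings are nearly done; just exit
            List<Runnable> stolen = max.splitOffHalf();
            if (stolen != null) {
                for (Runnable r : stolen)
                    r.run(); // execute the stolen half before exiting
                return;
            }
            // the owner started the next path first; spin and retry
        }
    }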
Repacking Blink, Chromium's new fork of WebKit (2.2M objects 3.9G):
Matthias Sohn [Mon, 8 Apr 2013 21:25:01 +0000 (17:25 -0400)]
Merge changes I8445070d,I38f10d62,I2af0bf68
* changes:
Fix plugin provider names to conform with release train requirement
Add missing @since tags for new API methods
DfsReaderOptions are options for a DFS stored repository
Support cutting existing delta chains longer than the max depth
Some packs built by JGit have incredibly long delta chains due to a
long-standing bug in PackWriter. Google has packs created by JGit's
DfsGarbageCollector with chains 6000 objects long, or more.
Inflating an object at the end of such a 6000-object chain is
impossible to complete within a reasonable time bound. It could take
a beefy system hours to perform, even using the heavily optimized
native C implementation of Git, let alone with JGit.
Enable pack.cutDeltaChains to be set in a configuration file to
permit the PackWriter to determine the length of each delta chain
and clip the chain at arbitrary points to fit within pack.depth.
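A sketch of turning it on, assuming the PackConfig accessor introduced
alongside the pack.cutDeltaChains key:

    // equivalent to `git config pack.cutDeltaChains true` in the repo
    PackConfig pc = new PackConfig(repository);
    pc.setCutDeltaChains(true); // clip chains longer than pack.depth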
Delta chain cycles are still possible, but no attempt is made to
detect them. A trivial chain of A->B->A will iterate for the full
pack.depth configured limit (e.g. 50) and then pick an object to
store as non-delta.
When cutting chains the object list is walked in reverse to try to
take advantage of existing chain computations. The assumption
here is most deltas are near the end of the list, and their bases
are near the front of the list. Going up from the tail attempts to
reuse chainLength computations by relying on the memoized value in
the delta base.
The chainLength field in ObjectToPack is overloaded into the depth
field normally used by DeltaWindow. This is acceptable because the
chain cut happens before delta search, and the chainLength is reset
to 0 if delta search will follow.
This switch is called mostly for OBJ_TREE and OBJ_BLOB types, which
typically make up 66% of the objects in a repository. Simplify the
test for these common types by testing for the one bit they have in
common and returning early.
Object type 5 is currently undefined. In the old code it would hit
the default and return true. In the new code it will match the early
case and also return true. In either implementation 5 should never show
up as it is not a valid type known to Git.
Object type 6 OFS_DELTA is not permitted to be supplied here.
Object type 7 REF_DELTA is not permitted to be supplied here.
Now that WANT_WRITE is gone renumber the flags to move the unused
bit next to the type. Recluster AS_IS and DELTA_ATTEMPTED to be
next to each other since these bits are tested as a pair.
Free up the WANT_WRITE flag in ObjectToPack by switching the test
to use the special offset value of 1. The Git pack file format
calls for the first 4 bytes to be 'PACK', which means any object
must start at an offset >= 4. Current versions require another 8
bytes in the header, placing the first object at offset = 12.
So offset = 1 is an invalid location for an object, and can be
used as a marker signal to indicate the writing loop has tried
to write the object, but recursed into the base first. When an
object is visited with offset == 1 it means there is a cycle in
the delta base path, and the cycle must be broken.
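A sketch of the marker in the writing loop; the accessors, the out
stream, and breakDeltaCycle() are hypothetical stand-ins:

    static final long IN_PROGRESS = 1; // bytes 0..11 hold 'PACK' header

    void writeObject(ObjectToPack otp) throws IOException {
        if (otp.getOffset() != 0)
            return; // already written
        otp.setOffset(IN_PROGRESS); // mark: this object is being visited
        ObjectToPack base = otp.getDeltaBase();
        if (base != null) {
            if (base.getOffset() == IN_PROGRESS)
                breakDeltaCycle(otp); // base depends on otp: cut cycle
            else
                writeObject(base);    // recurse into the base first
        }
        otp.setOffset(out.length()); // the real position, always >= 12
        // ... write the header and compressed data ...
    }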
Garbage is randomly ordered and unlikely to delta compress against
other garbage. Disable delta compression, allowing objects to switch
to whole form when moving to the garbage pack.
Because the garbage is not well compressed assume deltas were not
attempted during a normal GC cycle.
Override the reuse settings; garbage that can be reused should be
reused as-is into the garbage pack rather than switching something
like the compression level during a GC. It is intended that garbage
will eventually be removed from the repository so expending CPU
time on a compression switch is not worthwhile.
This implementation has been proven to deadlock under production
server loads. Google has been running with it disabled for quite a
while, as the bugs have been difficult to identify and fix.
Instead of suggesting it works and is useful, drop the code. JGit
should not advertise support for functionality that is known to
be broken.
In a few of the places where read-ahead was enabled by DfsReader
there is more information about what blocks should be loaded when.
During object representation selection, or size lookup, or sending
object as-is to a PackWriter, or sending an entire pack as-is the
reader knows exactly which blocks are required in the cache, and it
also can compute when those will be needed. The broken read-ahead
code was stupid and just read a fixed amount ahead of the current
offset, which can waste IOs if more precise data was available.
DFS systems are usually slow to respond so read-ahead is still
a desired feature, but it needs to be rebuilt from scratch and
make better use of the offset information.
Rewrite this complicated logic to examine each pack file exactly
once. This reduces thrashing when there are many large pack files
present and the reader needs to locate each object's header.
The intermediate temporary list is now smaller; it is bounded to
the same length as the input object list. In the prior version of
this code the list contained one entry for every representation of
every object being packed.
Only one representation object is allocated, reducing the overall
memory footprint to be approximately one reference per object found
in the current pack file (the pointer in the BlockList). This saves
considerable working set memory compared to the prior version that
made and held onto a new representation for every ObjectToPack.
Clip the configured limit to Integer.MAX_VALUE at the top of the
loop, saving a compare branch per object considered. This can cut
2M branches out of a repacking of the Linux kernel.
Rewrite the logic so the primary path is to match the conditional;
most objects are larger than BLKSZ (16 bytes) and less than limit.
This may help branch prediction on CPUs that assume execution
follows one side of the branch rather than the other.
Declare critical exposed methods of ObjectToPack final
There is no reasonable way for a subclass to correctly override and
implement these methods. They depend on internal state that cannot
otherwise be managed.
Most of these methods are also in critical paths of PackWriter.
Declare them final so subclasses do not try to replace them,
and so the JIT knows the smaller ones can be safely inlined.
Declare internal flag accessors of ObjectToPack final
None of these methods should ever be overridden at runtime by an
extension class. Given how small they are the JIT should perform
inlining where reasonable. Hint this is possible by marking all
methods final so it's clear no replacement can be loaded later on.
This flag is never checked on its own. It is only checked as part
of a pair through the doNotAttemptDelta() method. Delete the method
so there is less confusion about the flag being used on its own.
Robin Rosenberg [Tue, 2 Apr 2013 07:00:58 +0000 (09:00 +0200)]
Fix PathFilterGroup not to throw StopWalkException too early
Due to the Git internal sort order a directory is sorted as if it ended
with a '/'. This means the path filter didn't set the last possible
matching entry to the correct value. In the reported issue we had the
following filters.
org.eclipse.jgit.console
org.eclipse.jgit
As an optimization we throw a StopWalkException when the walked tree
passes the last possible filter, which was this:
org.eclipse.jgit.console
Due to the git sorting order, the tree was processed in this order:
org.eclipse.jgit.console
org.eclipse.jgit.test
org.eclipse.jgit
At org.eclipse.jgit.test we threw the StopWalkException preventing the
walk from completing successfully.
A correct last possible match should be:
org.eclipse.jgit/
For simplicity we define it as:
org/eclipse/jgit/
This filter would be the maximum if we also had e.g. org and org.eclipse
in the filter, but that would require more work so we simply replace all
characters lower than '/' by a slash.
We believe the possible extra walking does not warrant the extra
analysis.
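A sketch of that transformation on the raw path bytes:

    /** Raise every byte sorting below '/' to '/', e.g.
     *  "org.eclipse.jgit" becomes "org/eclipse/jgit". */
    static byte[] lastPossibleMatch(byte[] path) {
        byte[] max = path.clone();
        for (int i = 0; i < max.length; i++)
            if ((max[i] & 0xFF) < '/')
                max[i] = '/';
        return max;
    }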
Arthur Baars [Sun, 31 Mar 2013 14:28:47 +0000 (15:28 +0100)]
LogCommand.all(): filter out refs that do not refer to commit objects
1. I have authored 100% of the content I'm contributing,
2. I have the rights to donate the content to Eclipse,
3. I contribute the content under the EDL
Arthur Baars [Mon, 4 Mar 2013 11:14:16 +0000 (11:14 +0000)]
LogCommand.all(), peel references before using them
Problem:
LogCommand.all() throws an IncorrectObjectTypeException when
there are tag references, and the repository does not contain
the file "packed-refs". It seems that the references were not properly
peeled before being added to the markStart() method.
Solution:
Call getRepository().peel() on every Ref that has isPeeled()==false
in LogCommand.all().
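A sketch of the loop, using the Repository.peel(Ref) API JGit had at
the time; the walk handling shown is illustrative:

    for (Ref ref : getRepository().getAllRefs().values()) {
        if (!ref.isPeeled())
            ref = getRepository().peel(ref);
        ObjectId id = ref.getPeeledObjectId();
        if (id == null)
            id = ref.getObjectId();
        RevObject o = walk.parseAny(id);
        if (o instanceof RevCommit)
            walk.markStart((RevCommit) o); // only commits start a walk
    }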
Added test case for LogCommand.all() on repo with a tag.
1. I have authored 100% of the content I'm contributing,
2. I have the rights to donate the content to Eclipse,
3. I contribute the content under the EDL
Robin Rosenberg [Mon, 18 Feb 2013 19:25:00 +0000 (20:25 +0100)]
Speed up clone/fetch with large number of refs
Instead of re-reading all refs after each update, execute
the deletes first, then read all refs once and perform
the check for conflicting ref names in memory.
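A sketch of the in-memory conflict test against a single snapshot of
all ref names (method name illustrative):

    import java.util.Set;

    /** "refs/heads/a" conflicts with "refs/heads/a/b": a name cannot
     *  be both a file and a directory in the ref namespace. */
    static boolean conflicts(Set<String> allRefs, String proposed) {
        if (allRefs.contains(proposed))
            return false; // updating an existing ref is always allowed
        for (String name : allRefs)
            if (name.startsWith(proposed + "/")
                    || proposed.startsWith(name + "/"))
                return true;
        return false;
    }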
François Rey [Thu, 13 Sep 2012 22:11:18 +0000 (00:11 +0200)]
New functions to facilitate the writing of CLI test cases
Writing CLI test cases is tedious because of all the formatting and
escaping subtleties needed when comparing actual output with what's
expected. While creating a test case the two new functions are to be
used instead of the existing execute() in order to prepare the correct
command and expected output and to generate the corresponding test code
that can be pasted into the test case function.
Change-Id: Ia66dc449d3f6fb861c300fef8b56fba83a56c94c
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Matthias Sohn [Tue, 26 Mar 2013 20:20:19 +0000 (21:20 +0100)]
When renaming the lock file succeeds the lock isn't held anymore
This wrong book-keeping caused IOExceptions to be thrown because
LockFile.unlock() erroneously tried to delete the non-existing lock
file. These IOExceptions were hidden since they were silently caught.
Change-Id: If42b6192d92c5a2d8f2bf904b16567ef08c32e89
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Tue, 26 Mar 2013 17:57:19 +0000 (13:57 -0400)]
Always add FileExt to DfsPackDescription
Instead of forcing the implementation of the DFS backend to handle
making sure the extension bits are set correctly, have the common
callers in JGit set the extension at the same time they supply the
file sizes to the pack description. This simplifies assumptions for
an implementation of the DFS backend.
Robin Rosenberg [Sun, 24 Mar 2013 00:06:29 +0000 (01:06 +0100)]
Extend FileUtils.rename to common git semantics
Unlike the OS or Java rename, this method will (on *nix), or will try
to (on Windows), replace the target with the source, provided the
target does not exist, the target is an existing file, or it is a
directory which only contains directories. In the latter case the
directory hierarchy will be deleted.
If the initial rename fails and the target is an existing file, the
target file will be deleted first and then the rename is retried.
Change-Id: Iae75c49c85445ada7795246a02ce02f7c248d956
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Allow users to provide their own OutputStream (via
Transport#push(monitor, refUpdates, out)) so that server messages can be written
to it (in SideBandInputStream) while they're coming in.
CQ: 7065
Bug: 398404
Change-Id: I670782784b38702d52bca98203909aca0496d1c0
Signed-off-by: Andre Dietisheim <andre.dietisheim@gmail.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Edwin Kempin [Tue, 19 Mar 2013 06:28:36 +0000 (07:28 +0100)]
Allow to get repo statistics from GarbageCollectionCommand before gc
When running the garbage collection for a repository it is often
interesting to compare the repository statistics from before and after
the garbage collection to understand the effect of the garbage
collection. This is why it makes sense that the
GarbageCollectionCommand provides a method to retrieve the repository
statistics before running the garbage collection.
So far without running the garbage collection the repository statistics
can only be retrieved by using JGit internal classes. This is what EGit
and Gerrit do at the moment, but it would be better to have an API for
this.
Change-Id: Id7e579157e9fbef5cfd1fc9f97ada45f0ca8c379
Signed-off-by: Edwin Kempin <edwin.kempin@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Only on Windows, the rename operation which renames temporary Packfiles
(and index-files and bitmap-files) sometimes fails. This happens only
when renaming a temporary Packfile to a Packfile which already exists.
Such situations occur if you run GC twice on a repo without modifying
the repo in between.
In such situations there was a bug in GC which led to a corrupted repo
without any packfiles anymore. This commit fixes the problem by
introducing a utility method which renames a file and throws an
IOException if it fails. This method also takes care to repeat a
failing rename if our FS class has found out we are running on a
platform with an unreliable File.renameTo() method.
I am searching for a better solution because even with this utility
method in hand a GC on an already GC'ed repo will fail on Windows. But
at least with this fix we will not produce corrupted repos anymore.
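A sketch of such a utility (JGit's actual FileUtils method differs in
detail); FS.retryFailedLockFileCommit() already reports whether the
platform's renameTo is unreliable:

    static void renameOrThrow(FS fs, File src, File dst)
            throws IOException {
        int attempts = fs.retryFailedLockFileCommit() ? 10 : 1;
        while (attempts-- > 0) {
            if (src.renameTo(dst))
                return;
            if (attempts > 0) {
                try {
                    Thread.sleep(100); // let Windows release handles
                } catch (InterruptedException e) {
                    throw new IOException("Rename interrupted", e);
                }
            }
        }
        throw new IOException("Could not rename " + src + " to " + dst);
    }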
Bug: 389305
Change-Id: Iac1ab3e0b8c419c90404f2e2f3559672eb8f6d28
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
With JGit it is possible to write reflog entries where new objectid and
old objectid are null. Such reflogs cause FileRepository GC to crash
because it doesn't expect the new objectid to be null. One case where
this happened is in Gerrit's allProjects repo. In the same way as we
expect the old objectid to be potentially null we should also ignore
null values in the new objectid column.
Shawn Pearce [Mon, 18 Mar 2013 14:44:48 +0000 (07:44 -0700)]
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Shawn Pearce [Mon, 18 Mar 2013 14:35:31 +0000 (10:35 -0400)]
Merge changes I2645d482,Ic81fefb1,Id64ab38d
* changes:
Remove cached_packs support in favor of bitmaps
Remove objects before optimization from DfsGarbageCollector
Simplify caching of DfsPackDescription from PackWriter.Statistics
Dave Borowitz [Fri, 15 Mar 2013 15:10:59 +0000 (08:10 -0700)]
NameRevCommand: Don't use merge cost for first parent
Treat first parent traversals as 1 and higher parents as MERGE_COST,
to match git name-rev. Allow overriding the merge cost during tests to
avoid creating 2^16 commits on the fly.
Shawn Pearce [Thu, 14 Mar 2013 23:27:53 +0000 (16:27 -0700)]
Remove cached_packs support in favor of bitmaps
The bitmap code in PackWriter knows exactly when to use a pack as
a "cached pack". It enables cached pack usage only when the pack
has a bitmap and its entire closure of objects needs to be sent.
This is a much simpler code path to maintain, and JGit actually
has a way to write the necessary index.
Shawn Pearce [Thu, 14 Mar 2013 23:14:04 +0000 (16:14 -0700)]
Remove objects before optimization from DfsGarbageCollector
Just counting objects is not sufficient. There are some race
conditions with receive packs and delta base completion that
may confuse such a simple algorithm.
Instead always do the larger set computations, and rely on the
PackWriter having no objects pending as the way to avoid creating
an empty pack file.
Shawn Pearce [Thu, 14 Mar 2013 23:10:44 +0000 (16:10 -0700)]
Simplify caching of DfsPackDescription from PackWriter.Statistics
Let the pack description copy the relevant stats values. This
moves it out of the garbage collector and compactor algorithms,
co-locating with something that might care.
Remove some unnecessary code from the DfsPackCompactor; the stats
track the same information and can supply it.