IndexPack: Use stack-based recursion for delta resolution
Replace 'method' with 'heap'-based recursion for resolving deltas.
Git packfile delta-chain depth can exceed 50 levels in certain files
(the packfile of the JGit project itself has >800 objects with
chain-length >50). Using method-based recursion on such packfiles will
quickly throw a StackOverflowError on VMs with constrained stack.
Benefits:
* packfile delta-resolution no longer limited by the maximum number
of stack frames permitted on the current thread.
* slight performance improvement
(3% speed increase on the packfile of the JGit project)
Change-Id: I1d9b3a8ba3c6d874d83cb93ebf171c6ab193e6cc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Always use streaming (for SHA-checksum & collision detection)
when indexing whole blobs, regardless of their size.
Positives:
* benefits of bugfix #312868 will apply to all runtimes, without
additional conf for mem-constrained JVMs (5MB huge for some)
* no byte array allocation
(re-uses readBuffer instead of allocating new full-size array)
* mildly better overall performance
(given the usual blob-does-not-need-collision-checking case)
* removes unnecessary code
Negative:
* doubles the disk IO for a blob comparision
(comparitively rare occurance)
I perf-tested a range of threshold sizes against a random selection
of packfiles I found on my harddrive, the results are here:
https://spreadsheets.google.com/ccc?key=tLCQElyyd2RKN9QevfvgwGQ&hl=en_GB#gid=1
My interpretation of the results is that the streaming size threshold
isn't beneficial (actually seems to be very slightly detrimental) -so
we should just get rid of it. This tallies with some of the comments
Shawn & I had for the default value of streamFileThreshold in the
review for I862afd4c:
http://egit.eclipse.org/r/#patch,sidebyside,2040,2,org.eclipse.jgit/src/org/eclipse/jgit/transport/IndexPack.java
The perf-test code is here: https://gist.github.com/735402
It's a bit scruffy but basically does 10 runs (in randomised order)
for each threshold size on various packfiles, waiting a second
between each pack-indexing to allow GC to catch up. I know it's not
perfect - proper perf testing is hard to do :-)
When indexing large blobs that are stored whole (non-delta form),
avoid allocating the entire blob in memory and instead stream it
through the SHA-1 checksum computation. This reduces the size
of memory required by IndexPack when processing very big blobs,
such as a 500 MiB uncompressable binary.
If the large blob already exists in the local repository, its
contents needs to be compared byte-for-byte after the entire pack
has been indexed, to ensure there isn't an unexpected SHA-1 collision
which may result in later data corruption. This compare is performed
as a streaming compare, again avoiding the large object allocation.
This change doesn't improve on memory utilization for large objects
stored as deltas. The change also doesn't improve handling for
any large commits, trees or annotated tags. There isn't much to
be done here for those objects, because they need to be passed down
to the ObjectChecker as a byte[]. Fortunately it isn't common for
these object types to be that large,
Bug: 312868
Change-Id: I862afd4cb78013ee033d4ec68c067b1774a05be8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
CC: Roberto Tyley <roberto.tyley@guardian.co.uk>
Refactor IndexPack to use InputStream for inflation
By inflating with an InputStream like API, it is possible to stream
through large objects rather than allocating the entire thing as
a byte[]. This change only refactors the inflation code within
IndexPack to use a streaming interface.
Change-Id: I5a84b486901c2cf63fa6a3306dd5fb5c53b4056b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Roberto Tyley <roberto.tyley@guardian.co.uk>
java.io.File.delete() reports failure as an exceptional
return value false. Fix the code which silently ignored
this exceptional return value. Also remove some duplicate
deletion helper methods.
Change-Id: I80ed20ca1f07a2bc6e779957a4ad0c713789c5be
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
IndexPack: Make translated progress messages non-static
These messages may need to change depending on the current
thread's configured locale, and thus cannot be static.
Change-Id: I96751a63852ec9c4bf6c47edadcf8752700543df
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
IndexPack: Use byte limited form of getCachedBytes
Currently our algorithm requires that we have the delta base as
a contiguous byte array... but getCachedBytes() might not work
if the object is considered to be large by its underlying loader.
Use the limited form to obtain the object as a byte array instead.
Change-Id: I33f12a8811cb6a4a67396174733f209db8119b42
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix concurrent read / write issue in LockFile on Windows
LockFile.commit fails if another thread concurrently reads
the base file. The problem is fixed by retrying the rename
operation if it fails.
Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb
Bug: 308506
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
We didn't correctly handle the zlib trailer for an object. If the
trailer bytes were outside of the current buffer window but we had
fully inflated the object itself, we broke out of the loop (as we had
our target size) but inflate wasn't finished (as it did not yet get
the trailer) so we failed the test and threw a corruption exception.
Use an infinite loop and only break out when the inflater is done.
Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
A programming error using the Inflater API led to an infinite
loop within IndexPack, caused by the Inflater returning 0 from
the inflate() method, but it didn't want more input. This happens
when it has reached the end of the stream, or has reached a spot
asking for an external dictionary. Such a case is a failure for us,
and we should abort out.
Thanks to Alex for pointing out that we had 3 implementations of
the inflate rountine, which should be consolidated into one and
use a switch to determine where to load data from.
Bug: 317416
Change-Id: I34120482375b687ea36ed9154002d77047e94b1f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Similar to what we did on Repository, the openObject method
already implied we wanted to open an object, given its main
argument was of type AnyObjectId. Simplify the method name
to just the action, has or open.
Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Going through ObjectReader.openObject(AnyObjectId) is faster, but
also produces cleaner application level code. The error checking
is done inside of the openObject method, which means it can be
removed from the application code.
Change-Id: Ia927b448d128005e1640362281585023582b1a3a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Release ObjectReader before the cached ObjectDatabase
I don't want to play games with the order of release here, its
probably safer to release the reader before the database, just
in case the one depends on the other.
Change-Id: I2394c7d2477eaf7a7e1556fc3393c59d3b31e764
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Similar to what we did with the file code, move the pack writer
into its own package so the related classes and their package
private methods are hidden from the rest of the library.
Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Move FileRepository to storage.file.FileRepository
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.
Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.
Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The WindowCache is an implementation detail of PackFile and how its
used by ObjectDirectory. Lets start to hide it and replace the public
API with a more generic concept, ObjectReader.
Because PackedObjectLoader is also considered a private detail of
PackFile, we have to make PackWriter temporarily dependent upon the
WindowCursor and thus FileRepository and ObjectDirectory in order to
just start the refactoring. In later changes we will clean up the
APIs more, exposing sufficient support to PackWriter without needing
the file specific implementation details.
Change-Id: I676be12b57f3534f1285854ee5de1aa483895398
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Use CoreConfig, UserConfig and TransferConfig directly
Rather than relying on the helpers in RepositoryConfig to get
these objects, obtain them directly through the Config API.
Its only slightly more verbose, but permits us to work with the
base Config class, which is more flexible than the highly file
specific RepositoryConfig.
This is what I really meant to do when I added the section parser
and caching support to Config, we just failed to finish updating
all of the call sites.
Change-Id: I481cb365aa00bfa8c21e5ad0cd367ddd9c6c0edd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
IndexPack: Tighten up new and base object bookkeeping
The only current consumer of these collections is ReceivePack,
where it needs to test ObjectId equality between a RevObject and an
ObjectId. There we were copying from a traditional HashSet<ObjectId>
into an ObjectIdSubclassMap<ObjectId>, as the latter can perform
hashing using ObjectId's native value support, bypassing RevObject's
override on hashCode() and equals(). Instead of doing that copy,
directly create ObjectIdSubclassMap instances inside of ReceivePack.
We also only need to record the objects that do not appear in the
incoming pack, and were therefore copied from the local repositiory
in order to complete delta resolution. Instead of listing everything
that used an OBJ_REF_DELTA format, list only the objects that we
pulled from the destination repository via a normal ObjectLoader.
ReceivePack can now discard the IndexPack object, and all of its
other data, as soon as these collections are held by the check
connectivity method. This frees up memory for the ObjectWalk's
own RevObject pool.
Change-Id: I22ef71b45c2045a0202e7fd550a770ee1f6f38a6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
IndexPack: Correct thin pack fix using less than 20 bytes
If we need to append less than 20 bytes in order to fix a thin pack
and make it complete, we need to set the length of our file back to
the actual number of bytes used because the original SHA-1 footer was
not completely overwritten. That extra data will confuse the header
and footer fixup logic when it tries to read to the end of the file.
This isn't a very common case to occur, which is why we've never
seen it before. Getting a delta that requires a whole object which
uses less than 20 bytes in pack representation is really hard.
Generally a delta generator won't make these, because the delta
would be bigger than simply deflating the whole object. I only
managed to do this with a hand-crafted pack file where a 1 byte
delta was pointed to a 1 byte whole object.
Normally we try really hard to avoid truncating, because its
typically not safe across network filesystems. But the odds of
this occurring are very low. This truncation is done on a file
we have open for writing, will append more content onto, and is
a temporary file that we won't move into position for others to
see until we've validated its SHA-1 is sane. I don't think the
truncate on NFS issue is something we need to worry about here.
Change-Id: I102b9637dfd048dc833c050890d142f43c1e75ae
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Allow users of ReceivePack access to the objects being sent
When implementing branch read access, we need to prove that the
newly created reference(s) point to objects that the user can see.
There are two ways that an object is reachable:
1) It's reachable from a branch or change the user can see
2) It was uploaded as part of the pack file the user sent us
This change adds additional methods in ReceivePack that will allow a
server to check the above conditions, in order to ensure that a user
is not trying to create a reference that they cannot see, or that a
malicious user isn't attempting to forge the SHA-1 of an object that
they cannot see in order to base a change off of it.
Change-Id: Ieba75b4f0331e06a03417c37f4ae1ebca4fbee5a
Added caching for loose object lookup during pack indexing
On Windows systems, file system lookup is a slow operation, so
checking each object if it exists during indexing (after receiving
the pack) could take a siginificant time. This patch introduces
CachedObjectDirectory that pre-caches lookup results.
Bug: 300397
Change-Id: I471b93f9bb3ee173eb37cae1d75e9e4eb49985e7
Signed-off-by: Constantine Plotnikov <constantine.plotnikov@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As discussed on the egit-dev mailing list, we prefer not to have
trailing whitespace in our source code. Correct all currently
offending lines by trimming them.
Change-Id: I002b1d1980071084c0bc53242c8f5900970e6845
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>