mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Author	SHA1	Message	Date
roberto	b5f0a7d7ff	IndexPack: Use stack-based recursion for delta resolution Replace 'method' with 'heap'-based recursion for resolving deltas. Git packfile delta-chain depth can exceed 50 levels in certain files (the packfile of the JGit project itself has >800 objects with chain-length >50). Using method-based recursion on such packfiles will quickly throw a StackOverflowError on VMs with constrained stack. Benefits: * packfile delta-resolution no longer limited by the maximum number of stack frames permitted on the current thread. * slight performance improvement (3% speed increase on the packfile of the JGit project) Change-Id: I1d9b3a8ba3c6d874d83cb93ebf171c6ab193e6cc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
roberto	941b3d8a81	IndexPack: Remove blob-streaming size threshold Always use streaming (for SHA-checksum & collision detection) when indexing whole blobs, regardless of their size. Positives: * benefits of bugfix #312868 will apply to all runtimes, without additional conf for mem-constrained JVMs (5MB huge for some) * no byte array allocation (re-uses readBuffer instead of allocating new full-size array) * mildly better overall performance (given the usual blob-does-not-need-collision-checking case) * removes unnecessary code Negative: * doubles the disk IO for a blob comparision (comparitively rare occurance) I perf-tested a range of threshold sizes against a random selection of packfiles I found on my harddrive, the results are here: https://spreadsheets.google.com/ccc?key=tLCQElyyd2RKN9QevfvgwGQ&hl=en_GB#gid=1 My interpretation of the results is that the streaming size threshold isn't beneficial (actually seems to be very slightly detrimental) -so we should just get rid of it. This tallies with some of the comments Shawn & I had for the default value of streamFileThreshold in the review for I862afd4c: http://egit.eclipse.org/r/#patch,sidebyside,2040,2,org.eclipse.jgit/src/org/eclipse/jgit/transport/IndexPack.java The perf-test code is here: https://gist.github.com/735402 It's a bit scruffy but basically does 10 runs (in randomised order) for each threshold size on various packfiles, waiting a second between each pack-indexing to allow GC to catch up. I know it's not perfect - proper perf testing is hard to do :-)	13 years ago
Shawn O. Pearce	c181e1ab8a	IndexPack: Use streaming for large whole blobs When indexing large blobs that are stored whole (non-delta form), avoid allocating the entire blob in memory and instead stream it through the SHA-1 checksum computation. This reduces the size of memory required by IndexPack when processing very big blobs, such as a 500 MiB uncompressable binary. If the large blob already exists in the local repository, its contents needs to be compared byte-for-byte after the entire pack has been indexed, to ensure there isn't an unexpected SHA-1 collision which may result in later data corruption. This compare is performed as a streaming compare, again avoiding the large object allocation. This change doesn't improve on memory utilization for large objects stored as deltas. The change also doesn't improve handling for any large commits, trees or annotated tags. There isn't much to be done here for those objects, because they need to be passed down to the ObjectChecker as a byte[]. Fortunately it isn't common for these object types to be that large, Bug: 312868 Change-Id: I862afd4cb78013ee033d4ec68c067b1774a05be8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> CC: Roberto Tyley <roberto.tyley@guardian.co.uk>	13 years ago
Shawn O. Pearce	c94efa8286	Refactor IndexPack to use InputStream for inflation By inflating with an InputStream like API, it is possible to stream through large objects rather than allocating the entire thing as a byte[]. This change only refactors the inflation code within IndexPack to use a streaming interface. Change-Id: I5a84b486901c2cf63fa6a3306dd5fb5c53b4056b Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Roberto Tyley <roberto.tyley@guardian.co.uk>	13 years ago
Matthias Sohn	45731756a5	[findbugs] Do not ignore exceptional return value java.io.File.delete() reports failure as an exceptional return value false. Fix the code which silently ignored this exceptional return value. Also remove some duplicate deletion helper methods. Change-Id: I80ed20ca1f07a2bc6e779957a4ad0c713789c5be Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	13 years ago
Shawn O. Pearce	f5e5b98c3a	IndexPack: Make translated progress messages non-static These messages may need to change depending on the current thread's configured locale, and thus cannot be static. Change-Id: I96751a63852ec9c4bf6c47edadcf8752700543df Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	a3945d1bc8	IndexPack: Use byte limited form of getCachedBytes Currently our algorithm requires that we have the delta base as a contiguous byte array... but getCachedBytes() might not work if the object is considered to be large by its underlying loader. Use the limited form to obtain the object as a byte array instead. Change-Id: I33f12a8811cb6a4a67396174733f209db8119b42 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Jens Baumgart	db82b8d7eb	Fix concurrent read / write issue in LockFile on Windows LockFile.commit fails if another thread concurrently reads the base file. The problem is fixed by retrying the rename operation if it fails. Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb Bug: 308506 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	14 years ago
Shawn O. Pearce	b518189b5c	IndexPack: Fix spurious pack file corruption errors We didn't correctly handle the zlib trailer for an object. If the trailer bytes were outside of the current buffer window but we had fully inflated the object itself, we broke out of the loop (as we had our target size) but inflate wasn't finished (as it did not yet get the trailer) so we failed the test and threw a corruption exception. Use an infinite loop and only break out when the inflater is done. Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	0b46e70155	Fix infinite loop in IndexPack A programming error using the Inflater API led to an infinite loop within IndexPack, caused by the Inflater returning 0 from the inflate() method, but it didn't want more input. This happens when it has reached the end of the stream, or has reached a spot asking for an external dictionary. Such a case is a failure for us, and we should abort out. Thanks to Alex for pointing out that we had 3 implementations of the inflate rountine, which should be consolidated into one and use a switch to determine where to load data from. Bug: 317416 Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	aa4b06e087	Rename openObject, hasObject to just open, has Similar to what we did on Repository, the openObject method already implied we wanted to open an object, given its main argument was of type AnyObjectId. Simplify the method name to just the action, has or open. Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	1ad2feb7b3	Remove Repository.openObject(ObjectReader, AnyObjectId) Going through ObjectReader.openObject(AnyObjectId) is faster, but also produces cleaner application level code. The error checking is done inside of the openObject method, which means it can be removed from the application code. Change-Id: Ia927b448d128005e1640362281585023582b1a3a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	e01abbd543	Release ObjectReader before the cached ObjectDatabase I don't want to play games with the order of release here, its probably safer to release the reader before the database, just in case the one depends on the other. Change-Id: I2394c7d2477eaf7a7e1556fc3393c59d3b31e764 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	5cfc29b491	Replace WindowCache with ObjectReader The WindowCache is an implementation detail of PackFile and how its used by ObjectDirectory. Lets start to hide it and replace the public API with a more generic concept, ObjectReader. Because PackedObjectLoader is also considered a private detail of PackFile, we have to make PackWriter temporarily dependent upon the WindowCursor and thus FileRepository and ObjectDirectory in order to just start the refactoring. In later changes we will clean up the APIs more, exposing sufficient support to PackWriter without needing the file specific implementation details. Change-Id: I676be12b57f3534f1285854ee5de1aa483895398 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	e1b312b5f7	Use CoreConfig, UserConfig and TransferConfig directly Rather than relying on the helpers in RepositoryConfig to get these objects, obtain them directly through the Config API. Its only slightly more verbose, but permits us to work with the base Config class, which is more flexible than the highly file specific RepositoryConfig. This is what I really meant to do when I added the section parser and caching support to Config, we just failed to finish updating all of the call sites. Change-Id: I481cb365aa00bfa8c21e5ad0cd367ddd9c6c0edd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Sasa Zivkov	f3d8a8ecad	Externalize strings from JGit The strings are externalized into the root resource bundles. The resource bundles are stored under the new "resources" source folder to get proper maven build. Strings from tests are, in general, not externalized. Only in cases where it was necessary to make the test pass the strings were externalized. This was typically necessary in cases where e.getMessage() was used in assert and the exception message was slightly changed due to reuse of the externalized strings. Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>	14 years ago
Shawn O. Pearce	2bb8defa54	IndexPack: Tighten up new and base object bookkeeping The only current consumer of these collections is ReceivePack, where it needs to test ObjectId equality between a RevObject and an ObjectId. There we were copying from a traditional HashSet<ObjectId> into an ObjectIdSubclassMap<ObjectId>, as the latter can perform hashing using ObjectId's native value support, bypassing RevObject's override on hashCode() and equals(). Instead of doing that copy, directly create ObjectIdSubclassMap instances inside of ReceivePack. We also only need to record the objects that do not appear in the incoming pack, and were therefore copied from the local repositiory in order to complete delta resolution. Instead of listing everything that used an OBJ_REF_DELTA format, list only the objects that we pulled from the destination repository via a normal ObjectLoader. ReceivePack can now discard the IndexPack object, and all of its other data, as soon as these collections are held by the check connectivity method. This frees up memory for the ObjectWalk's own RevObject pool. Change-Id: I22ef71b45c2045a0202e7fd550a770ee1f6f38a6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	06ee913c8d	IndexPack: Correct thin pack fix using less than 20 bytes If we need to append less than 20 bytes in order to fix a thin pack and make it complete, we need to set the length of our file back to the actual number of bytes used because the original SHA-1 footer was not completely overwritten. That extra data will confuse the header and footer fixup logic when it tries to read to the end of the file. This isn't a very common case to occur, which is why we've never seen it before. Getting a delta that requires a whole object which uses less than 20 bytes in pack representation is really hard. Generally a delta generator won't make these, because the delta would be bigger than simply deflating the whole object. I only managed to do this with a hand-crafted pack file where a 1 byte delta was pointed to a 1 byte whole object. Normally we try really hard to avoid truncating, because its typically not safe across network filesystems. But the odds of this occurring are very low. This truncation is done on a file we have open for writing, will append more content onto, and is a temporary file that we won't move into position for others to see until we've validated its SHA-1 is sane. I don't think the truncate on NFS issue is something we need to worry about here. Change-Id: I102b9637dfd048dc833c050890d142f43c1e75ae Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Nico Sallembien	19126f70e9	Allow users of ReceivePack access to the objects being sent When implementing branch read access, we need to prove that the newly created reference(s) point to objects that the user can see. There are two ways that an object is reachable: 1) It's reachable from a branch or change the user can see 2) It was uploaded as part of the pack file the user sent us This change adds additional methods in ReceivePack that will allow a server to check the above conditions, in order to ensure that a user is not trying to create a reference that they cannot see, or that a malicious user isn't attempting to forge the SHA-1 of an object that they cannot see in order to base a change off of it. Change-Id: Ieba75b4f0331e06a03417c37f4ae1ebca4fbee5a	14 years ago
Constantine Plotnikov	cc64794b24	Added caching for loose object lookup during pack indexing On Windows systems, file system lookup is a slow operation, so checking each object if it exists during indexing (after receiving the pack) could take a siginificant time. This patch introduces CachedObjectDirectory that pre-caches lookup results. Bug: 300397 Change-Id: I471b93f9bb3ee173eb37cae1d75e9e4eb49985e7 Signed-off-by: Constantine Plotnikov <constantine.plotnikov@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Alex Blewitt	4d91645e89	Remove trailing whitespace at end of line As discussed on the egit-dev mailing list, we prefer not to have trailing whitespace in our source code. Correct all currently offending lines by trimming them. Change-Id: I002b1d1980071084c0bc53242c8f5900970e6845 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Git Development Community	1a6964c827	Initial JGit contribution to eclipse.org Per CQ 3448 this is the initial contribution of the JGit project to eclipse.org. It is derived from the historical JGit repository at commit `3a2dd9921c`. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago

25 Commits (51dedfdc319d51aa729f3e2801c4a06491d6ddac)