mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Author	SHA1	Message	Date
Fabio Ponciroli	67e1e9532b	Git clone (v2) fails because of stale file handle Bug: 573770 Change-Id: I78e1c7d3e042eaef64e85bc546af207478f2e334	2 years ago
Matthias Sohn	5c5f7c6b14	Update EDL 1.0 license headers to new short SPDX compliant format This is the format given by the Eclipse legal doc generator [1]. [1] https://www.eclipse.org/projects/tools/documentation.php?id=technology.jgit Bug: 548298 Change-Id: I8d8cabc998ba1b083e3f0906a8d558d391ffb6c4 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	4 years ago
Jonathan Nieder	647cc8f604	Remove unnecessary modifiers from interfaces This continues what commit `d9ac7ddf10` (Remove unnecessary modifiers from interfaces, 2018-11-15) started. Change-Id: I89720985a5a986722a0dcb9b5e9bbc25996bd5b3	5 years ago
Matthias Sohn	783dbf1b03	Fix javadoc in org.eclipse.jgit storage/pack package Change-Id: Id1b7d392e1bb36079edaf16450e73a044a318e7e Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	6 years ago
Shawn Pearce	6e5c71b358	Remove validate support when reusing cached pack Cached packs are only used when writing over the network or to a bundle file and reuse validation is always disabled in these two contexts. The client/consumer of the stream will be SHA-1 checksumming every object. Reuse validation is most critical during local GC to avoid silently ignoring corruption by stopping as soon as a problem is found and leaving everything alone for the end-user to debug and salvage. Cached packs are not supported during local GC as the bitmap rebuild logic does not support including a cached pack in the result. Strip out the validation and force PackWriter to always disable the cached pack feature if reuseValidation is enabled. Change-Id: If0d7baf2ae1bf1f7e71bf773151302c9f7887039	9 years ago
Shawn Pearce	f32b861243	JGit 3.0: move internal classes into an internal subpackage This breaks all existing callers once. Applications are not supposed to build against the internal storage API unless they can accept API churn and make necessary updates as versions change. Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9	11 years ago
Shawn Pearce	3760e4319b	Remove cached_packs support in favor of bitmaps The bitmap code in PackWriter knows exactly when to use a pack as a "cached pack". It enables cached pack usage only when the pack has a bitmap and its entire closure of objects needs to be sent. This is a much simpler code path to maintain, and JGit actually has a way to write the necessary index. Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881	11 years ago
Colby Ranger	dafcb8f6db	Support creating pack bitmap indexes in PackWriter. Update the PackWriter to support writing out pack bitmap indexes, a parallel ".bitmap" file to the ".pack" file. Bitmaps are selected at commits every 1 to 5,000 commits for each unique path from the start. The most recent 100 commits are all bitmapped. The next 19,000 commits have a bitmaps every 100 commits. The remaining commits have a bitmap every 5,000 commits. Commits with more than 1 parent are prefered over ones with 1 or less. Furthermore, previously computed bitmaps are reused, if the previous entry had the reuse flag set, which is set when the bitmap was placed at the max allowed distance. Bitmaps are used to speed up the counting phase when packing, for requests that are not shallow. The PackWriterBitmapWalker uses a RevFilter to proactively mark commits with RevFlag.SEEN, when they appear in a bitmap. The walker produces the full closure of reachable ObjectIds, given the collection of starting ObjectIds. For fetch request, two ObjectWalks are executed to compute the ObjectIds reachable from the haves and from the wants. The ObjectIds needed to be written are determined by taking all the resulting wants AND NOT the haves. For clone requests, we get cached pack support for "free" since it is possible to determine if all of the ObjectIds in a pack file are included in the resulting list of ObjectIds to write. On my machine, the best times for clones and fetches of the linux kernel repository (with about 2.6M objects and 300K commits) are tabulated below: Operation Index V2 Index VE003 Clone 37530ms (524.06 MiB) 82ms (524.06 MiB) Fetch (1 commit back) 75ms 107ms Fetch (10 commits back) 456ms (269.51 KiB) 341ms (265.19 KiB) Fetch (100 commits back) 449ms (269.91 KiB) 337ms (267.28 KiB) Fetch (1000 commits back) 2229ms ( 14.75 MiB) 189ms ( 14.42 MiB) Fetch (10000 commits back) 2177ms ( 16.30 MiB) 254ms ( 15.88 MiB) Fetch (100000 commits back) 14340ms (185.83 MiB) 1655ms (189.39 MiB) Change-Id: Icdb0cdd66ff168917fb9ef17b96093990cc6a98d	11 years ago
Colby Ranger	be7a135e94	Break the dependency on RevObject when creating a newObjectToPack(). Update the ObjectReuseAsIs API to support creating new ObjectToPack with only the AnyObjectId and Git object type. This is needed to support the future pack index bitmaps, which only contain this information and do not want the overhead of creating a temporary object for every ObjectId. Change-Id: I906360b471412688bf429ecef74fd988f47875dc	11 years ago
Shawn O. Pearce	9f5bbb5dd4	PackWriter: Speed up pruning of objects from cached packs During object enumeration for the thin pack, very few objects come out that are duplicated with the cached pack. Typically these are only cases where a blob or tree was cherry-picked forward, got a copy or rename, or was reverted... all relatively infrequent events. Speed up pruning of the thin pack object list by combining the phase with the object representation selection. Implementers should already be offering to reuse the object from the cached pack if it is stored there, at which point the implementation can perform a very fast type of containment test using the cached pack's identity rather than yet another index lookup. For the local disk case this is probably not a big improvement, but it does help on the DHT implementation where the two passes combined into one reduces latency. Change-Id: I6a07fc75d9075bf6233e967360b6546f9e9a2b33 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	a468cb57c2	PackWriter: Validate reused cached packs If object reuse validation is enabled, the output pack is going to probably be stored locally. When reusing an existing cached pack to save object enumeration costs, ensure the cached pack has not been corrupted by checking its SHA-1 trailer. If it has, writing will abort and the output pack won't be complete. This prevents anyone from trying to use the output pack, and catches corruption before it can be carried any further. Change-Id: If89d0d4e429d9f4c86f14de6c0020902705153e6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	1b2062fe37	PackWriter: Avoid CRC-32 validation when feeding IndexPack There is no need to validate the object contents during copyObjectAsIs if the result is going to be parsed by unpack-objects or index-pack. Both programs will compute the SHA-1 of the object, and also validate most of the pack structure. For git daemon like servers, this work is already done on the client end of the connection, so the server doesn't need to repeat that work itself. Disable object validation for the 3 transport cases where we know the remote side will handle object validation for us (push, bundle creation, and upload pack). This improves performance on the server side by reducing the work that must be done. Change-Id: Iabb78eec45898e4a17f7aab3fb94c004d8d69af6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	461b012e95	PackWriter: Support reuse of entire packs The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory. Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs: cd $GIT_DIR root=$(git rev-parse master) tmp=objects/.tmp-$$ names=$(echo $root \| git pack-objects --keep-true-parents --revs $tmp) for n in $names; do chmod a-w $tmp-$n.pack $tmp-$n.idx touch objects/pack/pack-$n.keep mv $tmp-$n.pack objects/pack/pack-$n.pack mv $tmp-$n.idx objects/pack/pack-$n.idx done (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs git repack -a -d When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root. For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources. With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but a slightly larger data transfer (+2.39 MiB): Before: remote: Counting objects: 1861830, done remote: Finding sources: 100% (1861830/1861830) remote: Getting sizes: 100% (88243/88243) remote: Compressing objects: 100% (88184/88184) Receiving objects: 100% (1861830/1861830), 376.01 MiB \| 19.01 MiB/s, done. remote: Total `1861830` (delta 4706), reused `1851053` (delta `1553844`) Resolving deltas: 100% (1564621/1564621), done. real 3m19.005s After: remote: Counting objects: 1601, done remote: Counting objects: 1828460, done remote: Finding sources: 100% (50475/50475) remote: Getting sizes: 100% (18843/18843) remote: Compressing objects: 100% (7585/7585) remote: Total `1861830` (delta 2407), reused `1856197` (delta 37510) Receiving objects: 100% (1861830/1861830), 378.40 MiB \| 31.31 MiB/s, done. Resolving deltas: 100% (1559477/1559477), done. real 2m2.938s Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden. [1] In this test $root was set back about two weeks. Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	a017fdf112	Allow ObjectReuseAsIs to resort objects during writing It can be very handy for the implementation to resort the object list based on data locality, improving prefetch in the operating system's buffer cache. Export the list to the implementation was a proper List, and document that its mutable and OK to be modified. The only caller in PackWriter is already OK with these rules. Change-Id: I3f51cf4388898917b2be36670587a5aee902ff10 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	28ba4747bc	Allow ObjectReuseAsIs to have more control over write ordering The reuse system used by an object database may be able to benefit from knowing what objects are coming next, and even improve data throughput by delaying (or moving up) objects that are stored near each other in the source database. Pushing the iteration down into the reuse code makes it possible for a smarter implementation to aggregate reuse. But for the standard pack file format on disk we don't bother, its quite efficient already. Change-Id: I64f0048ca7071a8b44950d6c2a5dfbca3be6bba6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	b85af06324	Allow object reuse selection to occur in parallel ObjectReader implementations may wish to use multiple threads in order to evaluate object reuse faster. Let the reader make that decision by passing the iteration down into the reader. Because the work is pushed into the reader, it may need to locate a given ObjectToPack given its ObjectId. This can easily occur if the reader has sent a list of ObjectIds to the object database and gets back information keyed only by ObjectId, without the ObjectToPack handle. Expose lookup using the PackWriter's own internal map, so the reader doesn't need to build a redundant copy to track the assocation of ObjectId back to ObjectToPack. Change-Id: I0c536405a55034881fb5db92a2d2a99534faed34 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ece88b99eb	Redo PackWriter object reuse output Output of selected reuses is refactored to use a new ObjectReuseAsIs interface that extends the ObjectReader. This interface allows the reader to control how it performs the reuse into the output stream, but also allows it to throw an exception to request the writer to find a different candidate representation. The PackFile reuse code was overhauled, cleaning up the APIs so they aren't exposed in the object loader, but instead are now a single method on the PackFile itself. The reuse algorithm was changed to do a data verification pass, followed by the copy pass to the output. This permits us to work around a corrupt object in a pack file by seeking another copy of that object when this one is bad. The reuse code was also optimized for the common case, where the in-pack representation is under 16 KiB. In these smaller cases data is sent to the pack writer more directly, avoiding some copying. Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	bf4ffff07f	Redo PackWriter object reuse selection The new selection implementation uses a public API on the ObjectReader, allowing the storage library to enumerate its candidates and select the best one for this packer without needing to build a temporary list of the candidates first. Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	6fc3ecac84	Extract PackFile specific code to ObjectToPack subclass The ObjectReader class is dual-purposed into being a factory for the ObjectToPack, permitting specific ObjectDatabase implementations to override the method and offer their own custom subclass of the generic ObjectToPack class. By allowing them to directly extend the type, each implementation can add custom fields to support tracking where an object is stored, without incurring any additional penalties like a parallel Map<ObjectId,Object> would cost. The reader was chosen to act as a factory rather than the database, as the reader will eventually be tied more tightly with the ObjectWalk and TreeWalk. During object enumeration the reader would have had to load the object for the RevWalk, and may chose to cache object position data internally so it can later be reused and fed into the ObjectToPack instance supplied to the PackWriter. Since a reader is not thread-safe, and is scoped to this PackWriter and its internal ObjectWalk, its a great place for the database to perform caching, if any. Right now this change goes a bit backwards by changing what should be generic ObjectToPack references inside of PackWriter to the very PackFile specific LocalObjectToPack subclass. We will correct these in a later commit as we start to refine what the ObjectToPack API will eventually look like in order to better support the PackWriter. Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	5cfc29b491	Replace WindowCache with ObjectReader The WindowCache is an implementation detail of PackFile and how its used by ObjectDirectory. Lets start to hide it and replace the public API with a more generic concept, ObjectReader. Because PackedObjectLoader is also considered a private detail of PackFile, we have to make PackWriter temporarily dependent upon the WindowCursor and thus FileRepository and ObjectDirectory in order to just start the refactoring. In later changes we will clean up the APIs more, exposing sufficient support to PackWriter without needing the file specific implementation details. Change-Id: I676be12b57f3534f1285854ee5de1aa483895398 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Alex Blewitt	4d91645e89	Remove trailing whitespace at end of line As discussed on the egit-dev mailing list, we prefer not to have trailing whitespace in our source code. Correct all currently offending lines by trimming them. Change-Id: I002b1d1980071084c0bc53242c8f5900970e6845 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Git Development Community	1a6964c827	Initial JGit contribution to eclipse.org Per CQ 3448 this is the initial contribution of the JGit project to eclipse.org. It is derived from the historical JGit repository at commit `3a2dd9921c`. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago

6 Commits (67e1e9532b7c62b3c6bb269255bed1134d7d4752)