mirrors/jgit - jgit - source @ dussan.org

Tree: 461b012e95

Author	SHA1	Message	Date
Shawn O. Pearce	461b012e95	PackWriter: Support reuse of entire packs The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory. Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs: cd $GIT_DIR root=$(git rev-parse master) tmp=objects/.tmp-$$ names=$(echo $root \| git pack-objects --keep-true-parents --revs $tmp) for n in $names; do chmod a-w $tmp-$n.pack $tmp-$n.idx touch objects/pack/pack-$n.keep mv $tmp-$n.pack objects/pack/pack-$n.pack mv $tmp-$n.idx objects/pack/pack-$n.idx done (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs git repack -a -d When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root. For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources. With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but a slightly larger data transfer (+2.39 MiB): Before: remote: Counting objects: 1861830, done remote: Finding sources: 100% (1861830/1861830) remote: Getting sizes: 100% (88243/88243) remote: Compressing objects: 100% (88184/88184) Receiving objects: 100% (1861830/1861830), 376.01 MiB \| 19.01 MiB/s, done. remote: Total `1861830` (delta 4706), reused `1851053` (delta `1553844`) Resolving deltas: 100% (1564621/1564621), done. real 3m19.005s After: remote: Counting objects: 1601, done remote: Counting objects: 1828460, done remote: Finding sources: 100% (50475/50475) remote: Getting sizes: 100% (18843/18843) remote: Compressing objects: 100% (7585/7585) remote: Total `1861830` (delta 2407), reused `1856197` (delta 37510) Receiving objects: 100% (1861830/1861830), 378.40 MiB \| 31.31 MiB/s, done. Resolving deltas: 100% (1559477/1559477), done. real 2m2.938s Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden. [1] In this test $root was set back about two weeks. Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	f29741d1d8	amend commit: Support large delta packed objects as streams Rename the ByteWindow's inflate() method to setInput. We have completely refactored the purpose of this method to be feeding part (or all) of the window as input to the Inflater, and the actual inflate activity happens in the caller. Change-Id: Ie93a5bae0e9e637b5e822d56993ce6b562c6ad15 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ad68553be4	Support large delta packed objects as streams Very large delta instruction streams, or deltas which use very large base objects, are now streamed through as large objects rather than being inflated into a byte array. This isn't the most efficient way to access delta encoded content, as we may need to rewind and reprocess the base object when there was a block moved within the file, but it will at least prevent the JVM from having its heap explode. When streaming a delta we have an inflater open for each level in the delta chain, to inflate the instruction set of the delta, as well as an inflater for the base level object. The base object is buffered, as is the top level delta requested by the application, but we do not buffer the intermediate delta streams. This keeps memory usage lower, so its closer to 1024 bytes per level in the chain, without having an adverse impact on raw throughput as the top-level buffer gets pushed down to the lowest stream that has the next region. Delta instructions transparently collapse here, if the top level does not copy a region from its base, the base won't materialize that part from its own base, etc. This allows us to avoid copying around a lot of segments which have been deleted from the final version. Change-Id: I724d45245cebb4bad2deeae7b896fc55b2dd49b3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ece88b99eb	Redo PackWriter object reuse output Output of selected reuses is refactored to use a new ObjectReuseAsIs interface that extends the ObjectReader. This interface allows the reader to control how it performs the reuse into the output stream, but also allows it to throw an exception to request the writer to find a different candidate representation. The PackFile reuse code was overhauled, cleaning up the APIs so they aren't exposed in the object loader, but instead are now a single method on the PackFile itself. The reuse algorithm was changed to do a data verification pass, followed by the copy pass to the output. This permits us to work around a corrupt object in a pack file by seeking another copy of that object when this one is bad. The reuse code was also optimized for the common case, where the in-pack representation is under 16 KiB. In these smaller cases data is sent to the pack writer more directly, avoiding some copying. Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Git Development Community	1a6964c827	Initial JGit contribution to eclipse.org Per CQ 3448 this is the initial contribution of the JGit project to eclipse.org. It is derived from the historical JGit repository at commit `3a2dd9921c`. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago

4 Commits (461b012e9565af8174e5b9d2b2c3a582011ce77e)