mirrors/jgit - jgit - source @ dussan.org

Commit grafiek

Auteur	SHA1	Bericht	Datum
Nasser Grainawi	efb154fc24	Rename PackFile to Pack Pack better represents the purpose of the object and paves the way to add a PackFile object that extends File. Change-Id: I39b4f697902d395e9b6df5e8ce53078ce72fcea3 Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>	3 jaren geleden
David Pursehouse	9be93b7991	Remove redundant "static" qualifier from enum declarations Nested enum types are implicitly static. Change-Id: Id3d7886087494fb67bc0d080b4a3491fb4baac19 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	4 jaren geleden
Matthias Sohn	5c5f7c6b14	Update EDL 1.0 license headers to new short SPDX compliant format This is the format given by the Eclipse legal doc generator [1]. [1] https://www.eclipse.org/projects/tools/documentation.php?id=technology.jgit Bug: 548298 Change-Id: I8d8cabc998ba1b083e3f0906a8d558d391ffb6c4 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	4 jaren geleden
Matthias Sohn	5480da5999	Fix javadoc in org.eclipse.jgit storage/file package Change-Id: Ieb2f66aef2cab7e2a6d8e35c5f5047da881994dd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	6 jaren geleden
Shawn Pearce	d1aacc415a	Fix MissingObjectException race in ObjectDirectory Johannes Carlsson identified a race condition[1] that can lead to spurious MissingObjectExceptions at read time. If two threads are active inside of ObjectDirectory looking for a packed object and the packList is currently the empty NO_PACKS list, thread A will find no object and eventually consider tryAgain1(). If thread A is put to sleep and this point and thread B also does not find the object, loads the packs, when thread A wakes up its tryAgain1 would return false and the thread never considers the packs. Rework the internal API of ObjectDirectory to keep a handle on the exact PackList that was iterated by thread A, allowing it to always retry walking through the packs if the new PackList is different. This had some ripple effect into the CachedObjectDirectory and the shared FileObjectDatabase interface. The new code should be slightly easier to follow, especially from the perspective of the CachedObjectDirectory trying to minimize the number of open system calls it makes to files matching "$GIT_DIR/objects/??/?x{38}". [1] http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg02401.html Change-Id: I9a1c9d6ad6cb38404b7b9178167b714077561353	10 jaren geleden
Shawn Pearce	f32b861243	JGit 3.0: move internal classes into an internal subpackage This breaks all existing callers once. Applications are not supposed to build against the internal storage API unless they can accept API churn and make necessary updates as versions change. Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9	11 jaren geleden
Shawn Pearce	3760e4319b	Remove cached_packs support in favor of bitmaps The bitmap code in PackWriter knows exactly when to use a pack as a "cached pack". It enables cached pack usage only when the pack has a bitmap and its entire closure of objects needs to be sent. This is a much simpler code path to maintain, and JGit actually has a way to write the necessary index. Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881	11 jaren geleden
Colby Ranger	3b325917a5	Added read/write support for pack bitmap index. A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter. Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE. Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection. The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused. Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f	11 jaren geleden
Colby Ranger	82ecfb3e31	Remove packIndex field from FileObjDatabase openPack method. Previously, the FileObjDatabase required both the pack file path and index file path to be passed to openPack(). A future change to add a bitmap index will add a .bitmap file parallel to the pack file (similar to the .idx file). Update the PackFile to support automatically loading pack index extensions based on the pack file path. Change-Id: Ifc8fc3e57f4afa177ba5a88df87334dbfa799f01	11 jaren geleden
Marc Strapetz	67edd3eda7	RevWalk support for shallow clones StartGenerator now processes .git/shallow to have the RevWalk stop for shallow commits. See RevWalkShallowTest for tests. Bug: 394543 CQ: 6908 Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe Signed-off-by: Chris Aniszczyk <zx@twitter.com>	11 jaren geleden
Shawn O. Pearce	461b012e95	PackWriter: Support reuse of entire packs The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory. Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs: cd $GIT_DIR root=$(git rev-parse master) tmp=objects/.tmp-$$ names=$(echo $root \| git pack-objects --keep-true-parents --revs $tmp) for n in $names; do chmod a-w $tmp-$n.pack $tmp-$n.idx touch objects/pack/pack-$n.keep mv $tmp-$n.pack objects/pack/pack-$n.pack mv $tmp-$n.idx objects/pack/pack-$n.idx done (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs git repack -a -d When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root. For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources. With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but a slightly larger data transfer (+2.39 MiB): Before: remote: Counting objects: 1861830, done remote: Finding sources: 100% (1861830/1861830) remote: Getting sizes: 100% (88243/88243) remote: Compressing objects: 100% (88184/88184) Receiving objects: 100% (1861830/1861830), 376.01 MiB \| 19.01 MiB/s, done. remote: Total `1861830` (delta 4706), reused `1851053` (delta `1553844`) Resolving deltas: 100% (1564621/1564621), done. real 3m19.005s After: remote: Counting objects: 1601, done remote: Counting objects: 1828460, done remote: Finding sources: 100% (50475/50475) remote: Getting sizes: 100% (18843/18843) remote: Compressing objects: 100% (7585/7585) remote: Total `1861830` (delta 2407), reused `1856197` (delta 37510) Receiving objects: 100% (1861830/1861830), 378.40 MiB \| 31.31 MiB/s, done. Resolving deltas: 100% (1559477/1559477), done. real 2m2.938s Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden. [1] In this test $root was set back about two weeks. Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 jaren geleden
Shawn O. Pearce	1bf0c3cdb1	Refactor IndexPack to not require local filesystem By moving the logic that parses a pack stream from the network (or a bundle) into a type that can be constructed by an ObjectInserter, repository implementations have a chance to inject their own logic for storing object data received into the destination repository. The API isn't completely generic yet, there are still quite a few assumptions that the PackParser subclass is storing the data onto the local filesystem as a single file. But its about the simplest split of IndexPack I can come up with without completely ripping the code apart. Change-Id: I5b167c9cc6d7a7c56d0197c62c0fd0036a83ec6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	13 jaren geleden
Matthias Sohn	45731756a5	[findbugs] Do not ignore exceptional return value java.io.File.delete() reports failure as an exceptional return value false. Fix the code which silently ignored this exceptional return value. Also remove some duplicate deletion helper methods. Change-Id: I80ed20ca1f07a2bc6e779957a4ad0c713789c5be Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	13 jaren geleden
Shawn O. Pearce	e51e06946f	Update CachedObjectDirectory when inserting objects If an ObjectInserter is created from a CachedObjectDirectory, we need to ensure the cache is updated whenever a new loose object is actually added to the loose objects directory, otherwise a future read from an ObjectReader on the CachedObjectDirectory might not be able to open the newly created object. We mostly had the infrastructure in place to implement this due to the injection of unpacked large deltas, but we didn't have a way to pass the ObjectId from ObjectDirectoryInserter to CachedObjectDirectory, because the inserter was using the underlying ObjectDirectory and not the CachedObjectDirectory. Redirecting to CachedObjectDirectory ensures the cache is updated. Change-Id: I1f7bdfacc7ad77ebdb885f655e549cc570652225 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 jaren geleden
Shawn O. Pearce	5fce8d81d8	Fix cloning of repositories with big objects When running IndexPack we use a CachedObjectDirectory, which knows what objects are loose and tries to avoid stat(2) calls for objects that do not exist in the repository, as stat(2) on Win32 is very slow. However large delta objects found in a pack file are expanded into a loose object, in order to avoid costly delta chain processing when that object is used as a base for another delta. If this expand occurs while working with the CachedObjectDirectory, we need to update the cached directory data to include this new object, otherwise it won't be available when we try to open it during the object verify phase. Bug: 324868 Change-Id: Idf0c76d4849d69aa415ead32e46a435622395d68 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 jaren geleden
Shawn O. Pearce	41dd9ed1c0	Unpack and cache large deltas as loose objects Instead of spooling large delta bases into temporary files and then immediately deleting them afterwards, spool the large delta out to a normal loose object. Later any requests for that large delta can be answered by reading from the loose object, which is much easier to stream efficiently for readers. Since the object is now duplicated, once in the pack as a delta and again as a loose object, any future prune-packed will automatically delete the loose object variant, releasing the wasted disk space. As prune-packed is run automatically during either repack or gc, and gc --auto triggers automatically based on the number of loose objects, we get automatic cache management for free. Large objects that were unpacked will be periodically cleared out, and will simply be restored later if they are needed again. After a short offline discussion with Junio Hamano today, we may want to propose a change to prune-packed to hold onto larger loose objects which also exist in pack files as deltas, if the loose object was recently accessed or modified in the last 2 days. Change-Id: I3668a3967c807010f48cd69f994dcbaaf582337c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 jaren geleden
Shawn O. Pearce	e29cd27961	Move ObjectDirectory streaming limit to WindowCacheConfig IDEs like Eclipse offer up the settings in WindowCacheConfig to the user as a global set of options that are configured for the entire JVM process, not per-repository, as the cache is shared across the entire JVM. The limit on how much we are willing to allocate for an object buffer is similar to the limit on how much we can use for data caches, allocating that much space impacts the entire JVM and not just a single repository, so it should be a global limit. Change-Id: I22eafb3e223bf8dea57ece82cd5df8bfe5badebc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 jaren geleden
Shawn O. Pearce	a5c18fcfc7	Fully implement SHA-1 abbreviations ObjectReader implementations are now responsible for creating the unique abbreviation of an ObjectId, or for resolving an abbreviation back to its full form. In this latter case the reader can offer up multiple candidates to the caller, who may be able to disambiguate them based on context. Repository.resolve() doesn't take multiple candidates into account right now, but it could in the future by looking for a remaining ^0 or ^{commit} suffix and take an expansion if there is only one commit that matches the input abbreviation. It could also use the distance from an annotated tag to resolve "tag-NNN-gcommit" style strings that are often output by `git describe`. Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 jaren geleden
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide up more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 jaren geleden
Shawn O. Pearce	113577617b	Use core.streamFileThreshold to set our streaming limit We default this to 1 MiB for now, but we allow users to modify it through the Repository's configuration file to be a different value. A new repository listener is used to identify when the setting has been updated and trigger a reconfiguration of any active ObjectReaders. To prevent a horrible explosion we cap core.streamFileThreshold at no more than 1/4 of the maximum JVM heap size. We do this because we need at least 2 byte arrays equal in size to the stream threshold for the worst case delta inflation scenario, and our host application probably also needs some amount of the heap for their working set size. Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 jaren geleden
Shawn O. Pearce	aa4b06e087	Rename openObject, hasObject to just open, has Similar to what we did on Repository, the openObject method already implied we wanted to open an object, given its main argument was of type AnyObjectId. Simplify the method name to just the action, has or open. Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 jaren geleden
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 jaren geleden
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 jaren geleden
Shawn O. Pearce	bf4ffff07f	Redo PackWriter object reuse selection The new selection implementation uses a public API on the ObjectReader, allowing the storage library to enumerate its candidates and select the best one for this packer without needing to build a temporary list of the candidates first. Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 jaren geleden
Shawn O. Pearce	5cfc29b491	Replace WindowCache with ObjectReader The WindowCache is an implementation detail of PackFile and how its used by ObjectDirectory. Lets start to hide it and replace the public API with a more generic concept, ObjectReader. Because PackedObjectLoader is also considered a private detail of PackFile, we have to make PackWriter temporarily dependent upon the WindowCursor and thus FileRepository and ObjectDirectory in order to just start the refactoring. In later changes we will clean up the APIs more, exposing sufficient support to PackWriter without needing the file specific implementation details. Change-Id: I676be12b57f3534f1285854ee5de1aa483895398 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 jaren geleden
Shawn O. Pearce	133c987f4d	Refactor alternate object databases below ObjectDirectory Not every object storage system will have the concept of alternate object databases to search, and even if they do, they may not have the notion of fast-access / slow-access split like we do within the ObjectDirectory code for pack files and loose objects. Push all of that down below the generic API so that it is a hidden detail of the ObjectDirectory and its related supporting classes. Change-Id: I54bc1ca5ff2ac94dfffad1f9a9dad7af202b9523 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 jaren geleden

7 Commits (efb154fc24fbf416ae3513942fa720128358b31b)