mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Tekijä	SHA1	Viesti	Päivämäärä
Fabio Ponciroli	235fe7a50a	Git clone (v2) fails because of stale file handle Bug: 573770 Change-Id: I78e1c7d3e042eaef64e85bc546af207478f2e334	3 vuotta sitten
Nasser Grainawi	efb154fc24	Rename PackFile to Pack Pack better represents the purpose of the object and paves the way to add a PackFile object that extends File. Change-Id: I39b4f697902d395e9b6df5e8ce53078ce72fcea3 Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>	3 vuotta sitten
Matthias Sohn	5c5f7c6b14	Update EDL 1.0 license headers to new short SPDX compliant format This is the format given by the Eclipse legal doc generator [1]. [1] https://www.eclipse.org/projects/tools/documentation.php?id=technology.jgit Bug: 548298 Change-Id: I8d8cabc998ba1b083e3f0906a8d558d391ffb6c4 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	4 vuotta sitten
Han-Wen Nienhuys	f3ec7cf3f0	Remove further unnecessary 'final' keywords Remove it from * package private functions. * try blocks * for loops this was done with the following python script: $ cat f.py import sys import re import os def replaceFinal(m): return m.group(1) + "(" + m.group(2).replace('final ', '') + ")" methodDecl = re.compile(r"^([\t ][a-zA-Z_ ]+)$([^)])$") def subst(fn): input = open(fn) os.rename(fn, fn + "~") dest = open(fn, 'w') for l in input: l = methodDecl.sub(replaceFinal, l) dest.write(l) dest.close() for root, dirs, files in os.walk(".", topdown=False): for f in files: if not f.endswith('.java'): continue full = os.path.join(root, f) print full subst(full) Change-Id: If533a75a417594fc893e7c669d2c1f0f6caeb7ca Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>	6 vuotta sitten
Matthias Sohn	5480da5999	Fix javadoc in org.eclipse.jgit storage/file package Change-Id: Ieb2f66aef2cab7e2a6d8e35c5f5047da881994dd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	6 vuotta sitten
David Pursehouse	3b4448637f	Enable and fix warnings about redundant specification of type arguments Since the introduction of generic type parameter inference in Java 7, it's not necessary to explicitly specify the type of generic parameters. Enable the warning in Eclipse, and fix all occurrences. Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	7 vuotta sitten
David Pursehouse	7ac182f4e4	Enable and fix 'Should be tagged with @Override' warning Set missingOverrideAnnotation=warning in Eclipse compiler preferences which enables the warning: The method <method> of type <type> should be tagged with @Override since it actually overrides a superclass method Justification for this warning is described in: http://stackoverflow.com/a/94411/381622 Enabling this causes in excess of 1000 warnings across the entire code-base. They are very easy to fix automatically with Eclipse's "Quick Fix" tool. Fix all of them except 2 which cause compilation failure when the project is built with mvn; add TODO comments on those for further investigation. Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	7 vuotta sitten
Kevin Corcoran	fa0a93119c	Make streamFileThreshold configurable Previously, the streamFileThreshold, the threshold at which a file would be streamed rather than loaded entirely into memory, was only configurable on a global basis. This commit makes this threshold configurable on a per-loader basis. Bug: 490404 Change-Id: I492c18c3155dbf56eedda9044a61d76120fd75f9 Signed-off-by: Kevin Corcoran <kevin.corcoran@puppetlabs.com> Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	8 vuotta sitten
Dave Borowitz	adff322a69	Expose the ObjectInserter that created an ObjectReader We've found in Gerrit Code Review that it is common to pass around both an ObjectReader (or more commonly a RevWalk wrapping one) and an ObjectInserter. These code paths often assume that the ObjectReader can read back any objects created by the ObjectInserter without flushing. However, we previously had no way to enforce that constraint programmatically, leading to hard-to-spot problems. Provide a solution by exposing the ObjectInserter that created an ObjectReader, when known. Callers can either continue passing both objects and check: reader.getCreatedFromInserter() == inserter or they can just pass around ObjectReader and extract the inserter when it's needed (checking that it's not null at usage time). Change-Id: Ibbf5d1968b506f6b47030ab1b046ffccb47352ea	8 vuotta sitten
Hugo Arès	27128b3e01	Fix WindowCursor memory leak. ObjectReader release method was replaced by close method but WindowCursor was still implementing release method. To prevent the same mistake again, make ObjectReader close method abstract to force sub classes to implement it. Change-Id: I50d0d1d19a26e306fd0dba77b246a95a44fd6584 Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>	9 vuotta sitten
Matthias Sohn	2390531888	Externalize translatable texts in org.eclipse.jgit Change-Id: Ibf4c299f9d203c78cae79e61f88d4bea60ea2795 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	9 vuotta sitten
Shawn Pearce	6e5c71b358	Remove validate support when reusing cached pack Cached packs are only used when writing over the network or to a bundle file and reuse validation is always disabled in these two contexts. The client/consumer of the stream will be SHA-1 checksumming every object. Reuse validation is most critical during local GC to avoid silently ignoring corruption by stopping as soon as a problem is found and leaving everything alone for the end-user to debug and salvage. Cached packs are not supported during local GC as the bitmap rebuild logic does not support including a cached pack in the result. Strip out the validation and force PackWriter to always disable the cached pack feature if reuseValidation is enabled. Change-Id: If0d7baf2ae1bf1f7e71bf773151302c9f7887039	9 vuotta sitten
Matthias Sohn	57644f23a1	Provide more details in exceptions thrown when packfile is invalid Mention packfile path in exceptions thrown when we detect that a packfile is invalid and make excplicit that corrupt packs are removed from the pack list. Change-Id: I454ada5f8e69307d3f34d1c1b8f3cb87607ddf35 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	9 vuotta sitten
Shawn Pearce	94c4d7eee8	Cleanup use of java.util.Inflater, fixing rare infinite loops The native implementation of inflate() can set finished to return true at the same time as it copies the last bytes into the buffer. Check for finished on each iteration, terminating as soon as libz knows the stream was completely inflated. If not finished, it is likely input is required before the next native call could do any useful work. Most invocations are passing in a buffer large enough to store the entire result. A partial return from inflate() will need more input before it can continue. Checking right away that needsInput() is true saves a native call to determine no bytes can be inflated without more input. This should fix a rare infinite loop condition inside of inflation when an object ends exactly at the end of a block boundary, and the next block contains only the 20 byte trailing SHA-1. When the stream is finished each new attempt to inflate() returns n == 0, as no additional bytes were output. The needsInput() test tries to add the length of the footer block to itself, but then loops back around an reloads the same block as the block is smaller than a full block size. A zero length input is set to the inflater, which triggers needsInput() condition again. Change-Id: I95d02bfeab4bf995a254d49166b4ae62d1f21346	10 vuotta sitten
Shawn Pearce	f32b861243	JGit 3.0: move internal classes into an internal subpackage This breaks all existing callers once. Applications are not supposed to build against the internal storage API unless they can accept API churn and make necessary updates as versions change. Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9	11 vuotta sitten
Shawn Pearce	3760e4319b	Remove cached_packs support in favor of bitmaps The bitmap code in PackWriter knows exactly when to use a pack as a "cached pack". It enables cached pack usage only when the pack has a bitmap and its entire closure of objects needs to be sent. This is a much simpler code path to maintain, and JGit actually has a way to write the necessary index. Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881	11 vuotta sitten
Colby Ranger	dafcb8f6db	Support creating pack bitmap indexes in PackWriter. Update the PackWriter to support writing out pack bitmap indexes, a parallel ".bitmap" file to the ".pack" file. Bitmaps are selected at commits every 1 to 5,000 commits for each unique path from the start. The most recent 100 commits are all bitmapped. The next 19,000 commits have a bitmaps every 100 commits. The remaining commits have a bitmap every 5,000 commits. Commits with more than 1 parent are prefered over ones with 1 or less. Furthermore, previously computed bitmaps are reused, if the previous entry had the reuse flag set, which is set when the bitmap was placed at the max allowed distance. Bitmaps are used to speed up the counting phase when packing, for requests that are not shallow. The PackWriterBitmapWalker uses a RevFilter to proactively mark commits with RevFlag.SEEN, when they appear in a bitmap. The walker produces the full closure of reachable ObjectIds, given the collection of starting ObjectIds. For fetch request, two ObjectWalks are executed to compute the ObjectIds reachable from the haves and from the wants. The ObjectIds needed to be written are determined by taking all the resulting wants AND NOT the haves. For clone requests, we get cached pack support for "free" since it is possible to determine if all of the ObjectIds in a pack file are included in the resulting list of ObjectIds to write. On my machine, the best times for clones and fetches of the linux kernel repository (with about 2.6M objects and 300K commits) are tabulated below: Operation Index V2 Index VE003 Clone 37530ms (524.06 MiB) 82ms (524.06 MiB) Fetch (1 commit back) 75ms 107ms Fetch (10 commits back) 456ms (269.51 KiB) 341ms (265.19 KiB) Fetch (100 commits back) 449ms (269.91 KiB) 337ms (267.28 KiB) Fetch (1000 commits back) 2229ms ( 14.75 MiB) 189ms ( 14.42 MiB) Fetch (10000 commits back) 2177ms ( 16.30 MiB) 254ms ( 15.88 MiB) Fetch (100000 commits back) 14340ms (185.83 MiB) 1655ms (189.39 MiB) Change-Id: Icdb0cdd66ff168917fb9ef17b96093990cc6a98d	12 vuotta sitten
Colby Ranger	3b325917a5	Added read/write support for pack bitmap index. A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter. Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE. Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection. The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused. Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f	12 vuotta sitten
Colby Ranger	be7a135e94	Break the dependency on RevObject when creating a newObjectToPack(). Update the ObjectReuseAsIs API to support creating new ObjectToPack with only the AnyObjectId and Git object type. This is needed to support the future pack index bitmaps, which only contain this information and do not want the overhead of creating a temporary object for every ObjectId. Change-Id: I906360b471412688bf429ecef74fd988f47875dc	12 vuotta sitten
Marc Strapetz	67edd3eda7	RevWalk support for shallow clones StartGenerator now processes .git/shallow to have the RevWalk stop for shallow commits. See RevWalkShallowTest for tests. Bug: 394543 CQ: 6908 Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe Signed-off-by: Chris Aniszczyk <zx@twitter.com>	11 vuotta sitten
Robin Rosenberg	95d311f888	Move JGitText to an internal package Change-Id: I763590a45d75f00a09097ab6f89581a3bbd3c797	12 vuotta sitten
Shawn O. Pearce	53fb027284	Make DeltaBaseCache per-ObjectReader The 'Counting objects' phase of PackWriter requires good hit rates from the DeltaBaseCache while walking trees, the deltas need to find their bases in the cache in order to inflate in a reasonable time. If JGit is running in a multi-threaded server, such as Gerrit Code Review, each thread needs its own DeltaBaseCache to prevent one thread from evicting the other thread's relevant bases. Move the cache to be per-ObjectReader, lazily allocated when required by a PackFile. Change-Id: If9d5ed06728e813632ae96dcfb811f4860b276e8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 vuotta sitten
Shawn O. Pearce	a468cb57c2	PackWriter: Validate reused cached packs If object reuse validation is enabled, the output pack is going to probably be stored locally. When reusing an existing cached pack to save object enumeration costs, ensure the cached pack has not been corrupted by checking its SHA-1 trailer. If it has, writing will abort and the output pack won't be complete. This prevents anyone from trying to use the output pack, and catches corruption before it can be carried any further. Change-Id: If89d0d4e429d9f4c86f14de6c0020902705153e6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 vuotta sitten
Shawn O. Pearce	1b2062fe37	PackWriter: Avoid CRC-32 validation when feeding IndexPack There is no need to validate the object contents during copyObjectAsIs if the result is going to be parsed by unpack-objects or index-pack. Both programs will compute the SHA-1 of the object, and also validate most of the pack structure. For git daemon like servers, this work is already done on the client end of the connection, so the server doesn't need to repeat that work itself. Disable object validation for the 3 transport cases where we know the remote side will handle object validation for us (push, bundle creation, and upload pack). This improves performance on the server side by reducing the work that must be done. Change-Id: Iabb78eec45898e4a17f7aab3fb94c004d8d69af6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 vuotta sitten
Shawn O. Pearce	461b012e95	PackWriter: Support reuse of entire packs The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory. Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs: cd $GIT_DIR root=$(git rev-parse master) tmp=objects/.tmp-$$ names=$(echo $root \| git pack-objects --keep-true-parents --revs $tmp) for n in $names; do chmod a-w $tmp-$n.pack $tmp-$n.idx touch objects/pack/pack-$n.keep mv $tmp-$n.pack objects/pack/pack-$n.pack mv $tmp-$n.idx objects/pack/pack-$n.idx done (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs git repack -a -d When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root. For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources. With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but a slightly larger data transfer (+2.39 MiB): Before: remote: Counting objects: 1861830, done remote: Finding sources: 100% (1861830/1861830) remote: Getting sizes: 100% (88243/88243) remote: Compressing objects: 100% (88184/88184) Receiving objects: 100% (1861830/1861830), 376.01 MiB \| 19.01 MiB/s, done. remote: Total `1861830` (delta 4706), reused `1851053` (delta `1553844`) Resolving deltas: 100% (1564621/1564621), done. real 3m19.005s After: remote: Counting objects: 1601, done remote: Counting objects: 1828460, done remote: Finding sources: 100% (50475/50475) remote: Getting sizes: 100% (18843/18843) remote: Compressing objects: 100% (7585/7585) remote: Total `1861830` (delta 2407), reused `1856197` (delta 37510) Receiving objects: 100% (1861830/1861830), 378.40 MiB \| 31.31 MiB/s, done. Resolving deltas: 100% (1559477/1559477), done. real 2m2.938s Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden. [1] In this test $root was set back about two weeks. Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 vuotta sitten
Shawn O. Pearce	a017fdf112	Allow ObjectReuseAsIs to resort objects during writing It can be very handy for the implementation to resort the object list based on data locality, improving prefetch in the operating system's buffer cache. Export the list to the implementation was a proper List, and document that its mutable and OK to be modified. The only caller in PackWriter is already OK with these rules. Change-Id: I3f51cf4388898917b2be36670587a5aee902ff10 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 vuotta sitten
Shawn O. Pearce	e29cd27961	Move ObjectDirectory streaming limit to WindowCacheConfig IDEs like Eclipse offer up the settings in WindowCacheConfig to the user as a global set of options that are configured for the entire JVM process, not per-repository, as the cache is shared across the entire JVM. The limit on how much we are willing to allocate for an object buffer is similar to the limit on how much we can use for data caches, allocating that much space impacts the entire JVM and not just a single repository, so it should be a global limit. Change-Id: I22eafb3e223bf8dea57ece82cd5df8bfe5badebc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	a5c18fcfc7	Fully implement SHA-1 abbreviations ObjectReader implementations are now responsible for creating the unique abbreviation of an ObjectId, or for resolving an abbreviation back to its full form. In this latter case the reader can offer up multiple candidates to the caller, who may be able to disambiguate them based on context. Repository.resolve() doesn't take multiple candidates into account right now, but it could in the future by looking for a remaining ^0 or ^{commit} suffix and take an expansion if there is only one commit that matches the input abbreviation. It could also use the distance from an annotated tag to resolve "tag-NNN-gcommit" style strings that are often output by `git describe`. Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	28ba4747bc	Allow ObjectReuseAsIs to have more control over write ordering The reuse system used by an object database may be able to benefit from knowing what objects are coming next, and even improve data throughput by delaying (or moving up) objects that are stored near each other in the source database. Pushing the iteration down into the reuse code makes it possible for a smarter implementation to aggregate reuse. But for the standard pack file format on disk we don't bother, its quite efficient already. Change-Id: I64f0048ca7071a8b44950d6c2a5dfbca3be6bba6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	b85af06324	Allow object reuse selection to occur in parallel ObjectReader implementations may wish to use multiple threads in order to evaluate object reuse faster. Let the reader make that decision by passing the iteration down into the reader. Because the work is pushed into the reader, it may need to locate a given ObjectToPack given its ObjectId. This can easily occur if the reader has sent a list of ObjectIds to the object database and gets back information keyed only by ObjectId, without the ObjectToPack handle. Expose lookup using the PackWriter's own internal map, so the reader doesn't need to build a redundant copy to track the assocation of ObjectId back to ObjectToPack. Change-Id: I0c536405a55034881fb5db92a2d2a99534faed34 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	74e0835012	Honor pack.threads and perform delta search in parallel If we have multiple CPUs available, packing usually goes faster when each CPU is assigned a slice of the available search space. The number of threads to use is guessed from the runtime if it wasn't set by the caller, or wasn't set in the configuration. Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide up more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	f29741d1d8	amend commit: Support large delta packed objects as streams Rename the ByteWindow's inflate() method to setInput. We have completely refactored the purpose of this method to be feeding part (or all) of the window as input to the Inflater, and the actual inflate activity happens in the caller. Change-Id: Ie93a5bae0e9e637b5e822d56993ce6b562c6ad15 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	113577617b	Use core.streamFileThreshold to set our streaming limit We default this to 1 MiB for now, but we allow users to modify it through the Repository's configuration file to be a different value. A new repository listener is used to identify when the setting has been updated and trigger a reconfiguration of any active ObjectReaders. To prevent a horrible explosion we cap core.streamFileThreshold at no more than 1/4 of the maximum JVM heap size. We do this because we need at least 2 byte arrays equal in size to the stream threshold for the worst case delta inflation scenario, and our host application probably also needs some amount of the heap for their working set size. Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	ad68553be4	Support large delta packed objects as streams Very large delta instruction streams, or deltas which use very large base objects, are now streamed through as large objects rather than being inflated into a byte array. This isn't the most efficient way to access delta encoded content, as we may need to rewind and reprocess the base object when there was a block moved within the file, but it will at least prevent the JVM from having its heap explode. When streaming a delta we have an inflater open for each level in the delta chain, to inflate the instruction set of the delta, as well as an inflater for the base level object. The base object is buffered, as is the top level delta requested by the application, but we do not buffer the intermediate delta streams. This keeps memory usage lower, so its closer to 1024 bytes per level in the chain, without having an adverse impact on raw throughput as the top-level buffer gets pushed down to the lowest stream that has the next region. Delta instructions transparently collapse here, if the top level does not copy a region from its base, the base won't materialize that part from its own base, etc. This allows us to avoid copying around a lot of segments which have been deleted from the final version. Change-Id: I724d45245cebb4bad2deeae7b896fc55b2dd49b3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	ded8f6c721	Support large whole packed objects as streams Similar to the loose object support, whole packed objects can now be streamed back to the caller. The streaming is less efficient as we copy the data from the cached window array into the InflaterInputStream's internal buffer, then inflate it there before returning to the application. Like with unpacked objects, there is plenty of room for some optimization, especially for the copyTo method, where we don't necessarily need so much buffering to exist. Change-Id: Ie23be81289e37e24b91d17b0891e47b9da988008 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	fa23482ca7	Support large loose objects as streams Big loose objects can now be streamed if they are over the large object size threshold. This prevents the JVM heap from exploding with a very large byte array to hold the slurped file, and then again with its uncompressed copy. We may have slightly slowed down the simple case for small loose objects, as the loader no longer slurps the entire thing and decompresses in memory. To try and keep good performance for the very common small objects that are below 8 KiB in size, buffers are set to 8 KiB, causing the reader to slurp most of the file anyway. However the data has to be copied at least once, from the BufferedInputStream into the InflaterInputStream. New unit tests are supplied to get nearly 100% code coverage on the unpacked code paths, for both standard and pack style loose objects. We tested a fair chunk of the code elsewhere, but these new tests are better isolated to the specific branches in the code path. Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	aa4b06e087	Rename openObject, hasObject to just open, has Similar to what we did on Repository, the openObject method already implied we wanted to open an object, given its main argument was of type AnyObjectId. Simplify the method name to just the action, has or open. Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	9ba7bd4df4	Throw IncorrectObjectTypeException on bad type hints If the type hint isn't OBJ_ANY and it doesn't match the actual type observed from the object store, define the reader to throw back an IncorrectObjectTypeException. This way the caller doesn't have to perform this check itself before it evaluates the object data, and we can simplify quite a few call sites. Change-Id: I9f0dfa033857f439c94245361fcae515bc0a6533 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	86547022f0	Tighten up local packed object representation during packing Rather than making a loader, and then using that to fill the object representation, parse the header and set up our data directly. This saves some time, as we don't waste cycles on information we won't use right now. The weight computed for a representation is now its actual stored size in the pack file, rather than its inflated size. This accounts for changes made when the compression level is modified on the repository. It is however more costly to determine the weight of the object, since we have to find its length in the pack. To try and recover that cost we now cache the length as part of our ObjectToPack record, so it doesn't have to be found during the output phase. A LocalObjectToPack now costs us (assuming 32 bit pointers): (32 bit) (64 bit) vm header: 8 bytes 8 bytes ObjectId: 20 bytes 20 bytes PackedObjectInfo: 12 bytes 12 bytes ObjectToPack: 8 bytes 12 bytes LocalOTP: 20 bytes 24 bytes ----------- --------- 68 bytes 74 bytes Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	3a7aec03e0	Implement zero-copy for single window objects Objects that fall completely within a single window can be worked with in a zero-copy fashion, provided that the window is backed by a normal byte[] and not by a ByteBuffer. This works for a surprising number of objects. The default window size is 8 KiB, but most deltas are quite a bit smaller than that. Objects smaller than 1/2 of the window size have a very good chance of falling completely within a window's array, which means we can work with them without copying their data around. Larger objects, or objects which are unlucky enough to span over a window boundary, get copied through the temporary buffer. We pay a tiny penalty to realize we can't use the zero-copy code path, but its easier than trying to keep track of two adjacent windows. With this change (as well as everything preceeding it), packing is actually a bit faster. Some crude benchmarks based on cloning linux-2.6.git (~324 MiB, 1,624,785 objects) over localhost using C git client and JGit daemon shows we get better throughput, and slightly better times: Total Time \| Throughput (old) (now) \| (old) (now) --------------+--------------------------- 2m45s 2m37s \| 12.49 MiB/s 21.17 MiB/s 2m42s 2m36s \| 16.29 MiB/s 22.63 MiB/s 2m37s 2m31s \| 16.07 MiB/s 21.92 MiB/s Change-Id: I48b2c8d37f08d7bf5e76c5a8020cde4a16ae3396 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	ece88b99eb	Redo PackWriter object reuse output Output of selected reuses is refactored to use a new ObjectReuseAsIs interface that extends the ObjectReader. This interface allows the reader to control how it performs the reuse into the output stream, but also allows it to throw an exception to request the writer to find a different candidate representation. The PackFile reuse code was overhauled, cleaning up the APIs so they aren't exposed in the object loader, but instead are now a single method on the PackFile itself. The reuse algorithm was changed to do a data verification pass, followed by the copy pass to the output. This permits us to work around a corrupt object in a pack file by seeking another copy of that object when this one is bad. The reuse code was also optimized for the common case, where the in-pack representation is under 16 KiB. In these smaller cases data is sent to the pack writer more directly, avoiding some copying. Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	bf4ffff07f	Redo PackWriter object reuse selection The new selection implementation uses a public API on the ObjectReader, allowing the storage library to enumerate its candidates and select the best one for this packer without needing to build a temporary list of the candidates first. Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	6fc3ecac84	Extract PackFile specific code to ObjectToPack subclass The ObjectReader class is dual-purposed into being a factory for the ObjectToPack, permitting specific ObjectDatabase implementations to override the method and offer their own custom subclass of the generic ObjectToPack class. By allowing them to directly extend the type, each implementation can add custom fields to support tracking where an object is stored, without incurring any additional penalties like a parallel Map<ObjectId,Object> would cost. The reader was chosen to act as a factory rather than the database, as the reader will eventually be tied more tightly with the ObjectWalk and TreeWalk. During object enumeration the reader would have had to load the object for the RevWalk, and may chose to cache object position data internally so it can later be reused and fed into the ObjectToPack instance supplied to the PackWriter. Since a reader is not thread-safe, and is scoped to this PackWriter and its internal ObjectWalk, its a great place for the database to perform caching, if any. Right now this change goes a bit backwards by changing what should be generic ObjectToPack references inside of PackWriter to the very PackFile specific LocalObjectToPack subclass. We will correct these in a later commit as we start to refine what the ObjectToPack API will eventually look like in order to better support the PackWriter. Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	5cfc29b491	Replace WindowCache with ObjectReader The WindowCache is an implementation detail of PackFile and how its used by ObjectDirectory. Lets start to hide it and replace the public API with a more generic concept, ObjectReader. Because PackedObjectLoader is also considered a private detail of PackFile, we have to make PackWriter temporarily dependent upon the WindowCursor and thus FileRepository and ObjectDirectory in order to just start the refactoring. In later changes we will clean up the APIs more, exposing sufficient support to PackWriter without needing the file specific implementation details. Change-Id: I676be12b57f3534f1285854ee5de1aa483895398 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Shawn O. Pearce	9c4d42e94d	Factor out duplicate Inflater setup in WindowCursor Since we use this code twice, pull it into a private method. Let the compiler/JIT worry about whether or not this logic should be inlined into the call sites. Change-Id: Ia44fb01e0328485bcdfd7af96835d62b227a0fb1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Alex Blewitt	4d91645e89	Remove trailing whitespace at end of line As discussed on the egit-dev mailing list, we prefer not to have trailing whitespace in our source code. Correct all currently offending lines by trimming them. Change-Id: I002b1d1980071084c0bc53242c8f5900970e6845 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten
Git Development Community	1a6964c827	Initial JGit contribution to eclipse.org Per CQ 3448 this is the initial contribution of the JGit project to eclipse.org. It is derived from the historical JGit repository at commit `3a2dd9921c`. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 vuotta sitten

15 Commitit (235fe7a50a6cbc85434be5d7321925821a509703)