mirrors/jgit - jgit - source @ dussan.org

Revīziju grafs

Autors	SHA1	Ziņojums	Datums
Luca Milanesio	82b1af31e2	Fix pack files scan when filesnapshot isn't modified Do not reload packfiles when their associated filesnapshot is not modified on disk compared to the one currently stored in memory. Fix the regression introduced by `fef78212` which, in conjunction with core.trustfolderstats = false, caused any lookup of objects inside the packlist to loop forever when the object was not found in the pack list. Bug: 546190 Change-Id: I38d752ebe47cefc3299740aeba319a2641f19391 Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 5 gadiem
Luca Milanesio	fef782128d	Do not reuse packfiles when changed on filesystem The pack reload mechanism from the filesystem works only by name and does not check the actual last modified date of the packfile. This lead to concurrency issues where multiple threads were loading and removing from each other list of packfiles when one of those was failing the checksum. Rely on FileSnapshot rather than directly checking lastModified timestamp so that more checks can be performed. Bug: 544199 Change-Id: I173328f29d9914007fd5eae3b4c07296ab292390 Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>	pirms 5 gadiem
Luca Milanesio	4c558225dc	Don't remove pack when FileNotFoundException is transient The FileNotFoundException is typically raised in three conditions: 1. file doesn't exist 2. incompatible read vs. read/write open modes 3. filesystem locking 4. temporary lack of resources (e.g. too many open files) 1. is already managed, 2. would never happen as packs are not overwritten while with 3. and 4. it is worth logging the exception and retrying to read the pack again. Log transient errors using an exponential backoff strategy to avoid flooding the logs with the same error if consecutive retries to access the pack fail repeatedly. Bug: 513435 Change-Id: I03c6f6891de3c343d3d517092eaa75dba282c0cd Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 7 gadiem
Matthias Sohn	3fc93f8a56	Rename files using NIO2 atomic rename Bug: 319233 Change-Id: I5137212f5cd3195a52f90ed5e4ce3cf194a13efd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 9 gadiem
Matthias Sohn	d2faec27a7	Raise error if FileNotFoundException is caught for an existing file File, FileInputStream and friends may throw FileNotFoundException even if the file is existing e.g. when file permissions don't allow to access the file content. In most cases this is a severe error we should not suppress hence rethrow the FileNotFoundException in this case. This may also fix bug 451508. Bug: 451508 Change-Id: If4a94217fb5b7cfd4c04d881902f3e86193c7008 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 8 gadiem
Matthias Sohn	05acf1c62f	Use java.io.File to check existence of loose objects in ObjectDirectory It was reported in [1] that `197e3393a5` led to a performance regression in a BFG benchmark. Analysis showed that this is caused by the exists() method in FS_POSIX, now overriding the default implementation in FS. The default implementation of FS.exists() uses java.io.File.exists(), while the new implementation in FS_POSIX uses java.nio.file.Files.exists() - by simply removing the override in FS_POSIX, performance was restored. Profiling showed that java.nio.file.Files.exists() is substantially slower than java.io.File.exists(), to the point where the exists() call doubles the average cost of a call to ObjectDirectory.insertUnpackedObject() - which the BFG uses a lot, because it's rewriting history. Average times measured on Ubuntu were: java.io.File.exists() - 4 microseconds java.nio.file.Files.exists() - 60 microseconds The loose object exists test should be using java.io.File and not FS. ObjectDirectory uses FS.resolve() to traverse symlinks to objects but then once inside objects all 256 sharded directories should be real directories, and the object files should be real files, not dangling symlinks. java.io.File.exists() is sufficient here, and faster. Change ObjectDirectory to use File.exists() once its computed the File handle. This does mean JGit cannot run ObjectDirectory code on an abstract virtual filesystem plugged into NIO2. If you really want to run JGit on an esoteric non-standard filesystem like "in memory" you should look at the DFS storage backend, which has fewer abstraction points to deal with. Or write your own from scratch. [1] https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg02954.html Change-Id: I74684dc3957ae1ca52a7097f83a6c420aa24310f Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 8 gadiem
Martin Fick	cb08dd8b14	Add public isStaleFileHandle() API, improve detection. Add a public API to the FileUtils to determine if an IOException is a stale NFS file handle exception. This will make it easier to detect such errors, and interpret them consistently throughout the codebase. This new API is a bit more lenient in its detection than the previous detection, and should be able to detect some errors which previously were not identified as stale file handle exceptions because they had the word NFS in the error message. Adjust the packfile handling code to use this new API for detection. Change-Id: I21f80014546ba1afec7335890e5ae79e7f521412 Signed-off-by: Martin Fick<mfick@codeaurora.org>	pirms 8 gadiem
Hugo Arès	ae4b72d50e	Remove pack from list when file handle is stale This error happens on nfs file system when you try to read a file that was deleted or replaced. When the error happens because the file was deleted, removing it from the list is the proper way to handle the error, same use case as FileNotFoundException. When the error happens because the file was replaced, removing the file from the list will cause the file to be re-read so it will get the latest version of the file. Bug: 462868 Change-Id: I368af61a6cf73706601a3e4df4ef24f0aa0465c5 Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>	pirms 9 gadiem
Hugo Arès	d3a73c1407	Lower log level to warn for handled pack errors Pack not found and pack corrupted/invalid are handled by the code (pack is removed from the list) so logging an error and the stacktrace is misleading because it implies that there is an action to take to fix the error. Lower the log level to warn and remove the stacktrace for those 2 types of errors and keep the error log statement for any other. Change-Id: I2400fe5fec07ac6d6c244b852cce615663774e6e Signed-off-by: Hugo Arès <hugo.ares@ericsson.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 9 gadiem
Matthias Sohn	c18694e0d1	Use slf4j to log instead of printing to System.err CQ: 9206 Bug: 458445 Change-Id: Ic68fb7dbe0fb46bf30f157db45bf18d8f3a704c0 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 9 gadiem
Matthias Sohn	f5936405a3	If a pack isn't found on disk remove it from pack list If accessing a pack throws FileNotFoundException the pack was deleted and we need to remove it from the pack list. This can be caused e.g. by git gc. Change-Id: I5d10f87f364dadbbdbfb61b6b2cbdee9c7457f3d Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 9 gadiem
Matthias Sohn	27ee334213	Don't remove pack from pack list for problems which could be transient If we hit a corrupt object or invalid pack remove the pack from the pack list. Other IOException could be transient hence we should not remove the pack from the list to avoid the problem reported on the Gerrit list [1]. It looks like in the reported case the pack was removed from the pack list causing MissingObjectExceptions which disappear when the server is restarted. [1] https://groups.google.com/forum/#!topic/repo-discuss/Qdmbl-YZ4NU Change-Id: I331626110d54b190e46cddc2c40f29ddeb9613cd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 9 gadiem
Matthias Sohn	9b86ebb4f6	Log reason for ignoring pack when IOException occurred This should help to identify the root cause of the problem discussed on the Gerrit list [1]. [1] https://groups.google.com/forum/#!topic/repo-discuss/Qdmbl-YZ4NU Change-Id: I871f70e4bb1227952e1544b789013583b14e2b96 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 9 gadiem
Christian Halstrick	1b9130e8db	Make sure modifications to config-param trustFolderStat are detected ObjectDirectory.searchPacksAgain() should always read trustFolderStat from the config and not rely on a cached value. Change-Id: I90edbaae3c64eea0c9894d05acde4267991575ee	pirms 9 gadiem
Christian Halstrick	0fc8b05a71	Introduce config parameter core.trustfolderstat JGit's ObjectDirectory implements the optimization that it remembers the pack folders (.git/objects/pack) lastModified timestamp and doesn't check for new packfiles in this folder if the lastModified attribute has not changed. In environments using NFS this can cause trouble. If multiple JGit instances from multiple machines work on the same repository and one instance creates a new ref and a new packfile (e.g. by doing a fetch) then the other machines may detect the new ref but can't resolve the referenced object because it doesn't detect that pack folder has a new packfile. That's because NFS may cache file/folder metadata for quite a long time and the pack folders modification time is not updated although a new packfile is there and could be read. The new config parameter core.trustfolderstat controls this behaviour. The default is true and jgits behaviours is unchanged. But if this parameter is set to false then jgit doesn't trust the pack directories lastmodified anymore. Instead it will always iterate through the content of that folder to detect new packfiles. Change-Id: Ie3b4e92933286aa9916070a22422e629b3147f54 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 9 gadiem
Marc Strapetz	59a2dc801c	Files should be deleted with "retry" option Some of our Windows users have reported sporadic file system access problems related to ObjectDirectory(Inserter) file deletion code in combination with antiviral/firewall tools. For one of these users the problem was fairly reproducible and changing deletion to RETRY solved his problem. Change-Id: I1e4001d5557fca693b7bac401268599467cb0c9e Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>‌	pirms 10 gadiem
Robin Rosenberg	5ef6d69532	Use the new FS.exists method in commonly occuring places Allegedly this should improve performance, but I could not see it. Change-Id: Id2057cb2cfcb46e94ff954483ce23f9c4a7edc5e	pirms 11 gadiem
Shawn Pearce	d1aacc415a	Fix MissingObjectException race in ObjectDirectory Johannes Carlsson identified a race condition[1] that can lead to spurious MissingObjectExceptions at read time. If two threads are active inside of ObjectDirectory looking for a packed object and the packList is currently the empty NO_PACKS list, thread A will find no object and eventually consider tryAgain1(). If thread A is put to sleep and this point and thread B also does not find the object, loads the packs, when thread A wakes up its tryAgain1 would return false and the thread never considers the packs. Rework the internal API of ObjectDirectory to keep a handle on the exact PackList that was iterated by thread A, allowing it to always retry walking through the packs if the new PackList is different. This had some ripple effect into the CachedObjectDirectory and the shared FileObjectDatabase interface. The new code should be slightly easier to follow, especially from the perspective of the CachedObjectDirectory trying to minimize the number of open system calls it makes to files matching "$GIT_DIR/objects/??/?x{38}". [1] http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg02401.html Change-Id: I9a1c9d6ad6cb38404b7b9178167b714077561353	pirms 10 gadiem
Shawn Pearce	f32b861243	JGit 3.0: move internal classes into an internal subpackage This breaks all existing callers once. Applications are not supposed to build against the internal storage API unless they can accept API churn and make necessary updates as versions change. Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9	pirms 11 gadiem
Shawn Pearce	3760e4319b	Remove cached_packs support in favor of bitmaps The bitmap code in PackWriter knows exactly when to use a pack as a "cached pack". It enables cached pack usage only when the pack has a bitmap and its entire closure of objects needs to be sent. This is a much simpler code path to maintain, and JGit actually has a way to write the necessary index. Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881	pirms 11 gadiem
Colby Ranger	3b325917a5	Added read/write support for pack bitmap index. A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter. Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE. Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection. The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused. Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f	pirms 11 gadiem
Colby Ranger	4a317a1790	Include supported extensions in PackFile constructor. Previously a PackFile class was assumed to only support a .pack and .idx file. Update the constructor to enumerate the supported extensions for the pack file. This will allow the bitmap code to only be executed if the bitmap extension file is known to exist. Change-Id: Ie59041dffec5f60d7ea2771026ffd945106bd4bf	pirms 11 gadiem
Roberto Tyley	5dcc8693d7	Fix concurrent creation of fan-out object directories If multiple threads attempted to insert loose objects into the same new fan-out directory, the creation of that directory was subject to a race condition that could lead to an unnecessary IOException being thrown - because an inserter could not 'create' a directory that had just been generated by a different thread. All we require is that the directory does indeed exist, so not being able to _create_ it is not actually a fatal problem. Setting 'skipExisting' to 'true' on the call to mkdir() fixes the issue. I found this issue as a real world occurrence while working on The BFG Repo Cleaner (https://github.com/rtyley/bfg-repo-cleaner), a tool which concurrently performs a lot of object creation. In order to demonstrate the problem here I've added a small test case which reliably reproduces the issue on the few different hardware systems I've tried. The error thrown when the race-condition arises is this: java.io.IOException: Creating directory /home/roberto/repo.git/objects/e6 failed at org.eclipse.jgit.util.FileUtils.mkdir(FileUtils.java:182) at org.eclipse.jgit.storage.file.ObjectDirectory.insertUnpackedObject(ObjectDirectory.java:590) at org.eclipse.jgit.storage.file.ObjectDirectoryInserter.insertOneObject(ObjectDirectoryInserter.java:113) at org.eclipse.jgit.storage.file.ObjectDirectoryInserter.insert(ObjectDirectoryInserter.java:91) at org.eclipse.jgit.lib.ObjectInserter.insert(ObjectInserter.java:329) Change-Id: I88eac49bc600c56ba9ad290e6133d8a7113125ab	pirms 11 gadiem
Colby Ranger	82ecfb3e31	Remove packIndex field from FileObjDatabase openPack method. Previously, the FileObjDatabase required both the pack file path and index file path to be passed to openPack(). A future change to add a bitmap index will add a .bitmap file parallel to the pack file (similar to the .idx file). Update the PackFile to support automatically loading pack index extensions based on the pack file path. Change-Id: Ifc8fc3e57f4afa177ba5a88df87334dbfa799f01	pirms 11 gadiem
Robin Rosenberg	c310fa0c80	Mark non-externalizable strings as such A few classes such as Constanrs are marked with @SuppressWarnings, as are toString() methods with many liternal, but otherwise $NLS-n$ is used for string containing text that should not be translated. A few literals may fall into the gray zone, but mostly I've tried to only tag the obvious ones. Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590	pirms 11 gadiem
Marc Strapetz	67edd3eda7	RevWalk support for shallow clones StartGenerator now processes .git/shallow to have the RevWalk stop for shallow commits. See RevWalkShallowTest for tests. Bug: 394543 CQ: 6908 Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe Signed-off-by: Chris Aniszczyk <zx@twitter.com>	pirms 11 gadiem
Robin Rosenberg	95d311f888	Move JGitText to an internal package Change-Id: I763590a45d75f00a09097ab6f89581a3bbd3c797	pirms 12 gadiem
Shawn O. Pearce	461b012e95	PackWriter: Support reuse of entire packs The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory. Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs: cd $GIT_DIR root=$(git rev-parse master) tmp=objects/.tmp-$$ names=$(echo $root \| git pack-objects --keep-true-parents --revs $tmp) for n in $names; do chmod a-w $tmp-$n.pack $tmp-$n.idx touch objects/pack/pack-$n.keep mv $tmp-$n.pack objects/pack/pack-$n.pack mv $tmp-$n.idx objects/pack/pack-$n.idx done (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs git repack -a -d When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root. For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources. With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but a slightly larger data transfer (+2.39 MiB): Before: remote: Counting objects: 1861830, done remote: Finding sources: 100% (1861830/1861830) remote: Getting sizes: 100% (88243/88243) remote: Compressing objects: 100% (88184/88184) Receiving objects: 100% (1861830/1861830), 376.01 MiB \| 19.01 MiB/s, done. remote: Total `1861830` (delta 4706), reused `1851053` (delta `1553844`) Resolving deltas: 100% (1564621/1564621), done. real 3m19.005s After: remote: Counting objects: 1601, done remote: Counting objects: 1828460, done remote: Finding sources: 100% (50475/50475) remote: Getting sizes: 100% (18843/18843) remote: Compressing objects: 100% (7585/7585) remote: Total `1861830` (delta 2407), reused `1856197` (delta 37510) Receiving objects: 100% (1861830/1861830), 378.40 MiB \| 31.31 MiB/s, done. Resolving deltas: 100% (1559477/1559477), done. real 2m2.938s Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden. [1] In this test $root was set back about two weeks. Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 13 gadiem
Matthias Sohn	38eec8f4a2	[findbugs] Do not ignore exceptional return value of mkdir java.io.File.mkdir() and mkdirs() report failure as an exceptional return value false. Fix the code which silently ignored this exceptional return value. Change-Id: I41244f4b9d66176e68e2c07e2329cf08492f8619 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 13 gadiem
Robin Rosenberg	24e7f0f6fa	Fix tests broken by fix for adding files in a network share The change Ie0350e032a97e0d09626d6143c5c692873a5f6a2 was not done properly. The renamed file was not write protected, and this broke a test. Bug: 335388 Change-Id: I41b2235b7677bc5fddc70dda2a56cdd2cb53ce5d Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	pirms 13 gadiem
Robin Rosenberg	c4c8d80fd3	Fix adding files in a network share We cannot always rename read-only files on network shares, so rename the temp file for a new loose object first, and then set it as read-only. Bug: 335388 Change-Id: Ie0350e032a97e0d09626d6143c5c692873a5f6a2 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	pirms 13 gadiem
Shawn O. Pearce	1bf0c3cdb1	Refactor IndexPack to not require local filesystem By moving the logic that parses a pack stream from the network (or a bundle) into a type that can be constructed by an ObjectInserter, repository implementations have a chance to inject their own logic for storing object data received into the destination repository. The API isn't completely generic yet, there are still quite a few assumptions that the PackParser subclass is storing the data onto the local filesystem as a single file. But its about the simplest split of IndexPack I can come up with without completely ripping the code apart. Change-Id: I5b167c9cc6d7a7c56d0197c62c0fd0036a83ec6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	pirms 13 gadiem
Shawn O. Pearce	c8db22f355	Extract pack directory last modified check code Pulling the last modified checking logic out of ObjectDirectory makes it possible to reuse this code for other files, such as the $GIT_DIR/config or $GIT_DIR/packed-refs files. Change-Id: If2f27a89fc3b7adde7e65ff40bbca5d55b98b772 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 13 gadiem
Matthias Sohn	45731756a5	[findbugs] Do not ignore exceptional return value java.io.File.delete() reports failure as an exceptional return value false. Fix the code which silently ignored this exceptional return value. Also remove some duplicate deletion helper methods. Change-Id: I80ed20ca1f07a2bc6e779957a4ad0c713789c5be Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 13 gadiem
Shawn O. Pearce	d00420ae6e	Make ObjectDirectory getPacks() work the first time If an object hasn't been accessed yet the pack list for a repository may not have been scanned from disk. If an application (e.g. the dumb transport servlet support code) asks for the pack list for an ObjectDirectory, we should load it immediately. Change-Id: I93d7b1bca422d905948e8e83b2afa83c8894a68b Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	pirms 13 gadiem
Shawn O. Pearce	e51e06946f	Update CachedObjectDirectory when inserting objects If an ObjectInserter is created from a CachedObjectDirectory, we need to ensure the cache is updated whenever a new loose object is actually added to the loose objects directory, otherwise a future read from an ObjectReader on the CachedObjectDirectory might not be able to open the newly created object. We mostly had the infrastructure in place to implement this due to the injection of unpacked large deltas, but we didn't have a way to pass the ObjectId from ObjectDirectoryInserter to CachedObjectDirectory, because the inserter was using the underlying ObjectDirectory and not the CachedObjectDirectory. Redirecting to CachedObjectDirectory ensures the cache is updated. Change-Id: I1f7bdfacc7ad77ebdb885f655e549cc570652225 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 13 gadiem
Shawn O. Pearce	5fce8d81d8	Fix cloning of repositories with big objects When running IndexPack we use a CachedObjectDirectory, which knows what objects are loose and tries to avoid stat(2) calls for objects that do not exist in the repository, as stat(2) on Win32 is very slow. However large delta objects found in a pack file are expanded into a loose object, in order to avoid costly delta chain processing when that object is used as a base for another delta. If this expand occurs while working with the CachedObjectDirectory, we need to update the cached directory data to include this new object, otherwise it won't be available when we try to open it during the object verify phase. Bug: 324868 Change-Id: Idf0c76d4849d69aa415ead32e46a435622395d68 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 13 gadiem
Shawn O. Pearce	41dd9ed1c0	Unpack and cache large deltas as loose objects Instead of spooling large delta bases into temporary files and then immediately deleting them afterwards, spool the large delta out to a normal loose object. Later any requests for that large delta can be answered by reading from the loose object, which is much easier to stream efficiently for readers. Since the object is now duplicated, once in the pack as a delta and again as a loose object, any future prune-packed will automatically delete the loose object variant, releasing the wasted disk space. As prune-packed is run automatically during either repack or gc, and gc --auto triggers automatically based on the number of loose objects, we get automatic cache management for free. Large objects that were unpacked will be periodically cleared out, and will simply be restored later if they are needed again. After a short offline discussion with Junio Hamano today, we may want to propose a change to prune-packed to hold onto larger loose objects which also exist in pack files as deltas, if the loose object was recently accessed or modified in the last 2 days. Change-Id: I3668a3967c807010f48cd69f994dcbaaf582337c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 13 gadiem
Shawn O. Pearce	3f66e65e71	Remember loose objects and fast-track their lookup Recently created objects are usually what branches point to, and are usually written out as loose objects. But due to the high cost of asking the operating system if a file exists, these are the last thing that ObjectDirectory examines when looking for an object by its ObjectId. Caching recently seen loose objects permits the opening code to jump directly to the loose object, accelerating lookup for branch heads that are accessed often. To avoid exploding the cache its limited to approximately 2048 entries. When more ids are added, the table is simply cleared and reset in size. Change-Id: I18f483217412b102f754ffd496c87061d592e535 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 13 gadiem
Shawn O. Pearce	e29cd27961	Move ObjectDirectory streaming limit to WindowCacheConfig IDEs like Eclipse offer up the settings in WindowCacheConfig to the user as a global set of options that are configured for the entire JVM process, not per-repository, as the cache is shared across the entire JVM. The limit on how much we are willing to allocate for an object buffer is similar to the limit on how much we can use for data caches, allocating that much space impacts the entire JVM and not just a single repository, so it should be a global limit. Change-Id: I22eafb3e223bf8dea57ece82cd5df8bfe5badebc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	1c3f3fdbd2	Fix ObjectDirectory abbreviation resolution to notice new packs If we can't resolve an abbreviation, it might be because there is a new pack file we haven't picked up yet. Try scanning the packs again and recheck each pack if there were differences from the last scan we did. Because of this, we don't have to open a pack during the test where we generate a pack on the fly. We'll miss on the first loop during which the PackList is the NO_PACKS magic initialization constant, and pick up the newly created index during this retry logic. Change-Id: I7b97efb29a695ee60c90818be380f7ea23ad13a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	a5c18fcfc7	Fully implement SHA-1 abbreviations ObjectReader implementations are now responsible for creating the unique abbreviation of an ObjectId, or for resolving an abbreviation back to its full form. In this latter case the reader can offer up multiple candidates to the caller, who may be able to disambiguate them based on context. Repository.resolve() doesn't take multiple candidates into account right now, but it could in the future by looking for a remaining ^0 or ^{commit} suffix and take an expansion if there is only one commit that matches the input abbreviation. It could also use the distance from an annotated tag to resolve "tag-NNN-gcommit" style strings that are often output by `git describe`. Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide up more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	113577617b	Use core.streamFileThreshold to set our streaming limit We default this to 1 MiB for now, but we allow users to modify it through the Repository's configuration file to be a different value. A new repository listener is used to identify when the setting has been updated and trigger a reconfiguration of any active ObjectReaders. To prevent a horrible explosion we cap core.streamFileThreshold at no more than 1/4 of the maximum JVM heap size. We do this because we need at least 2 byte arrays equal in size to the stream threshold for the worst case delta inflation scenario, and our host application probably also needs some amount of the heap for their working set size. Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	13e0218a25	Replace PackedObjectLoader with ObjectLoader.SmallObject The class is identical, but ObjectLoader.SmallObject is part of our public API for storage implementations to build on top of. Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	fa23482ca7	Support large loose objects as streams Big loose objects can now be streamed if they are over the large object size threshold. This prevents the JVM heap from exploding with a very large byte array to hold the slurped file, and then again with its uncompressed copy. We may have slightly slowed down the simple case for small loose objects, as the loader no longer slurps the entire thing and decompresses in memory. To try and keep good performance for the very common small objects that are below 8 KiB in size, buffers are set to 8 KiB, causing the reader to slurp most of the file anyway. However the data has to be copied at least once, from the BufferedInputStream into the InflaterInputStream. New unit tests are supplied to get nearly 100% code coverage on the unpacked code paths, for both standard and pack style loose objects. We tested a fair chunk of the code elsewhere, but these new tests are better isolated to the specific branches in the code path. Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	71aace52f7	Simplify ObjectLoaders coming from PackFile We no longer need an ObjectLoader to be lazy and try to delay the materialization of the object content. That was done only to support PackWriter searching for a good reuse candidate. Instead, simplify the code base by doing the materialization immediately when the loader asks for it, because any caller asking for the loader is going to need the content. Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	86547022f0	Tighten up local packed object representation during packing Rather than making a loader, and then using that to fill the object representation, parse the header and set up our data directly. This saves some time, as we don't waste cycles on information we won't use right now. The weight computed for a representation is now its actual stored size in the pack file, rather than its inflated size. This accounts for changes made when the compression level is modified on the repository. It is however more costly to determine the weight of the object, since we have to find its length in the pack. To try and recover that cost we now cache the length as part of our ObjectToPack record, so it doesn't have to be found during the output phase. A LocalObjectToPack now costs us (assuming 32 bit pointers): (32 bit) (64 bit) vm header: 8 bytes 8 bytes ObjectId: 20 bytes 20 bytes PackedObjectInfo: 12 bytes 12 bytes ObjectToPack: 8 bytes 12 bytes LocalOTP: 20 bytes 24 bytes ----------- --------- 68 bytes 74 bytes Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	pirms 14 gadiem

19 Revīzijas (82b1af31e295273eeedd82fc5db3266c909c9e80)