mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Author	SHA1	Message	Date
Shawn O. Pearce	5fce8d81d8	Fix cloning of repositories with big objects When running IndexPack we use a CachedObjectDirectory, which knows what objects are loose and tries to avoid stat(2) calls for objects that do not exist in the repository, as stat(2) on Win32 is very slow. However large delta objects found in a pack file are expanded into a loose object, in order to avoid costly delta chain processing when that object is used as a base for another delta. If this expand occurs while working with the CachedObjectDirectory, we need to update the cached directory data to include this new object, otherwise it won't be available when we try to open it during the object verify phase. Bug: 324868 Change-Id: Idf0c76d4849d69aa415ead32e46a435622395d68 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	41dd9ed1c0	Unpack and cache large deltas as loose objects Instead of spooling large delta bases into temporary files and then immediately deleting them afterwards, spool the large delta out to a normal loose object. Later any requests for that large delta can be answered by reading from the loose object, which is much easier to stream efficiently for readers. Since the object is now duplicated, once in the pack as a delta and again as a loose object, any future prune-packed will automatically delete the loose object variant, releasing the wasted disk space. As prune-packed is run automatically during either repack or gc, and gc --auto triggers automatically based on the number of loose objects, we get automatic cache management for free. Large objects that were unpacked will be periodically cleared out, and will simply be restored later if they are needed again. After a short offline discussion with Junio Hamano today, we may want to propose a change to prune-packed to hold onto larger loose objects which also exist in pack files as deltas, if the loose object was recently accessed or modified in the last 2 days. Change-Id: I3668a3967c807010f48cd69f994dcbaaf582337c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	3f66e65e71	Remember loose objects and fast-track their lookup Recently created objects are usually what branches point to, and are usually written out as loose objects. But due to the high cost of asking the operating system if a file exists, these are the last thing that ObjectDirectory examines when looking for an object by its ObjectId. Caching recently seen loose objects permits the opening code to jump directly to the loose object, accelerating lookup for branch heads that are accessed often. To avoid exploding the cache its limited to approximately 2048 entries. When more ids are added, the table is simply cleared and reset in size. Change-Id: I18f483217412b102f754ffd496c87061d592e535 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	e29cd27961	Move ObjectDirectory streaming limit to WindowCacheConfig IDEs like Eclipse offer up the settings in WindowCacheConfig to the user as a global set of options that are configured for the entire JVM process, not per-repository, as the cache is shared across the entire JVM. The limit on how much we are willing to allocate for an object buffer is similar to the limit on how much we can use for data caches, allocating that much space impacts the entire JVM and not just a single repository, so it should be a global limit. Change-Id: I22eafb3e223bf8dea57ece82cd5df8bfe5badebc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	1c3f3fdbd2	Fix ObjectDirectory abbreviation resolution to notice new packs If we can't resolve an abbreviation, it might be because there is a new pack file we haven't picked up yet. Try scanning the packs again and recheck each pack if there were differences from the last scan we did. Because of this, we don't have to open a pack during the test where we generate a pack on the fly. We'll miss on the first loop during which the PackList is the NO_PACKS magic initialization constant, and pick up the newly created index during this retry logic. Change-Id: I7b97efb29a695ee60c90818be380f7ea23ad13a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	a5c18fcfc7	Fully implement SHA-1 abbreviations ObjectReader implementations are now responsible for creating the unique abbreviation of an ObjectId, or for resolving an abbreviation back to its full form. In this latter case the reader can offer up multiple candidates to the caller, who may be able to disambiguate them based on context. Repository.resolve() doesn't take multiple candidates into account right now, but it could in the future by looking for a remaining ^0 or ^{commit} suffix and take an expansion if there is only one commit that matches the input abbreviation. It could also use the distance from an annotated tag to resolve "tag-NNN-gcommit" style strings that are often output by `git describe`. Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide up more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	113577617b	Use core.streamFileThreshold to set our streaming limit We default this to 1 MiB for now, but we allow users to modify it through the Repository's configuration file to be a different value. A new repository listener is used to identify when the setting has been updated and trigger a reconfiguration of any active ObjectReaders. To prevent a horrible explosion we cap core.streamFileThreshold at no more than 1/4 of the maximum JVM heap size. We do this because we need at least 2 byte arrays equal in size to the stream threshold for the worst case delta inflation scenario, and our host application probably also needs some amount of the heap for their working set size. Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	13e0218a25	Replace PackedObjectLoader with ObjectLoader.SmallObject The class is identical, but ObjectLoader.SmallObject is part of our public API for storage implementations to build on top of. Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	fa23482ca7	Support large loose objects as streams Big loose objects can now be streamed if they are over the large object size threshold. This prevents the JVM heap from exploding with a very large byte array to hold the slurped file, and then again with its uncompressed copy. We may have slightly slowed down the simple case for small loose objects, as the loader no longer slurps the entire thing and decompresses in memory. To try and keep good performance for the very common small objects that are below 8 KiB in size, buffers are set to 8 KiB, causing the reader to slurp most of the file anyway. However the data has to be copied at least once, from the BufferedInputStream into the InflaterInputStream. New unit tests are supplied to get nearly 100% code coverage on the unpacked code paths, for both standard and pack style loose objects. We tested a fair chunk of the code elsewhere, but these new tests are better isolated to the specific branches in the code path. Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	71aace52f7	Simplify ObjectLoaders coming from PackFile We no longer need an ObjectLoader to be lazy and try to delay the materialization of the object content. That was done only to support PackWriter searching for a good reuse candidate. Instead, simplify the code base by doing the materialization immediately when the loader asks for it, because any caller asking for the loader is going to need the content. Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	86547022f0	Tighten up local packed object representation during packing Rather than making a loader, and then using that to fill the object representation, parse the header and set up our data directly. This saves some time, as we don't waste cycles on information we won't use right now. The weight computed for a representation is now its actual stored size in the pack file, rather than its inflated size. This accounts for changes made when the compression level is modified on the repository. It is however more costly to determine the weight of the object, since we have to find its length in the pack. To try and recover that cost we now cache the length as part of our ObjectToPack record, so it doesn't have to be found during the output phase. A LocalObjectToPack now costs us (assuming 32 bit pointers): (32 bit) (64 bit) vm header: 8 bytes 8 bytes ObjectId: 20 bytes 20 bytes PackedObjectInfo: 12 bytes 12 bytes ObjectToPack: 8 bytes 12 bytes LocalOTP: 20 bytes 24 bytes ----------- --------- 68 bytes 74 bytes Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	bf4ffff07f	Redo PackWriter object reuse selection The new selection implementation uses a public API on the ObjectReader, allowing the storage library to enumerate its candidates and select the best one for this packer without needing to build a temporary list of the candidates first. Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	5cfc29b491	Replace WindowCache with ObjectReader The WindowCache is an implementation detail of PackFile and how its used by ObjectDirectory. Lets start to hide it and replace the public API with a more generic concept, ObjectReader. Because PackedObjectLoader is also considered a private detail of PackFile, we have to make PackWriter temporarily dependent upon the WindowCursor and thus FileRepository and ObjectDirectory in order to just start the refactoring. In later changes we will clean up the APIs more, exposing sufficient support to PackWriter without needing the file specific implementation details. Change-Id: I676be12b57f3534f1285854ee5de1aa483895398 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	133c987f4d	Refactor alternate object databases below ObjectDirectory Not every object storage system will have the concept of alternate object databases to search, and even if they do, they may not have the notion of fast-access / slow-access split like we do within the ObjectDirectory code for pack files and loose objects. Push all of that down below the generic API so that it is a hidden detail of the ObjectDirectory and its related supporting classes. Change-Id: I54bc1ca5ff2ac94dfffad1f9a9dad7af202b9523 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	cad10e6640	Refactor object writing responsiblities to ObjectDatabase The ObjectInserter API permits ObjectDatabase implementations to control their own object insertion behavior, rather than forcing it to always be a new loose file created in the local filesystem. Inserted objects can also be queued and written asynchronously to the main application, such as by appending into a pack file that is later closed and added to the repository. This change also starts to open the door to non-file based object storage, such as an in-memory HashMap for unit testing, or a more complex system built on top of a distributed hash table. To help existing application code port to the newer interface we are keeping ObjectWriter as a delegation wrapper to the new API. Each ObjectWriter instances holds a reference to an ObjectInserter for the Repository's top-level ObjectDatabase, and it flushes and releases that instance on each object processed. Change-Id: I413224fb95563e7330c82748deb0aada4e0d6ace Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Marc Strapetz	936e4ab2f2	Repository can be configured with FS On Windows, FS_Win32_Cygwin has been used if a Cygwin Git installation is present in the PATH. Assuming that the user works with the Cygwin Git installation may result in unnecessary overhead if he actually does not. Applications built on top of jgit may have more knowledge on the actually used Git client (Cygwin or not) and hence should be able to configure which FS to use accordingly. Change-Id: Ifc4278078b298781d55cf5421e9647a21fa5db24	14 years ago
Sasa Zivkov	f3d8a8ecad	Externalize strings from JGit The strings are externalized into the root resource bundles. The resource bundles are stored under the new "resources" source folder to get proper maven build. Strings from tests are, in general, not externalized. Only in cases where it was necessary to make the test pass the strings were externalized. This was typically necessary in cases where e.getMessage() was used in assert and the exception message was slightly changed due to reuse of the externalized strings. Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>	14 years ago
Shawn O. Pearce	374c28057a	Don't insert the same pack twice into a pack list If a concurrent thread picks up a newly created PackFile and adds it to the pack list before the IndexPack thread itself can insert the item onto the front of the list, do nothing and use the item that was picked up by that other concurrent scanning thread. This avoids a potential condition where the same pack exists in memory twice, which causes confusion later during a rescan of the directory because we don't know exactly which PackFile instance should be retained into the new list, and which should be discarded. We can stop searching through the old pack list as soon as the sort function declares that the item to insert should be before the item already in the list. Because the list is always sorted by modification time (in seconds), we should never encounter a case where the pack is positioned at the wrong spot in the list. This early break out still permits an efficient implementation of the common case, inserting a new pack at the head of the list. Change-Id: Ice4459bbd4ee9487078aff5257893883d04f05fb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	a0a52897ed	Favor earlier PackFile instances over later duplicates There is a potential race condition during insertPack that can lead to us having the same pack file open twice in the same directory. A different thread can miss an object on disk, and trigger a scan of the directory, and notice the pack that was put in by IndexPack. So the pack winds up in the newly created PackList. The IndexPack thread then wakes up and finishes its insertPack by creating a new PackFile and inserting it into position 0 of the list. We now have the same pack listed twice. Readers will favor the earlier PackFile instance, because its the first one they come across as they iterate through the list. Keep that earlier one when we scan the pack directory again, as this will avoid needing to purge out all of the windows that may have been cached. Of course we should also fix that race condition, but this block was taking the wrong resolution if this error ever shows up, so lets first fix the block to use a more sane resolution. Change-Id: I0d339b9fd1dd8012e8fe5a564b893c0f69109e28 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Constantine Plotnikov	cc64794b24	Added caching for loose object lookup during pack indexing On Windows systems, file system lookup is a slow operation, so checking each object if it exists during indexing (after receiving the pack) could take a siginificant time. This patch introduces CachedObjectDirectory that pre-caches lookup results. Bug: 300397 Change-Id: I471b93f9bb3ee173eb37cae1d75e9e4eb49985e7 Signed-off-by: Constantine Plotnikov <constantine.plotnikov@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	0b821817fc	Add getPacks to ObjectDirectory This exposes the list of known packs, allowing callers to list them into a context like the objects/info/packs file. Change-Id: I0b889564bd176836ff5c77ba310c6d229409dcd5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Robin Rosenberg	eb63bfc1b8	Recognize Git repository environment variables This makes the jgit command line behave like the C Git implementation in the respect. These variables are not recognized in the core, though we add support to do the overrides there. Hence other users of the JGit library, like the Eclipse plugin and others, will not be affected. GIT_DIR The location of the ".git" directory. GIT_WORK_TREE The location of the work tree. GIT_INDEX_FILE The location of the index file. GIT_CEILING_DIRECTORIES A colon (semicolon on Windows) separated list of paths that which JGit will not cross when looking for the .git directory. GIT_OBJECT_DIRECTORY The location of the objects directory under which objects are stored. GIT_ALTERNATE_OBJECT_DIRECTORIES A colon (semicolon on Windows) separated list of object directories to search for objects. In addition to these we support the core.worktree config setting when the git directory is set deliberately instead of being found. Change-Id: I2b9bceb13c0f66b25e9e3cefd2e01534a286e04c Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Git Development Community	1a6964c827	Initial JGit contribution to eclipse.org Per CQ 3448 this is the initial contribution of the JGit project to eclipse.org. It is derived from the historical JGit repository at commit `3a2dd9921c`. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago

14 Commits (5fce8d81d89a3b9790e93590b919f5af114e8628)