The most expensive part of packing a repository for transport to
another system is enumerating all of the objects in the repository.
Once this gets to the size of the linux-2.6 repository (1.8 million
objects), enumeration can take several CPU minutes and cost a lot
of temporary working set memory.
Teach PackWriter to efficiently reuse an existing "cached pack"
by answering a clone request with a thin pack followed by a larger
cached pack appended to the end. This requires the repository
owner to first construct the cached pack by hand, and record the
tip commits inside of $GIT_DIR/objects/info/cached-packs:
cd $GIT_DIR
root=$(git rev-parse master)
tmp=objects/.tmp-$$
names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
for n in $names; do
chmod a-w $tmp-$n.pack $tmp-$n.idx
touch objects/pack/pack-$n.keep
mv $tmp-$n.pack objects/pack/pack-$n.pack
mv $tmp-$n.idx objects/pack/pack-$n.idx
done
(echo "+ $root";
for n in $names; do echo "P $n"; done;
echo) >>objects/info/cached-packs
git repack -a -d
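For reference, each record the script above appends to
objects/info/cached-packs is a "+" line naming a tip commit, one "P"
line per pack written by pack-objects, and a terminating blank line:
+ <SHA-1 of the tip commit, $root>
P <pack name printed by pack-objects, $n>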
When a clone request needs to include $root, the corresponding
cached pack will be copied as-is, rather than enumerating all of
the objects that are reachable from $root.
For a linux-2.6 kernel repository that should be about 376 MiB,
the above process creates two packs of 368 MiB and 38 MiB[1].
This is a local disk usage increase of ~26 MiB, due to reduced
delta compression between the large cached pack and the smaller
recent activity pack. The overhead is similar to 1 full copy of
the compressed project sources.
With this cached pack in hand, JGit daemon completes a clone request
in 1m17s less time, but with a slightly larger data transfer (+2.39 MiB):
Before:
remote: Counting objects: 1861830, done
remote: Finding sources: 100% (1861830/1861830)
remote: Getting sizes: 100% (88243/88243)
remote: Compressing objects: 100% (88184/88184)
Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
Resolving deltas: 100% (1564621/1564621), done.
real 3m19.005s
After:
remote: Counting objects: 1601, done
remote: Counting objects: 1828460, done
remote: Finding sources: 100% (50475/50475)
remote: Getting sizes: 100% (18843/18843)
remote: Compressing objects: 100% (7585/7585)
remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
Resolving deltas: 100% (1559477/1559477), done.
real 2m2.938s
Repository owners can periodically refresh their cached packs by
repacking their repository, folding all newer objects into a larger
cached pack. Since repacking is already considered to be a normal
Git maintenance activity, this isn't a very big burden.
[1] In this test $root was set back about two weeks.
Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CGit pack-objects displays a totals line after the pack data
has been fully written. This can be useful to understand some of
the decisions made by the packer, and has been a great tool
for helping to debug some of that code.
Track some of the basic values, and send them to the client when
packing is done:
remote: Counting objects: 1826776, done
remote: Finding sources: 100% (55121/55121)
remote: Getting sizes: 100% (25654/25654)
remote: Compressing objects: 100% (11434/11434)
remote: Total 1861830 (delta 3926), reused 1854705 (delta 38306)
Receiving objects: 100% (1861830/1861830), 386.03 MiB | 30.32 MiB/s, done.
Change-Id: If3b039017a984ed5d5ae80940ce32bda93652df5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It isn't strictly necessary to validate that every reference's target
object is reachable in the repository before advertising it to a
client. This is an expensive operation when there are thousands of
references, and it's very unlikely that a reference uses a missing
object, because garbage collection proceeds from the references and
walks down through the graph. So trying to hide a dangling reference
from clients is relatively pointless.
Even if we are trying to avoid giving a client a corrupt repository,
this simple check isn't sufficient. It is possible for a reference to
point to a valid commit, but for that commit to have a missing blob in its
root tree. This can be caused by staging a file into the index,
waiting several weeks, then committing that file while also racing
against a prune. The prune may delete the blob, since its
modification time is more than 2 weeks ago, but retain the commit,
since its modification time is right now.
Such graph corruption is already caught by PackWriter as it
enumerates the graph from the client's want list and digs back
to the roots or common base. Leave reference validation to that
same phase as well, where we know we have to parse the object to
support the enumeration.
Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There is no point in pushing all of the files within the edge
commits into the delta search when making a thin pack. This floods
the delta search window with objects that are unlikely to be useful
bases for the objects that will be written out, resulting in lower
data compression and higher transfer sizes.
Instead observe the path of a tree or blob that is being pushed
into the outgoing set, and use that path to locate up to WINDOW
ancestor versions from the edge commits. Push only those objects
into the edgeObjects set, reducing the number of objects seen by the
search window. This allows PackWriter to only look at ancestors
for the modified files, rather than all files in the project.
Limiting the search to WINDOW size makes sense, because more than
WINDOW edge objects will just skip through the window search as
none of them need to be delta compressed.
To further improve compression, sort edge objects into the front
of the window list, rather than randomly throughout. This puts
non-edges later in the window and gives them a better chance at
finding their base, since they search backwards through the window.
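A rough standalone sketch of that ordering tweak (the Candidate type
and its edge field are illustrative, not the actual PackWriter code):
import java.util.Comparator;
import java.util.List;
final class WindowOrder {
  // Put edge (preferred base) objects ahead of the objects that will be
  // written, so non-edges searching backwards through the window can
  // still find them as candidate bases.
  static void sortForWindow(List<Candidate> list) {
    list.sort(Comparator.comparing((Candidate c) -> !c.edge));
  }
  static final class Candidate {
    final boolean edge; // supplied only as a potential delta base
    Candidate(boolean edge) {
      this.edge = edge;
    }
  }
}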
These changes make a significant difference in the thin-pack:
Before:
remote: Counting objects: 144190, done
remote: Finding sources: 100% (50275/50275)
remote: Getting sizes: 100% (101405/101405)
remote: Compressing objects: 100% (7587/7587)
Receiving objects: 100% (50275/50275), 24.67 MiB | 9.90 MiB/s, done.
Resolving deltas: 100% (40339/40339), completed with 2218 local objects.
real 0m30.267s
After:
remote: Counting objects: 61549, done
remote: Finding sources: 100% (50275/50275)
remote: Getting sizes: 100% (18862/18862)
remote: Compressing objects: 100% (7588/7588)
Receiving objects: 100% (50275/50275), 11.04 MiB | 3.51 MiB/s, done.
Resolving deltas: 100% (43160/43160), completed with 5014 local objects.
real 0m22.170s
The resulting pack is 13.63 MiB smaller, even though it contains the
same exact objects. 82,543 fewer objects had to have their sizes
looked up, which saved about 8s of server CPU time. 2,796 more
objects from the client were used as part of the base object set,
which contributed to the smaller transfer size.
Change-Id: Id01271950432c6960897495b09deab70e33993a9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Some of this code predates making ObjectId.equals() final
and fixing RevObject.equals() to match ObjectId.equals().
It was therefore more complex than it needed to be, because
it tried to work around RevObject's broken equals() rules
by converting to ObjectId in a different collection.
Also combine the setUpWalker() and findObjectsToPack() methods;
they can be one method, and the code is actually cleaner.
Change-Id: I0f4cf9997cd66d8b6e7f80873979ef1439e507fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
The first 'Compressing objects' progress message is wrong; it's
actually PackWriter looking up the size of each object in the
ObjectDatabase, so objects can be sorted correctly in the later
type-size sort that tries to take advantage of "Linus' Law" to
improve delta compression.
Rename the progress to say 'Getting sizes', which is an accurate
description of what it is doing.
Change-Id: Ida0a052ad2f6e994996189ca12959caab9e556a3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
When compressing objects, don't include the edges in the progress
meter. These cost almost no CPU time as they are simply pushed into
and popped out of the delta search window.
Change-Id: I7ea19f0263e463c65da34a7e92718c6db1d4a131
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
CGit push clients 1.6.6 and later support progress messages on the
side-band-64k channel during push, as this was introduced to handle
server side hook errors reported over smart HTTP.
Since JGit's delta resolution isn't always as fast as CGit's is,
a user may think the server has crashed and failed to report
status if the user pushed a lot of content and sees no feedback.
Exposing the progress monitor during the resolving deltas phase
will let the user know the server is still making forward progress.
This also helps BasePackPushConnection, which has a bounded timeout
on how long it will wait before assuming the remote server is dead.
Progress messages pushed down the side-band channel will reset the
read timer, helping the connection to stay alive and avoid timing
out before the remote side's work is complete.
Change-Id: I429c825e5a724d2f21c66f95526d9c49edcc6ca9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The new addIfAbsent() method combines get() with add(), but does
it in a single step, so that in the common case where get() would
return null for a new object, the object is immediately inserted
into the map.
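A minimal standalone sketch of the single-step idea (an open-addressing
table can insert at the very slot the failed lookup probed; this class is
illustrative, not the actual ObjectIdSubclassMap code):
final class SimpleIdMap<T> {
  private final Object[] table = new Object[64]; // growth/rehash omitted
  @SuppressWarnings("unchecked")
  T addIfAbsent(T newValue) {
    int i = newValue.hashCode() & (table.length - 1);
    while (table[i] != null) {
      if (table[i].equals(newValue))
        return (T) table[i];   // an equal entry is already present
      i = (i + 1) & (table.length - 1);
    }
    table[i] = newValue;       // reuse the probe position; no second lookup
    return newValue;
  }
}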
Change-Id: Ib599ab4de13ad67665ccfccf3ece52ba3222bcba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Revert "Teach PackWriter how to reuse an existing object list"
This reverts commit f5fe2dca3c.
I regret adding this feature to the public API. Caches aren't always
the best idea, as they require work to maintain. Here the cache is
redundant information that must be computed, and when it grows stale
must be removed. The redundant information takes up more disk space,
about the same size as the pack-*.idx files. For the linux-2.6
repository, that's more than 40 MB for a 400 MB repository. So the
cache is a 10% increase in disk usage.
The entire point of this cache is to improve PackWriter performance,
and only PackWriter performance, and only when sending an initial
clone to a new client. There may be better ways to optimize this, and
until we have a solid solution, we shouldn't be using a separate cache
in JGit.
[findbugs] Do not ignore exceptional return value of mkdir
java.io.File.mkdir() and mkdirs() report failure as an exceptional
return value false. Fix the code which silently ignored this
exceptional return value.
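The fix amounts to the following pattern (a sketch, not the exact
changed call sites):
import java.io.File;
import java.io.IOException;
class Dirs {
  static void ensureDirectory(File dir) throws IOException {
    // mkdirs() reports failure only through its boolean return value
    if (!dir.mkdirs() && !dir.isDirectory())
      throw new IOException("Could not create directory " + dir);
  }
}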
Change-Id: I41244f4b9d66176e68e2c07e2329cf08492f8619
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Teach PackWriter how to reuse an existing object list
Counting the objects needed for packing is the most expensive part of
an UploadPack request that has no uninteresting objects (otherwise
known as an initial clone). During this phase the PackWriter is
enumerating the entire set of objects in this repository, so they can
be sent to the client for their new clone.
Allow the ObjectReader (and therefore the underlying storage system)
to keep a cached list of all reachable objects from a small number of
points in the project's history. If one of those points is reached
during enumeration of the commit graph, most objects are obtained from
the cached list instead of direct traversal.
PackWriter uses the list by discarding the current object lists and
restarting a traversal from all refs but marking the object list name
as uninteresting. This allows PackWriter to enumerate all objects
that are more recent than the list creation, or that were on side
branches that the list does not include.
However, ObjectWalk tags all of the trees and commits within the list
commit as UNINTERESTING, which would normally cause PackWriter to
construct a thin pack that excludes these objects. To avoid that,
addObject() was refactored to allow this list-based enumeration to
always include an object, even if it has been tagged UNINTERESTING by
the ObjectWalk. This implies the list-based enumeration may only be
used for initial clones, where all objects are being sent.
The UNINTERESTING labeling occurs because StartGenerator always
enables the BoundaryGenerator if the walker is an ObjectWalk and a
commit was marked UNINTERESTING, even if RevSort.BOUNDARY was not
enabled. This is the default reasonable behavior for an ObjectWalk,
but isn't desired here in PackWriter with the list-based enumeration.
Rather than trying to change all of this behavior, PackWriter works
around it.
Because the list name commit's immediate files and trees were all
enumerated before the list enumeration itself starts (and are also
within the list itself), PackWriter runs the risk of adding the same
objects to its ObjectIdSubclassMap twice. Since this breaks the
internal map data structure (and also may cause the object to transmit
twice), PackWriter needs to use a new "added" RevFlag to track whether
or not an object has been put into the outgoing list yet.
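As a rough sketch of that guard (assuming JGit's RevFlag/RevObject API;
this is not the exact PackWriter code):
import java.util.List;
import org.eclipse.jgit.revwalk.RevFlag;
import org.eclipse.jgit.revwalk.RevObject;
class AddOnce {
  // Queue the object exactly once, even if the ObjectWalk has already
  // tagged it UNINTERESTING through the list-based enumeration.
  static void add(RevObject o, RevFlag added, List<RevObject> outgoing) {
    if (o.has(added))
      return;
    o.add(added);
    outgoing.add(o);
  }
}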
Change-Id: Ie99ed4d969a6bb20cc2528ac6b8fb91043cee071
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Allow ObjectReuseAsIs to resort objects during writing
It can be very handy for the implementation to resort the
object list based on data locality, improving prefetch in
the operating system's buffer cache.
Export the list to the implementation as a proper List, and
document that it is mutable and OK to be modified. The
only caller in PackWriter is already OK with these rules.
Change-Id: I3f51cf4388898917b2be36670587a5aee902ff10
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
PackWriter: Use TOPO order only for incremental packs
When performing an initial clone of a repository there are no
uninteresting commits, and the resulting pack will be completely
self-contained. Therefore PackWriter does not need to honor the
C Git standard TOPO ordering as described in JGit commit ba984ba2e0
("Fix checkReferencedIsReachable to use correct base list").
Switching to COMMIT_TIME_DESC when there are no uninteresting commits
allows the "Counting objects" phase to emit progress earlier, as the
RevWalk will not buffer the commit list. When TOPO is set the RevWalk
enumerates all commits first, before outputting any from which
PackWriter could mark progress updates.
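The decision amounts to something like this (a sketch assuming JGit's
RevSort constants and an ObjectWalk being configured for the pack):
import org.eclipse.jgit.revwalk.ObjectWalk;
import org.eclipse.jgit.revwalk.RevSort;
class SortChoice {
  static void configure(ObjectWalk walker, boolean haveUninteresting) {
    // An initial clone has no uninteresting commits, so let the walk emit
    // commits as it finds them and progress can be reported immediately.
    walker.sort(haveUninteresting ? RevSort.TOPO : RevSort.COMMIT_TIME_DESC);
  }
}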
Change-Id: If2b6a9903b536c7fb3c45f85d0a67ff6c6e66f22
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix tests broken by fix for adding files in a network share
The change Ie0350e032a97e0d09626d6143c5c692873a5f6a2 was not
done properly. The renamed file was not write-protected, and
this broke a test.
Bug: 335388
Change-Id: I41b2235b7677bc5fddc70dda2a56cdd2cb53ce5d
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
We cannot always rename read-only files on network shares,
so rename the temp file for a new loose object first, and
then set it as read-only.
Bug: 335388
Change-Id: Ie0350e032a97e0d09626d6143c5c692873a5f6a2
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Refactor IndexPack to not require local filesystem
By moving the logic that parses a pack stream from the network (or
a bundle) into a type that can be constructed by an ObjectInserter,
repository implementations have a chance to inject their own logic
for storing object data received into the destination repository.
The API isn't completely generic yet; there are still quite a few
assumptions that the PackParser subclass is storing the data onto
the local filesystem as a single file. But it's about the simplest
split of IndexPack I can come up with without completely ripping
the code apart.
Change-Id: I5b167c9cc6d7a7c56d0197c62c0fd0036a83ec6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Instead of using the current thread's stack to recurse through the
delta chain, use a linked list that is stored in the heap. This
permits any thread to load a deep delta chain without running out
of thread stack space.
Despite needing to allocate a stack entry object for each delta
visited along the chain being loaded, the object allocation count is
kept the same as in the prior version by removing the transient
ObjectLoaders from the intermediate objects accessed in the chain.
Instead the byte[] for the raw data is passed, and null is used as a
magic value to signal isLarge() and enter the large object code path.
Like the old version, this implementation minimizes the amount of
memory that must be live at once. The current delta instruction
sequence, the base it applies onto, and the result are the only live
data arrays. As each level is processed, the prior base is discarded
and replaced with the new result.
Each Delta frame on the stack is slightly larger than the standard
ObjectLoader.SmallObject type that was used before; however, the Delta
instances should be smaller than the old method stack frames, so total
memory usage should actually be lower with this new implementation.
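A standalone sketch of the approach (the Stored interface and apply stub
below are illustrative, not JGit's internal classes):
final class DeltaChain {
  // One heap-allocated frame per delta level, replacing a recursive call.
  static final class Frame {
    final byte[] delta;
    final Frame next;
    Frame(byte[] delta, Frame next) {
      this.delta = delta;
      this.next = next;
    }
  }
  interface Stored {
    boolean isDelta();
    Stored base();   // object this delta applies on top of
    byte[] raw();    // raw delta instructions, or whole-object data
  }
  static byte[] resolve(Stored obj) {
    Frame stack = null;
    while (obj.isDelta()) {              // walk down to the whole-object base
      stack = new Frame(obj.raw(), stack);
      obj = obj.base();
    }
    byte[] data = obj.raw();
    for (Frame f = stack; f != null; f = f.next)
      data = apply(data, f.delta);       // the prior base becomes garbage
    return data;
  }
  private static byte[] apply(byte[] base, byte[] delta) {
    throw new UnsupportedOperationException("delta application omitted");
  }
}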
Change-Id: I6faca2a440020309658ca23fbec4c95aa637051c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Add support for getting the system wide configuration
These settings are stored in <prefix>/etc/gitconfig. The C Git
binary is installed in <prefix>/bin, so we look for the C Git
executable to find this location, first by looking at the PATH
environment variable and then by attempting to launch bash as
a login shell to find it.
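Roughly, the PATH half of that lookup could look like this (a sketch;
the login-shell fallback is omitted):
import java.io.File;
class SystemConfigLocator {
  static File fromPath() {
    String path = System.getenv("PATH");
    if (path == null)
      return null;
    for (String dir : path.split(File.pathSeparator)) {
      File git = new File(dir, "git");
      if (!git.isFile() || !git.canExecute())
        continue;
      // <prefix>/bin/git  ->  <prefix>/etc/gitconfig
      File bin = git.getAbsoluteFile().getParentFile();
      File prefix = bin != null ? bin.getParentFile() : null;
      if (prefix != null)
        return new File(new File(prefix, "etc"), "gitconfig");
    }
    return null;
  }
}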
Bug: 333216
Change-Id: I1bbee9fb123a81714a34a9cc242b92beacfbb4a8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
We cannot use SystemReader to get the time unless we do so consistently,
which is harder to do while still being sure we are really testing what
we want.
Then we need to update our lastRead variable whenever we conclude that
our file is not racily clean according to lastRead. It may well be clean,
but we do not know that until we check the system clock again.
Finally add a test for this class.
Change-Id: I1894b032b9bd359d1b5325e5472d48e372599e4c
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
FileBasedConfig: Use FileSnapshot for isOutdated()
Relying only on the last modified time for a file can be tricky.
The "racy git" problem may cause some modifications to be missed.
Use the new FileSnapshot code to track when a configuration file
has been modified, and needs to be reloaded in memory.
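A minimal standalone sketch of the snapshot idea (not the actual
FileSnapshot class):
import java.io.File;
final class Snapshot {
  private final long lastModified;
  private final long lastRead;
  Snapshot(File f) {
    lastRead = System.currentTimeMillis();
    lastModified = f.lastModified();
  }
  boolean isModified(File f) {
    // A changed mtime is an obvious modification; an mtime that is not
    // clearly older than the read time is "racily clean" and must also
    // be treated as possibly modified.
    return f.lastModified() != lastModified || lastRead <= lastModified;
  }
}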
Change-Id: Ib6312fdd3b2403eee5af3f8ae711294b0e5f9035
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Pulling the last modified checking logic out of ObjectDirectory
makes it possible to reuse this code for other files, such as
the $GIT_DIR/config or $GIT_DIR/packed-refs files.
Change-Id: If2f27a89fc3b7adde7e65ff40bbca5d55b98b772
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Each time getConfig() is called on FileRepository, it checks the
last modified time of both ~/.gitconfig and $GIT_DIR/config. If
$GIT_DIR/config appears to have been modified, it is read back in
from disk and the current config is wiped out.
When mutating a configuration file, this may cause in-memory edits
to disappear. To avoid that, callers need to avoid calling getConfig()
until after the configuration has been saved to disk.
Unfortunately the API is still horribly broken. Configuration should
be modified only while a lock is held on the configuration file, very
similar to the way a ref is updated via its locking protocol. But our
existing API is really broken for that so we'll have to defer cleaning
up the edit path for a future change.
Change-Id: I5888dd97bac20ddf60456c81ffc1eb8df04ef410
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
java.io.File.delete() reports failure as an exceptional
return value false. Fix the code which silently ignored
this exceptional return value. Also remove some duplicate
deletion helper methods.
Change-Id: I80ed20ca1f07a2bc6e779957a4ad0c713789c5be
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Currently the following can happen in LockFile.commit: deletion of the
original file succeeds but renaming fails afterwards. In this case the
original file (e.g. branch file in refs/heads) is lost.
To work around the issue the same retry logic as for file deletion is
applied to file renaming.
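A sketch of that retry loop (the attempt count and delay are illustrative):
import java.io.File;
class RetryRename {
  static boolean renameWithRetry(File src, File dst) {
    for (int attempt = 0; attempt < 10; attempt++) {
      if (src.renameTo(dst))
        return true;
      try {
        Thread.sleep(100); // let another process or scanner release dst
      } catch (InterruptedException wakeUp) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}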
Bug: 331890
Change-Id: I68620c07f2d3ab7f3279c71a91e184e8eac69832
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
If the object is stored as a whole object and all we want is the type,
there is no need to skip the length header. The type is already known
and can be returned as-is. Instead skip the length header only for
the two delta formats, where the delta base must itself be scanned.
Change-Id: I87029258e88924b3e5850bdd6c9006a366191d10
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This variable was not used for anything, but Eclipse's JDT failed to
notice because of the "shift += " operation within the body of the
while loop. Here we don't need the shift because we do not decode the
length, but we do have to skip over the bytes that store the length to
locate the delta base.
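All the loop has to do is step over the variable-length size bytes (a
sketch of the pack entry header encoding; buf and ptr are illustrative):
class PackHeader {
  // Each header byte sets its high bit while more size bytes follow; we
  // only need to step past them to reach the delta base information.
  static int skipEntryHeader(byte[] buf, int ptr) {
    while ((buf[ptr++] & 0x80) != 0)
      ; // the size bits are intentionally not decoded
    return ptr;
  }
}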
Bug: 331319
Change-Id: I200a874fd7e39e3adf2640b8cd0f53dcf91ef4c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Remy Suen <remysuen@ca.ibm.com>
The lock file protocol relies on the atomic creation of a standardized
name in the parent directory of the file being updated. Since the
creation is atomic, at most one thread in any process can succeed on
this creation, and all others will fail. While the lock file exists,
that file is private to the thread that is writing it, and no others
will attempt to read or modify the file.
Consequently the use of region-level locks around the file is
unnecessary, and may actually reduce performance when using NFS, SMB,
or some other sort of remote filesystem that supports locking.
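For illustration, the atomic create the protocol relies on is just this
(a sketch, with error handling simplified):
import java.io.File;
import java.io.IOException;
class LockName {
  static File acquire(File fileToUpdate) throws IOException {
    File lock = new File(fileToUpdate.getParentFile(),
        fileToUpdate.getName() + ".lock");
    // createNewFile() is atomic: at most one thread in one process wins.
    if (!lock.createNewFile())
      throw new IOException("Lock on " + fileToUpdate + " already held");
    return lock;
  }
}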
Change-Id: Ice312b6fb4fdf9d36c734c3624c6d0537903913b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If core.fsyncRefFiles is set to true, fsync is used whenever a
reference file is updated, ensuring the file contents are also
written to disk. This can help to prevent empty ref files after
a system crash when using a filesystem such as HFS+ where data
writes may be delayed.
Change-Id: Ie508a974da50f63b0409c38afe68772322dc19f1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some repositories may be on really unstable filesystems, but still
want to have good reliability when objects are written to disk. If
core.fsyncObjectFiles is set to true, request the JVM to ensure the
data is written before returning success to the caller of insert.
The option defaults to false because it should be useless on any
filesystem that orders writes and metadata, such as ext3 mounted with
data=ordered (or data=journal). But it may be useful on some systems
(especially HFS+) where file content may flush to the disk
independently of filesystem structure changes.
Because FileChannel.force(boolean) only claims to ensure data is
written if it was written using the write(ByteBuffer) method of
FileChannel, redirect all writes when using fsyncObjectFiles to go
through the FileChannel interface instead of through the older style
OutputStream interface. This may not be necessary on all JVMs, but
it's more portable to follow the definition than the common behavior.
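A sketch of the redirected write path (the temporary file name and the
configuration flag variable are illustrative):
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
class SyncedWrite {
  static void write(File tmp, byte[] compressed, boolean fsyncObjectFiles)
      throws IOException {
    try (FileOutputStream out = new FileOutputStream(tmp)) {
      FileChannel channel = out.getChannel();
      ByteBuffer buf = ByteBuffer.wrap(compressed);
      while (buf.hasRemaining())
        channel.write(buf);
      if (fsyncObjectFiles)
        channel.force(true); // flush the data before the file is renamed
    }
  }
}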
Change-Id: I57f6b6bb7e403c07fbae989dbf3758eaf5edbc78
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The ObjectId (for a ref) can be easily reformatted into a temporary
byte[] and then passed off to write(byte[]), removing the duplicated
code that existed in both write methods.
Change-Id: I09740658e070d5f22682333a2e0d325fd1c4a6cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fixes the "Method ignores results of InputStream.read()" warning.
This is the only place where read() was used instead of readFully()
and the return value was not checked. So it was either an oversight
or should be documented. This change assumes it was an oversight.
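For reference, the readFully-style loop the fixed call site relies on
looks like this (a sketch):
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
class IOUtil {
  static void readFully(InputStream in, byte[] dst, int off, int len)
      throws IOException {
    while (len > 0) {
      int n = in.read(dst, off, len);
      if (n < 0)
        throw new EOFException(); // a short read is an error, not ignored
      off += n;
      len -= n;
    }
  }
}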
Change-Id: I859404a7d80449c538a552427787f3e57d7c92b4
Don't permit transient worker threads to access the underlying output
stream of a ProgressMonitor, as they might get marked as the stream's
writer thread. Instead proxy update events from the workers back onto
the application's real work thread. This ensures that the stream only
sees a single thread, and it's the thread that will remain alive for
the entire life cycle of the operation.
This fixes IOException("Write end dead") during local repository fetch
when threaded delta search is enabled. One of the transient delta
search threads became the designated writer for the pipe, and when it
terminated the reader end thought the writer was dead, even though the
main writer thread was still executing in PackWriter.
Bug: 326557
Change-Id: I01d1b20a3d7be1c0b480c7fb5c9773c161fe5c15
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Prevent endless loop of events fired by RefDirectory
RefDirectory fires a RefsChangedEvent when it detects that one
ref changed (new, modified, deleted). But there was a potential
for wrong events being fired, leading to an endless loop in EGit.
The problem is that when calling getRefs(ALL) we don't want to report
additional refs, and by doing so we remove the additional refs from
the list of "refs reported upwards last time". We then fire a
RefsChangedEvent because we think that the special refs are not
there anymore.
I fixed this by removing eventing for the additional refs. Another
alternative would be to always scan also for additional refs and
put them in the list of refs. But getRefs(ALL) would then remove
the additional refs again. I didn't do that for performance reasons
and also because I am not sure whether we want eventing for
additional refs.
Change-Id: Icb9398b55a8c6bbf03e38f6670feb67754ce91e0
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Add support for special symref FETCH_HEAD and MERGE_HEAD
The RefDirectory class was not returning FETCH_HEAD and
MERGE_HEAD when trying to get all refs via getRefs(RefDatabase.ALL).
This fix adds constants for FETCH_HEAD and ORIG_HEAD and adds a
new getter getAdditionalRefs() to get these additional refs.
To be compatible with C Git the getRefs(ALL) method will not return
FETCH_HEAD, MERGE_HEAD and ORIG_HEAD.
Change-Id: Ie114ca92e9d5e7d61d892f4413ade65acdc08c32
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Make ObjectDirectory getPacks() work the first time
If an object hasn't been accessed yet, the pack list for a repository
may not have been scanned from disk. If an application (e.g. the dumb
transport servlet support code) asks for the pack list for an
ObjectDirectory, we should load it immediately.
Change-Id: I93d7b1bca422d905948e8e83b2afa83c8894a68b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Update CachedObjectDirectory when inserting objects
If an ObjectInserter is created from a CachedObjectDirectory, we need
to ensure the cache is updated whenever a new loose object is actually
added to the loose objects directory, otherwise a future read from an
ObjectReader on the CachedObjectDirectory might not be able to open
the newly created object.
We mostly had the infrastructure in place to implement this due to the
injection of unpacked large deltas, but we didn't have a way to pass
the ObjectId from ObjectDirectoryInserter to CachedObjectDirectory,
because the inserter was using the underlying ObjectDirectory and not
the CachedObjectDirectory. Redirecting to CachedObjectDirectory
ensures the cache is updated.
Change-Id: I1f7bdfacc7ad77ebdb885f655e549cc570652225
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Increase core.streamFileThreshold default to 50 MiB
Projects like org.eclipse.mdt contain large XML files about 6 MiB
in size. So does the Android project platform/frameworks/base.
Doing a clone of either project with JGit takes forever to check out
the files into the working directory, because delta decompression
tends to be very expensive as we need to constantly reposition the
base stream for each copy instruction. This can be made worse by
a very bad ordering of offsets, possibly due to an XML editor that
doesn't preserve the order of elements in the file very well.
Increasing the threshold to the same limit PackWriter uses when
doing delta compression (50 MiB) permits a default configured
JGit to decompress these XML file objects using the faster
random-access arrays, rather than re-seeking through an inflate
stream, significantly reducing checkout time after a clone.
Since this new limit may be dangerously close to the JVM maximum
heap size, every allocation attempt is now wrapped in a try/catch
so that JGit can degrade by switching to the large object stream
mode when the allocation is refused. It will run slower, but the
operation will still complete.
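Roughly, the guarded allocation looks like this (a sketch; the streaming
fallback is represented by returning null):
class GuardedLoad {
  static byte[] tryInflateInCore(long size) {
    if (size > Integer.MAX_VALUE)
      return null;                  // clearly too big, stream instead
    try {
      return new byte[(int) size];  // may be close to the JVM heap limit
    } catch (OutOfMemoryError noHeap) {
      return null;                  // degrade to the large object stream path
    }
  }
}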
The large stream mode will run very well for big objects that aren't
delta compressed, and is acceptable for delta compressed objects that
are using only forward referencing copy instructions. Copies using
prior offsets are still going to be horrible, and there is nothing
we can do about it except increase core.streamFileThreshold.
We might in the future want to consider changing the way the delta
generators work in JGit and native C Git to avoid prior offsets once
an object reaches a certain size, even if that causes the delta
instruction stream to be slightly larger. Unfortunately native
C Git won't want to do that until its also able to stream objects
rather than malloc them as contiguous blocks.
Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Fix UnsupportedOperationException while fixing thin pack
If a thin pack has a large delta we need to be able to open
its cached copy from the loose object directory through the
CachedObjectDatabase handle. Unfortunately that did not support the
openObject2 method, which the LargePackedDeltaObject used directly
to bypass looking at the pack files.
Bug: 324868
Change-Id: I1d5886a6c3254c6dea2852d50b8614c31a93e615
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When running IndexPack we use a CachedObjectDirectory, which
knows what objects are loose and tries to avoid stat(2) calls for
objects that do not exist in the repository, as stat(2) on Win32
is very slow.
However large delta objects found in a pack file are expanded into
a loose object, in order to avoid costly delta chain processing
when that object is used as a base for another delta.
If this expand occurs while working with the CachedObjectDirectory,
we need to update the cached directory data to include this new
object, otherwise it won't be available when we try to open it
during the object verify phase.
Bug: 324868
Change-Id: Idf0c76d4849d69aa415ead32e46a435622395d68
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When creating a new FileRepository, probe the capability of the
local filesystem and set core.filemode based on how it reacts.
We can't just rely on FS.supportsExecute() because a POSIX system
(which usually does support execute) might be storing the repository
on a partition that doesn't have execute support (e.g. plain FAT-32).
Creating a temporary file, setting both states, and checking that
we get the desired results will let us set the variable correctly
on all systems.
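A sketch of such a probe using plain java.io.File calls (the real
detection may differ in details):
import java.io.File;
import java.io.IOException;
class FileModeProbe {
  static boolean supportsExecuteBit(File dir) throws IOException {
    File probe = File.createTempFile("filemode", null, dir);
    try {
      if (!probe.setExecutable(true) || !probe.canExecute())
        return false;   // the bit cannot be set or is not retained
      if (!probe.setExecutable(false) || probe.canExecute())
        return false;   // the bit cannot be cleared again
      return true;
    } finally {
      probe.delete();
    }
  }
}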
Change-Id: I551488ea8d352d2179c7b244f474d2e3d02567a2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of spooling large delta bases into temporary files and then
immediately deleting them afterwards, spool the large delta out to
a normal loose object. Later any requests for that large delta can
be answered by reading from the loose object, which is much easier
to stream efficiently for readers.
Since the object is now duplicated, once in the pack as a delta and
again as a loose object, any future prune-packed will automatically
delete the loose object variant, releasing the wasted disk space.
As prune-packed is run automatically during either repack or gc, and
gc --auto triggers automatically based on the number of loose objects,
we get automatic cache management for free. Large objects that were
unpacked will be periodically cleared out, and will simply be restored
later if they are needed again.
After a short offline discussion with Junio Hamano today, we may want
to propose a change to prune-packed to hold onto larger loose objects
which also exist in pack files as deltas, if the loose object was
recently accessed or modified in the last 2 days.
Change-Id: I3668a3967c807010f48cd69f994dcbaaf582337c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Remember loose objects and fast-track their lookup
Recently created objects are usually what branches point to, and
are usually written out as loose objects. But due to the high cost
of asking the operating system if a file exists, these are the last
thing that ObjectDirectory examines when looking for an object by
its ObjectId.
Caching recently seen loose objects permits the opening code to
jump directly to the loose object, accelerating lookup for branch
heads that are accessed often.
To avoid exploding the cache, it is limited to approximately 2048
entries. When more ids are added, the table is simply cleared
and reset in size.
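A standalone sketch of that policy (illustrative, not the actual
ObjectDirectory code):
import java.util.HashSet;
import java.util.Set;
final class RecentLooseObjects {
  private static final int LIMIT = 2048;
  private Set<String> ids = new HashSet<>();
  void remember(String objectName) {
    if (ids.size() >= LIMIT)
      ids = new HashSet<>(); // a cheap reset instead of an eviction policy
    ids.add(objectName);
  }
  boolean mightBeLoose(String objectName) {
    return ids.contains(objectName);
  }
}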
Change-Id: I18f483217412b102f754ffd496c87061d592e535
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This class is used only to cache the unpacked form of an object that
was used as a base for another object. The theory goes that if an
object is used as a delta base for A, it will probably also be a
delta base for B, C, D, E, etc. and therefore having an unpacked copy
of it on hand will make delta resolution for the others very fast.
However since objects are usually only accessed once, we don't want
to cache everything we unpack, just things that we are likely to
need again. The only things we need again are the delta bases.
Hence, it's a delta base cache.
This gets us the class name UnpackedObjectCache back, so we can
use it to actually create a cache of unpacked object information.
Change-Id: I121f356cf4eca7b80126497264eac22bd5825a1d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix checkReferencedIsReachable to use correct base list
When checkReferencedIsReachable is set in ReceivePack we are trying
to prove that the push client is permitted to access an object that
it did not send to us, but that the received objects link to either
via a link inside of an object (e.g. commit parent pointer or tree
member) or by a delta base reference.
To do this check we are making a list of every potential delta base,
and then ensuring that every delta base used appears on this list.
If a delta base does not appear on this list, we abort with an error,
letting the client know we are missing a particular object.
Preventing spurious errors about missing delta base objects requires
us to use the exact same list of potential delta bases as the remote
push client used. This means we must use TOPO ordering, and we
need to enable BOUNDARY sorting so that ObjectWalk will correctly
include any trees found during the enumeration back to the common
merge base between the interesting and uninteresting heads.
To ensure JGit's own push client matches this same potential delta
base list, we need to undo 60aae90d4d ("Disable topological
sorting in PackWriter") and switch back to using the conventional
TOPO ordering for commits in a pack file. This ensures that our
own push client will use the same potential base object list as
checkReferencedIsReachable uses on the receiving side.
Change-Id: I14d0a326deb62a43f987b375cfe519711031e172
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
DeltaStream: Fix data corruption when reading large copies
If the copy instruction was larger than the input buffer given to us,
we copied the wrong part of the base stream during the next read().
This occurred on really big binary files where a copy instruction
of 64k wasn't unreasonable, but the caller's buffer was only 8192
bytes long. We copied the first 8192 bytes correctly, but then
reseeked the base stream back to the start of the copy region on
the second read of 8192 bytes. Instead of a sequence like ABCD
being read into the caller, we read AAAA.
Change-Id: I240a3f722a3eda1ce8ef5db93b380e3bceb1e201
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>