mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Author	SHA1	Message	Date
Shawn Pearce	1513a5632d	Allow DfsReader to be subclassed Necessary if a DFS implementation wants to override close() to record DfsReaderIoStats. Change-Id: I144575f9bf1abf2c1fd72030550c4f0795fcf44d	7 years ago
Thirumala Reddy Mutchukota	5e250e45be	Delete expired garbage even when there is no GC pack present. Delete the condition to check whether the garbage pack creation time is older than the last GC operation, because it's not possible to find the last GC operation time when there is no GC pack. Add additional tests to make sure the contents of the expired garbage packs are considered during the GC operation and any actively referenced objects from the garbage packs are copied successfully into the GC pack before deleting the garbage pack. Change-Id: I09e8b2656de8ba7f9b996724ad1961d908e937b6 Signed-off-by: Thirumala Reddy Mutchukota <thirumala@google.com>	7 years ago
David Pursehouse	3b4448637f	Enable and fix warnings about redundant specification of type arguments Since the introduction of generic type parameter inference in Java 7, it's not necessary to explicitly specify the type of generic parameters. Enable the warning in Eclipse, and fix all occurrences. Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	7 years ago
Shawn Pearce	d67b183537	Prefer smaller GC files during DFS garbage collection In `8ac65d33ed` PackWriter changed its behavior to always prefer the last object representation presented to it by the ObjectReuseAsIs implementation. This was a fix to avoid delta chain cycles. Unfortunately it can lead to suboptimal compression when concurrent GCs are run on the same repository. One case is automatic GC running (with default settings) in parallel to a manual GC that has disabled delta reuse in order to generate new smaller deltas for the entire history of the repository. Running GC with no-reuse generally requires more CPU time, which also translates to a longer running time. This can lead to a race where the automatic GC completes before the no-reuse GC, leaving the repository in a state such as: no-reuse GC: size 1 GiB, mtime = 18:45 auto GC: size 8 GiB, mtime = 17:30 With the default sort ordering, the smaller no-reuse GC pack is sorted earlier in the pack list, due to its more recent mtime. During object reuse in a future GC, these smaller representations are considered first by PackWriter, but are all discarded when the auto GC file from 17:30 is examined second (due to its older mtime). Work around this in two ways. Well formed DFS repositories should have at most 1 GC pack. If 2 or more GC packs exist, break the sorting tie by selecting the smaller file earlier in the pack list. This allows all normal read code paths to favor the smaller file, which places less pressure on the DfsBlockCache. If any GC race happens, readers serving clone requests will prefer the file that is smaller. During object reuse, flip this ordering so that the smaller file is last. This allows PackWriter to see smaller deltas last, replacing larger representations that were previously considered from other pack files. Change-Id: I0b7dc8bb9711c82abd6bd16643f518cfccc6d31a	7 years ago
Thirumala Reddy Mutchukota	006f4d4d29	Reintroduce garbage pack coalescing when ttl > 0. Disabling the garbage pack coalescing when garbageTtl > 0 can result in a lot of garbage packs if they are created within the garbageTtl time. To avoid a large number of garbage packs, re-introduce garbage pack coalescing for the packs that are created within a single calendar day when the garbageTtl is more than one day or one third of the garbageTtl. Change-Id: If969716aeb55fb4fd0ff71d75f41a07638cd5a69 Signed-off-by: Thirumala Reddy Mutchukota <thirumala@google.com>	7 years ago
Thirumala Reddy Mutchukota	c9f55032a2	Record the estimated size of the pack files. The Compacter and Garbage Collector will record the estimated size of the compact, gc or garbage packs that are about to be created. Clients can use this information to decide how to store the pack based on its approximate expected size. Added a new protected method DfsObjDatabase.newPack(PackSource packSource, long estimatedPackSize), so that the clients can override this method to make use of the estimatedPackSize while creating a new PackDescription object. The default implementation of this method is equivalent to newPack(packSource).setEstimatedPackSize(estimatedPackSize). I didn't make it abstract because that would force all the existing subclasses of DfsObjDatabase to implement this method. Due to this default implementation, the estimatedPackSize is added to DfsPackDescription using a setter instead of a constructor parameter (even though a constructor parameter would be a better choice as this value is set only during the object creation). Change-Id: Iade1122633ea774c2e842178a6a6cbb4a57b598b Signed-off-by: Thirumala Reddy Mutchukota <thirumala@google.com>	7 years ago
Philipp Marx	8adbfe4da6	Check that DfsBlockCache#blockSize is a power of 2 In case a value is used which isn’t a power of 2 there will be a high chance of java.lang.ArrayIndexOutOfBoundsException and org.eclipse.jgit.errors.CorruptObjectException due to a mismatching assumption for the DfsBlockCache#blockSizeShift parameter. Change-Id: Ib348b3704edf10b5f93a3ffab4fa6f09cbbae231 Signed-off-by: Philipp Marx <smigfu@googlemail.com>	7 years ago
Mike Williams	fd527a2cd7	Prune UNREACHABLE_GARBAGE packs when they expire DfsGarbageCollector will now enforce a maximum time to live (TTL) for UNREACHABLE_GARBAGE packs. The default TTL is 1 day, which should be enough time to avoid races with other processes that are inserting data into the repository. Change-Id: Id719e6e2a03cfc9a0c0aef8ed71d261dda14bd0c Signed-off-by: Mike Williams <miwilliams@google.com>	8 years ago
Dave Borowitz	0d6ba84065	DfsInserter: Optionally disable existing object check When using a DfsInserter for high-throughput insertion of many objects (analogous to git-fast-import), we don't necessarily want to do a random object lookup for each. It'll be faster from the inserter's perspective to insert the duplicate objects and let a later GC handle the deduplication. Change-Id: Ic97f5f01657b4525f157e6df66023f1f07fc1851	8 years ago
Dave Borowitz	adff322a69	Expose the ObjectInserter that created an ObjectReader We've found in Gerrit Code Review that it is common to pass around both an ObjectReader (or more commonly a RevWalk wrapping one) and an ObjectInserter. These code paths often assume that the ObjectReader can read back any objects created by the ObjectInserter without flushing. However, we previously had no way to enforce that constraint programmatically, leading to hard-to-spot problems. Provide a solution by exposing the ObjectInserter that created an ObjectReader, when known. Callers can either continue passing both objects and check: reader.getCreatedFromInserter() == inserter or they can just pass around ObjectReader and extract the inserter when it's needed (checking that it's not null at usage time). Change-Id: Ibbf5d1968b506f6b47030ab1b046ffccb47352ea	8 years ago
Matthias Sohn	7ee184acfa	Fix imports in DfsInserterTest - remove unused import of AnyObjectId - auto-sort import statements Change-Id: I1c7cec2734bd58370a7dfae70a6a4ccbe3e304ce Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	8 years ago
Mike Williams	c4d73fb7cc	Insert duplicate objects to prevent race during garbage collection. Prior to this change, DfsInserter would not insert an object into a pack if it already existed in another pack in the repository, even if that pack was unreachable. Consider this sequence of events: - Object FOO is pushed to a repository. - Subsequent ref changes make FOO UNREACHABLE_GARBAGE. - FOO is subsequently re-inserted using a DfsInserter, but skipped due to existing in UNREACHABLE_GARBAGE. - The repository is repacked; FOO will not be written into a new pack because it is not yet reachable from a reference. If the UNREACHABLE_GARBAGE packs are deleted, FOO disappears. - A reference is updated to reference FOO. This reference is now broken as FOO was removed when the repacking process deleted the UNREACHABLE_GARBAGE pack that stored the only copy of FOO. The garbage collector can't safely delete the UNREACHABLE_GARBAGE pack because FOO might be in the middle of being re-inserted/re-packed. This change writes a duplicate copy of an object if it only exists in UNREACHABLE_GARBAGE. This "freshens" the object to give it a chance to survive long enough to be made reachable through a reference. Change-Id: I20f2062230f3af3bccd6f21d3b7342f1152a5532 Signed-off-by: Mike Williams <miwilliams@google.com>	8 years ago
Matthias Sohn	686124bec3	Replace deprecated release() methods by close() See the discussion [1] in the Gerrit mailing list. [1] https://groups.google.com/forum/#!topic/repo-discuss/RRQT_xCqz4o Change-Id: I2c67384309c5c2e8511a7d0d4e088b4e95f819ff Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	9 years ago
Shawn Pearce	8ff08455f6	Fix memory leak in dfs.DeltaBaseCache The LRU chain management code was broken leading to situations where the chain was incomplete. This prevented the cache from removing items when it exceeded its memory target, causing a leak. One case was a repeated hit on the head of the chain. moveToHead(e) was invoked linking the head back to itself in a cycle orphaning the rest of the table. Add some unit tests to cover this and a few other paths. Change-Id: Ib27486eaa1b1d2bf1c745a56d0a5832bfb029322	9 years ago
Shawn Pearce	d70419ab00	Revert "Add a method to DfsOutputStream to read as an InputStream" This reverts commit `b646578d89`. openInputStream() is never used in JGit, nor is it used by any known working DFS implementation. The method was added as a utility for reading back from a DfsInserter, but the final implementation of that feature does not require this method. Change-Id: I075ad95e40af49c92b554480f8993ef5658f7684	9 years ago
Dave Borowitz	e1856dbf44	Add a method to ObjectInserter to read back inserted objects In the DFS implementation, flushing an inserter writes a new pack to the storage system and is potentially very slow, but was the only way to ensure previously-inserted objects were available. For some tasks, like performing a series of three-way merges, the total size of all inserted objects may be small enough to avoid flushing the in-memory buffered data. DfsOutputStream already provides a read method to read back from the not-yet-flushed data, so use this to provide an ObjectReader in the DFS case. In the file-backed case, objects are written out loosely on the fly, so the implementation can just return the existing WindowCursor. Change-Id: I454fdfb88f4d215e31b7da2b2a069853b197b3dd	11 years ago
Dave Borowitz	b646578d89	Add a method to DfsOutputStream to read as an InputStream Change-Id: I0ec1f17a88bc14f22c10f9bc8d6f5b5118410e3a	11 years ago
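
Commit `8adbfe4da6` requires DfsBlockCache#blockSize to be a power of 2 because a shift value is derived from it. The following is a minimal sketch of why that matters, not JGit's actual code: the class `BlockSizeCheck` and its methods are hypothetical names, and the way the shift is computed here is an assumption. When the block size is a power of 2, shifting a file position right by the number of trailing zero bits gives the same block index as dividing by the block size; for any other size the two disagree, which is the kind of mismatch that can surface as out-of-bounds reads.

```java
// Hypothetical illustration; not taken from the JGit sources.
final class BlockSizeCheck {
    /** Returns true only for positive values with exactly one bit set. */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Validates the block size and returns log2(blockSize). */
    static int shiftFor(int blockSize) {
        if (!isPowerOfTwo(blockSize)) {
            throw new IllegalArgumentException(
                    "blockSize must be a power of 2: " + blockSize);
        }
        // For a power of 2, the number of trailing zero bits equals log2.
        return Integer.numberOfTrailingZeros(blockSize);
    }

    public static void main(String[] args) {
        int blockSize = 64 * 1024;
        int shift = shiftFor(blockSize);

        long position = 200_000L;
        long blockIndex = position >>> shift;   // same as position / blockSize
        long blockStart = blockIndex << shift;  // first byte of that block

        System.out.println("shift=" + shift
                + " blockIndex=" + blockIndex
                + " blockStart=" + blockStart);
    }
}
```

Rejecting a bad value up front, as the commit describes, turns a hard-to-diagnose read failure into an immediate configuration error.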

17 Commits (stable-4.8)
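
The tie-break described in commit `d67b183537` can be sketched as two opposite orderings over the same GC packs. This is a simplified illustration under stated assumptions, not the JGit implementation: `GcPackOrder`, `Pack`, `READ_ORDER` and `REUSE_ORDER` are hypothetical names, and the sketch considers only GC packs and only their size, ignoring pack source and modification time. Readers see the smaller pack first; object reuse sees it last, so that a writer which keeps the last representation presented to it ends up with the smaller deltas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not taken from the JGit sources.
final class GcPackOrder {
    static final class Pack {
        final String name;
        final long sizeBytes;

        Pack(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Normal reads: the smaller GC pack sorts earlier in the pack list. */
    static final Comparator<Pack> READ_ORDER =
            Comparator.comparingLong((Pack p) -> p.sizeBytes);

    /** Object reuse: flipped, so the smaller GC pack is examined last. */
    static final Comparator<Pack> REUSE_ORDER = READ_ORDER.reversed();

    /** Returns pack names sorted by the given order, leaving the input untouched. */
    static List<String> namesSorted(List<Pack> packs, Comparator<Pack> order) {
        List<Pack> copy = new ArrayList<>(packs);
        copy.sort(order);
        List<String> names = new ArrayList<>();
        for (Pack p : copy) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Sizes from the commit message: 1 GiB no-reuse GC, 8 GiB auto GC.
        List<Pack> packs = Arrays.asList(
                new Pack("auto-gc", 8L << 30),
                new Pack("no-reuse-gc", 1L << 30));

        System.out.println("read order:  " + namesSorted(packs, READ_ORDER));
        System.out.println("reuse order: " + namesSorted(packs, REUSE_ORDER));
    }
}
```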