Add config parameter gc.prunePackExpire for packfile expiration
JGit's garbage collector repacks relevant objects into new packfiles
and afterwards deletes the now obsolete packfiles. To prevent problems
caused by race conditions, JGit does not delete packfiles that are too
young. The same mechanism as for loose objects, controlled by the
config parameter gc.pruneExpire, was used.
Reusing gc.pruneExpire for packfiles can consume a lot of disk space
when gc.pruneExpire is left at its default of 2 weeks: gc was allowed
to delete a packfile only two weeks after its creation.
This change introduces a new config parameter gc.prunePackExpire with a
default of "1.hour". This parameter is used when packfiles are deleted.
Only packfiles older than the specified time can be deleted.
For loose objects the behaviour is not changed and only the old
parameter gc.pruneExpire is relevant.
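The age check described above can be sketched as follows. This is a minimal illustration, not JGit's GC code: the class and method names are hypothetical, and the one-hour constant simply mirrors the "1.hour" default named in this message.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of JGit.
final class PackExpirySketch {
    // Mirrors the "1.hour" default of gc.prunePackExpire named above.
    static final Duration DEFAULT_PRUNE_PACK_EXPIRE = Duration.ofHours(1);

    // A packfile may be deleted only if it was last modified before
    // (now - expireAge); younger packfiles are kept.
    static boolean mayDelete(Instant packLastModified, Instant now, Duration expireAge) {
        Instant cutoff = now.minus(expireAge);
        return packLastModified.isBefore(cutoff);
    }
}
```

With the default, a packfile written 30 minutes ago is kept and one written two hours ago may be deleted.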
Change-Id: I6209efb05678b15153bd22479dc13486907a44f8
ObjectDirectoryTest: Fix warnings about variable hiding
The variable and parameter named 'db' were hiding class members
with the same name.
Change-Id: I27017afdc5f49c38c6f5be494e7a21239ea601a7
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
RefDirectoryTest: Fix warning about member variable hiding
The parameter name 'totalWork' was hiding a class member variable
of the same name.
Change-Id: I646525e82900e23ffabfc756bcf5052ef873656a
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
T0003_BasicTest: Open autocloseable types in try-with-resource
FileRepository and ObjectInserter.Formatter are autocloseable, so
use try-with-resource for these.
Remove suppression of unused variable warning that is no longer
necessary.
Change-Id: I270829f0a4030083c9599eb5785b0145dc590ed8
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
ConcurrentRepackTest: Don't use deprecated WindowCache.reconfigure
Replace with calls to WindowCacheConfig.install() as mentioned in
WindowCache.reconfigure's deprecation notice.
Change-Id: Ifdb33501a2209239029c815b1e4e844ea5b56075
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
UnpackedObjectTest: Create ObjectInserter.Formatter in try-with-resource
The ObjectInserter.Formatter instance is only used to call idFor.
Factor out a utility method to do that.
Change-Id: I4ef823110c2152ac7905681df3217eb8001f5bd9
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
FileRepositoryBuilderTest: Use try-with-resource for auto-closeables
Use try-with-resource to create instances of FileRepository and
FileWriter.
"resource" and "unused" warnings no longer occur, so remove the
suppression annotations.
Change-Id: I3ad58d4cc2d4c019cd8edda7cb401e9d9f3fb790
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
Use FileRepositoryBuilder to create the Repository, except in cases
where the creation was already in a try-block. Convert those to use
a try-with-resource.
Change-Id: I7d7adeee81bda6e80d91a119c7d690de3d00dc2b
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
RefTreeDatabase: Allow ORIG_HEAD, etc. on non-bare repositories
Store these in the bootstrap layer, which uses $GIT_DIR as the
storage directory for any reference that does not contain '/'.
Change-Id: I5595bf514e4475b7c7e799c2c79446597a3abb4a
RefTreeDatabase: Expose bootstrap refs in getAdditionalRefs
By showing the bootstrap layer in getAdditionalRefs() garbage
collector code can be more RefDatabase agnostic and not care about
the special case of RefTree and RefTreeNames for the purposes of
building up the roots to GC. Instead they can combine getRefs(ALL)
and getAdditionalRefs() and have a clean set of roots.
Change-Id: I665cd2456e9316640215b6a08bc728d1356f36d8
PackWriter: Declare preparePack object sets as @NonNull
Require callers to pass in valid sets for both want and have
collections. Offer PackWriter.NONE as a handy constant for an
empty collection for the have part of preparePack instead of null.
Change-Id: Ifda4450f5e488cbfefd728382b7d30797e229217
FileRepository: Support extensions.refsBackendType = RefTree
This experimental code can be enabled in $GIT_DIR/config:
	[core]
		repositoryformatversion = 1
	[extensions]
		refsBackendType = RefTree
When these are set the repository will read references from the
RefTree rooted by the $GIT_DIR/refs/txn/committed reference.
Update debug-rebuild-ref-tree to rebuild refs/txn/committed only from
the bootstrap layer. This avoids misuse by rebuilding using packed-refs
and $GIT_DIR/refs tree.
Change-Id: Icf600e4a36b2f7867822a7ab1f1617d73c710a4b
RefTreeDatabase: Ref database using refs/txn/committed
Instead of storing references in the local filesystem rely on the
RefTree rooted at refs/txn/committed. This avoids needing to store
references in the packed-refs file by keeping all data rooted under
a single refs/txn/committed ref.
Performance to scan all references from a well packed RefTree is very
close to reading the packed-refs file from local disk.
Storing a packed RefTree is smaller due to pack file compression,
about 49.39 bytes/ref (on average) compared to packed-refs using
~65.49 bytes/ref.
Change-Id: I75caa631162dc127a780095066195cbacc746d49
Remove deprecated Tree, TreeEntry, FileTreeEntry and friends
These types were deprecated in 0.9.1 (aka 384a19eee0).
If anyone is still using them, it's time to stop.
Change-Id: I3f73347ba78c639e0c6a504812bc1a0702f829b1
A group of updates can be applied by updating the tree in one step,
writing out a new root tree, and storing its SHA-1. If references
are stored in RefTrees, comparing two repositories is a matter of
checking if two SHA-1s are identical. Without RefTrees comparing two
repositories requires listing all references and comparing the sets.
Track the "refs/" directory as a root tree by storing references
that point directly at an object as a GITLINK entry in the tree.
For example "refs/heads/master" is written as "heads/master".
Annotated tags also store their peeled value with ^{} suffix, using
"tags/v1.0" and "tags/v1.0^{}" GITLINK entries.
Symbolic references are written as SYMLINK entries with the blob of
the symlink carrying the name of the symbolic reference target.
HEAD is outside of "refs/" namespace so it is stored as a special
"..HEAD" entry. This name is chosen because ".." is not valid in
a reference name and it almost looks like "../HEAD" which names
HEAD if the reader was inside of the "refs/" directory.
A new Command type is required to handle symbolic references and
peeled references.
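The naming scheme above can be sketched as a small mapping from reference names to tree paths. The class and method names are hypothetical, and names outside "refs/" other than HEAD are deliberately left unhandled because this message only describes HEAD.

```java
// Illustrative sketch of the path mapping described above; not JGit's RefTree code.
final class RefTreePathSketch {
    private static final String R_REFS = "refs/";

    // "refs/heads/master" -> "heads/master", "HEAD" -> "..HEAD"
    static String refPath(String refName) {
        if (refName.startsWith(R_REFS)) {
            return refName.substring(R_REFS.length());
        }
        if (refName.equals("HEAD")) {
            return "..HEAD";
        }
        // Assumption: other names are out of scope for this sketch.
        throw new IllegalArgumentException("not covered by this sketch: " + refName);
    }

    // Peeled value of an annotated tag: "refs/tags/v1.0" -> "tags/v1.0^{}"
    static String peeledPath(String tagRefName) {
        return refPath(tagRefName) + "^{}";
    }
}
```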
Change-Id: Id47e5d4d32149a9e500854147edd7d93c1041a39
Hoist ObjectIdSet up to lib as part of the public API and add
the interface to some common types like PackIndex and JGit custom
ObjectId map types. This cleans up wrapper code in a number of
places by allowing direct use of the types as an ObjectIdSet.
Future commits can now rely on ObjectIdSet as a simple read-only
type to check a set of objects from a number of storage options.
Change-Id: Ib62b062421d475bd52abd6c84a73916ef36e084b
Previously, non-reuse deltas were only included in packStatistics if they
were not cached by the deltaWindow.
Change-Id: I7684d8214875f0a7569b34614f8a3ba341dbde9c
Signed-off-by: James Kolb <jkolb@google.com>
Repository: Introduce exactRef and findRef, deprecate getRef
The Repository class provides only one method to look up a ref by
name, getRef. If I request refs/heads/master and that ref does not
exist, getRef will look further in the search path:
	refs/refs/heads/master
	refs/heads/refs/heads/master
	refs/remotes/refs/heads/master
This behavior is counterintuitive, needlessly expensive, and usually
not what the caller expects.
Allow callers to specify whether to use the search path by providing
two separate methods:
- exactRef, which looks up a ref when its exact name is known
- findRef, which looks for a ref along the search path
For backward compatibility, keep getRef as a deprecated synonym for
findRef.
This change introduces findRef and exactRef but does not update
callers outside tests to use them yet.
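The difference between the two lookups can be sketched over a plain map of ref names to object ids. This is an illustration only: the class is hypothetical, the refs are modeled as strings, and the prefix list is an assumption consistent with the candidates listed above.

```java
import java.util.Map;

// Illustrative model of the two lookup styles; not the Repository API itself.
final class RefLookupSketch {
    // Assumed search-path prefixes, tried in order by findRef.
    static final String[] SEARCH_PATH = {
        "", "refs/", "refs/tags/", "refs/heads/", "refs/remotes/"
    };

    // Looks up exactly the given name and nothing else.
    static String exactRef(Map<String, String> refs, String name) {
        return refs.get(name);
    }

    // Tries each prefix in turn and returns the first ref that exists.
    static String findRef(Map<String, String> refs, String name) {
        for (String prefix : SEARCH_PATH) {
            String id = refs.get(prefix + name);
            if (id != null) {
                return id;
            }
        }
        return null;
    }
}
```

In this model, looking up "master" succeeds only through findRef, while a missing "refs/heads/topic" can unexpectedly resolve to "refs/heads/refs/heads/topic" through findRef but never through exactRef.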
Change-Id: I35375d942baeb3ded15520388f8ebb9c0cc86f8c
Signed-off-by: Jonathan Nieder <jrn@google.com>
RefDirectory.getRef: Treat fake missing symrefs like real ones
getRef() loops over its search path to find a ref:
	Ref ref = null;
	for (String prefix : SEARCH_PATH) {
		ref = readRef(prefix + needle, packed);
		if (ref != null) {
			ref = resolve(ref, 0, null, null, packed);
			break;
		}
	}
	fireRefsChanged();
	return ref;
If readRef returns null (indicating that the ref does not exist), the
loop continues so we can find the ref later in the search path. And
resolve should never return null, so if we return null it should mean
we exhausted the entire search path and didn't find the ref.
... except that resolve can return null: it does so when it has
followed too many symrefs and concluded that there is a symref loop:
	if (MAX_SYMBOLIC_REF_DEPTH <= depth)
		return null; // claim it doesn't exist
Continue the loop instead of returning null immediately. This makes
the behavior more consistent.
Arguably getRef should throw an exception when a symref loop is
detected. That would be a more invasive change, so if it's a good
idea it will have to wait for another patch.
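The changed control flow can be sketched with a simplified model in which refs are strings and a value starting with "ref: " is a symref. The class name, the depth limit of 5 and the prefix list are assumptions for illustration; unlike the real code, this model also treats a symref to a missing target as nonexistent.

```java
import java.util.Map;

// Simplified model of the search loop; not RefDirectory's implementation.
final class SymrefLoopSketch {
    static final int MAX_SYMBOLIC_REF_DEPTH = 5; // assumed value
    static final String[] SEARCH_PATH = {
        "", "refs/", "refs/tags/", "refs/heads/", "refs/remotes/"
    };

    // Follows symrefs; returns null for a missing ref or a suspected loop.
    static String resolve(Map<String, String> refs, String name, int depth) {
        if (MAX_SYMBOLIC_REF_DEPTH <= depth) {
            return null; // claim it doesn't exist
        }
        String value = refs.get(name);
        if (value == null) {
            return null;
        }
        if (value.startsWith("ref: ")) {
            return resolve(refs, value.substring("ref: ".length()), depth + 1);
        }
        return value;
    }

    // Keeps searching when resolve() reports null instead of giving up.
    static String find(Map<String, String> refs, String needle) {
        for (String prefix : SEARCH_PATH) {
            String id = resolve(refs, prefix + needle, 0);
            if (id != null) {
                return id;
            }
        }
        return null;
    }
}
```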
Change-Id: Icb1c7fafd4f1e34c9b43538e27ab5bbc17ad9eef
Signed-off-by: Jonathan Nieder <jrn@google.com>
RefDirectory.exactRef: Do not ignore symrefs to unborn branch
When asked to read a symref pointing to a branch-yet-to-be-born (such
as HEAD in a newly initialized repository), DfsRepository and
FileRepository return different results.
  FileRepository:
	exactRef("HEAD") => null
  DfsRepository:
	exactRef("HEAD") => SymbolicRef[HEAD -> refs/heads/master=00000000]
getRef("HEAD") returns the same as DfsRepository's exactRef in both
backends.
The intended behavior is the DfsRepository one: exactRef() is supposed
to be like getRef(), but more exact because it doesn't need to
traverse the search path.
The discrepancy is because DfsRefDatabase implements exactRef()
directly with the intended semantics, while RefDirectory uses a
fallback implementation built on top of getRefs(). getRefs() skips
symrefs to an unborn branch.
Override the fallback implementation with a correct implementation
that is similar to getRef() to avoid this. A followup change will fix
the fallback.
Change-Id: Ic138a5564a099ebf32248d86b93e2de9ab3c94ee
Reported-by: David Pursehouse <david.pursehouse@sonymobile.com>
Improved-by: Christian Halstrick <christian.halstrick@sap.com>
Bug: 478865
Insert duplicate objects to prevent race during garbage collection.
Prior to this change, DfsInserter would not insert an object into a pack
if it already existed in another pack in the repository, even if that
pack was unreachable. Consider this sequence of events:
- Object FOO is pushed to a repository.
- Subsequent ref changes make FOO UNREACHABLE_GARBAGE.
- FOO is subsequently re-inserted using a DfsInserter, but skipped
due to existing in UNREACHABLE_GARBAGE.
- The repository is repacked; FOO will not be written into a new pack
because it is not yet reachable from a reference. If the
UNREACHABLE_GARBAGE packs are deleted, FOO disappears.
- A reference is updated to reference FOO. This reference is now broken
as FOO was removed when the repacking process deleted the
UNREACHABLE_GARBAGE pack that stored the only copy of FOO.
The garbage collector can't safely delete the UNREACHABLE_GARBAGE
pack because FOO might be in the middle of being re-inserted/re-packed.
This change writes a duplicate copy of an object if it only exists in
UNREACHABLE_GARBAGE. This "freshens" the object to give it a chance to
survive long enough to be made reachable through a reference.
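The insertion rule can be sketched as a decision over the kinds of packs that already hold the object. The class is hypothetical and the enum is only an illustrative subset of pack sources; it is not the DfsInserter code.

```java
import java.util.Collection;

// Illustrative decision rule for "freshening"; not DfsInserter itself.
final class FreshenSketch {
    // Assumed, simplified set of pack sources for this sketch.
    enum PackSource { INSERT, RECEIVE, COMPACT, GC, UNREACHABLE_GARBAGE }

    // Returns true if the inserter should write the object into its new pack:
    // either no pack holds it yet, or only UNREACHABLE_GARBAGE packs do.
    static boolean shouldWrite(Collection<PackSource> packsHoldingObject) {
        for (PackSource source : packsHoldingObject) {
            if (source != PackSource.UNREACHABLE_GARBAGE) {
                return false; // a non-garbage pack already stores it
            }
        }
        return true;
    }
}
```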
Change-Id: I20f2062230f3af3bccd6f21d3b7342f1152a5532
Signed-off-by: Mike Williams <miwilliams@google.com>
Bitmap generation: Add a test of ordering commits by "chains"
When commits are selected for bitmap generation, they are reordered
so that related "chains" of commits are grouped together. Chains are
"subbranches" of commits that may branch off of and re-merge with the
main line. Grouping by chains means that the XOR difference between
consecutive selected commits will be smaller, resulting in better
run-length compression of the XORed bitmaps.
Add a new testSelectionOrderingWithChains() test in a new
GcCommitSelectionTest test class. Also move related GC commit selection
tests out of GcBasicPackingTest and into GcCommitSelectionTest.
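The XOR argument can be illustrated with java.util.BitSet. The bit ranges below are made-up stand-ins for "objects reachable from a commit", and the class is hypothetical; the point is only that keeping related commits adjacent lowers the total number of differing bits.

```java
import java.util.BitSet;

// Toy illustration of why grouping related commits helps XOR compression.
final class XorChainSketch {
    // Builds a bitmap with bits [from, to) set.
    static BitSet range(int from, int to) {
        BitSet bits = new BitSet();
        bits.set(from, to);
        return bits;
    }

    // Number of bits that differ between two bitmaps.
    static int xorBits(BitSet a, BitSet b) {
        BitSet x = (BitSet) a.clone();
        x.xor(b);
        return x.cardinality();
    }

    // Sum of XOR differences between consecutive bitmaps in the given order.
    static int totalXor(BitSet... order) {
        int total = 0;
        for (int i = 1; i < order.length; i++) {
            total += xorBits(order[i - 1], order[i]);
        }
        return total;
    }
}
```

For example, with two main-line bitmaps covering bits 0-99 and 0-104 and a side-branch bitmap covering bits 0-49 plus 200-229, ordering the two main-line bitmaps next to each other gives a smaller total than interleaving the side branch between them.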
Change-Id: I8e80cac29c4ca8193b41c9898e5436c22a659f11
Signed-off-by: Terry Parker <tparker@google.com>
Expose the following bitmap selection parameters via PackConfig:
"bitmapContiguousCommitCount", "bitmapRecentCommitCount",
"bitmapRecentCommitSpan", "bitmapDistantCommitSpan",
"bitmapExcessiveBranchCount", and "bitmapInactiveBranchAge".
The value of bitmapContiguousCommitCount, whereby bitmaps are
created for the most recent N commits in a branch, has never
been verified. If experiments show that they are not valuable,
then we can simplify the implementation so that there is only
a concept of recent and distant commit history (defined by
"bitmapRecentCommitCount"), and the only controls we need are
"bitmapRecentCommitSpan" and "bitmapDistantCommitSpan".
Change-Id: I288bf3f97d6fbfdfcd5dde2699eff433a7307fb9
Signed-off-by: Terry Parker <tparker@google.com>
Update bitmap selection throttling to fully span active branches.
Replace the “bitmapCommitRange” parameter that was recently introduced
with two new parameters: “bitmapExcessiveBranchCount” and
“bitmapInactiveBranchAgeInDays”. If the count of branches does not
exceed “bitmapExcessiveBranchCount”, then the current algorithm is kept
for all branches.
If the branch count is excessive, then the commit time for the tip
commit for each branch is used to determine if a branch is “inactive”.
"Active" branches get full commit selection using the existing
algorithm. "Inactive" branches get fewer bitmaps near the branch tips.
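The branch classification described above can be sketched as follows. The class, method and parameter names are hypothetical and the time handling (seconds since the epoch) is an assumption for illustration.

```java
// Illustrative classification of branches; not PackWriterBitmapPreparer.
final class BranchActivitySketch {
    private static final long SECONDS_PER_DAY = 24L * 60L * 60L;

    // A branch is treated as inactive only when the repository has more
    // branches than excessiveBranchCount AND its tip commit is older than
    // inactiveBranchAgeInDays.
    static boolean isInactive(int branchCount, int excessiveBranchCount,
            long tipCommitTimeSecs, long nowSecs, int inactiveBranchAgeInDays) {
        if (branchCount <= excessiveBranchCount) {
            return false; // branch count not excessive: keep current algorithm
        }
        long cutoffSecs = nowSecs - inactiveBranchAgeInDays * SECONDS_PER_DAY;
        return tipCommitTimeSecs < cutoffSecs;
    }
}
```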
Introduce a "contiguousCommitCount" parameter that always enforces that
the N most recent commits in a branch are selected for bitmaps. The
previous nextSelectionDistance() algorithm created anywhere from 1-100
contiguous bitmaps at branch tips.
For example, consider a branch with commits numbering 0-300, with 0
being the most recent commit. If the most recent 200 commits are not
merge commits and the 200th commit was the last one selected,
nextSelectionDistance() returned 100, causing commits 200-101 to be
ignored. Then a window of size 100 was evaluated, searching for merge
commits. Since no merge commits were found, the next commit (commit 0)
was selected, for a total of 1 commit in the topmost 100 commits.
If instead the 250th commit was selected, then by the same logic
commit 50 is selected. At that point nextSelectionDistance() switches to
selecting consecutive commits, so commits 0-50 in the topmost 100
commits are selected. The "contiguousCommitCount" parameter provides
more determinism by always selecting a constant number of topmost
commits.
Add an optimization to break out of the inner loop of selectCommits() if
all of the commits for the current branch have already been found.
When reusing bitmaps from an existing pack, remove unnecessary
populating and clearing of the writeBitmaps/PackBitmapIndexBuilder.
Add comments to PackWriterBitmapPreparer, rename methods and variables
for readability.
Add tests for bitmap selection with and without merge commits and with
excessive branch pruning triggered.
Note: I will follow up with an additional change that exposes the new
parameters through PackConfig.
Change-Id: I5ccbb96c8849f331c302d9f7840e05f9650c4608
Signed-off-by: Terry Parker <tparker@google.com>
Test stability: add fsTick() to avoid random testPruneNone() failures
At least on Windows the test failed on every second run at the last
assert. Adding a short delay before gc.prune() makes the test stable again.
Change-Id: I23d98dd565912c58dcf2f24f3ebc24824670cff3
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Limit the range of commits for which bitmaps are created.
A bitmap index contains bitmaps for a set of commits in a pack file.
Creating a bitmap for every commit is too expensive, so heuristics
select the most "important" commits. The most recent commits are the
most valuable. To clone a repository only those for the branch tips are
needed. When fetching, only commits since the last fetch are needed.
The commit selection heuristics generally work, but for some
repositories the number of selected commits is prohibitively high. One
example is the MSM 3.10 Linux kernel. With over 1 million commits on
2820 branches, the current heuristics resulted in +36k selected commits.
Each uncompressed bitmap for that repository is ~413k, making it
difficult to complete a GC operation in available memory.
The benefit of creating bitmaps over the entire history of a repository
like the MSM 3.10 Linux kernel isn't clear. For that repository, most
history for the last year appears to be in the last 100k commits.
Limiting bitmap commit selection to just those commits reduces the count
of selected commits from ~36k to ~10.5k. Dropping bitmaps for older
commits does not affect object counting times for clones or for fetches
on clients that are reasonably up-to-date.
This patch defines a new "bitmapCommitRange" PackConfig parameter to
limit the commit selection process when building bitmaps. The range
starts with the most recent commit and walks backwards. A range of 10k
considers only the 10000 most recent commits. A range of zero creates
bitmaps only for branch tips. A range of -1 (the default) does not limit
the range--all commits in the pack are used in the commit selection
process.
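The three cases of the range parameter can be sketched as a small function. The class and method names are hypothetical; the function only counts how many commits beyond the branch tips enter the selection process.

```java
// Illustrative semantics of the "bitmapCommitRange" values described above.
final class CommitRangeSketch {
    // Returns how many of the most recent commits are considered for
    // bitmap selection (branch tips are handled separately).
    static int commitsConsidered(int commitsInPack, int bitmapCommitRange) {
        if (bitmapCommitRange < 0) {
            return commitsInPack; // -1 (the default): no limit
        }
        // 0 means branch tips only; N means at most the N most recent commits.
        return Math.min(commitsInPack, bitmapCommitRange);
    }
}
```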
Change-Id: Ied92c70cfa0778facc670e0f14a0980bed5e3bfb
Signed-off-by: Terry Parker <tparker@google.com>
For loose objects an expiration date can be set which will save too
young objects from being deleted. Add the same for packfiles. Packfiles
which are too young are not deleted.
Bug: 468024
Change-Id: I3956411d19b47aaadc215dab360d57fa6c24635e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
exactRef(ref1, ref2, ref3) requests multiple specific refs in a single
lookup, which may be faster in some backends than looking them up one by
one.
firstExactRef generalizes getRef by finding the first existing ref from
the list of refs named. Its main purpose is for the default
implementation of getRef (finding the first existing ref in a search
path). Hopefully it can be useful for other operations that look for
refs in a search path (e.g., git log --notes=<name>), too.
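A rough model of the two methods, with refs held in a plain map. The class is hypothetical and the return types are simplified to strings; a real backend could answer the batched lookup in a single round trip.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of the batched and first-match lookups.
final class FirstExactRefSketch {
    // Returns every requested ref that exists, keyed by name.
    static Map<String, String> exactRef(Map<String, String> refs, String... names) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String name : names) {
            String id = refs.get(name);
            if (id != null) {
                found.put(name, id);
            }
        }
        return found;
    }

    // Returns the name of the first requested ref that exists, or null.
    static String firstExactRef(Map<String, String> refs, String... names) {
        for (String name : names) {
            if (refs.containsKey(name)) {
                return name;
            }
        }
        return null;
    }
}
```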
Change-Id: I5c6fcf1d3920f6968b8b97f3d4c3a267258c4b86
Signed-off-by: Jonathan Nieder <jrn@google.com>
Introduce exactRef to read a ref whose exact name is known
Unlike getRef(name), the new exactRef method does not walk the search
path. This should produce a less confusing result than getRef when the
exact ref name is known: it will not try to resolve refs/foo/bar to
refs/heads/refs/foo/bar even when refs/foo/bar does not exist.
It can be faster than both getRefs(ALL).get(name) and getRef(name)
because it only needs to examine a single ref.
A follow-up change will introduce a findRef synonym to getRef and
deprecate getRef to make the choice a caller is making more obvious
(exactRef or findRef, with the same semantics as getRefs(ALL).get and
getRefs(ALL).findRef).
Change-Id: If1bd09bcfc9919e7976a4d77f13184ea58dcda52
Signed-off-by: Jonathan Nieder <jrn@google.com>
ObjectReader's release method was replaced by a close method, but
WindowCursor was still implementing release.
To prevent the same mistake again, make ObjectReader's close method
abstract to force subclasses to implement it.
Change-Id: I50d0d1d19a26e306fd0dba77b246a95a44fd6584
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
The LRU chain management code was broken leading to situations where
the chain was incomplete. This prevented the cache from removing
items when it exceeded its memory target, causing a leak.
One case was a repeated hit on the head of the chain: moveToHead(e)
was invoked, linking the head back to itself in a cycle and orphaning
the rest of the table.
Add some unit tests to cover this and a few other paths.
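The failure mode can be illustrated with a minimal doubly linked LRU chain. This is a sketch with hypothetical names, not the cache's actual code; it shows one way to keep a repeated hit on the head from relinking the head to itself.

```java
// Minimal LRU chain; illustrates guarding moveToHead() against e == head.
final class LruChainSketch {
    static final class Entry {
        final String key;
        Entry prev;
        Entry next;

        Entry(String key) {
            this.key = key;
        }
    }

    Entry head;
    Entry tail;

    // Links a detached entry in front of the current head.
    void addFirst(Entry e) {
        e.prev = null;
        e.next = head;
        if (head != null) {
            head.prev = e;
        } else {
            tail = e;
        }
        head = e;
    }

    // Moves an entry that is already in the chain to the front.
    void moveToHead(Entry e) {
        if (e == head) {
            return; // already first; relinking here would create a cycle
        }
        e.prev.next = e.next; // e is not the head, so e.prev is non-null
        if (e.next != null) {
            e.next.prev = e.prev;
        } else {
            tail = e.prev;
        }
        addFirst(e);
    }

    // Number of entries reachable from the head.
    int length() {
        int n = 0;
        for (Entry e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }
}
```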
Change-Id: Ib27486eaa1b1d2bf1c745a56d0a5832bfb029322
JGit style is to import exactly the classes required, and never
to use "import foo.*" as the foo package could add new classes
in the future which are conflicting/confusing with the imports
already used by a source file.
Change-Id: I5693408c777e5843ec65fff1163d5d717849fa34
When writing new packs it should be allowed to specify objects as "have"
(objects which should not be included in the pack) which do not exist in
the local repository.
This works with the traditional PackWriter, but when PackWriter was
working on a repository with bitmap indexes and used
PackWriterBitmapWalker, this feature was broken: non-existing "have"
objects led to MissingObjectExceptions. That broke push and Gerrit
replication. When the replication target had branches unknown to the
replication source, the source repository tried to build pack files
where "have" included branch tips which were unknown in the source
repository.
Bug: 427107
Change-Id: I6b6598a1ec49af68aa77ea6f1f06e827982ea4ac
Also-by: Matthias Sohn <matthias.sohn@sap.com>
JGit should offer the possibility to do a garbage collection in
"aggressive" mode. In this mode garbage collection more aggressively
optimizes the repository at the expense of taking much more time.
Technically an aggressive mode garbage collection differs from a
non-aggressive one by:
- not reusing packed objects found in old packs. Recompress every object
- the configuration pack.window is set to 250 (the default is 10)
- the configuration pack.depth is set to 250 (the default is 50)
The associated classes in org.eclipse.jgit.api and the command line
command in org.eclipse.jgit.pgm expose this new option.
The configuration parameters gc.aggressiveDepth and gc.aggressiveWindow
have been introduced to configure this feature.
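The differences listed above can be summarized in a small settings sketch. The class and field names are hypothetical; the numbers are the ones given in this message.

```java
// Illustrative summary of the pack settings chosen per GC mode.
final class AggressiveGcSketch {
    final boolean reuseObjects;
    final int window;
    final int depth;

    private AggressiveGcSketch(boolean reuseObjects, int window, int depth) {
        this.reuseObjects = reuseObjects;
        this.window = window;
        this.depth = depth;
    }

    // aggressiveWindow and aggressiveDepth stand in for the values of
    // gc.aggressiveWindow and gc.aggressiveDepth (250 per this message).
    static AggressiveGcSketch forMode(boolean aggressive, int aggressiveWindow,
            int aggressiveDepth) {
        if (aggressive) {
            // Do not reuse packed objects; recompress with larger limits.
            return new AggressiveGcSketch(false, aggressiveWindow, aggressiveDepth);
        }
        // Non-aggressive defaults named above: window 10, depth 50.
        return new AggressiveGcSketch(true, 10, 50);
    }
}
```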
Bug: 444332
Change-Id: I024101f2810acf6be13ce144c9893d98f5c4ae76
Revert "Add a method to DfsOutputStream to read as an InputStream"
This reverts commit b646578d89.
openInputStream() is never used in JGit, nor is it used by any
known working DFS implementation. The method was added as a
utility for reading back from a DfsInserter, but the final
implementation of that feature does not require this method.
Change-Id: I075ad95e40af49c92b554480f8993ef5658f7684
Add a method to ObjectInserter to read back inserted objects
In the DFS implementation, flushing an inserter writes a new pack to
the storage system and is potentially very slow, but was the only way
to ensure previously-inserted objects were available. For some tasks,
like performing a series of three-way merges, the total size of all
inserted objects may be small enough to avoid flushing the in-memory
buffered data.
DfsOutputStream already provides a read method to read back from the
not-yet-flushed data, so use this to provide an ObjectReader in the
DFS case.
In the file-backed case, objects are written out loosely on the fly,
so the implementation can just return the existing WindowCursor.
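The idea of reading back without flushing can be sketched with an in-memory buffer. This is a toy model with hypothetical names, not the ObjectInserter API: object ids are plain strings and "storage" is a second map.

```java
import java.util.HashMap;
import java.util.Map;

// Toy inserter that can read back objects before they are flushed.
final class BufferingInserterSketch {
    private final Map<String, byte[]> pending = new HashMap<>();
    private final Map<String, byte[]> stored = new HashMap<>();
    int flushCount; // counts the (potentially slow) flush operations

    // Buffers the object in memory only.
    void insert(String id, byte[] data) {
        pending.put(id, data.clone());
    }

    // Reads from the not-yet-flushed buffer first, then from storage.
    byte[] read(String id) {
        byte[] data = pending.get(id);
        return data != null ? data : stored.get(id);
    }

    // Writes all buffered objects to storage in one step.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        stored.putAll(pending);
        pending.clear();
        flushCount++;
    }
}
```

A series of small inserts followed by reads then needs no flush until the caller actually wants the objects persisted.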
Change-Id: I454fdfb88f4d215e31b7da2b2a069853b197b3dd