Don't create the stream eagerly in lock(); that may cause JGit to
exceed OS or JVM limits on open file descriptors if many locks need
to be created, for instance when creating many refs. Instead create
the output stream only when one really needs to write something.
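A minimal sketch of the idea, not JGit's actual LockFile: the class name
LazyLock and its methods are hypothetical. Taking the lock only creates the
file; the output stream (and its file descriptor) is opened on the first
write.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical lock holder used for illustration only. */
final class LazyLock implements AutoCloseable {
    private final Path lockPath;
    private OutputStream out; // stays null until the first write

    LazyLock(Path lockPath) throws IOException {
        this.lockPath = lockPath;
        // Creating the file takes the lock; it fails if the file already
        // exists. No descriptor remains open afterwards.
        Files.createFile(lockPath);
    }

    boolean isStreamOpen() {
        return out != null;
    }

    void write(byte[] content) throws IOException {
        if (out == null) {
            // Open the stream only when something is really written.
            out = Files.newOutputStream(lockPath);
        }
        out.write(content);
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}
```

With this shape, holding many locks that are never written costs no open
descriptors.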
Bug: 573328
Change-Id: If9441ed40494d46f594a896d34a5c4f56f91ebf4
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Add new constructors to PackFile to improve a common use case where
callers know the directory, id, and extension, but previously needed to
construct a valid file name (with prefix, '.', etc) to create a
PackFile. Most callers can use the variant that has id as an ObjectId,
but provide an id as String variant too.
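A minimal sketch of what such a constructor saves callers from doing by
hand. The helper PackNames.packFile is hypothetical and only assumes the
"pack-<id>.<ext>" naming implied above; it is not the PackFile API.

```java
import java.io.File;

/** Hypothetical helper; shows the name assembly callers used to repeat. */
final class PackNames {
    static File packFile(File directory, String id, String ext) {
        // prefix + id + '.' + extension
        return new File(directory, "pack-" + id + "." + ext);
    }
}
```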
Change-Id: I39e4466abe8c9509f5916d5bfe675066570b8585
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
Restore preserved packs during missing object seeks
Provide a recovery path for objects being referenced during the pack
pruning race. Due to the pack pruning race, it is possible for objects
to become referenced after a pack has been deemed safe to prune, but
before it actually gets pruned. If this happened previously, the newly
referenced objects would be missing and potentially result in a
corrupted ref.
Add the ability to recover from this situation when an object is missing
but happens to still be available in a pack in the "preserved"
directory. This is likely only useful when used in conjunction with the
--preserve-old-packs GC option, which prunes packs by hard-linking to
the preserved directory. If an object is missing and found in a pack in
the preserved directory, immediately recover that pack and its
associated files (idx, bitmaps...) by moving them back to the original
pack directory, and then retry the operation that would have failed due
to the missing object. This retry can now succeed and the repository
may avoid corruption. This approach should drastically reduce the
chance of a corrupt repository during pack pruning at very little extra
cost. This extra cost should only be incurred when objects are missing
and a failure would normally occur.
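The recovery path can be sketched roughly as follows. This is an
illustration under assumptions, not the JGit implementation: the class and
method names are hypothetical, preserved files are assumed to keep the same
base name as the original pack, and NoSuchElementException stands in for a
missing-object failure.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;

/** Hypothetical sketch of "restore, then retry". */
final class PreservedPackRecovery {
    /** Moves the pack and its associated files back; returns how many moved. */
    static int restore(Path preservedDir, Path packDir, String packBaseName)
            throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> entries =
                Files.newDirectoryStream(preservedDir, packBaseName + ".*")) {
            for (Path entry : entries) {
                found.add(entry);
            }
        }
        for (Path entry : found) {
            Files.move(entry, packDir.resolve(entry.getFileName()));
        }
        return found.size();
    }

    /** Runs the lookup; on a miss, restores the pack and retries once. */
    static <T> T retryAfterRestore(Callable<T> lookup, Path preservedDir,
            Path packDir, String packBaseName) throws Exception {
        try {
            return lookup.call();
        } catch (NoSuchElementException missing) {
            if (restore(preservedDir, packDir, packBaseName) == 0) {
                throw missing; // nothing to recover, fail as before
            }
            return lookup.call();
        }
    }
}
```

The extra work happens only in the catch branch, which matches the claim
that the cost is incurred only when a failure would otherwise occur.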
Change-Id: I2a704e3276b88cc892159d9bfe2455c6eec64252
Signed-off-by: Martin Fick <quic_mfick@quicinc.com>
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
Pack: Replace extensions bitset with bitmapIdx PackFile
The only extension that was ever consulted from the bitset was the
bitmap index. We can simplify the Pack code as well as the code of
all the callers if we focus on just that usage.
Change-Id: I799ddfdee93142af67ce5081d14a430d36aa4c15
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
The PackFile class is intended to be a central place to do all
common pack filename manipulation and parsing to help reduce repeated
code and bugs. Use the PackFile class in the Pack class and in many
tests to ensure it works well in a variety of situations. Later changes
will expand use of PackFiles to even more areas.
Change-Id: I921b30f865759162bae46ddd2c6d669de06add4a
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
A cookie file stores the expiration in seconds since the Unix epoch,
not in milliseconds. Correct the reading and writing of cookie files,
with a backwards-compatibility hack to read files that contain a
millisecond timestamp.
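One way such a compatibility check could look is sketched below. The
cut-off is an assumption made for illustration (values beyond the year 9999
in seconds are treated as milliseconds); the class name is hypothetical and
the heuristic actually used by JGit may differ.

```java
import java.time.Instant;

/** Hypothetical sketch of reading and writing a cookie expiration. */
final class CookieExpiry {
    // Assumed cut-off: the last second of the year 9999, in epoch seconds.
    private static final long MAX_PLAUSIBLE_SECONDS =
            Instant.parse("9999-12-31T23:59:59Z").getEpochSecond();

    /** Reads a stored value, tolerating legacy millisecond timestamps. */
    static Instant parse(long stored) {
        if (stored > MAX_PLAUSIBLE_SECONDS) {
            return Instant.ofEpochMilli(stored); // legacy file
        }
        return Instant.ofEpochSecond(stored);
    }

    /** Writes the expiration as seconds since the epoch. */
    static long format(Instant expiry) {
        return expiry.getEpochSecond();
    }
}
```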
Add a test, and fix tests not to rely on the actual current time so
that they will also run successfully after 2030-01-01 noon.
Bug: 571574
Change-Id: If3ba68391e574520701cdee119544eedc42a1ff2
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Pack better represents the purpose of the object and paves the way to
add a PackFile object that extends File.
Change-Id: I39b4f697902d395e9b6df5e8ce53078ce72fcea3
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
Move reachability checker generation into the ObjectReader object
Reachability checkers are retrieved from RevWalk and ObjectWalk objects:
* RevWalk.createReachabilityChecker()
* ObjectWalk.createObjectReachabilityChecker()
Since RevWalks and ObjectWalks are themselves directly instantiated
in hundreds of places (e.g. UploadPack...) overriding them in a
consistent way requires overloading 100s of methods, which isn't
feasible. Moving reachability checker generation to a more central
place solves that problem.
The ObjectReader object seems a good place from which to get
reachability checkers, because reachability checkers return
information about relationships between objects. ObjectDatabases
delegate many operations to ObjectReaders, and reachability bitmaps
are attached to ObjectReaders.
The Bitmapped and Pedestrian reachability checker objects were
package private in the org.eclipse.jgit.revwalk package. This change
makes them public and moves them to the
org.eclipse.jgit.internal.revwalk package. Corresponding tests are
also moved.
Motivation:
1) Reachability checking algorithms need to scale. One of the
internal Android repositories has ~2.4 million refs/changes/*
references, causing bad long tail performance in reachability
checks.
2) Reachability check performance is impacted by repository
topology: number of refs, number of objects, amounts of
related vs. unrelated history.
3) Reachability check performance is also affected by per-branch
access (Gerrit branch permissions) since different users can
see different branches.
4) Reachability check performance isn't affected by any state in a
RevWalk or ObjectWalk.
I don't yet know if a single algorithm will work for all cases in #2
and #3. We may need to evolve the ReachabilityChecker interfaces
over time to solve the Gerrit branch permissions case, or use
Gerrit-specific identity information to solve that in an efficient
way.
This change takes the existing public API and moves it to the
ObjectReader/whole repository level, which is where we can do
consistent customizations for #2 and #3. We intend to upstream the
best of whatever works, but anticipate the need for multiple rounds
of experimentation.
Change-Id: I9185feff43551fb387957c436112d5250486833d
Signed-off-by: Terry Parker <tparker@google.com>
Add getsRefsByPrefixWithSkips (excluding prefixes) to ReftableDatabase
We sometimes want to get all the refs except specific prefixes,
similarly to getRefsByPrefix that gets all the refs of a specific
prefix.
We now create a new method that gets all refs matching a prefix except a
set of specific prefixes.
One use case is for Gerrit to be able to get all the refs except
refs/changes; in Gerrit we often have lots of refs/changes, but very
few other refs. Currently, to get all the refs except refs/changes we
need to get all the refs and then filter out the refs/changes, which is
very inefficient. With this method, we can simply skip the unneeded
prefix so that we don't have to go over all the elements.
RefDirectory still uses the inefficient implementation, since there
isn't a simple way to use RefCursor to achieve the efficient
implementation (as done in ReftableDatabase).
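The skipping idea can be sketched over any sorted collection of ref names.
This is an illustration only: the class and method names are hypothetical,
a NavigableSet stands in for the reftable, and it assumes no ref name
contains the character U+FFFF directly after an excluded prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;

/** Hypothetical sketch: prefix scan that seeks past excluded prefixes. */
final class PrefixSkipScan {
    static List<String> refsByPrefixWithSkips(NavigableSet<String> sortedRefs,
            String include, Set<String> exclude) {
        List<String> result = new ArrayList<>();
        String cur = sortedRefs.ceiling(include);
        while (cur != null && cur.startsWith(include)) {
            String hit = null;
            for (String ex : exclude) {
                if (cur.startsWith(ex)) {
                    hit = ex;
                    break;
                }
            }
            if (hit == null) {
                result.add(cur);
                cur = sortedRefs.higher(cur);
            } else {
                // Jump to the first name sorting after everything that
                // starts with the excluded prefix, instead of visiting
                // each excluded ref.
                String next = sortedRefs.ceiling(hit + Character.MAX_VALUE);
                cur = (next != null && next.compareTo(cur) > 0)
                        ? next
                        : sortedRefs.higher(cur);
            }
        }
        return result;
    }
}
```

The number of seeks grows with the number of excluded prefixes that occur,
not with the number of refs under them.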
Signed-off-by: Gal Paikin <paiking@google.com>
Change-Id: I8c5db581acdeb6698e3d3a2abde8da32f70c854c
This method will be used by the follow-up change. This is useful if we
want to go over all the refs after a specific ref.
For example, the new method allows us to create a follow-up that would
go over all the refs until we reach a specific ref (e.g. refs/changes/),
and then we use seekPastPrefix(refs/changes/) to read the rest of the refs,
thus basically we return all refs except a specific prefix.
When seeking past a prefix, the previous condition that created the
RefCursor still applies. E.g., if the cursor was created by
seekRefsWithPrefix, we can skip some refs but we will not return refs
that do not start with this prefix.
Signed-off-by: Gal Paikin <paiking@google.com>
Change-Id: I2c02e89c877fe90da8619cb8a4a9a0c865f238ef
[spotbugs] Fix potential NPE in PackFileSnapshotTest
Path#getFileName can return null. Fix the warning by asserting the file
name isn't null.
Change-Id: I7f2fe75b46113d8be1d14e3f18dd77da27df25ed
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
[spotbugs] Fix potential NPEs in FileReftableStackTest
File#listFiles can return null. Use Files#list instead to fix the
problem.
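A short sketch of the difference, with a hypothetical helper name:
File#listFiles signals failure by returning null, while Files#list throws
an IOException, so there is no null to dereference.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper listing the entry names of a directory. */
final class DirListing {
    static List<String> fileNames(Path dir) throws IOException {
        // Files.list throws if dir is missing or not a directory; the
        // stream must be closed to release the directory handle.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```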
Change-Id: I74e0b49aa6dae370219507c64aa43be4d8aa7b82
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
[spotbugs] Fix potential NPE in GcPruneNonReferencedTest
File#listFiles can return null. Assert that it is not null to fix the
warning.
Change-Id: I28fc668fee760d39965e6e039003ac9f85fd461b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This was experimental code and never used in production.
Change-Id: Ia3da7f2b82d9e365cec2ccf9397cbc47439cd150
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
The problem was caused by the following race.
All mentioned files are located in "objects/pack/".
File endings relevant in "pack" dir:
.pack
.keep
.idx
.bitmap
When ReceivePack receives a pack file it executes the following steps:
ReceivePack.service():
  receivePackAndCheckConnectivity():
    receivePack():
      receive the pack
      parse the pack, returns packLock (.keep file)
        PackInserter.flush():
          write tmpPack file: "insert_<random>.pack"
          write tmpIdx file:  "insert_<random>.idx"
          real pack name:  "pack-<SHA1>.pack"
          real index name: "pack-<SHA1>.idx"
          atomic rename tmpPack to realPack
          atomic rename tmpIdx to realIdx
  execute commands
  unlock pack by removing .keep file
  trigger auto gc if enabled
When PackInserter.flush() renames the temporary pack to the final
"pack-xxx.pack" file the temporary pack index file "insert_xxx.idx"
has no matching .pack file with the same base name for a short interval.
If deleteOrphans() ran during that interval it deduced the pack index
file was orphaned. Subsequently the missing pack index caused
MissingObjectExceptions since objects contained in the pack couldn't be
looked up anymore.
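The guarded orphan scan can be sketched as a pure function over the file
names in the pack directory. This is a simplified illustration, not the
code of deleteOrphans(): the class name is hypothetical and it assumes a
.keep file guards the files sharing its base name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: find index/bitmap files that are safe to delete. */
final class OrphanScan {
    private static String base(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? name : name.substring(0, dot);
    }

    static List<String> orphans(Collection<String> names) {
        // A base name is guarded if a pack or a pack lock exists for it.
        Set<String> guarded = new HashSet<>();
        for (String n : names) {
            if (n.endsWith(".pack") || n.endsWith(".keep")) {
                guarded.add(base(n));
            }
        }
        List<String> result = new ArrayList<>();
        for (String n : names) {
            boolean candidate = n.endsWith(".idx") || n.endsWith(".bitmap");
            if (candidate && !guarded.contains(base(n))) {
                result.add(n);
            }
        }
        return result;
    }
}
```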
Bug: https://bugs.chromium.org/p/gerrit/issues/detail?id=13544
Change-Id: I559c81e4b1d7c487f92a751bd78b987d32c98719
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix IOException occurring when calling
GC on a repository with absent objects/pack folder.
Change-Id: I5be1333a0726f4d7491afd25ddac85451686c30a
Signed-off-by: Nail Samatov <sanail@yandex.ru>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
DfsBundleWriter writes out the entire repository to a Git bundle file.
It packs all objects included in the packfile by concatenating all pack
files. This makes the bundle creation fast and cheap. Useful for backing
up a repository as-is.
Change-Id: Iee20e4b1ab45b2a178dde8c72093c0dd83f04805
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Fix possible NegativeArraySizeException in PackIndexV1
Due to an integer overflow bug, the current "Index file is too large
for jgit" check did not work properly and subsequently a
NegativeArraySizeException was raised.
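The following sketch shows one way an int overflow can defeat such a size
check. It is an illustration, not the PackIndexV1 code: the class name and
the 24-byte record size (a 4-byte offset plus a 20-byte object id) are
assumptions.

```java
/** Hypothetical sketch of an overflow-prone and an overflow-safe check. */
final class IndexSizeCheck {
    static final int RECORD_BYTES = 24; // assumed: 4-byte offset + 20-byte id

    /** Broken: the int product wraps around before the comparison. */
    static boolean brokenTooLarge(int entryCount) {
        return entryCount * RECORD_BYTES > Integer.MAX_VALUE - 8;
    }

    /** Fixed: the product is computed in 64-bit arithmetic. */
    static boolean tooLarge(long entryCount) {
        return entryCount * RECORD_BYTES > Integer.MAX_VALUE - 8;
    }
}
```

For 100 million entries the int product is negative, so the broken check
passes and a later array allocation with that size would fail.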
Change-Id: I2736efb28987c29e56bc946563b7fa781898a94a
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
MergedReftable: Include the last reftable in determining minUpdateIndex
MergedReftable ignores the last reftable in the stack while calculating the
minUpdateIndex.
Update the loop indices to include all reftables in the minUpdateIndex
calculation, while skipping position 0 as it is read outside the loop.
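A minimal sketch of the corrected loop bounds. The class and method names
are hypothetical, and a plain array of per-table minimum update indices
stands in for the reftable stack.

```java
/** Hypothetical sketch: minimum update index over a stack of tables. */
final class UpdateIndexRange {
    static long minUpdateIndex(long[] tableMins) {
        if (tableMins.length == 0) {
            return 0;
        }
        long min = tableMins[0]; // position 0 is read outside the loop
        // Start at 1 and run up to and including the last table.
        for (int i = 1; i < tableMins.length; i++) {
            min = Math.min(min, tableMins[i]);
        }
        return min;
    }
}
```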
Change-Id: I12d3e714581e93d178be79c02408a67ab2bd838e
Signed-off-by: Minh Thai <mthai@google.com>
PackBitmapIndex: Move BitmapCommit to a top-level class
Move BitmapCommit from inside the PackWriterBitmapPreparer to a new
top-level class in preparation for improving the memory footprint of GC's
bitmap generation phase.
Change-Id: I4d404a5b3a34998b441d23105197f33d32d39670
Signed-off-by: Yunjie Li <yunjieli@google.com>
Introduce an IterativeConnectivityChecker which runs a connectivity
check with a filtered set of references, and falls back to using the
full set of advertised references.
It uses the following references during the first check attempt:
- References that are ancestors of incoming commits (e.g., pushing
a commit onto an existing branch or pushing a new branch based on
another branch)
- Additional list of references we know the client can be interested in
(e.g. list of open changes for Gerrit)
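The two-step strategy can be sketched as follows. This is a simplified
illustration: the class name is hypothetical, a Predicate over a set of ref
names stands in for the underlying connectivity check, and success is
modeled as a boolean rather than an exception.

```java
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: try a small ref set first, then fall back. */
final class IterativeCheck {
    static boolean isConnected(Predicate<Set<String>> checker,
            Set<String> likelyRefs, Set<String> advertisedRefs) {
        // First attempt: only the refs most likely to be relevant.
        if (!likelyRefs.isEmpty() && checker.test(likelyRefs)) {
            return true;
        }
        // Fallback: the full set of advertised refs.
        return checker.test(advertisedRefs);
    }
}
```

When the first attempt succeeds, the expensive walk over all advertised
refs is skipped entirely.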
We tested it inside Google and it improves connectivity checks for
certain topologies. For example, connectivity counts for
chromium.googlesource.com/chromium/src:
percentile_50: 1923 (was: 22777)
percentile_90: 23272 (was: 353003)
percentile_99: 345522 (was: 353435)
This saved ~2 seconds on every push to this repository.
Signed-off-by: Demetr Starshov <dstarshov@google.com>
Change-Id: I6543c2e10ed04622ca795b195665133e690d3b10
Scan through all merged reftables for max/min update indices
This is necessary because reftables might have overlapping update index
ranges.
Change-Id: I8f8215b99a0a978d4dd0155dbaf33e5e06ea8202
Signed-off-by: Minh Thai <mthai@google.com>
(cherry picked from commit 06748c205c)
The change Ic0b974fa (c217d33, "Documentation/technical/reftable:
improve repo layout") defines a new repository layout, which was
agreed with the git-core mailing list.
It addresses the following problems:
* old git clients will not recognize reftable-based repositories, and
look at encompassing directories.
* Poorly written tools might write directly into
.git/refs/heads/BRANCH.
Since we consider JGit reftable as experimental (git-core doesn't
support it yet), we have no backward compatibility. If you created a
repository with reftable between mid-Nov 2019 and now, you can do the
following to convert:
mv .git/refs .git/reftable/tables.list
git config core.repositoryformatversion 1
git config extensions.refStorage reftable
Change-Id: I80df35b9d22a8ab893dcbe9fbd051d924788d6a5
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Reorder modifiers to follow Java Language Specification
The Java Language Specification recommends listing modifiers in
the following order:
1. Annotations
2. public
3. protected
4. private
5. abstract
6. static
7. final
8. transient
9. volatile
10. synchronized
11. native
12. strictfp
Not following this convention has no technical impact, but will reduce
the code's readability because most developers are used to the standard
order.
This was detected using SonarLint.
Change-Id: I9cddecb4f4234dae1021b677e915be23d349a380
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The ReftableCompactor supported a byteLimit, but this is currently
unused. The FileReftableStack has a more sophisticated strategy that
amortizes compaction costs.
Rename min/maxUpdateIndex to reflogExpire{Min,Max}UpdateIndex to
reflect their purpose more accurately.
Since reflogs are generally pruned chronologically (oldest entries are
expired first), one can only prune entries on full compaction, so they
should not be set by default.
Rephrase the function Reader#minUpdateIndex and maxUpdateIndex. These
vars are documented to affect log entries, but semantically, they are
about ref entries. Since ref entries have their timestamps
delta-compressed, it is important for the min/maxUpdateIndex values to
be coherent between different tables.
The logical timestamps for log entries do not have to be coherent in
different tables, as the timestamp of a log entry is part of the key.
For example, a table written at update index 20 may contain a tombstone
log entry at timestamp 1.
Therefore, we set ReftableWriter's min/maxUpdateIndex from the merged
tables we are compacting, rather than from the compaction settings
(which should only control reflog expiry.)
The previous behavior could drop log entries erroneously, especially
in the presence of tombstone log entries. Unfortunately, testing this
properly requires both an API for adding log tombstones, and a more
refined API for controlling automatic compaction. Hence, no test.
Change-Id: I2f4eb7866f607fddd0629809e8e61f0b9097717f
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
The merged table contains handles to open files. A full compaction
causes those files to be closed, and so further lookups would fail
with EBADF.
Change-Id: I7bb74f7228ecc7fec9535b00e56a617a9c18e00e
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since version 4.13 JUnit has an assertThrows method. Remove the
implementation in MoreAsserts and use the one from JUnit.
CQ: 21439
Change-Id: I086baa94aa3069cebe87c4cbf91ed1534523c6cb
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
WindowCache: add option to use strong refs to reference ByteWindows
Java GC evicts all SoftReferences when the used heap size comes close to
the maximum heap size. This means peaks in heap memory consumption can
flush the complete WindowCache which was observed to have negative
impact on performance of upload-pack in Gerrit.
Hence add a boolean option core.packedGitUseStrongRefs to allow using
strong references to reference packfile pages cached in the WindowCache.
If this option is set to true, Java GC can no longer flush the
WindowCache to free memory if the used heap comes close to the maximum
heap size. On the other hand, this provides more predictable performance.
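The trade-off can be sketched with a small holder that references a cached
page either strongly or softly. The class PageRef is hypothetical and much
simpler than the WindowCache; it only illustrates what the boolean option
selects.

```java
import java.lang.ref.SoftReference;
import java.util.Objects;

/** Hypothetical sketch: reference a cached page strongly or softly. */
final class PageRef<T> {
    private final T strong;              // set only for strong refs
    private final SoftReference<T> soft; // set only for soft refs

    PageRef(T page, boolean useStrongRefs) {
        Objects.requireNonNull(page);
        this.strong = useStrongRefs ? page : null;
        this.soft = useStrongRefs ? null : new SoftReference<>(page);
    }

    /** Returns the page, or null if a soft reference was cleared by GC. */
    T get() {
        return strong != null ? strong : soft.get();
    }

    boolean isStrong() {
        return strong != null;
    }
}
```

With strong references the cache itself must enforce its size limits, since
the garbage collector will no longer reclaim pages under memory pressure.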
Bug: 553573
Change-Id: I9de406293087ab0fa61130c8e0829775762ece8d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
and switch over strings where possible. Sometimes if statements are
chained and form a series of comparisons against constants. Using switch
statements improves readability.
Bug: 545856
Change-Id: Iacb78956ee5c20db4d793e6b668508ec67466606
Signed-off-by: Carsten Hammer <carsten.hammer@t-online.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add the following statistics
- cache hit count and hit ratio
- cache miss count and miss ratio
- count of successful and failed loads
- rate of failed loads
- load, eviction and request count
- average and total load time
Use LongAdder instead of AtomicLong to implement counters in order to
improve scalability.
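A minimal sketch of LongAdder-based counters, covering only hits and
misses. The class and method names are hypothetical and not the
WindowCache statistics API; returning a hit ratio of 1.0 for zero requests
is an arbitrary choice made here.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of hit/miss counters with derived ratios. */
final class CacheCounters {
    // LongAdder spreads contended updates over several cells, so many
    // threads can increment without contending on a single variable.
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() {
        hits.increment();
    }

    void recordMiss() {
        misses.increment();
    }

    long requestCount() {
        return hits.sum() + misses.sum();
    }

    double hitRatio() {
        long requests = requestCount();
        return requests == 0 ? 1.0 : (double) hits.sum() / requests;
    }

    double missRatio() {
        long requests = requestCount();
        return requests == 0 ? 0.0 : (double) misses.sum() / requests;
    }
}
```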
Optionally expose these metrics via JMX. They are registered with the
platform MBean server if the config option jmx.WindowCacheStats = true
is set in the user or system level git config.
Bug: 553573
Change-Id: Ia2d5246ef69b9c2bd594a23934424bc5800774aa
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
ReftableTest: Clean up boxing warnings on usage of String.format
Passing int as an argument to String.format causes a warning:
The expression of type int is boxed into Integer
Most of these are already suppressed, but there are a couple that are
not. Add suppressions for those.
For the existing ones, move the suppression scope from the method to
the actual usage. Where necessary extract the usage out to a local
variable.
Change-Id: I7a7ff6dec49467e4b5c58d27a231c74e6e1c5437
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The reftable format supports fast inverse (SHA1 => ref) queries.
If the ref database does not support fast inverse queries, it may be
advantageous to build a complete SHA1 to ref map in advance for
multiple uses. To let applications decide, this function indicates
whether the inverse map is available.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: Idaf7e01075906972ec21332cade285289619c2b3
RepositoryCache: don't require HEAD in git repositories
Reftable-enabled repositories don't have a file called HEAD. Check for
reftable/ instead.
This fixes repository creation on reftable in Gerrit.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: I778c2be01d96aaf135affae4b457b5fe5b483bee
Reftable is a binary, block-based storage format for the ref-database.
It provides several advantages over the traditional packed + loose
storage format:
* O(1) write performance, even for deletions and transactions.
* atomic updates to the ref database.
* O(log N) lookup and prefix scans
* free from restrictions imposed by the file system: it is
case-sensitive even on case-insensitive file systems, and has
no inherent limitations for directory/file conflicts
* prefix compression reduces space usage for repetitive ref names,
such as gerrit's refs/changes/xx/xxxxx format.
FileReftableDatabase is based on FileReftableStack, which does
compactions inline. This is simple, and has good median performance,
but every so often it will rewrite the entire ref database.
For testing, a FileReftableTest (mirroring RefUpdateTest) is added to
check for Reftable specific behavior. This must be done separately, as
reflogs have different semantics.
Add a reftable flavor of BatchRefUpdateTest.
Add a FileReftableStackTest to exercise compaction.
Add FileRepository#convertToReftable so existing testdata can be
reused.
CQ: 21007
Change-Id: I1837f268e91c6b446cb0155061727dbaccb714b8
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
reftable: enforce ascending order in sortAndWriteRefs
MergedReftableTest#scanDuplicates tests whether we can write duplicate
keys in a merged reftable. Apparently, the first key appearing should
get precedence, and this works because the sort() algorithm on ordered
collections is stable.
This is potentially confusing behavior, because you can write data
into the table that cannot be retrieved (Merged table can only have
one entry per key), and the APIs such as exactRef() only return a
single value.
Make this consistent with behavior introduced in I04f55c481 "reftable:
enforce ordering for ref and log writes" by considering a duplicate key
in sortAndWriteRefs as a fatal runtime error.
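The stricter behavior can be sketched like this. The class and method names
are hypothetical, plain ref names stand in for ref entries, and
IllegalArgumentException stands in for whatever runtime error the writer
raises.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: sort ref names and reject duplicate keys. */
final class SortedRefWriter {
    static List<String> sortAndCheck(Collection<String> refNames) {
        List<String> sorted = new ArrayList<>(refNames);
        Collections.sort(sorted);
        // After sorting, duplicates are adjacent, so one pass finds them.
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i - 1).equals(sorted.get(i))) {
                throw new IllegalArgumentException(
                        "duplicate ref: " + sorted.get(i));
            }
        }
        return sorted;
    }
}
```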
Change-Id: I1eedd18f028180069f78c5c467169dcfe1521157
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
This doesn't yet ensure that _all_ repositories are closed. It only
handles the obvious, local, and easy cases.
Change-Id: I0f9f8607791f0f03ed1f5ad71e9595e78b78892f
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Enable and fix "Statement unnecessarily nested within else clause" warnings
Since [1] the gerrit project includes jgit as a submodule, and has this
warning enabled, resulting in 100s of warnings in the console.
Also enable the warning here, and fix them.
At the same time, add missing braces around adjacent and nearby one-line
blocks.
[1] https://gerrit-review.googlesource.com/c/gerrit/+/227897
Change-Id: I81df3fc7ed6eedf6874ce1a3bedfa727a1897e4c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>