BatchRefUpdate: Skip saving conflicting ref names and prefixes in memory
Rather than getting all ref names and prefixes and saving them
in memory to perform the check for conflicting names, rely on
RefDirectory.isNameConflicting as it is no longer an expensive
call after it was optimized in Ie994fc.
The old optimization to save ref names and prefixes in memory
was targeted towards making clones faster. With this change,
the clone performance is unaffected when tests were done with
repos containing many(~500k) refs.
Here are few recorded elapsed times for creating 10 branches
using BatchRefUpdate on NFS based repositories with varying
loose refs count. As seen here, this change helps improve the
BatchRefUpdate performance from O(n^2) to O(1).
loose_refs_count with_change without_change
50 241 ms 310 ms
300 263 ms 1502 ms
1k 181 ms 4241 ms
2k 204 ms 6440 ms
9k 158 ms 25930 ms
20k 154 ms 60443 ms
50k 171 ms 135199 ms
110k 157 ms 329450 ms
160k 209 ms 396328 ms
This update improves the Gerrit notedb migration performance
as it uses BatchRefUpdate to write change meta refs similar to
the test performed above.
Change-Id: I853ac6c7feb4b39c3156c01876b38cbd182accfe
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
Avoid having to scan over ALL loose refs to determine if the
name is nested within or is a container of an existing reference.
This can get really expensive if there are too many loose refs.
Instead use exactRef and getRefsByPrefix which scan based on a
prefix.
With a simple shell script(like below) using jgit client to create
1k refs in a new repository on NFS, this change brings down the time
from 12mins to 7mins.
for ref in $(seq 1 1000); do
jgit branch "$ref"
done
Here are few recorded elapsed times to create a new branch on NFS
based repositories with varying loose refs count. As we see here,
this change improves the name conflicting check from O(n^2) to O(1).
loose_refs_count with_change without_change
50 44 ms 164 ms
300 45 ms 1193 ms
1k 38 ms 2610 ms
2k 44 ms 6003 ms
9k 46 ms 27860 ms
20k 45 ms 48591 ms
50k 51 ms 135471 ms
110k 43 ms 294252 ms
160k 52 ms 430976 ms
Change-Id: Ie994fc184b8f82811bfb37b111eb9733dbe3e6e0
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
Fix PackInvalidException when fetch and repack run concurrently
We are running several servers with jGit. We need to run repack from
time to time to keep the repos performant. I.e. after push we test how
many small packs are in the repo and when a threshold is reached we run
the repack.
After upgrading jGit version we've found that if someone does the clone
at the time repack is running the clone sometimes (not always) fails
because the repack removes .pack file used by the clone. Server
exception and client error attached.
I've tracked down the cause and it seems to be introduced between jGit
5.2 (which we upgraded from) and 5.3 and being caused by this commit:
Move throw of PackInvalidException outside the catch -
afef866a44
The problem is that when the throw was inside of the try block the last
catch block catched the exception and called openFailed(false) method.
It is true that it called it with invalidate = false, which is wrong.
The real problem though is that with the throw outside of the try block
the openFail is not called at all and the fields activeWindows and
activeCopyRawData are not set to 0. Which affects the later called tests
like: if (++activeCopyRawData == 1 && activeWindows == 0).
The fix for this is relatively simple keeping the throw outside of the
try block and still having the invalid field set to true. I did
exhaustive testing of the change running concurrent clones and pushes
indefinitely and with the patch applied it never fails while without the
patch it takes relatively short to get the error.
See: https://www.eclipse.org/lists/jgit-dev/msg04014.html
Bug: 569349
Change-Id: I9dbf8801c8d3131955ad7124f42b62095d96da54
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
GC#deleteOrphans: handle failure to list files in pack directory
- log an error
- either there is no list or it is incomplete hence return immediately
Change-Id: Ieee5378ca06304056b9ccc30c1acd5f52360052d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If pack or index files are guarded by a pack lock (.keep file)
deleteOrphans() should not touch the respective files protected by the
lock file. Otherwise it may interfere with PackInserter concurrently
inserting a new pack file and its index.
The problem was caused by the following race.
All mentioned files are located in "objects/pack/".
File endings relevant in "pack" dir:
.pack
.keep
.idx
.bitmap
When ReceivePack receives a pack file it executes the following steps:
ReceivePack.service():
receivePackAndCheckConnectivity():
receivePack():
receive the pack
parse the pack, returns packLock (.keep file)
PackInserter.flush():
write tmpPck file: "insert_<random>.pack"
write tmpIdx file: "insert_<random>.idx"
real pack name: "pack-<SHA1>.pack"
real index name: "pack-<SHA1>.idx"
atomic rename tmpPack to realPack
atomic rename tmpIdx to tmpIdx
execute commands
unlock pack by removing .keep file
trigger auto gc if enabled
When PackInserter.flush() renames the temporary pack to the final
"pack-xxx.pack" file the temporary pack index file "insert_xxx.idx"
has no matching .pack file with the same base name for a short interval.
If deleteOrphans() ran during that interval it deduced the pack index
file was orphaned. Subsequently the missing pack index caused
MissingObjectExceptions since objects contained in the pack couldn't be
looked up anymore.
Bug: https://bugs.chromium.org/p/gerrit/issues/detail?id=13544
Change-Id: I559c81e4b1d7c487f92a751bd78b987d32c98719
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
WindowCache: add metric for cached bytes per repository
Since ObjectDatabase and PackFile don't know their repository use the
packfile's grand-grand-parent directory as an identifier for the
repository the packfile resides in.
Remove metric for a repository if the number of cached bytes for the
repository drops to 0 in order to ensure the map of cached bytes per
repository doesn't contain repositories which have no data cached in the
WindowCache.
Change-Id: I969ab8029db0a292e7585cbb36ca0baa797da20b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
WindowCache: add option to use strong refs to reference ByteWindows
Java GC evicts all SoftReferences when the used heap size comes close to
the maximum heap size. This means peaks in heap memory consumption can
flush the complete WindowCache which was observed to have negative
impact on performance of upload-pack in Gerrit.
Hence add a boolean option core.packedGitUseStrongRefs to allow using
strong references to reference packfile pages cached in the WindowCache.
If this option is set to true Java gc can no longer flush the
WindowCache to free memory if the used heap comes close to the maximum
heap size. On the other hand this provides more predictable performance.
Bug: 553573
Change-Id: I9de406293087ab0fa61130c8e0829775762ece8d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Replace usage of ArrayIndexOutOfBoundsException in treewalk
Using exceptions during normal operations - for example with the
desire of expanding an array in the failure case - can have a
severe performance impact. When exceptions are instantiated,
a stack trace is collected. Generating stack trace can be expensive.
Compared to that, checking an array for length - even if done many
times - is cheap since this is a check that can run in just a
handful of CPU cycles.
Change-Id: Ifaf10623f6a876c9faecfa44654c9296315adfcb
Signed-off-by: Patrick Hiesel <hiesel@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add the following statistics
- cache hit count and hit ratio
- cache miss count and miss ratio
- count of successful and failed loads
- rate of failed loads
- load, eviction and request count
- average and total load time
Use LongAdder instead of AtomicLong to implement counters in order to
improve scalability.
Optionally expose these metrics via JMX, they are registered with the
platform MBean server if the config option jmx.WindowCacheStats = true
in the user or system level git config.
Bug: 553573
Change-Id: Ia2d5246ef69b9c2bd594a23934424bc5800774aa
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
SystemReader.updateAll() must _not_ test whether the file exists. In
tests at least there are FileBasedConfigs with a null file. Test
configs should (and do) override isOutdated() to deal with this case.
Change-Id: I56303fe0d56afeb9f2203ee807a92c5dcf3809e9
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Prepend hostname to subsection used to store file timestamp resolution
This ensures the measured filesystem timestamp resolution will be only
used on the machine where it was measured and avoid errors in case the
~/.jgitconfig file is copied to another machine.
Bug: 551850
Change-Id: Iff2a11be62ca94c3bbe4a955182988dc50852f9f
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Store filesystem timestamp resolution in extra jgit config
This avoids polluting hand-crafted user level config with
auto-configured options which might disturb in environments where
the user level config is replicated between different machines.
Add a jgit config as parent of the system level config. Persist
measured timestamp resolutions always in this jgit config and read it
via the user global config. This has the effect that auto-configured
timestamp resolution will be used by default and can be overridden in
either the system level or user level config.
Store the jgit config under the XDG_CONFIG_HOME directory following the
XDG base directory specification [1] in order to ensure that we have
write permissions to persist the file. This has the effect that each OS
user will use its jgit config since they typically use different
XDG_CONFIG_HOME directories.
If the environment variable XDG_CONFIG_HOME is defined the jgit config
file is located at $XDG_CONFIG_HOME/jgit/config otherwise the default is
~/.config/jgit/config.
If you want to avoid redundant measurement for different OS users
manually copy the values measured and auto-configured for one OS user to
the system level git config.
[1] https://wiki.archlinux.org/index.php/XDG_Base_Directory
Bug: 551850
Change-Id: I0022bd40ae62f82e5b964c2ea25822eb55d94687
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
SystemReader: extract updating config and its parents if outdated
Change-Id: Ia77f442e47c5670c2d6d279ba862044016aabd86
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Do not rely on ArrayIndexOutOfBoundsException to detect end of input
In the Config#StringReader we relied on ArrayIndexOutOfBoundsException
to detect the end of the input. Creation of exception with (deep) stack
trace can significantly degrade performance in case when we read
thousands of config files, like in the case when Gerrit reads all
external ids from the NoteDb.
Use the buf.length to detect the end of the input.
Change-Id: I12266f25751373a870ce3fa623cf2a95d882d521
WorkingTreeIterator: handle different timestamp resolutions
Older JGit stored only milliseconds timestamps in the index. Newer
JGit may get finer timestamps from the file system. This leads to
slow index diffs when a new JGit runs against an index produced
by older JGit because many timestamps will differ and JGit will
then do many content checks. See [1].
Handle this migration case by only comparing milliseconds if the
index entry has only millisecond precision.
The inverse may also occur; also compare only milliseconds if the
file timestamp has only millisecond precision.
Do the same also for microsecond resolution. On Windows, NTFS may
provide 100ns resolution and may be used by external programs writing
the index, but Java's WindowsFileAttributes may provide only
microseconds.
File timestamp precision in Java depends not only on the Java APIs
used by different JGit versions but may also change when running the
same Java code on different VMs. And of course the resolution may
vary among operating and file systems. Moreover, timestamp precision
in the index depends on the program that wrote the index. Canonical
git may use a different resolution, maybe even different between git
versions.
[1] https://www.eclipse.org/forums/index.php/t/1100344/
Change-Id: Idfd08606c883cb98787b2138f9baf0cc89a57b56
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix WorkingTreeIterator.compareMetadata() for CheckStat.MINIMAL
If CheckStat is MINIMAL or timestamps have no nanosecond part
WorkingTreeIterator.compareMetaData only checks the second part of
timestamps and ignores nanoseconds which may have ended up in the index
by using native git.
If
fileLastModified.getEpochSecond() == cacheLastModified.getEpochSecond()
we currently proceed comparing fileLastModified and cacheLastModified
with full precision which is wrong since we determined that we detected
reduced timestamp resolution.
Fix this and also handle smudged index entries for CheckStat.MINIMAL.
Change-Id: I6149885903ac63d79b42d234cc02aa4e19578f3c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
[error prone] Suppress NonAtomicVolatileUpdate in SimpleLruCache
We don't need to update time atomically since it's only used to order
cache entries in LRU order.
Change-Id: I756fa6d90b180c519bf52925f134763744f2c1f1
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
BatchRefUpdate: repro racy atomic update, and fix it
PackedBatchRefUpdate was creating a new packed-refs list that was
potentially unsorted. This would be papered over when the list was
read back from disk in parsePackedRef, which detects unsorted ref
lists on reading, and sorts them. However, the BatchRefUpdate also
installed the new (unsorted) list in-memory in
RefDirectory#packedRefs.
With the timestamp granularity code committed to stable-5.1, we can
more often accurately decide that the packed-refs file is clean, and
will return the erroneous unsorted data more often. Unluckily timed
delays also cause the file to be clean, hence this problem was
exacerbated under load.
The symptom is that refs added by a BatchRefUpdate would stop being
visible directly after they were added. In particular, the Gerrit
integration tests uses BatchRefUpdate in its setup for creating the
Admin group, and then tries to read it out directly afterward.
The tests recreates one failure case. A better approach would be to
revise RefList.Builder, so it detects out-of-order lists and
automatically sorts them.
Fixes https://bugs.eclipse.org/bugs/show_bug.cgi?id=548716 and
https://bugs.chromium.org/p/gerrit/issues/detail?id=11373.
Bug: 548716
Change-Id: I613c8059964513ce2370543620725b540b3cb6d1
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add the constant, and implement hashing of known host names in
OpenSshServerKeyDatabase. Add a test verifying that the hashing
works.
Bug: 548492
Change-Id: Iabe82b666da627bd7f4d82519a366d166aa9ddd4
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Return a new instance from openSystemConfig and openUserConfig
Move the handling of cached user and system config to getSystemConfig
and getUserConfig methods and revert the implementation of
openSystemConfig and openUserConfig to the old stateless
implementation.
This ensures the open methods respect the passed-in parent config, which
may be different on each invocation. Additionally, returning a new
instance matches the behavior of the previous implementation of the
default system reader, which downstream callers may be depending on.
Move the implementation of the new caching methods getSystemConfig and
getUserConfig up to SystemReader. This avoids that we break the ABI for
subclasses of SystemReader.
Also see [1] which fixed a similar problem with Gerrit's custom
SystemReader.
[1] https://gerrit-review.googlesource.com/c/gerrit/+/225458
Change-Id: If54a2491932d8fc914d4649cb73c9e837c5b8ad0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
These warnings were missed to address in a0048208 which introduced them.
Change-Id: Ia2d15fdce72c10378d020682b80fe7fc548c0d4c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
RevWalk: Traverse all parents of UNINTERESTING commits
When firstParent is set, RevWalk traverses only the first parent of a
commit, even though that commit is UNINTERESTING. Since we want the
maximal UNINTERESTING set, we shouldn't prune any parents here. This
issue is apparent only when some of the commits being traversed are
unparsed, since walker.carryFlagsImpl() propagates the UNINTERESTING
flag to all parsed ancestors, masking the issue.
Therefore teach RevWalk to traverse all parents when a commit is
UNINTERESTING and not only the first parent. Since this issue is
masked by commit parsing, also test situations when the commits
involved are unparsed.
Signed-off-by: Alex Spradlin <alexaspradlin@google.com>
Change-Id: I95e2ad9ae8f1f50fbecae674367ee7e0855519b1
Fix error occurring when SecurityManager is enabled
It's expected that jgit should work without native git installation.
In such case Security Manager can be configured to deny access to the
files outside of git repository. JGit tries to find cygwin
installation. If Security manager restricts access to some folders
in PATH, it should be considered that those folders are absent
for jgit.
Also JGit tries to detect if symbolic links are supported by OS. If
security manager forbids creation of symlinks, it should be assumed
that symlinks aren't supported.
Bug: 550115
Change-Id: Ic4b243cada604bc1090db6cc1cfd74f0fa324b98
Signed-off-by: Nail Samatov <sanail@yandex.ru>
Use AtomicReferences to cache user and system level configs
This ensures that only one instance of user and one instance of system
config is set.
Change-Id: Idd00150f91d2d40af79499dd7bf8ad5940f87c4e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
deleteChildren was called on directory instead of gitDir, leading to a
potential null pointer exception if the git directory existed initially.
Bug: 550340
Change-Id: Iafc3b2961253a99862a59e81c7371f7bc564b412
Signed-off-by: Adrien Bustany <adrien-xx-eclipse@bustany.org>
SystemReader: Use correct constructor of FileBasedConfig
The merge done in change If0c5010a2 resolved a conflict incorrectly
and reverted the fix that was done in change Id0bcdc93b.
Change-Id: I0f5fde33d1f366817f2b966eb42535f7bd3b063e
Reported-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Avoid sign extension when comparing mtime with Instant#getEpochSecond
Ensure we use the same type when comparing seconds since the epoch.
This does not prevent that in 2038 timestamps in seconds since the epoch
stored in a 32 bit integer will overflow. Integer.MAX_VALUE translates
to 2038-01-19T03:14:07Z. After this date we'll have an issue since we
store seconds since the epoch in a 32 bit integer in some places.
Bug: 319142
Change-Id: If0c03003d40b480f044686e2f7a2f62c9f4e2fe1
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix deprecation in DirCache caused by Instant based DirCacheEntry
Replace the two int variables smudge_s and smudge_ns by an Instant and
use the new method DirCacheEntry.mightBeRacilyClean(Instant).
Change-Id: Id70adbb0856a64909617acf65da1bae8e2ae934a
Signed-off-by: Michael Keppler <Michael.Keppler@gmx.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
UploadPack: support custom packfile-to-URI mapping
Teach UploadPack to take a provider of URIs corresponding to cached
packs. When fetching, if the client supports the packfile-uri feature,
and if such a cached pack were to be streamed, instead send the
corresponding URI.
This packfile-uri feature is implemented in the jt/fetch-cdn-offload
branch of Git. There is interest in this feature [1], but it is not yet
merged.
[1] https://public-inbox.org/git/cover.1552073690.git.jonathantanmy@google.com/
Change-Id: I9a32dae131c9c56ad2ff4a8a9638ae3b5e44dc15
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Previously, the API did not enforce ordering of writes. Misuse of
this API would lead to data effectively being lost.
Guard against that with IllegalArgumentException, and add a test.
Change-Id: I04f55c481d60532fc64d35fa32c47037a03988ae
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
reftable: fix seeking to refs in reflog implementation
Small reftables omit the log index. Currently,
ReftableWriter#shouldHaveIndex does this if there is a single-block
log, but other writers could decide on different criteria.
In the case that the log index is missing, we have to linearly search
for the right block. It is never appropriate to use binary search on
blocks for log data, as the blocks are compressed and therefore
irregularly sized.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: Id59874edf6bf45c7dec502d9465888e077ffe198
Cache user global and system-wide git configurations
So far the git configuration and the system wide git configuration were
always reloaded when jgit accessed these global configuration files to
access global configuration options which are not in the context of a
single git repository. Cache these configurations in SystemReader and
only reload them if their file metadata observed using FileSnapshot
indicates a modification.
Change-Id: I092fe11a5d95f1c5799273cacfc7a415d0b7786c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
The existing javadoc was copied from another method and not adapted.
Change-Id: I39a7e5d719b2c379de9bd1a4710a55a73700c6f0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>