Move this test to another class and skip it when running tests with
bazel since the bazel test runner does not allow to create files in the
home directory.
FS#userHome retrieves the home directory on the first call and caches it
for subsequent calls to avoid overhead in case path translation is
required (currently on cygwin). This prevents that the test can mock the
home directory using MockSystemReader like SshTestHarness does.
Always parse RevTags including their body before getting their object
to ensure that non-cached objects are handled correctly when traversing
a tag chain. An NPE in UploadPack#addTagChain will occur on a depth=1
fetch of a branch containing a tag chain and the ref to one of the
middle tags in the chain is deleted.
Matthias Sohn [Thu, 20 Apr 2023 14:01:33 +0000 (16:01 +0200)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
Add missing since tag for SshBasicTestBase
Add missing since tag for SshTestHarness#publicKey2
Silence API errors
Prevent infinite loop rescanning the pack list on
PackMismatchException
Remove blank in maven.config
Matthias Sohn [Thu, 20 Apr 2023 13:40:36 +0000 (15:40 +0200)]
Merge branch 'stable-5.12' into stable-5.13
* stable-5.12:
Add missing since tag for SshBasicTestBase
Add missing since tag for SshTestHarness#publicKey2
Silence API errors
Prevent infinite loop rescanning the pack list on
PackMismatchException
Remove blank in maven.config
Matthias Sohn [Thu, 20 Apr 2023 13:12:01 +0000 (15:12 +0200)]
Merge branch 'stable-5.11' into stable-5.12
* stable-5.11:
Add missing since tag for SshBasicTestBase
Add missing since tag for SshTestHarness#publicKey2
Silence API errors
Prevent infinite loop rescanning the pack list on PackMismatchException
Remove blank in maven.config
Matthias Sohn [Thu, 20 Apr 2023 12:42:56 +0000 (14:42 +0200)]
Merge branch 'stable-5.10' into stable-5.11
* stable-5.10:
Add missing since tag for SshTestHarness#publicKey2
Silence API errors
Prevent infinite loop rescanning the pack list on
PackMismatchException
Remove blank in maven.config
Migrated "Prevent infinite loop rescanning the pack list on
PackMismatchException" to refactoring done in
https://git.eclipse.org/r/q/topic:restore-preserved-packs
Matthias Sohn [Thu, 30 Mar 2023 11:43:17 +0000 (13:43 +0200)]
Prevent infinite loop rescanning the pack list on PackMismatchException
We found, when analysing an incident where Gerrit's gc runner thread got
stuck, that we can end up in an infinite loop in
ObjectDirectory#openPackedObject which tries to rescan the pack
list and starts over trying to open a packed object in an unconfined
loop if it catches a PackMismatchException.
Here the relevant part of a thread dump we created while the gc runner
was stuck:
"WorkQueue-2[java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@350812a3[Not
completed,
task = java.util.concurrent.Executors$RunnableAdapter@5425d7ee]]" #72
tid=0x00007f73cee1c800 nid=0x584
runnable [0x00007f7392d57000]
java.lang.Thread.State: RUNNABLE
at org.eclipse.jgit.internal.storage.file.WindowCache.removeAll(WindowCache.java:716)
at org.eclipse.jgit.internal.storage.file.WindowCache.purge(WindowCache.java:399)
at org.eclipse.jgit.internal.storage.file.PackFile.close(PackFile.java:296)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.reuseMap(ObjectDirectory.java:973)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.scanPacksImpl(ObjectDirectory.java:904)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.scanPacks(ObjectDirectory.java:895)
- locked <0x000000050a498f60> (a
java.util.concurrent.atomic.AtomicReference)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.searchPacksAgain(ObjectDirectory.java:794)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:465)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:417)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:408)
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
at org.eclipse.jgit.lib.ObjectReader$1.open(ObjectReader.java:279)
at org.eclipse.jgit.revwalk.RevWalk$2.next(RevWalk.java:1031)
at org.eclipse.jgit.internal.storage.pack.PackWriter.findObjectsToPack(PackWriter.java:1911)
at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:960)
at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:876)
at org.eclipse.jgit.internal.storage.file.GC.writePack(GC.java:1168)
at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:852)
at org.eclipse.jgit.internal.storage.file.GC.doGc(GC.java:269)
at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:220)
at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:179)
at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:112)
at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:75)
at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:71)
at com.google.gerrit.server.git.GarbageCollectionRunner.run(GarbageCollectionRunner.java:76)
at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.18/Executors.java:515)
at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.18/FutureTask.java:305)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.18/ScheduledThreadPoolExecutor.java:305)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.18/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.18/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.18/Thread.java:829)
The code in ObjectDirectory#openPackedObject [1] apparently assumes that
this is caused by a transient problem which it can resume from by
retrying. We use `core.trustFolderStat = false` on this server since it
uses NFS. The incident we had showed that we can enter into an infinite
loop here if there is a permanent mismatch between a pack file and its
corresponding pack index. I am not yet sure how this can happen.
Break the infinite loop by limiting the number of attempts rescanning
the pack list to 5 retries. When we exceed this threshold set the type
of the PackMismatchException to permanent and rethrow it which breaks
the infinite loop.
Also apply the same limit in #getPackedObjectSize
and #selectObjectRepresentation where we use similar retry loops.
Matthias Sohn [Mon, 27 Mar 2023 20:23:11 +0000 (22:23 +0200)]
DirCache: support option index.skipHash
Support the new option index.skipHash which was introduced in git 2.40
[1]. If it is set to true skip computing the git index checksum. This
accelerates Git commands that manipulate the index, such as git add, git
commit, or git status. Instead of storing the checksum, write a trailing
set of bytes with value zero, indicating that the computation was
skipped.
Accept a skipped checksum consisting of 20 null bytes when reading the
index since the option could have been set to true at the time when the
index was written.
Xing Huang [Tue, 21 Mar 2023 22:27:49 +0000 (17:27 -0500)]
GC: Close File.lines stream
From File#lines javadoc: The returned stream from File Lines
encapsulates a Reader. If timely disposal of file system resources is
required, the try-with-resources construct should be used to ensure
that the stream's close method is
invoked after the stream operations are completed.
Matthias Sohn [Wed, 22 Feb 2023 20:02:09 +0000 (21:02 +0100)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
If tryLock fails to get the lock another gc has it
Fix GcConcurrentTest#testInterruptGc
Don't swallow IOException in GC.PidLock#lock
Check if FileLock is valid before using or releasing it
Matthias Sohn [Fri, 10 Feb 2023 22:39:20 +0000 (23:39 +0100)]
Acquire file lock "gc.pid" before running gc
Git guards gc by locking a lock file "gc.pid" before starting execution.
The lock file contains the pid and hostname of the process holding the
lock. Git tries to kill the process holding that lock if the lock file
wasn't modified in the last 12 hours and was started from the same host.
Teach JGit to acquire this lock before running gc but skip execution if
another process already holds the lock. Killing the other process could
be undesired if it's a long running application.
If the lock file wasn't modified in the last 12 hours try to lock it and
run gc if locking succeeds.
Register a shutdown hook for the lock file to ensure it is cleaned up if
the process is gracefully killed.
Matthias Sohn [Thu, 16 Feb 2023 15:42:58 +0000 (16:42 +0100)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
Add pack options to preserve and prune old pack files
Allow to perform PackedBatchRefUpdate without locking loose refs
Document option "core.sha1Implementation" introduced in 59029aec
Saša Živkov [Fri, 21 Oct 2022 14:32:03 +0000 (16:32 +0200)]
Allow to perform PackedBatchRefUpdate without locking loose refs
Add another newBatchUpdate method in the RefDirectory where we can
control if the created PackedBatchRefUpdate will lock the loose refs or
not.
This can be useful in cases when we run programs which have exclusive
access to a Git repository and we know that locking loose refs is
unnecessary and just a performance loss.
Matthias Sohn [Tue, 31 Jan 2023 23:30:52 +0000 (00:30 +0100)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Luca Milanesio [Wed, 18 May 2022 12:31:30 +0000 (13:31 +0100)]
Shortcut during git fetch for avoiding looping through all local refs
The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs, which could be
very expensive but is needed to avoid missing objects exceptions
because of broken chains.
When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.
Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s
A ref pointing to a non-commit object doesn't have any parent or
successor objects, hence would never need to have a reachability check
done. Skipping the askForIsComplete() altogether would save the 30s
time spent in an unnecessary phase.
Matthias Sohn [Tue, 4 Oct 2022 13:42:25 +0000 (15:42 +0200)]
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.
Luca Milanesio [Tue, 20 Dec 2022 21:50:19 +0000 (21:50 +0000)]
Allow the exclusions of refs prefixes from bitmap
When running a GC.repack() against a repository with over one
thousands of refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonable big file that would take up to
several hours to compute.
Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)
Having such a large bitmap file is also slow in the runtime
processing and have negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.
It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exlude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.
Example: Gerrit sandbox branches may even be actively
used and selected automatically because its commits are very
recent, however, they may bloat the bitmap, making it ineffective.
A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundred
thousands.
Luca Milanesio [Wed, 28 Dec 2022 01:09:52 +0000 (01:09 +0000)]
PackWriterBitmapPreparer: do not include annotated tags in bitmap
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.
When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.
The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.
Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.
Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotate tags)
# packfiles: 2.7G
Before this change:
GC time: 5h
clone --bare time: 7 mins
After this change:
GC time: 20 mins
clone --bare time: 3 mins
Matthias Sohn [Wed, 18 Jan 2023 16:39:19 +0000 (17:39 +0100)]
Speedup GC listing objects referenced from reflogs
GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
GC calling that for a huge number of Refs one by one is very slow. GC
first gets all Refs in bulk and then calls getReflogReader for each of
them.
Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.
This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.
Matthias Sohn [Tue, 15 Nov 2022 23:15:17 +0000 (00:15 +0100)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
[benchmarks] Remove profiler configuration
Add SHA1 benchmark
[benchmarks] Set version of maven-compiler-plugin to 3.8.1
Fix running JMH benchmarks
Add option to allow using JDK's SHA1 implementation
Ignore IllegalStateException if JVM is already shutting down
Matthias Sohn [Tue, 4 Oct 2022 13:45:01 +0000 (15:45 +0200)]
Fix running JMH benchmarks
Without this I sometimes hit the error:
$ java -jar target/benchmarks.jar
Exception in thread "main" java.lang.RuntimeException: ERROR: Unable to
find the resource: /META-INF/BenchmarkList
at org.openjdk.jmh.runner.AbstractResourceReader.getReaders(AbstractResourceReader.java:98)
at org.openjdk.jmh.runner.BenchmarkList.find(BenchmarkList.java:124)
at org.openjdk.jmh.runner.Runner.internalRun(Runner.java:253)
at org.openjdk.jmh.runner.Runner.run(Runner.java:209)
at org.openjdk.jmh.Main.main(Main.java:71)
Matthias Sohn [Fri, 11 Nov 2022 16:54:06 +0000 (17:54 +0100)]
Add option to allow using JDK's SHA1 implementation
The change If6da9833 moved the computation of SHA1 from the JVM's
JCE to a pure Java implementation with collision detection.
The extra security for public sites comes with a cost of slower
SHA1 processing compared to the native implementation in the JDK.
When JGit is used internally and not exposed to any traffic from
external or untrusted users, the extra cost of the pure Java SHA1
implementation can be avoided, falling back to the previous
native MessageDigest implementation.
Matthias Sohn [Thu, 27 Oct 2022 18:31:31 +0000 (20:31 +0200)]
Ignore IllegalStateException if JVM is already shutting down
Trying to register/unregister a shutdown hook when the JVM is already in
shutdown throws an IllegalStateException. Ignore this exception since we
can't do anything about it.
Matthias Sohn [Wed, 6 Jul 2022 08:38:30 +0000 (10:38 +0200)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
UploadPack: don't prematurely terminate timer in case of error
Do not create reflog for remote tracking branches during clone
UploadPack: do not check reachability of visible SHA1s
Add missing package import javax.management to org.eclipse.jgit
Matthias Sohn [Wed, 29 Jun 2022 12:58:17 +0000 (14:58 +0200)]
UploadPack: don't prematurely terminate timer in case of error
In uploadWithExceptionPropagation don't prematurely terminate timer in
case of error to enable reporting it to the client. Expose a close
method so that callers can terminate it at the appropriate time.
If the timer is already terminated when trying to report it to the
client this failed with the error java.lang.IllegalStateException:
"Timer already terminated".
Do not create reflog for remote tracking branches during clone
When using JGit on a non-bare repository, the CloneCommand
it previously created local reflogs for all branches including remote
tracking ones, causing the generation of a potentially large
number of files on the local filesystem.
The creation of the remote-tracking branches (refs/remotes/*) during
clone is not an issue for the local filesystem because all of them are
stored in a single packed-refs file. However, the creation of a large
number of ref logs on a local filesystem IS an issue because it
may not be tuned or initialised in term of inodes to contain a very
large number of files.
When a user (or a CI system) performs the CloneCommand against
a potentially large repository (e.g., millions of branches), it is
interested in working or validating a single branch or tag and is
unlikely to work with all the remote-tracking branches.
The eager creation of a reflogs for all the remote-tracking branches is
not just a performance issue but may also compromise the ability to
use JGit for cloning a large repository.
The behaviour implemented in this change is also consistent with the
optimisation done in the C code-base [1].
We differentiate between clone and fetch commands using --branch
<initialBranch> option, that is only available in clone command,
and is set as HEAD per default.
Luca Milanesio [Thu, 19 May 2022 10:12:28 +0000 (11:12 +0100)]
UploadPack: do not check reachability of visible SHA1s
When JGit needs to serve a Git client requesting SHA1s
during the want phase, it needs to make a full reachability
check from the advertised refs to the ones requested to
keep all objects in the correct scope of confidentiality
allowed by the avertised refs.
The check is also performed when the SHA1 corresponds to
one of the tips of the advertised refs which is a waste of
resources.
eric.steele [Wed, 1 Jun 2022 01:03:17 +0000 (18:03 -0700)]
AmazonS3: Add support for AWS API signature version 4
Updating the AmazonS3 class to support AWS Signature version 4 because
version 2 is no longer supported in all AWS regions. The version can be
selected with the new 'aws.api.signature.version' property (defaults to
2 for backwards compatibility). When set to '4', the user must also
specify the AWS region via the 'region' property. The 'region' property
must match the region that the 'domain' property resolves to.
Saša Živkov [Fri, 3 Jun 2022 14:36:43 +0000 (16:36 +0200)]
Fix connection leak for smart http connections
SmartHttpPushConnection: close InputStream and OutputStream after
processing. Wrap IOExceptions which aren't TransportExceptions already
as a TransportException.
Also-By: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: I8e11d899672fc470c390a455dc86367e92ef9076
Remove stray files (probes or lock files) created by background threads
NOTE: port back from master branch.
On process exit, it was possible that the filesystem timestamp
resolution measurement left behind .probe files or even a lock file
for the jgit.config.
Ensure the SAVE_RUNNER is shut down when the process exits (via
System.exit() or otherwise). Move lf.lock() into the try-finally
block when saving the config file.
Delete .probe files on JVM shutdown -- they are created in daemon
threads that may terminate abruptly, not executing the "finally"
clause that normally removes these files.
BasePackConnection::readAdvertisedRefsImpl was creating an exception by
calling `noRepository`, and then blindly calling `initCause` on it. As
`noRepository` can be overridden, it's not guaranteed to be missing a
cause.
BasePackPushConnection overrides `noRepository` and initiates a fetch,
which may throw a `NoRemoteRepositoryException` with a cause.
In this case calling `initCause` threw an `IllegalStateException`.
In order to throw the correct exception, we now return the
BasePackPushConnection exception and suppress the one thrown by
BasePackConnection
Move this test to another class and skip it when running tests with
bazel since the bazel test runner does not allow to create files in the
home directory.
FS#userHome retrieves the home directory on the first call and caches it
for subsequent calls to avoid overhead in case path translation is
required (currently on cygwin). This prevents that the test can mock the
home directory using MockSystemReader like SshTestHarness does.
Marcin Czech [Wed, 22 Dec 2021 16:42:36 +0000 (17:42 +0100)]
UploadPack v2 protocol: Stop negotiation for orphan refs
The fetch of a single orphan ref (for example Gerrit meta ref:
refs/changes/21/21/meta) did not stop the negotiation so client
had to advertise all refs. This impacts the fetch performance
on repositories with a large number of refs (for example on
Gerrit repository it takes 20 seconds to fetch meta ref
comparing to 1.2 second to fetch ref with parent).
To avoid this issue UploadPack, used on the server side,
now checks if all `want` refs have parents, if not this
means that client doesn't need any extra objects, hence
the server responds with `ready` and finishes the
negotiation phase.
Matthias Sohn [Thu, 16 Dec 2021 17:42:45 +0000 (18:42 +0100)]
Use slf4j-simple instead of log4j for logging
JGit uses slf4j-api as logging API.
The libraries
- org.eclipse.jgit.http.test
- org.eclipse.jgit.pgm
- org.eclipse.jgit.ssh.apache.test
- org.eclipse.jgit.test
used the outdated log4j 1.2.15 which is EOL since years.
Since both jgit command line and also the tests don't need sophisticated
logging features replace log4j with the much simpler slf4j-simple log
implementation. The org.slf4j.binding.simple 1.7.30 archive has only
25kB instead of 429kB for log4j 1.2.15
Applications using jgit are free to choose any other log implementation
supporting slf4j API.
Matthias Sohn [Mon, 13 Dec 2021 23:07:10 +0000 (00:07 +0100)]
Update orbit to R20211213173813
and update
- com.google.gson to 2.8.8.v20211029-0838
- javaewah to 1.1.13.v20211029-0839
- net.i2p.crypto.eddsa to 0.3.0.v20210923-1401
- org.apache.ant to 1.10.12.v20211102-1452
- org.apache.commons.compress to 1.21.0.v20211103-2100
- org.bouncycastle.bcprov to 1.69.0.v20210923-1401
- org.junit to 4.13.2.v20211018-1956