Marcin Czech [Wed, 22 Dec 2021 16:42:36 +0000 (17:42 +0100)]
UploadPack v2 protocol: Stop negotiation for orphan refs
Fetching a single orphan ref (for example the Gerrit meta ref
refs/changes/21/21/meta) did not stop the negotiation, so the client
had to advertise all refs. This hurts fetch performance
on repositories with a large number of refs (for example, on a
Gerrit repository it takes 20 seconds to fetch the meta ref
compared to 1.2 seconds to fetch a ref with a parent).
To avoid this issue, UploadPack, which is used on the server side,
now checks whether all `want` refs have parents. If they do not,
the client doesn't need any extra objects, so
the server responds with `ready` and finishes the
negotiation phase.
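The check can be sketched in plain Java as follows. This is only an illustration, not the UploadPack code: the class `OrphanWantsSketch`, the method `canStopNegotiation` and the map-based commit graph are hypothetical, and the sketch assumes that "orphan" means a wanted commit without parents and that negotiation may stop only when every wanted commit is an orphan.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class OrphanWantsSketch {
    /**
     * Returns true when every wanted commit has no parents (assumption:
     * that is what "orphan" means here). In that case there is no history
     * the client could already share, so the server could answer "ready".
     */
    static boolean canStopNegotiation(Set<String> wants,
            Map<String, List<String>> parentsOf) {
        if (wants.isEmpty()) {
            return false; // nothing requested, nothing to decide
        }
        for (String want : wants) {
            List<String> parents = parentsOf.getOrDefault(want, List.of());
            if (!parents.isEmpty()) {
                return false; // this want has history, keep negotiating
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical graph: "meta" is an orphan, "tip" has parent "base".
        Map<String, List<String>> graph = Map.of(
                "meta", List.of(),
                "tip", List.of("base"),
                "base", List.of());
        System.out.println(canStopNegotiation(Set.of("meta"), graph));
        System.out.println(canStopNegotiation(Set.of("meta", "tip"), graph));
    }
}
```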
Matthias Sohn [Fri, 3 Dec 2021 21:45:05 +0000 (22:45 +0100)]
Update maven plugins
- build-helper-maven-plugin to 3.2.0
- jacoco-maven-plugin to 0.8.7
- maven-antrun-plugin to 3.0.0
- maven-dependency-plugin to 3.2.0
- maven-enforcer-plugin to 3.0.0
- maven-jar-plugin to 3.2.0
- maven-javadoc-plugin to 3.3.1
- maven-jxr-plugin to 3.1.1
- maven-pmd-plugin to 3.15.0
- maven-project-info-reports-plugin to 3.1.2
- maven-resources-plugin to 3.2.0
- maven-shade-plugin to 3.2.4
- maven-site-plugin to 3.9.1
- maven-source-plugin to 3.2.1
- maven-surefire-plugin to 3.0.0-M5
- spotbugs-maven-plugin to 4.3.0
- tycho and tycho-extras to 1.7.0
RefDirectory.scanRef: Re-use file existence check done in snapshot creation
Return immediately in scanRef if the loose ref was identified as
missing when a snapshot was attempted for the ref. This will help
performance of scanRef when the ref is packed but has a corresponding
empty dir in 'refs/'.
For example, consider the case where we create 50k sharded refs in
a new namespace called 'new-refs' using an atomic 'BatchRefUpdate'.
The refs are named like 'refs/new-refs/01/1/1', 'refs/new-refs/01/1/2',
'refs/new-refs/01/1/3' and so on. After the refs are created, the
'new-refs' namespace looks like below:
$ find refs/new-refs -type f | wc -l
0
$ find refs/new-refs -type d | wc -l
5101
At this point, an 'exactRef' call on each of the 50k refs without
this change takes ~2.5s, whereas with this change it takes ~1.5s.
FileSnapshot: Lazy load file store attributes cache
Doing a getFileStoreAttributes call even when the file doesn't
exist is unnecessary. This call is particularly slow on some
filesystems. Instead, do it only when the file exists and load
the appropriate cache.
This update can help speed up RefDirectory.exactRef when the ref
is packed, but has a corresponding empty dir for it under 'refs/'.
This scenario can happen when an atomic 'BatchRefUpdate' creates
new sharded refs.
For example, consider the case where we create 50k sharded refs in
a new namespace called 'new-refs' using an atomic 'BatchRefUpdate'.
The refs are named like 'refs/new-refs/01/1/1', 'refs/new-refs/01/1/2',
'refs/new-refs/01/1/3' and so on. After the refs are created, the
'new-refs' namespace looks like below:
$ find refs/new-refs -type f | wc -l
0
$ find refs/new-refs -type d | wc -l
5101
At this point, an 'exactRef' call on each of the 50k refs without
this change takes ~30s, whereas with this change it takes ~2.5s.
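The lazy-loading idea can be sketched as follows. This is a simplified illustration and not the FileSnapshot code: the class `LazyAttributesSketch`, its `loader` function and the use of a plain String for the attributes are hypothetical stand-ins for the expensive file store attributes lookup.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Function;

final class LazyAttributesSketch {
    private final Path path;
    // Hypothetical stand-in for the slow file store attributes lookup.
    private final Function<Path, String> loader;
    // Stays null until the attributes are actually needed.
    private String attributes;

    LazyAttributesSketch(Path path, Function<Path, String> loader) {
        this.path = path;
        this.loader = loader;
    }

    /**
     * Returns the attributes, or null if the file does not exist. The
     * expensive loader runs at most once, and only for an existing file.
     */
    String attributesIfPresent() {
        if (!Files.exists(path)) {
            return null; // missing file: skip the expensive call entirely
        }
        if (attributes == null) {
            attributes = loader.apply(path);
        }
        return attributes;
    }

    public static void main(String[] args) {
        Function<Path, String> slowLoader = p -> {
            System.out.println("loading attributes for " + p);
            return "attributes of " + p;
        };
        // A packed ref has no loose file, so the loader is never invoked.
        LazyAttributesSketch missing = new LazyAttributesSketch(
                Paths.get("refs", "new-refs", "01", "1", "1"), slowLoader);
        System.out.println(missing.attributesIfPresent());
    }
}
```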
Thomas Wolf [Sun, 28 Nov 2021 10:42:20 +0000 (11:42 +0100)]
FS: debug logging only if system config file cannot be found
The command 'git config --system --show-origin --list -z' fails if
the system config doesn't exist. Use debug logging instead of a
warning for failures of that command. Typically the user cannot do
anything about it anyway, and JGit will just work without system
config.
Bug: 577492
Change-Id: If628ab376182183aea57a385c169e144d371bbb2
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Fri, 19 Nov 2021 18:15:46 +0000 (19:15 +0100)]
Better git system config finding
We've used "GIT_EDITOR=edit git config --system --edit" to determine
the location of the git system config for a long time. But git 2.34.0
always expects this command to have a tty, and there isn't one when
called from Java. In that case, the Java process may get a
SIGTTOU from the child process and hang.
Arguably it's a bug in C git 2.34.0 to unconditionally assume there
is a tty. But JGit needs a fix *now*, otherwise any application using
JGit will lock up if git 2.34.0 is installed on the machine.
Therefore, use a different approach if the C git found is 2.8.0 or
newer: parse the output of
git config --system --show-origin --list -z
"--show-origin" exists since git 2.8.0; it prefixes the values with
the file name of the config file they come from, which is the system
config file for this command. (This works even if the first item in
the system config is an include.)
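Extracting the file name from that output can be sketched as below. This is a minimal illustration rather than the code in FS: the class `SystemConfigPathSketch` and the method `systemConfigPath` are hypothetical, and the sketch assumes that with "-z" the output starts with an origin of the form "file:<path>" terminated by a NUL character.

```java
final class SystemConfigPathSketch {
    /**
     * Returns the path in the first origin of the given command output,
     * or null if the output is empty or not in the assumed format.
     */
    static String systemConfigPath(String output) {
        if (output == null) {
            return null;
        }
        int nul = output.indexOf('\0');
        if (nul <= 0) {
            return null; // no origin found, e.g. empty output
        }
        String origin = output.substring(0, nul);
        // Strip the scheme prefix; a later colon (for example in a
        // Windows drive letter) stays part of the path.
        int colon = origin.indexOf(':');
        if (colon < 0) {
            return null;
        }
        return origin.substring(colon + 1);
    }

    public static void main(String[] args) {
        // Hypothetical sample of the assumed format:
        // origin NUL key LF value NUL
        String sample = "file:/etc/gitconfig\0core.autocrlf\nfalse\0";
        System.out.println(systemConfigPath(sample)); // prints /etc/gitconfig
    }
}
```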
Bug: 577358
Change-Id: I3ef170ed3f488f63c3501468303119319b03575d
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
(cherry picked from commit 9446e62733da5005be1d5182f0dce759a3052d4a)
Matthias Sohn [Fri, 15 Oct 2021 20:48:01 +0000 (22:48 +0200)]
Merge branch 'stable-5.11' into stable-5.12
* stable-5.11:
Fix missing peel-part in lsRefsV2 for loose annotated tags
reftable: drop code for truncated reads
reftable: pass on invalid object ID in conversion
Update eclipse-jarsigner-plugin to 1.3.2
Fix running benchmarks from bazel
Update eclipse-jarsigner-plugin to 1.3.2
Matthias Sohn [Fri, 15 Oct 2021 20:45:18 +0000 (22:45 +0200)]
Merge branch 'stable-5.10' into stable-5.11
* stable-5.10:
Fix missing peel-part in lsRefsV2 for loose annotated tags
reftable: drop code for truncated reads
reftable: pass on invalid object ID in conversion
Update eclipse-jarsigner-plugin to 1.3.2
Fix running benchmarks from bazel
Update eclipse-jarsigner-plugin to 1.3.2
Matthias Sohn [Fri, 15 Oct 2021 20:32:00 +0000 (22:32 +0200)]
Merge branch 'stable-5.9' into stable-5.10
* stable-5.9:
Fix missing peel-part in lsRefsV2 for loose annotated tags
reftable: drop code for truncated reads
reftable: pass on invalid object ID in conversion
Update eclipse-jarsigner-plugin to 1.3.2
Fix running benchmarks from bazel
Update eclipse-jarsigner-plugin to 1.3.2
Unfortunately, the existing UploadPackTest didn't reveal this issue
although it provided the test case needed to do so: testV2LsRefsPeel.
This is because the UploadPackTest uses InMemoryRepository which
internally uses Dfs* implementations. The issue is only reproducible
when using the FileRepository.
It is a non-trivial task to refactor the UploadPackTest to work against
both InMemoryRepository and FileRepository and this change is not trying
to do that. This change creates a new test:
UploadPackLsRefsFileRepositoryTest and copies the necessary code from
the UploadPackTest.
Find out which branches 2 is merged into:
After we have calculated that master contains 2, we can mark 6 as TEMP_MARK
to avoid unwanted walks.
When we want to figure out whether 2 is merged into the topic, the traversal
path would be [7, 6] instead of [7, 6, 5, 3, 2].
Test:
This change can significantly improve performance for tags.
On a copy of the Linux repository, the command 'git tag --contains
<commit>' had the following performance improvement:
The current version ignores tags, even though a tag is a type of
ref.
I'll fix that in follow-up commits.
Change-Id: Ie6295ca4d16070499912af462239e679a97cce47
Signed-off-by: kylezhao <kylezhao@tencent.com>
Reviewed-by: Christian Halstrick <christian.halstrick@sap.com>
Reviewed-by: Martin Fick <mfick@codeaurora.org>
The reftable format is a block based format, but allows for variably
sized blocks. This obviously happens for reflog blocks (which are zlib
compressed), but is also accepted for index blocks: In the spec, this
is motivated as
To achieve constant O(1) disk seeks for lookups the index must be
a single level, which is permitted to exceed the file's
configured block size, but not the format's max block size of
15.99 MiB.
Hence, when parsing a block, one cannot be sure of its exact size:
after reading a default-size block (eg. 4kb), the block header may
state that the block is in fact larger.
Before, the code would mark the block as `truncated`, noting
// Its OK during sequential scan for an index block to have been
// partially read and be truncated in-memory. This happens when
// the index block is larger than the file's blockSize. Caller
// will break out of its scan loop once it sees the blockType.
This looks like either
* a remnant of never-implemented functionality. There is no reason to
ever sequentially scan an index block.
* alluding to sequential scan of the data blocks before the index
blocks (eg. scanning refs, which ends when we find the first ref index
block, and we can then ignore the index block).
This comment is followed by code that populates the
restartTbl/restartCnt fields relative to the (possibly truncated)
buffer. If the buffer is truncated, this essentially reads garbage,
leading to OOB array access when using the index block.
Fix this by dropping the truncated logic and issuing a second read if
the first read was short.
Add a test.
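The "second read" approach can be sketched as follows. This is a simplified illustration and not the reftable reader: the class `BlockReadSketch` and its methods are hypothetical, the file is modelled as a byte array, and the sketch assumes a block header made of one type byte followed by a 24-bit big-endian block length, ignoring details such as the file header in the first block.

```java
import java.util.Arrays;

final class BlockReadSketch {
    /**
     * Reads the block starting at pos. The first read fetches defaultSize
     * bytes; if the header declares a larger block, the block is read
     * again with the declared size instead of being used truncated.
     */
    static byte[] readBlock(byte[] file, int pos, int defaultSize) {
        byte[] buf = read(file, pos, defaultSize);
        if (buf.length < 4) {
            throw new IllegalArgumentException("short block header");
        }
        // Assumed header layout: type byte, then uint24 block length.
        int declared = ((buf[1] & 0xff) << 16)
                | ((buf[2] & 0xff) << 8)
                | (buf[3] & 0xff);
        if (declared > buf.length) {
            buf = read(file, pos, declared); // second read, full block
        }
        if (buf.length < declared) {
            throw new IllegalArgumentException("file ends inside block");
        }
        return Arrays.copyOf(buf, declared);
    }

    // Copies up to len bytes starting at pos, stopping at the end of file.
    private static byte[] read(byte[] file, int pos, int len) {
        int end = Math.min(file.length, pos + len);
        return Arrays.copyOfRange(file, pos, end);
    }

    public static void main(String[] args) {
        byte[] file = new byte[10];
        file[0] = 'i'; // block type
        file[3] = 10;  // declared block length: 10 bytes
        // The default read size of 4 is too small, so a second read happens.
        System.out.println(readBlock(file, 0, 4).length); // prints 10
    }
}
```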
We have never observed this failure scenario at Google. We use a 64kb
blocksize, which means we need fewer index entries. The reftable
spec mentions an Android repo of size 36M. With 64kb blocks, that's
just 562 index entries. Even with historical growth, we are far from
requiring an index whose size exceeds a single block.
When adding the analogous test for seeking refs, there was no failure.
This points to another possibility which is that the code tries to
avoid writing large index blocks for refs.
Antonio Barone [Wed, 1 Sep 2021 10:04:07 +0000 (12:04 +0200)]
GitServlet: allow to override default error handlers
GitServlet delegates repository access over HTTP to the GitFilter
servlet.
GitServlet, in turn, can be extended by jgit consumers to provide custom
logic when handling such operations.
This is the case, for example, with Gerrit Code Review, which provides
custom behavior with a GitOverHttpServlet [1].
Among the possible customizations, the ability to specify a custom error
handler for UploadPack and ReceivePack was already introduced in
GitFilter by Idd3b87d6b and I9c708aa5a2, respectively.
However, the `setUploadPackErrorHandler` and `setReceivePackErrorHandler`
methods were never added to GitServlet.
Expose the `setUploadPackErrorHandler` and `setReceivePackErrorHandler`
methods on GitServlet, so that consumers of the jgit library can
specify custom error handlers.
Thomas Wolf [Sun, 25 Jul 2021 13:44:35 +0000 (15:44 +0200)]
[test] Create keystore with the keytool of the running JDK
Call keytool with the absolute path of "java.home". Otherwise a keytool
for a different, maybe even newer Java version might be picked up, and
then the keystore may not be readable by the JVM used to run the tests.
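Resolving keytool relative to the running JDK can be sketched like this. It is an illustration only, not the test code: the class `KeytoolPathSketch` and the method `keytoolOfRunningJdk` are hypothetical, and the sketch assumes the usual JDK layout with the tool located in the "bin" directory below "java.home".

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

final class KeytoolPathSketch {
    /**
     * Builds the absolute path of the keytool that belongs to the JVM
     * running this code, instead of relying on whatever is on the PATH.
     */
    static Path keytoolOfRunningJdk() {
        String os = System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT);
        String exe = os.startsWith("windows") ? "keytool.exe" : "keytool";
        return Paths.get(System.getProperty("java.home"), "bin", exe)
                .toAbsolutePath();
    }

    public static void main(String[] args) {
        // The result can be used as the first argument of a ProcessBuilder.
        System.out.println(keytoolOfRunningJdk());
    }
}
```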
Antonio Barone [Wed, 2 Jun 2021 15:13:17 +0000 (18:13 +0300)]
Retry loose object read upon "Stale file handle" exception
When reading loose objects over NFS it is possible that the OS syscall
would fail with ESTALE errors: This happens when the open file
descriptor no longer refers to a valid file.
Notoriously it is possible to hit this scenario when git data is shared
among multiple clients, for example by multiple gerrit instances in HA.
If one of the two clients performs a GC operation that would cause the
packing and then the pruning of loose objects, the other client might
still hold a reference to those objects, which would cause an exception
to bubble up the stack.
The Linux NFS FAQ[1] (at point A.10), suggests that the proper way to
handle such ESTALE scenarios is to:
"[...] close the file or directory where the error occurred, and reopen
it so the NFS client can resolve the pathname again and retrieve the new
file handle."
In case of a stale file handle exception, we now attempt to read the
loose object again (up to 5 times), until we either succeed or encounter
a FileNotFoundException, in which case the search can continue to
Packfiles and alternates.
The limit of 5 provides an arbitrary upper bound that is consistent with
the one chosen when handling stale file handles for packed-refs
files (see [2] for context).
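The retry behaviour can be sketched as below. This is a minimal illustration, not the JGit implementation: the class `StaleRetrySketch`, the `Reader` interface, the constant name and the message-based `isStale` check are all hypothetical, and the sketch treats the limit of 5 as the total number of read attempts.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Locale;

final class StaleRetrySketch {
    // Hypothetical constant; the value 5 is the limit named above.
    static final int MAX_ATTEMPTS = 5;

    /** Hypothetical callback that performs one loose object read. */
    interface Reader<T> {
        T read() throws IOException;
    }

    // Assumption: a stale handle is recognisable from the error message.
    static boolean isStale(IOException e) {
        String msg = e.getMessage();
        return msg != null
                && msg.toLowerCase(Locale.ROOT).contains("stale file handle");
    }

    /**
     * Retries the read while it fails with a stale file handle. Returns
     * null on FileNotFoundException so the caller can continue the
     * search elsewhere (for example in pack files and alternates).
     */
    static <T> T readWithRetry(Reader<T> reader) throws IOException {
        for (int attempt = 1;; attempt++) {
            try {
                return reader.read();
            } catch (FileNotFoundException e) {
                return null; // object is gone: not an error for the caller
            } catch (IOException e) {
                if (!isStale(e) || attempt >= MAX_ATTEMPTS) {
                    throw e; // other failure, or retries exhausted
                }
                // otherwise loop and reopen the file on the next attempt
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] calls = { 0 };
        String result = StaleRetrySketch.<String> readWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new IOException("Stale file handle");
            }
            return "object data";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```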
andrewxian2000 [Mon, 14 Jun 2021 21:58:52 +0000 (09:58 +1200)]
Fix garbage collection failing to delete pack file
The loosen() method opens the pack file, and the open pack file handle
may prevent the file from being deleted, e.g. on Windows. Fix this by closing
the pack file only after loosen() has finished.
Bug: 574178
Change-Id: Icd59931a218d84c9c97b450eea87b21ed01248ff
Signed-off-by: andrew.xian2000@gmail.com
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>