Thomas Wolf [Fri, 19 Nov 2021 18:15:46 +0000 (19:15 +0100)]
Better git system config finding
We've used "GIT_EDITOR=edit git config --system --edit" to determine
the location of the git system config for a long time. But git 2.34.0
always expects this command to have a tty, but there isn't one when
called from Java. If there isn't one, the Java process may get a
SIGTTOU from the child process and hangs.
Arguably it's a bug in C git 2.34.0 to unconditionally assume there
was a tty. But JGit needs a fix *now*, otherwise any application using
JGit will lock up if git 2.34.0 is installed on the machine.
Therefore, use a different approach if the C git found is 2.8.0 or
newer: parse the output of
git config --system --show-origin --list -z
"--show-origin" exists since git 2.8.0; it prefixes the values with
the file name of the config file they come from, which is the system
config file for this command. (This works even if the first item in
the system config is an include.)
Bug: 577358
Change-Id: I3ef170ed3f488f63c3501468303119319b03575d Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
(cherry picked from commit 9446e62733da5005be1d5182f0dce759a3052d4a)
Matthias Sohn [Fri, 15 Oct 2021 20:48:01 +0000 (22:48 +0200)]
Merge branch 'stable-5.11' into stable-5.12
* stable-5.11:
Fix missing peel-part in lsRefsV2 for loose annotated tags
reftable: drop code for truncated reads
reftable: pass on invalid object ID in conversion
Update eclipse-jarsigner-plugin to 1.3.2
Fix running benchmarks from bazel
Update eclipse-jarsigner-plugin to 1.3.2
Matthias Sohn [Fri, 15 Oct 2021 20:45:18 +0000 (22:45 +0200)]
Merge branch 'stable-5.10' into stable-5.11
* stable-5.10:
Fix missing peel-part in lsRefsV2 for loose annotated tags
reftable: drop code for truncated reads
reftable: pass on invalid object ID in conversion
Update eclipse-jarsigner-plugin to 1.3.2
Fix running benchmarks from bazel
Update eclipse-jarsigner-plugin to 1.3.2
Matthias Sohn [Fri, 15 Oct 2021 20:32:00 +0000 (22:32 +0200)]
Merge branch 'stable-5.9' into stable-5.10
* stable-5.9:
Fix missing peel-part in lsRefsV2 for loose annotated tags
reftable: drop code for truncated reads
reftable: pass on invalid object ID in conversion
Update eclipse-jarsigner-plugin to 1.3.2
Fix running benchmarks from bazel
Update eclipse-jarsigner-plugin to 1.3.2
Unfortunately, the existing UploadPackTest didn't reveal this issue
although it provided the test case needed to do so: testV2LsRefsPeel.
This is because The UploadPackTest uses InMemoryRepository which
internally uses Dfs* implementations. The issue is only reproducible
when using the FileRepository.
It is a non-trivial task to refactor the UploadPackTest to work against
both InMemoryRepository and FileRepository and this change is not trying
to do that. This change creates a new test:
UploadPackLsRefsFileRepositoryTest and copies the necesssary code from
the UploadPackTest.
Find out which branches is 2 merged into:
After we calculated that master contains 2, we can mark 6 as TEMP_MARK
to avoid unwanted walks.
When we want to figure out if 2 is merge into the topic, the traversal
path would be [7, 6] instead of [7, 6, 5, 3, 2].
Test:
This change can significantly improve performance for tags.
On a copy of the Linux repository, the command 'git tag --contains
<commit>' had the following performance improvement:
The current version ignores tags, even though the tag is a type of
the ref.
Follow-up commits I'll fix it.
Change-Id: Ie6295ca4d16070499912af462239e679a97cce47 Signed-off-by: kylezhao <kylezhao@tencent.com> Reviewed-by: Christian Halstrick <christian.halstrick@sap.com> Reviewed-by: Martin Fick <mfick@codeaurora.org>
The reftable format is a block based format, but allows for variably
sized blocks. This obviously happens for reflog blocks (which are zlib
compressed), but is also accepted for index blocks: In the spec, this
is motivated as
To achieve constant O(1) disk seeks for lookups the index must be
a single level, which is permitted to exceed the file's
configured block size, but not the format's max block size of
15.99 MiB.
Hence, when parsing a block, one cannot be sure of its exact size:
after reading a default-size block (eg. 4kb), the block header may
state that the block is in fact larger.
Before, the code would mark the block as `truncated`, noting
// Its OK during sequential scan for an index block to have been
// partially read and be truncated in-memory. This happens when
// the index block is larger than the file's blockSize. Caller
// will break out of its scan loop once it sees the blockType.
This looks like either
* a remnant of never-implemented functionality. There is no reason to
ever sequentially scan an index block.
* alluding to sequential scan of the data blocks before the index
blocks (eg. scanning refs, which ends when we find the first ref index
block, and we can then ignore the index block).
This comment is followed by code that populates the
restartTbl/restartCnt fields relative to the (possibly truncated)
buffer. If the buffer is truncated, this essentially reads garbage,
leading to OOB array access when using the index block.
Fix this by dropping the truncated logic and issuing a second read if
the first read was short.
Add a test.
We have never observed this failure scenario at Google. We use 64kb
blocksize, which requires us to need fewer index entries. The reftable
spec mentions an Android repo of size 36M. With 64kb blocks, that's
just 562 index entries. Even with historical growth, we are long from
requiring an index whose size exceeds a single block.
When adding the analogous test for seeking refs, there was no failure.
This points to another possibility which is that the code tries to
avoid writing large index blocks for refs.
Antonio Barone [Wed, 1 Sep 2021 10:04:07 +0000 (12:04 +0200)]
GitServlet: allow to override default error handlers
GitServlet delegates repository access over HTTP to the GitFilter
servlet.
GitServlet, in turn, can be extended by jgit consumers to provide custom
logic when handling such operations.
This is the case, for example, with Gerrit Code Review, which provides
custom behavior with a GitOverHttpServlet [1].
Among possible customizations, the ability of specifying a custom error
handler for UploadPack and ReceivePack was already introduced in
GitFilter by Idd3b87d6b and I9c708aa5a2, respectively.
However the `setUploadPackErrorHandler` and `setReceivePackErrorHandler`
methods were never added to the GitServlet.
Expose the `setUploadPackErrorHandler` and `setReceivePackErrorHandler`
methods to the GitServlet, so that consumers of the jgit library might
specify custom error handlers.
Thomas Wolf [Sun, 25 Jul 2021 13:44:35 +0000 (15:44 +0200)]
[test] Create keystore with the keytool of the running JDK
Call keytool with the absolute path of "java.home". Otherwise a keytool
for a different, maybe even newer Java version might be picked up, and
then the keystore may not be readable by the JVM used to run the tests.
Antonio Barone [Wed, 2 Jun 2021 15:13:17 +0000 (18:13 +0300)]
Retry loose object read upon "Stale file handle" exception
When reading loose objects over NFS it is possible that the OS syscall
would fail with ESTALE errors: This happens when the open file
descriptor no longer refers to a valid file.
Notoriously it is possible to hit this scenario when git data is shared
among multiple clients, for example by multiple gerrit instances in HA.
If one of the two clients performs a GC operation that would cause the
packing and then the pruning of loose objects, the other client might
still hold a reference to those objects, which would cause an exception
to bubble up the stack.
The Linux NFS FAQ[1] (at point A.10), suggests that the proper way to
handle such ESTALE scenarios is to:
"[...] close the file or directory where the error occurred, and reopen
it so the NFS client can resolve the pathname again and retrieve the new
file handle."
In case of a stale file handle exception, we now attempt to read the
loose object again (up to 5 times), until we either succeed or encounter
a FileNotFoundException, in which case the search can continue to
Packfiles and alternates.
The limit of 5 provides an arbitrary upper bounds that is consistent to
the one chosen when handling stale file handles for packed-refs
files (see [2] for context).
andrewxian2000 [Mon, 14 Jun 2021 21:58:52 +0000 (09:58 +1200)]
Fix garbage collection failing to delete pack file
The loosen() method has opened pack file and the open pack file handle
may prevent it from being deleted e.g. on Windows. Fix this by closing
the pack file only after loosen() finished.
Bug: 574178
Change-Id: Icd59931a218d84c9c97b450eea87b21ed01248ff Signed-off-by: andrew.xian2000@gmail.com Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Tue, 1 Jun 2021 14:36:43 +0000 (16:36 +0200)]
Merge branch 'master' into stable-5.12
* master:
Update Orbit to S20210518003616 and ant to 1.10.10.v20210426-1926
Skip detecting content renames for binary files
RepoCommand: Retry commit on LockFailure
Finish upgrading eclipse-jarsigner-plugin to 1.3.1
Upgrade maven-javadoc-plugin to 3.3.0
Update maven-project-info-reports-plugin to 3.1.2
Update spotbugs-maven-plugin to 4.2.3
This is similar to change Idbc2c29bd that skipped detecting content
renames for large files. With this change, we added a new option in
RenameDetector called "skipContentRenamesForBinaryFiles", that when set,
causes binary files with any slight modification to be identified as
added/deleted. The default for this boolean is false, so preserving
current behaviour.
* changes:
Finish upgrading eclipse-jarsigner-plugin to 1.3.1
Upgrade maven-javadoc-plugin to 3.3.0
Update maven-project-info-reports-plugin to 3.1.2
Update spotbugs-maven-plugin to 4.2.3
Ivan Frade [Mon, 24 May 2021 16:53:08 +0000 (09:53 -0700)]
RepoCommand: Retry commit on LockFailure
When the target repository is receiving commits from other sources,
the repo command commit can fail with a LOCK_FAILURE. We could let
callers retry, but then the command needs to redo all the work (opening
all subrepos to recreate the tree).
Retry the commit in LOCK_FAILURE inside the command. The commit
rewrites the whole tree, so it shouldn't have merge errors. Use an
exponential delay with jitter for the retries.
Change-Id: I517b6f2afd16a4b695e6cf471b5d6cf492024ec4 Signed-off-by: Ivan Frade <ifrade@google.com>
Matthias Sohn [Wed, 26 May 2021 15:21:02 +0000 (17:21 +0200)]
Merge branch 'master' into stable-5.12
* master:
RepoCommand: Do not set 'branch' if the revision is a tag
pgm: rewrite parents when --parents flag is passed
ApplyCommand: fix "no newline at end" detection
ApplyCommand: handle completely empty context lines in text patches
ApplyCommand: use byte arrays for text patches, not strings
ApplyCommand: support binary patches
ApplyCommand: add a stream to apply a delta patch
ApplyCommand: add streams to read/write binary patch hunks
ApplyCommand: add a base-85 codec
ApplyCommand: convert to git internal format before applying patch
SSH config: fix whitespace handling
SSH config: fix negated patterns
Fix @since tag for introduction of PUBKEY_ACCEPTED_ALGORITHMS
Prepare 5.11.2-SNAPSHOT builds
JGit v5.11.1.202105131744-r
Add a cgit interoperability test for LockFile
Add TemporaryBuffer.toString(int limit)
LockFile: create OutputStream only when needed
Add git config for conflict style merge/diff3
Change-Id: If7751ff99079eaea31ed1fce811d141ecf209727 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Ivan Frade [Tue, 25 May 2021 00:31:25 +0000 (17:31 -0700)]
RepoCommand: Do not set 'branch' if the revision is a tag
The "branch" field in the .gitmodules is the signal for gerrit to keep
the superproject autoupdated. Tags are immutable and there is no need to
track them, plus the cgit client requires the field to be a "remote
branch name" but not a tag.
Do not set the "branch" field if the revision is a tag. Keep those tags
in another field ("ref") as they help other tools to find the commit in
the destination repository.
We can still have false negatives when a refname is not fully qualified,
but this check covers e.g. the most common case in android.
Note that the javadoc of #setRecordRemoteBranch already mentions that
"submodules that request a tag will not have branch name recorded".
Change-Id: Ib1c321a4d3b7f8d51ca2ea204f72dc0cfed50c37 Signed-off-by: Ivan Frade <ifrade@google.com>
Matthias Sohn [Wed, 26 May 2021 12:12:17 +0000 (08:12 -0400)]
Merge changes from topic "apply"
* changes:
ApplyCommand: fix "no newline at end" detection
ApplyCommand: handle completely empty context lines in text patches
ApplyCommand: use byte arrays for text patches, not strings
ApplyCommand: support binary patches
ApplyCommand: add a stream to apply a delta patch
ApplyCommand: add streams to read/write binary patch hunks
ApplyCommand: add a base-85 codec
ApplyCommand: convert to git internal format before applying patch
kylezhao [Tue, 30 Mar 2021 03:04:12 +0000 (11:04 +0800)]
pgm: rewrite parents when --parents flag is passed
According to [1], we should rewrite parents in RevWalkTextBuiltin
when variable parents is true.
[1]
https://git-scm.com/docs/git-rev-list#Documentation/git-rev-list.txt---parents
Thomas Wolf [Wed, 10 Mar 2021 18:26:39 +0000 (19:26 +0100)]
ApplyCommand: fix "no newline at end" detection
Check the last line of the last hunk of a file, not the last line of
the whole patch.
Note that C git only checks that this line starts with "\ " and is at
least 12 characters long because of possible different texts when non-
English messages are used.
Change-Id: I0db81699eb3e99ed7b536a3e2b8dc97df1f58a89 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Wed, 10 Mar 2021 17:04:25 +0000 (18:04 +0100)]
ApplyCommand: handle completely empty context lines in text patches
C git treats completely empty lines as empty context lines (which
traditionally have a single blank). Apparently newer GNU diff may
produce such lines; see [1]. ("Newer" meaning "since 2006"...)
Thomas Wolf [Wed, 10 Mar 2021 13:25:37 +0000 (14:25 +0100)]
ApplyCommand: use byte arrays for text patches, not strings
Instead of converting the patch bytes to strings apply the patch on
byte level, like C git does. Converting the input lines and the hunk
lines from bytes to strings and then applying the patch based on
strings may give surprising results if a patch converts a text file
from one encoding to another. Moreover, in the end we don't know which
encoding to use to write the result.
Previous code just wrote the result as UTF-8, which forcibly changed
the encoding if the original input had some other encoding (even if the
patch had the same non-UTF-8 encoding). It was also wrong if the input
was UTF-8, and the patch should have changed the encoding to something
else.
So use ByteBuffers instead of Strings. This has the additional advantage
that all these ByteBuffers can share the underlying byte arrays of the
input and of the patch, so it also reduces memory consumption.
Change-Id: I450975f2ba0e7d0bec8973e3113cc2e7aea187ee Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Sun, 7 Mar 2021 17:50:23 +0000 (18:50 +0100)]
ApplyCommand: support binary patches
Implement applying binary patches. Handles both literal and delta
patches. Note that C git also runs binary files through the clean
and smudge filters. Implement the same safeguards against corrupted
patches as in C git: require the full OIDs to be present in the patch
file, and apply a binary patch only if both pre- and post-image hashes
match.
Add tests for applying literal and delta patches.
Bug: 371725
Change-Id: I71dc214fe4145d7cc8e4769384fb78c7d0d6c220 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Sun, 7 Mar 2021 17:49:01 +0000 (18:49 +0100)]
ApplyCommand: add a stream to apply a delta patch
Add a new BinaryDeltaInputStream that applies a delta provided by
another InputStream to a given base. Because delta application needs
random access to the base, the base itself cannot be yet another
InputStream. But at least this enables streaming of the result.
Add a simple test using delta hunks generated by C git.
Bug: 371725
Change-Id: Ibd26fa2f49860737ad5c5387f7f4870d3e85e628 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Thomas Wolf [Fri, 5 Mar 2021 23:00:15 +0000 (00:00 +0100)]
ApplyCommand: add streams to read/write binary patch hunks
Add streams that can encode or decode git binary patch data on the fly.
Git writes binary patches base-85 encoded, at most 52 un-encoded bytes,
with the unencoded data length prefixed in a one-character encoding, and
suffixed with a newline character.
Add a test for both the new input and the output stream. The test
roundtrips binary data of different lengths in different ways.
Bug: 371725
Change-Id: Ic3faebaa4637520f5448b3d1acd78d5aaab3907a Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Fri, 5 Mar 2021 22:55:18 +0000 (23:55 +0100)]
ApplyCommand: add a base-85 codec
Add an implementation for base-85 encoding and decoding [1]. Git binary
patches use this format.
Base-85 encoding assembles bytes as 32-bit MSB values, then converts
these values to base-85 numbers (always 5 bytes) encoded as printable
ASCII characters. Decoding base-85 is the reverse operation. Note
that decoding may overflow on invalid input as 85^5 > 2^32. Encodings
always have a length that is a multiple of 5. If input length is not
divisible by 4, padding bytes are (logically) added, which are ignored
when decoding. The encoding for n bytes has thus always exactly length
(n + 3) / 4 * 5 in integer arithmetic (truncating division).
Includes tests.
[1] https://datatracker.ietf.org/doc/html/rfc1924
Bug: 371725
Change-Id: Ib5b9a503cd62cf70e080a4fb38c8cd1eeeaebcfe Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
BatchRefUpdate: Skip saving conflicting ref names and prefixes in memory
Rather than getting all ref names and prefixes and saving them
in memory to perform the check for conflicting names, rely on
RefDirectory.isNameConflicting as it is no longer an expensive
call after it was optimized in Ie994fc.
The old optimization to save ref names and prefixes in memory
was targeted towards making clones faster. With this change,
the clone performance is unaffected when tests were done with
repos containing many(~500k) refs.
Here are few recorded elapsed times for creating 10 branches
using BatchRefUpdate on NFS based repositories with varying
loose refs count. As seen here, this change helps improve the
BatchRefUpdate performance from O(n^2) to O(1).
loose_refs_count with_change without_change
50 241 ms 310 ms
300 263 ms 1502 ms
1k 181 ms 4241 ms
2k 204 ms 6440 ms
9k 158 ms 25930 ms
20k 154 ms 60443 ms
50k 171 ms 135199 ms
110k 157 ms 329450 ms
160k 209 ms 396328 ms
This update improves the Gerrit notedb migration performance
as it uses BatchRefUpdate to write change meta refs similar to
the test performed above.