BatchRefUpdate: Skip saving conflicting ref names and prefixes in memory
Rather than getting all ref names and prefixes and saving them
in memory to perform the check for conflicting names, rely on
RefDirectory.isNameConflicting as it is no longer an expensive
call after it was optimized in Ie994fc.
The old optimization to save ref names and prefixes in memory
was targeted towards making clones faster. With this change,
the clone performance is unaffected when tests were done with
repos containing many(~500k) refs.
Here are few recorded elapsed times for creating 10 branches
using BatchRefUpdate on NFS based repositories with varying
loose refs count. As seen here, this change helps improve the
BatchRefUpdate performance from O(n^2) to O(1).
loose_refs_count with_change without_change
50 241 ms 310 ms
300 263 ms 1502 ms
1k 181 ms 4241 ms
2k 204 ms 6440 ms
9k 158 ms 25930 ms
20k 154 ms 60443 ms
50k 171 ms 135199 ms
110k 157 ms 329450 ms
160k 209 ms 396328 ms
This update improves the Gerrit notedb migration performance
as it uses BatchRefUpdate to write change meta refs similar to
the test performed above.
Update tests to record the number of events fired post-setup and only
assert for events fired during BatchRefUpdate.execute. For tests which
use writeLooseRef to setup refs, create new tests which assert the
number of RefsChangedEvent(s) rather than updating the existing ones
to call RefDirectory.exactRef as it changes the code path.
Avoid having to scan over ALL loose refs to determine if the
name is nested within or is a container of an existing reference.
This can get really expensive if there are too many loose refs.
Instead use exactRef and getRefsByPrefix which scan based on a
prefix.
With a simple shell script(like below) using jgit client to create
1k refs in a new repository on NFS, this change brings down the time
from 12mins to 7mins.
for ref in $(seq 1 1000); do
jgit branch "$ref"
done
Here are few recorded elapsed times to create a new branch on NFS
based repositories with varying loose refs count. As we see here,
this change improves the name conflicting check from O(n^2) to O(1).
loose_refs_count with_change without_change
50 44 ms 164 ms
300 45 ms 1193 ms
1k 38 ms 2610 ms
2k 44 ms 6003 ms
9k 46 ms 27860 ms
20k 45 ms 48591 ms
50k 51 ms 135471 ms
110k 43 ms 294252 ms
160k 52 ms 430976 ms
Matthias Sohn [Mon, 10 May 2021 22:56:57 +0000 (00:56 +0200)]
Merge branch 'stable-5.9' into stable-5.10
* stable-5.9:
LockFile: create OutputStream only when needed
Remove ReftableNumbersNotIncreasingException
Fix stamping to produce stable file timestamps
Thomas Wolf [Tue, 4 May 2021 21:48:56 +0000 (23:48 +0200)]
LockFile: create OutputStream only when needed
Don't create the stream eagerly in lock(); that may cause JGit to
exceed OS or JVM limits on open file descriptors if many locks need
to be created, for instance when creating many refs. Instead create
the output stream only when one really needs to write something.
Bug: 573328
Change-Id: If9441ed40494d46f594a896d34a5c4f56f91ebf4 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Fri, 19 Mar 2021 08:35:34 +0000 (09:35 +0100)]
sshd: try all configured signature algorithms for a key
For RSA keys, there may be several configured signature algorithms:
rsa-sha2-512, rsa-sha2-256, and ssh-rsa. Upstream sshd has bug
SSHD-1105 [1] and always and unconditionally uses only the first
configured algorithm. With the default order, this means that it cannot
connect to a server that knows only ssh-rsa, like for instance Apache
MINA sshd servers older than 2.6.0.
This affects for instance bitbucket.org or also AWS Code Commit.
Re-introduce our own pubkey authenticator that fixes this.
Note that a server may impose a penalty (back-off delay) for subsequent
authentication attempts with signature algorithms unknown to the server.
In such cases, users can re-order the signature algorithm list via the
PubkeyAcceptedAlgorithms (formerly PubkeyAcceptedKeyTypes) ssh config.
Apache MINA sshd 2.6.0 appears to use only the first appropriate
public key signature algorithm for a particular key. See [1]. For
RSA keys, that is rsa-sha2-512. This breaks authentication at servers
that only know the older (and deprecated) ssh-rsa algorithm.
With PubkeyAcceptedAlgorithms, users can re-order algorithms in
the ssh config file per host, if needed. Setting
PubkeyAcceptedAlgorithms ^ssh-rsa
will put "ssh-rsa" at the front of the list of algorithms, and then
authentication at such servers with RSA keys works again.
Matthias Sohn [Tue, 9 Mar 2021 17:00:55 +0000 (18:00 +0100)]
Merge branch 'master' into stable-5.11
* master:
Manually set status of jmh dependencies
Update DEPENDENCIES report for 5.11.0
Add dependency to dash-licenses
PackFile: Add id + ext based constructors
GC: deleteOrphans: Use PackFile
PackExt: Convert to Enum
Restore preserved packs during missing object seeks
Pack: Replace extensions bitset with bitmapIdx PackFile
PackDirectory: Use PackFile to ensure we find preserved packs
GC: Use PackFile to de-dup logic
Create a PackFile class for Pack filenames
Matthias Sohn [Sun, 7 Mar 2021 17:41:05 +0000 (18:41 +0100)]
Manually set status of jmh dependencies
The following jmh dependencies were approved as works-with:
- jmh-core/1.21 has GPL-2.0 license and was approved in CQ20517
- jmh-generator-annprocess/1.21 has GPL-2.0 license and was approved in
CQ20518
Nasser Grainawi [Thu, 4 Mar 2021 21:14:43 +0000 (14:14 -0700)]
PackFile: Add id + ext based constructors
Add new constructors to PackFile to improve a common use case where
callers know the directory, id, and extension, but previously needed to
construct a valid file name (with prefix, '.', etc) to create a
PackFile. Most callers can use the variant that has id as an ObjectId,
but provide an id as String variant too.
Martin Fick [Tue, 15 Dec 2020 21:20:44 +0000 (14:20 -0700)]
Restore preserved packs during missing object seeks
Provide a recovery path for objects being referenced during the pack
pruning race. Due to the pack pruning race, it is possible for objects
to become referenced after a pack has been deemed safe to prune, but
before it actually gets pruned. If this happened previously, the newly
referenced objects would be missing and potentially result in a
corrupted ref.
Add the ability to recover from this situation when an object is missing
but happens to still be available in a pack in the "preserved"
directory. This is likely only useful when used in conjunction with the
--preserve-old-packs GC option, which prunes packs by hard-linking to
the preserved directory. If an object is missing and found in a pack in
the preserved directory, immediately recover that pack and its
associated files (idx, bitmaps...) by moving them back to the original
pack directory, and then retry the operation that would have failed due
to the missing object. This retry can now succeed and the repository
may avoid corruption. This approach should drastically reduce the
chance of a corrupt repository during pack pruning at very little extra
cost. This extra cost should only be incurred when objects are missing
and a failure would normally occur.
Change-Id: I2a704e3276b88cc892159d9bfe2455c6eec64252 Signed-off-by: Martin Fick <quic_mfick@quicinc.com> Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
Nasser Grainawi [Thu, 11 Feb 2021 06:26:17 +0000 (23:26 -0700)]
Pack: Replace extensions bitset with bitmapIdx PackFile
The only extension that was ever consulted from the bitmap was the
bitmap index. We can simplify the Pack code as well as the code of
all the callers if we focus on just that usage.
Nasser Grainawi [Thu, 11 Feb 2021 06:33:43 +0000 (23:33 -0700)]
PackDirectory: Use PackFile to ensure we find preserved packs
Update scanPacksImpl and listPackDirectory (renamed to
getPackFilesByExtById) to use the new PackFile functionality to
validate file names and complete pack file sets (.pack, .idx, etc).
Most importantly, this allows a later change to rely on scanPacks() to
complete a packList that contains packs with the 'old-' prefix in their
extension.
This also eliminates duplication of logic for how to identify and
construct pack files.
Nasser Grainawi [Thu, 11 Feb 2021 05:51:05 +0000 (22:51 -0700)]
Create a PackFile class for Pack filenames
The PackFile class is intended to be a central place to do all
common pack filename manipulation and parsing to help reduce repeated
code and bugs. Use the PackFile class in the Pack class and in many
tests to ensure it works well in a variety of situations. Later changes
will expand use of PackFiles to even more areas.
Change-Id: I921b30f865759162bae46ddd2c6d669de06add4a Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Thomas Wolf [Mon, 1 Mar 2021 07:30:09 +0000 (08:30 +0100)]
HTTP: cookie file stores expiration in seconds
A cookie file stores the expiration in seconds since the Linux Epoch,
not in milliseconds. Correct reading and writing cookie files; with
a backwards-compatibility hack to read files that contain a millisecond
timestamp.
Add a test, and fix tests not to rely on the actual current time so
that they will also run successfully after 2030-01-01 noon.
Bug: 571574
Change-Id: If3ba68391e574520701cdee119544eedc42a1ff2 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Fri, 29 Jan 2021 22:03:44 +0000 (23:03 +0100)]
LFS: handle invalid pointers better
Make sure that SmudgeFilter calls LfsPointer.parseLfsPointer() with
a stream that supports mark/reset, and make sure that parseLfsPointer()
resets the stream properly if it decides that the stream content is not
a LFS pointer.
Add a test.
Bug: 570758
Change-Id: I2593d67cff31b2dfdfaaa48e437331f0ed877915 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
In a distributed setting, one can have multiple datacenters use
reftables for serving, while the ground truth for the Ref database is
administered centrally. In this setting, replication delays combined
with compaction can cause update-index ranges to overlap.
Such a setting is used at Google, and the JGit code already handles
this correctly (modulo a bugfix that applied in change I8f8215b99a).
Remove the restriction that was applied at FileReftableDatabase.
Matthias Sohn [Sun, 27 Dec 2020 01:11:47 +0000 (02:11 +0100)]
Fix errorprone configuration for maven-compiler-plugin with javac
See https://errorprone.info/docs/installation.
Add new profile jdk8 to enable running errorprone with javac on java 8
and java 11. Remove errorprone configuration from benchmark module,
didn't find a way to make it work and this module does not contain any
productive code.
Change-Id: I6a84195af05e6cea9e7c04ad5cd4c79742e80cb3 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Wed, 24 Feb 2021 13:54:52 +0000 (14:54 +0100)]
Merge branch 'master' into stable-5.11
* master: (35 commits)
[releng] japicmp: update last release version
IgnoreNode: include path to file for invalid .gitignore patterns
FastIgnoreRule: include bad pattern in log message
init: add config option to set default for the initial branch name
init: allow specifying the initial branch name for the new repository
Fail clone if initial branch doesn't exist in remote repository
GPG: fix reading unprotected old-format secret keys
Update Orbit to S20210216215844
Add missing bazel dependency for o.e.j.gpg.bc.test
GPG: handle extended private key format
dfs: handle short copies
[GPG] Provide a factory for the BouncyCastleGpgSigner
Fix boxing warnings
GPG: compute the keygrip to find a secret key
GPG signature verification via BouncyCastle
Post commit hook failure should not cause commit failure
Allow to define additional Hook classes outside JGit
GitHook: use default charset for output and error streams
GitHook: use generic OutputStream instead of PrintStream
Update jetty to 9.4.36.v20210114
...
Thomas Wolf [Tue, 23 Feb 2021 17:10:08 +0000 (18:10 +0100)]
IgnoreNode: include path to file for invalid .gitignore patterns
Include the full file path of the .gitignore file and the line number
of the invalid pattern. Also include the pattern itself.
.gitignore files inside the repository are reported with their
repository-relative path; files outside (from git config
core.excludesFile or .git/info/exclude) are reported with their
full absolute path.
Bug: 571143
Change-Id: Ibe5969679bc22cff923c62e3ab9801d90d6d06d1 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Tue, 23 Feb 2021 12:11:56 +0000 (13:11 +0100)]
FastIgnoreRule: include bad pattern in log message
When a .gitignore pattern cannot be parsed include the pattern in the
log message. Just reporting "not closed bracket" isn't helpful if the
user doesn't know in which pattern the problem occurred.
Even better would be to include the full path of the .gitignore file
that contained the offending pattern. This is not implemented in this
change; it may need new API and needs more thought.
Bug: 571143
Change-Id: Id5b16d9cf550544ba3ad409a02041946fa8516ab Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Matthias Sohn [Mon, 25 Jan 2021 01:43:18 +0000 (02:43 +0100)]
init: add config option to set default for the initial branch name
We introduced the option --initial-branch=<branch-name> to allow
initializing a new repository with a different initial branch.
To allow users to override the initial branch name more permanently
(i.e. without having to specify the name manually for each 'git init'),
introduce the 'init.defaultBranch' option.
This option was added to git in 2.28.0.
See https://git-scm.com/docs/git-config#Documentation/git-config.txt-initdefaultBranch
Bug: 564794
Change-Id: I679b14057a54cd3d19e44460c4a5bd3a368ec848 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Mon, 25 Jan 2021 00:54:03 +0000 (01:54 +0100)]
init: allow specifying the initial branch name for the new repository
Add option --initial-branch/-b to InitCommand and the CLI init command.
This is the first step to implement support for the new option
init.defaultBranch. Both were added to git in release 2.28.
See https://git-scm.com/docs/git-init#Documentation/git-init.txt--bltbranch-namegt
Bug: 564794
Change-Id: Ia383b3f90b5549db80f99b2310450a7faf6bce4c Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Thomas Wolf [Sun, 24 Jan 2021 01:13:43 +0000 (02:13 +0100)]
GPG: handle extended private key format
Add detection for the key-value pair format that was available in
gpg-agent for some time already and that has become the default since
gpg-agent 2.2.20. If a secret key in the .gnupg/private-keys-v1.d
directory is found to have this format, extract the human-readable key
from it, convert it to the binary serialized form and hand that to
BouncyCastle.
Encrypted keys in the new format may use AES/OCB. OCB is a patent-
encumbered algorithm; although there is a license for open-source
software, that may not be good enough and OCB may not be available in
Java. It is not available in the default security provider in Java,
and it is also not available in the BouncyCastle version included in
Eclipse.
Implement AES/OCB decryption, throwing a PGPException with a nice
message if the algorithm is not available. Include a copy of the normal
s-expression parser of BouncyCastle and fix it to properly handle data
from such keys: such keys do not contain an internal hash since the
AES/OCB cipher includes and checks a MAC already.
Bug: 570501
Change-Id: Ifa6391a809a84cfc6ae7c6610af6a79204b4143b Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
wh [Thu, 17 Dec 2020 18:14:32 +0000 (18:14 +0000)]
dfs: handle short copies
`copy` is documented as possibly returning a smaller number of bytes
than requested. In practice, this can occur if a block is cached and the
reader never pulls in the file to check its size.
Thomas Wolf [Sun, 17 Jan 2021 15:21:28 +0000 (16:21 +0100)]
GPG: compute the keygrip to find a secret key
The gpg-agent stores secret keys in individual files in the secret
key directory private-keys-v1.d. The files have the key's keygrip
(in upper case) as name and extension ".key".
A keygrip is a SHA1 hash over the parameters of the public key. By
computing this keygrip, we can pre-compute the expected file name and
then check only that one file instead of having to iterate over all
keys stored in that directory.
This file naming scheme is actually an implementation detail of
gpg-agent. It is unlikely to change, though. The keygrip itself is
computed via libgcrypt and will remain stable according to the GPG
main author.[1]
Add an implementation for calculating the keygrip and include tests.
Do not iterate over files in BouncyCastleGpgKeyLocator but only check
the single file identified by the keygrip.
Ideally upstream BouncyCastle would provide such a getKeyGrip() method.
But as it re-builds GPG and libgcrypt internals, it's doubtful it would
be included there, and since BouncyCastle even lacks a number of curve
OIDs for ed25519/curve25519 and uses the short-Weierstrass parameters
instead of the more common Montgomery parameters, including it there
might be quite a bit of work.
Thomas Wolf [Thu, 7 Jan 2021 16:11:57 +0000 (17:11 +0100)]
GPG signature verification via BouncyCastle
Add a GpgSignatureVerifier interface, plus a factory to create
instances thereof that is provided via the ServiceLoader mechanism.
Implement the new interface for BouncyCastle. A verifier maintains
an internal LRU cache of previously found public keys to speed up
verifying multiple objects (tag or commits). Mergetags are not handled.
Provide a new VerifySignatureCommand in org.eclipse.jgit.api together
with a factory method Git.verifySignature(). The command can verify
signatures on tags or commits, and can be limited to accept only tags
or commits. Provide a new public WrongObjectTypeException thrown when
the command is limited to either tags or commits and a name resolves
to some other object kind.
In jgit.pgm, implement "git tag -v", "git log --show-signature", and
"git show --show-signature". The output is similar to command-line
gpg invoked via git, but not identical. In particular, lines are not
prefixed by "gpg:" but by "bc:".
Trust levels for public keys are read from the keys' trust packets,
not from GPG's internal trust database. A trust packet may or may
not be set. Command-line GPG produces more warning lines depending
on the trust level, warning about keys with a trust level below
"full".
There are no unit tests because JGit still doesn't have any setup to
do signing unit tests; this would require at least a faked .gpg
directory with pre-created key rings and keys, and a way to make the
BouncyCastle classes use that directory instead of the default. See
bug 547538 and also bug 544847.
Tested manually with a small test repository containing signed and
unsigned commits and tags, with signatures made with different keys
and made by command-line git using GPG 2.2.25 and by JGit using
BouncyCastle 1.65.
Bug: 547751
Change-Id: If7e34aeed6ca6636a92bf774d893d98f6d459181 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Matthias Sohn [Sun, 7 Feb 2021 22:48:57 +0000 (23:48 +0100)]
Allow to define additional Hook classes outside JGit
EGit wants to add gitflow specific hooks in org.eclipse.egit.gitflow.
Make GitHook public to allow sub-classing outside of the
org.eclipse.jgit.hooks package.
Change-Id: I439575ec901e3610b5cf9d66f7641c8324faa865 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Marija Savtchouk [Thu, 21 Jan 2021 15:08:57 +0000 (15:08 +0000)]
Allow dir/file conflicts in virtual base commit on recursive merge.
If RecursiveMerger finds multiple base commits, it tries to compute
the virtual ancestor to use as a base for the three way merge.
Currently, the content conflicts between ancestors are ignored (file
staged with the conflict markers). If the path is a file in one ancestor
and a dir in the other, it results in NoMergeBaseException
(CONFLICTS_DURING_MERGE_BASE_CALCULATION).
Allow these conflicts by ignoring this unmerged path in the virtual
base. The merger will compute diff in the children instead and it
can be further fixed manually if needed.
Change-Id: Id59648ae1d6bdf300b26fff513c3204317b755ab Signed-off-by: Marija Savtchouk <mariasavtchouk@google.com>
Thomas Wolf [Sun, 24 Jan 2021 00:57:09 +0000 (01:57 +0100)]
GPG: support git config gpg.program
Add it to the GpgConfig. Change GpgConfig to load the values once only.
Add a parameter to the GpgObjectSigner interface's operations to pass
in a GpgConfig. Update CommitCommand and TagCommand to pass the value
to the signer. Let the signer decide whether it can actually produce
the wanted signature type (openpgp or x509).
No behavior change. But this makes it possible to implement different
signers that might support x509 signatures, or use gpg.program and
shell out to an external GPG executable for signing.
Change-Id: I427f83eb1ece81c310e1cddd85315f6f88cc99ea Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Adithya Chakilam [Sat, 30 Jan 2021 01:35:04 +0000 (19:35 -0600)]
Fix DateRevQueue tie breaks with more than 2 elements
DateRevQueue is expected to give out the commits that have higher
commit time. But in case of tie(same commit time), it should give
the commit that is inserted first. This is inferred from the
testInsertTie test case written for DateRevQueue. Also that test
case, right now uses just two commits which caused it not to fail
with the current implementation, so added another commit to make
the test more robust.
By fixing the DateRevQueue, we would also match the behaviour of
LogCommand.addRange(c1,c2) with git log c1..c2. A test case for
the same is added to show that current behaviour is not the
expected one.
By fixing addRange(), the order in which commits are applied during
a rebase is altered. Rebase logic should have never depended upon
LogCommand.addRange() since the intended order of addRange() is not
the order a rebase should use. So, modify the RebaseCommand to use
RevWalk directly with TopoNonIntermixSortGenerator.
Add a new LogCommandTest.addRangeWithMerge() test case which creates
commits in the following order:
A - B - C - M
\ /
-D-
Using git 2.30.0, git log B..M outputs: M C D
LogCommand.addRange(B, M) without this fix outputs: M D C
LogCommand.addRange(B, M) with this fix outputs: M C D
Matthias Sohn [Sun, 3 Jan 2021 01:10:58 +0000 (02:10 +0100)]
Fix FileRepository#convertToReftable which failed if no reflog existed
Deleting non-existing files when converting to reftable without backup
caused convertToReftable to fail. Observed this on a mirrored repository
which had no reflogs. Fix this by skipping missing files during
deletion.
Change-Id: I3bb913d5bfddccc6813677b873006efb849a6ebc Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>