Matthias Sohn [Tue, 31 Jan 2023 23:59:32 +0000 (00:59 +0100)]
Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Matthias Sohn [Tue, 31 Jan 2023 23:44:41 +0000 (00:44 +0100)]
Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Matthias Sohn [Tue, 31 Jan 2023 23:38:52 +0000 (00:38 +0100)]
Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Matthias Sohn [Tue, 31 Jan 2023 23:30:52 +0000 (00:30 +0100)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Luca Milanesio [Wed, 18 May 2022 12:31:30 +0000 (13:31 +0100)]
Shortcut during git fetch for avoiding looping through all local refs
The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs. This can be
very expensive, but it is needed to avoid MissingObjectException
errors caused by broken chains.
When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.
Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s
A ref pointing to a non-commit object doesn't have any parent or
successor objects, so it never needs a reachability check.
Skipping askForIsComplete() altogether saves the 30s spent in
this unnecessary phase.
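The shortcut amounts to one decision: walk the local refs only if some wanted object can have history. This is a minimal sketch of that decision. The enum `WantedType` and the class `FetchConnectivitySketch` are hypothetical stand-ins, not JGit's FetchProcess code. Treating annotated tags like commits is an assumption made here because a tag may peel to a commit.

```java
import java.util.List;

// Simplified stand-in for Git's object kinds; not JGit's API.
enum WantedType { COMMIT, TAG, TREE, BLOB }

final class FetchConnectivitySketch {
    private FetchConnectivitySketch() {
    }

    /**
     * Returns true if the connectivity check should mark all local refs as
     * uninteresting, i.e. walk the (possibly millions of) local refs.
     * Assumption: annotated tags are treated like commits because they may
     * peel to a commit.
     */
    static boolean needsLocalRefScan(List<WantedType> wanted) {
        for (WantedType type : wanted) {
            if (type == WantedType.COMMIT || type == WantedType.TAG) {
                return true;
            }
        }
        // Only trees and blobs were fetched: there is no history to connect.
        return false;
    }
}
```

For a fetch of only `refs/sequences/changes` (a blob), the method returns false, and a caller could skip the 30s local-ref phase from the 2M-ref example.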
Matthias Sohn [Tue, 4 Oct 2022 13:42:25 +0000 (15:42 +0200)]
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.
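The fix needs a guard that asks what FETCH_HEAD points to before treating it as a tree. Here is a rough sketch of such a guard. The records `CommitObj`, `TreeObj` and `BlobObj` are a hypothetical model, not JGit's RevObject hierarchy. A commit contributes its root tree, a tree is scanned directly, and a blob has no submodules to fetch.

```java
// Minimal model of what FETCH_HEAD may point to (hypothetical types).
interface FetchedObject {
}

record CommitObj(String id, String treeId) implements FetchedObject {
}

record TreeObj(String id) implements FetchedObject {
}

record BlobObj(String id) implements FetchedObject {
}

final class SubmoduleScanSketch {
    private SubmoduleScanSketch() {
    }

    /** Returns the tree to scan for submodules, or null when there is none. */
    static String treeToScan(FetchedObject fetchHead) {
        if (fetchHead instanceof CommitObj commit) {
            return commit.treeId(); // a commit is scanned via its root tree
        }
        if (fetchHead instanceof TreeObj tree) {
            return tree.id();
        }
        // Blobs (e.g. Gerrit's refs/sequences/changes) cannot contain submodules.
        return null;
    }
}
```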
Luca Milanesio [Tue, 20 Dec 2022 21:50:19 +0000 (21:50 +0000)]
Allow the exclusions of refs prefixes from bitmap
When running GC.repack() against a repository with over one
thousand refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonably big file that could take up to
several hours to compute.
Test scenario: repo with 2500 heads / 10M objects on an Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)
Such a large bitmap file is also slow to process at runtime and
has negligible or even negative benefits, because the time lost
reading and decompressing the bitmap into memory is not
compensated by the time saved by using it.
It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and to provide the ability to exclude some ref
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.
Example: Gerrit sandbox branches may even be actively
used and selected automatically because their commits are very
recent; however, they may bloat the bitmap, making it ineffective.
A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may reach tens to hundreds of
thousands.
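Excluding ref prefixes boils down to a name filter applied before bitmap selection. Below is a minimal sketch of such a filter. `BitmapRefFilterSketch` is hypothetical and does not show how JGit's PackWriterBitmapPreparer or its configuration are wired. The `refs/heads/sandbox/` prefix follows Gerrit's sandbox-branch convention.

```java
import java.util.List;
import java.util.stream.Collectors;

final class BitmapRefFilterSketch {
    private BitmapRefFilterSketch() {
    }

    /** Keeps the refs whose names start with none of the excluded prefixes. */
    static List<String> refsForBitmaps(List<String> refNames, List<String> excludedPrefixes) {
        return refNames.stream()
                .filter(name -> excludedPrefixes.stream().noneMatch(name::startsWith))
                .collect(Collectors.toList());
    }
}
```

With `refs/heads/sandbox/` excluded, `refs/heads/master` still gets a bitmap and `refs/heads/sandbox/alice/wip` does not.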
Luca Milanesio [Wed, 28 Dec 2022 01:09:52 +0000 (01:09 +0000)]
PackWriterBitmapPreparer: do not include annotated tags in bitmap
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because the exclusion check was applied to the peeled object
instead of to the objectId to be excluded from the bitmap.
Consider a commit graph where the head reaches commit0 and commit1
and the annotated tag tag1 points to commit2. When creating a bitmap
for this graph, before this change all the commits were included
(3 bitmaps). This is incorrect, because commits reachable only from
annotated tags should not be included.
The heads-only bitmap should include only commit0 and commit1.
However, PackWriterBitmapPreparer was checking whether the peeled
pointer of tag1 (commit2) was excluded, and that was not found in
the list of tags to exclude (annotated-tag1). As a result, commit2 was
included, even though it wasn't reachable from the head.
Add an additional exclusion check on the original objectId
to allow the exclusion of annotated tags and the commits they
point to. Add one specific test associated with an annotated tag
to make sure that this use case is also covered.
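The bug and its fix can be modelled as two predicates over a ref's own objectId and its peeled id. This is an illustrative sketch only. `TagExclusionSketch` is hypothetical, and for non-annotated refs it assumes the caller passes the same id twice rather than null.

```java
import java.util.Set;

final class TagExclusionSketch {
    private TagExclusionSketch() {
    }

    /** Pre-fix behaviour: only the peeled target (e.g. commit2) is checked. */
    static boolean excludedBefore(String objectId, String peeledId, Set<String> excluded) {
        return excluded.contains(peeledId);
    }

    /** Post-fix behaviour: the ref's own objectId (e.g. annotated-tag1) is checked too. */
    static boolean excludedAfter(String objectId, String peeledId, Set<String> excluded) {
        return excluded.contains(objectId) || excluded.contains(peeledId);
    }
}
```

With the exclusion set `{annotated-tag1}`, the old check misses tag1 (it looks up commit2), while the new check excludes it.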
Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotated tags)
# packfiles: 2.7G
Before this change:
GC time: 5h
clone --bare time: 7 mins
After this change:
GC time: 20 mins
clone --bare time: 3 mins
Matthias Sohn [Wed, 18 Jan 2023 16:39:19 +0000 (17:39 +0100)]
Speedup GC listing objects referenced from reflogs
GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
GC first gets all Refs in bulk and then calls getReflogReader for each of
them, which resolves every Ref again by name; doing this one by one for a
huge number of Refs is very slow.
Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.
This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.
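The speedup comes from not resolving a ref by name after it has already been listed in bulk. A small counting model makes the difference visible. It is a sketch: `CountingRefDb`, `RefEntry`, `ReflogReaderStub` and `ReflogListingSketch` are hypothetical stand-ins for Repository, Ref and ReflogReader, and the counter stands in for the per-ref lookup cost.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical ref: a name and the object it points to. */
record RefEntry(String name, String objectId) {
}

/** Hypothetical reflog reader; it only needs the already-resolved ref. */
record ReflogReaderStub(RefEntry ref) {
}

/** Hypothetical ref database that counts per-name lookups (the slow path). */
final class CountingRefDb {
    private final Map<String, RefEntry> refs = new LinkedHashMap<>();
    int nameLookups;

    void put(String name, String objectId) {
        refs.put(name, new RefEntry(name, objectId));
    }

    /** Bulk listing of all refs in one pass. */
    List<RefEntry> getRefs() {
        return new ArrayList<>(refs.values());
    }

    /** Resolves one ref by name; on a real repository this can hit the disk. */
    RefEntry resolve(String name) {
        nameLookups++;
        return refs.get(name);
    }
}

final class ReflogListingSketch {
    private ReflogListingSketch() {
    }

    /** Old pattern: bulk-list refs, then re-resolve each one by name. */
    static List<ReflogReaderStub> allByName(CountingRefDb db) {
        List<ReflogReaderStub> readers = new ArrayList<>();
        for (RefEntry ref : db.getRefs()) {
            RefEntry resolved = db.resolve(ref.name());
            if (resolved != null) {
                readers.add(new ReflogReaderStub(resolved));
            }
        }
        return readers;
    }

    /** New pattern: hand each listed ref straight to the reader. */
    static List<ReflogReaderStub> allByRef(CountingRefDb db) {
        List<ReflogReaderStub> readers = new ArrayList<>();
        for (RefEntry ref : db.getRefs()) {
            readers.add(new ReflogReaderStub(ref));
        }
        return readers;
    }
}
```

The by-name variant performs one extra lookup per ref (45k for the Gerrit mirror above). The by-ref variant performs none.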
Nasser Grainawi [Tue, 10 Jan 2023 23:15:42 +0000 (16:15 -0700)]
Cache trustFolderStat/trustPackedRefsStat value per-instance
Instead of re-reading the config every time the methods using these
values were called, cache the config value at the time of instance
construction. Caching the values improves performance for each of the
method calls. These configs are set based on the filesystem storing the
repository and unlikely to change while an application is running.
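Per-instance caching means the config is read once, in the constructor, and hot-path methods use a final field. This is a minimal sketch of that pattern. `CachedTrustFolderStat` and the `BooleanSupplier` config reader are hypothetical and do not show how JGit reads `core.trustFolderStat`.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical object directory that reads the trust setting once. */
final class CachedTrustFolderStat {
    // Cached at construction: the filesystem behaviour is unlikely to change at runtime.
    private final boolean trustFolderStat;

    CachedTrustFolderStat(BooleanSupplier configReader) {
        this.trustFolderStat = configReader.getAsBoolean();
    }

    /** Hot-path accessor: no config lookup per call. */
    boolean mayTrustFolderStat() {
        return trustFolderStat;
    }
}
```

The trade-off is that a config change only takes effect for newly constructed instances. The commit message argues this is acceptable for these filesystem-dependent settings.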
Refresh 'objects' dir and retry if a loose object is not found
A new loose object may not be immediately visible on an NFS
client if it was created on another client. Refreshing the
'objects' dir and trying again can help work around the NFS
behavior.
Here's an E2E problem that this change can help fix. Consider
a Gerrit multi-primary setup with repositories stored on NFS.
Add a new patch-set to an existing change and then immediately
fetch the new patch-set of that change. If the fetch is handled
by a Gerrit primary different from the one which created the
patch-set, then we sometimes run into a MissingObjectException
that causes the fetch to fail.
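The refresh-and-retry idea is a single retry after a miss. This simplified model shows it. `LooseObjectLookupSketch` is hypothetical. It caches a directory listing to stand in for stale NFS client state, which is an assumption and not how JGit's loose-object code is structured.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical loose-object lookup backed by a possibly stale listing. */
final class LooseObjectLookupSketch {
    private final Supplier<Set<String>> listObjectsDir; // re-reads the 'objects' dir
    private Set<String> cachedListing;
    int refreshes;

    LooseObjectLookupSketch(Supplier<Set<String>> listObjectsDir) {
        this.listObjectsDir = listObjectsDir;
        this.cachedListing = new HashSet<>(listObjectsDir.get());
    }

    boolean has(String objectId) {
        if (cachedListing.contains(objectId)) {
            return true;
        }
        // Miss: another client may have just written the object.
        // Refresh once and retry before reporting it as missing.
        cachedListing = new HashSet<>(listObjectsDir.get());
        refreshes++;
        return cachedListing.contains(objectId);
    }
}
```

A genuinely missing object still costs one refresh per lookup. This is why the retry runs only on the miss path.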
Currently, we always read packed-refs file when 'trustFolderStat'
is false. Introduce a new config 'trustPackedRefsStat' which takes
precedence over 'trustFolderStat' when reading packed refs. Possible
values for this new config are:
* always: Trust packed-refs file attributes
* after_open: Same as 'always', but refresh the file attributes of
packed-refs before trusting it
* never: Always read the packed-refs file
* unset: Fall back to 'trustFolderStat' to determine if the file
attributes of packed-refs can be trusted
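The precedence rules above reduce to a small resolution function. This is a sketch of that mapping. The enum and class names are hypothetical. Mapping `unset` to `always` or `never` follows from the text: packed-refs is always read when 'trustFolderStat' is false.

```java
/** The four values described above (hypothetical enum, not JGit's). */
enum TrustPackedRefsStat { ALWAYS, AFTER_OPEN, NEVER, UNSET }

final class PackedRefsTrustSketch {
    private PackedRefsTrustSketch() {
    }

    /** Resolves the effective mode; an explicit value takes precedence. */
    static TrustPackedRefsStat effective(TrustPackedRefsStat configured, boolean trustFolderStat) {
        if (configured != TrustPackedRefsStat.UNSET) {
            return configured;
        }
        // unset: trustFolderStat=false means "always read packed-refs".
        return trustFolderStat ? TrustPackedRefsStat.ALWAYS : TrustPackedRefsStat.NEVER;
    }
}
```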
Users whose repositories are on NFS and who have traditionally been
setting 'trustFolderStat=false' can now get some performance improvement
with 'trustPackedRefsStat=after_open', as it refreshes the file
attributes of packed-refs (at least on some NFS clients) before
trusting them.
For example, consider a repository on NFS with ~500k packed-refs. Here
are some stats which illustrate the improvement with this new config
when reading packed refs on NFS:
Matthias Sohn [Sun, 20 Nov 2022 19:22:24 +0000 (20:22 +0100)]
Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
Remove unused imports
Suppress non-externalized String warnings
Remove unused API problem filters
Silence API errors
Silence API errors
Silence API warnings
Matthias Sohn [Wed, 16 Nov 2022 09:14:13 +0000 (10:14 +0100)]
Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
Extract Exception -> HTTP status code mapping for reuse
Don't handle internal git errors as an HTTP error
Allow to perform PackedBatchRefUpdate without locking loose refs
Matthias Sohn [Wed, 16 Nov 2022 09:13:20 +0000 (10:13 +0100)]
Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
Extract Exception -> HTTP status code mapping for reuse
Don't handle internal git errors as an HTTP error
Allow to perform PackedBatchRefUpdate without locking loose refs
Matthias Sohn [Wed, 16 Nov 2022 08:56:08 +0000 (09:56 +0100)]
Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
[benchmarks] Remove profiler configuration
Add SHA1 benchmark
[benchmarks] Set version of maven-compiler-plugin to 3.8.1
Fix running JMH benchmarks
Add option to allow using JDK's SHA1 implementation
Ignore IllegalStateException if JVM is already shutting down
Matthias Sohn [Wed, 16 Nov 2022 08:55:22 +0000 (09:55 +0100)]
Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
[benchmarks] Remove profiler configuration
Add SHA1 benchmark
[benchmarks] Set version of maven-compiler-plugin to 3.8.1
Fix running JMH benchmarks
Add option to allow using JDK's SHA1 implementation
Ignore IllegalStateException if JVM is already shutting down
Matthias Sohn [Wed, 16 Nov 2022 08:54:28 +0000 (09:54 +0100)]
Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
[benchmarks] Remove profiler configuration
Add SHA1 benchmark
[benchmarks] Set version of maven-compiler-plugin to 3.8.1
Fix running JMH benchmarks
Add option to allow using JDK's SHA1 implementation
Ignore IllegalStateException if JVM is already shutting down
Matthias Sohn [Tue, 15 Nov 2022 23:15:17 +0000 (00:15 +0100)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
[benchmarks] Remove profiler configuration
Add SHA1 benchmark
[benchmarks] Set version of maven-compiler-plugin to 3.8.1
Fix running JMH benchmarks
Add option to allow using JDK's SHA1 implementation
Ignore IllegalStateException if JVM is already shutting down
Matthias Sohn [Tue, 4 Oct 2022 13:45:01 +0000 (15:45 +0200)]
Fix running JMH benchmarks
Without this I sometimes hit the error:
$ java -jar target/benchmarks.jar
Exception in thread "main" java.lang.RuntimeException: ERROR: Unable to
find the resource: /META-INF/BenchmarkList
at org.openjdk.jmh.runner.AbstractResourceReader.getReaders(AbstractResourceReader.java:98)
at org.openjdk.jmh.runner.BenchmarkList.find(BenchmarkList.java:124)
at org.openjdk.jmh.runner.Runner.internalRun(Runner.java:253)
at org.openjdk.jmh.runner.Runner.run(Runner.java:209)
at org.openjdk.jmh.Main.main(Main.java:71)
Matthias Sohn [Fri, 11 Nov 2022 16:54:06 +0000 (17:54 +0100)]
Add option to allow using JDK's SHA1 implementation
The change If6da9833 moved the computation of SHA1 from the JVM's
JCE to a pure Java implementation with collision detection.
The extra security for public sites comes with a cost of slower
SHA1 processing compared to the native implementation in the JDK.
When JGit is used internally and not exposed to any traffic from
external or untrusted users, the extra cost of the pure Java SHA1
implementation can be avoided, falling back to the previous
native MessageDigest implementation.
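The native path is the JDK's `MessageDigest` with the "SHA-1" algorithm, which every Java platform is required to provide. This sketch uses it to compute a Git blob id, to show what the faster, non-collision-detecting path looks like. `NativeSha1Sketch` is hypothetical, and the sketch does not show JGit's switch between the two implementations.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class NativeSha1Sketch {
    private NativeSha1Sketch() {
    }

    /** Git blob id via the JDK's SHA-1 (no SHA-1 collision detection). */
    static String blobId(byte[] content) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
        // Git hashes the header "blob <size>\0" followed by the raw content.
        md.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        md.update(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

For example, the empty blob hashes to `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same id `git hash-object` reports.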
Sven Selberg [Wed, 9 Nov 2022 17:28:45 +0000 (18:28 +0100)]
Extract Exception -> HTTP status code mapping for reuse
Extract private static method UploadPackServlet#statusCodeForThrowable
to a public static method in the UploadPackErrorHandler interface so
that implementers of this interface can reuse the default mapping.
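Exposing the mapping as a public static interface method lets implementers special-case a few errors and delegate the rest. This sketch shows that reuse pattern. `UploadErrorHandlerSketch` and `TimeoutAwareHandler` are hypothetical, and the status codes are illustrative, not JGit's actual table.

```java
/** Hypothetical error handler with a reusable default mapping. */
interface UploadErrorHandlerSketch {
    /** Default Throwable -> HTTP status mapping (illustrative codes only). */
    static int statusCodeForThrowable(Throwable error) {
        if (error instanceof SecurityException) {
            return 403;
        }
        if (error instanceof IllegalArgumentException) {
            return 400;
        }
        return 500;
    }

    int statusFor(Throwable error);
}

/** Implementer that special-cases timeouts and reuses the default otherwise. */
final class TimeoutAwareHandler implements UploadErrorHandlerSketch {
    @Override
    public int statusFor(Throwable error) {
        if (error instanceof java.util.concurrent.TimeoutException) {
            return 503;
        }
        // Static interface methods are not inherited: call via the interface name.
        return UploadErrorHandlerSketch.statusCodeForThrowable(error);
    }
}
```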
Matthias Sohn [Thu, 27 Oct 2022 18:31:31 +0000 (20:31 +0200)]
Ignore IllegalStateException if JVM is already shutting down
Trying to register/unregister a shutdown hook when the JVM is already in
shutdown throws an IllegalStateException. Ignore this exception since we
can't do anything about it.
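`Runtime.addShutdownHook` and `Runtime.removeShutdownHook` throw `IllegalStateException` once the JVM has begun shutting down. This sketch wraps both calls so that the exception is ignored, as described. `ShutdownHookSketch` is a hypothetical helper, not the JGit class that was changed.

```java
final class ShutdownHookSketch {
    private ShutdownHookSketch() {
    }

    /** Registers the hook; returns false if the JVM is already shutting down. */
    static boolean tryRegister(Thread hook) {
        try {
            Runtime.getRuntime().addShutdownHook(hook);
            return true;
        } catch (IllegalStateException e) {
            // Shutdown already in progress: nothing useful can be done.
            return false;
        }
    }

    /** Unregisters the hook; false if it wasn't registered or shutdown has begun. */
    static boolean tryUnregister(Thread hook) {
        try {
            return Runtime.getRuntime().removeShutdownHook(hook);
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```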
Saša Živkov [Fri, 21 Oct 2022 14:32:03 +0000 (16:32 +0200)]
Allow to perform PackedBatchRefUpdate without locking loose refs
Add another newBatchUpdate method to RefDirectory that lets us
control whether the created PackedBatchRefUpdate locks the loose refs.
This is useful when running programs that have exclusive
access to a Git repository, where locking loose refs is known to be
unnecessary and only costs performance.
Thomas Wolf [Mon, 15 Aug 2022 23:02:21 +0000 (01:02 +0200)]
[merge] Fix merge conflicts with symlinks
Previous code would do a content merge on symlinks, and write the merge
result to the working tree as a file. C git doesn't do this; it leaves
a symlink in the working tree unchanged, or in a delete-modify conflict
it would check out "theirs".
Moreover, previous code would write the merge result to the link target,
not to the link. This would overwrite an existing link target, or fail
if the link pointed to a directory.
In link/file conflicts or file/link conflicts, C git always puts the
file into the working tree.
Change conflict handling accordingly. Add tests for all the conflict
cases.
Bug: 580347
Change-Id: I3cffcb4bcf8e336a85186031fff23f0c4b6ee19d
Signed-off-by: Thomas Wolf <twolf@apache.org>
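The conflict rules in this commit message can be written as a small decision table. This is a sketch of that table. The enums and `SymlinkConflictSketch` are hypothetical. The case of a symlink on "ours" with "theirs" deleted is read here as "leave the symlink unchanged", which is an interpretation of the text. Plain-file cases fall through to regular handling, which is not modelled.

```java
/** What sits at a path on one side of the merge (simplified). */
enum EntryKind { FILE, SYMLINK, MISSING }

/** Which version ends up in the working tree. */
enum WorkTreeChoice { OURS, THEIRS, REGULAR_HANDLING }

final class SymlinkConflictSketch {
    private SymlinkConflictSketch() {
    }

    static WorkTreeChoice resolve(EntryKind ours, EntryKind theirs) {
        // link/file and file/link conflicts: the file always wins.
        if (ours == EntryKind.FILE && theirs == EntryKind.SYMLINK) {
            return WorkTreeChoice.OURS;
        }
        if (ours == EntryKind.SYMLINK && theirs == EntryKind.FILE) {
            return WorkTreeChoice.THEIRS;
        }
        // Never content-merge a link: leave our symlink unchanged.
        if (ours == EntryKind.SYMLINK) {
            return WorkTreeChoice.OURS;
        }
        // Delete/modify conflict on a link: check out "theirs".
        if (ours == EntryKind.MISSING && theirs == EntryKind.SYMLINK) {
            return WorkTreeChoice.THEIRS;
        }
        return WorkTreeChoice.REGULAR_HANDLING;
    }
}
```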
Matthias Sohn [Tue, 6 Sep 2022 13:58:47 +0000 (15:58 +0200)]
Merge branch 'master' into stable-6.3
* master:
Move WorkTreeUpdater to merge package
WorkTreeUpdater: use DirCacheCheckout#StreamSupplier
DirCacheCheckout#getContent: also take InputStream supplier
WorkTreeUpdater: remove safeWrite option
* changes:
Move WorkTreeUpdater to merge package
WorkTreeUpdater: use DirCacheCheckout#StreamSupplier
DirCacheCheckout#getContent: also take InputStream supplier
WorkTreeUpdater: remove safeWrite option
Han-Wen Nienhuys [Tue, 30 Aug 2022 08:07:21 +0000 (10:07 +0200)]
DirCacheCheckout#getContent: also take InputStream supplier
This lets us use DirCacheCheckout for routines that want to write
files in the worktree that aren't available as a git object.
DirCacheCheckout#getContent takes an InputStream supplier rather than an
InputStream: if filtering fails with an IOException, the data is placed
unfiltered in the checkout. This means that the stream has to be read
again, from the start.
Use it in this way in ApplyCommand. This use is incorrect, though: the
same InputStream is returned twice, so if the read has to be retried, the
stream will return 0 bytes. It doesn't really matter, because in
either case, the SHA1 will not match up, and the patch fails.
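A supplier lets the fallback reopen the data from the start, while a supplier that hands back the same stream yields 0 bytes on retry. This sketch illustrates both cases. `StreamSupplierSketch`, its nested `StreamSupplier` and `Filter` interfaces, and `demo` are hypothetical and only mirror the idea of DirCacheCheckout's supplier parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamSupplierSketch {
    /** Opens a stream over the content; should return a fresh one per call. */
    @FunctionalInterface
    interface StreamSupplier {
        InputStream load() throws IOException;
    }

    /** A content filter that may fail after consuming part of its input. */
    @FunctionalInterface
    interface Filter {
        byte[] apply(InputStream in) throws IOException;
    }

    /** Runs the filter; on IOException, reopens the source and copies it unfiltered. */
    static byte[] content(StreamSupplier source, Filter filter) throws IOException {
        try (InputStream in = source.load()) {
            return filter.apply(in);
        } catch (IOException filterFailed) {
            // The first stream may be consumed, so read the data again from the start.
            try (InputStream in = source.load()) {
                return in.readAllBytes();
            }
        }
    }

    /** Returns the number of bytes checked out after a failing filter, or -1 on error. */
    static int demo(boolean reuseSameStream) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        InputStream shared = new ByteArrayInputStream(data);
        StreamSupplier source = reuseSameStream
                ? () -> shared                          // the misuse: same stream twice
                : () -> new ByteArrayInputStream(data); // correct: fresh stream per call
        Filter failing = in -> {
            in.readAllBytes();                          // consume the input...
            throw new IOException("filter failed");     // ...then fail
        };
        try {
            return content(source, failing).length;
        } catch (IOException e) {
            return -1;
        }
    }
}
```

`demo(false)` yields the full 5 bytes. `demo(true)` yields 0, matching the ApplyCommand behaviour described above.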
Han-Wen Nienhuys [Thu, 25 Aug 2022 17:37:35 +0000 (19:37 +0200)]
WorkTreeUpdater: remove safeWrite option
This was added in Ideaefd5178 to anticipate writing files for
ApplyCommand, but we are keeping WorkTreeUpdater private to the merge
package for now.
Matthias Sohn [Mon, 5 Sep 2022 19:19:07 +0000 (21:19 +0200)]
Merge branch 'master' into stable-6.3
* master:
Update Orbit to R20220830213456 for 2022-09
BaseSuperprojectWriter: report invalid paths as manifest errors
ApplyCommand: fix ApplyResult#updatedFiles
WorkTreeUpdater: rename metadata maps
WorkTreeUpdater#Result: hide data members
Add javadoc on RevCommit
Option to pass start RevCommit to be blamed on to the BlameGenerator.
WorkTreeUpdater: re-format and clean-up
Adds FilteredRevCommit that can overwrite its parents in the DAG.
Ivan Frade [Tue, 23 Aug 2022 19:10:27 +0000 (12:10 -0700)]
BaseSuperprojectWriter: report invalid paths as manifest errors
An invalid path in the manifest (e.g. '.') is reported by DirCache in a
runtime exception. In a server context this becomes an HTTP 500 instead of a user error.
Wrap the runtime invalid path exception into a checked ManifestErrorException that
callers can handle.
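The pattern here is converting a runtime failure from a lower layer into a checked, user-facing error at the API boundary. This sketch shows that conversion. `ManifestErrorSketchException`, `ManifestPathSketch` and its simplified path rule are hypothetical stand-ins, not JGit's ManifestErrorException or DirCache validation.

```java
/** Hypothetical checked exception standing in for a manifest error. */
class ManifestErrorSketchException extends Exception {
    ManifestErrorSketchException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class ManifestPathSketch {
    private ManifestPathSketch() {
    }

    /** Stand-in for an index operation rejecting invalid paths at runtime. */
    static void addToIndex(String path) {
        if (path.isEmpty() || path.equals(".") || path.equals("..")) {
            throw new IllegalArgumentException("Invalid path: " + path);
        }
    }

    /** Converts the runtime failure into a checked, user-facing manifest error. */
    static void addProject(String path) throws ManifestErrorSketchException {
        try {
            addToIndex(path);
        } catch (IllegalArgumentException e) {
            throw new ManifestErrorSketchException("Invalid path in manifest: " + path, e);
        }
    }

    /** Demo helper: true if the path is reported as a manifest error. */
    static boolean reportedAsManifestError(String path) {
        try {
            addProject(path);
            return false;
        } catch (ManifestErrorSketchException e) {
            return true;
        }
    }
}
```

A server can catch the checked exception and answer with a client error instead of a 500.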
Option to pass start RevCommit to be blamed on to the BlameGenerator.
This allows passing a FilteredRevCommit, a filtered view of the
commit graph that is easier for Blame to work on. This can
significantly improve blame performance, since blame can skip an
expensive RevWalk.
Thomas Wolf [Sun, 14 Aug 2022 15:47:36 +0000 (17:47 +0200)]
WorkTreeUpdater: re-format and clean-up
Reformat using the standard JGit formatter settings. Clean-ups:
* Try to improve javadoc.
* Remove blindly copy-pasted "@since 6.1" annotations.
* Get rid of private method nonNullNonBareRepo(); it's not needed.
* Simplify method nonNullRepo(), and annotate as @NonNull.
* Rename setInCoreFileSizeLimit() to getInCoreFileSizeLimit().
Change-Id: Ib1797e7cf925d87554307468330971e8ab2e05e9
Signed-off-by: Thomas Wolf <twolf@apache.org>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Thomas Wolf [Sun, 14 Aug 2022 14:34:50 +0000 (16:34 +0200)]
DirCacheCheckout: load WorkingTreeOptions only once
Previous code loaded the WorkingTreeOptions afresh for every single
file being checked out. This checked all three git config files
(repository, user and system config) for modifications every time.
These checks can be costly, for instance on Windows, or if one of the
three config files is not on a local disk, or on an otherwise slow
storage.
Improve this by loading the options and thus checking the git config
only once before the checkout.
Bug: 579715
Change-Id: I21cd5a808f9d90b5ca2d022f91f0eeb8ca26091c
Signed-off-by: Thomas Wolf <twolf@apache.org>
Thomas Wolf [Sun, 14 Aug 2022 14:38:57 +0000 (16:38 +0200)]
WorkTreeUpdater: Fix unclosed streams
1. A TemporaryBuffer.LocalFile must be destroyed to ensure the
temporary file gets deleted on disk.
2. TemporaryBuffer.openInputStream() may be used only after
TemporaryBuffer.close().
3. The caller of DirCacheCheckout.getContent() is responsible for
closing the OutputStream!
Change-Id: I46abb0fba27656a1026858e5783fc60d4738a45e
Signed-off-by: Thomas Wolf <twolf@apache.org>
Thomas Wolf [Wed, 20 Jul 2022 16:30:18 +0000 (18:30 +0200)]
Fix adding symlinks to the index when core.symlinks=false
With core.symlinks=false, symlinks are checked out as plain files.
When such a file is re-added to the index, and the index already
contains a symlink there, add the file as a symlink. Previous code
changed the index entry to a regular file.
Bug: 580412
Change-Id: I5497bedc3da89c8b10120b8077c56bc5b67cb791
Signed-off-by: Thomas Wolf <twolf@apache.org>
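The fix keeps the symlink mode when a plain file replaces a symlink entry under core.symlinks=false. This sketch models that rule with Git's index mode bits (`0100644` regular file, `0120000` symlink). `IndexModeSketch.modeForAdd` is a hypothetical helper, not JGit's AddCommand or DirCache code.

```java
final class IndexModeSketch {
    // Git file-mode bits as stored in the index (octal literals).
    static final int REGULAR_FILE = 0100644;
    static final int SYMLINK = 0120000;

    private IndexModeSketch() {
    }

    /**
     * Mode to record when (re-)adding a path.
     *
     * @param existingIndexMode mode already in the index, or null for a new path
     */
    static int modeForAdd(Integer existingIndexMode, int workTreeMode, boolean coreSymlinks) {
        if (!coreSymlinks && workTreeMode == REGULAR_FILE
                && existingIndexMode != null && existingIndexMode == SYMLINK) {
            // The plain file is how a symlink was checked out: keep it a symlink.
            return SYMLINK;
        }
        return workTreeMode;
    }
}
```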
- add missing @since 6.3 for new protected field workTreeUpdater and new
class WorkTreeUpdater
- suppress API errors caused by removing/adding protected fields and
methods
We follow OSGi semantic versioning, which allows breaking implementers
(e.g. those subclassing a public class) in minor versions.
* changes:
Reapply "Create util class for work tree updating in both filesystem and index."
ResolveMerger: add coverage for inCore file => directory transition
Provide default shallowCommits getter and setter in ObjectDatabase
I649db9ae679ec2606cf7c530b040f8b6b93eb81a added a default implementation
for getShallowCommits and setShallowCommits to DfsObjDatabase, for the
convenience of any implementers that define subclasses. But we forgot
that some implementers inherit from ObjectDatabase directly instead.
Move the default getter and setter to the base class so that such
subclasses do not need source changes to unbreak their build.
This also lets us update the api_filters to reflect that this is no
longer an API-breaking change.
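Concrete defaults in the base class keep existing direct subclasses compiling. This sketch shows the pattern. `ObjectDatabaseSketch` and `LegacyObjectDatabase` are hypothetical. The default behaviour (report no shallow commits, reject a non-empty set) is an assumption chosen for illustration, not a statement of what JGit's ObjectDatabase does.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical base class: concrete defaults keep old subclasses compiling. */
abstract class ObjectDatabaseSketch {
    abstract boolean has(String objectId);

    /** Assumed default: a database without shallow support reports none. */
    Set<String> getShallowCommits() {
        return Collections.emptySet();
    }

    /** Assumed default: accept "no shallow commits", reject anything else. */
    void setShallowCommits(Set<String> shallowCommits) {
        if (!shallowCommits.isEmpty()) {
            throw new UnsupportedOperationException("shallow commits not supported");
        }
    }
}

/** A subclass written before the new methods existed; it needs no changes. */
final class LegacyObjectDatabase extends ObjectDatabaseSketch {
    @Override
    boolean has(String objectId) {
        return false;
    }
}
```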
Add a bugfix for deletions in ResolveMergers instantiated with just an
ObjectInserter as argument.
Original change description:
Create util class for work tree updating in both filesystem and index.
This class is intended to make future support for index updates easier.
This class currently extracts some logic from ResolveMerger. Logic
related to StreamSupplier was copied from ApplyCommand, which will be
integrated in a following change.
com.google.gerrit.extensions.restapi.RestApiException: Cannot rebase ps
[...]
at com.google.gerrit.server.api.changes.RevisionApiImpl.rebase(RevisionApiImpl.java:280)
at com.google.gerrit.acceptance.api.change.ChangeIT.rebaseChangeBase(ChangeIT.java:1584)
Caused by: com.google.gerrit.server.update.UpdateException: java.lang.NullPointerException: repository is required
at com.google.gerrit.server.update.BatchUpdate.executeUpdateRepo(BatchUpdate.java:588)
[...]
Caused by: java.lang.NullPointerException: repository is required
at org.eclipse.jgit.merge.Merger.nonNullRepo(Merger.java:128)
at org.eclipse.jgit.merge.ResolveMerger.addDeletion(ResolveMerger.java:380)
at org.eclipse.jgit.merge.ResolveMerger.processEntry(ResolveMerger.java:553)
at org.eclipse.jgit.merge.ResolveMerger.mergeTreeWalk(ResolveMerger.java:1224)
at org.eclipse.jgit.merge.ResolveMerger.mergeTrees(ResolveMerger.java:1174)
at org.eclipse.jgit.merge.ResolveMerger.mergeImpl(ResolveMerger.java:299)
at org.eclipse.jgit.merge.Merger.merge(Merger.java:233)
at org.eclipse.jgit.merge.Merger.merge(Merger.java:186)
at org.eclipse.jgit.merge.ThreeWayMerger.merge(ThreeWayMerger.java:96)
at com.google.gerrit.server.change.RebaseChangeOp.rebaseCommit(RebaseChangeOp.java:360)
Provide a default implementation for set/get shallowCommits on DfsObjDatabase
JGit change https://git.eclipse.org/r/c/jgit/jgit/+/193329 adds an implementation for get/set shallow commits in ObjectDatabase. This failed Gerrit's acceptance tests since there is no default implementation for them in DfsObjDatabase.
Ronald Bhuleskar [Wed, 15 Jun 2022 21:37:21 +0000 (14:37 -0700)]
Add the ability to override parents on RevCommit.
This makes RevCommit extensible to allow a different structure of
child-parent relationships. This change is a prerequisite for having a
FilteredRevCommit that overrides parents from the RevCommit. That then
provides a cheap way to walk over a subset of RevCommits instead of
an expensive way that applies filters while walking over selected
commits. This is useful for Blame, which works on a single file and can be
made performant if we know all the commits needed by the Blame
algorithm, so that it can avoid walking the history to find which
commits to blame on.
This change makes the parents field on RevCommit private and exposes it
through overridable methods such as getParents, getParent at index,
getParentCount and setParents. All files other than RevCommit are
updated to access RevCommit parents through these methods.
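Making the field private and routing all access through overridable accessors lets a subclass present a rewritten graph to any walker. This is a minimal sketch of that design. `CommitSketch`, `FilteredCommitSketch` and `ParentWalk` are hypothetical classes, not JGit's RevCommit or FilteredRevCommit.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified commit whose parents are reachable only through accessors. */
class CommitSketch {
    final String id;
    private CommitSketch[] parents; // private: subclasses must use the accessors

    CommitSketch(String id, CommitSketch... parents) {
        this.id = id;
        this.parents = parents;
    }

    CommitSketch[] getParents() {
        return parents;
    }

    CommitSketch getParent(int index) {
        return getParents()[index];
    }

    int getParentCount() {
        return getParents().length;
    }

    void setParents(CommitSketch... newParents) {
        this.parents = newParents;
    }
}

/** Hypothetical filtered commit reporting a rewritten parent list. */
final class FilteredCommitSketch extends CommitSketch {
    private CommitSketch[] filteredParents;

    FilteredCommitSketch(CommitSketch original, CommitSketch... filteredParents) {
        super(original.id, original.getParents());
        this.filteredParents = filteredParents;
    }

    @Override
    CommitSketch[] getParents() {
        return filteredParents;
    }

    @Override
    void setParents(CommitSketch... newParents) {
        this.filteredParents = newParents;
    }
}

final class ParentWalk {
    private ParentWalk() {
    }

    /** Follows first parents; code using the accessors sees the filtered graph. */
    static List<String> firstParentChain(CommitSketch start) {
        List<String> ids = new ArrayList<>();
        CommitSketch current = start;
        while (current != null) {
            ids.add(current.id);
            current = current.getParentCount() > 0 ? current.getParent(0) : null;
        }
        return ids;
    }
}
```

With history c0 <- c1 <- c2 <- c3, where only c0 and c3 touch a file, a filtered c3 whose sole parent is c0 makes the walk visit just c3 and c0.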