Matthias Sohn [Tue, 24 Sep 2024 08:51:22 +0000 (10:51 +0200)]
AdvertisedRequestValidator: fix WantNotValidException caused by race
Fetch with protocol V2 failed under the following conditions
- fetch uses bidirectional protocol (git, ssh) which uses a shortcut
to determine invalid wants
- not all wants are advertised
- race condition: wanted ref is updated during fetch by another thread
after the thread serving upload-pack determined wants and before it
checks not advertised wants
Fix this by calling
`new ReachableCommitRequestValidator().checkWants(up, wants)`
instead of throwing WantNotValidException in [1]
if this race happened in the same way like it's done for unidirectional
protocols (http) [2].
Matthias Sohn [Tue, 20 Aug 2024 12:54:08 +0000 (14:54 +0200)]
Merge branch 'stable-6.6' into stable-6.7
* stable-6.6:
Update tycho to 4.0.8
Update org.eclipse.dash:license-tool-plugin to 1.1.0
Fix "Comparison of narrow type with wide type in loop condition"
JGit v5.13.3.202401111512-r
Matthias Sohn [Tue, 20 Aug 2024 12:28:33 +0000 (14:28 +0200)]
Merge branch 'stable-6.5' into stable-6.6
* stable-6.5:
Update org.eclipse.dash:license-tool-plugin to 1.1.0
Fix "Comparison of narrow type with wide type in loop condition"
JGit v5.13.3.202401111512-r
Matthias Sohn [Sun, 18 Aug 2024 16:35:29 +0000 (18:35 +0200)]
Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
Update org.eclipse.dash:license-tool-plugin to 1.1.0
Fix "Comparison of narrow type with wide type in loop condition"
JGit v5.13.3.202401111512-r
Matthias Sohn [Fri, 9 Aug 2024 09:53:01 +0000 (11:53 +0200)]
Fix "Comparison of narrow type with wide type in loop condition"
This issue was detected by a GitHub CodeQL security scan run on JGit
source code.
Description of the error raised by the security scan:
"In a loop condition, comparison of a value of a narrow type with a
value of a wide type may always evaluate to true if the wider value is
sufficiently large (or small). This is because the narrower value may
overflow. This can lead to an infinite loop."
Fix this by using type `long` for the local variable `done`.
Luca Milanesio [Wed, 10 Jan 2024 19:38:46 +0000 (19:38 +0000)]
Allow to discover bitmap on disk created after the packfile
When the bitmap file was created *after* a packfile had been
loaded into the memory, JGit was unable to discover them.
That happed because of two problems:
1. The PackDirectory.getPacks() does not implement the usual
while loop that is scanning through the packs directory
as in the other parts of JGit.
2. The scan packs does not look for newly created bitmap files
if the packfile is already loaded in memory.
Implement the normal packfiles scanning whenever the PackDirectory
needs to return a list of packs, and make sure that any reused
Pack object would have its associated bitmap properly refreshed
from disk.
Adapt the assertions in GcConcurrentTest with the rescanned list
of Pack from the objects/packs directory.
Dariusz Luksza [Thu, 15 Feb 2024 10:36:41 +0000 (10:36 +0000)]
Merge branch 'stable-6.6' into stable-6.7
* stable-6.6:
RefDirectory: Do not unlock until after deleting loose ref
Add missing javadoc description for declared exception
SnapshottingRefDirectory: Invalidate snapshot after locking ref for update
SnapshottingRefDir: Replace lambas with method refs
SnapshottingRefDir: Reduce casts with overrides
Nasser Grainawi [Fri, 26 Jan 2024 01:59:15 +0000 (18:59 -0700)]
RefDirectory: Do not unlock until after deleting loose ref
Fix a potential race condition where we would remove our loose ref lock
file before deleting the loose ref itself. This race could result in the
current thread deleting a loose ref newly written by another thread.
Other callers seem to be following the correct pattern, but improve the
method naming to try to help future callers.
Nasser Grainawi [Thu, 25 Jan 2024 23:29:05 +0000 (16:29 -0700)]
SnapshottingRefDirectory: Invalidate snapshot after locking ref for
update
When using the SnapshottingRefDirectory, if a thread has already read
packed-refs, then another actor updates packed-refs, the original
thread may create an update that is based on the old cached/snapshotted
packed-refs content. That update could effectively perform a forced
update unintentionally because it is unaware of the new content.
This seems particularly likely to happen in a scenario where a loose
ref was just packed. If the ref was loose, our thread would see the
current ref value (because we don't snapshot loose refs and always read
them from disk), but since there is no loose ref, we expect to find the
current value in packed-refs. However, (before this change) we rely
on our snapshot of packed-refs which does not contain the updated ref
value.
Invalidating the cache after the loose ref is locked ensures that the
ref value does not change again before we read it to perform the update.
Matthias Sohn [Fri, 15 Sep 2023 09:48:05 +0000 (11:48 +0200)]
[errorprone] Fix wrong comparison which always evaluated to false
org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/commitgraph/GraphObjectIndex.java:59:
error: [ComparisonOutOfRange] ints may have a value in the range
-2147483648 to 2147483647; therefore, this comparison to
Integer.MAX_VALUE will always evaluate to false
if (table[k] > Integer.MAX_VALUE) {
^
See https://errorprone.info/bugpattern/ComparisonOutOfRange
We need to check if variable `uint` of type `long` exceeds the maximum
possible int value before casting it to `int` below.
This was introduced in Ib5c0d6678cb242870a0f5841bd413ad3885e95f6
Matthias Sohn [Fri, 15 Sep 2023 09:44:09 +0000 (11:44 +0200)]
[errorprone] Remove unnecessary comparison
Raised by errorprone:
org.eclipse.jgit/src/org/eclipse/jgit/lib/CommitConfig.java:406: error:
[ComparisonOutOfRange] chars may have a value in the range 0 to 65535;
therefore, this comparison to 0 will always evaluate to true
if (ch >= 0 && ch < inUse.length) {
^
see https://errorprone.info/bugpattern/ComparisonOutOfRange
Dariusz Luksza [Mon, 12 Feb 2024 09:56:36 +0000 (09:56 +0000)]
Merge branch 'stable-6.6' into stable-6.7
* stable-6.6:
Improve handling of NFS stale handle errors
Fix handling of missing pack index file
Add tests for handling pack files removal during fetch
Dariusz Luksza [Mon, 20 Nov 2023 11:53:19 +0000 (11:53 +0000)]
Improve handling of NFS stale handle errors
Mark packfile as invalid when NFS stale handle error occurs.
This should fix broken fetch operations when the repo is located on the
NFS system and is GC'ed on a separate system (or process). Which may
result in the index, pack or bitmap file being removed when they are
accessed from the fetch operation.
Dariusz Luksza [Mon, 20 Nov 2023 11:00:51 +0000 (11:00 +0000)]
Fix handling of missing pack index file
As demonstrated in
`UploadPackHandleDeletedPackFile.testV2IdxFileRemovedDuringUploadPack`
the fetch operation will fail when the pack index file is removed.
This is due to a wrapping of `FileNotFoundException` (which is a
subclass of `IOExeption`) in an `IOException` at PackIndex L#68. This
is then changing the behaviour of error handling in
`Pack.file.getBitmapIndex()` where the `FileNotFoundException` is
swallowed and allows the fetch process to continue. With FNFE being
wrapped in IOE, this blows up and breaks the fetch operation.
Simply rethrowing `FileNotFoundException` from `PackFile.open()` fixes
the broken fetch operation. This will also mark the whole pack as
invalid in the `IOException` handler in `Pack.idx()` method.
Dariusz Luksza [Fri, 17 Nov 2023 19:28:53 +0000 (19:28 +0000)]
Add tests for handling pack files removal during fetch
Although this could sound like a corner case, it really can occur out
there in the real world. Especially in the Gerrit world where the
repositories could be GC'ed on a separate process or system.
The `FileNotFoundException` seems to be handled correctly in
`PackFile#doOpen` (line 671) and it will mark the pack as invalid. But
triggering that code path was not an easy task.
First of all, we need to add a new commit to the `master` branch of the
test repository after `UploadPack` object is created.
Secondly, in the refspec for fetch, commit id instead of "regular"
refspec must be used.
With both in place, we can see a warning log statement about deleted
pack file. And the fetch succeeds!
Also, tests for the removal of *.idx and *.bitmap files were added.
This unveiled a corner for the *.idx file deletion while fetching, as
the test will fail with "Unreachable pack index" IOException only
when the HEAD commit is empty.
Matthias Sohn [Fri, 19 Jan 2024 23:18:25 +0000 (00:18 +0100)]
Merge branch 'stable-6.6' into stable-6.7
* stable-6.6:
Introduce a PriorityQueue sorting RevCommits by commit timestamp
Remove org.eclipse.jgit.benchmark/.factorypath
Update jmh to 1.37 for org.eclipse.jgit.benchmark
Luca Milanesio [Mon, 13 Jun 2022 22:09:55 +0000 (23:09 +0100)]
Introduce a PriorityQueue sorting RevCommits by commit timestamp
The DateRevQueue uses a tailor-made algorithm to keep
RevCommits sorted by reversed commit timestamp, which has a O(n*n/2)
complexity and caused the explosion of the Git fetch times to
tens of seconds.
The standard Java PriorityQueue provides a O(n*log(n)) complexity
and scales much better with the increase of the number of
RevCommits.
Introduce a new implementation DateRevPriorityQueue of the DateRevQueue
based on PriorityQueue.
Enable usage of the new DateRevPriorityQueue implementation by setting
the system property REVWALK_USE_PRIORITY_QUEUE=true. By default the old
implementation DateRevQueue is used.
Revert commit 170244d05977491271a1cc234583d2e5ba75145d
"Checkout: better directory handling" which is the downport of the
original fix Ie12864c54c9f901a2ccee7caddec73027f353111 which was done
on stable-6.6. Merging this up to stable-6.6 would be a lot of work and
these branches aren't maintained anymore hence revert this change here.
This way the fix is available on stable-5.13 for those who still need
Java 8 and everybody else should upgrade to 6.6.1 or higher.
Fabio Ponciroli [Wed, 6 Dec 2023 13:38:21 +0000 (14:38 +0100)]
Make sure ref to prune is in packed refs
RefDirectory:pack might raise an NPE when deleting loose
refs as final part of the RefDirectory.pack().
This is what the code does:
1) packed ref update: update the list of refs which will be
persisted in packed-refs
2) persit packed-refs: flush on file the refs computed in #1
3) prune loose refs: prune loose refs that have been packed in #2
The code correctly locks the packed-refs file during phases 1 to 3.
However, it makes the wrong assumption of considering
the loose refs set as immutable between phases 1 and 3.
The number and values of loose refs on the filesystem can mutate
at any time whilst the RefDirectory.pack() is in progress.
Assuming the contrary can lead to an NPE when retrieving refs
from the mutable loose refs list during phase #3.
Make sure that the ref is not null before dereferencing its
object-id value.
When checking out a file into the working tree ensure that all parent
directories of the file below the working tree root are actually
directories and do exist before we try to create the file.
When multiple files are to be checked out (or even a whole tree), this
may check the same directories over and over again. Asking the file
system every time for file attributes is a potentially expensive
operation. As a remedy, introduce an in-memory cache of directory
states for a particular check-out operation.
Apply the same fix also in the ResolveMerger, which may also check out
files, and also in the PatchApplier. In PatchApplier, also validate
paths.
Change-Id: Ie12864c54c9f901a2ccee7caddec73027f353111 Signed-off-by: Thomas Wolf <twolf@apache.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Fri, 13 Oct 2023 07:06:21 +0000 (09:06 +0200)]
Merge branch 'stable-6.6' into stable-6.7
* stable-6.6:
PackConfig: fix @since tags
Remove unused API problem filters
Add support for git config repack.packKeptObjects
Do not exclude objects in locked packs from bitmap processing
Matthias Sohn [Fri, 13 Oct 2023 06:46:11 +0000 (08:46 +0200)]
Merge branch 'stable-6.5' into stable-6.6
* stable-6.5:
PackConfig: fix @since tags
Remove unused API problem filters
Add support for git config repack.packKeptObjects
Do not exclude objects in locked packs from bitmap processing
Matthias Sohn [Thu, 12 Oct 2023 23:52:43 +0000 (01:52 +0200)]
Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
PackConfig: fix @since tags
Remove unused API problem filters
Add support for git config repack.packKeptObjects
Do not exclude objects in locked packs from bitmap processing
Matthias Sohn [Thu, 12 Oct 2023 23:33:56 +0000 (01:33 +0200)]
Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
PackConfig: fix @since tags
Remove unused API problem filters
Add support for git config repack.packKeptObjects
Do not exclude objects in locked packs from bitmap processing
Matthias Sohn [Thu, 12 Oct 2023 22:51:46 +0000 (00:51 +0200)]
Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
PackConfig: fix @since tags
Remove unused API problem filters
Add support for git config repack.packKeptObjects
Do not exclude objects in locked packs from bitmap processing
Matthias Sohn [Thu, 12 Oct 2023 22:44:33 +0000 (00:44 +0200)]
Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
PackConfig: fix @since tags
Remove unused API problem filters
Add support for git config repack.packKeptObjects
Do not exclude objects in locked packs from bitmap processing
Matthias Sohn [Thu, 12 Oct 2023 22:33:19 +0000 (00:33 +0200)]
Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
PackConfig: fix @since tags
Remove unused API problem filters
Add support for git config repack.packKeptObjects
Do not exclude objects in locked packs from bitmap processing
Matthias Sohn [Thu, 12 Oct 2023 22:22:11 +0000 (00:22 +0200)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
PackConfig: fix @since tags
Remove unused API problem filters
Add support for git config repack.packKeptObjects
Do not exclude objects in locked packs from bitmap processing
Antonio Barone [Thu, 5 Oct 2023 19:58:13 +0000 (21:58 +0200)]
Add support for git config repack.packKeptObjects
Change Ide3445e652 introduced the `--pack-kept-objects` option to GC for
including the objects contained in the locked packfiles during the
repack phase.
Whilst this allowed to explicitly pass a command line argument to the
jgit gc program, it did not allow the option to be read from
configuration.
Allow the pack kept objects option to be configured exactly as C-Git
documents [1], by introducing a new `repack.packKeptObjects`
configuration.
`repack.packKeptObjects` defaults to `true`, when the
`pack.buildBitmaps` is `true` (which is the default case), `false`
otherwise.
Luca Milanesio [Fri, 11 Aug 2023 19:15:17 +0000 (20:15 +0100)]
Do not exclude objects in locked packs from bitmap processing
Packfiles having an equivalent .keep file are associated with in-flight
pushes that haven't been completed, with potentially a set of git
objects not yet referenced by a ref.
If the Git client is not up-to-date, it may result in pushing a
packfile, generating a <packfile>.keep on the server, which
may also contain existing commits due to the lack of Git protocol
negotiation in the git-receive-pack.
The Git protocol negotiation is the phase where the client and the
server exchange the list of refs they have for trying to find a common
base and minimise the amount of objects to be transferred.
The repack phase in GC was previously skipping all objects that were
contained in all packfiles having a <packfile>.keep file associated
(aka "locked packfiles"), which did not take into consideration the
fact that excluding the existing commits would have resulted in the
generation of an invalid bitmap file.
The code for excluding the objects in the locked packfiles was written
well before the bitmap was introduced, hence could not consider a use
case that did not exist at that time.
However, when the bitmap was introduced, the exclusion of locked
packfiles was not changed, hence creating a potential problem.
The issue went unnoticed for many years because the bitmap generation
was disabled when JGit noticed any locked packfiles; however, the
bitmaps are enabled again since Id722e68d9f , and the the issue is now
visible and is impacting the GC repack phase.
Introduce the '--pack-kept-objects' option in GC for including the
objects contained in the locked packfiles during the repack phase,
which is not an issue because of the following:
- If there are any existing commits duplicated in the packfiles
they will be just considered once anyway because the repack doesn't
generate duplicates in the output packfile.
- If there are any new commits that do not have any ref pointing to
them, they will be automatically excluded from the output repacked
packfile.
The same identical solution is adopted in the C implementation of git
in repack.c.
Because the locked packfile is not pruned, any new commits not pointed
by any refs will remain in the repository and there will not be any
accidental pruning or object loss as it is today before this change.
As a side-effect of this change, it is now potentially possible to still
have duplicate BLOBs after GC when the keep packfile contained existing
objects. However, it is way better to keep the duplication until the
next GC phase rather than omitting existing objects from repacking and,
therefore generating an invalid bitmap and incorrect packfile.
Thomas Wolf [Fri, 11 Aug 2023 19:40:13 +0000 (21:40 +0200)]
Checkout: better directory handling
When checking out a file into the working tree ensure that all parent
directories of the file below the working tree root are actually
directories and do exist before we try to create the file.
When multiple files are to be checked out (or even a whole tree), this
may check the same directories over and over again. Asking the file
system every time for file attributes is a potentially expensive
operation. As a remedy, introduce an in-memory cache of directory
states for a particular check-out operation.
Apply the same fix also in the ResolveMerger, which may also check out
files, and also in the PatchApplier. In PatchApplier, also validate
paths.
Change-Id: Ie12864c54c9f901a2ccee7caddec73027f353111 Signed-off-by: Thomas Wolf <twolf@apache.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Ivan Frade [Fri, 1 Sep 2023 18:25:50 +0000 (11:25 -0700)]
CommitGraphWriter: throw exception on unknown chunk
CommitGraphWriter first defines the chunks and then writes them. If at
write time a chunk is unknown, it is ignored. This is brittle: if
somebody adds a chunk to the header but not to the actual writing, the
commit-graph is broken and there is no error reported anywhere.
Throw exception if at write time a chunk is unknown. This can only
happen by a coding error in the writer.
Matthias Sohn [Wed, 30 Aug 2023 14:43:08 +0000 (16:43 +0200)]
Merge branch 'master' into stable-6.7
* master:
Remove the cbi-snapshots Maven repository
Update Orbit to orbit-aggregation/release/4.29.0
Add target platform for Eclipse 2023-09 (4.29)
Use release p2 repo for Eclipse 2023-06 (4.28)
Update tycho to 4.0.2
Update jmh to 1.37
Update bouncycastle to 1.76
Fix some tests in ConfigTest
Handle global git config $XDG_CONFIG_HOME/git/config
IO: use JDK convenience methods
org.eclipse.jgit.junit.ssh/.settings/.api_filters: fix unclosed tags
ReadChangedPathFilter: fix Non-externalized string literal warning
Introduce core.packedIndexGitUseStrongRefs config key
DfsReader: Make PackLoadListener interface visible to subclasses
DfsGarbageCollector: provide commit graph stats
DfsGarbageCollector: put only GC commits into the commit graph
DfsReader: Expose when indices are loaded
Thomas Wolf [Wed, 5 Jul 2023 20:21:30 +0000 (22:21 +0200)]
Handle global git config $XDG_CONFIG_HOME/git/config
C git uses this alternate fallback location if the file exists and
~/.gitconfig does not. Implement this also for JGit.
If both files exist, reading behavior is as if the XDG config was
inserted between the HOME config and the system config. Writing
behaviour is different: all changes will be applied only in the HOME
config. Updates will occur in the XDG config only if the HOME config
does not exist.
This is consistent with the behavior of C git; compare [1], especially
the sections on FILES and SCOPES, and the description of the --global
option.
[1] https://git-scm.com/docs/git-config
Bug: 581875
Change-Id: I2460b9aa963fd2811ed8a5b77b05107d916f2b44 Signed-off-by: Thomas Wolf <twolf@apache.org>
Introduce a core.packedIndexGitUseStrongRefs configuration key, which
defaults to true so that the current behavior does not change. However,
setting it to false allows soft references to be used for Pack indices
instead of strong references so that they can be garbage collected when
there is memory pressure.
Pack objects can be large when associated with pack files with large
object counts, and this memory is not really accounted for or tracked by
the WindowCache and it can be very substantial at times, especially with
many large object count projects. A particularly problematic use case is
Gerrit's ls-projects command which loads very little data in the
WindowCache via ByteWindows, but ends up loading and holding many entire
indices in memory, sometimes even after the ByteWindows for their Pack
objects have already been garbage collected since they won't get cleared
until after a new ByteWindow is loaded. By using SoftReferences, single
use indices can get cleared when there is memory pressure and OOMs can
be easily avoided, drastically reducing the amount of memory required to
perform an ls-projects on large sites with many projects and large
object counts.
On one of our test sites, an ls-projects command with strong index
references requires more than 66GB of heap to complete successfully,
with soft index references it requires less than 23GB.
Change-Id: I3cb3df52f4ce1b8c554d378807218f199077d80b Signed-off-by: Martin Fick <quic_mfick@quicinc.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Ivan Frade [Wed, 16 Aug 2023 20:26:39 +0000 (13:26 -0700)]
DfsGarbageCollector: put only GC commits into the commit graph
GC puts all commits reachable from heads and tags into the GC pack,
and commits reachable only from other refs (e.g. refs/changes) into
GC_REST. The commit-graph contains all commits in GC and GC_REST. This
produces too big commit graphs in some repos, beating the purpose of
loading the index.
Limit the commit graph to commits reachable from heads and tags
(i.e. commits in the GC pack).
Ivan Frade [Fri, 28 Jul 2023 09:41:26 +0000 (11:41 +0200)]
DfsReader: Expose when indices are loaded
We want to measure the data used to serve a request. As a first step,
we want to know how many indices are accessed during the request and
their sizes.
Expose an interface in DfsReader to announce when an index is loaded
into the reader, i.e. when its reference is set.
The interface is more flexible to implementors (what/how to collect)
than the existing DfsReaderIOStats object.
Matthias Sohn [Thu, 3 Aug 2023 08:19:05 +0000 (10:19 +0200)]
Merge branch 'stable-6.7'
* stable-6.7:
Update to Tycho 4.0.1
Prepare 6.7.0-SNAPSHOT builds
JGit v6.7.0.202308011830-m2
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r