Thomas Wolf [Fri, 11 Aug 2023 19:40:13 +0000 (21:40 +0200)]
Checkout: better directory handling
When checking out a file into the working tree ensure that all parent
directories of the file below the working tree root are actually
directories and do exist before we try to create the file.
When multiple files are to be checked out (or even a whole tree), this
may check the same directories over and over again. Asking the file
system every time for file attributes is a potentially expensive
operation. As a remedy, introduce an in-memory cache of directory
states for a particular check-out operation.
Apply the same fix also in the ResolveMerger, which may also check out
files, and also in the PatchApplier. In PatchApplier, also validate
paths.
Change-Id: Ie12864c54c9f901a2ccee7caddec73027f353111 Signed-off-by: Thomas Wolf <twolf@apache.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Ivan Frade [Fri, 1 Sep 2023 18:25:50 +0000 (11:25 -0700)]
CommitGraphWriter: throw exception on unknown chunk
CommitGraphWriter first defines the chunks and then writes them. If at
write time a chunk is unknown, it is ignored. This is brittle: if
somebody adds a chunk to the header but not to the actual writing, the
commit-graph is broken and there is no error reported anywhere.
Throw exception if at write time a chunk is unknown. This can only
happen by a coding error in the writer.
Matthias Sohn [Wed, 30 Aug 2023 14:43:08 +0000 (16:43 +0200)]
Merge branch 'master' into stable-6.7
* master:
Remove the cbi-snapshots Maven repository
Update Orbit to orbit-aggregation/release/4.29.0
Add target platform for Eclipse 2023-09 (4.29)
Use release p2 repo for Eclipse 2023-06 (4.28)
Update tycho to 4.0.2
Update jmh to 1.37
Update bouncycastle to 1.76
Fix some tests in ConfigTest
Handle global git config $XDG_CONFIG_HOME/git/config
IO: use JDK convenience methods
org.eclipse.jgit.junit.ssh/.settings/.api_filters: fix unclosed tags
ReadChangedPathFilter: fix Non-externalized string literal warning
Introduce core.packedIndexGitUseStrongRefs config key
DfsReader: Make PackLoadListener interface visible to subclasses
DfsGarbageCollector: provide commit graph stats
DfsGarbageCollector: put only GC commits into the commit graph
DfsReader: Expose when indices are loaded
Thomas Wolf [Wed, 5 Jul 2023 20:21:30 +0000 (22:21 +0200)]
Handle global git config $XDG_CONFIG_HOME/git/config
C git uses this alternate fallback location if the file exists and
~/.gitconfig does not. Implement this also for JGit.
If both files exist, reading behavior is as if the XDG config was
inserted between the HOME config and the system config. Writing
behaviour is different: all changes will be applied only in the HOME
config. Updates will occur in the XDG config only if the HOME config
does not exist.
This is consistent with the behavior of C git; compare [1], especially
the sections on FILES and SCOPES, and the description of the --global
option.
[1] https://git-scm.com/docs/git-config
Bug: 581875
Change-Id: I2460b9aa963fd2811ed8a5b77b05107d916f2b44 Signed-off-by: Thomas Wolf <twolf@apache.org>
Introduce a core.packedIndexGitUseStrongRefs configuration key, which
defaults to true so that the current behavior does not change. However,
setting it to false allows soft references to be used for Pack indices
instead of strong references so that they can be garbage collected when
there is memory pressure.
Pack objects can be large when associated with pack files with large
object counts, and this memory is not really accounted for or tracked by
the WindowCache and it can be very substantial at times, especially with
many large object count projects. A particularly problematic use case is
Gerrit's ls-projects command which loads very little data in the
WindowCache via ByteWindows, but ends up loading and holding many entire
indices in memory, sometimes even after the ByteWindows for their Pack
objects have already been garbage collected since they won't get cleared
until after a new ByteWindow is loaded. By using SoftReferences, single
use indices can get cleared when there is memory pressure and OOMs can
be easily avoided, drastically reducing the amount of memory required to
perform an ls-projects on large sites with many projects and large
object counts.
On one of our test sites, an ls-projects command with strong index
references requires more than 66GB of heap to complete successfully,
with soft index references it requires less than 23GB.
Change-Id: I3cb3df52f4ce1b8c554d378807218f199077d80b Signed-off-by: Martin Fick <quic_mfick@quicinc.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Ivan Frade [Wed, 16 Aug 2023 20:26:39 +0000 (13:26 -0700)]
DfsGarbageCollector: put only GC commits into the commit graph
GC puts all commits reachable from heads and tags into the GC pack,
and commits reachable only from other refs (e.g. refs/changes) into
GC_REST. The commit-graph contains all commits in GC and GC_REST. This
produces too big commit graphs in some repos, beating the purpose of
loading the index.
Limit the commit graph to commits reachable from heads and tags
(i.e. commits in the GC pack).
Ivan Frade [Fri, 28 Jul 2023 09:41:26 +0000 (11:41 +0200)]
DfsReader: Expose when indices are loaded
We want to measure the data used to serve a request. As a first step,
we want to know how many indices are accessed during the request and
their sizes.
Expose an interface in DfsReader to announce when an index is loaded
into the reader, i.e. when its reference is set.
The interface is more flexible to implementors (what/how to collect)
than the existing DfsReaderIOStats object.
Matthias Sohn [Thu, 3 Aug 2023 08:19:05 +0000 (10:19 +0200)]
Merge branch 'stable-6.7'
* stable-6.7:
Update to Tycho 4.0.1
Prepare 6.7.0-SNAPSHOT builds
JGit v6.7.0.202308011830-m2
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Matthias Sohn [Thu, 3 Aug 2023 08:17:22 +0000 (10:17 +0200)]
Merge branch 'stable-6.6' into stable-6.7
* stable-6.6:
Update to Tycho 4.0.1
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Matthias Sohn [Thu, 3 Aug 2023 08:14:45 +0000 (10:14 +0200)]
Merge branch 'stable-6.5' into stable-6.6
* stable-6.5:
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Matthias Sohn [Thu, 3 Aug 2023 08:12:57 +0000 (10:12 +0200)]
Update to Tycho 4.0.1
Tycho 4.0.0-SNAPSHOT is no longer available and it's a bad practice to
depend on any snapshot version (we had to since this was the only way
to get gpg signing to work in time for releasing 6.6.0).
Matthias Sohn [Wed, 2 Aug 2023 23:55:12 +0000 (01:55 +0200)]
Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Matthias Sohn [Wed, 2 Aug 2023 23:51:36 +0000 (01:51 +0200)]
Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Matthias Sohn [Wed, 2 Aug 2023 23:37:43 +0000 (01:37 +0200)]
Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Matthias Sohn [Wed, 2 Aug 2023 23:28:07 +0000 (01:28 +0200)]
Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Matthias Sohn [Wed, 2 Aug 2023 23:19:21 +0000 (01:19 +0200)]
Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Matthias Sohn [Wed, 2 Aug 2023 23:17:17 +0000 (01:17 +0200)]
Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
Add verification in GcKeepFilesTest that bitmaps are generated
Express the explicit intention of creating bitmaps in GC
GC: prune all packfiles after the loosen phase
Prepare 5.13.3-SNAPSHOT builds
JGit v5.13.2.202306221912-r
Ivan Frade [Tue, 25 Jul 2023 10:25:33 +0000 (03:25 -0700)]
RevCommitCG: Read changed-path-filters directly from commit graph
RevCommit and RevCommitCG were designed like "pointers" to data that
load the content on demand, not on construction. This saves memory.
Make the loading of changed-path-filter follow the same pattern. The
ChangedPathFilters are only pointers to locations in the commit-graph
(not the actual data), so the memory saving is not that big, but this
is more consistent with the rest of the API.
As 6.7 is not released, we can still change the RevWalk API.
Ronald Bhuleskar [Wed, 19 Jul 2023 21:25:46 +0000 (14:25 -0700)]
Identify a commit that generates a diffEntry on a rename Event.
When using FollowFilter's rename callback, a callback is generated with the diff. The caller that is interested in the renames knows what the diff's are but have no idea what commit generated that diff.
This will allow FollowFilter's rename callback to track diffEntry for a given commit.
Anna Papitto [Fri, 14 Jul 2023 19:19:27 +0000 (12:19 -0700)]
PackReverseIndexV1: reverse index parsed from version 1 file
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
insertion sort, which has an average runtime of O(n^2).
Cgit persists a pack reverse index file
to avoid recomputing the reverse index ordering. Instead they write a
file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into the in-memory reverse index
each time it is needed.
PackReverseIndexV1 parses a reverse index file with the official
version 1 format into an in-memory representation of the reverse index
which implements methods to find an object's forward index position
from its offset in logorithmic time.
Change-Id: I60a92463fbd6a8cc9c1c7451df1c14d0a21a0f64 Signed-off-by: Anna Papitto <annapapitto@google.com>
Anna Papitto [Fri, 14 Jul 2023 19:19:27 +0000 (12:19 -0700)]
PackReverseIndex: open file if present otherwise compute
The existing #read and #computeFromIndex static builder methods require
the caller to choose whether to supply an input stream of a reverse
index file or a forward index to compute the reverse index from, which
is slower.
Allow a caller to provide a file path where the pack's reverse index
might be and the pack's forward index index and simply get some reverse
index instance back. Prefer opening and parsing the file if it is
present, to save computation time. Otherwise, fall back onto computing
the reverse index from the pack's forward index.
Change-Id: I09bdd4b813ad62c86add586417b2ab86e9331aec Signed-off-by: Anna Papitto <annapapitto@google.com>
Anna Papitto [Fri, 14 Jul 2023 19:19:27 +0000 (12:19 -0700)]
PackReverseIndex: verify checksums
The new version 1 file-based reverse index has a footer with the
checksum of the corresponding pack file and a checksum of its own
contents. The initial implementation doesn't enforce that the pack
checksum matches the checksum found in the forward index nor that the
self checksum matches the contents of the file just read in.
Offer a method for reverse index users to verify the checksums in a way
appropriate to the version being used. For the pre-existing computed
version, always succeed since it is not based on a file so there is no
possibility of corruption.
Check for corruption of the file itself during parsing the checksum
footer, by comparing the self checksum with the digest of the file
contents read.
Change-Id: I87ff3933cf1afa76663350400b616695e4966cb6 Signed-off-by: Anna Papitto <annapapitto@google.com>
The ComputedPackReverseIndex uses a custom sorting algorithm, based on
bucket sort with insertion sort but with the data managed as a linked
list across two int arrays. This custom algorithm relies on the set of
values being sorted being exactly 0, ..., n-1; so that they can serve a
second purpose of being indexes into a second equally sized list.
This custom algorithm was introduced ~10 years ago in
https://eclipse.googlesource.com/jgit/jgit/+/6cc532a43cf28403cb623d3df8600a2542a40a43.
The original author is no longer an active contributor, so it is
valuable for the code to be readable, especially as there is currently
active work on reverse indexes.
Rename variables and add comments to clarify the algorithm and improve
readability. There are no functional changes to the algorithm.
Change-Id: Ic3b682203f20e06f9f865f81259e034230f9720a Signed-off-by: Anna Papitto <annapapitto@google.com>
Ronald Bhuleskar [Wed, 17 May 2023 23:29:14 +0000 (16:29 -0700)]
CommitGraphWriter: add option for writing/using bloom filters
Currently, bloom filters are written and used without any way to turn
them off. Add a per-repo config variable to control whether bloom
filters are written. As for reading, add a JGit option to control this.
(A JGit option is used instead of a per-repo config variable as there is
usually no reason not to use the bloom filters if they are present, but
a global control to disable them is useful if there turns out to be an
issue with the implementation of bloom filters.)
The config that controls reading is the same as C Git, but the config
for writing is not: C Git has no config to control writing, but whether
bloom filters are written depends on whether bloom filters are already
present and what arguments are passed to "git commit-graph write". See
the manpage of "git commit-graph" for more information.
Jonathan Tan [Tue, 2 May 2023 17:44:16 +0000 (10:44 -0700)]
RevWalk: use changed path filters
Teach RevWalk, TreeRevFilter, PathFilter, and FollowFilter to use
changed path filters, whenever available, to speed revision walks by
skipping commits that fail the changed path filter.
This work is based on earlier work by Kyle Zhao
(I441be984b609669cff77617ecfc838b080ce0816).
Change-Id: I7396f70241e571c63aabe337f6de1b8b9800f7ed Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Jonathan Tan [Mon, 24 Apr 2023 19:55:30 +0000 (12:55 -0700)]
CommitGraphLoader: read changed-path filters
As described in the parent commit, add support for reading the BIDX and
BDAT chunks of the commit graph file, as described in man gitformat-
commit-graph(5).
This work is based on earlier work by Kyle Zhao
(I160f6b022afaa842c331fb9a086974e49dced7b2).
Change-Id: I82e02e6a3a3b758e6bf9d7bbd2198f0ffe3a331b Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Jonathan Tan [Tue, 18 Apr 2023 22:20:02 +0000 (15:20 -0700)]
CommitGraphWriter: write changed-path filters
Add support for writing the BIDX and BDAT chunks of the commit graph
file, as described in man gitformat-commit-graph(5). The ability to read
such chunks will be added in a subsequent commit.
This work is based on earlier work by Kyle Zhao
(Ib863782af209f26381e3ca0a2c119b99e84b679c).
Change-Id: Ic18e6f0eeec7da1e1ff31751aabda5e6952dbe6e Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Thomas Wolf [Sun, 9 Jul 2023 18:06:37 +0000 (20:06 +0200)]
ssh: PKCS#11 support
Support PKCS#11 HSMs (like YubiKey PIV) for SSH authentication.
Use the SunPKCS11 provider as described at [1]. This provider
dynamically loads the library from the PKCS11Provider SSH configuration
and creates a Java KeyStore with that provider. A Java CallbackHandler
is needed to feed PIN prompts from the KeyStore into the JGit
CredentialsProvider framework. Because the JGit CredentialsProvider may
be specific to a SSH session but the PKCS11Provider may be used by
several sessions, the CallbackHandler needs to be configurable per
session.
PIN prompts respect the NumberOfPasswordPrompts SSH configuration. As
long as the library asks only for a PIN, we use the KeyPasswordProvider
to prompt for it. This gives automatic integration in Eclipse with the
Eclipse secure storage, so a user has even the option to store the PIN
there. (Eclipse will then ask for the secure storage master password on
first access, so the usefulness of this is debatable.)
By default the provider uses the first PKCS#11 token (slot list index
zero). This can be overridden by a non-standard PKCS11SlotListIndex
ssh configuration entry. (For OpenSSH interoperability, also set
"IgnoreUnknown PKCS11SlotListIndex" in the SSH config file then.)
Once loaded, the provider and its shared library and the keys
contained remain available until the application exits.
Manually tested using SoftHSM. See file manual_tests.txt. Kudos to
Christopher Lamb for additional manual testing with a real YubiKey,
also on Windows.[2]
Matthias Sohn [Sun, 16 Jul 2023 22:56:17 +0000 (00:56 +0200)]
GC: Remove handling of extra pack for RefTree
RefTree was packed in its own packfile, see
Icbb735be8fa91ccbf0708ca3a219b364e11a6b83.
RefTree was deleted in Ia3da7f2b82d9e365cec2ccf9397cbc47439cd150, since
it was experimental and never used productively. This change missed to
remove the extra pack handling for RefTree.
Ivan Frade [Wed, 15 Dec 2021 00:35:54 +0000 (16:35 -0800)]
DfsPackParser: Create object indices if config says so
The DfsInserter writes the pack and its indices in the flush() method,
but when the writing happens via DfsPackParser, it is the parser which
writes the pack and indices. When combined with a parser, flushing the
inserter is a noop.
Add the writing of the object size index to the packparser#parse
method, mirroring how the primary index is written.
Ivan Frade [Wed, 29 Dec 2021 23:39:41 +0000 (15:39 -0800)]
DFSGarbargeCollector: Write object size indices
PackWriter knows how to add an object size index to the pack, but the
garbage collector is not using it yet.
Teach DfsGarbageCollector to write the object size index on
writePack(). Disable by default in the unreachable-garbage pack.
Callers control the content/presence of the index through the
PackConfig option (minBytesForObjSizeIndex) for all other packs, so
there is no need of a specific flag in DfsGarbageCollector.
Ivan Frade [Thu, 30 Dec 2021 16:33:08 +0000 (08:33 -0800)]
DfsReader/PackFile: Implement isNotLargerThan using the obj size idx
isNotLargerThan() can avoid reading the size of a blob from disk using
the object size idx if available.
Load the object size index in the DfsPackfile following the same
pattern than the other indices. Override isNotLargerThan in DfsReader
to use the index when available.
Following CL introduces the writing of the object size index and the
tests cover this code.
Luca Milanesio [Sat, 10 Jun 2023 00:20:44 +0000 (01:20 +0100)]
Add verification in GcKeepFilesTest that bitmaps are generated
The packfiles with the .keep extensions are meant to prevent
a packfile from being processed or removed during GC.
From the point of view of the GC process then, the associated
packfile should be completely transparent:
- it should not included in the repacked file
- it should not pruned
- its objects should be left untouched, even if unreachable
- the GC process, including the bitmap generation should continue
as usual, as the the packfiles with .keep file did not exist
Add one explicit test for making sure that the management
of .keep file is also transparent to the generation of bitmaps,
which are still generated if a .keep file exists.
Luca Milanesio [Sat, 10 Jun 2023 01:21:07 +0000 (02:21 +0100)]
Express the explicit intention of creating bitmaps in GC
Add an explicit flag to PackWriter for allowing the
GC.repack() phase to explicitly generate bitmaps only for the
heads packfile and not for the others.
Previously the bitmap generation was conditioned to the
presence of object ids exclusion from the PackWriter.
The introduction of the bitmap generation in the PackWriter
done in Icdb0cdd66 has accidentally made the .keep files not
completely transparent, because their presence have disabled
the generation of the bitmap index, even if the generation
of bitmaps is enabled.
This bug has been an accidental consequence of the intention
of the bitmap generator to avoid generating bitmaps for the
non-heads packfile, however the implementation done by Colby
decided to use the excludeInPacks variable (see [1]) which
is unfortunately also used for excluding the packfiles having
an associated .keep file (see [2]).
Luca Milanesio [Sun, 11 Jun 2023 16:24:29 +0000 (17:24 +0100)]
GC: prune all packfiles after the loosen phase
When loosening the objects inside the packfiles to be pruned, make sure
that the packfile list is stable and prune all the files after the
loosening is done.
This prevents a series of exceptions previously thrown when loosening
the packfiles, due to the too early pruning of the packfiles that were
still in the pack list.
qin shulei [Wed, 28 Jun 2023 11:52:47 +0000 (19:52 +0800)]
Fix S3Repository getSize to handle larger object sizes
Update `getSize` method in `S3Repository` to handle larger object sizes.
The method previously used `Integer.parseInt`
to parse the `Content-Length` header of an HTTP response,
which limited the maximum object size to 2 GB.
Replaces `Integer.parseInt` with `Long.parseLong`,
allowing the method to handle object sizes larger than 2 GB.
- Use minio as local S3 service for gerrit lfs plugin
- The minio seems will return the Content-length
Anna Papitto [Tue, 27 Jun 2023 20:22:20 +0000 (13:22 -0700)]
DfsPackFile: make #getReverseIdx public
The DfsPackFile#getReverseIdx method, which wraps creating a
PackReverseIndex in caching, was package-private. This caused
implementations on top of DfsPackFile to directly instantiate a
PackReverseIndex in cases where it would benefit from caching.
Instead, make #getReverseIdx public so that the caching logic can be
reused by implementations where appropriate.
Change-Id: I4553e514a4ac320bfe2455c00023343ad97f9d15 Signed-off-by: Anna Papitto <annapapitto@google.com>