Ivan Frade [Tue, 25 Jul 2023 10:25:33 +0000 (03:25 -0700)]
RevCommitCG: Read changed-path-filters directly from commit graph
RevCommit and RevCommitCG were designed like "pointers" to data that
load the content on demand, not on construction. This saves memory.
Make the loading of changed-path-filter follow the same pattern. The
ChangedPathFilters are only pointers to locations in the commit-graph
(not the actual data), so the memory saving is not that big, but this
is more consistent with the rest of the API.
As 6.7 is not released, we can still change the RevWalk API.
Ronald Bhuleskar [Wed, 19 Jul 2023 21:25:46 +0000 (14:25 -0700)]
Identify a commit that generates a diffEntry on a rename Event.
When using FollowFilter's rename callback, a callback is generated with the diff. The caller that is interested in the renames knows what the diff's are but have no idea what commit generated that diff.
This will allow FollowFilter's rename callback to track diffEntry for a given commit.
Anna Papitto [Fri, 14 Jul 2023 19:19:27 +0000 (12:19 -0700)]
PackReverseIndexV1: reverse index parsed from version 1 file
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
insertion sort, which has an average runtime of O(n^2).
Cgit persists a pack reverse index file
to avoid recomputing the reverse index ordering. Instead they write a
file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into the in-memory reverse index
each time it is needed.
PackReverseIndexV1 parses a reverse index file with the official
version 1 format into an in-memory representation of the reverse index
which implements methods to find an object's forward index position
from its offset in logorithmic time.
Change-Id: I60a92463fbd6a8cc9c1c7451df1c14d0a21a0f64 Signed-off-by: Anna Papitto <annapapitto@google.com>
Anna Papitto [Fri, 14 Jul 2023 19:19:27 +0000 (12:19 -0700)]
PackReverseIndex: open file if present otherwise compute
The existing #read and #computeFromIndex static builder methods require
the caller to choose whether to supply an input stream of a reverse
index file or a forward index to compute the reverse index from, which
is slower.
Allow a caller to provide a file path where the pack's reverse index
might be and the pack's forward index index and simply get some reverse
index instance back. Prefer opening and parsing the file if it is
present, to save computation time. Otherwise, fall back onto computing
the reverse index from the pack's forward index.
Change-Id: I09bdd4b813ad62c86add586417b2ab86e9331aec Signed-off-by: Anna Papitto <annapapitto@google.com>
Anna Papitto [Fri, 14 Jul 2023 19:19:27 +0000 (12:19 -0700)]
PackReverseIndex: verify checksums
The new version 1 file-based reverse index has a footer with the
checksum of the corresponding pack file and a checksum of its own
contents. The initial implementation doesn't enforce that the pack
checksum matches the checksum found in the forward index nor that the
self checksum matches the contents of the file just read in.
Offer a method for reverse index users to verify the checksums in a way
appropriate to the version being used. For the pre-existing computed
version, always succeed since it is not based on a file so there is no
possibility of corruption.
Check for corruption of the file itself during parsing the checksum
footer, by comparing the self checksum with the digest of the file
contents read.
Change-Id: I87ff3933cf1afa76663350400b616695e4966cb6 Signed-off-by: Anna Papitto <annapapitto@google.com>
The ComputedPackReverseIndex uses a custom sorting algorithm, based on
bucket sort with insertion sort but with the data managed as a linked
list across two int arrays. This custom algorithm relies on the set of
values being sorted being exactly 0, ..., n-1; so that they can serve a
second purpose of being indexes into a second equally sized list.
This custom algorithm was introduced ~10 years ago in
https://eclipse.googlesource.com/jgit/jgit/+/6cc532a43cf28403cb623d3df8600a2542a40a43.
The original author is no longer an active contributor, so it is
valuable for the code to be readable, especially as there is currently
active work on reverse indexes.
Rename variables and add comments to clarify the algorithm and improve
readability. There are no functional changes to the algorithm.
Change-Id: Ic3b682203f20e06f9f865f81259e034230f9720a Signed-off-by: Anna Papitto <annapapitto@google.com>
Ronald Bhuleskar [Wed, 17 May 2023 23:29:14 +0000 (16:29 -0700)]
CommitGraphWriter: add option for writing/using bloom filters
Currently, bloom filters are written and used without any way to turn
them off. Add a per-repo config variable to control whether bloom
filters are written. As for reading, add a JGit option to control this.
(A JGit option is used instead of a per-repo config variable as there is
usually no reason not to use the bloom filters if they are present, but
a global control to disable them is useful if there turns out to be an
issue with the implementation of bloom filters.)
The config that controls reading is the same as C Git, but the config
for writing is not: C Git has no config to control writing, but whether
bloom filters are written depends on whether bloom filters are already
present and what arguments are passed to "git commit-graph write". See
the manpage of "git commit-graph" for more information.
Jonathan Tan [Tue, 2 May 2023 17:44:16 +0000 (10:44 -0700)]
RevWalk: use changed path filters
Teach RevWalk, TreeRevFilter, PathFilter, and FollowFilter to use
changed path filters, whenever available, to speed revision walks by
skipping commits that fail the changed path filter.
This work is based on earlier work by Kyle Zhao
(I441be984b609669cff77617ecfc838b080ce0816).
Change-Id: I7396f70241e571c63aabe337f6de1b8b9800f7ed Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Jonathan Tan [Mon, 24 Apr 2023 19:55:30 +0000 (12:55 -0700)]
CommitGraphLoader: read changed-path filters
As described in the parent commit, add support for reading the BIDX and
BDAT chunks of the commit graph file, as described in man gitformat-
commit-graph(5).
This work is based on earlier work by Kyle Zhao
(I160f6b022afaa842c331fb9a086974e49dced7b2).
Change-Id: I82e02e6a3a3b758e6bf9d7bbd2198f0ffe3a331b Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Jonathan Tan [Tue, 18 Apr 2023 22:20:02 +0000 (15:20 -0700)]
CommitGraphWriter: write changed-path filters
Add support for writing the BIDX and BDAT chunks of the commit graph
file, as described in man gitformat-commit-graph(5). The ability to read
such chunks will be added in a subsequent commit.
This work is based on earlier work by Kyle Zhao
(Ib863782af209f26381e3ca0a2c119b99e84b679c).
Change-Id: Ic18e6f0eeec7da1e1ff31751aabda5e6952dbe6e Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Thomas Wolf [Sun, 9 Jul 2023 18:06:37 +0000 (20:06 +0200)]
ssh: PKCS#11 support
Support PKCS#11 HSMs (like YubiKey PIV) for SSH authentication.
Use the SunPKCS11 provider as described at [1]. This provider
dynamically loads the library from the PKCS11Provider SSH configuration
and creates a Java KeyStore with that provider. A Java CallbackHandler
is needed to feed PIN prompts from the KeyStore into the JGit
CredentialsProvider framework. Because the JGit CredentialsProvider may
be specific to a SSH session but the PKCS11Provider may be used by
several sessions, the CallbackHandler needs to be configurable per
session.
PIN prompts respect the NumberOfPasswordPrompts SSH configuration. As
long as the library asks only for a PIN, we use the KeyPasswordProvider
to prompt for it. This gives automatic integration in Eclipse with the
Eclipse secure storage, so a user has even the option to store the PIN
there. (Eclipse will then ask for the secure storage master password on
first access, so the usefulness of this is debatable.)
By default the provider uses the first PKCS#11 token (slot list index
zero). This can be overridden by a non-standard PKCS11SlotListIndex
ssh configuration entry. (For OpenSSH interoperability, also set
"IgnoreUnknown PKCS11SlotListIndex" in the SSH config file then.)
Once loaded, the provider and its shared library and the keys
contained remain available until the application exits.
Manually tested using SoftHSM. See file manual_tests.txt. Kudos to
Christopher Lamb for additional manual testing with a real YubiKey,
also on Windows.[2]
Matthias Sohn [Sun, 16 Jul 2023 22:56:17 +0000 (00:56 +0200)]
GC: Remove handling of extra pack for RefTree
RefTree was packed in its own packfile, see
Icbb735be8fa91ccbf0708ca3a219b364e11a6b83.
RefTree was deleted in Ia3da7f2b82d9e365cec2ccf9397cbc47439cd150, since
it was experimental and never used productively. This change missed to
remove the extra pack handling for RefTree.
Ivan Frade [Wed, 15 Dec 2021 00:35:54 +0000 (16:35 -0800)]
DfsPackParser: Create object indices if config says so
The DfsInserter writes the pack and its indices in the flush() method,
but when the writing happens via DfsPackParser, it is the parser which
writes the pack and indices. When combined with a parser, flushing the
inserter is a noop.
Add the writing of the object size index to the packparser#parse
method, mirroring how the primary index is written.
Ivan Frade [Wed, 29 Dec 2021 23:39:41 +0000 (15:39 -0800)]
DFSGarbargeCollector: Write object size indices
PackWriter knows how to add an object size index to the pack, but the
garbage collector is not using it yet.
Teach DfsGarbageCollector to write the object size index on
writePack(). Disable by default in the unreachable-garbage pack.
Callers control the content/presence of the index through the
PackConfig option (minBytesForObjSizeIndex) for all other packs, so
there is no need of a specific flag in DfsGarbageCollector.
Ivan Frade [Thu, 30 Dec 2021 16:33:08 +0000 (08:33 -0800)]
DfsReader/PackFile: Implement isNotLargerThan using the obj size idx
isNotLargerThan() can avoid reading the size of a blob from disk using
the object size idx if available.
Load the object size index in the DfsPackfile following the same
pattern than the other indices. Override isNotLargerThan in DfsReader
to use the index when available.
Following CL introduces the writing of the object size index and the
tests cover this code.
qin shulei [Wed, 28 Jun 2023 11:52:47 +0000 (19:52 +0800)]
Fix S3Repository getSize to handle larger object sizes
Update `getSize` method in `S3Repository` to handle larger object sizes.
The method previously used `Integer.parseInt`
to parse the `Content-Length` header of an HTTP response,
which limited the maximum object size to 2 GB.
Replaces `Integer.parseInt` with `Long.parseLong`,
allowing the method to handle object sizes larger than 2 GB.
- Use minio as local S3 service for gerrit lfs plugin
- The minio seems will return the Content-length
Anna Papitto [Tue, 27 Jun 2023 20:22:20 +0000 (13:22 -0700)]
DfsPackFile: make #getReverseIdx public
The DfsPackFile#getReverseIdx method, which wraps creating a
PackReverseIndex in caching, was package-private. This caused
implementations on top of DfsPackFile to directly instantiate a
PackReverseIndex in cases where it would benefit from caching.
Instead, make #getReverseIdx public so that the caching logic can be
reused by implementations where appropriate.
Change-Id: I4553e514a4ac320bfe2455c00023343ad97f9d15 Signed-off-by: Anna Papitto <annapapitto@google.com>
Anna Papitto [Tue, 20 Jun 2023 22:44:30 +0000 (15:44 -0700)]
PackReverseIndex: separate out the computed implementation
PackReverseIndex is a concrete class whose implementation is computed
from a pack's forward index. Callers which have a reverse index file may
want to use an implementation that is file-based instead.
Generalize PackReverseIndex into an interface without
implementation-specific logic and separate out the logic for the
computed implementation into a new concrete class.
Change-Id: I98d9835363c5e1c8c3c11a81b0761af3cdeaa41a Signed-off-by: Anna Papitto <annapapitto@google.com>
Thomas Wolf [Thu, 15 Jun 2023 20:08:06 +0000 (22:08 +0200)]
Default for global (user) git ignore file
C git has a default for git config core.excludesfile: "Its default
value is $XDG_CONFIG_HOME/git/ignore. If $XDG_CONFIG_HOME is either
not set or empty, $HOME/.config/git/ignore is used instead." [1]
Implement this in the WorkingTreeIterator$RootIgnoreNode.
To make this testable, mock the "user.home" directory for all JGit
tests, otherwise tests might pick up a real user's git ignore file.
Also ensure that JGit code always reads "user.home" via the
SystemReader.
Antoine Musso [Wed, 31 May 2023 15:57:28 +0000 (17:57 +0200)]
Fix all Javadoc warnings and fail on them
This fixes all the javadoc warnings, stops ignoring doclint 'missing'
category and fails the build on javadoc warnings for public and
protected classes and class members.
Since javadoc doesn't allow access specifiers when specifying doclint
configuration we cannot set `-Xdoclint:all,-missing/private`
hence there is no simple way to skip private elements from doclint.
Therefore we check javadoc using the Eclipse Java compiler
(which is used by default) and javadoc configuration in
`.settings/org.eclipse.jdt.core.prefs` files.
This allows more fine grained configuration.
We can reconsider this when javadoc starts supporting access specifiers
in the doclint configuration.
Below are detailled explanations for most modifications.
@inheritDoc
===========
doclint complains about explicits `{@inheritDoc}` when the parent does
not have any documentation. As far as I can tell, javadoc defaults to
inherit comments and should only be used when one wants to append extra
documentation from the parent. Given the parent has no documentation,
remove those usages which doclint complains about.
In some case I have moved up the documentation from the concrete class
up to the abstract class.
Remove `{@inheritDoc}` on overriden methods which don't add additional
documentation since javadoc defaults to inherit javadoc of overridden
methods.
@value to @link
===============
In PackConfig, DEFAULT_SEARCH_FOR_REUSE_TIMEOUT and similar are forged
from Integer.MAX_VALUE and are thus not considered constants (I guess
cause the value would depends on the platform). Replace it with a link
to `Integer.MAX_VALUE`.
In `StringUtils.toBoolean`, @value was used to refer to the
`stringValue` parameter. I have replaced it with `{@code stringValue}`.
{@link <url>} to <a>
====================
@link does not support being given an external URL. Replaces them with
HTML `<a>`.
@since: being invalid
=====================
org.eclipse.jgit/src/org/eclipse/jgit/util/Equality.java has an invalid
tag `@since: ` due to the extra `:`. Javadoc does not complain about it
with version 11.0.18+10 but does with 11.0.19.7. It is invalid
regardless.
invalid HTML syntax
===================
- javadoc doesn't allow <br/>, <p/> and </p> anymore, use <br> and <p>
instead
- replace <tt>code</tt> by {@code code}
- <table> tags don't allow summary attribute, specify caption as
<caption>caption</caption> to fix this
doclint visibility issue
========================
In the private abstract classes `BaseDirCacheEditor` and
`BasePackConnection` links to other methods in the abstract class are
inherited in the public subclasses but doclint gets confused and
considers them unreachable. The HTML documentation for the sub classes
shows the relative links in the sub classes, so it is all correct. It
must be a bug somewhere in javadoc.
Mute those warnings with: @SuppressWarnings("doclint:missing")
Misc
====
Replace `<` and `>` with HTML encoded entities (`< and `>`).
In `SshConstants` I went enclosing a serie of -> arrows in @literal.
Additional tags
===============
Configure maven-javad0c-plugin to allow the following additional tags
defined in https://openjdk.org/jeps/8068562:
- apiNote
- implSpec
- implNote
Missing javadoc
===============
Add missing @params and descriptions
Change-Id: I840056389aa59135cfb360da0d5e40463ce35bd0 Also-By: Matthias Sohn <matthias.sohn@sap.com>
Reason for revert: This change was based on the false claim that the
packedrefs file lock is held while the CAS is being done, but it is
actually released before the CAS (the in memory lock is still held,
however that does not prevent external actors from updating the
packedrefs files and then another thread from subsequently re-reading it
and updating the in memory packedRefList). Although reverting this
change can cause the CAS to fail, it should not actually matter since
the failure would indicate that another thread has already updated the
in memory packedRefList to either the same version this thread was
trying to update it too, or to a more recent version. Either way,
failing the CAS is then appropriate and should not be problematic.
Although this change reverts the code in the RefDirectory class, it
keeps the "improvements" to the test so that it continues to pass
reliably. The reason for the quotes around the word "improvements" is
because I believe the test alteration actually dramatically changes the
intent of the test, and that the original intent of the test is
untestable with the GC and RefDirectory classes as is.
Bug: 582044
Change-Id: I3acee7527bb542996dcdfaddfb2bdb45ec444db5 Signed-off-by: Martin Fick <quic_mfick@quicinc.com>
(cherry picked from commit c5617711a1b4d5d0807cc7eed702b78d114d46b3)
Anna Papitto [Tue, 30 May 2023 14:20:54 +0000 (16:20 +0200)]
PackReverseIndex: use static builder instead of constructor
PackReverseIndex instances are created using the constructor directly,
which limits control over the construction logic and refactoring
opportunities for the class itself. These will be needed for a
file-based implementation of the reverse index.
Use a static builder method to create a PackReverseIndex instance using
a pack's forward index.
Change-Id: I4421d907cd61d9ac932df5377e5e28a81679b63f Signed-off-by: Anna Papitto <annapapitto@google.com>
Anna Papitto [Tue, 30 May 2023 14:20:54 +0000 (16:20 +0200)]
Gc#writePack: write the reverse index file to disk
The reverse index is currently created in-memory when needed. A writer
for reverse index files was already implemented.
Make garbage collection write the reverse index file when the PackConfig
enables it. Write it during #writePack, which mirrors how the primary
index is written.
Change-Id: I50131af6622c41a7b24534aaaf2a423ab4178981 Signed-off-by: Anna Papitto <annapapitto@google.com>
David Ostrovsky [Thu, 25 May 2023 06:49:44 +0000 (08:49 +0200)]
Bazel: Fix remote build execution for Java 17
Bazel build and test with remote17 configuration was actually executed
locally, and not remotely.
Rename remote configuration setting for Java 11 from remote to remote11
and derive the common remote configuration settings for remote11 and
remote17 configurations.
Also remove deprecated --host_javabase and --javabase options.
Test Plan:
Verify that the remote build for Java 17 is actually executed remotely:
$ bazel test --config=remote17 --remote_instance_name=$PROJECT_NAME \
org.eclipse.jgit.test/...
Matthias Sohn [Wed, 24 May 2023 13:50:27 +0000 (15:50 +0200)]
Merge branch 'master' into stable-6.6
* master:
GraphObjectIndex: fix search in findGraphPosition
Update to Tycho 4.0.0-SNAPSHOT
PGP sign p2 artefacts
Revert 'Use net.i2p.crypto:eddsa directly from Maven Central'
Update dash license-tool-plugin to 1.0.2
Also add suppressed exception if unchecked exception occurs in finally
Candidate: use "Objects.equals" instead of "=="
Use hamcrest 2.2 directly from Maven Central
Use commons-logging directly from Maven Central
Update jna to 5.13.0
Use bytebuddy directly from Maven Central
Use jna directly from Maven Central
Use net.i2p.crypto:eddsa directly from Maven Central
Use org.tukaani:xz directly from Maven Central
Use args4j directly from Maven Central
Use gson directly from Maven Central
Remove unused $NON-NLS-1$
Remove unused API filters
Switch to Apache MINA sshd 2.10.0
[releng] API filter for PackIndex.DEFAULT_WRITE_REVERSE_INDEX
PackExt: add a #getTmpExtension method
UploadPack: Record negotiation stats on fetchV2 call
RewriteGeneratorTest: Introduce test cases for the RewriteGenerator
PackWriter: write the PackReverseIndex file
Jonathan Tan [Mon, 8 May 2023 20:55:40 +0000 (13:55 -0700)]
GraphObjectIndex: fix search in findGraphPosition
In findGraphPosition, when there is no object whose OID starts with
the first byte of the sought OID, low equals high. This violates an
invariant of the loop, and when the sought OID is lexicographically
greater than every other OID in the repository, causes an
ArrayIndexOutOfBoundsException (because we're trying to read outside the
list of OIDs).
Therefore, check the "low < high" condition at the start of the loop,
not only after the first iteration.
Change-Id: Ic8ac198c151bd161c4996b9e7cb6e6660f151733 Helped-by: Ivan Frade <ifrade@google.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Matthias Sohn [Tue, 9 May 2023 07:44:03 +0000 (09:44 +0200)]
Update to Tycho 4.0.0-SNAPSHOT
We need to update to Tycho in order to force PGP signing of the
bouncycastle libraries which isn't supported by earlier Tycho versions.
For that we need to run Maven on Java 17 or higher.
In order to run tests on Java 11 add a `toolchain.xml` file into the
`~/.m2` directory providing the path to Java installations:
Reason: the maven artifact has a broken MANIFEST.MF with a mandatory
dependency to sun.security.x509, which is an internal package in the
JDK and moreover not needed by the bundle except for one test class
that isn't in the bundle at all.
This extra dependency makes the JGit tycho packaging build fail when
Tycho 4 is used.
We must keep using the Orbit re-packaging of this artifact, which does
not have this unnecessary mandatory dependency.
Change-Id: Ica15a5ddcada09686de3055b2b3daf081e3c5ffc Signed-off-by: Thomas Wolf <twolf@apache.org>
Matthias Sohn [Thu, 18 May 2023 10:02:48 +0000 (12:02 +0200)]
Also add suppressed exception if unchecked exception occurs in finally
If a method called in a finally block throws an exception we should add
exceptions caught earlier to the exception we throw in the finally block
not regarding if it's a checked or unchecked exception.