Ivan Frade [Wed, 10 Apr 2024 20:52:56 +0000 (13:52 -0700)]
DfsPackFile: Abstract the loading of pack indexes
DfsPackFile assumes that the indexes are stored in file streams and
their references need to be cached in DFS. This doesn't allow us to
experiment with other storage options, like key-value databases. In these
experiments not all indexes are together in the same storage.
Define an interface per index to load it, so implementors can focus on
the specifics of their index. Put them together in the IndexFactory
interface. The implementation of the IndexFactory chooses the right
combination of storages.
At the moment we do this only for primary and reverse
indexes. Following changes can do the same for other indexes.
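The following is a minimal Java sketch of the per-index loader idea. It assumes a factory that mixes storages: the primary index comes from pack streams and the reverse index comes from a key-value store. Only the name IndexFactory comes from the text above. PackIndexLoader, ReverseIndexLoader, the method signatures, and the placeholder index records are hypothetical and are not JGit's actual API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class IndexFactorySketch {
    // Placeholder index types standing in for the real parsed indexes (hypothetical).
    record PackIndex(String source) {}
    record ReverseIndex(String source) {}

    // One loader per index type: implementors only deal with their own storage.
    interface PackIndexLoader {
        PackIndex loadPackIndex(String packName) throws IOException;
    }

    interface ReverseIndexLoader {
        ReverseIndex loadReverseIndex(String packName, PackIndex primary) throws IOException;
    }

    // The factory decides which combination of storages is used.
    interface IndexFactory {
        PackIndexLoader getPackIndexLoader();
        ReverseIndexLoader getReverseIndexLoader();
    }

    // Example combination: primary index from a pack stream, reverse index from a KV store.
    static IndexFactory mixedStorage(Map<String, String> kvStore) {
        return new IndexFactory() {
            @Override
            public PackIndexLoader getPackIndexLoader() {
                return pack -> new PackIndex("stream:" + pack + ".idx");
            }

            @Override
            public ReverseIndexLoader getReverseIndexLoader() {
                return (pack, primary) -> {
                    String value = kvStore.get(pack + ".ridx");
                    if (value == null) {
                        throw new IOException("no reverse index for " + pack);
                    }
                    return new ReverseIndex("kv:" + value);
                };
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> kv = new HashMap<>();
        kv.put("pack-1.ridx", "blob-42");
        IndexFactory factory = mixedStorage(kv);
        PackIndex idx = factory.getPackIndexLoader().loadPackIndex("pack-1");
        ReverseIndex ridx = factory.getReverseIndexLoader().loadReverseIndex("pack-1", idx);
        System.out.println(idx.source() + " " + ridx.source());
    }
}
```

DfsPackFile would only talk to the two loader interfaces, so swapping one storage for another touches a single factory implementation.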
Laura Hamelin [Fri, 7 Jun 2024 23:11:21 +0000 (16:11 -0700)]
PackExtBlockCacheTable: spread extensions over multiple dfs tables
The existing DfsBlockCache uses a single table for all
extensions (idx, ridx, ...).
This change introduces an implementation of the table
interface that can keep extensions in different cache
tables.
This selects the appropriate cache to use for a specific
PackExt or DfsStreamKey's PackExt type, allowing the
separation of entries from different pack types to help
limit churn in cache caused by entries of differing sizes.
This is especially useful in fine-tuning caches and
influencing interactions by extension type.
For example, a table holding INDEX types only will
not influence evictions of other PackExt types and
vice versa.
PackExtBlockCacheTable allows setting the underlying
DfsBlockCacheTables and the mapping directly, letting users
implement and use custom DfsBlockCacheTables.
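The routing described above can be sketched as a lookup from extension to table, with a default fallback. This is only an illustrative model. The PackExt enum below is a local stand-in that covers a subset of JGit's values, and CacheTable, NamedTable, and PackExtRouter are hypothetical names, not the PackExtBlockCacheTable API.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class PackExtRoutingSketch {
    // Local stand-in for JGit's PackExt; only a subset of values is modeled.
    enum PackExt { PACK, INDEX, BITMAP_INDEX, REFTABLE }

    // Hypothetical minimal table interface; a real cache table also stores blocks.
    interface CacheTable {
        String name();
    }

    record NamedTable(String name) implements CacheTable {}

    static final class PackExtRouter {
        private final CacheTable defaultTable;
        private final Map<PackExt, CacheTable> byExt = new EnumMap<>(PackExt.class);

        PackExtRouter(CacheTable defaultTable) {
            this.defaultTable = defaultTable;
        }

        // Send every listed extension to the given table.
        PackExtRouter route(CacheTable table, List<PackExt> exts) {
            for (PackExt ext : exts) {
                byExt.put(ext, table);
            }
            return this;
        }

        // Extensions without an explicit mapping share the default table.
        CacheTable tableFor(PackExt ext) {
            return byExt.getOrDefault(ext, defaultTable);
        }
    }

    public static void main(String[] args) {
        PackExtRouter router = new PackExtRouter(new NamedTable("default"))
                .route(new NamedTable("indexes"), List.of(PackExt.INDEX, PackExt.BITMAP_INDEX))
                .route(new NamedTable("reftables"), List.of(PackExt.REFTABLE));
        for (PackExt ext : PackExt.values()) {
            System.out.println(ext + " -> " + router.tableFor(ext).name());
        }
    }
}
```

Because INDEX entries live in their own table here, evicting them never frees space for PACK blocks, and PACK blocks never evict them.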
Ivan Frade [Fri, 19 Jul 2024 22:44:15 +0000 (15:44 -0700)]
PackObjectSizeIndex: Read all bytes and use the byte[] directly
The parser reads N integers one by one from the stream, assuming the
InputStream reads ahead from storage. We see very slow loading of
indexes and suspect that this read-ahead is not happening. The slow
loading can be reproduced in clones; it causes higher latencies and
blocks many threads waiting for the load.
Read the whole array from storage in one shot to avoid many small IO
reads. Work directly on the resulting byte[], so there is no need for a
second copy to convert to int/long.
This is how other indexes, like primary or commit graph, work.
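Here is a sketch of the "read once, decode in place" approach. It assumes the index stores big-endian 32-bit and 64-bit values at known offsets. The actual object size index layout is not described here, and ReadAllBytesSketch and its helpers are hypothetical. InputStream.readAllBytes (Java 9+) performs the single bulk read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllBytesSketch {
    // Big-endian 32-bit value at an offset, read straight from the byte[].
    static int int32(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24)
                | ((buf[off + 1] & 0xff) << 16)
                | ((buf[off + 2] & 0xff) << 8)
                | (buf[off + 3] & 0xff);
    }

    // Big-endian 64-bit value built from two 32-bit halves.
    static long int64(byte[] buf, int off) {
        return ((long) int32(buf, off) << 32) | (int32(buf, off + 4) & 0xffffffffL);
    }

    // One bulk read instead of N small reads; later lookups work on this array.
    static byte[] load(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 0, 7, 0, 0, 0, 1, 0, 0, 0, 2};
        byte[] buf = load(new ByteArrayInputStream(data));
        System.out.println(int32(buf, 0)); // 7
        System.out.println(int64(buf, 4)); // 4294967298
    }
}
```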
Ivan Frade [Wed, 10 Apr 2024 17:52:44 +0000 (10:52 -0700)]
DfsPackFile: Do not set local reverse index ref from cache callback
The DfsBlockCache loading callback sets the local reference to the
index in the DfsPackFile. This prevents abstracting the loading to
implement it over multiple backends.
Reorganize the code so that loadReverseIndex only does the loading,
the caller sets the result into DfsBlockCache, and the external code
sets the local reference in DfsPackFile.
This is the same pattern we applied to the PackIndex in the parent
commit.
Laura Hamelin [Mon, 15 Jul 2024 18:35:56 +0000 (11:35 -0700)]
DfsBlockCacheTable: extract stats get* methods to interface
Having the DfsBlockCacheTable methods extracted to an interface will
allow alternative implementations of BlockCacheStats not tied to the
current implementation.
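The following is a minimal sketch of what extracting the stats getters to an interface enables. It assumes an interface with only hit and miss counters; this method set and all names are illustrative assumptions. A second implementation can, for example, sum the stats of several tables without depending on any concrete table class.

```java
import java.util.List;

public class BlockCacheStatsSketch {
    // Hypothetical read-only stats interface, independent of any table implementation.
    interface BlockCacheStats {
        long getHitCount();

        long getMissCount();

        default long getTotalRequestCount() {
            return getHitCount() + getMissCount();
        }
    }

    // A trivial fixed implementation, e.g. for tests.
    record FixedStats(long hits, long misses) implements BlockCacheStats {
        @Override
        public long getHitCount() {
            return hits;
        }

        @Override
        public long getMissCount() {
            return misses;
        }
    }

    // An alternative implementation that aggregates several tables.
    record SummedStats(List<BlockCacheStats> parts) implements BlockCacheStats {
        @Override
        public long getHitCount() {
            return parts.stream().mapToLong(BlockCacheStats::getHitCount).sum();
        }

        @Override
        public long getMissCount() {
            return parts.stream().mapToLong(BlockCacheStats::getMissCount).sum();
        }
    }

    public static void main(String[] args) {
        BlockCacheStats total = new SummedStats(List.of(new FixedStats(3, 1), new FixedStats(5, 2)));
        System.out.println(total.getTotalRequestCount()); // 11
    }
}
```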
Based on derivative work done by Andre in [1].
This change focuses on adding support for reading the repository
state when branches are checked out using git's worktrees.
I've refactored the original work by removing all irrelevant
changes, which were mostly refactorings to extract, e.g.,
constants and mostly created noise for the review.
I've tried to address the original review comments:
- Not adding non-behavioral changes
- "HEAD" should get resolved from gitDir
- Reftable recently landed in cgit 2.45,
see https://github.com/git/git/blob/master/Documentation/RelNotes/2.45.0.txt#L8
We can add worktree support for reftable in a later change.
- Add new tests that read from a linked worktree, which
is created manually as there's no write support.
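Below is a sketch of the on-disk layout git uses for linked worktrees, as described in git's repository-layout documentation. A linked worktree has a `.git` file containing `gitdir: <path>`. That per-worktree git dir holds HEAD and a `commondir` file that points, relative to itself, at the shared repository. This is not JGit's implementation. The class and method names are hypothetical, and reftable is not handled, matching the scope above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WorktreeLayoutSketch {
    // A linked worktree has a ".git" file (not a directory) containing "gitdir: <path>".
    static Path resolveGitDir(Path workTree) throws IOException {
        Path dotGit = workTree.resolve(".git");
        if (Files.isDirectory(dotGit)) {
            return dotGit; // main worktree: .git is the repository directory itself
        }
        String content = Files.readString(dotGit, StandardCharsets.UTF_8).trim();
        if (!content.startsWith("gitdir:")) {
            throw new IOException("Not a gitfile: " + dotGit);
        }
        // A relative gitdir is resolved against the directory holding the .git file.
        return workTree.resolve(content.substring("gitdir:".length()).trim()).normalize();
    }

    // Shared objects and refs live in the dir named by "commondir" (relative to gitDir);
    // without that file, gitDir is also the common dir.
    static Path resolveCommonDir(Path gitDir) throws IOException {
        Path commonDirFile = gitDir.resolve("commondir");
        if (!Files.isRegularFile(commonDirFile)) {
            return gitDir;
        }
        String target = Files.readString(commonDirFile, StandardCharsets.UTF_8).trim();
        return gitDir.resolve(target).normalize();
    }

    // HEAD is per worktree, so it is read from gitDir, not from the common dir.
    static String readHead(Path gitDir) throws IOException {
        return Files.readString(gitDir.resolve("HEAD"), StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws IOException {
        // Minimal fake layout: main/.git/worktrees/wt1 plus a linked worktree wt1/.
        Path root = Files.createTempDirectory("worktree-sketch");
        Path wtGitDir = Files.createDirectories(root.resolve("main/.git/worktrees/wt1"));
        Files.writeString(wtGitDir.resolve("HEAD"), "ref: refs/heads/feature\n");
        Files.writeString(wtGitDir.resolve("commondir"), "../..\n");
        Path wt = Files.createDirectories(root.resolve("wt1"));
        Files.writeString(wt.resolve(".git"), "gitdir: " + wtGitDir + "\n");

        Path gitDir = resolveGitDir(wt);
        System.out.println(gitDir);
        System.out.println(resolveCommonDir(gitDir));
        System.out.println(readHead(gitDir));
    }
}
```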
Laura Hamelin [Fri, 7 Jun 2024 23:12:18 +0000 (16:12 -0700)]
DfsBlockCacheConfig: support configurations for dfs cache tables per extensions
Parse configurations for cache tables, each holding a set of
extensions, defined in [core "dfs.*"] git config sections. The
existing [core "dfs"] section is the default for any extension not
listed in any other table.
Configuration falls back to the defaults defined in the
DfsBlockCacheConfig.java file when a value is not set in a cache
table's configuration.
Sample format for individual cache tables:
In this example:
1. PACK types would go to the "default" table
2. INDEX and BITMAP_INDEX types would go to the
"multipleExtensionCache" table
3. REFTABLE types would go to the "reftableCache" table
# Configuration for the "default" cache table.
[core "dfs"]
  blockSize = 512
  blockLimit = 100
  concurrencyLevel = 5
  (...)
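The per-table fallback can be sketched as follows. The sketch assumes each cache table section is first read into nullable values, and each missing key then falls back on its own. The default constants are placeholders chosen for illustration, not necessarily the values in DfsBlockCacheConfig.java, and the record names are hypothetical.

```java
public class TableConfigFallbackSketch {
    // Placeholder defaults for illustration; the real ones live in DfsBlockCacheConfig.java.
    static final long DEFAULT_BLOCK_SIZE = 64 * 1024;
    static final long DEFAULT_BLOCK_LIMIT = 32L * 1024 * 1024;
    static final int DEFAULT_CONCURRENCY_LEVEL = 32;

    // What one [core "dfs.<name>"] section set; null means the key was absent.
    record RawTableConfig(Long blockSize, Long blockLimit, Integer concurrencyLevel) {}

    // Fully resolved settings for one cache table.
    record TableConfig(long blockSize, long blockLimit, int concurrencyLevel) {}

    // Each missing key falls back to its default independently of the others.
    static TableConfig resolve(RawTableConfig raw) {
        return new TableConfig(
                raw.blockSize() != null ? raw.blockSize() : DEFAULT_BLOCK_SIZE,
                raw.blockLimit() != null ? raw.blockLimit() : DEFAULT_BLOCK_LIMIT,
                raw.concurrencyLevel() != null ? raw.concurrencyLevel() : DEFAULT_CONCURRENCY_LEVEL);
    }

    public static void main(String[] args) {
        // A table that sets blockSize and blockLimit only; concurrencyLevel falls back.
        System.out.println(resolve(new RawTableConfig(512L, 100L, null)));
    }
}
```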
Ivan Frade [Mon, 1 Jul 2024 19:24:36 +0000 (12:24 -0700)]
DfsPackFile: Enable/disable object size index via DfsReaderOptions
DfsPackFile always uses the object size index if available. That is
the desired final state, but for a safe rollout, we should be able to
disable using the object size index.
Add an option (dfs.useObjectSizeIndex) to enable/disable the usage of
the object size index. False by default.
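Here is a sketch of gating the object size index behind a reader option. It assumes a simplified pack that may or may not carry the index. ReaderOptions, shouldUseObjectSizeIndex, and Pack are hypothetical stand-ins, not the DfsReaderOptions or DfsPackFile API. The point is only that the index is consulted when the option is on and the index exists; otherwise the non-index path is used.

```java
import java.util.Map;

public class ObjectSizeIndexGateSketch {
    // Hypothetical stand-in for the reader option; false by default.
    static final class ReaderOptions {
        private boolean useObjectSizeIndex;

        boolean shouldUseObjectSizeIndex() {
            return useObjectSizeIndex;
        }

        ReaderOptions setUseObjectSizeIndex(boolean use) {
            this.useObjectSizeIndex = use;
            return this;
        }
    }

    static final class Pack {
        final Map<String, Long> sizeIndex;   // null if no object size index was written
        final Map<String, Long> objectSizes; // stands in for reading sizes from the objects
        int indexLookups;                    // how often the index was consulted

        Pack(Map<String, Long> sizeIndex, Map<String, Long> objectSizes) {
            this.sizeIndex = sizeIndex;
            this.objectSizes = objectSizes;
        }
    }

    static long getObjectSize(Pack pack, ReaderOptions opts, String id) {
        if (opts.shouldUseObjectSizeIndex() && pack.sizeIndex != null) {
            pack.indexLookups++;
            Long size = pack.sizeIndex.get(id);
            if (size != null) {
                return size;
            }
        }
        // Path without the index: read the size from the object itself (-1 if unknown here).
        return pack.objectSizes.getOrDefault(id, -1L);
    }

    public static void main(String[] args) {
        Pack pack = new Pack(Map.of("a", 10L), Map.of("a", 10L));
        ReaderOptions opts = new ReaderOptions();
        System.out.println(getObjectSize(pack, opts, "a") + " lookups=" + pack.indexLookups);
        opts.setUseObjectSizeIndex(true);
        System.out.println(getObjectSize(pack, opts, "a") + " lookups=" + pack.indexLookups);
    }
}
```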
This changes the default from true to false. It only makes a difference
for the DFS stack when writing of the index was explicitly
enabled. This is an optimization, so it shouldn't cause any
regression. Operators can restore the previous behaviour by setting
"dfs.useObjectSizeIndex" to true.
RepoProject: read the 'dest-branch' attribute of a project
The manifest spec [1] defines a "dest-branch" attribute. Parse its
value and store it in the RepoProject. Also, create a getter/setter
for dest-branch.
Applications using JGit such as Gerrit plugins may have their own
manifest parsers. They can start using RepoProject to some extent
with this change. Eventually, they can be migrated to use the
ManifestParser in JGit; until then, this change helps make the
migration incremental.
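The following sketch reads the attribute with the JDK's SAX parser, using a trimmed-down project model with a plain getter/setter. Project and DestBranchSketch are hypothetical names; JGit's ManifestParser and RepoProject have their own structure. An absent attribute simply yields null.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DestBranchSketch {
    // Hypothetical, trimmed-down project model (not JGit's RepoProject).
    static final class Project {
        private final String name;
        private String destBranch; // null when the manifest does not set it

        Project(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        String getDestBranch() {
            return destBranch;
        }

        void setDestBranch(String destBranch) {
            this.destBranch = destBranch;
        }
    }

    static List<Project> parse(String manifestXml)
            throws ParserConfigurationException, SAXException, IOException {
        List<Project> projects = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("project".equals(qName)) {
                    Project p = new Project(attrs.getValue("name"));
                    // Attributes.getValue returns null when "dest-branch" is absent.
                    p.setDestBranch(attrs.getValue("dest-branch"));
                    projects.add(p);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(manifestXml)), handler);
        return projects;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<manifest>"
                + "<project name=\"platform/a\" dest-branch=\"main\"/>"
                + "<project name=\"platform/b\"/>"
                + "</manifest>";
        for (Project p : parse(xml)) {
            System.out.println(p.getName() + " -> " + p.getDestBranch());
        }
    }
}
```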
Ivan Frade [Tue, 9 Apr 2024 20:25:43 +0000 (13:25 -0700)]
DfsPackFile: Do not set primary index local ref from cache callback
DfsPackFile assumes the indices are in pack streams, but we would like
to consider other formats and storage. Currently, the local ref in the
DfsPackFile to the index is set in the cache loading callback, which
prevents abstracting the loading.
Reorganize the code so that the loadPackIndex function just parses the
bytes and returns a reference, and the caller sets the loaded index in
the local ref and in DfsBlockCache.
We will follow this pattern with other indices in follow-up
changes. Note that although DfsPackFile is used by only one thread,
the loading in DfsBlockCache can happen from multiple threads
concurrently, and we want to keep only one ref around.
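Here is a sketch of the "load, then cache, then set the local ref" split, with a ConcurrentHashMap standing in for DfsBlockCache. ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per key. As a result, concurrent misses on the same pack still end up sharing a single reference. PackFile and CACHE are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class LoadThenCacheSketch {
    // Hypothetical shared cache standing in for DfsBlockCache.
    static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    static final class PackFile {
        private final String key;
        private volatile Object index; // the local reference

        PackFile(String key) {
            this.key = key;
        }

        // The loader only parses and returns a value; it never touches this.index.
        Object getIndex(Function<String, Object> loader) {
            Object local = index;
            if (local != null) {
                return local;
            }
            // computeIfAbsent runs the loader at most once per key, even when several
            // threads miss at the same time, so all of them receive the same ref.
            Object cached = CACHE.computeIfAbsent(key, loader);
            index = cached; // the caller, not the loader, sets the local reference
            return cached;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger loads = new AtomicInteger();
        Function<String, Object> loader = k -> {
            loads.incrementAndGet();
            return "index-for-" + k;
        };
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> new PackFile("pack-1").getIndex(loader));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("loads: " + loads.get()); // loads: 1
    }
}
```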
Ivan Frade [Thu, 6 Jun 2024 19:01:04 +0000 (12:01 -0700)]
RepoCommand: Add error to ManifestErrorException
RepoCommand wraps errors in the manifest in a ManifestErrorException
with a fixed message ("Invalid manifest"). Callers like the supermanifest
plugin cannot return a meaningful error to the client without digging
into the cause chain.
Add the actual error message to the ManifestErrorException, so callers
can rely on #getMessage() to see what went wrong.
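The following is a minimal sketch of an exception that carries the underlying error in its message, keeping the fixed text as a prefix. The class below is a hypothetical stand-in; the exact message format of JGit's ManifestErrorException may differ.

```java
public class ManifestErrorSketch {
    // Hypothetical exception: generic prefix plus the underlying error, so
    // getMessage() is useful without walking the cause chain.
    static final class ManifestErrorException extends Exception {
        private static final long serialVersionUID = 1L;

        ManifestErrorException(Throwable cause) {
            super("Invalid manifest: " + cause.getMessage(), cause);
        }
    }

    public static void main(String[] args) {
        Exception cause = new IllegalStateException("project 'foo' has no remote");
        ManifestErrorException e = new ManifestErrorException(cause);
        System.out.println(e.getMessage()); // Invalid manifest: project 'foo' has no remote
    }
}
```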
Matthias Sohn [Tue, 4 Jun 2024 15:18:14 +0000 (17:18 +0200)]
Merge branch 'next'
* next:
Bump jetty version to 12.0.9 and servlet-api to 6.0
Bump jetty version to 11.0.20
Update minimum Java version to 17
Prepare 7.0.0-SNAPSHOT builds
Ivan Frade [Fri, 31 May 2024 19:12:57 +0000 (12:12 -0700)]
CommitGraphWriter: Move path diff calculation to its own class
To verify that we have the right paths between commits, we are writing
the bloom filters, reading them back, and querying them. The path diff
calculation is tricky enough, both for correctness and performance,
that it should be tested on its own.
Move the path diff calculation to its own class, so we can test it in
isolation.
This is a no-op refactor, so that later we can verify the steps taken
in the walk.
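As an illustration of why the path diff is worth testing in isolation, here is a sketch of a standalone changed-path calculation between a commit and its parent. It assumes trees flattened into path-to-blob-id maps. Like git's changed-path Bloom filters, it also records every leading directory of a changed path, so directory queries can use the filter. The names are hypothetical, and this is not JGit's class.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class ChangedPathsSketch {
    // Trees are modeled as flat "path -> blob id" maps, an assumption for brevity.
    static Set<String> changedPaths(Map<String, String> parent, Map<String, String> commit) {
        Set<String> all = new HashSet<>(parent.keySet());
        all.addAll(commit.keySet());
        Set<String> changed = new TreeSet<>();
        for (String path : all) {
            // Added, deleted, or modified: the blob id differs (null means absent).
            if (!Objects.equals(parent.get(path), commit.get(path))) {
                addWithLeadingDirs(changed, path);
            }
        }
        return changed;
    }

    // "a/b/c.txt" also contributes "a" and "a/b".
    static void addWithLeadingDirs(Set<String> out, String path) {
        out.add(path);
        for (int i = path.indexOf('/'); i >= 0; i = path.indexOf('/', i + 1)) {
            out.add(path.substring(0, i));
        }
    }

    public static void main(String[] args) {
        Map<String, String> parent = Map.of("a/b/c.txt", "1", "d.txt", "2");
        Map<String, String> commit = Map.of("a/b/c.txt", "3", "d.txt", "2", "e.txt", "4");
        System.out.println(changedPaths(parent, commit)); // [a, a/b, a/b/c.txt, e.txt]
    }
}
```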
Ivan Frade [Thu, 30 May 2024 21:04:56 +0000 (14:04 -0700)]
RepoCommand: Copy manifest upstream into .gitmodules ref field
Project entries in the manifest with a specific sha1 as revision can
use the "upstream" field to report the ref pointing to that sha1. This
information is very valuable for downstream tools, as they can limit
their search for a blob to the relevant ref, but it gets lost in the
translation to .gitmodules.
Save the value of the upstream field when available/relevant in the
ref field of the .gitmodules entry.
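Here is a sketch of writing the upstream value into a .gitmodules entry. It assumes "relevant" means the project revision is a 40-hex sha1. The Submodule record and toGitmodules helper are hypothetical, and JGit's RepoCommand may build the file differently.

```java
public class GitmodulesRefSketch {
    // Hypothetical, trimmed-down view of a manifest project.
    record Submodule(String name, String path, String url, String revision, String upstream) {}

    static boolean isSha1(String revision) {
        return revision != null && revision.matches("[0-9a-f]{40}");
    }

    // Emits one .gitmodules section; "ref" is written only for a pinned sha1
    // revision that has a non-empty "upstream" value.
    static String toGitmodules(Submodule s) {
        StringBuilder sb = new StringBuilder();
        sb.append("[submodule \"").append(s.name()).append("\"]\n");
        sb.append("\tpath = ").append(s.path()).append('\n');
        sb.append("\turl = ").append(s.url()).append('\n');
        if (isSha1(s.revision()) && s.upstream() != null && !s.upstream().isEmpty()) {
            sb.append("\tref = ").append(s.upstream()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sha1 = "0123456789abcdef0123456789abcdef01234567";
        System.out.print(toGitmodules(
                new Submodule("lib", "lib", "https://example.com/lib", sha1, "refs/heads/main")));
    }
}
```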
Ivan Frade [Thu, 30 May 2024 17:56:20 +0000 (10:56 -0700)]
RepoProject: read the "upstream" attribute of a project
The manifest spec [1] defines the "upstream" attribute: "name of the
git ref in which a sha1 can be found", when the revision is a
sha1. The parser ignores it, but RepoCommand could use it to
populate the "ref=" field of pinned submodules.
Parse the value and store it in the RepoProject.
RepoProject is public API and the current constructors are not
telescopic, so we cannot just add a new constructor with an extra
argument. Use plain getters/setters.
Matthias Sohn [Tue, 28 May 2024 22:20:58 +0000 (00:20 +0200)]
Merge branch 'master' into stable-6.10
* master:
PatchApplier.Result.Error: mark fields final
Update tycho to 4.0.8
Update to org.assertj:assertj-core:3.26.0
PatchApplier: Set a boolean on the result if conflict markers were added
PatchApplier: Add test for conflict markers on a deleted file
Update org.apache.commons:commons-compress to 1.26.2
Remove version override of commons-codec
Update spring-boot-maven-plugin to 2.7.18
Update jacoco-maven-plugin to 0.8.12
Update maven-source-plugin to 3.3.1
Update maven-shade-plugin to 3.5.3
Update maven-pmd-plugin to 3.22.0
Update cyclonedx-maven-plugin to 2.8.0
Update build-helper-maven-plugin to 3.6.0
Update maven-site-plugin to 4.0.0-M14
Update maven-jar-plugin to 3.4.1
Update maven-install-plugin to 3.1.2
Update maven-deploy-plugin to 3.1.2
Update maven-artifact-plugin to 3.5.1
Update tycho to 4.0.7 and set minimum maven version to 3.9.0
Update git-commit-id-maven-plugin to 8.0.2
Update spotbugs-maven-plugin to 4.8.5.0
Update japicmp-maven-plugin to 0.21.2
Update maven-compiler-plugin to 3.13.0
Update bytebuddy to 1.14.16
Update com.google.code.gson:gson to 2.11.0
Patrick Hiesel [Mon, 27 May 2024 08:16:34 +0000 (10:16 +0200)]
PatchApplier: Add test for conflict markers on a deleted file
For deleted files, we want to keep erroring out even if conflicts
are allowed for the apply patch logic. Otherwise, the resulting file
would consist only of the patch.
* changes:
Update spring-boot-maven-plugin to 2.7.18
Update jacoco-maven-plugin to 0.8.12
Update maven-source-plugin to 3.3.1
Update maven-shade-plugin to 3.5.3
Update maven-pmd-plugin to 3.22.0
Update cyclonedx-maven-plugin to 2.8.0
Update build-helper-maven-plugin to 3.6.0
Update maven-site-plugin to 4.0.0-M14
Update maven-jar-plugin to 3.4.1
Update maven-install-plugin to 3.1.2
Update maven-deploy-plugin to 3.1.2
Update maven-artifact-plugin to 3.5.1
Update tycho to 4.0.7 and set minimum maven version to 3.9.0
Update git-commit-id-maven-plugin to 8.0.2
Thomas Wolf [Sat, 25 May 2024 15:03:34 +0000 (17:03 +0200)]
Remove version override of commons-codec
Since commit 8164155b the commons-codec version is pinned in the parent
POM's dependency management. Remove the version specification in
org.eclipse.jgit/pom.xml.
Also give the package-import in the MANIFEST.MF an upper bound.
Change-Id: I2785a87cf77d6df110f57a0cb939dbc9772b8ee6
Signed-off-by: Thomas Wolf <twolf@apache.org>
Ivan Frade [Thu, 16 May 2024 19:28:53 +0000 (12:28 -0700)]
WalkFetchConnection: Remove marked packs on all function exits
[1] replaces Iterator.remove() with a list of "toRemove" that gets
processed when returning at the end. There are two other returns in
the function where the list is not processed.
Let the method report the broken packs and wrap it, so the caller
can clean them up on every exit.
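Here is a sketch of the "report, then clean up in the wrapper" shape. It assumes packs are plain strings and that a "broken" prefix marks a broken pack. fetchFrom and fetch are hypothetical names. Because the removal happens in a finally block, both the early return and any exception path are covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CleanupOnAllExitsSketch {
    // Inner method: only reports broken packs, and may return early.
    static boolean fetchFrom(List<String> packs, Set<String> broken) {
        for (String pack : packs) {
            if (pack.startsWith("broken")) {
                broken.add(pack);
                continue;
            }
            if (pack.equals("found")) {
                return true; // early return: no cleanup code runs here
            }
        }
        return false;
    }

    // Wrapper: removes reported packs on every exit, including exceptions.
    static boolean fetch(Set<String> packs) {
        Set<String> broken = new LinkedHashSet<>();
        try {
            return fetchFrom(new ArrayList<>(packs), broken);
        } finally {
            packs.removeAll(broken);
        }
    }

    public static void main(String[] args) {
        Set<String> packs = new LinkedHashSet<>(List.of("broken-1", "found", "broken-2"));
        System.out.println(fetch(packs) + " " + packs); // true [found, broken-2]
    }
}
```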
In https://gerrithub.io/c/eclipse-jgit/jgit/+/1194015, LinkedList was
replaced with ArrayList in DfsReader and WalkFetchConnection. In this
case, the Iterator.remove() method of List is called, which is an O(n)
operation for ArrayList. This results in an O(n^2) algorithm.
Instead of reverting to LinkedList, use a HashSet and a LinkedHashMap.
This maintains O(1) removal, and is less likely to be treated
as an antipattern than LinkedList.
A likely innocuous usage of Iterator.remove() in UnionInputStream was
also fixed.
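The following sketch shows removal during iteration on a LinkedHashSet. The set keeps insertion order while making each Iterator.remove() O(1). On an ArrayList, each removal shifts the tail, so removing k of n elements costs O(k * n). The class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

public class IteratorRemoveSketch {
    // Each it.remove() is O(1) on a LinkedHashSet; on an ArrayList it shifts the
    // remaining elements (O(n)), making a loop with many removals O(n^2).
    static int removeOdd(Set<Integer> values) {
        int removed = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext();) {
            if (it.next() % 2 != 0) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Set<Integer> values = new LinkedHashSet<>();
        for (int i = 1; i <= 10; i++) {
            values.add(i);
        }
        int removed = removeOdd(values);
        System.out.println(removed + " " + values); // 5 [2, 4, 6, 8, 10]
    }
}
```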
Patrick Hiesel [Fri, 10 Mar 2023 15:50:37 +0000 (16:50 +0100)]
Allow applying a patch with conflicts
In some settings, we want to let users apply a patch that does
not cleanly apply and add conflict markers. In Gerrit, this is
useful when cherry picking (via Git patches) from one host to
another.
This commit takes a simple approach: If a hunk doesn't apply,
go to the pre-image line, treat the pre-image-length lines starting
there as the left side of the conflict, and all context and newly
added lines as the right side of the conflict.
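Here is a minimal sketch of the approach above. It assumes the file and hunk are lists of lines, hunk lines carry unified-diff prefixes, and the pre-image start is 0-based. The marker labels CURRENT and PATCH are placeholders; PatchApplier's actual marker text is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

public class ConflictMarkerSketch {
    // Hunk lines use unified-diff prefixes: ' ' context, '-' removed, '+' added.
    // preImageStart is 0-based here (hunk headers are 1-based).
    static List<String> applyWithConflict(List<String> file, int preImageStart,
            int preImageLength, List<String> hunkLines) {
        int start = Math.min(preImageStart, file.size());
        int end = Math.min(start + preImageLength, file.size());
        List<String> out = new ArrayList<>(file.subList(0, start));
        // Left side: whatever the file currently has over the pre-image range.
        out.add("<<<<<<< CURRENT");
        out.addAll(file.subList(start, end));
        out.add("=======");
        // Right side: the hunk's context and added lines, i.e. its post-image.
        for (String line : hunkLines) {
            if (line.startsWith(" ") || line.startsWith("+")) {
                out.add(line.substring(1));
            }
        }
        out.add(">>>>>>> PATCH");
        out.addAll(file.subList(end, file.size()));
        return out;
    }

    public static void main(String[] args) {
        List<String> file = List.of("a", "b", "X", "d");
        List<String> hunk = List.of(" b", "-c", "+C", " d");
        applyWithConflict(file, 1, 3, hunk).forEach(System.out::println);
    }
}
```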
Thomas Wolf [Mon, 6 May 2024 17:32:12 +0000 (19:32 +0200)]
sshd: fix IdentitiesOnly if SSH agent is enabled and has keys
Commit a44b9e8bf changed the logic so that we try to read a public key
from the file given first, and only then try the file with the ".pub"
extension. Unfortunately the exception handling was not sufficient to
correctly deal with the given file containing a private key.
Apache MINA SSHD may throw a StreamCorruptedException when one tries
to read a public key from a file containing a private key. Handle
this exception in addition to GeneralSecurityException, and change
the order of exception handlers because StreamCorruptedException is
an IOException.
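The following sketch shows the handler ordering, with a hypothetical KeyReader in place of the Apache MINA SSHD call. java.io.StreamCorruptedException extends IOException (via ObjectStreamException), so it must be caught before a general IOException handler. javac rejects the reverse order because the exception has already been caught.

```java
import java.io.IOException;
import java.io.StreamCorruptedException;
import java.security.GeneralSecurityException;

public class CatchOrderSketch {
    // Hypothetical reader standing in for the library call that parses a key file.
    interface KeyReader {
        String read(String file) throws IOException, GeneralSecurityException;
    }

    // Returns the public key, or null if the file holds something else (e.g. a
    // private key), so the caller can fall back to "<file>.pub".
    static String tryReadPublicKey(KeyReader reader, String file) throws IOException {
        try {
            return reader.read(file);
        } catch (StreamCorruptedException | GeneralSecurityException e) {
            // Not a public key: tell the caller to try the .pub file.
            return null;
        } catch (IOException e) {
            // Must come after the handler above: StreamCorruptedException is an
            // IOException, and javac rejects a subclass catch after its superclass.
            throw new IOException("Cannot read " + file, e);
        }
    }

    public static void main(String[] args) throws IOException {
        KeyReader privateKeyFile = f -> {
            throw new StreamCorruptedException("not a public key");
        };
        System.out.println(tryReadPublicKey(privateKeyFile, "id_ed25519")); // null
    }
}
```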
Bug: jgit-53
Change-Id: I7dddc2c11aa75d7663f7fe41652df612bf8c88cd
Signed-off-by: Thomas Wolf <twolf@apache.org>