FooterLines: handle extraction from messages without headers
Prior to this change, long subjects of messages with no headers were
treated as headers, and therefore were skipped. In a message of the
form `<long subject>\n\n<footers>`, the footers would then get parsed
as a message, meaning no footers were returned.
After this change, the first lines are skipped only if they match any
of the known headers. The first line ofter the optional headers is then
assumed to be the subject line.
`FooterLineTest` had a few test cases for extracting footers from
messages with no headers. However, there were all with short messages,
so the "skip this line" logic in `RawParseUtils` was never triggered.
Added test case to catch this issue.
Change-Id: I971a1dddf1a9aea094360c3c8fc3b9a8b011bbf9
Issue: Google b/287891316
Kamil Musin [Tue, 5 Dec 2023 15:22:08 +0000 (16:22 +0100)]
FooterLine: Protect from ill-formed message
A raw commit message has some headers and then the actual
message. RawParseUtils.commitMessage returns the start position of the
actual message, or -1 when the message is not raw. FooterLine is not
handling this -1 and throws an IndexOutOfBounds exception.
Consider than msgB can be -1 when looking for the beginning of the
last paragraph.
FooterLine javadoc and parameter talk only about "raw" but previous
code accepted non-raw messages (used mostly in unit tests) so we need
to keep this behavior.
Ronald Bhuleskar [Wed, 25 Oct 2023 20:10:33 +0000 (13:10 -0700)]
StartGenerator: Fix parent rewrite with non-default RevFilter
StartGenerator is responsible for propagating the RevWalk's
parent rewrite setting, but it currently only does so when a
non-default TreeFilter is set, when it should also do so if
the default TreeFilter is used with a non-default RevFilter.
Adding a new if condition within StartGenerator to enable parent
rewrite with non-default RevFilter.
TreeRevFilter relied on the old buggy functionality and has
been modified to explicitly refrain from rewriting parents.
Ivan Frade [Fri, 17 Nov 2023 18:44:56 +0000 (10:44 -0800)]
PackWriter: store the objects with bitmaps in the statistics
We want to know what objects had bitmaps in the walk of the
request. We can check their position in the history and evaluate
our bitmap selection algorithm.
Use the listener interface of the BitmapWalker to get the objects
walked with bitmaps and store them in the statistics.
Ivan Frade [Wed, 29 Nov 2023 18:51:04 +0000 (10:51 -0800)]
FooterLine: First line cannot be a footer
The first line of the commit message cannot be a footer line. This
restriction was dropped in commit [1] while adding multiline
footers. This affects at least the git-numberer gerrit plugin, that
even have a test for it [2].
Reintroduce the restriction that the first line of the commit message
cannot be a footer and bring the test from git-numberer to jgit.
This breaks a test in the git_numberer gerrit plugin used by chromium
[1]. The test checks that first line is never a footer, which sounds
right. That test should be included in FooterLineTest.
Matthias Sohn [Mon, 27 Nov 2023 22:34:02 +0000 (23:34 +0100)]
Merge branch 'master' into stable-6.8
* master:
Adapt to type parameter added in commons-compress 1.25.0
Improve footer parsing to allow multiline footers.
Make the tests buildable by bazel test
BitmapIndex: Add interface to track bitmaps found (or not)
BitmapWalker: Remove BitmapWalkListener
Kamil Musin [Fri, 24 Nov 2023 14:17:26 +0000 (15:17 +0100)]
Improve footer parsing to allow multiline footers.
According to the https://git-scm.com/docs/git-interpret-trailers the
CGit supports multiline trailers. Subsequent lines of such multiline
trailers have to start with a whitespace.
We also rewrite the original parsing code to make it easier to work
with. The old code had pointers moving both backwards and forwards at
the same time. In the rewritten code we first find the start of the last
paragraph and then do all the parsing.
Since all the getters of the FooterLine return String, I've considered
rewriting the parsing code to operate on strings. However the original
code seems to be written with the idea, that the data is only lazily
copied in getters and no extra allocations should be performed during
original parsing (ex. during RevWalk). The changed code keeps to this
idea.
Bug: Google b/312440626
Change-Id: Ie1e3b17a4a5ab767b771c95f00c283ea6c300220
Ivan Frade [Mon, 20 Nov 2023 20:01:12 +0000 (12:01 -0800)]
BitmapIndex: Add interface to track bitmaps found (or not)
We want to know what objects had bitmaps in the walk of the
request. We can check their position in the history and evaluate our
bitmap selection algorithm.
Introduce a listener interface to the BitmapIndex to report which
getBitmap() calls returned a bitmap (or not) and a method to the
bitmap index to set the listener.
florian.signoret [Mon, 16 Oct 2023 14:08:14 +0000 (16:08 +0200)]
Fix branch ref exist check
When a tag with the same name as the branch exists, the branch creation
process should work too. We should detect that the branch already
exists, and allow to force create it when the force option is used.
Ivan Frade [Fri, 17 Nov 2023 18:11:17 +0000 (10:11 -0800)]
Adjust javadoc to pass errorprone checks
bazel build //org.eclipse.jgit.test:all doesn't build due to
errorprone errors. This leds to disabling the checks and find the
errors later in the jenkins builder.
Fix javadoc errors reported by errorprone. This doesn't fix completely
the build but it is a step towards it.
Ivan Frade [Thu, 16 Nov 2023 23:12:47 +0000 (15:12 -0800)]
BitmapWalkListener: Use plain interface with noop instance
In this new interface default methods are useful only to instantiate
noop instances. We rather reuse the same noop instance and save the
"default" to add backward compatible methods to existing interfaces.
Make the methods regular interface methods and provide a noop
instance.
Ivan Frade [Thu, 16 Nov 2023 18:46:07 +0000 (10:46 -0800)]
BitmapWalkListener: Add method and rename for commits
During the walk, the commit can be either
1. already in the walk bitmap
2. unvisited so far with bitmap in the bitmap index
3. unvisited so far without bitmap in the bitmap index
Expose these three states in the interface. This makes the interface
easier to explain: it reports the commits found during the walk.
As it is all about commits, rename the methods to onCommit***.
PatchApplier: wrap output's TemporaryBuffer with a CountingOutputStream
The documentation for TemporaryBuffer::length says:
"The length is only accurate after {@link #close()} has been invoked".
However, we need to have the stream open while accessing the length.
This prevents patches on large files to be applied correctly, as the
result get trimmed.
Bug: Google b/309500446
Change-Id: Ic1540f6d0044088f3b46f1fad5f6a28ec254b711
Matthias Sohn [Wed, 15 Nov 2023 17:05:54 +0000 (18:05 +0100)]
Merge branch 'master' into stable-6.8
* master: (42 commits)
Fix typo in constant name CONFIG_KEY_STREAM_FILE_TRESHOLD
Simplify StringUtils#commonPrefix
Optimize RefDirectory.getRefsByPrefix(String...)
Use try-with-resource to ensure UploadPack is closed
Fix hiding field warning
Fix warning for empty code blocks
Fix boxing warnings
errorprone: remove unnecessary parentheses
Update mockito to 5.7.0 and bytebuddy to 1.14.9
Enable Maven reproducible builds
Upgrade bazlets to the latest revision
Revert "Optimise Git protocol v2 `ref-prefix` scanning"
Document GIT_TRACE_PERFORMANCE to show timings
config-options.md: fix sort order
ComboBitset: Add Javadoc
CommitGraphWriter: Add progress monitor to bloom filter computation
CommitGraphWriter: Use ProgressMonitor from the OutputStream
CommitGraphWriter: Unnest generation-number progress
Optimise Git protocol v2 `ref-prefix` scanning
UploadPackTest: Cover using wanted-refs as advertised set
...
Dariusz Luksza [Thu, 9 Nov 2023 11:18:38 +0000 (11:18 +0000)]
Optimize RefDirectory.getRefsByPrefix(String...)
Currently for file-based repositories JGit will go over all refs in the
repository forach `ref-prefix` listed in the `ls-refs` command in git
protocol v2 request.
Native git, uses a different approach, where all refs are read once and
then for each ref, all `ref-prefix` filter values are checked in one
pass.
This change implements this approach in JGit only in the `RefDirectory`
backend. And makes `ref-prefix` filtering ~40% faster for repositories
with packed refs.
Different implementations were tested on a synthetic file repository
with 10k refs in `refs/heads/` and `290k` in `refs/changes`. Before
testing `git pack-refs` command was executed. All results are in
seconds.
The elapsed time was measured around `getRefByPrefix` call in
`Uploadapack.getFilteredRefs(Collection<String>)` (around lines 952 and
954). For testing a modified version of
`UploadPackTest.testV2LsRefsRefPrefix()` was used. The modifications
here included:
* shadowing protected `repo` variable with `FileRepository` pointing
to the synthetic repo with 300k refs described above,
* mimicking the git client clone request by adding `ref-prefix HEAD`,
`ref-prefix refs/heads/` and `ref-prefix refs/tags/`
Based on the above results, the implementation with parallel stream and
stream was selected.
Change [1] reduced the scope of the "writing commit graph" monitoring
task. This left some monitor#update() calls out of any task. When out
of a task, the #update call is a noop.
Delete this update calls as they are noops and misleading.
The affected chunks are usually small and quick to write, so probably
they don't need progress monitoring.
* changes:
Use try-with-resource to ensure UploadPack is closed
Fix hiding field warning
Fix warning for empty code blocks
Fix boxing warnings
errorprone: remove unnecessary parentheses
Update mockito to 5.7.0 and bytebuddy to 1.14.9
Enable Maven reproducible builds
Upgrade bazlets to the latest revision
Matthias Sohn [Thu, 5 Oct 2023 23:10:40 +0000 (01:10 +0200)]
Enable Maven reproducible builds
- configure Maven to run build reproducibly [1]
- use UTC timestamp of checked out commit as build timestamp
- add git-describe, git-commit-id, git-commit-id, git-tags,
git-remote-origin-url to MANIFEST.MF files
- configure cyclonedx-maven-plugin to also use UTC timestamp of
checked out commit
- for packaging build use tycho-buildtimestamp-jgit [2] to ensure
version uses the timestamp of the last commit
- SBOMs are not reproducible by design [3] they should have a build
timestamp matching the time when the build was executed and a serial
number which is a unique UUID per build run. Hence exclude them from
comparison [4].
- Use gmavenplus-plugin to format build timestamps. Maven expects
build timestamp in ISO-8601 format, to replace the qualifier in
versions the timestamp format must be compatible with rules for OSGi
version numbers. Didn't find a way to read the properties set by the
git-commit-id-maven-plugin from another plugin. Hence use JGit in a
groovy script to get the commit time of the current HEAD and provide
it in these two formats.
TODO: packaging build (features and p2 repository) is not yet binary
reproducible since that's not yet supported by Tycho [5], artefacts have
reproducible version numbers but file lastModified timestamps are not
yet reproducible.
Test plan for Maven build:
- build using
mvn clean install"
- verify second build is reproducible:
mvn -T1 clean verify artifact:compare
verification seems not to be thread-safe, hence run it with a single
thread using option -T1
For packaging build (still fails due to non-reproducible file
timestamps):
- build using
mvn -f org.eclipse.jgit.packaging/pom.xml clean install
- verify second build is reproducible:
mvn -T1 -f org.eclipse.jgit.packaging/pom.xml clean verify artifact:compare
Ivan Frade [Tue, 7 Nov 2023 22:00:05 +0000 (14:00 -0800)]
CommitGraphWriter: Use ProgressMonitor from the OutputStream
The same progress monitor is passed around as parameter and inside the
output stream. The functions use one to start tasks and another to
report progress, which is confusing. The stream needs the monitor to
check cancellations so we cannot remove it from there.
The ProgressMonitor task to track the calculation of generation
numbers is nested inside the task that follows the writing of all
lines in the commit-graph. ProgressMonitor doesn't support nested
tasks and this confuses the counting.
Move the start/end of the "writing commit graph" task to the
writeCommitData section, after calculating the generation
numbers. Make that task track by commits instead of by lines.
Moving the start/end of the progress task to the chunk-writing
functions is clearer and easier to extend.
Logging of GC before:
Writing out commit-graph in 3 passes: 51% ( 9807/19358)
Computing commit-graph generation numbers: 100% (9551/9551)
Logging of GC after:
Computing commit-graph generation numbers: 100% (9551/9551)
Writing out commit-graph: 100% (9551/9551)
Dariusz Luksza [Fri, 3 Nov 2023 11:52:03 +0000 (11:52 +0000)]
Optimise Git protocol v2 `ref-prefix` scanning
Currenty JGit will go over all refs in the repository for each
`ref-prefix`. This means that refs will be read multiple
times, which leads to subpar performance.
Native git, uses a different approach, where all refs are read once
and then for each ref, all `ref-prefix` filter values are checked in
one pass.
This change implements this approach in JGit. And makes `ref-prefix`
filtering ~28% faster for a repository with fully packed refs
and ~5% when RefTable is used instead of refdir.
Different implementations were tested on a synthetic file repository
with 300k refs. Different implementations were tested for unpacked and
fully packed refs (results are in seconds).
Ivan Frade [Tue, 24 Oct 2023 16:37:13 +0000 (09:37 -0700)]
UploadPackTest: Cover using wanted-refs as advertised set
Parent change introduced using "wanted-refs" as advertised set during
fetchV2, but it tested only a request without wantIds (only
wanted-refs).
Add a second where the request has wanted-refs AND wantId (which
disables the optimization). Change the test to measure the amount of
refs considered advertised, instead of relying in calls to the
refdb.
Patrick Hiesel [Fri, 13 May 2022 11:49:02 +0000 (13:49 +0200)]
UploadPack: use want-refs as advertised set in fetch v2
Protocol v2 introduced refs-in-wants and ls-remote with
prefixes. UploadPack already uses prefixes provided by the client
during a v2 ref advertisement (ls-refs). However, when the client
consequently sends another request to fetch a previously advertised
ref (with want-ref lines), the server uses the whole set of advertised
refs to compute reachability.
In repos with many refs, this slows down the reachability checks
setting up and walking through unnecessary refs. For gerrit it can
also break valid requests because in gerrit "all" means "recent" and
the wanted-ref could fall out of the "recent" range when reloading all
refs at fetch time.
Treat wanted-refs like a ref-prefix when calculating the advertised
refs on v2 fetch command. Less refs means a faster setup and less walk
for the reachability checks. Note that wanted-refs filters only over
the refs visible to the user, so this doesn't give any extra
visibility to the caller.
If the request contains also "want <oid>" lines, we cannot use this
optimization. Those objects could be reachable from any visible
branch, not necessarily in the wanted-refs.
Ronald Bhuleskar [Thu, 21 Sep 2023 23:12:06 +0000 (16:12 -0700)]
BasePackFetchConnection: Avoid full clone with useNegotiationTip
With the useNegotiationTip flag (introduced in change 738dacb), the client sends to the server only the tips of the wanted refs for the negotiation. Some wanted refs may not exist in the client (yet) and our implementation ignores them. So when only non-existing refs are wanted, jgit doesn't send any tips and the server understands it is a full clone.
In useNegotiationTip, send ALL_REFS if any of the wanted refs does not exists locally.
Matthias Sohn [Tue, 17 Oct 2023 09:25:12 +0000 (11:25 +0200)]
Silence API warnings for API added in 5.13.3
This was added in
- f103a1d5c605 "Add support for git config repack.packKeptObjects"
- f5f4bf0ad97f "Do not exclude objects in locked packs from bitmap
processing"
Thomas Wolf [Sat, 14 Oct 2023 21:25:19 +0000 (23:25 +0200)]
FileBasedConfig: in-process synchronization for load() and save()
On Windows reading and replacing a file via renaming concurrently may
fail either in the reader or in the thread renaming the file. For
renaming, FileUtils.rename() has a last-case fallback in which it
deletes the target file before attempting the rename. If a reader reads
at that moment, it will produce an empty config, and the snapshot and
hash may be wrong because the concurrently running save() may set them.
It's not really possible to do all this in a thread-safe manner without
some synchronization. Add a read-write lock to synchronize readers and
writers to avoid at least that JGit steps on its own feet.
Bug: 451508
Change-Id: I7e5f0f26e02f34ba02dc925a445044d3e21389b4 Signed-off-by: Thomas Wolf <twolf@apache.org>
Thomas Wolf [Sun, 8 Oct 2023 20:03:22 +0000 (22:03 +0200)]
FileUtils.rename(): better retry handling
When the atomic move fails on Windows, it may be because some other
thread is currently reading the destination. If we delete the file
then, that reader may get an exception, and conclude the file didn't
exist, even though the rename() would re-create it right away.
Try to avoid this from happening frequently by only deleting the
destination on the last retry. Also don't sleep after the last attempt.
Bug: 451508
Change-Id: I95bb4ec59d6e7efb4a7fc8d67f5df301f690257a Signed-off-by: Thomas Wolf <twolf@apache.org>
Thomas Wolf [Sun, 8 Oct 2023 19:50:34 +0000 (21:50 +0200)]
DeleteBranchCommand: update config only at the end
When multiple branches were to be removed, the git config was updated
after each and every branch. Newly do so only once at the end, after all
branches have been deleted.
Because there may be an exception after some branches have already been
deleted, take care to update the config even if an exception is thrown.
Bug: 451508
Change-Id: I645be8a1a59a1476d421e46933c3f7cbd0639fec Signed-off-by: Thomas Wolf <twolf@apache.org>
Thomas Wolf [Sun, 8 Oct 2023 19:47:05 +0000 (21:47 +0200)]
Config.removeSection() telling whether it changed the config
Add a variant of unsetSection() that returns whether it did indeed
change the config. This can be used in to skip saving the config if
it was not changed.
Also fix the iteration over the entries: lastWasMatch was never reset,
and thus all empty lines after a match would be removed.
Change-Id: Iea9e84aa74b1e4bb3c89efe3936fa3a8a09532e5 Signed-off-by: Thomas Wolf <twolf@apache.org>