source.dussan.org Git - jgit.git/log

Merge branch 'master' into stable-6.5

* master:
  Externalize strings introduced in c9552aba
  Silence API error introduced by 596c445a
  PackConfig: add entry for minimum size to index
  Fix getPackedRefs to not throw NoSuchFileException
  PackObjectSizeIndex: interface and impl for the object-size index
  UInt24Array: Array of unsigned ints encoded in 3 bytes.
  PackIndex: expose the position of an object-id in the index
  Add pack options to preserve and prune old pack files
  DfsPackFile/DfsGC: Write commit graphs and expose in pack
  ObjectReader: Allow getCommitGraph to throw IOException
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec
  UploadPack: consume delimiter in object-info command
  PatchApplier fix - init cache with provided tree
  Avoid error-prone warning
  Fix unused exception error-prone warning
  UploadPack: advertise object-info command if enabled
  Move MemRefDatabase creation in a separate method.
  DfsReaderIoStats: Add Commit Graph fields into DfsReaderIoStats

Change-Id: Ic9f91f2139432999b99c444302457b3c08911009

Externalize strings introduced in c9552aba

Change-Id: I81bb78344df61e6eb42622fcef6235d4da0ae052

Silence API error introduced by 596c445a

Change-Id: I961ba2d89c11373ccb81e6450d7d951204ffca36

Merge branch 'stable-6.4'

* stable-6.4:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I36051c623fcd480aa80ed32b4e89f9bdd1b798e0

Merge branch 'stable-6.3' into stable-6.4

* stable-6.3:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I1073098fb06eabafdb3c5e7fcf44d55b86a1b152

Merge branch 'stable-6.2' into stable-6.3

* stable-6.2:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I765c7302ce84a6a9c28fdef29da2bfaa49477c6e

PackConfig: add entry for minimum size to index

The object size index can have up to #(blobs-in-repo) entries, taking
a significant amount of memory. Let operators configure the size threshold
for including objects in the size index.

The index will include objects with size *at or above* this
value (with -1 for none). This is more effective for the
filter-by-size case.

Lowering the threshold adds more objects to the index. This improves
performance at the cost of memory/storage space. For the object-size
case, more calls will use the index instead of reading IO. For the
filter-by-size case, lower threshold means better granularity (if
ObjectReader#isSmallerThan is implemented based only on the index).

Change-Id: I6ccd9334adbbc2abf95fde51dbbfc85b8230ade0

Merge branch 'stable-6.1' into stable-6.2

* stable-6.1:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: Id32683d5f506e082d39af269803bccee0280cc27

Merge branch 'stable-6.0' into stable-6.1

* stable-6.0:
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I876a38c2de8b7d5eaacd00e36b85599f88173221

Merge branch 'stable-5.13' into stable-6.0

* stable-5.13:
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I423f410578f5bbe178832b80fef8998a5372182c

Fix getPackedRefs to not throw NoSuchFileException

Since Files.newInputStream is from the java.nio package, it throws
java.nio.file.NoSuchFileException. This was missed in change
I00da88e. Without this fix, getPackedRefs fails with
NoSuchFileException when there is no packed-refs file in a project.

Change-Id: I93c202ddb73a0a5979af8e4d09e45f5645664b45
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
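
The behaviour described in the getPackedRefs fix above can be illustrated
with a minimal sketch. It is not the JGit code: the class and method names
are hypothetical, and returning an empty string for a missing file is an
assumption made only for illustration. It shows that a reader built on
Files.newInputStream has to handle java.nio.file.NoSuchFileException in
addition to java.io.FileNotFoundException.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of JGit.
final class PackedRefsReadSketch {
    /** Returns the file content, or "" when the file does not exist. */
    static String readOrEmpty(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (FileNotFoundException | NoSuchFileException missing) {
            // java.io streams report FileNotFoundException, java.nio
            // reports NoSuchFileException; treat both as "no file".
            return "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path missing = Paths.get("no-such-dir", "packed-refs");
        System.out.println(readOrEmpty(missing).isEmpty());
    }
}
```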

PackObjectSizeIndex: interface and impl for the object-size index

Operations like "clone --filter=blob:limit=N" or the "object-info"
command need to read the size of the objects from the storage. An
index would provide those sizes at once rather than having to seek in
the packfile.

Introduce an interface for the Object-size index. This index returns
the inflated size of an object. Not all objects could be indexed (to
limit memory usage).

This implementation indexes only blobs (no trees or
commits) *above* a certain size threshold (configurable). A lower
threshold adds more objects to the index, consumes more memory and
provides better performance. 0 means "all blobs" and -1 "disabled".

If we don't index everything, for the filter use case it is more
efficient to index the biggest objects first: the set is small and
most objects are filtered by NOT being in the index. For the
object-size use case, the more objects in the index the better, regardless
of their size. Altogether, it is more helpful to index above the threshold.

Change-Id: I9ed608ac240677e199b90ca40d420bcad9231489
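
The threshold semantics described in the PackObjectSizeIndex commit above
can be sketched as follows. This is a simplified illustration, not the JGit
interface: the class, the build() factory and the "-1 means not indexed"
return value are assumptions. Blobs are identified by their position in the
primary index, and only blobs at or above the threshold are kept (0 keeps
all blobs, -1 keeps none).

```java
import java.util.Arrays;

// Hypothetical sketch of a size index keyed by position in the primary index.
final class ObjectSizeIndexSketch {
    private final int[] positions; // ascending positions of indexed blobs
    private final long[] sizes;    // inflated size for each position

    private ObjectSizeIndexSketch(int[] positions, long[] sizes) {
        this.positions = positions;
        this.sizes = sizes;
    }

    /** blobPositions must be ascending; minSize < 0 disables indexing. */
    static ObjectSizeIndexSketch build(int[] blobPositions, long[] blobSizes,
            long minSize) {
        int n = 0;
        int[] pos = new int[blobPositions.length];
        long[] sz = new long[blobPositions.length];
        for (int i = 0; i < blobPositions.length; i++) {
            if (minSize >= 0 && blobSizes[i] >= minSize) {
                pos[n] = blobPositions[i];
                sz[n] = blobSizes[i];
                n++;
            }
        }
        return new ObjectSizeIndexSketch(Arrays.copyOf(pos, n),
                Arrays.copyOf(sz, n));
    }

    int count() {
        return positions.length;
    }

    /** Returns the indexed size, or -1 if the object is not in the index. */
    long getSize(int idxPosition) {
        int i = Arrays.binarySearch(positions, idxPosition);
        return i < 0 ? -1 : sizes[i];
    }

    public static void main(String[] args) {
        ObjectSizeIndexSketch idx = build(new int[] { 1, 4, 9 },
                new long[] { 10, 5000, 200 }, 100);
        System.out.println(idx.getSize(4)); // indexed blob
        System.out.println(idx.getSize(1)); // below the threshold
    }
}
```

With a filter such as blob:limit=N, a caller could treat "not in the index"
as "smaller than the threshold", which is why indexing the largest objects
keeps the index small.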

UInt24Array: Array of unsigned ints encoded in 3 bytes.

The object size index stores positions of objects in the main
index (when ordered by sha1). These positions are per-pack and usually
a pack has <16 million objects (there are exceptions but rather
rare). It could save some memory storing these positions in three bytes
instead of four. Note that these positions are sorted and always positive.

Implement a wrapper around a byte[] to access and search "ints" while
they are stored as unsigned 3 bytes.

Change-Id: Iaa26ce8e2272e706e35fe4cdb648fb6ca7591972
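
The three-byte encoding described in the UInt24Array commit above can be
sketched like this. It is a minimal illustration under assumptions, not the
JGit class: the class name, the big-endian byte order and the "-1 when not
found" convention are choices made for the example.

```java
// Hypothetical sketch: unsigned 24-bit values packed into a byte[].
final class UInt24ArraySketch {
    private final byte[] data; // 3 bytes per value, big-endian

    UInt24ArraySketch(int[] sortedValues) {
        data = new byte[sortedValues.length * 3];
        for (int i = 0; i < sortedValues.length; i++) {
            int v = sortedValues[i];
            if (v < 0 || v > 0xFFFFFF) {
                throw new IllegalArgumentException("not a 24-bit value: " + v);
            }
            data[3 * i] = (byte) (v >>> 16);
            data[3 * i + 1] = (byte) (v >>> 8);
            data[3 * i + 2] = (byte) v;
        }
    }

    int size() {
        return data.length / 3;
    }

    /** Decodes the i-th value; masking avoids sign extension of bytes. */
    int get(int i) {
        int o = 3 * i;
        return (data[o] & 0xFF) << 16
                | (data[o + 1] & 0xFF) << 8
                | (data[o + 2] & 0xFF);
    }

    /** Binary search over ascending values; returns the index or -1. */
    int binarySearch(int needle) {
        int lo = 0;
        int hi = size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int v = get(mid);
            if (v < needle) {
                lo = mid + 1;
            } else if (v > needle) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        UInt24ArraySketch a = new UInt24ArraySketch(
                new int[] { 0, 255, 65536, 16777215 });
        System.out.println(a.get(3));
        System.out.println(a.binarySearch(65536));
    }
}
```

Three bytes cover values up to 16777215, which matches the "<16 million
objects per pack" observation in the commit message.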

PackIndex: expose the position of an object-id in the index

The primary index returns the offset in the pack for an
objectId. Internally it keeps the object-ids in lexicographical order,
but doesn't expose an API to find the position of an object-id in that
list. This is needed for the object-size index, that we want to store
as "position-in-idx, size".

Add a #findPosition(object-id) method to the PackIndex interface to
know where an object-id sits in the ordered list of ids in the pack.

Note that this index position is over the list of ordered object-ids,
while reverse-index position is over the list of objects in packed
order.

Change-Id: I89fa146599e347a26d3012d3477d7f5bbbda7ba4
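
The idea of "position of an object-id in the ordered list" from the
PackIndex commit above can be sketched as a binary search over sorted ids.
This is an illustration only: the class is hypothetical, ids are modelled
as lowercase hex strings, and returning -1 for an unknown id is an
assumption.

```java
import java.util.Arrays;

// Hypothetical sketch, not the JGit PackIndex.
final class IndexPositionSketch {
    private final String[] sortedIds;

    IndexPositionSketch(String... ids) {
        sortedIds = ids.clone();
        // Lowercase hex strings of equal length sort like their raw bytes.
        Arrays.sort(sortedIds);
    }

    /** Position of the id in the ordered id list, or -1 if absent. */
    int findPosition(String id) {
        int p = Arrays.binarySearch(sortedIds, id);
        return p < 0 ? -1 : p;
    }

    public static void main(String[] args) {
        IndexPositionSketch idx = new IndexPositionSketch("c3", "a1", "b2");
        System.out.println(idx.findPosition("b2"));
    }
}
```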

Add pack options to preserve and prune old pack files

Add the options
- pack.preserveOldPacks
- pack.prunePreserved

This allows configuring in git config whether old packs should be preserved
during gc and pruned during the next gc.

The original implementation in 91132bb0 only allowed setting these options
via the API.

Change-Id: I5b23ab4f317d12f5ccd234401419913e8263cc9a

DfsPackFile/DfsGC: Write commit graphs and expose in pack

JGit knows how to read/write commit graphs but the DFS stack is not
using it yet.

The DFS garbage collector generates a commit-graph with commits
reachable from any ref. The commit-graph is stored as an extra stream in the
GC pack. DfsPackFile mimics how other indices are loaded, storing the
reference in the DFS cache.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I3f94997377986d21a56b300d8358dd27be37f5de

ObjectReader: Allow getCommitGraph to throw IOException

ObjectReader#getCommitGraph doesn't report errors loading the
commit graph. The caller should be aware of the situation and
ultimately decide what to do.

Add IOException to ObjectReader#getCommitGraph signature. RevWalk
defaults to an empty commit-graph on IO errors.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I38eeacff76c7f926b6dfb192d1e5916e40770024

Allow to perform PackedBatchRefUpdate without locking loose refs

Add another newBatchUpdate method in the RefDirectory where we can
control if the created PackedBatchRefUpdate will lock the loose refs or
not.

This can be useful in cases when we run programs which have exclusive
access to a Git repository and we know that locking loose refs is
unnecessary and just a performance loss.

Change-Id: I7d0932eb1598a3871a2281b1a049021380234df9
(cherry picked from commit cb90ed08526bd51f04e5d72e3ba3cf5bd30c11e4)

Merge "Merge branch 'stable-6.5'"

Document option "core.sha1Implementation" introduced in 59029aec

Bug: 580310
Change-Id: I10f3d6f6b5af7ab96683994c9cbd85e6c18a5084

Merge "UploadPack: consume delimiter in object-info command"

Merge "PatchApplier fix - init cache with provided tree"

UploadPack: consume delimiter in object-info command

The 'size' packet line is an argument, so it
must be preceded by a 0001 delimiter. See also git's
t5701-git-serve.sh test,

https://github.com/git/git/blob/8b8d9a2/t/t5701-git-serve.sh#L329

Without this fix, the server will choke on the delimiter line, saying
PackProtocolException: unexpected <empty string>

To test, I ran Gerrit locally with this fix

$ curl -X POST \
    -H 'git-protocol: version=2' \
    -H 'content-type: application/x-git-upload-pack-request' \
    -H 'accept: application/x-git-upload-pack-result' \
    --data $'0018command=object-info\n00010009size\n0031oid d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000' \
    http://localhost:8080/git.git/git-upload-pack

=>

0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000

The same command completes identically on GitLab (which supports the
object-info command):

$ curl -X POST \
    -H 'git-protocol: version=2' \
    -H 'content-type: application/x-git-upload-pack-request' \
    -H 'accept: application/x-git-upload-pack-result' \
    --data $'0018command=object-info\n00010009size\n0031oid d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000' \
    https://gitlab.com/gitlab-org/git.git/git-upload-pack

=>

0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000

In this case, the blob is for the COPYING file in the Git source tree,
which is 26859 bytes long.

Change-Id: Ief4ce1eb9303a3b2479547d7950ef01c7c28f472
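
The request body used in the curl commands above follows pkt-line framing:
each line starts with a four-digit hex length that counts the four length
bytes themselves, 0001 is the delimiter and 0000 the flush packet. The
following sketch rebuilds that body. The class and method names are
hypothetical and it is not how JGit or git implement the protocol.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that rebuilds the object-info request shown above.
final class PktLineSketch {
    static final String DELIM = "0001";
    static final String FLUSH = "0000";

    /** Prefixes the payload with its length (payload bytes + 4) in hex. */
    static String encode(String payload) {
        int len = payload.getBytes(StandardCharsets.UTF_8).length + 4;
        return String.format("%04x", len) + payload;
    }

    /** command line, delimiter, 'size' argument, one oid, flush. */
    static String objectInfoRequest(String oid) {
        return encode("command=object-info\n")
                + DELIM
                + encode("size\n")
                + encode("oid " + oid + "\n")
                + FLUSH;
    }

    public static void main(String[] args) {
        System.out.print(objectInfoRequest(
                "d38b1b92bdb2893eb4505667375563f2d6d4086b"));
    }
}
```

The delimiter sits between the command line and its arguments, which is the
line the server has to consume before reading 'size'.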

PatchApplier fix - init cache with provided tree

This change only affects inCore repositories.
Before this change, any file that wasn't part of the patch
wasn't read, and therefore wasn't part of the output tree.

Change-Id: I246ef957088f17aaf367143f7a0b3af0f8264ffb
Bug: Google b/267270348

Merge "DfsReaderIoStats: Add Commit Graph fields into DfsReaderIoStats"

Merge branch 'stable-6.5'

* stable-6.5:
Prepare 6.5.0-SNAPSHOT builds
JGit v6.5.0.202302011120-m2

Change-Id: I2629d0e07d25690e6de179a42e2d7e3321791f8f

Prepare 6.5.0-SNAPSHOT builds

Change-Id: Id0c7e51293d53b1eeec081cbbdf6e27d77123200

Merge changes I343cc3cf,I9dedf61b

* changes:
Avoid error-prone warning
Fix unused exception error-prone warning

JGit v6.5.0.202302011120-m2

Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: I231d3f9b8a59e374477d3a33964061acb2c25ce4

Avoid error-prone warning

GC.gc() returns a Future, which should not be discarded. See also
https://errorprone.info/bugpattern/FutureReturnValueIgnored

Change-Id: I343cc3cfe74a564ad7f8d53f0fe9d96a23aaed00
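
Why a discarded Future is flagged can be shown with a generic sketch. It
does not use GC.gc(); the class and method names are hypothetical. A
failure inside the task only becomes visible to the caller when the Future
is consumed, for example via get().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, unrelated to the JGit GC class.
final class FutureResultSketch {
    static String runAndWait(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Keep the Future instead of discarding it ...
            Future<String> result = pool.submit(task);
            // ... so that errors in the task surface here.
            return result.get();
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndWait(() -> "done"));
    }
}
```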

Fix unused exception error-prone warning

Ignoring the exception seems intended in this case.

Change-Id: I9dedf61b9cb5a6ff39fb141dd5da19143f4f6978

UploadPack: advertise object-info command if enabled

Change-Id: Iad8e5b5f4fdd84bd275eb19ee0d01eb6986d79f2

Merge "Move MemRefDatabase creation in a separate method."

Merge branch 'master' into stable-6.5

* master:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  [pgm] Fetch-CLI: add support for shallow
  Speedup GC listing objects referenced from reflogs
  Re-add servlet-api 4.0 to the target platform
  Upgrade maven plugins
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I370bc228481864912c3cd88d43e5a70517b1c186

Merge branch 'stable-6.4'

* stable-6.4:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Id0ebfbd85eb815716383b9495eb7dd1f54cf4d74

Merge branch 'stable-6.3' into stable-6.4

* stable-6.3:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Iefcf5d832bd0087c1027876f2200689e1150abce

Merge branch 'stable-6.2' into stable-6.3

* stable-6.2:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I2ff386d9a096277360e6c7bd5535b49984620fb3

Merge branch 'stable-6.1' into stable-6.2

* stable-6.1:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Iff2fba026b49463016015b2fae1a42cf76ee2dbb

Merge branch 'stable-6.0' into stable-6.1

* stable-6.0:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726

Merge branch 'stable-5.13' into stable-6.0

* stable-5.13:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I58ad4c210a5e7e5a1ba6b22315b04211c8909950

Shortcut during git fetch for avoiding looping through all local refs

The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs, which could be
very expensive but is needed to avoid missing objects exceptions
because of broken chains.

When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.

Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s

A ref pointing to a non-commit object doesn't have any parent or
successor objects, hence would never need to have a reachability check
done. Skipping the askForIsComplete() altogether would save the 30s
time spent in an unnecessary phase.

Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: I09ac66ded45cede199ba30f9e71cc1055f00941b

FetchCommand: fix fetchSubmodules to work on a Ref to a blob

FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.

Change-Id: I414f5b7d9f2184b2d7d53af1dfcd68cccb725ca4

Silence API warnings introduced by I466dcde6

Change-Id: I510510da34d33757c2f83af8cd1e26f6206a486a

Allow the exclusions of refs prefixes from bitmap

When running GC.repack() against a repository with over one
thousand refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonably big file that would take up to
several hours to compute.

Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)

Having such a large bitmap file also slows down runtime
processing and has negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.

It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exclude some ref
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.

Example: Gerrit sandbox branches may even be actively
used and selected automatically because their commits are very
recent, however, they may bloat the bitmap, making it ineffective.

A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may even reach tens to hundreds
of thousands.

Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>

Move MemRefDatabase creation in a separate method.

The InMemoryRepository is used in tests (e.g. in gerrit tests) and it
can be useful to create a custom MemRefDatabase for some tests.

Change-Id: I6fbbbfe04400ea1edc988c8788c8eeb06ca8480a

PackWriterBitmapPreparer: do not include annotated tags in bitmap

The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.

Sample use-case:

refs/heads/main
  ^
  |
commit1 <-- commit2 <- annotated-tag1 <- tag1
  ^
  |
commit0

When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.

The heads-only bitmap should include only commit0 and commit1,
but because PackWriterBitmapPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2), which was not found in
the list of tags to exclude (annotated-tag1), commit2 was
included, even though it wasn't reachable from the head.

Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.

Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotated tags)
# packfiles: 2.7G

Before this change:
GC time: 5h
clone --bare time: 7 mins

After this change:
GC time: 20 mins
clone --bare time: 3 mins

Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc

BatchingProgressMonitor: avoid int overflow when computing percentage

When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.

Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
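
The overflow described in the BatchingProgressMonitor commit above can be
reproduced with plain int arithmetic. The sketch below is illustrative
only: the class and method names are hypothetical, and widening to long is
one possible remedy, not necessarily the one applied in JGit.

```java
// Hypothetical sketch of the percentage computation.
final class PercentSketch {
    /** Overflows once done * 100 exceeds Integer.MAX_VALUE. */
    static int naivePercent(int done, int total) {
        return done * 100 / total;
    }

    /** Multiplies in 64-bit arithmetic before dividing. */
    static int safePercent(int done, int total) {
        return (int) (done * 100L / total);
    }

    public static void main(String[] args) {
        int done = 30_000_000;
        int total = 40_000_000;
        System.out.println(naivePercent(done, total)); // negative value
        System.out.println(safePercent(done, total));  // 75
    }
}
```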

DfsReaderIoStats: Add Commit Graph fields into DfsReaderIoStats

We are adding commit-graph loading to the DFS stack and the stats object doesn't have fields to track that.

This change replicates the stats of the primary index for the commit-graph.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I4a657bed50083c4ae8bc9f059d4943d612ea2d49

[pgm] Fetch-CLI: add support for shallow

This adds support for shallow cloning. The CloneCommand and the
FetchCommand now have the new options --depth, --shallow-since and
--shallow-exclude to tell the server that the client doesn't want to
download the complete history.

Bug: https://bugs.eclipse.org/bugs/show_bug.cgi?id=475615
Change-Id: I8f113bed25dd6df64f2f95de6a59d4675ab8a903

Speedup GC listing objects referenced from reflogs

GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
Calling that one by one for a huge number of Refs is very slow, and it is
redundant: GC first gets all Refs in bulk and then calls getReflogReader
for each of them, which resolves every Ref a second time.

Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.

This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.

Change-Id: I474897fdc6652923e35d461c065a29f54d9949f4

Re-add servlet-api 4.0 to the target platform

This was removed from the JGit target platform by mistake in 6ca3d219.

Change-Id: Iedae0586fb96651255b67ed6dbb9ff7702c0ea54

Upgrade maven plugins

Remove tycho-extras-version, because Tycho and Tycho Extras are
now in a single repository and maintained together.

Update
- build-helper-maven-plugin to 3.3.0
- eclipse-jarsigner-plugin to 1.3.5
- jacoco-maven-plugin to 0.8.8
- japicmp to 0.17.1
- maven-antrun-plugin to 3.1.0
- maven-clean-plugin to 3.2.0
- maven-compiler-plugin to 3.10.1
- maven-dependency-plugin to 3.5.0
- maven-deploy-plugin to 3.0.0
- maven-enforcer-plugin to 3.1.0
- maven-install-plugin to 3.1.0
- maven-jar-plugin to 3.3.0
- maven-javadoc-plugin to 3.4.1
- maven-jxr-plugin to 3.3.0
- maven-pmd-plugin to 3.20.0
- maven-project-info-reports-plugin to 3.4.2
- maven-resources-plugin to 3.3.0
- maven-shade-plugin to 3.4.1
- maven-site-plugin to 4.0.0-M4
- maven-surefire-plugin to 3.0.0-M8
- spotbugs-maven-plugin to 4.7.3.0
- spring-boot-maven-plugin to 2.7.7

Change-Id: I14d9ff06d2f509d782eb63adfa6b5733649f11f1

Merge branch 'stable-6.5'

* stable-6.5:
Prepare 6.5.0-SNAPSHOT builds
JGit v6.5.0.202301111425-m1

Change-Id: Ic37f47cb1a7d975918ee9d416576ea9e30aa62db

Merge branch 'stable-6.4'

* stable-6.4:
Cache trustFolderStat/trustPackedRefsStat value per-instance
Refresh 'objects' dir and retry if a loose object is not found

Change-Id: Iea8038dfde29ab988501469f86ee829e578a2fe8

Merge branch 'stable-6.3' into stable-6.4

* stable-6.3:
Cache trustFolderStat/trustPackedRefsStat value per-instance
Refresh 'objects' dir and retry if a loose object is not found

Change-Id: I1db2b51ae8101f345d08235d4f3dc416bfcb42d5

Merge branch 'stable-6.2' into stable-6.3

* stable-6.2:
Cache trustFolderStat/trustPackedRefsStat value per-instance
Refresh 'objects' dir and retry if a loose object is not found

Change-Id: Ibc9bffab8c9ef9c39384b53c142d99878f7f3f98

Merge branch 'stable-6.1' into stable-6.2

* stable-6.1:
Cache trustFolderStat/trustPackedRefsStat value per-instance
Refresh 'objects' dir and retry if a loose object is not found

Change-Id: I9e876f72f735f58bf02c7862a3d8e657fc46a7b9

Cache trustFolderStat/trustPackedRefsStat value per-instance

Instead of re-reading the config every time the methods using these
values were called, cache the config value at the time of instance
construction. Caching the values improves performance for each of the
method calls. These configs are set based on the filesystem storing the
repository and unlikely to change while an application is running.

Change-Id: I1cae26dad672dd28b766ac532a871671475652df
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>

Refresh 'objects' dir and retry if a loose object is not found

A new loose object may not be immediately visible on a NFS
client if it was created on another client. Refreshing the
'objects' dir and trying again can help work around the NFS
behavior.

Here's an E2E problem that this change can help fix. Consider
a Gerrit multi-primary setup with repositories based on NFS.
Add a new patch-set to an existing change and then immediately
fetch the new patch-set of that change. If the fetch is handled
by a Gerrit primary different than the one which created the
patch-set, then we sometimes run into a MissingObjectException
that causes the fetch to fail.

Bug: 581317
Change-Id: Iccc6676c68ef13a1e8b2ff52b3eeca790a89a13d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>

Prepare 6.5.0-SNAPSHOT builds

Change-Id: I6fbda9ad8e054a664cb79c3a32baded12ad6802f

JGit v6.5.0.202301111425-m1

Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: I4e81d5c61d66f748e0c5bc86d95f579b5990e9f4

Update Orbit to S20230101190934

and update
- com.google.gson to 2.10.0.v20221207-1049
- org.apache.commons.compress to 1.22.0.v20221207-1049
- org.apache.httpcomponents.httpclient to 4.5.14.v20221207-1049
- org.apache.httpcomponents.httpcore to 4.4.16.v20221207-1049

Change-Id: I8da9be68162636ca2530ea042b069c533c7d975a

Update to releases p2 repo for 4.26 simultaneous release

Change-Id: I31690aeba1f4a5e9111de184ba81c4f971c756e0

RevWalk: integrate commit-graph with commit parsing

RevWalk#createCommit() will inspect the commit-graph file to find the
specified object's graph position and then return a new RevCommitCG
instance.

RevCommitCG is a RevCommit with an additional "pointer" (the position)
to the commit-graph, so it can load the headers and metadata from there
instead of the pack. This saves IO access in walks where the body is not
needed (i.e. #isRetainBody is false and #parseBody is not invoked).

RevWalk automatically uses the commit-graph if available; no action is
needed from callers. The commit-graph is fetched on first access from
the reader (which can internally keep it loaded and reuse it between
walks).

The startup cost of reading the entire commit graph is small. In
testing, reading a commit-graph with 1 million commits takes less than
50ms. If we use RepositoryCache, it will not be initialized until the
commit-graph is rewritten.

Bug: 574368
Change-Id: I90d0f64af24f3acc3eae6da984eae302d338f5ee
Signed-off-by: kylezhao <kylezhao@tencent.com>

FileSnapshotTest: Add more MISSING_FILE coverage

Add a couple tests that confirm what the docs say about isModified() and
equals(MISSING_FILE) behavior.

Change-Id: I6093040ba3594934c3270331405a44b2634b97c5
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>

Merge branch 'stable-6.4'

* stable-6.4:
Introduce core.trustPackedRefsStat config
Fix documentation for core.trustFolderStat

Change-Id: I93ad0c49b70113134026364c9f647de89d948693

GC: disable writing commit-graph for shallow repos

In shallow repos, GC writes to the commit-graph that shallow commits
do not have parents. This won't be true after a "git fetch --unshallow"
(and before another GC).

Do not write the commit-graph from shallow clones of a repo. The
commit-graph must have the real metadata of commits and that is not
available in a shallow view of the repo.

Change-Id: Ic9f2358ddaa607c74f4dbf289c9bf2a2f0af9ce0
Signed-off-by: kylezhao <kylezhao@tencent.com>

Merge branch 'stable-6.3' into stable-6.4

* stable-6.3:
Introduce core.trustPackedRefsStat config
Fix documentation for core.trustFolderStat

Change-Id: I18d9fc89c9ac1ef069dcefa7d7f992a28539ccf3

Merge branch 'stable-6.2' into stable-6.3

* stable-6.2:
Introduce core.trustPackedRefsStat config
Fix documentation for core.trustFolderStat

Change-Id: I48b6c095ac62dc859829d6fef45325accbb0a144

Merge branch 'stable-6.1' into stable-6.2

* stable-6.1:
Introduce core.trustPackedRefsStat config
Fix documentation for core.trustFolderStat

Change-Id: Ic78630f74c72624932a384eed52ef79ae1eff3e5

Introduce core.trustPackedRefsStat config

Currently, we always read packed-refs file when 'trustFolderStat'
is false. Introduce a new config 'trustPackedRefsStat' which takes
precedence over 'trustFolderStat' when reading packed refs. Possible
values for this new config are:

* always: Trust packed-refs file attributes
* after_open: Same as 'always', but refresh the file attributes of
packed-refs before trusting it
* never: Always read the packed-refs file
* unset: Fallback to 'trustFolderStat' to determine if the file
attributes of packed-refs can be trusted

Folks whose repositories are on NFS and have traditionally been
setting 'trustFolderStat=false' can now get some performance improvement
with 'trustPackedRefsStat=after_open' as it refreshes the file
attributes of packed-refs (at least on some NFS clients) before
considering it.

For example, consider a repository on NFS with ~500k packed-refs. Here
are some stats which illustrate the improvement with this new config
when reading packed refs on NFS:

trustFolderStat=true trustPackedRefsStat=unset: 0.2ms
trustFolderStat=false trustPackedRefsStat=unset: 155ms
trustFolderStat=false trustPackedRefsStat=after_open: 1.5ms

Change-Id: I00da88e4cceebbcf3475be0fc0011ff65767c111
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
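
The precedence rules listed in the core.trustPackedRefsStat commit above
can be summarised in a small decision function. This is a sketch under
assumptions, not the RefDirectory code: the enums and the resolve() method
are hypothetical names that only model the four documented values and the
fallback to 'trustFolderStat'.

```java
// Hypothetical model of the documented precedence, not JGit code.
final class TrustPackedRefsStatSketch {
    enum Setting { ALWAYS, AFTER_OPEN, NEVER, UNSET }

    enum Action { TRUST_ATTRIBUTES, REFRESH_THEN_TRUST, ALWAYS_READ }

    static Action resolve(Setting trustPackedRefsStat,
            boolean trustFolderStat) {
        switch (trustPackedRefsStat) {
        case ALWAYS:
            return Action.TRUST_ATTRIBUTES;
        case AFTER_OPEN:
            // Refresh the file attributes first, then trust them.
            return Action.REFRESH_THEN_TRUST;
        case NEVER:
            return Action.ALWAYS_READ;
        default:
            // UNSET: fall back to trustFolderStat.
            return trustFolderStat ? Action.TRUST_ATTRIBUTES
                    : Action.ALWAYS_READ;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Setting.AFTER_OPEN, false));
        System.out.println(resolve(Setting.UNSET, false));
    }
}
```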

RefDatabase: fix javadoc formatting

Change-Id: I547819ac380a0e6a88d05206ff171b69f46a8549

Pull up additionalRefsNames from RefDirectory to RefDatabase

This enables reusing this constant in all RefDatabase implementations.

Change-Id: I13d8fb780de24f71e005b698965fb5bcdbf3c728

Add TernarySearchTree

A ternary search tree is a type of tree where nodes are arranged in a
manner similar to a binary search tree, but with up to three children
rather than the binary tree's limit of two.

Each node of a ternary search tree stores a single character, a
reference to a value object and references to its three children named
equal kid, lo kid and hi kid. The lo kid pointer must point to a node
whose character value is less than the current node. The hi kid pointer
must point to a node whose character is greater than the current
node.[1] The equal kid points to the next character in the word. Each
node in a ternary search tree represents a prefix of the stored strings.
All strings in the middle subtree of a node start with that prefix.

Like other prefix trees, a ternary search tree can be used as an
associative map with the ability for incremental string search. Ternary
search trees are more space efficient compared to standard prefix trees,
at the cost of speed.

They allow efficient prefix search which is important to implement
searching refs by prefix in a RefDatabase.

Searching by prefix returns all keys if the prefix is an empty string.
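
The node layout and prefix search described above can be sketched as
follows. This is a minimal illustration, not JGit's TernarySearchTree:
the class name SimpleTst and its methods are hypothetical, keys must be
non-empty, and a null value is treated as "no key ends here".

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal ternary search tree sketch (hypothetical, not JGit's class). */
class SimpleTst<V> {
    private static final class Node<V> {
        final char c;
        V value; // non-null only if a key ends at this node
        Node<V> lo, eq, hi;

        Node(char c) {
            this.c = c;
        }
    }

    private Node<V> root;

    /** Insert or replace the value stored for a non-empty key. */
    void put(String key, V value) {
        if (key.isEmpty()) {
            throw new IllegalArgumentException("empty key");
        }
        root = put(root, key, 0, value);
    }

    private Node<V> put(Node<V> n, String key, int i, V value) {
        char c = key.charAt(i);
        if (n == null) {
            n = new Node<>(c);
        }
        if (c < n.c) {
            n.lo = put(n.lo, key, i, value);
        } else if (c > n.c) {
            n.hi = put(n.hi, key, i, value);
        } else if (i < key.length() - 1) {
            n.eq = put(n.eq, key, i + 1, value);
        } else {
            n.value = value;
        }
        return n;
    }

    /** All keys starting with prefix, sorted; an empty prefix returns all keys. */
    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) {
            collect(root, new StringBuilder(), out);
            return out;
        }
        Node<V> n = find(prefix);
        if (n == null) {
            return out;
        }
        if (n.value != null) {
            out.add(prefix);
        }
        // Everything below the equal kid shares the prefix.
        collect(n.eq, new StringBuilder(prefix), out);
        return out;
    }

    /** Walk to the node matching the last character of key, or null. */
    private Node<V> find(String key) {
        Node<V> n = root;
        int i = 0;
        while (n != null) {
            char c = key.charAt(i);
            if (c < n.c) {
                n = n.lo;
            } else if (c > n.c) {
                n = n.hi;
            } else if (i < key.length() - 1) {
                n = n.eq;
                i++;
            } else {
                return n;
            }
        }
        return null;
    }

    /** In-order traversal: lo kid, this node and its equal kid, then hi kid. */
    private void collect(Node<V> n, StringBuilder path, List<String> out) {
        if (n == null) {
            return;
        }
        collect(n.lo, path, out);
        path.append(n.c);
        if (n.value != null) {
            out.add(path.toString());
        }
        collect(n.eq, path, out);
        path.setLength(path.length() - 1);
        collect(n.hi, path, out);
    }
}
```

With keys such as "refs/heads/main", "refs/heads/dev" and
"refs/tags/v1", calling keysWithPrefix("refs/heads/") would visit only
the subtree below the final '/' node.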

Bug: 576165
Change-Id: If160df70151a8e1c1bd6716ee4968e4c45b2c7ac

CommitGraph: teach ObjectReader to get commit-graph

FileRepository's ObjectReader#getCommitGraph returns the commit-graph
when it exists and core.commitGraph is true.

DfsRepository is not supported currently.

Change-Id: I992d43d104cf542797e6949470e95e56de025107
Signed-off-by: kylezhao <kylezhao@tencent.com>

Merge "CommitGraph: add commit-graph for FileObjectDatabase"

PatchApplier: fix handling of last newline in text patch

If the last line came from the patch, use the patch to determine whether
or not there should be a trailing newline. Otherwise use the old text.

Add test cases for
- no newline at end, last line not in patch hunk
- no newline at end, last line in patch hunk
- patch removing the last newline
- patch adding a newline at the end of file not having one

all for core.autocrlf false, true, and input.

Add a test case where the "no newline" indicator line is not the last
line of the last hunk. This can happen if the patch ends with removals
at the file end.

Bug: 581234
Change-Id: I09d079b51479b89400ad300d0662c1dcb50deab6
Also-by: Yuriy Mitrofanov <a2terminator@mail.ru>
Signed-off-by: Thomas Wolf <twolf@apache.org>

CommitGraph: add commit-graph for FileObjectDatabase

This change enables JGit to read the .git/objects/info/commit-graph
file and obtain a CommitGraph from it.

Loading a new commit-graph into memory requires additional time. In
testing, loading a copy of Linux's commit-graph (1039139 commits)
took under 50ms.

Bug: 574368
Change-Id: Iadfdd6ed437945d3cdfdbe988cf541198140a8bf
Signed-off-by: kylezhao <kylezhao@tencent.com>

Reformat PatchApplier and PatchApplierTest

Some lines were too long, some class names were unnecessarily fully
qualified, and one assertEquals(actual, expected) should have been
assertEquals(expected, actual).

Change-Id: I3b3c46c963afe2fb82a79c1e93970e73778877e5
Signed-off-by: Thomas Wolf <twolf@apache.org>

PackWriter#prepareBitmapIndex: add clarifying comments

New readers of #prepareBitmapIndex may be confused about the manual
memory management (hidden mutation and nulling out pointers).

Add two clarifying comments to help future readers.

Change-Id: I93cab1919066efda37e96c47667f6991f67e377e

Merge "IO#readFully: provide overload that fills the full array"

IO#readFully: provide overload that fills the full array

IO#readFully is often called with the intent to fill the destination
array from beginning to end. The redundant arguments for where to start
and stop filling are opportunities for bugs if specified incorrectly or
if not changed to match a changed array length.

Provide an overloaded method for filling the full destination array.
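
The idea can be sketched as below. IoSketch is a hypothetical stand-in
written for illustration, not the JGit IO class itself; the shorter
overload simply delegates with offset 0 and the array's full length, so
callers cannot pass a stale or mismatched range.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for an IO utility class; not JGit's code. */
final class IoSketch {
    private IoSketch() {
    }

    /** Fill dst[off .. off+len) from fd, or throw if the stream ends first. */
    static void readFully(InputStream fd, byte[] dst, int off, int len)
            throws IOException {
        while (len > 0) {
            int r = fd.read(dst, off, len);
            if (r <= 0) {
                throw new EOFException("short read of block");
            }
            off += r;
            len -= r;
        }
    }

    /** Overload: fill the entire destination array from beginning to end. */
    static void readFully(InputStream fd, byte[] dst) throws IOException {
        readFully(fd, dst, 0, dst.length);
    }
}
```

A caller that previously wrote readFully(in, buf, 0, buf.length) could
then write readFully(in, buf), which stays correct if buf is resized.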

Change-Id: I964f18f4a061189cce1ca00ff0258669277ff499
Signed-off-by: Anna Papitto <annapapitto@google.com>

Fix API warnings for the new CommitGraph

Mark the internal package as internal, visible only to the test bundle.
Add an API filter for CoreConfig.DEFAULT_COMMIT_GRAPH_ENABLE.

Change-Id: Ib62a93b873c93daf638b6c57e62fd267e16801bb
Signed-off-by: Thomas Wolf <twolf@apache.org>

PackReverseIndex#findPosition: fix typo in method name

The package-private findPostion method has a typo in it. The typo will
become more widespread when a file-based implementation class is
introduced.

Correct the spelling to findPosition before the file-based
implementation is introduced.

Change-Id: Ib285f5a3f9a333ace1782dae9b5d425505eb962a
Signed-off-by: Anna Papitto <annapapitto@google.com>

GC: Write commit-graph files when gc

If 'core.commitGraph' and 'gc.writeCommitGraph' are both true, then gc
will rewrite the commit-graph file when 'git gc' is run. Defaults to
false while the commit-graph feature matures.

Bug: 574368
Change-Id: Ic94cd69034c524285c938414610f2e152198e06e
Signed-off-by: kylezhao <kylezhao@tencent.com>

CommitGraph: add core.commitGraph config

Change-Id: I3b5e735ebafba09ca18fd83da479c7950fa3ea8d
Signed-off-by: kylezhao <kylezhao@tencent.com>

Merge "Gc#deleteOrphans: avoid dependence on PackExt alphabetical ordering"

CommitGraph: implement commit-graph read

Git introduced a new file storing the topology and some metadata of
the commits in the repo (commitGraph). With this data, git can browse
commit history without parsing the pack, speeding up e.g.
reachability checks.

This change teaches JGit to read files in the commit-graph format,
following the upstream format ([1]).

JGit can read a commit-graph file from a buffered stream, which means
that we can provide this feature for both FileRepository and
DfsRepository.

[1] https://git-scm.com/docs/commit-graph-format/2.21.0

Bug: 574368
Change-Id: Ib5c0d6678cb242870a0f5841bd413ad3885e95f6
Signed-off-by: kylezhao <kylezhao@tencent.com>

Gc#deleteOrphans: avoid dependence on PackExt alphabetical ordering

Deleting orphan files depends on .pack and .keep being reverse-sorted
to before the corresponding index files that could be orphans. The new
reverse index file extension (.rev) will break that frail dependency.

Rewrite Gc#deleteOrphans to avoid that dependence by tracking which pack
names have a .pack or .keep file and then deleting any index files that
lack a corresponding one. This approach takes linear time instead of
the O(n log n) time needed for sorting.
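
A rough sketch of the two-pass approach, under stated assumptions: the
class OrphanSketch, the method findOrphans and the set of index-like
extensions are hypothetical and chosen for illustration; the real
Gc#deleteOrphans works on files in the pack directory and deletes them
rather than returning names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of order-independent orphan detection. */
final class OrphanSketch {
    private OrphanSketch() {
    }

    /** Extensions treated as index-like files here (an assumption). */
    private static final Set<String> INDEX_EXTS = Set.of("idx", "bitmap", "rev");

    /** Return index-like file names that have no matching .pack or .keep. */
    static List<String> findOrphans(List<String> fileNames) {
        // Pass 1: remember base names that have a .pack or .keep file.
        Set<String> kept = new HashSet<>();
        for (String name : fileNames) {
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                continue;
            }
            String ext = name.substring(dot + 1);
            if (ext.equals("pack") || ext.equals("keep")) {
                kept.add(name.substring(0, dot));
            }
        }
        // Pass 2: index-like files whose base name was not seen are orphans.
        List<String> orphans = new ArrayList<>();
        for (String name : fileNames) {
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                continue;
            }
            String base = name.substring(0, dot);
            String ext = name.substring(dot + 1);
            if (INDEX_EXTS.contains(ext) && !kept.contains(base)) {
                orphans.add(name);
            }
        }
        return orphans;
    }
}
```

Because membership is checked against a hash set, the result does not
depend on the order in which file names are listed, and each pass is a
single scan.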

Change-Id: If83c378ea070b8871d4b01ae008e7bf8270de763
Signed-off-by: Anna Papitto <annapapitto@google.com>

WalkPushConnection: Sanitize paths given to transports

These paths are given to the underlying URI-based transports (s3, sftp,
http), all of which expect forward-slash as the path separator
character.

Change-Id: I3cbb5928c9531a4da4691411bd8ac248fdf47ef2

Fix documentation for core.trustFolderStat

Update documentation for core.trustFolderStat to highlight that it is
also used when reading the packed-refs file.

Change-Id: I3eac377c3a7f48493abc8ae6d0889ee70a05d24d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>

Merge changes Iad832fe1,Icef9658c

* changes:
CommitGraphWriter: fix UnusedException errorprone error
Update jetty to 10.0.13

GraphCommits: Remove unused getter by position

CommitGraphWriter uses the GraphCommits in for-each loops and no longer
needs access by position. This was a left-over from
https://git.eclipse.org/r/c/jgit/jgit/+/182832

Remove the unused method.

Change-Id: I39df9bfab2601d581705ddf4cea3c04ed4765ff9

CommitGraphWriter: fix UnusedException errorprone error

An Error Prone run in the Bazel build raised this error:

org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/commitgraph/CommitGraphWriter.java:105:
error: [UnusedException] This catch block catches an exception and
re-throws another, but swallows the caught exception rather than setting
it as a cause. This can make debugging harder.
} catch (InterruptedIOException e) {
  ^
    (see https://errorprone.info/bugpattern/UnusedException)
  Did you mean 'throw new
IOException(JGitText.get().commitGraphWritingCancelled, e);'?
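
The pattern Error Prone asks for can be illustrated in isolation. The
class CauseSketch, the method write and the message string below are
hypothetical; only the shape of the catch block mirrors the suggested
fix, where the caught exception is passed as the cause.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical illustration of preserving the cause when re-throwing. */
final class CauseSketch {
    private CauseSketch() {
    }

    static void write(boolean cancel) throws IOException {
        try {
            if (cancel) {
                throw new InterruptedIOException("interrupted");
            }
        } catch (InterruptedIOException e) {
            // Pass e as the cause so the original stack trace is kept.
            throw new IOException("commit graph writing cancelled", e);
        }
    }
}
```

A caller that catches the IOException can then inspect getCause() to
see the original InterruptedIOException.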

Change-Id: Iad832fe17955fc1e60e6a4902bc50fd9dca76b9d

Update jetty to 10.0.13

Since Oomph's p2 repo for jetty 10.0.13 doesn't have source bundles, we
remove them. Eclipse platform doesn't create p2 repos for jetty anymore
and we aren't yet ready to use maven dependencies like the platform
does.

Change-Id: Icef9658ce441be43931e32d931adf717e2fa222c

PackExt: Add a commit graph extension.

There is no commit graph PackExt because the non-DFS stack does not
write the commit graph using the PackExt mechanism. The extension is
needed in DFS to determine the stream to write the commit-graph to.

Add a commit graph extension that matches the one in cgit
(https://git-scm.com/docs/commit-graph#_file_layout)
in preparation for adding DFS support for reading and writing commit
graphs.

Change-Id: Id14eda9f116a319124981e0bcbc533928b1b5e8c
Signed-off-by: Xing Huang <xingkhuang@google.com>

Merge "commitgraph package: fix exports/imports, add @since tag for new API"

BatchRefUpdate: Consistent switch branches in ref update

The expression RefUpdate ru = newUpdate(cmd) is eagerly evaluated before
the switch statement, but some switch cases never use it, so it is
computed needlessly.

Move expression evaluation to the switch case where it is actually used.
After such a move, several cases became identical and thus were squashed.

Change-Id: Ifd1976f1c28378e092fb24d7ca9c415cba49f07f

RefWriter#writePackedRefs: Remove a redundant "if" check

After checking the variable, the same variable was checked again inside
the "if" block, and after the first check, this variable does not
change. Remove the second unnecessary check.

Change-Id: I6a38e67073f7f93105575b8f415ad32d350af602

commitgraph package: fix exports/imports, add @since tag for new API

Change-Id: I9175b1d796f91f5ba4e21d3418550ae451c054b0