searchForReuse might impact performance in large repositories
The search for reuse phase for *all* the objects scans *all*
the packfiles, looking for the best candidate to serve back to the
client.
This can lead to an expensive operation when the number of
packfiles and objects is high.
Add parameter "pack.searchForReuseTimeout" to limit the time spent
on this search.
Change-Id: I54f5cddb6796fdc93ad9585c2ab4b44854fa6c48
DfsBundleWriter writes out the entire repository to a Git bundle file.
It packs all objects included in the packfile by concatenating all pack
files. This makes the bundle creation fast and cheap. Useful for backing
up a repository as-is.
Change-Id: Iee20e4b1ab45b2a178dde8c72093c0dd83f04805
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Do not send empty blob in response to blob:none filter
If I create a repository containing an empty file and clone it
with
git clone --no-checkout --filter=blob:none \
https://url/of/repository
then I would expect no blobs to be transferred over the wire. Alas,
JGit rewrites filter=blob:none to filter=blob:limit=0, so if the
repository contains an empty file then the empty blob gets
transferred.
Fix it by teaching JGit about filters based on object type to
complement the existing filters based on object size. This prepares
us for other future filters such as object:none.
In particular, this means we do not need to look up the size of the
filtered blobs, which should speed up clones. Noticed by Anna
Pologova and Terry Parker.
Change-Id: Id4b234921a190c108d8be2c87f54dcbfa811602a
Signed-off-by: Jonathan Nieder <jrn@google.com>
Currently, the garbage collection is consistently failing for some large
repositories in the building bitmap phase, e.g.Linux-MSM project:
https://source.codeaurora.org/quic/la/kernel/msm-3.18
Historically, bitmap index creation happened in 3 phases:
1. Select the commits to which bitmaps should be attached.
2. Create all bitmaps for these commits, stored in uncompressed format
in the PackBitmapIndexBuilder.
3. Deltify the bitmaps and write them to disk.
We investigated the process. For phase 2 it's most efficient to create
bitmaps starting with oldest commit and moving to the newest commit,
because the newer commits are able to reuse the work for the old ones.
But for bitmap deltification in phase 3, it's better when a newer
commit's bitmap is the base, and the current disk format writes bitmaps
out for the newest commits first.
This change introduces a new collection to hold the deltified and
compressed representations of the bitmaps, keeping a smaller subset of
commits in the PackBitmapIndexBuilder to help make the bitmap index
creation more memory efficient.
And in this commit, we're setting DISTANCE_THRESHOLD to 0 in the
PackWriterBitmapPreparer, which means the garbage collection will not
have much behavoir change and will still use as much memory as before.
Change-Id: I6ec2c3e8dde11805af47874d67d33cf1ef83660e
Signed-off-by: Yunjie Li <yunjieli@google.com>
PackBitmapIndex: Move BitmapCommit to a top-level class
Move BitmapCommit from inside the PackWriterBitmapPreparer to a new
top-level class in preparation for improving the memory footprint of GC's
bitmap generation phase.
Change-Id: I4d404a5b3a34998b441d23105197f33d32d39670
Signed-off-by: Yunjie Li <yunjieli@google.com>
Move array designators from the variable to the type
As reported by Sonar Lint:
Array designators should always be located on the type for better code
readability. Otherwise, developers must look both at the type and the
variable name to know whether or not a variable is an array.
Change-Id: If6b41fed3483d0992d402d8680552ab4bef89ffb
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Using the batch cleanup operation via Source -> Cleanup -> "Use lambdas
where possible" from standard JDT.
Change-Id: I5452bd94fdccc920ade071228aeed3a8b9fdbe62
Signed-off-by: Lars Vogel <Lars.Vogel@vogella.com>
Enable UnusedException at ERROR level which causes the build to fail
in many places with:
[UnusedException] This catch block catches an symbol and re-throws
another, but swallows the caught symbol rather than setting it as a
cause. This can make debugging harder.
Fix it by setting the caught exception as cause on the subsequently
thrown exception.
Note: The grammatically incorrect error message is copy-pasted as-is
from the version of ErrorProne currently used in Bazel; it has been
fixed by [1] in the latest version.
[1] https://github.com/google/error-prone/commit/d57a39c
Change-Id: I11ed38243091fc12f64f1b2db404ba3f1d2e98b5
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Enable and fix "Statement unnecessarily nested within else clause" warnings
Since [1] the gerrit project includes jgit as a submodule, and has this
warning enabled, resulting in 100s of warnings in the console.
Also enable the warning here, and fix them.
At the same time, add missing braces around adjacent and nearby one-line
blocks.
[1] https://gerrit-review.googlesource.com/c/gerrit/+/227897
Change-Id: I81df3fc7ed6eedf6874ce1a3bedfa727a1897e4c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Add to statistics the amount and size of packfiles offloaded to HTTP
download.
Change-Id: I895a7219ecac2794368bfc4fdfae74c1238deed9
Signed-off-by: Ivan Frade <ifrade@google.com>
These warnings were missed to address in a0048208 which introduced them.
Change-Id: Ia2d15fdce72c10378d020682b80fe7fc548c0d4c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
UploadPack: support custom packfile-to-URI mapping
Teach UploadPack to take a provider of URIs corresponding to cached
packs. When fetching, if the client supports the packfile-uri feature,
and if such a cached pack were to be streamed, instead send the
corresponding URI.
This packfile-uri feature is implemented in the jt/fetch-cdn-offload
branch of Git. There is interest in this feature [1], but it is not yet
merged.
[1] https://public-inbox.org/git/cover.1552073690.git.jonathantanmy@google.com/
Change-Id: I9a32dae131c9c56ad2ff4a8a9638ae3b5e44dc15
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
PackWriter: Prefer boolean operators over logical operators in comparisons
Using the | and & operators in boolean conditions results in a warning
from Error Prone:
[ShortCircuitBoolean]
Prefer the short-circuiting boolean operators && and || to & and |.
see https://errorprone.info/bugpattern/ShortCircuitBoolean
Change-Id: I4275c60306e43c74030c4465ba02cb853ad444e1
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
When Error Prone checks are enabled, the "ClassCanBeStatic" warning is
triggered:
Inner class is non-static but does not reference enclosing class
see https://errorprone.info/bugpattern/ClassCanBeStatic
Change-Id: I5a0e3bf0cf8c28176d9c98914c1c0dfab9c5736f
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Use integer depth in PackWriter's DepthAwareVisitationPolicy
- ObjectWalk.getTreeDepth() returns int hence there is no need to use
long depths in the lowestDepthVisited map.
- Also fix boxing warnings introduced in 0a15cb3a.
Change-Id: I6d73b6f41d5d20975d02f376c8588e411eaff0ec
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If a tree is visited during pack and filtered out with tree:<depth>, we
may need to include it if it is visited again at a lower depth.
Until now we revisit it no matter what the depth is. Now, avoid
visiting it if it has been visited at a lower or equal depth.
Change-Id: I68cc1d08f1999a8336684a05fe16e7ae51898866
Signed-off-by: Matthew DeVore <matvore@gmail.com>
tree:<depth> should not traverse overly-deep trees
If we are traversing a tree which is too deep, then there is no need to
traverse the children. Skipping children is much faster than traversing
the possibly thousands of objects which are directly or indirectly
referenced by the tree.
Change-Id: I6d68cc1d35da48e3288b9cc80356a281ab36863d
Signed-off-by: Matthew DeVore <matvore@gmail.com>
This is used when fetching, and in particular to populate a partial
clone or a virtual file system cache as the user navigates. With this,
a client can pre-fetch a few directories deeper than only the current
directory.
depth:0 will omit all trees, and is useful if you only want to fetch
the commits of a repository, or fetch just a single tree or blob object.
depth:1 will fetch only the root tree of all commits fetched. depth:2
will fetch the root tree and all blobs and tree objects directly
referenced from it. depth:3 gets one more level, and so on. depth:#
will not filter a blob or tree that is explicitly marked wanted.
Bitmaps are disabled when this filter is used.
This implementation is quite slow because it iterates over all omitted
objects rather than skipping them. This will be addressed in follow-up
commits.
Change-Id: Ic312fee22d60e32cfcad59da56980e90ae2cae6a
Signed-off-by: Matthew DeVore <matvore@gmail.com>
Since Java 7 the diamond operator can be used instead of explicit
type parameters.
Change-Id: I2dee5fce7afebb1d9088eeaec4484ee58b4fa492
Signed-off-by: Carsten Hammer <carsten.hammer@t-online.de>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Expose and pass around the FilterSpec object rather than the raw blob limit
Use the FilterSpec object so that less code has to know about the make-up of
FilterSpecs. When fields are added to FilterSpec, these pieces of code won't
need updating again.
Change-Id: I2b9e59a9926ff112faf62a3fa2d33c961a1779e5
Signed-off-by: Matthew DeVore <matvore@gmail.com>
Always send refs' objects despite "filter" in pack
In a0c9016abd ("upload-pack: send refs' objects despite "filter"",
2018-07-09), Git updated the "filter" option in the fetch-pack
upload-pack protocol to not filter objects explicitly specified in
"want" lines, even if they match the criterion of the filter. Update
JGit to match that behavior.
Change-Id: Ia4d74326edb89e61062e397e05483298c50f9232
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Remove it from
* package private functions.
* try blocks
* for loops
this was done with the following python script:
$ cat f.py
import sys
import re
import os
def replaceFinal(m):
return m.group(1) + "(" + m.group(2).replace('final ', '') + ")"
methodDecl = re.compile(r"^([\t ]*[a-zA-Z_ ]+)\(([^)]*)\)")
def subst(fn):
input = open(fn)
os.rename(fn, fn + "~")
dest = open(fn, 'w')
for l in input:
l = methodDecl.sub(replaceFinal, l)
dest.write(l)
dest.close()
for root, dirs, files in os.walk(".", topdown=False):
for f in files:
if not f.endswith('.java'):
continue
full = os.path.join(root, f)
print full
subst(full)
Change-Id: If533a75a417594fc893e7c669d2c1f0f6caeb7ca
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
PackWriter: Fix Eclipse errors about missing Javadoc
Change If72b4b422 added a new method filterAndAddObject with a
partial Javadoc, which causes errors in Eclipse.
Since it's a private method, Javadoc is not strictly necessary, so
just convert it to a standard comment block.
Bug: 532540
Change-Id: I06aa79211d1223dccf6c931451ca885ca6d39cbc
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Teach UploadPack to support filtering by blob size
Teach UploadPack to advertise the filter capability and support a
"filter" line in the request, accepting blob sizes only, if the
configuration variable "uploadpack.allowfilter" is true. This feature is
currently in the "master" branch of Git, and as of the time of writing,
this feature is to be released in Git 2.17.
This is incomplete in that the filter-by-sparse-specification feature
also supported by Git is not included in this patch.
If a JGit server were to be patched with this commit, and a repository
on that server configured with RequestPolicy.ANY or
RequestPolicy.REACHABLE_COMMIT_TIP, a Git client built from the "master"
branch would be able to perform a partial clone.
Change-Id: If72b4b422c06ab432137e9e5272d353b14b73259
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Use PackStatistics and PostUploadHook and PostUploadHookChain instead.
Also remove
- UploadPack#getPackStatistics replaced by #getStatistics
- UploadPack#getLogger and UploadPack#setLogger
Change-Id: I70881c539af3094d68d594f19983dea0973604e8
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add fetch statistics for the counts of advertised refs, wants and haves.
Also add the duration in milliseconds for the negotiation phase. For
non-bidirectional transports like HTTP, this is the time for the final
round that sends the pack back to the user.
Change-Id: I1af7ffd3cb7b62182340682e2a243691ea24ec2e
Signed-off-by: Terry Parker <tparker@google.com>
Replace explicit calls to initCause where possible
Where the exception being thrown has a constructor that takes a
Throwable, use that instead of instantiating the exception and then
explicitly calling initCause.
Change-Id: I06a0df407ba751a7af8c1c4a46f9e2714f13dbe3
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Make PackWriterBitmapWriter class public and move it to a more central
location, in preparation for its use by another class (in a subsequent
commit).
One of its inner static classes, AddUnseenToBitmapFilter, previously
package-private, is also used directly in its former package. Therefore,
AddUnseenToBitmapFilter and its sibling class have been moved to an
internal package instead.
Change-Id: I740bc4bfc4e4e3c857d1ee7d25fe45e90cd22a75
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Commit db77610 ensured that all refs/tags commits are added to the
primary GC pack. It did that by adding all of the refs/tags commits
to the primary GC pack PackWriter's "interesting" object set.
Unfortunately, all commit objects in the "interesting" set are
selected as commits for which bitmap indices will be built. In a
repository like chromium with lots of tags, this changed the number of
bitmaps created from <700 to >10000. That puts huge memory pressure on
the GC task.
This change restores the original behavior of ignoring tags when
selecting commits for bitmaps.
In the "uninteresting" set, commits for refs/heads and refs/tags for
unannotated tags can not be differentiated. We instead identify
refs/tags commits by passing their ObjectIds as a new "noBitmaps"
parameter to the PackWriter.preparePack() methods.
PackWriterBitmapPreparer.setupTipCommitBitmaps() can then use that
"noBitmaps" parameter to exclude those commits.
Change-Id: Icd287c6b04fc1e48de773033fe432a9b0e904ac5
Signed-off-by: Terry Parker <tparker@google.com>
The ObjectWalker in PackWriterBitmapWalker needs to be reset whenever it
starts a new walk. Move this responsibility from the caller to the
method when the new walk starts.
Change-Id: Ib66003be1b5bdc80f46b9bbbb17d45e616714912
Signed-off-by: Zhen Chen <czhen@google.com>
Enable and fix warnings about redundant specification of type arguments
Since the introduction of generic type parameter inference in Java 7,
it's not necessary to explicitly specify the type of generic parameters.
Enable the warning in Eclipse, and fix all occurrences.
Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Enable and fix 'Should be tagged with @Override' warning
Set missingOverrideAnnotation=warning in Eclipse compiler preferences
which enables the warning:
The method <method> of type <type> should be tagged with @Override
since it actually overrides a superclass method
Justification for this warning is described in:
http://stackoverflow.com/a/94411/381622
Enabling this causes in excess of 1000 warnings across the entire
code-base. They are very easy to fix automatically with Eclipse's
"Quick Fix" tool.
Fix all of them except 2 which cause compilation failure when the
project is built with mvn; add TODO comments on those for further
investigation.
Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
These packages don't use @since tags because they are not part of the
stable public API. Some @since tags snuck in, though. Remove them to
make the convention easier to find for new contributors and the
expectations clearer for users.
Change-Id: I6c17d3cfc93657f1b33cf5c5708f2b1c712b0d31
Shallow fetch: Pass along "shallow"s in unparsed-wants case, too
Since 84d2738ff2 (Don't skip want validation when the client sends no
haves, 2013-06-21), this branch is not taken. Process the
"shallow"s anyway as a defensive measure in case the code path gets
revived.
Change-Id: Idfb834825d77f51e17191c1635c9d78c78738cfd
Signed-off-by: Jonathan Nieder <jrn@google.com>
Shallow fetch/clone: Make --depth mean the total history depth
cgit changed the --depth parameter to mean the total depth of history
rather than the depth of ancestors to be returned [1]. JGit still uses
the latter meaning, so update it to match cgit.
depth=0 still means a non-shallow clone. depth=1 now means only the
wants rather than the wants and their direct parents.
This is accomplished by changing the semantic meaning of "depth" in
UploadPack and PackWriter to mean the total depth of history desired,
while keeping "depth" in DepthWalk.{RevWalk,ObjectWalk} to mean
the depth of traversal. Thus UploadPack and PackWriter always
initialize their DepthWalks with "depth-1".
[1] upload-pack: fix off-by-one depth calculation in shallow clone
https://code.googlesource.com/git/+/682c7d2f1a2d1a5443777237450505738af2ff1a
Change-Id: I87ed3c0f56c37e3491e367a41f5e555c4207ff44
Signed-off-by: Terry Parker <tparker@google.com>
When fetching from a shallow clone, the client sends "have" lines
to tell the server about objects it already has and "shallow" lines
to tell where its local history terminates. In some circumstances,
the server fails to honor the shallow lines and fails to return
objects that the client needs.
UploadPack passes the "have" lines to PackWriter so PackWriter can
omit them from the generated pack. UploadPack processes "shallow"
lines by calling RevWalk.assumeShallow() with the set of shallow
commits. RevWalk creates and caches RevCommits for these shallow
commits, clearing out their parents. That way, walks correctly
terminate at the shallow commits instead of assuming the client has
history going back behind them. UploadPack converts its RevWalk to an
ObjectWalk, maintaining the cached RevCommits, and passes it to
PackWriter.
Unfortunately, to support shallow fetches the PackWriter does the
following:
if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk))
walk = new DepthWalk.ObjectWalk(reader, depth);
That is, when the client sends a "deepen" line (fetch --depth=<n>)
and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter
throws away the RevWalk that was passed in and makes a new one. The
cleared parent lists prepared by RevWalk.assumeShallow() are lost.
Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk.
It tries to create it by calling toObjectWalkWithSameObjects() on
a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk
does not override the standard RevWalk#toObjectWalkWithSameObjects
implementation, the result is a plain ObjectWalk instead of an
instance of DepthWalk.ObjectWalk.
The result is that the "shallow" information is thrown away and
objects reachable from the shallow commits can be omitted from the
pack sent when fetching with --depth from a shallow clone.
Multiple factors collude to limit the circumstances under which this
bug can be observed:
1. Commits with depth != 0 don't enter DepthGenerator's pending queue.
That means a "have" cannot have any effect on DepthGenerator unless
it is also a "want".
2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the
uninteresting flag is not propagated to ancestors there even if a
"have" is also a "want".
3. JGit treats a depth of 1 as "1 past the wants".
Because of (2), the only place the UNINTERESTING flag can leak to a
shallow commit's parents is in the carryFlags() call from
markUninteresting(). carryFlags() only traverses commits that have
already been parsed: commits yet to be parsed are supposed to inherit
correct flags from their parent in PendingGenerator#next (which
doesn't happen here --- that is (2)). So the list of commits that have
already been parsed becomes relevant.
When we hit the markUninteresting() call, all "want"s, "have"s, and
commits to be unshallowed have been parsed. carryFlags() only
affects the parsed commits. If the "want" is a direct parent of a
"have", then it carryFlags() marks it as uninteresting. If the "have"
was also a "shallow", then its parent pointer should have been null
and the "want" shouldn't have been marked, so we see the bug. If the
"want" is a more distant ancestor then (2) keeps the uninteresting
state from propagating to the "want" and we don't see the bug. If the
"shallow" is not also a "have" then the shallow commit isn't parsed
so (2) keeps the uninteresting state from propagating to the "want
so we don't see the bug.
Here is a reproduction case (time flowing left to right, arrows
pointing to parents). "C" must be a commit that the client
reports as a "have" during negotiation. That can only happen if the
server reports it as an existing branch or tag in the first round of
negotiation:
A <-- B <-- C <-- D
First do
git clone --depth 1 <repo>
which yields D as a "have" and C as a "shallow" commit. Then try
git fetch --depth 1 <repo> B:refs/heads/B
Negotiation sets up: have D, shallow C, have C, want B.
But due to this bug B is marked as uninteresting and is not sent.
Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440
Signed-off-by: Terry Parker <tparker@google.com>
When doing an incremental fetch from JGit, "have" commits are marked
as "uninteresting". In a non-shallow fetch, when the RevWalk hits an
"uninteresting" commit it marks the commit's corresponding tree as
uninteresting. That has the effect of dropping those trees and all the
trees and blobs they reference out of the thin pack returned to the
client.
However, shallow fetches use a DepthWalk to limit the RevWalk, which
nearly always causes the RevWalk to terminate before encountering the
"have" commits. As a result the pack created for the incremental fetch
never encounters "uninteresting" tree objects and thus includes
duplicate objects that it knows the client already has.
Change-Id: I7b1f7c3b0d83e04d34cd2fa676f1ad4fec904c05
Signed-off-by: Terry Parker <tparker@google.com>