Although the stable-4.0 branch already exists, 4.0 development is
still happening on master until IP logs are sent for review, which
will happen at the end of May.
Change-Id: I863ba85c6303f8ef2eb13bca5e2d30e5d3c58b29
Signed-off-by: Jonathan Nieder <jrn@google.com>
While trying to decide between "which matches every object" and "as it
matches every object", I became distracted and wrote both.
Change-Id: I867ce29664e661a81a9d441e59ffd0b72270dd98
Signed-off-by: Jonathan Nieder <jrn@google.com>
Allow ObjectWalk to be filtered by an arbitrary predicate
This will make it possible to declare a collection of objects as
ineligible for the walk en masse, for example if they are known to be
uninteresting via a bitmap.
Change-Id: I637008b25bf9fb57df60ebb2133a70214930546a
Signed-off-by: Jonathan Nieder <jrn@google.com>
Applications that use a commit message once and do not
need it again can free the body to save memory. Expose
the disposeBody() methods to support this and use it in
pgm.Log which only visits each commit once.
Change-Id: I4142a0749c24f15386ee7fb119934a0432234de3
This was added a very long time ago to support the failed
DHT storage implementation. Since then no storage system
was able to make use of this API, but it pollutes internals
of the walkers.
Kill the API on ObjectReader and drop the invocations from
the walker code.
Change-Id: I36608afdac13a6c3084d7c7e0af5e0cb22900332
Previously using an ObjectWalk meant uninteresting commits may keep
their commit message buffers in memory just in case they were found to
be on the boundary and were output as UNINTERESTING for the caller.
This was incorrect inside StartGenerator. ObjectWalk hides these
internal UNINTERESTING cases from its caller unless RevSort.BOUNDARY
was explicitly set, and its false by default. Callers never see one
of these saved uninteresting commits.
Change the test to allow early dispose unless the application has
explicitly asked for RevSort.BOUNDARY. This allows uninteresting
commit buffers to be discarded and garbage collected in ObjectWalks
when the caller will never be given the RevCommit.
Change-Id: Ic1419cc1d9ee95f4d09386dd0730d54c12dcc157
Despite being the primary author of RevWalk and ObjectWalk I still
fail to remember to setRetainBody(false) in application code using
an ObjectWalk to examine the graph.
Document the default for RevWalk is setRetainBody(true), where the
application usually wants the commit bodies to display or inspect.
Change the default for ObjectWalk to setRetainBody(false), as nearly
all callers want only the graph shape and do not need the larger text
inside a commit body. This allows some code in JGit to be simplified.
Change-Id: I367e42209e805bd5e1f41b4072aeb2fa98ec9d99
Revert "Let ObjectWalk.markUninteresting also mark the root tree as"
The Iff2de881 tried to fix missing tree ..." but introduced severe
performance degradation (>10x in some cases) when acting as server
(git push) and as client (replication). IOW cure is worse than the
disease.
This reverts commit c4797fe986.
Change-Id: I4e6056eb352d51277867f857a0cab380eca153ac
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
RevWalk: Do not close reader passed explicitly to constructor
The RevWalk(ObjectReader) constructor is explicitly to handle the case
where the caller is responsible for opening and closing the reader.
The reader should only be closed when it was created in the
RevWalk(Repository) constructor.
Change-Id: Ic0d595dc8d10de79e87549546c6c5ea2dc617e9b
Implement AutoClosable interface on classes that used release()
Implement AutoClosable and deprecate the old release() method to give
JGit consumers some time to adapt.
Bug: 428039
Change-Id: Id664a91dc5a8cf2ac401e7d87ce2e3b89e221458
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add retainOnReset(RevFlag) to RevWalk to simplify reset usage
Applications sometimes use a RevFlag instead of a Set<RevObject>
to track boolean state bits about objects being processed. However
this requires careful use of the resetRetain() methods to avoid an
accidental clearing of the RevFlag bits, effectively clearing the
Set<RevObject> the application wanted to track.
Simplify that use case by offering retainOnReset, a collection of
flags that are never cleared by the RevWalk.
Change-Id: I4c05b89b1398e4a4f371eac3a5d1d5edddec838f
When marking commits as uninteresting don't care if the tree exists
When during an ObjectWalk commits are marked as uninteresting we should
be tolerant against the situation that the commit exists in the repo but
the referenced tree is not exisiting. Since commit
c4797fe986 we are throwing
MissingObjectException in such a case. This semantic differs from native
git behaviour and may cause push operations to fail while they would
work in native git. See:
http://dev.eclipse.org/mhonarc/lists/egit-dev/msg03585.html
Bug: 445744
Change-Id: Ib7dec10fd2ef1adbb8adbabb9d3d5a64e554286a
When marking commits as uninteresting don't care if the tree exists
When during an ObjectWalk commits are marked as uninteresting we should
be tolerant against the situation that the commit exists in the repo but
the referenced tree is not exisiting. Since commit
c4797fe986 we are throwing
MissingObjectException in such a case. This semantic differs from native
git behaviour and may cause push operations to fail while they would
work in native git. See:
http://dev.eclipse.org/mhonarc/lists/egit-dev/msg03585.html
Bug: 445744
Change-Id: Ib7dec10fd2ef1adbb8adbabb9d3d5a64e554286a
Let ObjectWalk.markUninteresting also mark the root tree as
uninteresting
Using the ObjectWalk and marking a commit as uninteresting didn't mark
its root tree as uninteresting. This caused the "missing tree ..."
error in Gerrit under special circumstances. For example, if the
patch-set 2 changes only the commit message then the patch-set 1
and patch-set 2 share the same root-tree:
ps1 -> o o <- ps2
\ /
o root-tree
The transported pack will contain the ps2 commit but not the root-tree
object.
When using the BaseReceivePack.setCheckReferencedObjectsAreReachable
JGit will check the reachability of all referenced objects not provided
in the transported pack. Since the ps1 was advertised it will properly
be marked as uninteresting. However, the root-tree was reachable because
the ObjectWalk.markUninteresting missed to mark it as uninteresting.
JGit was then rejecting the pack with the "missing tree ..." exception.
Gerrit-issue: https://code.google.com/p/gerrit/issues/detail?id=1582
Change-Id: Iff2de8810f14ca304e6655fc8debeb8f3e20712b
Signed-off-by: Saša Živkov <sasa.zivkov@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Rename RewriteTreeFilter to TreeRevFilter and make it public
The current behavior of passing a TreeFilter to RevWalk has limited
usefulness, since the RevFilter derived from the TreeFilter is always
ANDed together with any other RevFilters. It is also tied fairly
tightly to the parent rewriting mechanism.
Make TreeRevFilter a generic RevFilter that matches modified paths
against any TreeFilter. This allows for more complex logic like
(modified this path OR authored by this person).
Leave the rewrite flag logic in this class, since it's closely tied to
the parent comparison code, but hidden behind a protected constructor.
Change-Id: Ia72ef591a99415e6f340c5f64583a49c91f1b82f
Previously, setting any TreeFilter on a RevWalk triggered parent
rewriting, which in the current StartGenerator implementation ends up
buffering the entire commit history in memory. Aside from causing poor
performance on large histories, this does not match the default
behavior of `git rev-list`, which does not rewrite parent SHAs unless
asked to via --parents/--children.
Add a new method setRewriteParents() to RevWalk to disable this
behavior. Continue rewriting parents by default to maintain backwards
compatibility.
Change-Id: I1f38e05526071c75ca58095e312663de5e6f334d
Under certain circumstances isMergedInto() returned
false even though base is reachable from the tip.
This will hinder pushes and receives by falsely
detecting "non fast forward" merges.
o---o---o---o---o
/ \
/ o---o---A---o---M
/ /
---2---1-
if M (tip) was compared to 1 (base), the method
isMergedInto() could still return false, since
two mergeBases will be detected and the return
statement will only look at one of them:
return next() == base;
In most cases this would pass, but if "A" is
a commit with an old timestamp, the Generator
would walk down to "2" before completing the
walk pass "A" and first finding the other
merge base "1". In this case, the first call to
next() returns 2, which compared to base evaluates
as false.
This is fixed by iterating merge bases and
returning true if base is found among them.
Change-Id: If2ee1f4270f5ea4bee73ecb0e9c933f8234818da
Signed-off-by: Gustaf Lundh <gustaf.lundh@sonymobile.com>
Signed-off-by: Sven Selberg <sven.selberg@sonymobile.com>
Fix StackOverflowError in RevCommit.carryFlags on deep side graphs
Copying flags through a graph with deep side branches can cause
StackOverflowError. The recursive step to visit the 2nd parent of
a merge commit can overflow the stack if these are themselves very
deep histories with many branches.
Rewrite the loop to iterate up to 500 recursive steps deep before
unwinding the stack and running the remaining parts of the graph
using a dynamically allocated FIFORevQueue.
This approach still allows simple graphs that mostly merge short
lived topic branches into master to copy flags with no dynamic
memory allocation, relying only on temporary stack extensions.
Larger more complex graphs only pay the allocation penalities
if copying has to extend outwards "very far" in the graph, which
is unlikely for many coloring based algorithms.
Change-Id: I1882e6832c916e27dd5f6b7602d9caf66fb39c84
Fix RevWalkUtils.findBranchesReachableFrom not finding some branches
The "cut off" optimization causes it to not include branches that
contain the specified commit but happen to share commits with a branch
that does not contain the commit.
An example:
-B foo
\
-A---C master
findBranchesReachableFrom for commit A with both branches as input may
not return master (depending on the order of the input). The reason is
that A is not contained in foo, and therefore the old code would put B
in the cutOff set. When then walking the master commits and B is
checked, it is found in the cutOff set and the walk is aborted, causing
master not to be returned even though it should.
Bug: 425674
Change-Id: I2c0c406ce5fcc9a03538b483473af930d4895d30
Signed-off-by: Robin Stocker <robin@nibor.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
In certain cases a JGit server updating an existing shallow client
selected a common ancestor that was behind the shallow edge of
the client. This allowed the server to assume the client had some
objects it did not have and allowed creation of pack deltas the
client could never inflate.
Any commit the client has advertised as shallow must be treated
by UploadPack server as though it has no parents. With no parents
the walker cannot visit graph history the client does not have,
and PackWriter cannot consider delta base candidates the client
is lacking.
Change-Id: I4922b9354df9f490966a586fb693762e897345a2
The comment about legacy Tag and Object types no longer applies,
though prior to Idb273d5a92849b42935ac14eed73b796b80aad50 the field
was still being used by RewriteTreeFilter.
Change-Id: I9ee5da8f8a3b61c9cf543817c03117ee0609dd8f
The various rename detection options are an inherent part of the
filter, similar to the path being followed.
This fixes a potential NPE when a RevWalk with a FollowFilter is
created without a Repository, since the old code path tried to get
the DiffConfig from the RevWalk's possibly-missing repository.
Change-Id: Idb273d5a92849b42935ac14eed73b796b80aad50
Support creating pack bitmap indexes in PackWriter.
Update the PackWriter to support writing out pack bitmap indexes,
a parallel ".bitmap" file to the ".pack" file.
Bitmaps are selected at commits every 1 to 5,000 commits for
each unique path from the start. The most recent 100 commits are
all bitmapped. The next 19,000 commits have a bitmaps every 100
commits. The remaining commits have a bitmap every 5,000 commits.
Commits with more than 1 parent are prefered over ones
with 1 or less. Furthermore, previously computed bitmaps are reused,
if the previous entry had the reuse flag set, which is set when the
bitmap was placed at the max allowed distance.
Bitmaps are used to speed up the counting phase when packing, for
requests that are not shallow. The PackWriterBitmapWalker uses
a RevFilter to proactively mark commits with RevFlag.SEEN, when
they appear in a bitmap. The walker produces the full closure
of reachable ObjectIds, given the collection of starting ObjectIds.
For fetch request, two ObjectWalks are executed to compute the
ObjectIds reachable from the haves and from the wants. The
ObjectIds needed to be written are determined by taking all the
resulting wants AND NOT the haves.
For clone requests, we get cached pack support for "free" since
it is possible to determine if all of the ObjectIds in a pack file
are included in the resulting list of ObjectIds to write.
On my machine, the best times for clones and fetches of the linux
kernel repository (with about 2.6M objects and 300K commits) are
tabulated below:
Operation Index V2 Index VE003
Clone 37530ms (524.06 MiB) 82ms (524.06 MiB)
Fetch (1 commit back) 75ms 107ms
Fetch (10 commits back) 456ms (269.51 KiB) 341ms (265.19 KiB)
Fetch (100 commits back) 449ms (269.91 KiB) 337ms (267.28 KiB)
Fetch (1000 commits back) 2229ms ( 14.75 MiB) 189ms ( 14.42 MiB)
Fetch (10000 commits back) 2177ms ( 16.30 MiB) 254ms ( 15.88 MiB)
Fetch (100000 commits back) 14340ms (185.83 MiB) 1655ms (189.39 MiB)
Change-Id: Icdb0cdd66ff168917fb9ef17b96093990cc6a98d
When a lot of commits are added to DateRevQueue, the
sort-on-insertion approach is very heavy on CPU cycles.
One approach to fix this was made by Dave Borowitz:
https://git.eclipse.org/r/#/c/5491/
But using Java's PriorityQueue seems to have brought some
extra overhead, and the desired performance could not be
reached.
This fix takes another approach to the insertion problem,
without changing the expected behaviour or bringing extra
memory overhead:
If we detect over 1000 commits in the DateRevQueue, a
"seek-index" is rebuilt every 1000th added commit.
The index keeps track of every 100th commit in the
DateRevQueue. During insertions, it will be used for a
preliminary scanning (binary search) of the queue, with
the intention of helping add() find a good starting point
to start walking from. After finding this starting point,
add() will step commit-by-commit until the correct
insertion place in the queue is found (today, the queue
is expected to be sorted at all times).
When applied to repositories with many refs, this approach
has proven to bring huge performance gains and scales quite
well.
For instance, in a repository with close to 80000 refs,
we could cut down the time a typical Gerrit replication
of 1 commit would take (just a push from JGit's point of
view) from 32sec down to 3.5sec.
Below you see some typical times to add a specific amount
of commits (with random commit times) to the DateRevQueue
and the difference the preliminary seek-index makes:
Commits | Index | No Index
1024 8ms 8ms
2048 13ms 9ms
4096 5ms 59ms
8192 11ms 595ms
16384 22ms 3058ms
32768 64ms 13811ms
65536 201ms 62677ms
131072 783ms 331585ms
Only one extra reference is needed for every 100 inserted
commits (and only when we see more than 1000 commits in
the queue), so the memory overhead should be negligible.
Various index-stepping values were tested, and 100 seemed to
scale very well and be effective from start.
In the future, it should probably be dynamic and based on
the number of refs in the queue, but this should serve well
as a starting point.
Note: While other fundamentally different data structures may
be more suitable, the DateRevQueue is extremely central to
many of the Git core operations. This approach was chosen,
since the effect of the patch is easy to predict in conjuction
with the current implementation. A totally new data structure
will make it harder to predict behaviour in many common and
uncommon cases (in terms of breaking ties, memory usage, cost
when using few elements, object creation/disposing overhead,
etc).
Change-Id: Ie7b99f40eacf6324bfb4716d82073adeda64d10f
A few classes such as Constanrs are marked with @SuppressWarnings, as are
toString() methods with many liternal, but otherwise $NLS-n$ is used for
string containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
StartGenerator now processes .git/shallow to have the
RevWalk stop for shallow commits.
See RevWalkShallowTest for tests.
Bug: 394543
CQ: 6908
Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
These appear as descriptions in the index, see here (currently empty):
http://download.eclipse.org/jgit/docs/latest/apidocs/
Change-Id: If7996deef30ae688bade8b3ad6b19547ca3d8b50
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Refactored method to find branches from which a commit is reachable
The method uses some heuristics to obtain much better performance
than isMergeBase.
Since I wrote the relevant code in the method I approve the license
change from EPL to EDL implied by the move.
Change-Id: Ic4a7584811a2b0bf24e4f6b3eab2a7c022eabee8
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
This is needed to allow jumping to a selected commit when loading
history incrementally.
Change-Id: Id3b97d88d3b4b2d67561b11f8810cb88fe040823
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This is needed to allow jumping to a selected commit when loading
history incrementally.
Change-Id: Id3b97d88d3b4b2d67561b11f8810cb88fe040823
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Use Integer, Character, and Long valueOf methods when
passing parameters to MessageFormat and other places
that expect objects instead of primitives
Change-Id: I5942fbdbca6a378136c00d951ce61167f2366ca4