Originally, blame's walk to find a scapegoat to blame for a file
walking backward from a commit used the test
treeWalk.getFileMode(0).getObjectType() != OBJ_BLOB
to throw out gitlink (submodule) entries. Later, 52500d3264 (blame:
Micro optimize blob lookup in tree, 2014-04-17) changed that test to
(treeWalk.getRawMode(0) & TYPE_FILE) != TYPE_FILE
These checks are not the same, though: the older test accepts files
and symlinks, while the newer one accepts files, symlinks, and gitlink
(submodule) entries. This is particularly broken in the submodule
case --- trying to parse the referred-to commit as a blob produces
caught an exception: GET /gerrit/+blame/master/plugins/reviewnotes HTTP/1.1
org.eclipse.jgit.errors.MissingObjectException: Missing blob 61702414c0
Stick to just (possibly executable) files instead. Symlinks are not
line-oriented data so blame on a symlink is not likely to be useful.
A quick grep for '& TYPE_' doesn't find any other instances of this
bug.
Change-Id: Iebcc91f1bee3c91adda51dccd6372e8302bf23fe
Signed-off-by: Jonathan Nieder <jrn@google.com>
blame: Revert common subtree elimination "optimization"
This partially reverts 6de12836d7.
Performing a TreeWalk over 2 trees to identify and skip unmodified
subtrees to pass all blame onto an ancestor appears to be a micro
optimization that works for a very limited number of files. In the
general case the 2 tree walk is slowing down blame more than it helps
to speed it up.
I keep coming up with files in multiple repositories where 6de128 is
making things worse, not better, and only one example where it
actually improved performance, render_view_impl.cc in chromium
as described in the commit message.
Change-Id: Ic6d5fff22acb5ab6485614a07bdb388e8c336679
blame: Fix merges, where merge result differs only by whitespace
When blaming a merge commit with "Ignore whitespace changes" enabled,
don't discard blame candidates for other parents when we encounter a
parent that only has whitespace changes compared to the merge result.
The algorithm early prepares parents for blaming, removing the
appropriate blame regions from the list of regions still to blame. Only
at the end, the prepared blame candidates are submitted for blaming.
When looking at a non-first parent which only differs in whitespace to
the merge result, it submitted that parent, but only to blame it for the
(usually few) lines not already prepared to blame on other parents. Due
to an early return the blame candidates for the previous parents were
forgotten, leaving many lines unannotated.
bug: 433024
Change-Id: I43c9caf2078b92b05e652dbed2192568907bf199
Signed-off-by: Konrad Kügler <swamblumat-eclipsebugs@yahoo.de>
Skipping directly to the parent is already possible with an existing
helper method. Update the source path (to follow the rename) and then
use the existing code path to push the parent inside the current entry.
Change-Id: Icb1d49e53d14b599efc478990613625a9e058e09
blame: Allow candidate to manage its setup before output
Pass in the RevWalk and let the candidate decide how to prepare itself
for output. This removes the conditional for the missing sourceCommit,
as candidates missing a commit can override the method with a no-op.
Change-Id: I3fa19b8676dfd3c177583f8f42593b5000b5350d
blame: Do not update candidate regionList during output
Instead of updating the candidate's regionList field to iterate
through the linked list of regions, use a special purpose field
in the BlameGenerator. This allows the candidate to be unmodified.
Change-Id: I2cda031b59220ab603ef82050e741ecbbaa1953f
The computeRange method is inefficient for computing the entire file.
If the entire file was selected ask for the entire file.
Change-Id: I8b2dbf635e875cc125443dac50be121208646540
blame: Reduce running time ~4.5% by skipping common subtrees
With this commit running blame on render_view_impl.cc[1] saves
about 644 ms over prior versions, reducing the time about 4.5%.
Large projects often contain strands of commits where no changes
are made to a particular subtree. Blame used to dive recursively
into these subtrees to look for the blob and check if its SHA-1
was changed. In chromium/src[1] only 20% of the commits modify
the content/renderer subtree relevant for the file.
The recursivePath is necessary to check for '/' and remember
if common subtree elimination should be attempted. When a file
lives within a subtree the extra cost to check for unmodified
subtrees saves time. However for files in the root tree the
extra work incurred by TreeWalk is not worthwhile and would
significantly increase overall running time.
Now typical running times from an otherwise idle desktop:
real 0m13.387s 0m13.341s 0m13.443s
user 0m15.410s 0m15.220s 0m15.350s
previously:
real 0m14.085s 0m14.049s 0m13.968s
user 0m15.730s 0m15.820s 0m15.770s
[1] 34d6e5c5b4/content/renderer/render_view_impl.cc
Change-Id: Ib16d684df7ffa034ee28def3fb22c797998d5b7b
Avoid converting the raw mode to FileMode. This is an expensive
if-else-if sort of test to just check if the thing is a blob.
Instead test the bit mask directly, which is at least a few
instructions shorter.
The TreeWalk is already recursive and will auto-dive into any
subtrees found. isSubtree check is unnecessary, as is the loop,
as only one result will ever be returned by next().
Change-Id: I9fb25229ebed857469427bfbdf74aedebfddfac8
Ensure commit object names are unique by extending the default
abbreviation as long as necessary. This allows `jgit blame` to
more closely match the formatted output of `git blame` on large
histories like Gerrit Code Review's ReceiveCommits.java file.
Change-Id: I5f7c4855769ee9dcba973389df9e109005dcdb5b
Blame correctly in the presence of conflicting merges
Problem:
The BlameGenerator used the RevFlag SEEN to mark commits it had
already looked at (but not necessarily processed), to prevent
processing a commit multiple times. If a commit is a conflicting
merge that contains lines of the merge base, that have been deleted
in its first parent, either these lines or the lines untouched
since the merge base would not be blamed properly.
This happens for example if a file is modified on a main branch in an
earlier commit M and on a side branch in a later commit S. For this
example, M deletes some lines relative to the common base commit B,
and S modifies a subset of these lines, leaving some other of these
lines untouched.
Then side is merged into main, creating a conflict for these
lines. The merge resolution shall carry over some unmodified lines
from B that would otherwise be deleted by M. The route to blame
these lines is via S to B. They can't be blamed via M, as they
don't exist there anymore.
Q
|\
| \
| S
| |
M |
| /
|/
B
Blaming the merged file first blames via S, because that is the
most recent commit. Doing so, it also looks at B to blame the
unmodified lines of B carried over by S into the merge result. In the
course of this, B is submitted for later processing and marked SEEN.
Later M is blamed. It notices that its parent commit B has been
SEEN and aborts processing for M. B is blamed after that, but only
for the lines that survived via S.
As a result, only the lines contributed by S or by B via S are
blamed. All the other lines that were unchanges by both M and S,
which should have been blamed to B via M, are not blamed.
Solution:
Don't abort processing when encountering a SEEN commit. Rather add the
new region list of lines to be blamed to those of the already SEEN and
enqueued commit's region list. This way when the B commit of the
above example is processed, it will blame both the lines of M and S,
yielding a complete blame result.
Bug: 374382
Change-Id: I369059597608022948009ea7708cc8190f05a8d3
Signed-off-by: Konrad Kügler <swamblumat-eclipsebugs@yahoo.de>
Signed-off-by: Shawn Pearce <spearce@spearce.org>
A few classes such as Constanrs are marked with @SuppressWarnings, as are
toString() methods with many liternal, but otherwise $NLS-n$ is used for
string containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
These appear as descriptions in the index, see here (currently empty):
http://download.eclipse.org/jgit/docs/latest/apidocs/
Change-Id: If7996deef30ae688bade8b3ad6b19547ca3d8b50
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
[blame] Don't pass null to PersonIdent constructor
The API was changed to not allow null values anymore with
I0ac994ae8e47789d38f7c6e6db55d482f0f1bac3, leading to an IAE.
Bug: 393054
Change-Id: If33560ae976b46a02ff75b2e4ec05c13a8ad2d41
The file location of the constructor for BlameGenerator
did not specify where the path should be relative from.
Fix BlameGenerator comments based on suggestions by Robin Stocker.
Change-Id: I3d79db2d2ba4961835fe664ae6178e0bfc97b910
[blame] Fix blame following renames in non-toplevel directories
Mark the treeWalk as recursive; otherwise following renames only works
for toplevel files.
Bug: 302549
Change-Id: I70867928eadf332b0942f8bf6877a3acb3828c87
Signed-off-by: Carsten Pfeiffer <carsten.pfeiffer@gebit.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
blame: Compute the origin of lines in a result file
BlameGenerator digs through history and discovers the origin of each
line of some result file. BlameResult consumes the stream of regions
created by the generator and lays them out in a table for applications
to display alongside of source lines.
Applications may optionally push in the working tree copy of a file
using the push(String, byte[]) method, allowing the application to
receive accurate line annotations for the working tree version. Lines
that are uncommitted (difference between HEAD and working tree) will
show up with the description given by the application as the author,
or "Not Committed Yet" as a default string.
Applications may also run the BlameGenerator in reverse mode using the
reverse(AnyObjectId, AnyObjectId) method instead of push(). When
running in the reverse mode the generator annotates lines by the
commit they are removed in, rather than the commit they were added in.
This allows a user to discover where a line disappeared from when they
are looking at an older revision in the repository. For example:
blame --reverse 16e810b2..master -L 1080, org.eclipse.jgit.test/tst/org/eclipse/jgit/storage/file/RefDirectoryTest.java
( 1080) }
2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1081)
2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1082) /**
2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1083) * Kick the timestamp of a local file.
Above we learn that line 1080 (a closing curly brace of the prior
method) still exists in branch master, but the Javadoc comment below
it has been removed by Christian Halstrick on May 20th as part of
commit 2302a6d3. This result differs considerably from that of C
Git's blame --reverse feature. JGit tells the reader which commit
performed the delete, while C Git tells the reader the last commit
that still contained the line, leaving it an exercise to the reader
to discover the descendant that performed the removal.
This is still only a basic implementation. Quite notably it is
missing support for the smart block copy/move detection that the C
implementation of `git blame` is well known for. Despite being
incremental, the BlameGenerator can only be run once. After the
generator runs it cannot be reused. A better implementation would
support applications browsing through history efficiently.
In regards to CQ 5110, only a little of the original code survives.
CQ: 5110
Bug: 306161
Change-Id: I84b8ea4838bb7d25f4fcdd540547884704661b8f
Signed-off-by: Kevin Sawicki <kevin@github.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>