Native Git canonicalizes line endings when detecting
renames, more specifically it replaces CRLF by LF.
See: hash_chars in diffcore-delta.c
Bug: 449545
Change-Id: Iec2aab12ae9e67074cccb7fbd4d9defe176a0130
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
HistogramDiff: Convert stack recursion to heap managed queue
Each time the longest common substring is found the diff algorithm
recurses to reprocess the regions before and after the common string.
Large files with many edits can trigger StackOverflowError as the
algorithm attempts to process a deeply split tree of regions. This
is especially prone to happen in servers where the Java stack size
may have been limited to 1M or even 256K.
To keep edits produced in order a queue is used to process edits
in a depth-first strategy.
Change-Id: Iae7260c6934efdffac7c7bee4d3633a8208924f7
Handle diff formatting when there is nothing to compare with
DiffFormatter now suports either side being null and the log program
will output the diff for the first commit.
Bug: 395791
Change-Id: I378957b57e9ad1f7195ba416f402178453f0ebd3
If the header and trailer are identical up to a single line on both
sides, return that REPLACE edit as the only result. No algorithm can
break down a REPLACE with height of 1.
Change-Id: I483c40e8790cc3e8b322ef6dfce2299491fd0ac7
When git-core renames or copies a file and the mode differs the
header shows the mode change first, then the rename or copy data:
diff --git a/COPYING b/LICENSE
old mode 100644
new mode 100755
similarity index 92%
rename from COPYING
rename to LICENSE
index d645695..54863be
--- a/COPYING
+++ b/LICENSE
@@ -56,20 +56,6 @@
JGit relies on this ordering inside of FileHeader. Parsing "new file
mode NNN" after "copy from/to" or "rename from/to" resets the change
type to be ADD, losing the COPIED or RENAMED status and old path.
This fixes a 4 year old bug in Gerrit Code Review that prevents
opening a file for review if the file was copied from another file,
modified in this change, and the mode was updated (e.g. execute
bit was added).
Change-Id: If4c9ecd61ef0ca8e3e1ea857301f7b5c948efb96
[ms: added test case]
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This method always returns false and is private so it cannot be
overridden at runtime by a subclass. Drop the method and the branch
that can never be taken.
Change-Id: I4d3edbf469c6739dca191e62ea580bdb534b67a4
These methods do not touch instance members and can avoid the
implicit "this" argument.
Change-Id: I01c30bb22266eed1c9db18bdf9f90c1c1590e3ec
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Apply tree filter marks when pairing DiffEntry for renames
When using a RenameDetector to generate new DiffEntries after using
DiffEntry.scan, the treeFilterMarks of the original entries were lost.
Now it combines the marks from src and dst.
See EGit bug 335082 where this is used.
Change-Id: I72b34b10ca12e3a6bd10ce44f4fa05b193fc52cc
Fix DiffFormatter NPEs for DiffEntry without content change
DiffEntry.getOldId() returns null for a diff without an index line (e.g.
only mode changed, rename without content change).
Bug: 407743
Change-Id: I42eac87421f2a53c985af260a253338f578492bc
The various rename detection options are an inherent part of the
filter, similar to the path being followed.
This fixes a potential NPE when a RevWalk with a FollowFilter is
created without a Repository, since the old code path tried to get
the DiffConfig from the RevWalk's possibly-missing repository.
Change-Id: Idb273d5a92849b42935ac14eed73b796b80aad50
Extract method to output the first header line of a git diff
In order to be able to determine the range of the first header line
(e.g. "diff --git a/file1 b/file2") in subclasses, the code that formats
the first header line is extracted.
Required by egit's change: Ia61398146c0336ab332234f24d341561292554db
Change-Id: I9dd5eb964ed8b6869745c3162159b7425ac2c44a
Signed-off-by: Tobias Pfeifer <to.pfeifer@sap.com>
Enable marking entries using TreeFilters in DiffEntry
This adds a new optional TreeFilter[] argument to DiffEntry.scan. All
filters will be checked during the scan to determine if an entry should
be "marked" with regard to that filter.
After having called scan, the user can then call isMarked(int) on the
entries to find out whether they matched the TreeFilter with the passed
index.
An example use case for this is in the file diff viewer of EGit's
History view, where we'd like to highlight entries that are matching the
current filter.
See EGit change I03da4b38d1591495cb290909f0e4c6e52270e97f.
Bug: 393610
Change-Id: Icf911fe6fca131b2567514f54d66636a44561af1
Signed-off-by: Robin Stocker <robin@nibor.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
A few classes such as Constanrs are marked with @SuppressWarnings, as are
toString() methods with many liternal, but otherwise $NLS-n$ is used for
string containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
These appear as descriptions in the index, see here (currently empty):
http://download.eclipse.org/jgit/docs/latest/apidocs/
Change-Id: If7996deef30ae688bade8b3ad6b19547ca3d8b50
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Change-Id: I0a86ce0e393dfde9bb27f0b29e036e76c856396e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Use Integer, Character, and Long valueOf methods when
passing parameters to MessageFormat and other places
that expect objects instead of primitives
Change-Id: I5942fbdbca6a378136c00d951ce61167f2366ca4
Content length is computed and cached (short term) in the working
tree iterator when core.autocrlf is set.
Hopefully this is a cleaner fix than my previous attempt to make
autocrlf work.
Change-Id: I1b6bbb643101a00db94e5514b5e2b069f338907a
RawText#getEOL() does the same thing as RawText#getLineDelimiter()
The duplication has been introduced when merging
I08e1369e142bb19f42a8d7bbb5a7d062cc8533fc and
I18adc63596f4657516ccc6d704a561924c79d445. The former should have been
manually rebased. It also missed a copyright update in ApplyCommandTest.
Change-Id: I18fe6108220f964524fb16b719604222aa7abee6
Report diff entries for files that only change mode
This also updates DiffFormatter to not write path lines
for entries that have the same object id
Bug: 361570
Change-Id: I830a78e2babf472503630a7aa020ebfd5c7e69c6
Allow detecting which files were renamed during a revwalk
The egit history view shows the files associated with a commit by using
a PathFilter. When following renames with a FollowFilter, the PathFilter
cannot be configured anymore because the affected files are simply not
known.
Thus, it should be possible to get to know which files are renamed.
Bug: 302549
Change-Id: I4761e9f5cfb4f0ef0b0e1e38991401a1d5003bea
Adds method into DiffEntry class that allows to specify whether changed
trees are included in scanning result list. By default changed trees
aren't added, but in some cases having changed tree would be useful.
Also adds check for tree count in TreeWalk and when it is different from
two it will thrown an IllegalArgumentException.
This change is required by egit
I7ddb21e7ff54333dd6d7ace3209bbcf83da2b219
Change-Id: I5a680a73e1cffa18ade3402cc86008f46c1da1f1
Signed-off-by: Dariusz Luksza <dariusz@luksza.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
There is no reason for this type to contain an ArrayList and try to
hide the implementation. It only slows down execution by adding an
extra layer of method dispatch to each invocation.
Instead subclass from ArrayList.
Change-Id: Ifbb9c7060c2fe3d5a7397c1aa85fbade14088637
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Adds a class which can be used to calculates a SHA1 of the diff
associated with a patch, similar to git patch-id.
In this version whitespace is not ignored.
Change-Id: I421d15ea905e23df543082786786841cbe3ef10d
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
For the following patch on the linux 2.6.32 tag:
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -685,6 +685,7 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sc
static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
+#if 0
#ifdef CONFIG_SCHED_DEBUG
s64 d = se->vruntime - cfs_rq->min_vruntime;
@@ -694,6 +695,7 @@ static void check_spread(struct cfs_rq *cfs_rq, struct
sched
if (d > 3*sysctl_sched_latency)
schedstat_inc(cfs_rq, nr_spread_over);
#endif
+#endif
}
static void
JGit produced an incorrect diff, attempting to add a new "}" instead
of the new "#endif" at the end of the hunk. This was caused by a prior
fix for bug 328895 where we wanted to "slide" a diff down in the file
when adding a new method/function and want to show the closing curly
brace as being added after the new method, rather than added onto the
end of the prior function or method just before the insertion point.
Bug: 345956
Change-Id: I32b9e24f1e2980258b1b39dd1807919ab1c5f9b2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix diff when first text is the start of the other
The problem occurred when the first text ends in the middle
of the last line of the other text and the first text has no
end of line.
Bug: 344975
Change-Id: I1f0dd9f8062f2148a7c1341c9122202e082ad19d
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
DiffFormatter: Use IndexDiffFilter to speed up working tree
If DiffFormatter is asked to compare the index to the working tree,
it can go faster by using the cached stat information to compare
the two entries rather than relying on SHA-1 computation alone.
Change-Id: Icb21c15b8279ee8cee382e5e179e0cf8903aee4d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Its confusing that a new TreeWalk() needs to have reset() invoked
on it before addTree(). This is a historical accident caused by
how TreeWalk was abused within ObjectWalk.
Drop the initial empty tree from the TreeWalk and thus remove a
number of pointless reset() operations from unit tests and some of
the internal JGit code.
Existing application code which is still calling reset() will simply
be incurring a few unnecessary field assignments, but they should
consider cleaning up their code in the future.
Change-Id: I434e94ffa43491019e7dff52ca420a4d2245f48b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When in OURS and THEIRS a new file is created we want a conflict
when the two contents differ. If on two branches the same file
with the same content is created this should not be a conflict.
But: the current merge algorithm is throwing NPEs in this case.
Fix this by choosing an empty RawText as common base if the
base is empty.
Change-Id: I21cb23f852965b82fb82ccd66ec961c7edb3ac3d
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Fix DiffConfig to understand "copy" resp. "copies" for diff.renames property.
Rename detection should be considered enabled if
diff.renames config property is set to "copy" or "copies", instead of
throwing IllegalArgumentException.
Change-Id: If55d955e37235d4d00f5b0febd6aa10c0e27814e
The diff algorithm which is used by Merge, Cherry-Pick, Rebase
should be configurable. A new configuration parameter "diff.algorithm"
is introduced which currently accepts the values "myers" or
"histogram". Based on this parameter for example the ResolveMerger
will choose a diff algorithm. The reason for this is bug 331078.
This bug shows that JGit is more compatible with C Git when
histogram diff is in place. But since histogram diff is quite new we
need an easy way to fall back to Myers diff.
Bug: 331078
Change-Id: I2549c992e478d991c61c9508ad826d1a9e539ae3
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
If there are only deletes, don't need perform rename or copy
detection. There are no adds (aka destinations) for the deletes
to match against.
Change-Id: I00fb90c509fa26a053de561dd8506cc1e0f5799a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Setting the array elements to -1 is more expensive than relying on
the allocator to zero the array for us first. Shifting the code to
always add 1 to the size (so an empty file is actually 1 byte long)
allows us to detect an unloaded size by comparing to 0, thus saving
the array fill calls.
Change-Id: Iad859e910655675b53ba70de8e6fceaef7cfcdd1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
SimilarityRenameDetector: Avoid allocating source index
If the only file added is really small, and all of the deleted
files are really big, none of the permutations will match up due
to the sizes being too far apart to fit the current rename score.
Avoid allocating the really big deleted SimilarityIndex by deferring
its construction until at least one add along that row has a
reasonable chance of matching it.
This avoids expending a lot of CPU time looking at big deleted
binary files when a small modified text file was broken due to a
high percentage of changed lines.
Change-Id: I11ae37edb80a7be1eef8cc01d79412017c2fc075
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
SimilarityRenameDetector: Only attempt to index large files once
If a file fails to index the first time the loop encounters it, the
file is likely to fail to index again on the next row. Rather than
wasting a huge amount of CPU to index it again and fail, remember
which destination files failed to index and skip over them on each
subsequent row.
Because this condition is very unlikely, avoid allocating the BitSet
until its actually needed. This keeps the memory usage unaffected
for the common case.
Change-Id: I93509b28b61a9bba8f681a7b4df4c6127bca2a09
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The counter portion of each pair is only 32 bits wide, but is part
of a larger 64 bit integer. If the file size was larger than 4 GB
the counter could overflow and impact the key, changing the hash,
and later resulting in an incorrect similarity score.
Guard against this overflow condition by capping the count for each
record at 2^32-1. If any record contains more than that many bytes
the table aborts hashing and throws TableFullException.
This permits the index to scan and work on files that exceed 4 GB
in size, but only if the file contains more than one unique block.
The index throws TableFullException on a 4 GB file containing all
zeros, but should succeed on a 6 GB file containing unique lines.
The index now uses a 64 bit accumulator during the common scoring
algorithm, possibly resulting in slower summations. However this
index is already heavily dependent upon 64 bit integer operations
being efficient, so increasing from 32 bits to 64 bits allows us
to correctly handle 6 GB files.
Change-Id: I14e6dbc88d54ead19336a4c0c25eae18e73e6ec2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Files bigger than 8 MB (2^23 bytes) tended to overflow the internal
hashtable, as the table was capped in size to 2^17 records. If a
file contained 2^17 unique data blocks/lines, the table insertion
got stuck in an infinite loop as the able couldn't grow, and there
was no open slot for the new item.
Remove the artifical 2^17 table limit and instead allow the table
to grow to be as big as 2^30. With a 64 byte block size, this
permits hashing inputs as large as 64 GB.
If the table reaches 2^30 (or cannot be allocated) hashing is
aborted. RenameDetector no longer tries to break a modify file pair,
and it does not try to match the file for rename or copy detection.
Change-Id: Ibb4d756844f4667e181e24a34a468dc3655863ac
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
SimilarityIndex: Correct comment explaining the logic
This comment was wrong, due to a copy-and-paste error. Here the
code is looking at records of dst that do not exist in src, and
are skipping past them to find another match.
Change-Id: I07c1fba7dee093a1eeffcf7e0c7ec85446777ffb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>