Suppress boxing warnings where we know they are ok
Invoke the wrapper types' valueOf via static imports.
For booleans used in asserts, add a new assert in
the JUnit utility package since out current version of JUnit
does not have the assert(boolean, boolean) method.
Change-Id: I9099bd8efbc8c133479344d51ce7dabed8958a2b
Eclipse has some problem re-running single JUnit tests if
the tests are in Junit 3 format, but the JUnit 4 launcher
is used. This was quite unnecessary and the move was not
completed. We still have no JUnit4 test.
This completes the extermination of JUnit3. Most of the
work was global searce/replace using regular expression,
followed by numerous invocarions of quick-fix and organize
imports and verification that we had the same number of
tests before and after.
- Annotations were introduced.
- All references to JUnit3 classes removed
- Half-good replacement for getting the test name. This was
needed to make the TestRngs work. The initialization of
TestRngs was also made lazily since we can not longer find
out the test name in runtime in the @Before methods.
- Renamed test classes to end with Test, with the exception
of TestTranslateBundle, which fails from Maven
- Moved JGitTestUtil to the junit support bundle
Change-Id: Iddcd3da6ca927a7be773a9c63ebf8bb2147e2d13
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Rename getOldName,getNewName to getOldPath,getNewPath
TreeWalk calls this value "path", while "name" is the stuff after the
last slash. FileHeader should do the same thing to be consistent.
Rename getOldName to getOldPath and getNewName to getNewPath.
Bug: 318526
Change-Id: Ib2e372ad4426402d37939b48d8f233154cc637da
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Content similarity based rename detection is performed only after
a linear time detection is performed using exact content match on
the ObjectIds. Any names which were paired up during that exact
match phase are excluded from the inexact similarity based rename,
which reduces the space that must be considered.
During rename detection two entries cannot be marked as a rename
if they are different types of files. This prevents a symlink from
being renamed to a regular file, even if their blob content appears
to be similar, or is identical.
Efficiently comparing two files is performed by building up two
hash indexes and hashing lines or short blocks from each file,
counting the number of bytes that each line or block represents.
Instead of using a standard java.util.HashMap, we use a custom
open hashing scheme similiar to what we use in ObjecIdSubclassMap.
This permits us to have a very light-weight hash, with very little
memory overhead per cell stored.
As we only need two ints per record in the map (line/block key and
number of bytes), we collapse them into a single long inside of
a long array, making very efficient use of available memory when
we create the index table. We only need object headers for the
index structure itself, and the index table, but not per-cell.
This offers a massive space savings over using java.util.HashMap.
The score calculation is done by approximating how many bytes are
the same between the two inputs (which for a delta would be how much
is copied from the base into the result). The score is derived by
dividing the approximate number of bytes in common into the length
of the larger of the two input files.
Right now the SimilarityIndex table should average about 1/2 full,
which means we waste about 50% of our memory on empty entries
after we are done indexing a file and sort the table's contents.
If memory becomes an issue we could discard the table and copy all
records over to a new array that is properly sized.
Building the index requires O(M + N log N) time, where M is the
size of the input file in bytes, and N is the number of unique
lines/blocks in the file. The N log N time constraint comes
from the sort of the index table that is necessary to perform
linear time matching against another SimilarityIndex created for
a different file.
To actually perform the rename detection, a SxD matrix is created,
placing the sources (aka deletions) along one dimension and the
destinations (aka additions) along the other. A simple O(S x D)
loop examines every cell in this matrix.
A SimilarityIndex is built along the row and reused for each
column compare along that row, avoiding the costly index rebuild
at the row level. A future improvement would be to load a smaller
square matrix into SimilarityIndexes and process everything in that
sub-matrix before discarding the column dimension and moving down
to the next sub-matrix block along that same grid of rows.
An optional ProgressMonitor is permitted to be passed in, allowing
applications to see the progress of the detector as it works through
the matrix cells. This provides some indication of current status
for very long running renames.
The default line/block hash function used by the SimilarityIndex
may not be optimal, and may produce too many collisions. It is
borrowed from RawText's hash, which is used to quickly skip out of
a longer equality test if two lines have different hash functions.
We may need to refine this hash in the future, in order to minimize
the number of collisions we get on common source files.
Based on a handful of test commits in JGit (especially my own
recent rename repository refactoring series), this rename detector
produces output that is very close to C Git. The content similarity
scores are sometimes off by 1%, which is most probably caused by
our SimilarityIndex type using a different hash function than C
Git uses when it computes the delta size between any two objects
in the rename matrix.
Bug: 318504
Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>