mirrors/jgit - jgit - source @ dussan.org

Commit graph

Author	SHA1	Message	Date
Youssef Elghareeb	1788b72d1a	Skip detecting content renames for binary files This is similar to change Idbc2c29bd that skipped detecting content renames for large files. With this change, we added a new option in RenameDetector called "skipContentRenamesForBinaryFiles", that when set, causes binary files with any slight modification to be identified as added/deleted. The default for this boolean is false, so preserving current behaviour. Change-Id: I4770b1f69c60b1037025ddd0940ba86df6047299	3 years ago
Youssef Elghareeb	4a78d911c5	Skip detecting content renames for large files There are two code paths for detecting renames: one on tree diffs (using DiffFormatter#scan) and the other on single file diffs (using DiffFormatter#format). The latter skips binary and large files for rename detection - check [1], but the former doesn't. This change skips content rename detection for the tree diffs case for large files. This is essential to avoid expensive computations while reading the file, especially for callers who don't want to pay that cost. Content renames are those which involve files with slightly modified content. Exact renames will still be identified. The default threshold for file sizes is reused from PackConfig.DEFAULT_BIG_FILE_THRESHOLD: 50 MB. [1] `232876421d/org.eclipse.jgit/src/org/eclipse/jgit/diff/RawText.java (386)` Change-Id: Idbc2c29bd381c6e387185204638f76fda47df41e Signed-off-by: Youssef Elghareeb <ghareeb@google.com>	3 years ago
Matthias Sohn	5c5f7c6b14	Update EDL 1.0 license headers to new short SPDX compliant format This is the format given by the Eclipse legal doc generator [1]. [1] https://www.eclipse.org/projects/tools/documentation.php?id=technology.jgit Bug: 548298 Change-Id: I8d8cabc998ba1b083e3f0906a8d558d391ffb6c4 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	4 years ago
David Pursehouse	3b4448637f	Enable and fix warnings about redundant specification of type arguments Since the introduction of generic type parameter inference in Java 7, it's not necessary to explicitly specify the type of generic parameters. Enable the warning in Eclipse, and fix all occurrences. Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	7 years ago
Rüdiger Herrmann	e18444de30	Fix MissingObjectException in RenameDetector When attempting to determine the size of a blob that does not exist, the RenameDetector throws a MissingObjectException. The fix is to return a size of zero if the size is requested for a blob id that doesn't exist. Bug: 481577 Change-Id: I4e86276039c630617610cc51d0eefa56d7d3952f Signed-off-by: Rüdiger Herrmann <ruediger.herrmann@gmx.de>	8 years ago
Robin Rosenberg	767be14f34	Move base test classes to the junit bundle for reuse for Java 7 tests Change-Id: Iedb54eb9d8396bc3ae66d8754c1527fd9ca655f9	11 years ago
Robin Rosenberg	04bc9b3ddc	Add type arguments to some raw declarations Change-Id: Ief195fb5c55f75172f0428fdac8c8874292ae566	11 years ago
Robin Rosenberg	d9e07a574a	Convert all JGit unit tests to JUnit 4 Eclipse has some problem re-running single JUnit tests if the tests are in JUnit 3 format, but the JUnit 4 launcher is used. This was quite unnecessary and the move was not completed. We still have no JUnit4 test. This completes the extermination of JUnit3. Most of the work was global search/replace using regular expressions, followed by numerous invocations of quick-fix and organize imports and verification that we had the same number of tests before and after. - Annotations were introduced. - All references to JUnit3 classes removed - Half-good replacement for getting the test name. This was needed to make the TestRngs work. The initialization of TestRngs was also made lazily since we can no longer find out the test name at runtime in the @Before methods. - Renamed test classes to end with Test, with the exception of TestTranslateBundle, which fails from Maven - Moved JGitTestUtil to the junit support bundle Change-Id: Iddcd3da6ca927a7be773a9c63ebf8bb2147e2d13 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	e2f5716c94	Fix ArrayIndexOutOfBounds on non-square exact rename matrix If the exact rename matrix for a particular ObjectId isn't square we crashed with an ArrayIndexOutOfBoundsException because the matrix entries were encoded backwards. The encode function accepts the source (aka deleted) index first, not second. Add a unit test to cover this non-square case to ensure we don't have this regression in the future. Change-Id: I5b005e5093e1f00de2e3ec104e27ab6820203566 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	60c5939b23	Rename getOldName,getNewName to getOldPath,getNewPath TreeWalk calls this value "path", while "name" is the stuff after the last slash. FileHeader should do the same thing to be consistent. Rename getOldName to getOldPath and getNewName to getNewPath. Bug: 318526 Change-Id: Ib2e372ad4426402d37939b48d8f233154cc637da Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Jeff Schumacher	e64cb03065	Fixed bug in scoring mechanism for rename detection A bug in rename detection would cause file scores to be wrong. The bug was due to the way rename detection would judge the similarity between files. If file A has three lines containing 'foo', and file B has 5 lines containing 'foo', the rename detection phase should record that A and B have three lines in common (the minimum of the number of times that line appears in both files). Instead, it would choose the number of times the line appeared in the destination file, in this case file B. I fixed the bug by having the SimilarityIndex instead choose the minimum number, as it should. I also added a test case to verify that the bug had been fixed. Change-Id: Ic75272a2d6e512a361f88eec91e1b8a7c2298d6b	14 years ago
Jeff Schumacher	396fe6da45	Break dissimilar file pairs during diff File pairs that are very dissimilar during a diff were not being broken apart into their constituent ADD/DELETE pairs. This leads to sub-optimal rename detection. Take, for example, this situation: A file exists at src/a.txt containing "foo". A user renames src/a.txt to src/b.txt, then adds a new src/a.txt containing "bar". Even though the old a.txt and the new b.txt are identical, the rename detection algorithm would not detect it as a rename since it was already paired in a MODIFY. I added code to split all MODIFYs below a certain score into their constituent ADD/DELETE pairs. This allows situations like the one I described above to be more correctly handled. Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	395d236058	Fix NPE in RenameDetector If we have two adds of the same object but no deletes the detector threw an NPE because the entry that came back from the deleted map was null (no matching objects). In this case we need to put the adds all back onto the list of left over additions since they did not match a delete. Change-Id: Ie68fbe7426b4dc0cb571a08911c7adbffff755d5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Jeffrey Schumacher <jeffschu@google.com>	14 years ago
Jeff Schumacher	31311cacfd	Implemented file path based tie breaking to exact rename detection During the exact rename detection phase in RenameDetector, ties were resolved on a first-found basis. I added support for file path based tie breaking during that phase. Basically, there are four situations that have to be handled: One add matching one delete: In this simple case, we pair them as a rename. One add matching many deletes: Find the delete whose path matches the add the closest, and pair them as a rename. Many adds matching one delete: Similar to the above case, we find the add that matches the delete the closest, and pair them as a rename. The other adds are marked as copies of the delete. Many adds matching many deletes: Build a scoring matrix similar to the one used for content-based matching, scoring instead by file path. Some of the utility functions in SimilarityRenameDetector are used in this case, as we use the same encoding scheme. Once the matrix is built, scan it for the best matches, marking them as renames. The rest are marked as copies. I don't particularly like the idea of using utility functions right out of SimilarityRenameDetector, but it works for the moment. A later commit will likely refactor this into a common utility class, as well as bringing exact rename detection out of RenameDetector and into a separate class, much like SimilarityRenameDetector. Change-Id: I1fb08390aebdcbf20d049aecf402a36506e55611	14 years ago
Jeff Schumacher	f666cc755b	Added more test cases for RenameDetector I added test cases to cover the majority of the code. It's not 100% coverage yet, but the remaining bits are small. Change-Id: Ib534c8e94b13358b8b22cf54e2ff84132bae6d14	14 years ago
Jeff Schumacher	9a48de86d8	Added file path similarity to scoring metric in rename detection The scoring method was not taking into account the similarity of the file paths and file names. I changed the metric so that it is 99% based on content (which used to be 100% of the old metric), and 1% based on path similarity. Of that 1%, half (.5% of the total final score) is based on the actual file names (e.g. "foo.java"), and half on the directory (e.g. "src/com/foo/bar/"). Change-Id: I94f0c23bf6413c491b10d5625f6ad7d2ecfb4def	14 years ago
Shawn O. Pearce	978535b090	Implement similarity based rename detection Content similarity based rename detection is performed only after a linear time detection is performed using exact content match on the ObjectIds. Any names which were paired up during that exact match phase are excluded from the inexact similarity based rename, which reduces the space that must be considered. During rename detection two entries cannot be marked as a rename if they are different types of files. This prevents a symlink from being renamed to a regular file, even if their blob content appears to be similar, or is identical. Efficiently comparing two files is performed by building up two hash indexes and hashing lines or short blocks from each file, counting the number of bytes that each line or block represents. Instead of using a standard java.util.HashMap, we use a custom open hashing scheme similar to what we use in ObjecIdSubclassMap. This permits us to have a very light-weight hash, with very little memory overhead per cell stored. As we only need two ints per record in the map (line/block key and number of bytes), we collapse them into a single long inside of a long array, making very efficient use of available memory when we create the index table. We only need object headers for the index structure itself, and the index table, but not per-cell. This offers a massive space savings over using java.util.HashMap. The score calculation is done by approximating how many bytes are the same between the two inputs (which for a delta would be how much is copied from the base into the result). The score is derived by dividing the approximate number of bytes in common into the length of the larger of the two input files. Right now the SimilarityIndex table should average about 1/2 full, which means we waste about 50% of our memory on empty entries after we are done indexing a file and sort the table's contents. 
If memory becomes an issue we could discard the table and copy all records over to a new array that is properly sized. Building the index requires O(M + N log N) time, where M is the size of the input file in bytes, and N is the number of unique lines/blocks in the file. The N log N time constraint comes from the sort of the index table that is necessary to perform linear time matching against another SimilarityIndex created for a different file. To actually perform the rename detection, a SxD matrix is created, placing the sources (aka deletions) along one dimension and the destinations (aka additions) along the other. A simple O(S x D) loop examines every cell in this matrix. A SimilarityIndex is built along the row and reused for each column compare along that row, avoiding the costly index rebuild at the row level. A future improvement would be to load a smaller square matrix into SimilarityIndexes and process everything in that sub-matrix before discarding the column dimension and moving down to the next sub-matrix block along that same grid of rows. An optional ProgressMonitor is permitted to be passed in, allowing applications to see the progress of the detector as it works through the matrix cells. This provides some indication of current status for very long running renames. The default line/block hash function used by the SimilarityIndex may not be optimal, and may produce too many collisions. It is borrowed from RawText's hash, which is used to quickly skip out of a longer equality test if two lines have different hash functions. We may need to refine this hash in the future, in order to minimize the number of collisions we get on common source files. Based on a handful of test commits in JGit (especially my own recent rename repository refactoring series), this rename detector produces output that is very close to C Git. 
The content similarity scores are sometimes off by 1%, which is most probably caused by our SimilarityIndex type using a different hash function than C Git uses when it computes the delta size between any two objects in the rename matrix. Bug: 318504 Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Jeff Schumacher	cb8e1e6014	Added a preliminary version of rename detection JGit does not currently do rename detection during diffs. I added a class that, given a TreeWalk to iterate over, can output a list of DiffEntry's for that TreeWalk, taking into account renames. This class only detects renames by SHA1's. More complex rename detection, along the lines of what C Git does will be added later. Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04	14 years ago
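
The scoring fix in e64cb03065 and the score definition in 978535b090 can be illustrated with a small stand-alone sketch. This is not JGit's SimilarityIndex: the class and method names below are hypothetical, it counts whole lines rather than hashed byte ranges, and it uses java.util.HashMap for brevity. It only shows the two rules stated in the messages: a shared line contributes the minimum of its two occurrence counts, and the score is the common amount relative to the larger input.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of JGit.
public class LineCountSimilaritySketch {

    // Count how many times each distinct line occurs.
    static Map<String, Integer> countLines(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        return counts;
    }

    // Lines in common: for every line present in both inputs, take the
    // minimum of the two occurrence counts (the rule from e64cb03065).
    static int common(List<String> src, List<String> dst) {
        Map<String, Integer> srcCounts = countLines(src);
        Map<String, Integer> dstCounts = countLines(dst);
        int common = 0;
        for (Map.Entry<String, Integer> e : srcCounts.entrySet()) {
            Integer inDst = dstCounts.get(e.getKey());
            if (inDst != null) {
                common += Math.min(e.getValue(), inDst);
            }
        }
        return common;
    }

    // Score in percent: common lines relative to the larger input.
    // Treating two empty inputs as 100% similar is an assumption of this sketch.
    static int score(List<String> src, List<String> dst) {
        int max = Math.max(src.size(), dst.size());
        if (max == 0) {
            return 100;
        }
        return common(src, dst) * 100 / max;
    }

    public static void main(String[] args) {
        List<String> a = List.of("foo", "foo", "foo");
        List<String> b = List.of("foo", "foo", "foo", "foo", "foo");
        System.out.println(common(a, b)); // three lines in common
        System.out.println(score(a, b));
    }
}
```

With the example from the commit message (three 'foo' lines against five), the common count is three in either direction, whereas the pre-fix behaviour described there would have reported five.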

20 commits (1788b72d1af819dee6f371af9e1b0667f0ed8a64)
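
As a rough illustration of the weighting described in 9a48de86d8 (99% content, 0.5% file name, 0.5% directory), the sketch below combines three sub-scores into one. The class name, method name, and the 0..100 integer scale are assumptions made for this example; how JGit derives the name and directory sub-scores is not stated in the commit message, so they are simply taken as inputs here.

```java
// Hypothetical illustration class, not part of JGit.
public class WeightedRenameScoreSketch {

    // Combine sub-scores (each assumed to be in the range 0..100) using the
    // 99% / 0.5% / 0.5% split from the commit message. Integer arithmetic in
    // thousandths avoids floating point; the result is truncated.
    static int combine(int contentScore, int nameScore, int dirScore) {
        return (contentScore * 990 + nameScore * 5 + dirScore * 5) / 1000;
    }

    public static void main(String[] args) {
        // Identical content, unrelated paths.
        System.out.println(combine(100, 0, 0));
        // Identical content and identical paths.
        System.out.println(combine(100, 100, 100));
    }
}
```

Under this split, path similarity can only break near-ties between candidates whose content scores are almost equal, which matches the small weight the commit message gives it.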