mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Author	SHA1	Message	Date
Robin Rosenberg	3f4725c179	Handle content length in WorkingTreeIterator Content length is computed and cached (short term) in the working tree iterator when core.autocrlf is set. Hopefully this is a cleaner fix than my previous attempt to make autocrlf work. Change-Id: I1b6bbb643101a00db94e5514b5e2b069f338907a	12 years ago
Robin Rosenberg	95d311f888	Move JGitText to an internal package Change-Id: I763590a45d75f00a09097ab6f89581a3bbd3c797	12 years ago
Tomasz Zarna	eedd77a97b	RawText#getEOL() does the same thing as RawText#getLineDelimiter() The duplication has been introduced when merging I08e1369e142bb19f42a8d7bbb5a7d062cc8533fc and I18adc63596f4657516ccc6d704a561924c79d445. The former should have been manually rebased. It also missed a copyright update in ApplyCommandTest. Change-Id: I18fe6108220f964524fb16b719604222aa7abee6	12 years ago
Tomasz Zarna	92f90eb229	Add ApplyCommand to JGit API Bug: 361548 CQ: 6243 Change-Id: I08e1369e142bb19f42a8d7bbb5a7d062cc8533fc Signed-off-by: Chris Aniszczyk <zx@twitter.com>	12 years ago
Tomasz Zarna	9c7371a8c4	Allow to get end-of-line characters for a RawText Bug: 370320 Change-Id: I18adc63596f4657516ccc6d704a561924c79d445 Signed-off-by: Kevin Sawicki <kevin@github.com>	12 years ago
Tomasz Zarna	f1945cac1d	Add getters for old and new prefixes in DiffFormatter Bug: 370318 Change-Id: Iaf9282ba55ee3bb4e2c27fb71c598b308771bf57	12 years ago
Tomasz Zarna	acd8aee98a	DiffFormatter#format(List) fails unless #scan(ATI, ATI) is called first Bug: 354919 Change-Id: I710394fe6675e0e5aa66d9118c5b10d433aa30ea	13 years ago
Kevin Sawicki	78bc526d9b	Report diff entries for files that only change mode This also updates DiffFormatter to not write path lines for entries that have the same object id Bug: 361570 Change-Id: I830a78e2babf472503630a7aa020ebfd5c7e69c6	12 years ago
Carsten Pfeiffer	98d4bd6d36	Allow detecting which files were renamed during a revwalk The egit history view shows the files associated with a commit by using a PathFilter. When following renames with a FollowFilter, the PathFilter cannot be configured anymore because the affected files are simply not known. Thus, it should be possible to get to know which files are renamed. Bug: 302549 Change-Id: I4761e9f5cfb4f0ef0b0e1e38991401a1d5003bea	12 years ago
Dariusz Luksza	679cab9b32	Adds DiffEntry.scan(TreeWalk, boolean) method Adds a method to the DiffEntry class that allows specifying whether changed trees are included in the scan result list. By default changed trees aren't added, but in some cases having the changed tree would be useful. Also adds a check for the tree count in TreeWalk; when it is different from two, an IllegalArgumentException is thrown. This change is required by egit I7ddb21e7ff54333dd6d7ace3209bbcf83da2b219 Change-Id: I5a680a73e1cffa18ade3402cc86008f46c1da1f1 Signed-off-by: Dariusz Luksza <dariusz@luksza.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	13 years ago
Shawn O. Pearce	8d1ac7a769	Change EditList to extend ArrayList There is no reason for this type to contain an ArrayList and try to hide the implementation. It only slows down execution by adding an extra layer of method dispatch to each invocation. Instead subclass from ArrayList. Change-Id: Ifbb9c7060c2fe3d5a7397c1aa85fbade14088637 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Stefan Lay	05fa1713da	Add a DiffFormatter which calculates a patch-id Adds a class which can be used to calculate a SHA1 of the diff associated with a patch, similar to git patch-id. In this version whitespace is not ignored. Change-Id: I421d15ea905e23df543082786786841cbe3ef10d Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	13 years ago
Shawn O. Pearce	c2b87d59a0	Fix diff bug on inserted line For the following patch on the linux 2.6.32 tag: --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -685,6 +685,7 @@ static void enqueue_sleeper(struct cfs_rq cfs_rq, struct sc static void check_spread(struct cfs_rq cfs_rq, struct sched_entity se) { +#if 0 #ifdef CONFIG_SCHED_DEBUG s64 d = se->vruntime - cfs_rq->min_vruntime; @@ -694,6 +695,7 @@ static void check_spread(struct cfs_rq cfs_rq, struct sched if (d > 3*sysctl_sched_latency) schedstat_inc(cfs_rq, nr_spread_over); #endif +#endif } static void JGit produced an incorrect diff, attempting to add a new "}" instead of the new "#endif" at the end of the hunk. This was caused by a prior fix for bug 328895 where we wanted to "slide" a diff down in the file when adding a new method/function and want to show the closing curly brace as being added after the new method, rather than added onto the end of the prior function or method just before the insertion point. Bug: 345956 Change-Id: I32b9e24f1e2980258b1b39dd1807919ab1c5f9b2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Robin Rosenberg	51a5cc7f1a	Fix diff when first text is the start of the other The problem occurred when the first text ends in the middle of the last line of the other text and the first text has no end of line. Bug: 344975 Change-Id: I1f0dd9f8062f2148a7c1341c9122202e082ad19d Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	13 years ago
Shawn O. Pearce	45a020fe6a	DiffFormatter: Use IndexDiffFilter to speed up working tree If DiffFormatter is asked to compare the index to the working tree, it can go faster by using the cached stat information to compare the two entries rather than relying on SHA-1 computation alone. Change-Id: Icb21c15b8279ee8cee382e5e179e0cf8903aee4d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	a02be9725c	Remove empty iterator from TreeWalk It's confusing that a new TreeWalk() needs to have reset() invoked on it before addTree(). This is a historical accident caused by how TreeWalk was abused within ObjectWalk. Drop the initial empty tree from the TreeWalk and thus remove a number of pointless reset() operations from unit tests and some of the internal JGit code. Existing application code which is still calling reset() will simply be incurring a few unnecessary field assignments, but they should consider cleaning up their code in the future. Change-Id: I434e94ffa43491019e7dff52ca420a4d2245f48b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Christian Halstrick	deabacc420	Fixed Merge Algorithm regarding concurrent file creations When in OURS and THEIRS a new file is created we want a conflict when the two contents differ. If on two branches the same file with the same content is created this should not be a conflict. But: the current merge algorithm is throwing NPEs in this case. Fix this by choosing an empty RawText as common base if the base is empty. Change-Id: I21cb23f852965b82fb82ccd66ec961c7edb3ac3d Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	13 years ago
Marc Strapetz	e147fbcd66	Fix DiffConfig to understand "copy" or "copies" for the diff.renames property Rename detection should be considered enabled if the diff.renames config property is set to "copy" or "copies", instead of throwing IllegalArgumentException. Change-Id: If55d955e37235d4d00f5b0febd6aa10c0e27814e	13 years ago
Christian Halstrick	049827d708	Make diff algorithm configurable The diff algorithm which is used by Merge, Cherry-Pick, Rebase should be configurable. A new configuration parameter "diff.algorithm" is introduced which currently accepts the values "myers" or "histogram". Based on this parameter for example the ResolveMerger will choose a diff algorithm. The reason for this is bug 331078. This bug shows that JGit is more compatible with C Git when histogram diff is in place. But since histogram diff is quite new we need an easy way to fall back to Myers diff. Bug: 331078 Change-Id: I2549c992e478d991c61c9508ad826d1a9e539ae3 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Philipp Thun <philipp.thun@sap.com>	13 years ago
Shawn O. Pearce	bc9bca064d	RenameDetector: Only scan deletes if adds exist If there are only deletes, there is no need to perform rename or copy detection. There are no adds (aka destinations) for the deletes to match against. Change-Id: I00fb90c509fa26a053de561dd8506cc1e0f5799a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	05653bda04	SimilarityRenameDetector: Initialize sizes to 0 Setting the array elements to -1 is more expensive than relying on the allocator to zero the array for us first. Shifting the code to always add 1 to the size (so an empty file is actually 1 byte long) allows us to detect an unloaded size by comparing to 0, thus saving the array fill calls. Change-Id: Iad859e910655675b53ba70de8e6fceaef7cfcdd1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
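The sentinel trick in the SimilarityRenameDetector row above (store size + 1 so that 0 can mean "not loaded yet") can be sketched as follows. This is a minimal illustration under stated assumptions, not the JGit code: the class `LazySizes`, its `loader` callback, and the `loads` counter are hypothetical names introduced here.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical sketch: lazily cached sizes where a stored 0 means "not loaded yet". */
final class LazySizes {
    // A freshly allocated long[] is already zero-filled by the JVM,
    // so no extra pass is needed to mark every slot as unloaded.
    private final long[] sizesPlusOne;
    private final IntToLongFunction loader; // assumed (expensive) size lookup
    int loads; // counts how often the loader actually ran

    LazySizes(int count, IntToLongFunction loader) {
        this.sizesPlusOne = new long[count];
        this.loader = loader;
    }

    long size(int idx) {
        long cached = sizesPlusOne[idx];
        if (cached == 0) { // loaded entries are always >= 1
            cached = loader.applyAsLong(idx) + 1; // an empty file is stored as 1
            sizesPlusOne[idx] = cached;
            loads++;
        }
        return cached - 1;
    }
}
```

With this layout an empty file is still distinguishable from an unloaded slot, which is the property the commit message relies on.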
Shawn O. Pearce	68baa3097e	SimilarityRenameDetector: Avoid allocating source index If the only file added is really small, and all of the deleted files are really big, none of the permutations will match up due to the sizes being too far apart to fit the current rename score. Avoid allocating the really big deleted SimilarityIndex by deferring its construction until at least one add along that row has a reasonable chance of matching it. This avoids expending a lot of CPU time looking at big deleted binary files when a small modified text file was broken due to a high percentage of changed lines. Change-Id: I11ae37edb80a7be1eef8cc01d79412017c2fc075 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	918e6e20f0	SimilarityRenameDetector: Only attempt to index large files once If a file fails to index the first time the loop encounters it, the file is likely to fail to index again on the next row. Rather than wasting a huge amount of CPU to index it again and fail, remember which destination files failed to index and skip over them on each subsequent row. Because this condition is very unlikely, avoid allocating the BitSet until it's actually needed. This keeps the memory usage unaffected for the common case. Change-Id: I93509b28b61a9bba8f681a7b4df4c6127bca2a09 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	0e307a6afd	SimilarityIndex: Don't overflow internal counter fields The counter portion of each pair is only 32 bits wide, but is part of a larger 64 bit integer. If the file size was larger than 4 GB the counter could overflow and impact the key, changing the hash, and later resulting in an incorrect similarity score. Guard against this overflow condition by capping the count for each record at 2^32-1. If any record contains more than that many bytes the table aborts hashing and throws TableFullException. This permits the index to scan and work on files that exceed 4 GB in size, but only if the file contains more than one unique block. The index throws TableFullException on a 4 GB file containing all zeros, but should succeed on a 6 GB file containing unique lines. The index now uses a 64 bit accumulator during the common scoring algorithm, possibly resulting in slower summations. However this index is already heavily dependent upon 64 bit integer operations being efficient, so increasing from 32 bits to 64 bits allows us to correctly handle 6 GB files. Change-Id: I14e6dbc88d54ead19336a4c0c25eae18e73e6ec2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
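The overflow described in the SimilarityIndex row above comes from packing a 32-bit key and a 32-bit counter into one 64-bit value: an uncapped counter would carry into the key bits. The following is a minimal sketch of such a guard, assuming the key sits in the upper 32 bits. `PackedCounter` and its methods are hypothetical names, and `IllegalStateException` stands in for the `TableFullException` named in the message.

```java
/** Hypothetical sketch of a key/count pair packed into one 64-bit value. */
final class PackedCounter {
    static final int KEY_SHIFT = 32;
    static final long MAX_COUNT = (1L << KEY_SHIFT) - 1; // 2^32 - 1

    /** Packs key (upper 32 bits) and count (lower 32 bits); rejects counts that would not fit. */
    static long pair(int key, long count) {
        if (count > MAX_COUNT)
            throw new IllegalStateException("counter would overflow into the key bits");
        return (((long) key) << KEY_SHIFT) | count;
    }

    static int keyOf(long pair) {
        return (int) (pair >>> KEY_SHIFT);
    }

    static long countOf(long pair) {
        return pair & MAX_COUNT;
    }

    /** Adds to the counter in 64-bit arithmetic, then re-checks the cap before repacking. */
    static long add(long pair, long more) {
        return pair(keyOf(pair), countOf(pair) + more);
    }
}
```

Doing the addition on the unpacked count, rather than on the packed value, is what keeps a large count from silently changing the key.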
Shawn O. Pearce	d63887127e	SimilarityIndex: Accept files larger than 8 MB Files bigger than 8 MB (2^23 bytes) tended to overflow the internal hashtable, as the table was capped in size to 2^17 records. If a file contained 2^17 unique data blocks/lines, the table insertion got stuck in an infinite loop as the table couldn't grow, and there was no open slot for the new item. Remove the artificial 2^17 table limit and instead allow the table to grow to be as big as 2^30. With a 64 byte block size, this permits hashing inputs as large as 64 GB. If the table reaches 2^30 (or cannot be allocated) hashing is aborted. RenameDetector no longer tries to break a modify file pair, and it does not try to match the file for rename or copy detection. Change-Id: Ibb4d756844f4667e181e24a34a468dc3655863ac Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	f3b511568b	SimilarityIndex: Correct comment explaining the logic This comment was wrong, due to a copy-and-paste error. Here the code is looking at records of dst that do not exist in src, and is skipping past them to find another match. Change-Id: I07c1fba7dee093a1eeffcf7e0c7ec85446777ffb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	aa09599a3d	Fix ugly diff showing insertion of new method When adding a new method near the end of the sequence we want to show the full method inserted, and not tear the prior method due to the common trailing curly brace being consumed as part of the common end region of the sequences. Bug: 328895 Change-Id: I233bc40445fb5452863f5fb082bc3097433a8da6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	b88b693a3d	Fix broken HistogramDiff HistogramDiff failed on cases where the initial element for the LCS was actually very common (e.g. has 20 occurrences), and the first element of the inserted region after the LCS was also common but had fewer occurrences (e.g. 10), while the LCS also contained a unique element (1 occurrence). This happens often in Java source code. The initial element for the LCS might be the empty line ("\n"), and the inserted but common element might be "\t/\n", with the LCS being a large span of lines that contains unique method declarations. Even though "/" occurs less often than the empty line, it's not a better LCS if the LCS we already have contains a unique element. The logic in HistogramDiff would normally have worked fine, except I tried to optimize scanning of B by making tryLongestCommonSequence return the end of the region when there are matching elements found in A. This allows us to skip over the current LCS region, as it has already been examined, but caused us to fail to identify an element that had a lower occurrence count within the region. The solution used here is to trade space-for-time by keeping a table of A positions to their occurrence counts. This allows the matching logic to always use the smallest count for this region, even if the smallest count doesn't appear on the initial element. The new unit test testEdit_LcsContainsUnique() verifies this new behavior works as expected. Bug: 328895 Change-Id: Id170783b891f645b6a8cf6f133c6682b8de40aaf Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	33ae28b482	Correct typo in HistogramDiffIndex Javadoc Change-Id: I8bd2e81fcc14aa86919c504f1d0001944dea50b2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Robin Stocker	db35d91fa6	Fix oddness check in MyersDiff for negative numbers It's probably not possible that these numbers are negative in the algorithm, but it's cleaner this way and gets rid of three more FindBugs warnings. Change-Id: Ifbce4e2c787fb9a7cd309c605e8d86211ef8a352	13 years ago
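The pitfall behind the MyersDiff oddness fix above is that Java's `%` takes the sign of the dividend, so `x % 2 == 1` is false for negative odd numbers. A minimal sketch follows; the class and method names are hypothetical, and which replacement expression the commit actually chose is not stated in the message.

```java
/** Hypothetical sketch contrasting two oddness checks. */
final class Oddness {
    /** Broken for negatives: in Java, -3 % 2 evaluates to -1, not 1. */
    static boolean isOddNaive(int x) {
        return x % 2 == 1;
    }

    /** Works for every int: tests the lowest bit of the two's-complement value. */
    static boolean isOdd(int x) {
        return (x & 1) != 0;
    }
}
```

`x % 2 != 0` would be an equally valid correction; the bitwise form is shown here only as one option.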
Shawn O. Pearce	4c7e100910	Add getString utility functions to RawText These routines can be useful when debugging, because we can add an expression to the Eclipse "Expressions" panel to show the text that appears on a line. Gerrit Code Review also uses these in its own subclass of RawText in order to format patch files, so pulling it up to be part of core JGit may help other applications too. Change-Id: I20a6b112e3403ecfc1c2715ae75dcecc1a85b167 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	3f3b6bfdb3	Remove dead RawText(RawTextComparator) constructor Since the introduction of HashedSequence we no longer need to supply the RawTextComparator at the time of constructing a RawText. Drop the definition from the constructor, because it doesn't make sense as part of our public API. Change-Id: Iaab34611d60eee4a2036830142b089b2dae81842 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	8ea558bd82	Fix RawTextComparator reduceCommonStartEnd at empty lines When an empty line was inserted at the beginning of the common end part of a RawText the comparator incorrectly considered it to be common, which meant the DiffAlgorithm would later not even have it be part of the region it examines. This would cause JGit to skip a line of insertion, which later confused Gerrit Code Review when it tried to match up the pre and post RawText files for a difference that had this type of insertion. Define two new unit tests to check for this insertion of a blank line condition and correct for it by removing the LF from the common region when the condition is detected. Change-Id: I2108570eb2929803b9a56f9fb9c400c758e7156b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	e82cadc0dc	Delete PatienceDiff HistogramDiff outperforms it for any case where PatienceDiff needs to fall back to another algorithm. Consequently it's not worth keeping around, because we would always want a fallback enabled. Change-Id: I39b99cb1db4b3be74a764dd3d68cd4c9ecd91481 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	6048f34c58	Use HistogramDiff by default in DiffFormatter Its behavior is similar to PatienceDiff, and runs nearly as fast, often beating the performance of MyersDiff. Change-Id: I43c3faefa8109f1a68ef57522bec9cf27b5df252 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	1bd24a23f9	Define LowLevelDiffAlgorithm to bypass re-hashing When passing to a fallback algorithm, we can avoid creating a new copy of the hash codes for each sequence by passing in the hashed sequences directly. This makes it cheaper to switch from HistogramDiff down to MyersDiff in a single pass. Change-Id: Ibf2e81be57c083862eeb134279aed676653bf9b5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	4fc50df97d	Fix empty block corner case in PatienceDiff There is a corner case where we get an EMPTY region during recursion, but we didn't expect to receive that. It's harmless to ignore the region since the region is empty and has no content, so do so rather than throwing an exception. Change-Id: I50dcec81ecba763072bb739adfab5879fb48b23a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	7a0c126d5f	Fix infinite loop in PatienceDiff Certain inputs caused an infinite loop because the prior match data couldn't be used as expected. Rather than incrementing the match pointer before looking at an element, do it after, so the loop breaks when we wrap around to the starting point. Change-Id: Ieab28bb3485a914eeddc68aa38c256f255dd778c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	b533a72934	Implement HistogramDiff HistogramDiff is an alternative implementation of patience diff, performing a search over all matching locations and picking the longest common subsequence that has the lowest occurrence count. If there are unique common elements, its behavior is identical to that of patience diff. Actual performance on real-world source files usually beats MyersDiff, sometimes by a factor of 3, especially for complex comparators that ignore whitespace. Change-Id: I1806cd708087e36d144fb824a0e5ab7cdd579d73 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	9bcf391355	Micro-optimize EditList.addAll Pass through the addAll request to our underlying ArrayList. This way the underlying ArrayList grows no more than once during the call, which may be important if the list was originally allocated at the default size of 16, but 64 Edits are being added. Change-Id: I31c3261e895766f82c3c832b251a09f6e37e8860 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	11f99fecfd	Reduce content hash function collisions

The hash codes returned by RawTextComparator (or used by the SimilarityIndex) play an important role in the speed of any algorithm that is based upon them. The lower the number of collisions produced by the hash function, the shorter the hash chains within hash tables will be, and the less likely we are to fall into O(N^2) runtime behaviors for algorithms like PatienceDiff.

Our prior hash function was absolutely horrid, so replace it with the proper definition of the DJB hash that was originally published by Professor Daniel J. Bernstein.

To support this assertion, below is a table listing the maximum number of collisions that result when hashing the unique lines in each source code file of 3 randomly chosen projects:

  test_jgit: 931 files; 122 avg. unique lines/file
    Algorithm    | Collisions
    -------------+-----------
    prior_hash     418
    djb            5
    sha1           6
    string_hash31  11

  test_linux26: 30198 files; 258 avg. unique lines/file
    Algorithm    | Collisions
    -------------+-----------
    prior_hash     8675
    djb            32
    sha1           8
    string_hash31  32

  test_frameworks_base: 8381 files; 184 avg. unique lines/file
    Algorithm    | Collisions
    -------------+-----------
    prior_hash     4615
    djb            10
    sha1           6
    string_hash31  13

We can clearly see that prior_hash performed very poorly, resulting in 8,675 collisions (elements in the same hash bucket) for at least one file in the Linux kernel repository. This leads to some very bad O(N) style insertion and lookup performance, even though the hash table was sized to be the next power-of-2 larger than the total number of unique lines in the file.

The djb hash we are replacing prior_hash with performs closer to SHA-1 in terms of having very few collisions. This indicates it provides a reasonably distributed output for this type of input, despite being a much simpler algorithm (and therefore will be much faster to execute).

The string_hash31 function is provided just to compare results with; it is the algorithm commonly used by java.lang.String hashCode().

However, life isn't quite this simple. djb produces a 32 bit hash code, but our hash tables are always smaller than 2^32 buckets. Mashing the 32 bit code into an array index used to be done by simply taking the lower bits of the hash code with a bitwise AND operator. This unfortunately still produces many collisions, e.g. 32 on the linux-2.6 repository files.

From [1] we can apply a final "cleanup" step to the hash code to mix the bits together a little better, and give priority to the higher order bits as they include data from more bytes of input:

  test_jgit: 931 files; 122 avg. unique lines/file
    Algorithm    | Collisions
    -------------+-----------
    prior_hash     418
    djb            5
    djb + cleanup  6

  test_linux26: 30198 files; 258 avg. unique lines/file
    Algorithm    | Collisions
    -------------+-----------
    prior_hash     8675
    djb            32
    djb + cleanup  7

  test_frameworks_base: 8381 files; 184 avg. unique lines/file
    Algorithm    | Collisions
    -------------+-----------
    prior_hash     4615
    djb            10
    djb + cleanup  7

This is a massive improvement, as the number of collisions for common inputs drops to acceptable levels, and we haven't really made the hash functions any more complex than they were before.

[1] http://lkml.org/lkml/2009/10/27/404 Change-Id: Ia753b695de9526a157ddba265824240bd05dead1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
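The two steps described in the hash-collision message above (a DJB hash over the bytes of a line, then a "cleanup" that favors the high-order bits when choosing a bucket) can be sketched as below. This is an illustration under assumptions, not the committed code: `LineHash` and its methods are hypothetical names, the additive `hash * 33 + byte` form of DJB is assumed (an XOR variant also exists), and the multiplier `0x9e370001` is an assumed stand-in for the mixing constant, taken from the golden-ratio prime long used by the Linux kernel's 32-bit hash helper.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: DJB line hash plus two ways of mapping it to a bucket. */
final class LineHash {
    /** Classic additive DJB hash: start at 5381, then hash = hash * 33 + byte. */
    static int djb(byte[] raw, int start, int end) {
        int hash = 5381;
        for (int i = start; i < end; i++)
            hash = ((hash << 5) + hash) + (raw[i] & 0xff);
        return hash;
    }

    static int djb(String line) {
        byte[] raw = line.getBytes(StandardCharsets.UTF_8);
        return djb(raw, 0, raw.length);
    }

    /** Older approach from the message: keep only the low bits. tableBits must be in 1..31. */
    static int bucketLowBits(int hash, int tableBits) {
        return hash & ((1 << tableBits) - 1);
    }

    /** Assumed cleanup step: multiply to mix the bits, then keep the HIGH bits. */
    static int bucketMixed(int hash, int tableBits) {
        return (hash * 0x9e370001) >>> (32 - tableBits);
    }
}
```

Taking the top bits after the multiply means every input byte can influence the bucket index, which matches the stated goal of giving priority to the higher-order bits.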
Shawn O. Pearce	857d68d173	Perform common start/end elimination by default for DiffAlgorithm As it turns out, every single diff algorithm we might try to implement can benefit from using the SequenceComparator's native concept of the simple reduceCommonStartEnd() step. For most inputs, there can be a significant number of elements that can be removed from the space the DiffAlgorithm needs to consider, which will reduce the overall running time for the final solution. Pool this logic inside of DiffAlgorithm itself as a default, but permit a specific algorithm to override it when necessary. Convert MyersDiff to use this reduction to reduce the space it needs to search, making it perform slightly better on common inputs. Change-Id: I14004d771117e4a4ab2a02cace8deaeda9814bc1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	e84d826eb6	Remove unnecessary hash cache from PatienceDiffIndex PatienceDiff always uses a HashedSequence, which promises to provide constant time access for hash codes during the equals method and aborts fast if the hash codes don't match. Therefore we don't need to cache the hash codes inside of the index, saving us memory. Change-Id: I80bf1e95094b7670e6c0acc26546364a1012d60e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	a67afbfee1	Implement Bram Cohen's Patience Diff Change-Id: Ic7a76df2861ea6c569ab9756a62018987912bd13 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	baaddd51f1	Move cached element hash codes to HashedSequence Most diff implementations really want to use cached hash codes for elements, rather than element equality, as they need to perform many compares and unique hash codes for elements can really speed that process up. To make it easier to define element hash functions, move the caching of hash codes into a wrapper sequence type, so that individual sequence types like RawText don't need to do this themselves. This has a nice property of also allowing the sequence to no longer care about the specific SequenceComparator that is going to be used, and permits the caching to only examine the middle region that isn't common to the two inputs. Change-Id: If8623556da9419117b07c5073e8bce39de02570e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	e0970cd1b4	Micro-optimize reduceCommonStartEnd for RawText This is a faster exact match based form that tries to improve performance for the common case of the header and trailer of a text file not changing at all. After this fast path we use the slower path from the superclass, which uses equals(), so that whitespace ignore modes still work. Some simple performance testing showed a major improvement over the older implementation for a common edit we see in JGit. The test compared blobs 29a89bc and 372a978, which is the ObjectDirectory.java file difference in commit 41dd9ed1c0. The two text files are approximately 22 KiB in size. DEFAULT old 203900 ns; DEFAULT new 100400 ns. This new version is 2x faster for the DEFAULT comparator, which does not treat space specially. This is because we can now examine a larger swath of text with fewer instructions per byte compared. The older algorithm had to stop at each line break and recompute how to examine the next line, while the new algorithm only stops when the first difference is found. WS_IGNORE_ALL old 298500 ns; WS_IGNORE_ALL new 63300 ns. It's 4.7x faster for the whitespace ignore comparator, as the common header and footer do not have a whitespace difference. Avoiding the special case handling for whitespace on each byte considered saves a lot of time. Since most edits to source code (and other text-like files) appear in the interior of the file, fast elimination of common header/footer means faster diff throughput. In the less common case of an actual header or footer edit, the common header/footer elimination is stopped rather quickly either way, so there is very little downside to the optimization applied here. Change-Id: I1d501b4c3ff80ed086b20bf12faf51ae62167db7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
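The common header/footer elimination discussed in the row above can be sketched as a plain byte-wise scan from both ends. This is a simplified illustration, not the RawText implementation: `CommonEnds` is a hypothetical name, and unlike a line-based diff this sketch does not snap the result back to line boundaries.

```java
/** Hypothetical sketch: strip the common prefix and suffix of two byte arrays. */
final class CommonEnds {
    /**
     * Returns {beginA, endA, beginB, endB}: the half-open middle regions
     * of a and b that still need to be handed to a real diff algorithm.
     */
    static int[] reduceCommonStartEnd(byte[] a, byte[] b) {
        int begin = 0;
        int max = Math.min(a.length, b.length);
        // Scan forward until the first differing byte.
        while (begin < max && a[begin] == b[begin])
            begin++;

        int endA = a.length;
        int endB = b.length;
        // Scan backward, never crossing into the prefix already consumed.
        while (endA > begin && endB > begin && a[endA - 1] == b[endB - 1]) {
            endA--;
            endB--;
        }
        return new int[] { begin, endA, begin, endB };
    }
}
```

For two files that differ only in one interior line, the returned ranges cover just that line, so the more expensive algorithm sees a much smaller input.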
Shawn O. Pearce	590a9f94a1	Add Subsequence utility methods DiffAlgorithm implementations may find it useful to construct an Edit and use that to later subsequence the two base sequences, so define two new utility methods a() and b() to construct the A and B ranges. Once a subsequence has had Edits created for it the indexes are within the space of the subsequence. These must be shifted back to the original base sequence's indexes. Define toBase() as a utility method to perform that shifting work in-place, so DiffAlgorithm implementations have an efficient way to convert back to the caller's original space. Change-Id: I8d788e4d158b9f466fa9cb4a40865fb806376aee Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
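The index shifting that the Subsequence row above attributes to toBase() can be sketched as follows. This is a minimal illustration under assumptions: `SubsequenceShift` and `Region` are hypothetical stand-ins for the real types, and the offsets are taken to be the positions in the base sequences where each subsequence starts.

```java
import java.util.List;

/** Hypothetical sketch: shift edits from subsequence indexes back to base indexes. */
final class SubsequenceShift {
    /** Minimal stand-in for an edit: [beginA, endA) in A is replaced by [beginB, endB) in B. */
    static final class Region {
        int beginA, endA, beginB, endB;

        Region(int beginA, int endA, int beginB, int endB) {
            this.beginA = beginA;
            this.endA = endA;
            this.beginB = beginB;
            this.endB = endB;
        }
    }

    /** Shifts every region in place by the subsequence start offsets and returns the same list. */
    static List<Region> toBase(List<Region> edits, int offsetA, int offsetB) {
        for (Region r : edits) {
            r.beginA += offsetA;
            r.endA += offsetA;
            r.beginB += offsetB;
            r.endB += offsetB;
        }
        return edits;
    }
}
```

Mutating the regions in place avoids allocating a second list, which fits the message's emphasis on an efficient conversion back to the caller's index space.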
Shawn O. Pearce	276d38065b	Define a subsequence utility type A diff algorithm may find this type useful if it wants to delegate a particular range of elements to another algorithm, without changing the underlying sequence types. Change-Id: I4544467781233e21ac8b35081304b2bad7db00f6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	307ba53eb6	Define DiffAlgorithm as an abstract function This makes it easier to parametrize DiffFormatter with a different implementation, as we later plan to add PatienceDiff to JGit. Change-Id: Id35ef478d5fa20fe10a1ba297f9436fd7adde9ce Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	db1a9c6a8c	Correct Javadoc for WS_IGNORE_CHANGE comparator Change-Id: I8aa1e7c7ae192ed28b2c8aaa3c5884b7b4666e9c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago


107 Commits (3f4725c179c176560937d756682fcd6cfbf685fe)