debug-diff-algorithms: Real world performance test implementations
When working on a difference algorithm's implementation, its generally
more important to care about how it behaves on real-world inputs than
it does on fake inputs created for unit test cases. Run each
implementation against a number of real-world repositories, looking at
changes between files in each commit. This gives a better picture of
how a particular algorithm performs.
This test suite run against JGit and linux-2.6 with the current
available algorithms shows HistogramDiff always out-performs
MyersDiff, and by a wide margin on the linux-2.6 sources. As
HistogramDiff has similar output properties as PatienceDiff, the
resulting edits are probably also more human-readable. These test
results show that HistogramDiff is a good choice for the default
implementation, and also show that PatienceDiff isn't worth keeping.
jgit: start at
baa83ae
2686 files, 760 commits
N= 3 min lines, 3016 max lines
Algorithm Time(ns) ( Time(ns) on Time(ns) on )
( N=3 N=3016 )
---------------------------------------------------------------------
histogram_myers
314652100 ( 3900 298100 )
histogram
315973000 ( 3800 302100 )
patience
774724900 ( 4500 347900 )
patience_histogram_myers
786332800 ( 3700 351200 )
myers
819359300 ( 4100 379100 )
patience_myers
843416700 ( 3800 348000 )
linux-2.6.git: start at
85a3318
4001 files, 2680 commits
N= 2 min lines, 39098 max lines
Algorithm Time(ns) ( Time(ns) on Time(ns) on )
( N=2 N=39098 )
---------------------------------------------------------------------
histogram_myers
1229870000 ( 5900
2642700 )
histogram
1235654100 ( 6000
2695400 )
patience
3856546000 ( 5900
2627700 )
patience_histogram_myers
3866728100 ( 7000
2624000 )
patience_myers
4004875300 ( 8000
2651700 )
myers
9794679000 ( 7200
2716200 )
Change-Id: I2502684d31f7851e720356820d04d8cf767f7229
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>