
RawTextTest.java 8.4KB

Micro-optimize reduceCommonStartEnd for RawText

This is a faster, exact-match-based form that tries to improve performance for the common case of the header and trailer of a text file not changing at all. After this fast path we use the slower path based on the superclass's equals() to allow the whitespace-ignore modes to still work.

Some simple performance testing showed a major improvement over the older implementation for a common edit we see in JGit. The test compared blobs 29a89bc and 372a978, which is the ObjectDirectory.java file difference in commit 41dd9ed1c054f9f9e1ab52fc7bbf1a55a56cf543. The two text files are approximately 22 KiB in size.

  DEFAULT        old  203900 ns
  DEFAULT        new  100400 ns

This new version is 2x faster for the DEFAULT comparator, which does not treat space specially. This is because we can now examine a larger swath of text with fewer instructions per byte compared. The older algorithm had to stop at each line break and recompute how to examine the next line, while the new algorithm only stops when the first difference is found.

  WS_IGNORE_ALL  old  298500 ns
  WS_IGNORE_ALL  new   63300 ns

It's 4.7x faster for the whitespace-ignore comparator, as the common header and footer do not have a whitespace difference. Avoiding the special-case handling for whitespace on each byte considered saves a lot of time.

Since most edits to source code (and other text-like files) appear in the interior of the file, fast elimination of the common header/footer means faster diff throughput. In the less common case of an actual header or footer edit, the common header/footer elimination is stopped rather quickly either way, so there is very little downside to the optimization applied here.

Change-Id: I1d501b4c3ff80ed086b20bf12faf51ae62167db7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
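The two-phase approach described in the commit message can be sketched on plain byte arrays. This is a minimal illustration under stated assumptions, not the code from this commit: the class CommonStartEndSketch, its methods, and the String-based BiPredicate standing in for a comparator's equals() are all hypothetical. Two exact byte scans (forward from the start, backward from the end) trim the common header and trailer without stopping at every line break; each scan then snaps to a line boundary, and only the lines left over go through the slower line-by-line equality check, which is what lets a whitespace-insensitive predicate still trim further.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch of a two-phase common header/trailer trim; not JGit's code. */
public class CommonStartEndSketch {

	/** Byte offset where each line starts, plus a final sentinel equal to c.length. */
	static int[] lineStarts(byte[] c) {
		List<Integer> s = new ArrayList<>();
		int p = 0;
		while (p < c.length) {
			s.add(p);
			while (p < c.length && c[p] != '\n')
				p++;
			p++; // step over the '\n' (or past the end of an unterminated last line)
		}
		s.add(c.length);
		return s.stream().mapToInt(Integer::intValue).toArray();
	}

	/** Line k, including its '\n' terminator if it has one. */
	static String line(byte[] c, int[] starts, int k) {
		return new String(c, starts[k], starts[k + 1] - starts[k], StandardCharsets.UTF_8);
	}

	static boolean atLineStart(byte[] c, int p) {
		return p == 0 || c[p - 1] == '\n';
	}

	/** Returns {beginA, endA, beginB, endB}: the line ranges still differing after the trim. */
	static int[] reduce(byte[] a, byte[] b, BiPredicate<String, String> eq) {
		int[] sa = lineStarts(a), sb = lineStarts(b);

		// Fast path, header: one tight byte loop that only stops at the first difference.
		int i = 0, limit = Math.min(a.length, b.length);
		while (i < limit && a[i] == b[i])
			i++;
		// Back up to the start of the line containing that difference, so a
		// partially matching line (e.g. "start" vs "start of line") is not trimmed.
		while (i > 0 && a[i - 1] != '\n')
			i--;
		int beginA = Arrays.binarySearch(sa, i);
		int beginB = Arrays.binarySearch(sb, i);

		// Fast path, trailer: the same byte loop run backwards, never crossing the header.
		int ea = a.length, eb = b.length;
		while (ea > i && eb > i && a[ea - 1] == b[eb - 1]) {
			ea--;
			eb--;
		}
		// Move forward until both positions sit at a line start in their own text.
		while (ea < a.length && !(atLineStart(a, ea) && atLineStart(b, eb))) {
			ea++;
			eb++;
		}
		int endA = Arrays.binarySearch(sa, ea);
		int endB = Arrays.binarySearch(sb, eb);

		// Slow path: line-by-line equality on whatever the byte scans left over.
		while (beginA < endA && beginB < endB
				&& eq.test(line(a, sa, beginA), line(b, sb, beginB))) {
			beginA++;
			beginB++;
		}
		while (beginA < endA && beginB < endB
				&& eq.test(line(a, sa, endA - 1), line(b, sb, endB - 1))) {
			endA--;
			endB--;
		}
		return new int[] { beginA, endA, beginB, endB };
	}

	public static void main(String[] args) {
		BiPredicate<String, String> exact = String::equals;
		// Hypothetical stand-in for a comparator that ignores all whitespace.
		BiPredicate<String, String> wsIgnoreAll =
				(x, y) -> x.replaceAll("\\s+", "").equals(y.replaceAll("\\s+", ""));

		byte[] a = "p\na b\nQ\nc d\n".getBytes(StandardCharsets.UTF_8);
		byte[] b = "p\na b \nR\n c d \n".getBytes(StandardCharsets.UTF_8);
		System.out.println(Arrays.toString(reduce(a, b, exact)));       // [1, 4, 1, 4]
		System.out.println(Arrays.toString(reduce(a, b, wsIgnoreAll))); // [2, 3, 2, 3]
	}
}
```

On the texts from the WS_IGNORE_ALL case in RawTextTest, exact equality stops at the whitespace changes and leaves lines 1 to 3 (0-based) of both sides as the edit, while the whitespace-ignoring predicate lets the slow path trim it down to line 2 only, the same Edit(2, 3, 2, 3) the test expects.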
/*
 * Copyright (C) 2009, Google Inc.
 * Copyright (C) 2009, Johannes E. Schindelin <johannes.schindelin@gmx.de>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.diff;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.util.RawParseUtils;
import org.junit.Test;

public class RawTextTest {
	@Test
	public void testEmpty() {
		final RawText r = new RawText(new byte[0]);
		assertEquals(0, r.size());
	}

	@Test
	public void testEquals() {
		final RawText a = new RawText(Constants.encodeASCII("foo-a\nfoo-b\n"));
		final RawText b = new RawText(Constants.encodeASCII("foo-b\nfoo-c\n"));
		RawTextComparator cmp = RawTextComparator.DEFAULT;

		assertEquals(2, a.size());
		assertEquals(2, b.size());

		// foo-a != foo-b
		assertFalse(cmp.equals(a, 0, b, 0));
		assertFalse(cmp.equals(b, 0, a, 0));

		// foo-b == foo-b
		assertTrue(cmp.equals(a, 1, b, 0));
		assertTrue(cmp.equals(b, 0, a, 1));
	}

	@Test
	public void testWriteLine1() throws IOException {
		final RawText a = new RawText(Constants.encodeASCII("foo-a\nfoo-b\n"));
		final ByteArrayOutputStream o = new ByteArrayOutputStream();
		a.writeLine(o, 0);
		final byte[] r = o.toByteArray();
		assertEquals("foo-a", RawParseUtils.decode(r));
	}

	@Test
	public void testWriteLine2() throws IOException {
		final RawText a = new RawText(Constants.encodeASCII("foo-a\nfoo-b"));
		final ByteArrayOutputStream o = new ByteArrayOutputStream();
		a.writeLine(o, 1);
		final byte[] r = o.toByteArray();
		assertEquals("foo-b", RawParseUtils.decode(r));
	}

	@Test
	public void testWriteLine3() throws IOException {
		final RawText a = new RawText(Constants.encodeASCII("a\n\nb\n"));
		final ByteArrayOutputStream o = new ByteArrayOutputStream();
		a.writeLine(o, 1);
		final byte[] r = o.toByteArray();
		assertEquals("", RawParseUtils.decode(r));
	}

	@Test
	public void testComparatorReduceCommonStartEnd()
			throws UnsupportedEncodingException {
		final RawTextComparator c = RawTextComparator.DEFAULT;
		Edit e;

		e = c.reduceCommonStartEnd(t(""), t(""), new Edit(0, 0, 0, 0));
		assertEquals(new Edit(0, 0, 0, 0), e);

		e = c.reduceCommonStartEnd(t("a"), t("b"), new Edit(0, 1, 0, 1));
		assertEquals(new Edit(0, 1, 0, 1), e);

		e = c.reduceCommonStartEnd(t("a"), t("a"), new Edit(0, 1, 0, 1));
		assertEquals(new Edit(1, 1, 1, 1), e);

		e = c.reduceCommonStartEnd(t("axB"), t("axC"), new Edit(0, 3, 0, 3));
		assertEquals(new Edit(2, 3, 2, 3), e);

		e = c.reduceCommonStartEnd(t("Bxy"), t("Cxy"), new Edit(0, 3, 0, 3));
		assertEquals(new Edit(0, 1, 0, 1), e);

		e = c.reduceCommonStartEnd(t("bc"), t("Abc"), new Edit(0, 2, 0, 3));
		assertEquals(new Edit(0, 0, 0, 1), e);

		e = new Edit(0, 5, 0, 5);
		e = c.reduceCommonStartEnd(t("abQxy"), t("abRxy"), e);
		assertEquals(new Edit(2, 3, 2, 3), e);

		RawText a = new RawText("p\na b\nQ\nc d\n".getBytes("UTF-8"));
		RawText b = new RawText("p\na b \nR\n c d \n".getBytes("UTF-8"));
		e = new Edit(0, 4, 0, 4);
		e = RawTextComparator.WS_IGNORE_ALL.reduceCommonStartEnd(a, b, e);
		assertEquals(new Edit(2, 3, 2, 3), e);
	}

	@Test
	public void testComparatorReduceCommonStartEnd_EmptyLine()
			throws UnsupportedEncodingException {
		RawText a;
		RawText b;
		Edit e;

		a = new RawText("R\n y\n".getBytes("UTF-8"));
		b = new RawText("S\n\n y\n".getBytes("UTF-8"));
		e = new Edit(0, 2, 0, 3);
		e = RawTextComparator.DEFAULT.reduceCommonStartEnd(a, b, e);
		assertEquals(new Edit(0, 1, 0, 2), e);

		a = new RawText("S\n\n y\n".getBytes("UTF-8"));
		b = new RawText("R\n y\n".getBytes("UTF-8"));
		e = new Edit(0, 3, 0, 2);
		e = RawTextComparator.DEFAULT.reduceCommonStartEnd(a, b, e);
		assertEquals(new Edit(0, 2, 0, 1), e);
	}

	@Test
	public void testComparatorReduceCommonStartButLastLineNoEol()
			throws UnsupportedEncodingException {
		RawText a;
		RawText b;
		Edit e;
		a = new RawText("start".getBytes("UTF-8"));
		b = new RawText("start of line".getBytes("UTF-8"));
		e = new Edit(0, 1, 0, 1);
		e = RawTextComparator.DEFAULT.reduceCommonStartEnd(a, b, e);
		assertEquals(new Edit(0, 1, 0, 1), e);
	}

	@Test
	public void testComparatorReduceCommonStartButLastLineNoEol_2()
			throws UnsupportedEncodingException {
		RawText a;
		RawText b;
		Edit e;
		a = new RawText("start".getBytes("UTF-8"));
		b = new RawText("start of\nlastline".getBytes("UTF-8"));
		e = new Edit(0, 1, 0, 2);
		e = RawTextComparator.DEFAULT.reduceCommonStartEnd(a, b, e);
		assertEquals(new Edit(0, 1, 0, 2), e);
	}

	@Test
	public void testLineDelimiter() throws Exception {
		RawText rt = new RawText(Constants.encodeASCII("foo\n"));
		assertEquals("\n", rt.getLineDelimiter());
		assertFalse(rt.isMissingNewlineAtEnd());
		rt = new RawText(Constants.encodeASCII("foo\r\n"));
		assertEquals("\r\n", rt.getLineDelimiter());
		assertFalse(rt.isMissingNewlineAtEnd());

		rt = new RawText(Constants.encodeASCII("foo\nbar"));
		assertEquals("\n", rt.getLineDelimiter());
		assertTrue(rt.isMissingNewlineAtEnd());
		rt = new RawText(Constants.encodeASCII("foo\r\nbar"));
		assertEquals("\r\n", rt.getLineDelimiter());
		assertTrue(rt.isMissingNewlineAtEnd());

		rt = new RawText(Constants.encodeASCII("foo\nbar\r\n"));
		assertEquals("\n", rt.getLineDelimiter());
		assertFalse(rt.isMissingNewlineAtEnd());
		rt = new RawText(Constants.encodeASCII("foo\r\nbar\n"));
		assertEquals("\r\n", rt.getLineDelimiter());
		assertFalse(rt.isMissingNewlineAtEnd());

		rt = new RawText(Constants.encodeASCII("foo"));
		assertNull(rt.getLineDelimiter());
		assertTrue(rt.isMissingNewlineAtEnd());

		rt = new RawText(Constants.encodeASCII(""));
		assertNull(rt.getLineDelimiter());
		assertTrue(rt.isMissingNewlineAtEnd());

		rt = new RawText(Constants.encodeASCII("\n"));
		assertEquals("\n", rt.getLineDelimiter());
		assertFalse(rt.isMissingNewlineAtEnd());

		rt = new RawText(Constants.encodeASCII("\r\n"));
		assertEquals("\r\n", rt.getLineDelimiter());
		assertFalse(rt.isMissingNewlineAtEnd());
	}

	@Test
	public void testLineDelimiter2() throws Exception {
		RawText rt = new RawText(Constants.encodeASCII("\nfoo"));
		assertEquals("\n", rt.getLineDelimiter());
		assertTrue(rt.isMissingNewlineAtEnd());
	}

	private static RawText t(String text) {
		StringBuilder r = new StringBuilder();
		for (int i = 0; i < text.length(); i++) {
			r.append(text.charAt(i));
			r.append('\n');
		}
		try {
			return new RawText(r.toString().getBytes("UTF-8"));
		} catch (UnsupportedEncodingException e) {
			throw new RuntimeException(e);
		}
	}
}
  217. }