
Increase core.streamFileThreshold default to 50 MiB

Projects like org.eclipse.mdt contain large XML files about 6 MiB in size. So does the Android project platform/frameworks/base. Doing a clone of either project with JGit takes forever to check out the files into the working directory, because delta decompression tends to be very expensive as we need to constantly reposition the base stream for each copy instruction. This can be made worse by a very bad ordering of offsets, possibly due to an XML editor that doesn't preserve the order of elements in the file very well.

Increasing the threshold to the same limit PackWriter uses when doing delta compression (50 MiB) permits a default-configured JGit to decompress these XML file objects using the faster random-access arrays, rather than re-seeking through an inflate stream, significantly reducing checkout time after a clone.

Since this new limit may be dangerously close to the JVM maximum heap size, every allocation attempt is now wrapped in a try/catch so that JGit can degrade by switching to the large object stream mode when the allocation is refused. It will run slower, but the operation will still complete.

The large stream mode will run very well for big objects that aren't delta compressed, and is acceptable for delta compressed objects that are using only forward referencing copy instructions. Copies using prior offsets are still going to be horrible, and there is nothing we can do about it except increase core.streamFileThreshold.

We might in the future want to consider changing the way the delta generators work in JGit and native C Git to avoid prior offsets once an object reaches a certain size, even if that causes the delta instruction stream to be slightly larger. Unfortunately native C Git won't want to do that until it's also able to stream objects rather than malloc them as contiguous blocks.

Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
13 years ago
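The threshold being raised here is the same knob the test below installs in setUp(), just at a much smaller value so that small blobs can trigger the streaming path. As a minimal sketch of what the new 50 MiB default corresponds to when configured explicitly by an application embedding JGit (the class and method names here are illustrative, not part of JGit):

import org.eclipse.jgit.storage.file.WindowCacheConfig;

class StreamThresholdSketch {
    static void applyFiftyMiB() {
        // Objects up to this size are inflated into a single byte array;
        // larger ones degrade to the large-object stream mode described above.
        WindowCacheConfig cfg = new WindowCacheConfig();
        cfg.setStreamFileThreshold(50 * 1024 * 1024); // 50 MiB, the new default
        cfg.install(); // applies process-wide, as the test's setUp() does
    }
}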
/*
 * Copyright (C) 2010, Google Inc. and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */
package org.eclipse.jgit.internal.storage.file;

import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.zip.Deflater;

import org.eclipse.jgit.errors.LargeObjectException;
import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.internal.storage.pack.DeltaEncoder;
import org.eclipse.jgit.internal.storage.pack.PackExt;
import org.eclipse.jgit.junit.JGitTestUtil;
import org.eclipse.jgit.junit.LocalDiskRepositoryTestCase;
import org.eclipse.jgit.junit.TestRepository;
import org.eclipse.jgit.junit.TestRng;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectInserter;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.ObjectStream;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.revwalk.RevBlob;
import org.eclipse.jgit.storage.file.WindowCacheConfig;
import org.eclipse.jgit.transport.PackParser;
import org.eclipse.jgit.transport.PackedObjectInfo;
import org.eclipse.jgit.util.IO;
import org.eclipse.jgit.util.NB;
import org.eclipse.jgit.util.TemporaryBuffer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
public class PackTest extends LocalDiskRepositoryTestCase {
    private int streamThreshold = 16 * 1024;

    private TestRng rng;

    private FileRepository repo;

    private TestRepository<Repository> tr;

    private WindowCursor wc;

    private TestRng getRng() {
        if (rng == null)
            rng = new TestRng(JGitTestUtil.getName());
        return rng;
    }

    @Override
    @Before
    public void setUp() throws Exception {
        super.setUp();

        // Install a deliberately small (16 KiB) streamFileThreshold so the
        // tests can exercise the large-object streaming path with small blobs.
        WindowCacheConfig cfg = new WindowCacheConfig();
        cfg.setStreamFileThreshold(streamThreshold);
        cfg.install();

        repo = createBareRepository();
        tr = new TestRepository<>(repo);
        wc = (WindowCursor) repo.newObjectReader();
    }

    @Override
    @After
    public void tearDown() throws Exception {
        if (wc != null)
            wc.close();
        new WindowCacheConfig().install(); // restore the default configuration
        super.tearDown();
    }
    @Test
    public void testWhole_SmallObject() throws Exception {
        final int type = Constants.OBJ_BLOB;
        byte[] data = getRng().nextBytes(300);
        RevBlob id = tr.blob(data);
        tr.branch("master").commit().add("A", id).create();
        tr.packAndPrune();
        assertTrue("has blob", wc.has(id));

        ObjectLoader ol = wc.open(id);
        assertNotNull("created loader", ol);
        assertEquals(type, ol.getType());
        assertEquals(data.length, ol.getSize());
        assertFalse("is not large", ol.isLarge());
        assertTrue("same content", Arrays.equals(data, ol.getCachedBytes()));

        try (ObjectStream in = ol.openStream()) {
            assertNotNull("have stream", in);
            assertEquals(type, in.getType());
            assertEquals(data.length, in.getSize());
            byte[] data2 = new byte[data.length];
            IO.readFully(in, data2, 0, data.length);
            assertTrue("same content", Arrays.equals(data2, data));
            assertEquals("stream at EOF", -1, in.read());
        }
    }
    @Test
    public void testWhole_LargeObject() throws Exception {
        final int type = Constants.OBJ_BLOB;
        byte[] data = getRng().nextBytes(streamThreshold + 5);
        RevBlob id = tr.blob(data);
        tr.branch("master").commit().add("A", id).create();
        tr.packAndPrune();
        assertTrue("has blob", wc.has(id));

        ObjectLoader ol = wc.open(id);
        assertNotNull("created loader", ol);
        assertEquals(type, ol.getType());
        assertEquals(data.length, ol.getSize());
        assertTrue("is large", ol.isLarge());
        try {
            ol.getCachedBytes();
            fail("Should have thrown LargeObjectException");
        } catch (LargeObjectException tooBig) {
            assertEquals(MessageFormat.format(
                    JGitText.get().largeObjectException, id.name()), tooBig
                            .getMessage());
        }

        try (ObjectStream in = ol.openStream()) {
            assertNotNull("have stream", in);
            assertEquals(type, in.getType());
            assertEquals(data.length, in.getSize());
            byte[] data2 = new byte[data.length];
            IO.readFully(in, data2, 0, data.length);
            assertTrue("same content", Arrays.equals(data2, data));
            assertEquals("stream at EOF", -1, in.read());
        }
    }
    @Test
    public void testDelta_SmallObjectChain() throws Exception {
        try (ObjectInserter.Formatter fmt = new ObjectInserter.Formatter()) {
            // Build a pack holding one whole base blob followed by a chain
            // of three REF_DELTA objects, each a delta against the previous.
            byte[] data0 = new byte[512];
            Arrays.fill(data0, (byte) 0xf3);
            ObjectId id0 = fmt.idFor(Constants.OBJ_BLOB, data0);

            TemporaryBuffer.Heap pack = new TemporaryBuffer.Heap(64 * 1024);
            packHeader(pack, 4);
            objectHeader(pack, Constants.OBJ_BLOB, data0.length);
            deflate(pack, data0);

            byte[] data1 = clone(0x01, data0);
            byte[] delta1 = delta(data0, data1);
            ObjectId id1 = fmt.idFor(Constants.OBJ_BLOB, data1);
            objectHeader(pack, Constants.OBJ_REF_DELTA, delta1.length);
            id0.copyRawTo(pack);
            deflate(pack, delta1);

            byte[] data2 = clone(0x02, data1);
            byte[] delta2 = delta(data1, data2);
            ObjectId id2 = fmt.idFor(Constants.OBJ_BLOB, data2);
            objectHeader(pack, Constants.OBJ_REF_DELTA, delta2.length);
            id1.copyRawTo(pack);
            deflate(pack, delta2);

            byte[] data3 = clone(0x03, data2);
            byte[] delta3 = delta(data2, data3);
            ObjectId id3 = fmt.idFor(Constants.OBJ_BLOB, data3);
            objectHeader(pack, Constants.OBJ_REF_DELTA, delta3.length);
            id2.copyRawTo(pack);
            deflate(pack, delta3);

            digest(pack);
            PackParser ip = index(pack.toByteArray());
            ip.setAllowThin(true);
            ip.parse(NullProgressMonitor.INSTANCE);

            assertTrue("has blob", wc.has(id3));

            ObjectLoader ol = wc.open(id3);
            assertNotNull("created loader", ol);
            assertEquals(Constants.OBJ_BLOB, ol.getType());
            assertEquals(data3.length, ol.getSize());
            assertFalse("is large", ol.isLarge());
            assertNotNull(ol.getCachedBytes());
            assertArrayEquals(data3, ol.getCachedBytes());

            try (ObjectStream in = ol.openStream()) {
                assertNotNull("have stream", in);
                assertEquals(Constants.OBJ_BLOB, in.getType());
                assertEquals(data3.length, in.getSize());
                byte[] act = new byte[data3.length];
                IO.readFully(in, act, 0, data3.length);
                assertTrue("same content", Arrays.equals(act, data3));
                assertEquals("stream at EOF", -1, in.read());
            }
        }
    }
    @Test
    public void testDelta_FailsOver2GiB() throws Exception {
        try (ObjectInserter.Formatter fmt = new ObjectInserter.Formatter()) {
            byte[] base = new byte[] { 'a' };
            ObjectId idA = fmt.idFor(Constants.OBJ_BLOB, base);
            ObjectId idB = fmt.idFor(Constants.OBJ_BLOB, new byte[] { 'b' });

            PackedObjectInfo a = new PackedObjectInfo(idA);
            PackedObjectInfo b = new PackedObjectInfo(idB);

            TemporaryBuffer.Heap packContents = new TemporaryBuffer.Heap(64 * 1024);
            packHeader(packContents, 2);
            a.setOffset(packContents.length());
            objectHeader(packContents, Constants.OBJ_BLOB, base.length);
            deflate(packContents, base);

            // Declare a 3 GiB result size: inflating the delta would exceed
            // the 2 GiB limit of a Java byte array, so it must fail cleanly.
            ByteArrayOutputStream tmp = new ByteArrayOutputStream();
            DeltaEncoder de = new DeltaEncoder(tmp, base.length, 3L << 30);
            de.copy(0, 1);
            byte[] delta = tmp.toByteArray();
            b.setOffset(packContents.length());
            objectHeader(packContents, Constants.OBJ_REF_DELTA, delta.length);
            idA.copyRawTo(packContents);
            deflate(packContents, delta);
            byte[] footer = digest(packContents);

            File dir = new File(repo.getObjectDatabase().getDirectory(),
                    "pack");
            File packName = new File(dir, idA.name() + ".pack");
            File idxName = new File(dir, idA.name() + ".idx");

            try (FileOutputStream f = new FileOutputStream(packName)) {
                f.write(packContents.toByteArray());
            }
            try (FileOutputStream f = new FileOutputStream(idxName)) {
                List<PackedObjectInfo> list = new ArrayList<>();
                list.add(a);
                list.add(b);
                Collections.sort(list);
                new PackIndexWriterV1(f).write(list, footer);
            }

            Pack pack = new Pack(packName, PackExt.INDEX.getBit());
            try {
                pack.get(wc, b);
                fail("expected LargeObjectException.ExceedsByteArrayLimit");
            } catch (LargeObjectException.ExceedsByteArrayLimit bad) {
                assertNull(bad.getObjectId());
            } finally {
                pack.close();
            }
        }
    }
    @Test
    public void testConfigurableStreamFileThreshold() throws Exception {
        byte[] data = getRng().nextBytes(300);
        RevBlob id = tr.blob(data);
        tr.branch("master").commit().add("A", id).create();
        tr.packAndPrune();
        assertTrue("has blob", wc.has(id));

        ObjectLoader ol = wc.open(id);
        try (ObjectStream in = ol.openStream()) {
            assertTrue(in instanceof ObjectStream.SmallStream);
            assertEquals(300, in.available());
        }

        wc.setStreamFileThreshold(299);
        ol = wc.open(id);
        try (ObjectStream in = ol.openStream()) {
            assertTrue(in instanceof ObjectStream.Filter);
            assertEquals(1, in.available());
        }
    }
    // Copy base, replacing only the first byte with the given value.
    private static byte[] clone(int first, byte[] base) {
        byte[] r = new byte[base.length];
        System.arraycopy(base, 1, r, 1, r.length - 1);
        r[0] = (byte) first;
        return r;
    }

    // Encode dest as a delta against base: insert dest's first byte, then
    // copy the remainder from the base (a forward-referencing copy).
    private static byte[] delta(byte[] base, byte[] dest) throws IOException {
        ByteArrayOutputStream tmp = new ByteArrayOutputStream();
        DeltaEncoder de = new DeltaEncoder(tmp, base.length, dest.length);
        de.insert(dest, 0, 1);
        de.copy(1, base.length - 1);
        return tmp.toByteArray();
    }

    // Write the pack stream header: "PACK", version 2, object count.
    private static void packHeader(TemporaryBuffer.Heap pack, int cnt)
            throws IOException {
        final byte[] hdr = new byte[8];
        NB.encodeInt32(hdr, 0, 2);
        NB.encodeInt32(hdr, 4, cnt);
        pack.write(Constants.PACK_SIGNATURE);
        pack.write(hdr, 0, 8);
    }

    // Write an object's type and inflated size using the pack format's
    // variable-length object header encoding.
    private static void objectHeader(TemporaryBuffer.Heap pack, int type, int sz)
            throws IOException {
        byte[] buf = new byte[8];
        int nextLength = sz >>> 4;
        buf[0] = (byte) ((nextLength > 0 ? 0x80 : 0x00) | (type << 4) | (sz & 0x0F));
        sz = nextLength;
        int n = 1;
        while (sz > 0) {
            nextLength >>>= 7;
            buf[n++] = (byte) ((nextLength > 0 ? 0x80 : 0x00) | (sz & 0x7F));
            sz = nextLength;
        }
        pack.write(buf, 0, n);
    }

    // zlib-compress content and append it to the pack buffer.
    private static void deflate(TemporaryBuffer.Heap pack, byte[] content)
            throws IOException {
        final Deflater deflater = new Deflater();
        final byte[] buf = new byte[128];
        deflater.setInput(content, 0, content.length);
        deflater.finish();
        do {
            final int n = deflater.deflate(buf, 0, buf.length);
            if (n > 0)
                pack.write(buf, 0, n);
        } while (!deflater.finished());
        deflater.end();
    }

    // Append the SHA-1 trailer over the pack contents and return it.
    private static byte[] digest(TemporaryBuffer.Heap buf)
            throws IOException {
        MessageDigest md = Constants.newMessageDigest();
        md.update(buf.toByteArray());
        byte[] footer = md.digest();
        buf.write(footer);
        return footer;
    }

    private ObjectInserter inserter;

    @After
    public void release() {
        if (inserter != null) {
            inserter.close();
        }
    }

    private PackParser index(byte[] raw) throws IOException {
        if (inserter == null)
            inserter = repo.newObjectInserter();
        return inserter.newPackParser(new ByteArrayInputStream(raw));
    }
}
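Taken together, the tests above pin down the consumer-visible contract behind the new default: small objects come back as one cached array, while large ones must be streamed. A minimal sketch of honoring that contract from application code, using only the ObjectLoader/ObjectStream API the tests exercise (the class and method names here are illustrative, and `repo`/`blobId` are assumed to be supplied by the caller):

import java.io.IOException;

import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.ObjectStream;
import org.eclipse.jgit.lib.Repository;

class BlobReaderSketch {
    // Count a blob's bytes without ever allocating more than one small
    // buffer for objects above core.streamFileThreshold.
    static long countBytes(Repository repo, ObjectId blobId) throws IOException {
        ObjectLoader ol = repo.open(blobId);
        if (!ol.isLarge())
            return ol.getCachedBytes().length; // small: whole object in memory
        try (ObjectStream in = ol.openStream()) { // large: inflate incrementally
            long total = 0;
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0)
                total += n;
            return total;
        }
    }
}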