
WindowCursor.java 9.9KB

Extract PackFile specific code to ObjectToPack subclass

The ObjectReader class is dual-purposed into being a factory for the
ObjectToPack, permitting specific ObjectDatabase implementations to
override the method and offer their own custom subclass of the generic
ObjectToPack class. By allowing them to directly extend the type, each
implementation can add custom fields to support tracking where an
object is stored, without incurring any additional penalties like a
parallel Map<ObjectId, Object> would cost.

The reader was chosen to act as a factory rather than the database, as
the reader will eventually be tied more tightly with the ObjectWalk
and TreeWalk. During object enumeration the reader would have had to
load the object for the RevWalk, and may choose to cache object
position data internally so it can later be reused and fed into the
ObjectToPack instance supplied to the PackWriter. Since a reader is
not thread-safe, and is scoped to this PackWriter and its internal
ObjectWalk, it's a great place for the database to perform caching,
if any.

Right now this change goes a bit backwards by changing what should be
generic ObjectToPack references inside of PackWriter to the very
PackFile specific LocalObjectToPack subclass. We will correct these in
a later commit as we start to refine what the ObjectToPack API will
eventually look like in order to better support the PackWriter.

Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago
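The factory pattern described in this message can be sketched as a small example. The class name ExampleObjectToPack and its field are hypothetical, and the ObjectToPack constructor signature is assumed from the LocalObjectToPack usage in the file below; a real backend would pick fields matching its own storage layout.

import org.eclipse.jgit.revwalk.RevObject;
import org.eclipse.jgit.storage.pack.ObjectToPack;

// Hypothetical subclass: the extra field records where this backend stored
// the object, avoiding a parallel Map<ObjectId, Object> during packing.
class ExampleObjectToPack extends ObjectToPack {
	long offset; // illustrative field: position of the object in local storage

	ExampleObjectToPack(RevObject obj) {
		super(obj); // constructor signature assumed from newObjectToPack below
	}
}

The matching ObjectReader override then simply narrows the declared return type, exactly as WindowCursor.newObjectToPack does below for LocalObjectToPack.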
PackWriter: Support reuse of entire packs

The most expensive part of packing a repository for transport to
another system is enumerating all of the objects in the repository.
Once this gets to the size of the linux-2.6 repository (1.8 million
objects), enumeration can take several CPU minutes and costs a lot
of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack" by
answering a clone request with a thin pack followed by a larger
cached pack appended to the end. This requires the repository owner
to first construct the cached pack by hand, and record the tip
commits inside of $GIT_DIR/objects/info/cached-packs:

  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done
  (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs
  git repack -a -d

When a clone request needs to include $root, the corresponding cached
pack will be copied as-is, rather than enumerating all of the objects
that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB, the
above process creates two packs of 368 MiB and 38 MiB[1]. This is a
local disk usage increase of ~26 MiB, due to reduced delta compression
between the large cached pack and the smaller recent activity pack.
The overhead is similar to 1 full copy of the compressed project
sources.

With this cached pack in hand, JGit daemon completes a clone request
in 1m17s less time, but a slightly larger data transfer (+2.39 MiB):

  Before:
    remote: Counting objects: 1861830, done
    remote: Finding sources: 100% (1861830/1861830)
    remote: Getting sizes: 100% (88243/88243)
    remote: Compressing objects: 100% (88184/88184)
    Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
    remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
    Resolving deltas: 100% (1564621/1564621), done.

    real 3m19.005s

  After:
    remote: Counting objects: 1601, done
    remote: Counting objects: 1828460, done
    remote: Finding sources: 100% (50475/50475)
    remote: Getting sizes: 100% (18843/18843)
    remote: Compressing objects: 100% (7585/7585)
    remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
    Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
    Resolving deltas: 100% (1559477/1559477), done.

    real 2m2.938s

Repository owners can periodically refresh their cached packs by
repacking their repository, folding all newer objects into a larger
cached pack. Since repacking is already considered to be a normal Git
maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
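For reference, each entry the script above appends to objects/info/cached-packs has the following shape, derived directly from the echo commands in the script: a "+" line naming the tip commit, a "P" line for each pack name produced by git pack-objects, and a terminating blank line (SHA-1 values elided here):

  + <SHA-1 of the tip commit, $root>
  P <pack name, $n>
  <blank line>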
/*
 * Copyright (C) 2008-2009, Google Inc.
 * Copyright (C) 2006-2008, Shawn O. Pearce <spearce@spearce.org>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.storage.file;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import org.eclipse.jgit.errors.IncorrectObjectTypeException;
import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.errors.StoredObjectRepresentationNotAvailableException;
import org.eclipse.jgit.lib.AbbreviatedObjectId;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.InflaterCache;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.lib.ProgressMonitor;
import org.eclipse.jgit.revwalk.RevObject;
import org.eclipse.jgit.storage.pack.CachedPack;
import org.eclipse.jgit.storage.pack.ObjectReuseAsIs;
import org.eclipse.jgit.storage.pack.ObjectToPack;
import org.eclipse.jgit.storage.pack.PackOutputStream;
import org.eclipse.jgit.storage.pack.PackWriter;

/** Active handle to a ByteWindow. */
final class WindowCursor extends ObjectReader implements ObjectReuseAsIs {
	/** Temporary buffer large enough for at least one raw object id. */
	final byte[] tempId = new byte[Constants.OBJECT_ID_LENGTH];

	private Inflater inf;

	private ByteWindow window;

	final FileObjectDatabase db;

	WindowCursor(FileObjectDatabase db) {
		this.db = db;
	}

	@Override
	public ObjectReader newReader() {
		return new WindowCursor(db);
	}

	@Override
	public Collection<ObjectId> resolve(AbbreviatedObjectId id)
			throws IOException {
		if (id.isComplete())
			return Collections.singleton(id.toObjectId());
		HashSet<ObjectId> matches = new HashSet<ObjectId>(4);
		db.resolve(matches, id);
		return matches;
	}

	public boolean has(AnyObjectId objectId) throws IOException {
		return db.has(objectId);
	}

	public ObjectLoader open(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException {
		final ObjectLoader ldr = db.openObject(this, objectId);
		if (ldr == null) {
			if (typeHint == OBJ_ANY)
				throw new MissingObjectException(objectId.copy(), "unknown");
			throw new MissingObjectException(objectId.copy(), typeHint);
		}
		if (typeHint != OBJ_ANY && ldr.getType() != typeHint)
			throw new IncorrectObjectTypeException(objectId.copy(), typeHint);
		return ldr;
	}

	public long getObjectSize(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException {
		long sz = db.getObjectSize(this, objectId);
		if (sz < 0) {
			if (typeHint == OBJ_ANY)
				throw new MissingObjectException(objectId.copy(), "unknown");
			throw new MissingObjectException(objectId.copy(), typeHint);
		}
		return sz;
	}

	public LocalObjectToPack newObjectToPack(RevObject obj) {
		return new LocalObjectToPack(obj);
	}

	public void selectObjectRepresentation(PackWriter packer,
			ProgressMonitor monitor, Iterable<ObjectToPack> objects)
			throws IOException, MissingObjectException {
		for (ObjectToPack otp : objects) {
			db.selectObjectRepresentation(packer, otp, this);
			monitor.update(1);
		}
	}

	public void copyObjectAsIs(PackOutputStream out, ObjectToPack otp)
			throws IOException, StoredObjectRepresentationNotAvailableException {
		LocalObjectToPack src = (LocalObjectToPack) otp;
		src.pack.copyAsIs(out, src, this);
	}

	public void writeObjects(PackOutputStream out, List<ObjectToPack> list)
			throws IOException {
		for (ObjectToPack otp : list)
			out.writeObject(otp);
	}

	@SuppressWarnings("unchecked")
	public Collection<CachedPack> getCachedPacks() throws IOException {
		return (Collection<CachedPack>) db.getCachedPacks();
	}

	/**
	 * Copy bytes from the window to a caller supplied buffer.
	 *
	 * @param pack
	 *            the file the desired window is stored within.
	 * @param position
	 *            position within the file to read from.
	 * @param dstbuf
	 *            destination buffer to copy into.
	 * @param dstoff
	 *            offset within <code>dstbuf</code> to start copying into.
	 * @param cnt
	 *            number of bytes to copy. This value may exceed the number
	 *            of bytes remaining in the window starting at offset
	 *            <code>pos</code>.
	 * @return number of bytes actually copied; this may be less than
	 *         <code>cnt</code> if <code>cnt</code> exceeded the number of
	 *         bytes available.
	 * @throws IOException
	 *             this cursor does not match the provider or id and the
	 *             proper window could not be acquired through the
	 *             provider's cache.
	 */
	int copy(final PackFile pack, long position, final byte[] dstbuf,
			int dstoff, final int cnt) throws IOException {
		final long length = pack.length;
		int need = cnt;
		while (need > 0 && position < length) {
			pin(pack, position);
			final int r = window.copy(position, dstbuf, dstoff, need);
			position += r;
			dstoff += r;
			need -= r;
		}
		return cnt - need;
	}

	public void copyPackAsIs(PackOutputStream out, CachedPack pack)
			throws IOException {
		((LocalCachedPack) pack).copyAsIs(out, this);
	}

	void copyPackAsIs(final PackFile pack, final PackOutputStream out,
			long position, long cnt) throws IOException {
		while (0 < cnt) {
			pin(pack, position);
			int ptr = (int) (position - window.start);
			int n = (int) Math.min(window.size() - ptr, cnt);
			window.write(out, position, n);
			position += n;
			cnt -= n;
		}
	}

	/**
	 * Inflate a region of the pack starting at {@code position}.
	 *
	 * @param pack
	 *            the file the desired window is stored within.
	 * @param position
	 *            position within the file to read from.
	 * @param dstbuf
	 *            destination buffer the inflater should output decompressed
	 *            data to.
	 * @param dstoff
	 *            current offset within <code>dstbuf</code> to inflate into.
	 * @return updated <code>dstoff</code> based on the number of bytes
	 *         successfully inflated into <code>dstbuf</code>.
	 * @throws IOException
	 *             this cursor does not match the provider or id and the
	 *             proper window could not be acquired through the
	 *             provider's cache.
	 * @throws DataFormatException
	 *             the inflater encountered an invalid chunk of data. Data
	 *             stream corruption is likely.
	 */
	int inflate(final PackFile pack, long position, final byte[] dstbuf,
			int dstoff) throws IOException, DataFormatException {
		prepareInflater();
		pin(pack, position);
		position += window.setInput(position, inf);
		do {
			int n = inf.inflate(dstbuf, dstoff, dstbuf.length - dstoff);
			if (n == 0) {
				if (inf.needsInput()) {
					pin(pack, position);
					position += window.setInput(position, inf);
				} else if (inf.finished())
					return dstoff;
				else
					throw new DataFormatException();
			}
			dstoff += n;
		} while (dstoff < dstbuf.length);
		return dstoff;
	}

	ByteArrayWindow quickCopy(PackFile p, long pos, long cnt)
			throws IOException {
		pin(p, pos);
		if (window instanceof ByteArrayWindow
				&& window.contains(p, pos + (cnt - 1)))
			return (ByteArrayWindow) window;
		return null;
	}

	Inflater inflater() {
		prepareInflater();
		return inf;
	}

	private void prepareInflater() {
		if (inf == null)
			inf = InflaterCache.get();
		else
			inf.reset();
	}

	void pin(final PackFile pack, final long position) throws IOException {
		final ByteWindow w = window;
		if (w == null || !w.contains(pack, position)) {
			// If memory is low, we may need what is in our window field to
			// be cleaned up by the GC during the get for the next window.
			// So we always clear it, even though we are just going to set
			// it again.
			//
			window = null;
			window = WindowCache.get(pack, position);
		}
	}

	int getStreamFileThreshold() {
		return WindowCache.getStreamFileThreshold();
	}

	/** Release the current window cursor. */
	public void release() {
		window = null;
		try {
			InflaterCache.release(inf);
		} finally {
			inf = null;
		}
	}
}
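A minimal caller-side sketch of cursor use, not part of the file above: the variables db and objectId are assumed to be in scope, and getCachedBytes is shown only as one common way to read a loader's contents. The try/finally pairs with release(), which drops the pinned window and returns the Inflater to the InflaterCache.

WindowCursor curs = new WindowCursor(db);
try {
	// open() resolves through the FileObjectDatabase and checks the type hint.
	ObjectLoader ldr = curs.open(objectId, Constants.OBJ_BLOB);
	byte[] raw = ldr.getCachedBytes(); // suitable for small objects
	// ... use raw ...
} finally {
	curs.release(); // drop the pinned window, recycle the Inflater
}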