
FileObjectDatabase.java 8.7KB

PackWriter: Support reuse of entire packs

The most expensive part of packing a repository for transport to
another system is enumerating all of the objects in the repository.
Once this gets to the size of the linux-2.6 repository (1.8 million
objects), enumeration can take several CPU minutes and costs a lot
of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack"
by answering a clone request with a thin pack followed by a larger
cached pack appended to the end. This requires the repository owner
to first construct the cached pack by hand, and record the tip
commits inside of $GIT_DIR/objects/info/cached-packs:

  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done
  (echo "+ $root";
   for n in $names; do echo "P $n"; done;
   echo) >>objects/info/cached-packs
  git repack -a -d

When a clone request needs to include $root, the corresponding
cached pack will be copied as-is, rather than enumerating all of
the objects that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB,
the above process creates two packs of 368 MiB and 38 MiB[1].
This is a local disk usage increase of ~26 MiB, due to reduced
delta compression between the large cached pack and the smaller
recent activity pack. The overhead is similar to 1 full copy of
the compressed project sources.

With this cached pack in hand, JGit daemon completes a clone
request in 1m17s less time, but with a slightly larger data
transfer (+2.39 MiB):

  Before:
    remote: Counting objects: 1861830, done
    remote: Finding sources: 100% (1861830/1861830)
    remote: Getting sizes: 100% (88243/88243)
    remote: Compressing objects: 100% (88184/88184)
    Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
    remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
    Resolving deltas: 100% (1564621/1564621), done.

    real    3m19.005s

  After:
    remote: Counting objects: 1601, done
    remote: Counting objects: 1828460, done
    remote: Finding sources: 100% (50475/50475)
    remote: Getting sizes: 100% (18843/18843)
    remote: Compressing objects: 100% (7585/7585)
    remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
    Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
    Resolving deltas: 100% (1559477/1559477), done.

    real    2m2.938s

Repository owners can periodically refresh their cached packs by
repacking their repository, folding all newer objects into a larger
cached pack. Since repacking is already considered to be a normal
Git maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
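For reference, the record the script above appends to objects/info/cached-packs is plain text: a "+ " line naming a tip commit whose history the pack fully contains, one "P " line per pack name emitted by git pack-objects, and a terminating blank line. A sketch of one such entry (the SHA-1 values below are made up purely for illustration):

  + 4c8e0a1acdeaa459ec41a9f25cbdfb27a8fc4e51
  P 2f6a8c1e9b7d4e0c5a3f1b8d6c2e4a7f9d0b3c5e

Multiple "P " lines can appear when pack-objects splits its output into more than one pack.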
/*
 * Copyright (C) 2010, Google Inc.
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.storage.file;

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.Set;

import org.eclipse.jgit.lib.AbbreviatedObjectId;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.Config;
import org.eclipse.jgit.lib.ObjectDatabase;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.storage.pack.CachedPack;
import org.eclipse.jgit.storage.pack.ObjectToPack;
import org.eclipse.jgit.storage.pack.PackWriter;
import org.eclipse.jgit.util.FS;

abstract class FileObjectDatabase extends ObjectDatabase {
	static enum InsertLooseObjectResult {
		INSERTED, EXISTS_PACKED, EXISTS_LOOSE, FAILURE;
	}

	@Override
	public ObjectReader newReader() {
		return new WindowCursor(this);
	}

	@Override
	public ObjectDirectoryInserter newInserter() {
		return new ObjectDirectoryInserter(this, getConfig());
	}

	/**
	 * Does the requested object exist in this database?
	 * <p>
	 * Alternates (if present) are searched automatically.
	 *
	 * @param objectId
	 *            identity of the object to test for existence of.
	 * @return true if the specified object is stored in this database, or any
	 *         of the alternate databases.
	 */
	public boolean has(final AnyObjectId objectId) {
		return hasObjectImpl1(objectId) || hasObjectImpl2(objectId.name());
	}

	/**
	 * Compute the location of a loose object file.
	 *
	 * @param objectId
	 *            identity of the loose object to map to the directory.
	 * @return location of the object, if it were to exist as a loose object.
	 */
	File fileFor(final AnyObjectId objectId) {
		return fileFor(objectId.name());
	}

	File fileFor(final String objectName) {
		// Loose objects fan out by the first two hex digits of the name:
		// objects/<first 2 digits>/<remaining 38 digits>.
		final String d = objectName.substring(0, 2);
		final String f = objectName.substring(2);
		return new File(new File(getDirectory(), d), f);
	}

	// In ObjectDirectory, the "1" methods search pack files and the "2"
	// methods search loose objects. The pack half is searched first, here
	// and in each alternate, retrying once if the pack list was rescanned.
	final boolean hasObjectImpl1(final AnyObjectId objectId) {
		if (hasObject1(objectId))
			return true;

		for (final AlternateHandle alt : myAlternates()) {
			if (alt.db.hasObjectImpl1(objectId))
				return true;
		}

		return tryAgain1() && hasObject1(objectId);
	}

	final boolean hasObjectImpl2(final String objectId) {
		if (hasObject2(objectId))
			return true;

		for (final AlternateHandle alt : myAlternates()) {
			if (alt.db.hasObjectImpl2(objectId))
				return true;
		}

		return false;
	}

	abstract void resolve(Set<ObjectId> matches, AbbreviatedObjectId id)
			throws IOException;

	abstract Config getConfig();

	abstract FS getFS();

	/**
	 * Open an object from this database.
	 * <p>
	 * Alternates (if present) are searched automatically.
	 *
	 * @param curs
	 *            temporary working space associated with the calling thread.
	 * @param objectId
	 *            identity of the object to open.
	 * @return a {@link ObjectLoader} for accessing the data of the named
	 *         object, or null if the object does not exist.
	 * @throws IOException
	 */
	ObjectLoader openObject(final WindowCursor curs, final AnyObjectId objectId)
			throws IOException {
		ObjectLoader ldr;

		ldr = openObjectImpl1(curs, objectId);
		if (ldr != null)
			return ldr;

		ldr = openObjectImpl2(curs, objectId.name(), objectId);
		if (ldr != null)
			return ldr;

		return null;
	}

	final ObjectLoader openObjectImpl1(final WindowCursor curs,
			final AnyObjectId objectId) throws IOException {
		ObjectLoader ldr;

		ldr = openObject1(curs, objectId);
		if (ldr != null)
			return ldr;

		for (final AlternateHandle alt : myAlternates()) {
			ldr = alt.db.openObjectImpl1(curs, objectId);
			if (ldr != null)
				return ldr;
		}

		if (tryAgain1()) {
			ldr = openObject1(curs, objectId);
			if (ldr != null)
				return ldr;
		}

		return null;
	}

	final ObjectLoader openObjectImpl2(final WindowCursor curs,
			final String objectName, final AnyObjectId objectId)
			throws IOException {
		ObjectLoader ldr;

		ldr = openObject2(curs, objectName, objectId);
		if (ldr != null)
			return ldr;

		for (final AlternateHandle alt : myAlternates()) {
			ldr = alt.db.openObjectImpl2(curs, objectName, objectId);
			if (ldr != null)
				return ldr;
		}

		return null;
	}

	long getObjectSize(WindowCursor curs, AnyObjectId objectId)
			throws IOException {
		long sz = getObjectSizeImpl1(curs, objectId);
		if (0 <= sz)
			return sz;
		return getObjectSizeImpl2(curs, objectId.name(), objectId);
	}

	final long getObjectSizeImpl1(final WindowCursor curs,
			final AnyObjectId objectId) throws IOException {
		long sz;

		sz = getObjectSize1(curs, objectId);
		if (0 <= sz)
			return sz;

		for (final AlternateHandle alt : myAlternates()) {
			sz = alt.db.getObjectSizeImpl1(curs, objectId);
			if (0 <= sz)
				return sz;
		}

		if (tryAgain1()) {
			sz = getObjectSize1(curs, objectId);
			if (0 <= sz)
				return sz;
		}

		return -1;
	}

	final long getObjectSizeImpl2(final WindowCursor curs,
			final String objectName, final AnyObjectId objectId)
			throws IOException {
		long sz;

		sz = getObjectSize2(curs, objectName, objectId);
		if (0 <= sz)
			return sz;

		for (final AlternateHandle alt : myAlternates()) {
			sz = alt.db.getObjectSizeImpl2(curs, objectName, objectId);
			if (0 <= sz)
				return sz;
		}

		return -1;
	}

	abstract void selectObjectRepresentation(PackWriter packer,
			ObjectToPack otp, WindowCursor curs) throws IOException;

	abstract File getDirectory();

	abstract Collection<? extends CachedPack> getCachedPacks()
			throws IOException;

	abstract AlternateHandle[] myAlternates();

	abstract boolean tryAgain1();

	abstract boolean hasObject1(AnyObjectId objectId);

	abstract boolean hasObject2(String objectId);

	abstract ObjectLoader openObject1(WindowCursor curs, AnyObjectId objectId)
			throws IOException;

	abstract ObjectLoader openObject2(WindowCursor curs, String objectName,
			AnyObjectId objectId) throws IOException;

	abstract long getObjectSize1(WindowCursor curs, AnyObjectId objectId)
			throws IOException;

	abstract long getObjectSize2(WindowCursor curs, String objectName,
			AnyObjectId objectId) throws IOException;

	abstract InsertLooseObjectResult insertUnpackedObject(File tmp,
			ObjectId id, boolean createDuplicate) throws IOException;

	abstract PackFile openPack(File pack, File idx) throws IOException;

	abstract FileObjectDatabase newCachedFileObjectDatabase();

	static class AlternateHandle {
		final FileObjectDatabase db;

		AlternateHandle(FileObjectDatabase db) {
			this.db = db;
		}

		@SuppressWarnings("unchecked")
		Collection<CachedPack> getCachedPacks() throws IOException {
			return (Collection<CachedPack>) db.getCachedPacks();
		}

		void close() {
			db.close();
		}
	}

	static class AlternateRepository extends AlternateHandle {
		final FileRepository repository;

		AlternateRepository(FileRepository r) {
			super(r.getObjectDatabase());
			repository = r;
		}

		void close() {
			repository.close();
		}
	}
}
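The two-pass search order above is easier to see in isolation. The following is a minimal, self-contained sketch of the same pattern the Impl1/Impl2 methods implement: try the local database, then each alternate recursively, then rescan once and retry the fast half before falling back to the slow half. The Store type and its methods are hypothetical stand-ins for illustration, not JGit API.

import java.util.List;

// Hypothetical sketch of the two-pass, alternate-aware lookup.
abstract class Store {
	abstract boolean inPacks(String id); // fast half: pack storage
	abstract boolean inLoose(String id); // slow half: loose objects
	abstract boolean rescanPacks(); // true if the pack list changed
	abstract List<Store> alternates();

	final boolean has(String id) {
		// Fast half everywhere first, then the slow half.
		return hasPacked(id) || hasLoose(id);
	}

	private boolean hasPacked(String id) {
		if (inPacks(id))
			return true;
		for (Store alt : alternates()) // alternates search recursively
			if (alt.hasPacked(id))
				return true;
		// A concurrent repack may have replaced the pack files;
		// rescan once and retry before giving up on this half.
		return rescanPacks() && inPacks(id);
	}

	private boolean hasLoose(String id) {
		if (inLoose(id))
			return true;
		for (Store alt : alternates())
			if (alt.hasLoose(id))
				return true;
		return false;
	}
}

Searching packs first keeps the common case cheap (an index probe rather than a stat of a loose file path), and the single rescan-and-retry covers the window where a repack removes a pack between the directory scan and the lookup.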