
BundleWriter.java 8.8KB

PackWriter: Hoist and cluster reference targets

Many source browsers and network-related tools like UploadPack need to find and parse the target of all branches and annotated tags within the repository during their startup phase. Clustering these together into the same part of the pack file will improve locality, reducing thrashing when an application starts and needs to load all of these into memory at once.

To prevent bottlenecking basic log viewing tools that are scanning backwards from the tip of a current branch (and don't need tags) we place this cluster of older targets after 4096 newer commits have already been placed into the pack stream. 4096 was chosen as a rough guess, but was based on a few factors:

- log viewers typically show 5-200 commits per page
- users only view the first page or two
- DHT can cram 2200-4000 commits per 1 MiB chunk, thus these will fall into the second commit chunk (roughly)

Unfortunately this placement hurts history tools that are scanning backwards through the commit graph and completely ignored tags or branch heads when they started. An ancient tagged commit is no longer positioned behind its first child (it's now much earlier), resulting in a page fault for the parser to reload this cluster of objects on demand. This may be an acceptable loss. If a user is walking backwards and has already scanned through more than 4096 commits of history, waiting for the region to reload isn't really that bad compared to the amount of time already spent.

If the repository is so small that there are fewer than 4096 commits, this change has no impact on the placement of objects.

Change-Id: If3052e430d305e17878d94145c93754f56b74c61
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
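
The placement rule described in the commit message can be illustrated with a small sketch. This is not the PackWriter implementation; the class name `HoistSketch`, the method `order`, and the use of plain strings in place of commit objects are all assumptions made for illustration. It keeps the newest commits in walk order, then clusters any remaining ref targets, then emits the rest of the history.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of the ordering rule; not JGit code. */
final class HoistSketch {
	/** Threshold named in the commit message. */
	static final int RECENT_COMMITS = 4096;

	/**
	 * @param newestFirst commits in the order a log walk would visit them
	 * @param refTargets  commits pointed at by branches or (peeled) tags
	 * @param recent      how many newest commits to leave untouched
	 * @return the order in which commits would be placed in the pack
	 */
	static List<String> order(List<String> newestFirst,
			Set<String> refTargets, int recent) {
		int head = Math.min(recent, newestFirst.size());
		List<String> out = new ArrayList<>(newestFirst.size());

		// 1. Recent history stays exactly where a log viewer expects it.
		out.addAll(newestFirst.subList(0, head));

		List<String> rest = newestFirst.subList(head, newestFirst.size());

		// 2. Older ref targets are hoisted into one cluster.
		for (String c : rest)
			if (refTargets.contains(c))
				out.add(c);

		// 3. Everything else follows in its original relative order.
		for (String c : rest)
			if (!refTargets.contains(c))
				out.add(c);

		return out;
	}

	public static void main(String[] args) {
		List<String> walk = List.of("c6", "c5", "c4", "c3", "c2", "c1");
		// A tiny threshold of 3 stands in for 4096 so the effect is visible.
		System.out.println(order(walk, Set.of("c1", "c5"), 3));
		// prints [c6, c5, c4, c1, c3, c2]
	}
}
```

With fewer commits than the threshold, step 1 consumes the whole list and the order is unchanged, which matches the last paragraph of the commit message.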
/*
 * Copyright (C) 2008-2010, Google Inc.
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.transport;

import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.text.MessageFormat;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.internal.storage.pack.PackWriter;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ProgressMonitor;
import org.eclipse.jgit.lib.Ref;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.storage.pack.PackConfig;

/**
 * Creates a Git bundle file, for sneaker-net transport to another system.
 * <p>
 * Bundles generated by this class can be later read in from a file URI using
 * the bundle transport, or from an application controlled buffer by the more
 * generic {@link TransportBundleStream}.
 * <p>
 * Applications creating bundles need to call one or more <code>include</code>
 * calls to reflect which objects should be available as refs in the bundle for
 * the other side to fetch. At least one include is required to create a valid
 * bundle file, and duplicate names are not permitted.
 * <p>
 * Optional <code>assume</code> calls can be made to declare commits which the
 * recipient must have in order to fetch from the bundle file. Objects reachable
 * from these assumed commits can be used as delta bases in order to reduce the
 * overall bundle size.
 */
public class BundleWriter {
	private final Repository db;

	private final Map<String, ObjectId> include;

	private final Set<RevCommit> assume;

	private final Set<ObjectId> tagTargets;

	private PackConfig packConfig;

	private ObjectCountCallback callback;

	/**
	 * Create a writer for a bundle.
	 *
	 * @param repo
	 *            repository where objects are stored.
	 */
	public BundleWriter(final Repository repo) {
		db = repo;
		include = new TreeMap<String, ObjectId>();
		assume = new HashSet<RevCommit>();
		tagTargets = new HashSet<ObjectId>();
	}

	/**
	 * Set the configuration used by the pack generator.
	 *
	 * @param pc
	 *            configuration controlling packing parameters. If null the
	 *            source repository's settings will be used.
	 */
	public void setPackConfig(PackConfig pc) {
		this.packConfig = pc;
	}

	/**
	 * Include an object (and everything reachable from it) in the bundle.
	 *
	 * @param name
	 *            name the recipient can discover this object as from the
	 *            bundle's list of advertised refs. The name must be a valid
	 *            ref format and must not have already been included in this
	 *            bundle writer.
	 * @param id
	 *            object to pack. Multiple refs may point to the same object.
	 */
	public void include(final String name, final AnyObjectId id) {
		boolean validRefName = Repository.isValidRefName(name) || Constants.HEAD.equals(name);
		if (!validRefName)
			throw new IllegalArgumentException(MessageFormat.format(JGitText.get().invalidRefName, name));
		if (include.containsKey(name))
			throw new IllegalStateException(JGitText.get().duplicateRef + name);
		include.put(name, id.toObjectId());
	}

	/**
	 * Include a single ref (a name/object pair) in the bundle.
	 * <p>
	 * This is a utility function for:
	 * <code>include(r.getName(), r.getObjectId())</code>.
	 *
	 * @param r
	 *            the ref to include.
	 */
	public void include(final Ref r) {
		include(r.getName(), r.getObjectId());

		if (r.getPeeledObjectId() != null)
			tagTargets.add(r.getPeeledObjectId());

		else if (r.getObjectId() != null
				&& r.getName().startsWith(Constants.R_HEADS))
			tagTargets.add(r.getObjectId());
	}

	/**
	 * Assume a commit is available on the recipient's side.
	 * <p>
	 * In order to fetch from a bundle the recipient must have any assumed
	 * commit. Each assumed commit is explicitly recorded in the bundle header
	 * to permit the recipient to validate it has these objects.
	 *
	 * @param c
	 *            the commit to assume being available. This commit should be
	 *            parsed and not disposed in order to maximize the amount of
	 *            debugging information available in the bundle stream.
	 */
	public void assume(final RevCommit c) {
		if (c != null)
			assume.add(c);
	}

	/**
	 * Generate and write the bundle to the output stream.
	 * <p>
	 * This method can only be called once per BundleWriter instance.
	 *
	 * @param monitor
	 *            progress monitor to report bundle writing status to.
	 * @param os
	 *            the stream the bundle is written to. The stream should be
	 *            buffered by the caller. The caller is responsible for closing
	 *            the stream.
	 * @throws IOException
	 *             an error occurred reading a local object's data to include in
	 *             the bundle, or writing compressed object data to the output
	 *             stream.
	 * @throws WriteAbortedException
	 *             the write operation is aborted by
	 *             {@link ObjectCountCallback}.
	 */
	public void writeBundle(ProgressMonitor monitor, OutputStream os)
			throws IOException {
		PackConfig pc = packConfig;
		if (pc == null)
			pc = new PackConfig(db);
		try (PackWriter packWriter = new PackWriter(pc, db.newObjectReader())) {
			packWriter.setObjectCountCallback(callback);

			final HashSet<ObjectId> inc = new HashSet<ObjectId>();
			final HashSet<ObjectId> exc = new HashSet<ObjectId>();
			inc.addAll(include.values());
			for (final RevCommit r : assume)
				exc.add(r.getId());
			packWriter.setIndexDisabled(true);
			packWriter.setDeltaBaseAsOffset(true);
			packWriter.setThin(exc.size() > 0);
			packWriter.setReuseValidatingObjects(false);
			if (exc.size() == 0)
				packWriter.setTagTargets(tagTargets);
			packWriter.preparePack(monitor, inc, exc);

			final Writer w = new OutputStreamWriter(os, Constants.CHARSET);
			w.write(TransportBundle.V2_BUNDLE_SIGNATURE);
			w.write('\n');

			final char[] tmp = new char[Constants.OBJECT_ID_STRING_LENGTH];
			for (final RevCommit a : assume) {
				w.write('-');
				a.copyTo(tmp, w);
				if (a.getRawBuffer() != null) {
					w.write(' ');
					w.write(a.getShortMessage());
				}
				w.write('\n');
			}
			for (final Map.Entry<String, ObjectId> e : include.entrySet()) {
				e.getValue().copyTo(tmp, w);
				w.write(' ');
				w.write(e.getKey());
				w.write('\n');
			}

			w.write('\n');
			w.flush();
			packWriter.writePack(monitor, monitor, os);
		}
	}

	/**
	 * Set the {@link ObjectCountCallback}.
	 * <p>
	 * It should be set before calling
	 * {@link #writeBundle(ProgressMonitor, OutputStream)}.
	 * <p>
	 * This callback will be passed on to
	 * {@link PackWriter#setObjectCountCallback}.
	 *
	 * @param callback
	 *            the callback to set
	 *
	 * @return this object for chaining.
	 * @since 4.1
	 */
	public BundleWriter setObjectCountCallback(ObjectCountCallback callback) {
		this.callback = callback;
		return this;
	}
}
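
The text header that `writeBundle` emits before the pack data can be reproduced without JGit, which may help when reading or debugging a bundle by hand. The sketch below is only an illustration under stated assumptions: the class `BundleHeaderSketch` and its `header` method are hypothetical, object ids are passed as ready-made 40-character hex strings, and the ids in `main` are made up. It mirrors the order used above: signature line, one `-<id> <message>` line per assumed commit, one `<id> <name>` line per included ref sorted by name, then a blank line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-alone illustration of the v2 bundle header layout. */
final class BundleHeaderSketch {
	/** First line of a version 2 bundle. */
	static final String V2_SIGNATURE = "# v2 git bundle";

	/**
	 * @param prerequisites hex id of each assumed commit mapped to its short
	 *                      message (null when no message is available)
	 * @param refs          ref name mapped to the hex id it points at
	 * @return the header text that precedes the pack stream
	 */
	static String header(Map<String, String> prerequisites,
			Map<String, String> refs) {
		StringBuilder sb = new StringBuilder();
		sb.append(V2_SIGNATURE).append('\n');

		// Assumed commits are marked with a leading '-'.
		for (Map.Entry<String, String> p : prerequisites.entrySet()) {
			sb.append('-').append(p.getKey());
			if (p.getValue() != null)
				sb.append(' ').append(p.getValue());
			sb.append('\n');
		}

		// BundleWriter keeps its includes in a TreeMap, so refs are
		// written in name order; a TreeMap copy gives the same order here.
		for (Map.Entry<String, String> r : new TreeMap<>(refs).entrySet())
			sb.append(r.getValue()).append(' ').append(r.getKey()).append('\n');

		// A blank line separates the header from the pack data.
		sb.append('\n');
		return sb.toString();
	}

	public static void main(String[] args) {
		Map<String, String> assumed = new LinkedHashMap<>();
		assumed.put("b".repeat(40), "old tip"); // made-up id
		Map<String, String> refs = new LinkedHashMap<>();
		refs.put("refs/heads/master", "a".repeat(40)); // made-up id
		System.out.print(header(assumed, refs));
	}
}
```

In the real class the prerequisite message is written only when the commit's raw buffer is still available, which is why the sketch treats the message as optional.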