
BundleWriter.java 9.4KB

PackWriter: Hoist and cluster reference targets

Many source browsers and network related tools like UploadPack need to find and parse the target of all branches and annotated tags within the repository during their startup phase. Clustering these together into the same part of the pack file will improve locality, reducing thrashing when an application starts and needs to load all of these into memory at once.

To prevent bottlenecking basic log viewing tools that are scanning backwards from the tip of a current branch (and don't need tags) we place this cluster of older targets after 4096 newer commits have already been placed into the pack stream. 4096 was chosen as a rough guess, but was based on a few factors:

- log viewers typically show 5-200 commits per page
- users only view the first page or two
- DHT can cram 2200-4000 commits per 1 MiB chunk, thus these will fall into the second commit chunk (roughly)

Unfortunately this placement hurts history tools that are scanning backwards through the commit graph and completely ignored tags or branch heads when they started. An ancient tagged commit is no longer positioned behind its first child (it's now much earlier), resulting in a page fault for the parser to reload this cluster of objects on demand. This may be an acceptable loss. If a user is walking backwards and has already scanned through more than 4096 commits of history, waiting for the region to reload isn't really that bad compared to the amount of time already spent.

If the repository is so small that there are fewer than 4096 commits, this change has no impact on the placement of objects.

Change-Id: If3052e430d305e17878d94145c93754f56b74c61
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
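The placement rule described in the commit message can be illustrated with a small, standalone sketch. It is not the PackWriter implementation: the class name `RefTargetClusterSketch`, the method `order`, and the use of plain list elements in place of commits are all hypothetical, and the exact threshold comparison is an assumption. The idea is that commits are emitted newest first, and once `threshold` commits have been placed, any reference targets not yet emitted are hoisted into one cluster before the walk continues.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names and types are not part of JGit.
final class RefTargetClusterSketch {
    /**
     * Orders commits for writing. Input is newest first. Once `threshold`
     * commits have been placed, all reference targets that have not been
     * placed yet are emitted together, then the walk resumes.
     */
    static <T> List<T> order(List<T> newestFirst, List<T> refTargets, int threshold) {
        Set<T> known = new HashSet<>(newestFirst);
        // LinkedHashSet keeps insertion order and silently ignores duplicates.
        Set<T> placed = new LinkedHashSet<>();
        boolean hoisted = false;
        for (T commit : newestFirst) {
            placed.add(commit);
            if (!hoisted && placed.size() >= threshold) {
                for (T target : refTargets) {
                    if (known.contains(target)) {
                        placed.add(target); // no-op if already placed
                    }
                }
                hoisted = true;
            }
        }
        // If the threshold was never reached, the original order is unchanged.
        return new ArrayList<>(placed);
    }
}
```

With ten commits numbered 0 to 9, targets 0, 7 and 9, and a threshold of 3, the sketch keeps 0, 1, 2 in place, then emits 7 and 9, then the remaining commits. With a threshold larger than the commit count the order is untouched, matching the small-repository case in the commit message.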
/*
 * Copyright (C) 2008-2010, Google Inc.
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.transport;

import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.text.MessageFormat;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.internal.storage.pack.PackWriter;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.lib.ProgressMonitor;
import org.eclipse.jgit.lib.Ref;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.storage.pack.PackConfig;

/**
 * Creates a Git bundle file, for sneaker-net transport to another system.
 * <p>
 * Bundles generated by this class can be later read in from a file URI using
 * the bundle transport, or from an application controlled buffer by the more
 * generic {@link org.eclipse.jgit.transport.TransportBundleStream}.
 * <p>
 * Applications creating bundles need to call one or more <code>include</code>
 * calls to reflect which objects should be available as refs in the bundle for
 * the other side to fetch. At least one include is required to create a valid
 * bundle file, and duplicate names are not permitted.
 * <p>
 * Optional <code>assume</code> calls can be made to declare commits which the
 * recipient must have in order to fetch from the bundle file. Objects reachable
 * from these assumed commits can be used as delta bases in order to reduce the
 * overall bundle size.
 */
public class BundleWriter {
    private final Repository db;

    private final ObjectReader reader;

    private final Map<String, ObjectId> include;

    private final Set<RevCommit> assume;

    private final Set<ObjectId> tagTargets;

    private PackConfig packConfig;

    private ObjectCountCallback callback;

    /**
     * Create a writer for a bundle.
     *
     * @param repo
     *            repository where objects are stored.
     */
    public BundleWriter(Repository repo) {
        db = repo;
        reader = null;
        include = new TreeMap<>();
        assume = new HashSet<>();
        tagTargets = new HashSet<>();
    }

    /**
     * Create a writer for a bundle.
     *
     * @param or
     *            reader for reading objects. Will be closed at the end of {@link
     *            #writeBundle(ProgressMonitor, OutputStream)}, but readers may be
     *            reused after closing.
     * @since 4.8
     */
    public BundleWriter(ObjectReader or) {
        db = null;
        reader = or;
        include = new TreeMap<>();
        assume = new HashSet<>();
        tagTargets = new HashSet<>();
    }

    /**
     * Set the configuration used by the pack generator.
     *
     * @param pc
     *            configuration controlling packing parameters. If null the
     *            source repository's settings will be used, or the default
     *            settings if constructed without a repo.
     */
    public void setPackConfig(PackConfig pc) {
        this.packConfig = pc;
    }

    /**
     * Include an object (and everything reachable from it) in the bundle.
     *
     * @param name
     *            name the recipient can discover this object as from the
     *            bundle's list of advertised refs. The name must be a valid
     *            ref format and must not have already been included in this
     *            bundle writer.
     * @param id
     *            object to pack. Multiple refs may point to the same object.
     */
    public void include(String name, AnyObjectId id) {
        boolean validRefName = Repository.isValidRefName(name) || Constants.HEAD.equals(name);
        if (!validRefName)
            throw new IllegalArgumentException(MessageFormat.format(JGitText.get().invalidRefName, name));
        if (include.containsKey(name))
            throw new IllegalStateException(JGitText.get().duplicateRef + name);
        include.put(name, id.toObjectId());
    }

    /**
     * Include a single ref (a name/object pair) in the bundle.
     * <p>
     * This is a utility function for:
     * <code>include(r.getName(), r.getObjectId())</code>.
     *
     * @param r
     *            the ref to include.
     */
    public void include(Ref r) {
        include(r.getName(), r.getObjectId());

        if (r.getPeeledObjectId() != null)
            tagTargets.add(r.getPeeledObjectId());

        else if (r.getObjectId() != null
                && r.getName().startsWith(Constants.R_HEADS))
            tagTargets.add(r.getObjectId());
    }

    /**
     * Assume a commit is available on the recipient's side.
     * <p>
     * In order to fetch from a bundle the recipient must have any assumed
     * commit. Each assumed commit is explicitly recorded in the bundle header
     * to permit the recipient to validate it has these objects.
     *
     * @param c
     *            the commit to assume being available. This commit should be
     *            parsed and not disposed in order to maximize the amount of
     *            debugging information available in the bundle stream.
     */
    public void assume(RevCommit c) {
        if (c != null)
            assume.add(c);
    }

    /**
     * Generate and write the bundle to the output stream.
     * <p>
     * This method can only be called once per BundleWriter instance.
     *
     * @param monitor
     *            progress monitor to report bundle writing status to.
     * @param os
     *            the stream the bundle is written to. The stream should be
     *            buffered by the caller. The caller is responsible for closing
     *            the stream.
     * @throws java.io.IOException
     *             an error occurred reading a local object's data to include in
     *             the bundle, or writing compressed object data to the output
     *             stream.
     */
    public void writeBundle(ProgressMonitor monitor, OutputStream os)
            throws IOException {
        try (PackWriter packWriter = newPackWriter()) {
            packWriter.setObjectCountCallback(callback);

            // Included objects are wanted; assumed commits are excluded.
            final HashSet<ObjectId> inc = new HashSet<>();
            final HashSet<ObjectId> exc = new HashSet<>();
            inc.addAll(include.values());
            for (RevCommit r : assume)
                exc.add(r.getId());
            packWriter.setIndexDisabled(true);
            packWriter.setDeltaBaseAsOffset(true);
            packWriter.setThin(exc.size() > 0);
            packWriter.setReuseValidatingObjects(false);
            if (exc.isEmpty()) {
                packWriter.setTagTargets(tagTargets);
            }
            packWriter.preparePack(monitor, inc, exc);

            // Header: signature line, assumed commits, included refs, blank line.
            final Writer w = new OutputStreamWriter(os, UTF_8);
            w.write(TransportBundle.V2_BUNDLE_SIGNATURE);
            w.write('\n');

            final char[] tmp = new char[Constants.OBJECT_ID_STRING_LENGTH];
            for (RevCommit a : assume) {
                w.write('-');
                a.copyTo(tmp, w);
                if (a.getRawBuffer() != null) {
                    w.write(' ');
                    w.write(a.getShortMessage());
                }
                w.write('\n');
            }
            for (Map.Entry<String, ObjectId> e : include.entrySet()) {
                e.getValue().copyTo(tmp, w);
                w.write(' ');
                w.write(e.getKey());
                w.write('\n');
            }

            w.write('\n');
            // Flush the header before the pack data is written to the same stream.
            w.flush();
            packWriter.writePack(monitor, monitor, os);
        }
    }

    private PackWriter newPackWriter() {
        PackConfig pc = packConfig;
        if (pc == null) {
            pc = db != null ? new PackConfig(db) : new PackConfig();
        }
        return new PackWriter(pc, reader != null ? reader : db.newObjectReader());
    }

    /**
     * Set the {@link org.eclipse.jgit.transport.ObjectCountCallback}.
     * <p>
     * It should be set before calling
     * {@link #writeBundle(ProgressMonitor, OutputStream)}.
     * <p>
     * This callback will be passed on to
     * {@link org.eclipse.jgit.internal.storage.pack.PackWriter#setObjectCountCallback}.
     *
     * @param callback
     *            the callback to set
     * @return this object for chaining.
     * @since 4.1
     */
    public BundleWriter setObjectCountCallback(ObjectCountCallback callback) {
        this.callback = callback;
        return this;
    }
}
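
The text header that `writeBundle` emits before the pack data can be reproduced with a small standalone sketch that uses only the JDK. The class `BundleHeaderSketch` and its `format` method are hypothetical and not part of JGit; object ids are plain strings here, and the signature text is assumed to be the v2 line from the Git bundle format. Prerequisites are written in the order given, whereas `BundleWriter` iterates a `HashSet`, so their order there is unspecified.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of JGit.
final class BundleHeaderSketch {
    // Assumed value of the v2 bundle signature line.
    static final String V2_SIGNATURE = "# v2 git bundle";

    /**
     * Builds the bundle header text.
     *
     * @param prerequisites object id mapped to an optional short message (may be null)
     * @param refs          ref name mapped to object id
     */
    static String format(Map<String, String> prerequisites, Map<String, String> refs) {
        StringBuilder sb = new StringBuilder();
        sb.append(V2_SIGNATURE).append('\n');
        // One "-<id>[ <message>]" line per commit the recipient must already have.
        for (Map.Entry<String, String> p : prerequisites.entrySet()) {
            sb.append('-').append(p.getKey());
            if (p.getValue() != null) {
                sb.append(' ').append(p.getValue());
            }
            sb.append('\n');
        }
        // One "<id> <name>" line per ref, sorted by name as a TreeMap would be.
        for (Map.Entry<String, String> r : new TreeMap<>(refs).entrySet()) {
            sb.append(r.getValue()).append(' ').append(r.getKey()).append('\n');
        }
        // A blank line ends the header; pack data follows in the real stream.
        sb.append('\n');
        return sb.toString();
    }
}
```

Calling `BundleHeaderSketch.format` with one prerequisite and two refs yields the signature line, one line starting with `-`, two ref lines in name order, and a trailing empty line.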