
ObjectReader.java 16KB

Added read/write support for pack bitmap index.

A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter.

Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
11 years ago
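The XOR step described in the commit message can be sketched in a few lines. This is only an illustration under stated assumptions: it uses java.util.BitSet as a stand-in for the javaewah compressed bitmap, it uses the number of set bits as a rough proxy for "smallest representation", and the names XorBitmapSketch, Entry, compress, decompress and maxLookback are hypothetical. It is not JGit's on-disk format or API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration of XOR-against-a-prior-bitmap compression. */
final class XorBitmapSketch {
    /** One stored entry: the (possibly XOR'd) bits and the distance back to the base. */
    static final class Entry {
        final BitSet stored;
        final int xorOffset; // 0 means "stored as-is, no base bitmap"

        Entry(BitSet stored, int xorOffset) {
            this.stored = stored;
            this.xorOffset = xorOffset;
        }
    }

    /** For each bitmap, keep whichever candidate (raw, or XOR with a prior one) has the fewest set bits. */
    static List<Entry> compress(List<BitSet> bitmaps, int maxLookback) {
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < bitmaps.size(); i++) {
            BitSet best = (BitSet) bitmaps.get(i).clone();
            int bestOffset = 0;
            for (int back = 1; back <= maxLookback && back <= i; back++) {
                BitSet candidate = (BitSet) bitmaps.get(i).clone();
                candidate.xor(bitmaps.get(i - back));
                // Assumption: fewer set bits stands in for a smaller encoding.
                if (candidate.cardinality() < best.cardinality()) {
                    best = candidate;
                    bestOffset = back;
                }
            }
            out.add(new Entry(best, bestOffset));
        }
        return out;
    }

    /** Rebuild the full bitmaps by XORing each entry with its already-rebuilt base. */
    static List<BitSet> decompress(List<Entry> entries) {
        List<BitSet> out = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            BitSet full = (BitSet) e.stored.clone();
            if (e.xorOffset > 0)
                full.xor(out.get(i - e.xorOffset));
            out.add(full);
        }
        return out;
    }
}
```

Two reachability bitmaps that differ by a single object, as is typical for a commit and its parent, collapse to a one-bit entry plus an offset of 1. Decompression works front to back because every base precedes the entry that refers to it.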
/*
 * Copyright (C) 2010, Google Inc.
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.lib;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

import org.eclipse.jgit.errors.IncorrectObjectTypeException;
import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.internal.storage.pack.ObjectReuseAsIs;
import org.eclipse.jgit.revwalk.ObjectWalk;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevWalk;

/**
 * Reads an {@link ObjectDatabase} for a single thread.
 * <p>
 * Readers that can support efficient reuse of pack encoded objects should also
 * implement the companion interface {@link ObjectReuseAsIs}.
 */
public abstract class ObjectReader {
	/** Type hint indicating the caller doesn't know the type. */
	public static final int OBJ_ANY = -1;

	/**
	 * Construct a new reader from the same data.
	 * <p>
	 * Applications can use this method to build a new reader from the same data
	 * source, but for a different thread.
	 *
	 * @return a brand new reader, using the same data source.
	 */
	public abstract ObjectReader newReader();

	/**
	 * Obtain a unique abbreviation (prefix) of an object SHA-1.
	 *
	 * This method uses a reasonable default for the minimum length. Callers who
	 * don't care about the minimum length should prefer this method.
	 *
	 * The returned abbreviation would expand back to the argument ObjectId when
	 * passed to {@link #resolve(AbbreviatedObjectId)}, assuming no new objects
	 * are added to this repository between calls.
	 *
	 * @param objectId
	 *            object identity that needs to be abbreviated.
	 * @return SHA-1 abbreviation.
	 * @throws IOException
	 *             the object store cannot be read.
	 */
	public AbbreviatedObjectId abbreviate(AnyObjectId objectId)
			throws IOException {
		return abbreviate(objectId, 7);
	}

	/**
	 * Obtain a unique abbreviation (prefix) of an object SHA-1.
	 *
	 * The returned abbreviation would expand back to the argument ObjectId when
	 * passed to {@link #resolve(AbbreviatedObjectId)}, assuming no new objects
	 * are added to this repository between calls.
	 *
	 * The default implementation of this method abbreviates the id to the
	 * minimum length, then resolves it to see if there are multiple results.
	 * When multiple results are found, the length is extended by 1 and resolve
	 * is tried again.
	 *
	 * @param objectId
	 *            object identity that needs to be abbreviated.
	 * @param len
	 *            minimum length of the abbreviated string. Must be in the range
	 *            [2, {@value Constants#OBJECT_ID_STRING_LENGTH}].
	 * @return SHA-1 abbreviation. If no matching objects exist in the
	 *         repository, the abbreviation will match the minimum length.
	 * @throws IOException
	 *             the object store cannot be read.
	 */
	public AbbreviatedObjectId abbreviate(AnyObjectId objectId, int len)
			throws IOException {
		if (len == Constants.OBJECT_ID_STRING_LENGTH)
			return AbbreviatedObjectId.fromObjectId(objectId);

		AbbreviatedObjectId abbrev = objectId.abbreviate(len);
		Collection<ObjectId> matches = resolve(abbrev);
		while (1 < matches.size() && len < Constants.OBJECT_ID_STRING_LENGTH) {
			abbrev = objectId.abbreviate(++len);
			List<ObjectId> n = new ArrayList<ObjectId>(8);
			for (ObjectId candidate : matches) {
				if (abbrev.prefixCompare(candidate) == 0)
					n.add(candidate);
			}
			if (1 < n.size())
				matches = n;
			else
				matches = resolve(abbrev);
		}
		return abbrev;
	}

	/**
	 * Resolve an abbreviated ObjectId to its full form.
	 *
	 * This method searches for an ObjectId that begins with the abbreviation,
	 * and returns at least some matching candidates.
	 *
	 * If the returned collection is empty, no objects start with this
	 * abbreviation. The abbreviation doesn't belong to this repository, or the
	 * repository lacks the necessary objects to complete it.
	 *
	 * If the collection contains exactly one member, the abbreviation is
	 * (currently) unique within this database. There is a reasonably high
	 * probability that the returned id is what was previously abbreviated.
	 *
	 * If the collection contains 2 or more members, the abbreviation is not
	 * unique. In this case the implementation is only required to return at
	 * least 2 candidates to signal the abbreviation has conflicts. User
	 * friendly implementations should return as many candidates as reasonably
	 * possible, as the caller may be able to disambiguate further based on
	 * context. However since databases can be very large (e.g. 10 million
	 * objects) returning 625,000 candidates for the abbreviation "0" is simply
	 * unreasonable, so implementors should draw the line at around 256 matches.
	 *
	 * @param id
	 *            abbreviated id to resolve to a complete identity. The
	 *            abbreviation must have a length of at least 2.
	 * @return candidates that begin with the abbreviated identity.
	 * @throws IOException
	 *             the object store cannot be read.
	 */
	public abstract Collection<ObjectId> resolve(AbbreviatedObjectId id)
			throws IOException;

	/**
	 * Does the requested object exist in this database?
	 *
	 * @param objectId
	 *            identity of the object to test for existence of.
	 * @return true if the specified object is stored in this database.
	 * @throws IOException
	 *             the object store cannot be accessed.
	 */
	public boolean has(AnyObjectId objectId) throws IOException {
		return has(objectId, OBJ_ANY);
	}

	/**
	 * Does the requested object exist in this database?
	 *
	 * @param objectId
	 *            identity of the object to test for existence of.
	 * @param typeHint
	 *            hint about the type of object being requested, e.g.
	 *            {@link Constants#OBJ_BLOB}; {@link #OBJ_ANY} if the object
	 *            type is not known, or does not matter to the caller.
	 * @return true if the specified object is stored in this database.
	 * @throws IncorrectObjectTypeException
	 *             typeHint was not OBJ_ANY, and the object's actual type does
	 *             not match typeHint.
	 * @throws IOException
	 *             the object store cannot be accessed.
	 */
	public boolean has(AnyObjectId objectId, int typeHint) throws IOException {
		try {
			open(objectId, typeHint);
			return true;
		} catch (MissingObjectException notFound) {
			return false;
		}
	}

	/**
	 * Open an object from this database.
	 *
	 * @param objectId
	 *            identity of the object to open.
	 * @return a {@link ObjectLoader} for accessing the object.
	 * @throws MissingObjectException
	 *             the object does not exist.
	 * @throws IOException
	 *             the object store cannot be accessed.
	 */
	public ObjectLoader open(AnyObjectId objectId)
			throws MissingObjectException, IOException {
		return open(objectId, OBJ_ANY);
	}

	/**
	 * Open an object from this database.
	 *
	 * @param objectId
	 *            identity of the object to open.
	 * @param typeHint
	 *            hint about the type of object being requested, e.g.
	 *            {@link Constants#OBJ_BLOB}; {@link #OBJ_ANY} if the object
	 *            type is not known, or does not matter to the caller.
	 * @return a {@link ObjectLoader} for accessing the object.
	 * @throws MissingObjectException
	 *             the object does not exist.
	 * @throws IncorrectObjectTypeException
	 *             typeHint was not OBJ_ANY, and the object's actual type does
	 *             not match typeHint.
	 * @throws IOException
	 *             the object store cannot be accessed.
	 */
	public abstract ObjectLoader open(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException;

	/**
	 * Returns IDs for those commits which should be considered as shallow.
	 *
	 * @return IDs of shallow commits
	 * @throws IOException
	 */
	public abstract Set<ObjectId> getShallowCommits() throws IOException;

	/**
	 * Asynchronous object opening.
	 *
	 * @param <T>
	 *            type of identifier being supplied.
	 * @param objectIds
	 *            objects to open from the object store. The supplied collection
	 *            must not be modified until the queue has finished.
	 * @param reportMissing
	 *            if true missing objects are reported by calling failure with a
	 *            MissingObjectException. This may be more expensive for the
	 *            implementation to guarantee. If false the implementation may
	 *            choose to report MissingObjectException, or silently skip over
	 *            the object with no warning.
	 * @return queue to read the objects from.
	 */
	public <T extends ObjectId> AsyncObjectLoaderQueue<T> open(
			Iterable<T> objectIds, final boolean reportMissing) {
		final Iterator<T> idItr = objectIds.iterator();
		return new AsyncObjectLoaderQueue<T>() {
			private T cur;

			public boolean next() throws MissingObjectException, IOException {
				if (idItr.hasNext()) {
					cur = idItr.next();
					return true;
				} else {
					return false;
				}
			}

			public T getCurrent() {
				return cur;
			}

			public ObjectId getObjectId() {
				return cur;
			}

			public ObjectLoader open() throws IOException {
				return ObjectReader.this.open(cur, OBJ_ANY);
			}

			public boolean cancel(boolean mayInterruptIfRunning) {
				return true;
			}

			public void release() {
				// Since we are sequential by default, we don't
				// have any state to clean up if we terminate early.
			}
		};
	}

	/**
	 * Get only the size of an object.
	 * <p>
	 * The default implementation of this method opens an ObjectLoader.
	 * Databases are encouraged to override this if a faster access method is
	 * available to them.
	 *
	 * @param objectId
	 *            identity of the object to open.
	 * @param typeHint
	 *            hint about the type of object being requested, e.g.
	 *            {@link Constants#OBJ_BLOB}; {@link #OBJ_ANY} if the object
	 *            type is not known, or does not matter to the caller.
	 * @return size of object in bytes.
	 * @throws MissingObjectException
	 *             the object does not exist.
	 * @throws IncorrectObjectTypeException
	 *             typeHint was not OBJ_ANY, and the object's actual type does
	 *             not match typeHint.
	 * @throws IOException
	 *             the object store cannot be accessed.
	 */
	public long getObjectSize(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException {
		return open(objectId, typeHint).getSize();
	}

	/**
	 * Asynchronous object size lookup.
	 *
	 * @param <T>
	 *            type of identifier being supplied.
	 * @param objectIds
	 *            objects to get the size of from the object store. The supplied
	 *            collection must not be modified until the queue has finished.
	 * @param reportMissing
	 *            if true missing objects are reported by calling failure with a
	 *            MissingObjectException. This may be more expensive for the
	 *            implementation to guarantee. If false the implementation may
	 *            choose to report MissingObjectException, or silently skip over
	 *            the object with no warning.
	 * @return queue to read object sizes from.
	 */
	public <T extends ObjectId> AsyncObjectSizeQueue<T> getObjectSize(
			Iterable<T> objectIds, final boolean reportMissing) {
		final Iterator<T> idItr = objectIds.iterator();
		return new AsyncObjectSizeQueue<T>() {
			private T cur;

			private long sz;

			public boolean next() throws MissingObjectException, IOException {
				if (idItr.hasNext()) {
					cur = idItr.next();
					sz = getObjectSize(cur, OBJ_ANY);
					return true;
				} else {
					return false;
				}
			}

			public T getCurrent() {
				return cur;
			}

			public ObjectId getObjectId() {
				return cur;
			}

			public long getSize() {
				return sz;
			}

			public boolean cancel(boolean mayInterruptIfRunning) {
				return true;
			}

			public void release() {
				// Since we are sequential by default, we don't
				// have any state to clean up if we terminate early.
			}
		};
	}

	/**
	 * Advice from a {@link RevWalk} that a walk is starting from these roots.
	 *
	 * @param walk
	 *            the revision pool that is using this reader.
	 * @param roots
	 *            starting points of the revision walk. The starting points have
	 *            their headers parsed, but might be missing bodies.
	 * @throws IOException
	 *             the reader cannot initialize itself to support the walk.
	 */
	public void walkAdviceBeginCommits(RevWalk walk, Collection<RevCommit> roots)
			throws IOException {
		// Do nothing by default, most readers don't want or need advice.
	}

	/**
	 * Advice from an {@link ObjectWalk} that trees will be traversed.
	 *
	 * @param ow
	 *            the object pool that is using this reader.
	 * @param min
	 *            the first commit whose root tree will be read.
	 * @param max
	 *            the last commit whose root tree will be read.
	 * @throws IOException
	 *             the reader cannot initialize itself to support the walk.
	 */
	public void walkAdviceBeginTrees(ObjectWalk ow, RevCommit min, RevCommit max)
			throws IOException {
		// Do nothing by default, most readers don't want or need advice.
	}

	/** Advice that a walk is over. */
	public void walkAdviceEnd() {
		// Do nothing by default, most readers don't want or need advice.
	}

	/**
	 * Advise the reader to avoid unreachable objects.
	 * <p>
	 * While enabled the reader will skip over anything previously proven to be
	 * unreachable. This may be dangerous in the face of concurrent writes.
	 *
	 * @param avoid
	 *            true to avoid unreachable objects.
	 * @since 3.0
	 */
	public void setAvoidUnreachableObjects(boolean avoid) {
		// Do nothing by default.
	}

	/**
	 * An index that can be used to speed up ObjectWalks.
	 *
	 * @return the index or null if one does not exist.
	 * @throws IOException
	 *             when the index fails to load
	 * @since 3.0
	 */
	public BitmapIndex getBitmapIndex() throws IOException {
		return null;
	}

	/**
	 * Release any resources used by this reader.
	 * <p>
	 * A reader that has been released can be used again, but may need to be
	 * released after the subsequent usage.
	 */
	public void release() {
		// Do nothing.
	}
}
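
The extend-and-retry loop in abbreviate(AnyObjectId, int) can be tried without a repository. The sketch below is a simplified stand-in, not JGit code: object ids are plain hex strings, resolve is a linear scan over an in-memory list, and the class and method names (AbbreviateSketch, resolve, abbreviate) are hypothetical. It mirrors the control flow above: start at the minimum length, and while the prefix is still ambiguous, extend it by one character and narrow the candidate set.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, string-based illustration of the abbreviation loop. */
final class AbbreviateSketch {
    /** Stand-in for ObjectReader.resolve: every known id that starts with the prefix. */
    static List<String> resolve(List<String> ids, String prefix) {
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            if (id.startsWith(prefix))
                result.add(id);
        }
        return result;
    }

    /** Shortest prefix of id, at least len characters long, that matches at most one known id. */
    static String abbreviate(List<String> ids, String id, int len) {
        if (len >= id.length())
            return id;

        String abbrev = id.substring(0, len);
        List<String> matches = resolve(ids, abbrev);
        while (matches.size() > 1 && len < id.length()) {
            abbrev = id.substring(0, ++len);
            // Filter the previous candidates first; this avoids a full lookup
            // while the narrowed list still shows a conflict.
            List<String> narrowed = new ArrayList<>();
            for (String candidate : matches) {
                if (candidate.startsWith(abbrev))
                    narrowed.add(candidate);
            }
            // As in the method above, fall back to a fresh lookup once the
            // filtered list no longer proves the prefix ambiguous.
            matches = narrowed.size() > 1 ? narrowed : resolve(ids, abbrev);
        }
        return abbrev;
    }
}
```

With the ids "abcd12", "abcf34" and "ab9999" and a minimum length of 2, abbreviating "abcd12" yields "abcd" and abbreviating "ab9999" yields "ab9". An id with no match at all, such as "ffff00", stays at the minimum length, which matches the documented behaviour of the @return clause.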