ObjectReader.java 17KB

Added read/write support for pack bitmap index.

A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter.

Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
11 years ago
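The XOR step described in the commit message can be illustrated with a minimal sketch. It uses `java.util.BitSet` as a stand-in for the javaewah compressed bitmap, which is an assumption made to keep the example self-contained; bit `i` stands for the object at position `i` in the offset-sorted ObjectId list. The class and method names (`XorBitmapSketch`, `encode`, `decode`, `preferXor`) are hypothetical and are not part of JGit, and the set-bit count is only a crude proxy for the compressed size that a real index would compare.

```java
import java.util.BitSet;

/** Hypothetical illustration of XOR-against-prior bitmap encoding. */
final class XorBitmapSketch {
    /** Returns current XOR prior without modifying either argument. */
    static BitSet encode(BitSet current, BitSet prior) {
        BitSet stored = (BitSet) current.clone();
        stored.xor(prior);
        return stored;
    }

    /** XOR is its own inverse, so decoding repeats the same operation. */
    static BitSet decode(BitSet stored, BitSet prior) {
        return encode(stored, prior);
    }

    /**
     * Assumption: fewer set bits is treated as "smaller". A run-length
     * encoded bitmap would compare actual serialized sizes instead.
     */
    static boolean preferXor(BitSet current, BitSet prior) {
        return encode(current, prior).cardinality() < current.cardinality();
    }

    public static void main(String[] args) {
        BitSet prior = new BitSet();
        prior.set(0, 6); // objects 0..5 reachable from an earlier commit
        BitSet current = new BitSet();
        current.set(0, 7); // objects 0..6 reachable from a later commit

        BitSet stored = encode(current, prior);
        System.out.println("stored bits: " + stored);
        System.out.println("round trip ok: " + decode(stored, prior).equals(current));
    }
}
```

Two closely related commits reach almost the same objects, so their XOR has few set bits and compresses well; that is the intuition behind choosing the smallest representation among candidate prior bitmaps.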
/*
 * Copyright (C) 2010, Google Inc. and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

package org.eclipse.jgit.lib;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

import org.eclipse.jgit.annotations.Nullable;
import org.eclipse.jgit.errors.IncorrectObjectTypeException;
import org.eclipse.jgit.errors.MissingObjectException;

/**
 * Reads an {@link org.eclipse.jgit.lib.ObjectDatabase} for a single thread.
 * <p>
 * Readers that can support efficient reuse of pack encoded objects should also
 * implement the companion interface
 * {@link org.eclipse.jgit.internal.storage.pack.ObjectReuseAsIs}.
 */
public abstract class ObjectReader implements AutoCloseable {
	/** Type hint indicating the caller doesn't know the type. */
	public static final int OBJ_ANY = -1;

	/**
	 * The threshold at which a file will be streamed rather than loaded
	 * entirely into memory.
	 * @since 4.6
	 */
	protected int streamFileThreshold;

	/**
	 * Construct a new reader from the same data.
	 * <p>
	 * Applications can use this method to build a new reader from the same data
	 * source, but for a different thread.
	 *
	 * @return a brand new reader, using the same data source.
	 */
	public abstract ObjectReader newReader();

	/**
	 * Obtain a unique abbreviation (prefix) of an object SHA-1.
	 *
	 * This method uses a reasonable default for the minimum length. Callers who
	 * don't care about the minimum length should prefer this method.
	 *
	 * The returned abbreviation would expand back to the argument ObjectId when
	 * passed to {@link #resolve(AbbreviatedObjectId)}, assuming no new objects
	 * are added to this repository between calls.
	 *
	 * @param objectId
	 *            object identity that needs to be abbreviated.
	 * @return SHA-1 abbreviation.
	 * @throws java.io.IOException
	 *             the object store cannot be read.
	 */
	public AbbreviatedObjectId abbreviate(AnyObjectId objectId)
			throws IOException {
		return abbreviate(objectId, 7);
	}

	/**
	 * Obtain a unique abbreviation (prefix) of an object SHA-1.
	 *
	 * The returned abbreviation would expand back to the argument ObjectId when
	 * passed to {@link #resolve(AbbreviatedObjectId)}, assuming no new objects
	 * are added to this repository between calls.
	 *
	 * The default implementation of this method abbreviates the id to the
	 * minimum length, then resolves it to see if there are multiple results.
	 * When multiple results are found, the length is extended by 1 and resolve
	 * is tried again.
	 *
	 * @param objectId
	 *            object identity that needs to be abbreviated.
	 * @param len
	 *            minimum length of the abbreviated string. Must be in the range
	 *            [2, {@value Constants#OBJECT_ID_STRING_LENGTH}].
	 * @return SHA-1 abbreviation. If no matching objects exist in the
	 *         repository, the abbreviation will match the minimum length.
	 * @throws java.io.IOException
	 *             the object store cannot be read.
	 */
	public AbbreviatedObjectId abbreviate(AnyObjectId objectId, int len)
			throws IOException {
		if (len == Constants.OBJECT_ID_STRING_LENGTH)
			return AbbreviatedObjectId.fromObjectId(objectId);

		AbbreviatedObjectId abbrev = objectId.abbreviate(len);
		Collection<ObjectId> matches = resolve(abbrev);
		while (1 < matches.size() && len < Constants.OBJECT_ID_STRING_LENGTH) {
			abbrev = objectId.abbreviate(++len);
			List<ObjectId> n = new ArrayList<>(8);
			for (ObjectId candidate : matches) {
				if (abbrev.prefixCompare(candidate) == 0)
					n.add(candidate);
			}
			if (1 < n.size())
				matches = n;
			else
				matches = resolve(abbrev);
		}
		return abbrev;
	}

	/**
	 * Resolve an abbreviated ObjectId to its full form.
	 *
	 * This method searches for an ObjectId that begins with the abbreviation,
	 * and returns at least some matching candidates.
	 *
	 * If the returned collection is empty, no objects start with this
	 * abbreviation. The abbreviation doesn't belong to this repository, or the
	 * repository lacks the necessary objects to complete it.
	 *
	 * If the collection contains exactly one member, the abbreviation is
	 * (currently) unique within this database. There is a reasonably high
	 * probability that the returned id is what was previously abbreviated.
	 *
	 * If the collection contains 2 or more members, the abbreviation is not
	 * unique. In this case the implementation is only required to return at
	 * least 2 candidates to signal the abbreviation has conflicts. User
	 * friendly implementations should return as many candidates as reasonably
	 * possible, as the caller may be able to disambiguate further based on
	 * context. However since databases can be very large (e.g. 10 million
	 * objects) returning 625,000 candidates for the abbreviation "0" is simply
	 * unreasonable, so implementors should draw the line at around 256 matches.
	 *
	 * @param id
	 *            abbreviated id to resolve to a complete identity. The
	 *            abbreviation must have a length of at least 2.
	 * @return candidates that begin with the abbreviated identity.
	 * @throws java.io.IOException
	 *             the object store cannot be read.
	 */
	public abstract Collection<ObjectId> resolve(AbbreviatedObjectId id)
			throws IOException;

	/**
	 * Does the requested object exist in this database?
	 *
	 * @param objectId
	 *            identity of the object to test for existence of.
	 * @return true if the specified object is stored in this database.
	 * @throws java.io.IOException
	 *             the object store cannot be accessed.
	 */
	public boolean has(AnyObjectId objectId) throws IOException {
		return has(objectId, OBJ_ANY);
	}

	/**
	 * Does the requested object exist in this database?
	 *
	 * @param objectId
	 *            identity of the object to test for existence of.
	 * @param typeHint
	 *            hint about the type of object being requested, e.g.
	 *            {@link org.eclipse.jgit.lib.Constants#OBJ_BLOB};
	 *            {@link #OBJ_ANY} if the object type is not known, or does not
	 *            matter to the caller.
	 * @return true if the specified object is stored in this database.
	 * @throws IncorrectObjectTypeException
	 *             typeHint was not OBJ_ANY, and the object's actual type does
	 *             not match typeHint.
	 * @throws java.io.IOException
	 *             the object store cannot be accessed.
	 */
	public boolean has(AnyObjectId objectId, int typeHint) throws IOException {
		try {
			open(objectId, typeHint);
			return true;
		} catch (MissingObjectException notFound) {
			return false;
		}
	}

	/**
	 * Open an object from this database.
	 *
	 * @param objectId
	 *            identity of the object to open.
	 * @return a {@link org.eclipse.jgit.lib.ObjectLoader} for accessing the
	 *         object.
	 * @throws org.eclipse.jgit.errors.MissingObjectException
	 *             the object does not exist.
	 * @throws java.io.IOException
	 *             the object store cannot be accessed.
	 */
	public ObjectLoader open(AnyObjectId objectId)
			throws MissingObjectException, IOException {
		return open(objectId, OBJ_ANY);
	}

	/**
	 * Open an object from this database.
	 *
	 * @param objectId
	 *            identity of the object to open.
	 * @param typeHint
	 *            hint about the type of object being requested, e.g.
	 *            {@link org.eclipse.jgit.lib.Constants#OBJ_BLOB};
	 *            {@link #OBJ_ANY} if the object type is not known, or does not
	 *            matter to the caller.
	 * @return a {@link org.eclipse.jgit.lib.ObjectLoader} for accessing the
	 *         object.
	 * @throws org.eclipse.jgit.errors.MissingObjectException
	 *             the object does not exist.
	 * @throws org.eclipse.jgit.errors.IncorrectObjectTypeException
	 *             typeHint was not OBJ_ANY, and the object's actual type does
	 *             not match typeHint.
	 * @throws java.io.IOException
	 *             the object store cannot be accessed.
	 */
	public abstract ObjectLoader open(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException;

	/**
	 * Returns IDs for those commits which should be considered as shallow.
	 *
	 * @return IDs of shallow commits
	 * @throws java.io.IOException
	 */
	public abstract Set<ObjectId> getShallowCommits() throws IOException;

	/**
	 * Asynchronous object opening.
	 *
	 * @param objectIds
	 *            objects to open from the object store. The supplied collection
	 *            must not be modified until the queue has finished.
	 * @param reportMissing
	 *            if true missing objects are reported by calling failure with a
	 *            MissingObjectException. This may be more expensive for the
	 *            implementation to guarantee. If false the implementation may
	 *            choose to report MissingObjectException, or silently skip over
	 *            the object with no warning.
	 * @return queue to read the objects from.
	 */
	public <T extends ObjectId> AsyncObjectLoaderQueue<T> open(
			Iterable<T> objectIds, final boolean reportMissing) {
		final Iterator<T> idItr = objectIds.iterator();
		return new AsyncObjectLoaderQueue<T>() {
			private T cur;

			@Override
			public boolean next() throws MissingObjectException, IOException {
				if (idItr.hasNext()) {
					cur = idItr.next();
					return true;
				}
				return false;
			}

			@Override
			public T getCurrent() {
				return cur;
			}

			@Override
			public ObjectId getObjectId() {
				return cur;
			}

			@Override
			public ObjectLoader open() throws IOException {
				return ObjectReader.this.open(cur, OBJ_ANY);
			}

			@Override
			public boolean cancel(boolean mayInterruptIfRunning) {
				return true;
			}

			@Override
			public void release() {
				// Since we are sequential by default, we don't
				// have any state to clean up if we terminate early.
			}
		};
	}

	/**
	 * Get only the size of an object.
	 * <p>
	 * The default implementation of this method opens an ObjectLoader.
	 * Databases are encouraged to override this if a faster access method is
	 * available to them.
	 *
	 * @param objectId
	 *            identity of the object to open.
	 * @param typeHint
	 *            hint about the type of object being requested, e.g.
	 *            {@link org.eclipse.jgit.lib.Constants#OBJ_BLOB};
	 *            {@link #OBJ_ANY} if the object type is not known, or does not
	 *            matter to the caller.
	 * @return size of object in bytes.
	 * @throws org.eclipse.jgit.errors.MissingObjectException
	 *             the object does not exist.
	 * @throws org.eclipse.jgit.errors.IncorrectObjectTypeException
	 *             typeHint was not OBJ_ANY, and the object's actual type does
	 *             not match typeHint.
	 * @throws java.io.IOException
	 *             the object store cannot be accessed.
	 */
	public long getObjectSize(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException {
		return open(objectId, typeHint).getSize();
	}

	/**
	 * Asynchronous object size lookup.
	 *
	 * @param objectIds
	 *            objects to get the size of from the object store. The supplied
	 *            collection must not be modified until the queue has finished.
	 * @param reportMissing
	 *            if true missing objects are reported by calling failure with a
	 *            MissingObjectException. This may be more expensive for the
	 *            implementation to guarantee. If false the implementation may
	 *            choose to report MissingObjectException, or silently skip over
	 *            the object with no warning.
	 * @return queue to read object sizes from.
	 */
	public <T extends ObjectId> AsyncObjectSizeQueue<T> getObjectSize(
			Iterable<T> objectIds, final boolean reportMissing) {
		final Iterator<T> idItr = objectIds.iterator();
		return new AsyncObjectSizeQueue<T>() {
			private T cur;

			private long sz;

			@Override
			public boolean next() throws MissingObjectException, IOException {
				if (idItr.hasNext()) {
					cur = idItr.next();
					sz = getObjectSize(cur, OBJ_ANY);
					return true;
				}
				return false;
			}

			@Override
			public T getCurrent() {
				return cur;
			}

			@Override
			public ObjectId getObjectId() {
				return cur;
			}

			@Override
			public long getSize() {
				return sz;
			}

			@Override
			public boolean cancel(boolean mayInterruptIfRunning) {
				return true;
			}

			@Override
			public void release() {
				// Since we are sequential by default, we don't
				// have any state to clean up if we terminate early.
			}
		};
	}

	/**
	 * Advise the reader to avoid unreachable objects.
	 * <p>
	 * While enabled the reader will skip over anything previously proven to be
	 * unreachable. This may be dangerous in the face of concurrent writes.
	 *
	 * @param avoid
	 *            true to avoid unreachable objects.
	 * @since 3.0
	 */
	public void setAvoidUnreachableObjects(boolean avoid) {
		// Do nothing by default.
	}

	/**
	 * An index that can be used to speed up ObjectWalks.
	 *
	 * @return the index or null if one does not exist.
	 * @throws java.io.IOException
	 *             when the index fails to load
	 * @since 3.0
	 */
	public BitmapIndex getBitmapIndex() throws IOException {
		return null;
	}

	/**
	 * Get the {@link org.eclipse.jgit.lib.ObjectInserter} from which this
	 * reader was created using {@code inserter.newReader()}
	 *
	 * @return the {@link org.eclipse.jgit.lib.ObjectInserter} from which this
	 *         reader was created using {@code inserter.newReader()}, or null if
	 *         this reader was not created from an inserter.
	 * @since 4.4
	 */
	@Nullable
	public ObjectInserter getCreatedFromInserter() {
		return null;
	}

	/**
	 * {@inheritDoc}
	 * <p>
	 * Release any resources used by this reader.
	 * <p>
	 * A reader that has been released can be used again, but may need to be
	 * released after the subsequent usage.
	 *
	 * @since 4.0
	 */
	@Override
	public abstract void close();

	/**
	 * Sets the threshold at which a file will be streamed rather than loaded
	 * entirely into memory
	 *
	 * @param threshold
	 *            the new threshold
	 * @since 4.6
	 */
	public void setStreamFileThreshold(int threshold) {
		streamFileThreshold = threshold;
	}

	/**
	 * Returns the threshold at which a file will be streamed rather than loaded
	 * entirely into memory
	 *
	 * @return the threshold in bytes
	 * @since 4.6
	 */
	public int getStreamFileThreshold() {
		return streamFileThreshold;
	}

	/**
	 * Wraps a delegate ObjectReader.
	 *
	 * @since 4.4
	 */
	public abstract static class Filter extends ObjectReader {
		/**
		 * @return delegate ObjectReader to handle all processing.
		 * @since 4.4
		 */
		protected abstract ObjectReader delegate();

		@Override
		public ObjectReader newReader() {
			return delegate().newReader();
		}

		@Override
		public AbbreviatedObjectId abbreviate(AnyObjectId objectId)
				throws IOException {
			return delegate().abbreviate(objectId);
		}

		@Override
		public AbbreviatedObjectId abbreviate(AnyObjectId objectId, int len)
				throws IOException {
			return delegate().abbreviate(objectId, len);
		}

		@Override
		public Collection<ObjectId> resolve(AbbreviatedObjectId id)
				throws IOException {
			return delegate().resolve(id);
		}

		@Override
		public boolean has(AnyObjectId objectId) throws IOException {
			return delegate().has(objectId);
		}

		@Override
		public boolean has(AnyObjectId objectId, int typeHint) throws IOException {
			return delegate().has(objectId, typeHint);
		}

		@Override
		public ObjectLoader open(AnyObjectId objectId)
				throws MissingObjectException, IOException {
			return delegate().open(objectId);
		}

		@Override
		public ObjectLoader open(AnyObjectId objectId, int typeHint)
				throws MissingObjectException, IncorrectObjectTypeException,
				IOException {
			return delegate().open(objectId, typeHint);
		}

		@Override
		public Set<ObjectId> getShallowCommits() throws IOException {
			return delegate().getShallowCommits();
		}

		@Override
		public <T extends ObjectId> AsyncObjectLoaderQueue<T> open(
				Iterable<T> objectIds, boolean reportMissing) {
			return delegate().open(objectIds, reportMissing);
		}

		@Override
		public long getObjectSize(AnyObjectId objectId, int typeHint)
				throws MissingObjectException, IncorrectObjectTypeException,
				IOException {
			return delegate().getObjectSize(objectId, typeHint);
		}

		@Override
		public <T extends ObjectId> AsyncObjectSizeQueue<T> getObjectSize(
				Iterable<T> objectIds, boolean reportMissing) {
			return delegate().getObjectSize(objectIds, reportMissing);
		}

		@Override
		public void setAvoidUnreachableObjects(boolean avoid) {
			delegate().setAvoidUnreachableObjects(avoid);
		}

		@Override
		public BitmapIndex getBitmapIndex() throws IOException {
			return delegate().getBitmapIndex();
		}

		@Override
		@Nullable
		public ObjectInserter getCreatedFromInserter() {
			return delegate().getCreatedFromInserter();
		}

		@Override
		public void close() {
			delegate().close();
		}
	}
}
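
The default `abbreviate(AnyObjectId, int)` above starts at the minimum length and extends the prefix by one character while more than one candidate still matches. The following is a minimal, dependency-free sketch of that loop, assuming ids are plain hex strings and the object database is an in-memory list. `AbbreviateSketch` and its methods are hypothetical names, not JGit API. Unlike a real `resolve`, the stand-in here always returns every match, so narrowing the previous candidate list is sufficient and no second database lookup is needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical string-based illustration of the abbreviation loop. */
final class AbbreviateSketch {
    /** Stand-in for resolve(): every known id that starts with the prefix. */
    static List<String> resolve(List<String> known, String prefix) {
        List<String> out = new ArrayList<>();
        for (String id : known) {
            if (id.startsWith(prefix)) {
                out.add(id);
            }
        }
        return out;
    }

    /** Shortest prefix of id, at least len characters, with at most one match. */
    static String abbreviate(List<String> known, String id, int len) {
        if (len >= id.length()) {
            return id;
        }
        String abbrev = id.substring(0, len);
        List<String> matches = resolve(known, abbrev);
        while (matches.size() > 1 && len < id.length()) {
            abbrev = id.substring(0, ++len);
            // Narrow the previous candidates instead of rescanning everything.
            matches = resolve(matches, abbrev);
        }
        return abbrev;
    }

    public static void main(String[] args) {
        List<String> known = List.of("abcdef12", "abcd9900", "ab120000", "ff000000");
        System.out.println(abbreviate(known, "abcdef12", 2));
        System.out.println(abbreviate(known, "ff000000", 2));
    }
}
```

As in the Javadoc for the real method, an id with no matching objects is returned at the minimum length, because the loop body never runs when the first lookup finds nothing.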