
ObjectReader.java 18KB

Added read/write support for pack bitmap index.

A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter.

Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph, e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types, i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself).

The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
11 years ago
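The XOR step described in the commit message can be illustrated with a small sketch. This is not the index implementation: it uses `java.util.BitSet` as a stand-in for the javaewah compressed bitmap, and the class and method names (`XorBitmapSketch`, `xorAgainst`, `restore`) are hypothetical. Bit positions stand for the integer mapping of ObjectIds (their position in the offset-sorted list).

```java
import java.util.BitSet;

// Hypothetical illustration only; the real index uses javaewah bitmaps.
final class XorBitmapSketch {
    /** Returns the bits that differ between current and base (current XOR base). */
    static BitSet xorAgainst(BitSet current, BitSet base) {
        BitSet delta = (BitSet) current.clone();
        delta.xor(base);
        return delta;
    }

    /** Rebuilds the original bitmap from a stored delta and the base it was XORed against. */
    static BitSet restore(BitSet delta, BitSet base) {
        // XOR is its own inverse: (current ^ base) ^ base == current.
        return xorAgainst(delta, base);
    }

    public static void main(String[] args) {
        BitSet parent = new BitSet();
        parent.set(0, 6); // objects at positions 0..5 are reachable from the parent

        BitSet child = new BitSet();
        child.set(0, 9); // the child reaches everything the parent does, plus 6..8

        BitSet delta = xorAgainst(child, parent);
        System.out.println(delta); // prints {6, 7, 8}
        System.out.println(restore(delta, parent).equals(child)); // prints true
    }
}
```

Because closely related commits reach mostly the same objects, the XOR'd bitmap tends to have few set bits, which is why it can be a smaller representation than the full bitmap.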
/*
 * Copyright (C) 2010, Google Inc.
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.lib;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

import org.eclipse.jgit.annotations.Nullable;
import org.eclipse.jgit.errors.IncorrectObjectTypeException;
import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.internal.storage.pack.ObjectReuseAsIs;

/**
 * Reads an {@link ObjectDatabase} for a single thread.
 * <p>
 * Readers that can support efficient reuse of pack encoded objects should also
 * implement the companion interface {@link ObjectReuseAsIs}.
 */
public abstract class ObjectReader implements AutoCloseable {
    /** Type hint indicating the caller doesn't know the type. */
    public static final int OBJ_ANY = -1;

    /**
     * The threshold at which a file will be streamed rather than loaded
     * entirely into memory.
     * @since 4.6
     */
    protected int streamFileThreshold;

    /**
     * Construct a new reader from the same data.
     * <p>
     * Applications can use this method to build a new reader from the same data
     * source, but for a different thread.
     *
     * @return a brand new reader, using the same data source.
     */
    public abstract ObjectReader newReader();

    /**
     * Obtain a unique abbreviation (prefix) of an object SHA-1.
     *
     * This method uses a reasonable default for the minimum length. Callers who
     * don't care about the minimum length should prefer this method.
     *
     * The returned abbreviation would expand back to the argument ObjectId when
     * passed to {@link #resolve(AbbreviatedObjectId)}, assuming no new objects
     * are added to this repository between calls.
     *
     * @param objectId
     *            object identity that needs to be abbreviated.
     * @return SHA-1 abbreviation.
     * @throws IOException
     *             the object store cannot be read.
     */
    public AbbreviatedObjectId abbreviate(AnyObjectId objectId)
            throws IOException {
        return abbreviate(objectId, 7);
    }

    /**
     * Obtain a unique abbreviation (prefix) of an object SHA-1.
     *
     * The returned abbreviation would expand back to the argument ObjectId when
     * passed to {@link #resolve(AbbreviatedObjectId)}, assuming no new objects
     * are added to this repository between calls.
     *
     * The default implementation of this method abbreviates the id to the
     * minimum length, then resolves it to see if there are multiple results.
     * When multiple results are found, the length is extended by 1 and resolve
     * is tried again.
     *
     * @param objectId
     *            object identity that needs to be abbreviated.
     * @param len
     *            minimum length of the abbreviated string. Must be in the range
     *            [2, {@value Constants#OBJECT_ID_STRING_LENGTH}].
     * @return SHA-1 abbreviation. If no matching objects exist in the
     *         repository, the abbreviation will match the minimum length.
     * @throws IOException
     *             the object store cannot be read.
     */
    public AbbreviatedObjectId abbreviate(AnyObjectId objectId, int len)
            throws IOException {
        if (len == Constants.OBJECT_ID_STRING_LENGTH)
            return AbbreviatedObjectId.fromObjectId(objectId);

        AbbreviatedObjectId abbrev = objectId.abbreviate(len);
        Collection<ObjectId> matches = resolve(abbrev);
        while (1 < matches.size() && len < Constants.OBJECT_ID_STRING_LENGTH) {
            abbrev = objectId.abbreviate(++len);
            List<ObjectId> n = new ArrayList<ObjectId>(8);
            for (ObjectId candidate : matches) {
                if (abbrev.prefixCompare(candidate) == 0)
                    n.add(candidate);
            }
            if (1 < n.size())
                matches = n;
            else
                matches = resolve(abbrev);
        }
        return abbrev;
    }

    /**
     * Resolve an abbreviated ObjectId to its full form.
     *
     * This method searches for an ObjectId that begins with the abbreviation,
     * and returns at least some matching candidates.
     *
     * If the returned collection is empty, no objects start with this
     * abbreviation. The abbreviation doesn't belong to this repository, or the
     * repository lacks the necessary objects to complete it.
     *
     * If the collection contains exactly one member, the abbreviation is
     * (currently) unique within this database. There is a reasonably high
     * probability that the returned id is what was previously abbreviated.
     *
     * If the collection contains 2 or more members, the abbreviation is not
     * unique. In this case the implementation is only required to return at
     * least 2 candidates to signal the abbreviation has conflicts. User
     * friendly implementations should return as many candidates as reasonably
     * possible, as the caller may be able to disambiguate further based on
     * context. However since databases can be very large (e.g. 10 million
     * objects) returning 625,000 candidates for the abbreviation "0" is simply
     * unreasonable, so implementors should draw the line at around 256 matches.
     *
     * @param id
     *            abbreviated id to resolve to a complete identity. The
     *            abbreviation must have a length of at least 2.
     * @return candidates that begin with the abbreviated identity.
     * @throws IOException
     *             the object store cannot be read.
     */
    public abstract Collection<ObjectId> resolve(AbbreviatedObjectId id)
            throws IOException;

    /**
     * Does the requested object exist in this database?
     *
     * @param objectId
     *            identity of the object to test for existence of.
     * @return true if the specified object is stored in this database.
     * @throws IOException
     *             the object store cannot be accessed.
     */
    public boolean has(AnyObjectId objectId) throws IOException {
        return has(objectId, OBJ_ANY);
    }

    /**
     * Does the requested object exist in this database?
     *
     * @param objectId
     *            identity of the object to test for existence of.
     * @param typeHint
     *            hint about the type of object being requested, e.g.
     *            {@link Constants#OBJ_BLOB}; {@link #OBJ_ANY} if the object
     *            type is not known, or does not matter to the caller.
     * @return true if the specified object is stored in this database.
     * @throws IncorrectObjectTypeException
     *             typeHint was not OBJ_ANY, and the object's actual type does
     *             not match typeHint.
     * @throws IOException
     *             the object store cannot be accessed.
     */
    public boolean has(AnyObjectId objectId, int typeHint) throws IOException {
        try {
            open(objectId, typeHint);
            return true;
        } catch (MissingObjectException notFound) {
            return false;
        }
    }

    /**
     * Open an object from this database.
     *
     * @param objectId
     *            identity of the object to open.
     * @return a {@link ObjectLoader} for accessing the object.
     * @throws MissingObjectException
     *             the object does not exist.
     * @throws IOException
     *             the object store cannot be accessed.
     */
    public ObjectLoader open(AnyObjectId objectId)
            throws MissingObjectException, IOException {
        return open(objectId, OBJ_ANY);
    }

    /**
     * Open an object from this database.
     *
     * @param objectId
     *            identity of the object to open.
     * @param typeHint
     *            hint about the type of object being requested, e.g.
     *            {@link Constants#OBJ_BLOB}; {@link #OBJ_ANY} if the object
     *            type is not known, or does not matter to the caller.
     * @return a {@link ObjectLoader} for accessing the object.
     * @throws MissingObjectException
     *             the object does not exist.
     * @throws IncorrectObjectTypeException
     *             typeHint was not OBJ_ANY, and the object's actual type does
     *             not match typeHint.
     * @throws IOException
     *             the object store cannot be accessed.
     */
    public abstract ObjectLoader open(AnyObjectId objectId, int typeHint)
            throws MissingObjectException, IncorrectObjectTypeException,
            IOException;

    /**
     * Returns IDs for those commits which should be considered as shallow.
     *
     * @return IDs of shallow commits
     * @throws IOException
     */
    public abstract Set<ObjectId> getShallowCommits() throws IOException;

    /**
     * Asynchronous object opening.
     *
     * @param <T>
     *            type of identifier being supplied.
     * @param objectIds
     *            objects to open from the object store. The supplied collection
     *            must not be modified until the queue has finished.
     * @param reportMissing
     *            if true missing objects are reported by calling failure with a
     *            MissingObjectException. This may be more expensive for the
     *            implementation to guarantee. If false the implementation may
     *            choose to report MissingObjectException, or silently skip over
     *            the object with no warning.
     * @return queue to read the objects from.
     */
    public <T extends ObjectId> AsyncObjectLoaderQueue<T> open(
            Iterable<T> objectIds, final boolean reportMissing) {
        final Iterator<T> idItr = objectIds.iterator();
        return new AsyncObjectLoaderQueue<T>() {
            private T cur;

            public boolean next() throws MissingObjectException, IOException {
                if (idItr.hasNext()) {
                    cur = idItr.next();
                    return true;
                } else {
                    return false;
                }
            }

            public T getCurrent() {
                return cur;
            }

            public ObjectId getObjectId() {
                return cur;
            }

            public ObjectLoader open() throws IOException {
                return ObjectReader.this.open(cur, OBJ_ANY);
            }

            public boolean cancel(boolean mayInterruptIfRunning) {
                return true;
            }

            public void release() {
                // Since we are sequential by default, we don't
                // have any state to clean up if we terminate early.
            }
        };
    }

    /**
     * Get only the size of an object.
     * <p>
     * The default implementation of this method opens an ObjectLoader.
     * Databases are encouraged to override this if a faster access method is
     * available to them.
     *
     * @param objectId
     *            identity of the object to open.
     * @param typeHint
     *            hint about the type of object being requested, e.g.
     *            {@link Constants#OBJ_BLOB}; {@link #OBJ_ANY} if the object
     *            type is not known, or does not matter to the caller.
     * @return size of object in bytes.
     * @throws MissingObjectException
     *             the object does not exist.
     * @throws IncorrectObjectTypeException
     *             typeHint was not OBJ_ANY, and the object's actual type does
     *             not match typeHint.
     * @throws IOException
     *             the object store cannot be accessed.
     */
    public long getObjectSize(AnyObjectId objectId, int typeHint)
            throws MissingObjectException, IncorrectObjectTypeException,
            IOException {
        return open(objectId, typeHint).getSize();
    }

    /**
     * Asynchronous object size lookup.
     *
     * @param <T>
     *            type of identifier being supplied.
     * @param objectIds
     *            objects to get the size of from the object store. The supplied
     *            collection must not be modified until the queue has finished.
     * @param reportMissing
     *            if true missing objects are reported by calling failure with a
     *            MissingObjectException. This may be more expensive for the
     *            implementation to guarantee. If false the implementation may
     *            choose to report MissingObjectException, or silently skip over
     *            the object with no warning.
     * @return queue to read object sizes from.
     */
    public <T extends ObjectId> AsyncObjectSizeQueue<T> getObjectSize(
            Iterable<T> objectIds, final boolean reportMissing) {
        final Iterator<T> idItr = objectIds.iterator();
        return new AsyncObjectSizeQueue<T>() {
            private T cur;

            private long sz;

            public boolean next() throws MissingObjectException, IOException {
                if (idItr.hasNext()) {
                    cur = idItr.next();
                    sz = getObjectSize(cur, OBJ_ANY);
                    return true;
                } else {
                    return false;
                }
            }

            public T getCurrent() {
                return cur;
            }

            public ObjectId getObjectId() {
                return cur;
            }

            public long getSize() {
                return sz;
            }

            public boolean cancel(boolean mayInterruptIfRunning) {
                return true;
            }

            public void release() {
                // Since we are sequential by default, we don't
                // have any state to clean up if we terminate early.
            }
        };
    }

    /**
     * Advise the reader to avoid unreachable objects.
     * <p>
     * While enabled the reader will skip over anything previously proven to be
     * unreachable. This may be dangerous in the face of concurrent writes.
     *
     * @param avoid
     *            true to avoid unreachable objects.
     * @since 3.0
     */
    public void setAvoidUnreachableObjects(boolean avoid) {
        // Do nothing by default.
    }

    /**
     * An index that can be used to speed up ObjectWalks.
     *
     * @return the index or null if one does not exist.
     * @throws IOException
     *             when the index fails to load
     * @since 3.0
     */
    public BitmapIndex getBitmapIndex() throws IOException {
        return null;
    }

    /**
     * @return the {@link ObjectInserter} from which this reader was created
     *         using {@code inserter.newReader()}, or null if this reader was not
     *         created from an inserter.
     * @since 4.4
     */
    @Nullable
    public ObjectInserter getCreatedFromInserter() {
        return null;
    }

    /**
     * Release any resources used by this reader.
     * <p>
     * A reader that has been released can be used again, but may need to be
     * released after the subsequent usage.
     *
     * @since 4.0
     */
    @Override
    public abstract void close();

    /**
     * Sets the threshold at which a file will be streamed rather than loaded
     * entirely into memory
     *
     * @param threshold
     *            the new threshold
     * @since 4.6
     */
    public void setStreamFileThreshold(int threshold) {
        streamFileThreshold = threshold;
    }

    /**
     * Returns the threshold at which a file will be streamed rather than loaded
     * entirely into memory
     *
     * @return the threshold in bytes
     * @since 4.6
     */
    public int getStreamFileThreshold() {
        return streamFileThreshold;
    }

    /**
     * Wraps a delegate ObjectReader.
     *
     * @since 4.4
     */
    public static abstract class Filter extends ObjectReader {
        /**
         * @return delegate ObjectReader to handle all processing.
         * @since 4.4
         */
        protected abstract ObjectReader delegate();

        @Override
        public ObjectReader newReader() {
            return delegate().newReader();
        }

        @Override
        public AbbreviatedObjectId abbreviate(AnyObjectId objectId)
                throws IOException {
            return delegate().abbreviate(objectId);
        }

        @Override
        public AbbreviatedObjectId abbreviate(AnyObjectId objectId, int len)
                throws IOException {
            return delegate().abbreviate(objectId, len);
        }

        @Override
        public Collection<ObjectId> resolve(AbbreviatedObjectId id)
                throws IOException {
            return delegate().resolve(id);
        }

        @Override
        public boolean has(AnyObjectId objectId) throws IOException {
            return delegate().has(objectId);
        }

        @Override
        public boolean has(AnyObjectId objectId, int typeHint) throws IOException {
            return delegate().has(objectId, typeHint);
        }

        @Override
        public ObjectLoader open(AnyObjectId objectId)
                throws MissingObjectException, IOException {
            return delegate().open(objectId);
        }

        @Override
        public ObjectLoader open(AnyObjectId objectId, int typeHint)
                throws MissingObjectException, IncorrectObjectTypeException,
                IOException {
            return delegate().open(objectId, typeHint);
        }

        @Override
        public Set<ObjectId> getShallowCommits() throws IOException {
            return delegate().getShallowCommits();
        }

        @Override
        public <T extends ObjectId> AsyncObjectLoaderQueue<T> open(
                Iterable<T> objectIds, boolean reportMissing) {
            return delegate().open(objectIds, reportMissing);
        }

        @Override
        public long getObjectSize(AnyObjectId objectId, int typeHint)
                throws MissingObjectException, IncorrectObjectTypeException,
                IOException {
            return delegate().getObjectSize(objectId, typeHint);
        }

        @Override
        public <T extends ObjectId> AsyncObjectSizeQueue<T> getObjectSize(
                Iterable<T> objectIds, boolean reportMissing) {
            return delegate().getObjectSize(objectIds, reportMissing);
        }

        @Override
        public void setAvoidUnreachableObjects(boolean avoid) {
            delegate().setAvoidUnreachableObjects(avoid);
        }

        @Override
        public BitmapIndex getBitmapIndex() throws IOException {
            return delegate().getBitmapIndex();
        }

        @Override
        @Nullable
        public ObjectInserter getCreatedFromInserter() {
            return delegate().getCreatedFromInserter();
        }

        @Override
        public void close() {
            delegate().close();
        }
    }
}
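
The "extend until unique" loop that the Javadoc of `abbreviate(AnyObjectId, int)` describes can be tried out without a repository. The following is a simplified sketch, not the JGit code: it works on plain hex strings instead of `ObjectId`, rescans the whole list instead of narrowing the previous candidates, and `AbbreviationSketch`, `abbreviate`, and `countMatches` are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the abbreviate/resolve interplay, using hex strings.
final class AbbreviationSketch {
    /**
     * Returns the shortest prefix of id, at least minLen characters long,
     * that is matched by at most one id in known.
     * Assumes minLen is no larger than id.length().
     */
    static String abbreviate(String id, int minLen, List<String> known) {
        int len = minLen;
        // Grow the prefix one character at a time while it is still ambiguous.
        while (len < id.length() && countMatches(id.substring(0, len), known) > 1) {
            len++;
        }
        return id.substring(0, len);
    }

    /** Counts how many known ids start with prefix (the role resolve plays above). */
    static long countMatches(String prefix, List<String> known) {
        return known.stream().filter(k -> k.startsWith(prefix)).count();
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("abc123", "abd456", "ff0000");
        System.out.println(abbreviate("abc123", 2, known)); // prints abc
        System.out.println(abbreviate("ff0000", 2, known)); // prints ff
        System.out.println(abbreviate("123456", 2, known)); // prints 12
    }
}
```

The last call mirrors the documented behavior that an id with no matching objects is abbreviated to the minimum length.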