
PackReverseIndex.java 5.3KB

Added read/write support for pack bitmap index.

A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter.

Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself).

The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
11 years ago
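
The XOR step described in the commit message can be illustrated with a small sketch. It uses `java.util.BitSet` rather than the javaewah library, and it treats the number of set bits as a rough stand-in for compressed size; both are assumptions made for illustration. The class and method names (`XorBitmapSketch`, `xorAgainst`, `bestBase`) are hypothetical and are not part of JGit.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration of XOR-based bitmap deltas; not JGit code. */
final class XorBitmapSketch {
    /** Returns a new bitmap equal to current XOR base, leaving both inputs untouched. */
    static BitSet xorAgainst(BitSet current, BitSet base) {
        BitSet delta = (BitSet) current.clone();
        delta.xor(base);
        return delta;
    }

    /**
     * Picks the prior bitmap whose XOR delta has the fewest set bits.
     * Returns -1 when storing the bitmap as-is is at least as small.
     * Assumption: cardinality approximates the compressed size.
     */
    static int bestBase(BitSet current, List<BitSet> priors) {
        int best = -1;
        int bestCost = current.cardinality();
        for (int i = 0; i < priors.size(); i++) {
            int cost = xorAgainst(current, priors.get(i)).cardinality();
            if (cost < bestCost) {
                bestCost = cost;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BitSet parent = new BitSet();
        parent.set(0, 1000); // object positions 0..999 reachable from the parent
        BitSet child = (BitSet) parent.clone();
        child.set(1000, 1003); // the child reaches three more objects

        BitSet delta = xorAgainst(child, parent);
        System.out.println(delta.cardinality()); // prints 3
        // XORing the delta with the same base restores the original bitmap.
        System.out.println(xorAgainst(delta, parent).equals(child)); // prints true
    }
}
```

Because two commits that are close in history reach almost the same objects, their closures differ in only a few positions, so the delta is far sparser than either full bitmap.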
/*
 * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com> and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

package org.eclipse.jgit.internal.storage.file;

import java.text.MessageFormat;

import org.eclipse.jgit.errors.CorruptObjectException;
import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.internal.storage.file.PackIndex.MutableEntry;
import org.eclipse.jgit.lib.ObjectId;

/**
 * <p>
 * Reverse index for forward pack index. Provides operations based on offset
 * instead of object id. Such offset-based reverse lookups are performed in
 * O(log n) time.
 * </p>
 *
 * @see PackIndex
 * @see Pack
 */
public class PackReverseIndex {
    /** Index we were created from, and that has our ObjectId data. */
    private final PackIndex index;

    /** The number of bytes per entry in the offsetIndex. */
    private final long bucketSize;

    /**
     * An index into the nth mapping, where the value is the position after
     * the last index that contains the values of the bucket. For example given
     * offset o (and bucket = o / bucketSize), the offset will be contained in
     * the range nth[offsetIndex[bucket - 1]] inclusive to
     * nth[offsetIndex[bucket]] exclusive.
     *
     * See {@link #binarySearch}
     */
    private final int[] offsetIndex;

    /** Mapping from indices in offset order to indices in SHA-1 order. */
    private final int[] nth;

    /**
     * Create reverse index from straight/forward pack index, by indexing all
     * its entries.
     *
     * @param packIndex
     *            forward index - entries to (reverse) index.
     */
    public PackReverseIndex(PackIndex packIndex) {
        index = packIndex;

        final long cnt = index.getObjectCount();
        if (cnt + 1 > Integer.MAX_VALUE)
            throw new IllegalArgumentException(
                    JGitText.get().hugeIndexesAreNotSupportedByJgitYet);

        if (cnt == 0) {
            bucketSize = Long.MAX_VALUE;
            offsetIndex = new int[1];
            nth = new int[0];
            return;
        }

        // Collect every offset in SHA-1 (forward index) order and track the
        // largest one, which determines the bucket size.
        final long[] offsetsBySha1 = new long[(int) cnt];

        long maxOffset = 0;
        int ith = 0;
        for (MutableEntry me : index) {
            final long o = me.getOffset();
            offsetsBySha1[ith++] = o;
            if (o > maxOffset)
                maxOffset = o;
        }

        bucketSize = maxOffset / cnt + 1;

        // Chain the entries of each bucket into a linked list:
        // bucketIndex[b] holds the head as a 1-based position into
        // offsetsBySha1, bucketValues[pos] holds the next position, and 0
        // terminates the chain.
        int[] bucketIndex = new int[(int) cnt];
        int[] bucketValues = new int[(int) cnt + 1];
        for (int oi = 0; oi < offsetsBySha1.length; oi++) {
            final long o = offsetsBySha1[oi];
            final int bucket = (int) (o / bucketSize);
            final int bucketValuesPos = oi + 1;
            final int current = bucketIndex[bucket];
            bucketIndex[bucket] = bucketValuesPos;
            bucketValues[bucketValuesPos] = current;
        }

        // Visit the buckets in ascending order and sort the entries within
        // each one, so that nth ends up fully ordered by offset.
        int nthByOffset = 0;
        nth = new int[offsetsBySha1.length];
        offsetIndex = bucketIndex; // Reuse the allocation
        for (int bi = 0; bi < bucketIndex.length; bi++) {
            final int start = nthByOffset;
            // Insertion sort of the values in the bucket.
            for (int vi = bucketIndex[bi]; vi > 0; vi = bucketValues[vi]) {
                final int nthBySha1 = vi - 1;
                final long o = offsetsBySha1[nthBySha1];
                int insertion = nthByOffset++;
                for (; start < insertion; insertion--) {
                    if (o > offsetsBySha1[nth[insertion - 1]])
                        break;
                    nth[insertion] = nth[insertion - 1];
                }
                nth[insertion] = nthBySha1;
            }
            offsetIndex[bi] = nthByOffset;
        }
    }

    /**
     * Search for object id with the specified start offset in this pack
     * (reverse) index.
     *
     * @param offset
     *            start offset of object to find.
     * @return object id for this offset, or null if no object was found.
     */
    public ObjectId findObject(long offset) {
        final int ith = binarySearch(offset);
        if (ith < 0)
            return null;
        return index.getObjectId(nth[ith]);
    }

    /**
     * Search for the next offset to the specified offset in this pack (reverse)
     * index.
     *
     * @param offset
     *            start offset of previous object (must be a valid, existing
     *            offset).
     * @param maxOffset
     *            maximum offset in a pack (returned when there is no next
     *            offset).
     * @return offset of the next object in a pack or maxOffset if provided
     *         offset was the last one.
     * @throws org.eclipse.jgit.errors.CorruptObjectException
     *             when there is no object with the provided offset.
     */
    public long findNextOffset(long offset, long maxOffset)
            throws CorruptObjectException {
        final int ith = binarySearch(offset);
        if (ith < 0)
            throw new CorruptObjectException(
                    MessageFormat.format(
                            JGitText.get().cantFindObjectInReversePackIndexForTheSpecifiedOffset,
                            Long.valueOf(offset)));

        if (ith + 1 == nth.length)
            return maxOffset;
        return index.getOffset(nth[ith + 1]);
    }

    int findPostion(long offset) {
        return binarySearch(offset);
    }

    private int binarySearch(long offset) {
        // Narrow the search to the bucket that would hold the offset, then
        // binary search within that bucket's slice of nth.
        int bucket = (int) (offset / bucketSize);
        int low = bucket == 0 ? 0 : offsetIndex[bucket - 1];
        int high = offsetIndex[bucket];
        while (low < high) {
            final int mid = (low + high) >>> 1;
            final long o = index.getOffset(nth[mid]);
            if (offset < o)
                high = mid;
            else if (offset == o)
                return mid;
            else
                low = mid + 1;
        }
        return -1;
    }

    ObjectId findObjectByPosition(int nthPosition) {
        return index.getObjectId(nth[nthPosition]);
    }
}
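
The bucketed lookup used by `binarySearch` can be tried in isolation with the simplified sketch below. It is not the JGit implementation: it works on a plain `long[]` of non-negative offsets instead of a `PackIndex`, and it builds the buckets with `Arrays.sort` plus a prefix sum of per-bucket counts rather than the linked-list chaining and insertion sort above. The class name `OffsetBucketSketch` and the method `positionOf` are hypothetical. It also returns -1 for offsets beyond the last bucket, a guard the class above does not have.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for the bucketed offset lookup; not JGit API. */
final class OffsetBucketSketch {
    private final long[] sortedOffsets; // offsets in ascending order
    private final int[] bucketEnd; // bucketEnd[b] = index one past the last offset in bucket b
    private final long bucketSize;

    /** Assumes every offset is non-negative, as pack offsets are. */
    OffsetBucketSketch(long[] offsets) {
        sortedOffsets = offsets.clone();
        Arrays.sort(sortedOffsets);
        int cnt = sortedOffsets.length;
        if (cnt == 0) {
            bucketSize = Long.MAX_VALUE;
            bucketEnd = new int[1];
            return;
        }
        long maxOffset = sortedOffsets[cnt - 1];
        // Same sizing rule as above: the largest offset always lands in a bucket below cnt.
        bucketSize = maxOffset / cnt + 1;
        bucketEnd = new int[cnt];
        for (long o : sortedOffsets)
            bucketEnd[(int) (o / bucketSize)]++; // count entries per bucket
        for (int b = 1; b < cnt; b++)
            bucketEnd[b] += bucketEnd[b - 1]; // prefix sum turns counts into end positions
    }

    /** Returns the position of the offset in ascending order, or -1 if it is absent. */
    int positionOf(long offset) {
        if (offset < 0)
            return -1;
        long bucket = offset / bucketSize;
        if (bucket >= bucketEnd.length)
            return -1;
        int b = (int) bucket;
        int low = b == 0 ? 0 : bucketEnd[b - 1];
        int high = bucketEnd[b];
        while (low < high) {
            int mid = (low + high) >>> 1;
            long o = sortedOffsets[mid];
            if (offset < o)
                high = mid;
            else if (offset == o)
                return mid;
            else
                low = mid + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        OffsetBucketSketch idx = new OffsetBucketSketch(new long[] { 40, 12, 300, 95 });
        System.out.println(idx.positionOf(95)); // prints 2
        System.out.println(idx.positionOf(13)); // prints -1
    }
}
```

With four offsets and a maximum of 300, the bucket size is 76, so 12 and 40 share bucket 0 while 95 and 300 each sit alone in their own bucket. When offsets are spread evenly, most buckets hold about one entry and the binary search inside a bucket finishes in very few steps.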