Added read/write support for pack bitmap index.
A pack bitmap index is an additional index of compressed
bitmaps of the object graph. Furthermore, a logical API of the index
functionality is included, as it is expected to be used by the
PackWriter.
Compressed bitmaps are created using the javaewah library, which is a
word-aligned compressed variant of the Java bitset class based on
run-length encoding. The library only works with non-negative integer
values. Thus, the maximum number of ObjectIds in a pack file that
this index can currently support is limited to Integer.MAX_VALUE.
Every ObjectId is given an integer mapping. The integer is the
position of the ObjectId in the complete ObjectId list, sorted
by offset, for the pack file. That integer is what the bitmaps
use to reference the ObjectId. Currently, the new index format can
only be used with pack files that contain a complete closure of the
object graph, e.g. the result of a garbage collection.
The index file includes four bitmaps for the Git object types, i.e.
commits, trees, blobs, and tags. In addition, a collection of
bitmaps keyed by an ObjectId is also included. The bitmap for each entry
in the collection represents the full closure of ObjectIds reachable
from the keyed ObjectId (including the keyed ObjectId itself). The
bitmaps are further compressed by XORing the current bitmaps against
prior bitmaps in the index, and selecting the smallest representation.
The XOR'd bitmap and the offset from the current entry to the position
of the bitmap to XOR against are the actual representation of the entry
in the index file. Each entry contains one byte, which is currently
used to note whether the bitmap should be blindly reused.
Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
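The XOR step described in the commit message can be illustrated with a minimal sketch. It is not the JGit implementation: java.util.BitSet stands in for the javaewah compressed bitmap, the class and method names (XorBitmapSketch, Entry, encode, decode) are hypothetical, and the number of set bits is assumed as a rough proxy for "smallest representation". Bit positions play the role of the integer assigned to each ObjectId.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch; BitSet stands in for the compressed bitmap type. */
public class XorBitmapSketch {
    /** Stored form of one entry: the bits written out plus the distance
     *  back to the bitmap it was XOR'd against (0 means "not XOR'd"). */
    static final class Entry {
        final BitSet stored;
        final int xorOffset;

        Entry(BitSet stored, int xorOffset) {
            this.stored = stored;
            this.xorOffset = xorOffset;
        }
    }

    /** Try the raw bitmap and an XOR against each of the last maxDistance
     *  prior bitmaps; keep whichever has the fewest set bits (assumed proxy
     *  for the smallest encoded size). */
    static Entry encode(BitSet current, List<BitSet> prior, int maxDistance) {
        BitSet best = (BitSet) current.clone();
        int bestOffset = 0;
        for (int d = 1; d <= maxDistance && d <= prior.size(); d++) {
            BitSet candidate = (BitSet) current.clone();
            candidate.xor(prior.get(prior.size() - d));
            if (candidate.cardinality() < best.cardinality()) {
                best = candidate;
                bestOffset = d;
            }
        }
        return new Entry(best, bestOffset);
    }

    /** Reverse the encoding: XOR the stored bits against the same prior bitmap. */
    static BitSet decode(Entry e, List<BitSet> prior) {
        BitSet result = (BitSet) e.stored.clone();
        if (e.xorOffset > 0) {
            result.xor(prior.get(prior.size() - e.xorOffset));
        }
        return result;
    }
}
```

Because XOR is its own inverse, decoding only needs the stored bits and the offset back to the base bitmap, which matches the two pieces of data the commit message says each entry carries.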
11 years ago
- /*
- * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com> and others
- *
- * This program and the accompanying materials are made available under the
- * terms of the Eclipse Distribution License v. 1.0 which is available at
- * https://www.eclipse.org/org/documents/edl-v10.php.
- *
- * SPDX-License-Identifier: BSD-3-Clause
- */
-
- package org.eclipse.jgit.internal.storage.file;
-
- import java.text.MessageFormat;
-
- import org.eclipse.jgit.errors.CorruptObjectException;
- import org.eclipse.jgit.internal.JGitText;
- import org.eclipse.jgit.internal.storage.file.PackIndex.MutableEntry;
- import org.eclipse.jgit.lib.ObjectId;
-
- /**
- * <p>
- * Reverse index for forward pack index. Provides operations based on offset
- * instead of object id. Such offset-based reverse lookups are performed in
- * O(log n) time.
- * </p>
- *
- * @see PackIndex
- * @see Pack
- */
- public class PackReverseIndex {
- /** Index we were created from, and that has our ObjectId data. */
- private final PackIndex index;
-
- /** The number of bytes per entry in the offsetIndex. */
- private final long bucketSize;
-
- /**
- * An index into the nth mapping, where the value is the position after the
- * last index that contains the values of the bucket. For example, given
- * offset o (and bucket = o / bucketSize), the offset will be contained in
- * the range nth[offsetIndex[bucket - 1]] inclusive to
- * nth[offsetIndex[bucket]] exclusive.
- *
- * See {@link #binarySearch}
- */
- private final int[] offsetIndex;
-
- /** Mapping from indices in offset order to indices in SHA-1 order. */
- private final int[] nth;
-
- /**
- * Create reverse index from straight/forward pack index, by indexing all
- * its entries.
- *
- * @param packIndex
- * forward index - entries to (reverse) index.
- */
- public PackReverseIndex(PackIndex packIndex) {
- index = packIndex;
-
- final long cnt = index.getObjectCount();
- if (cnt + 1 > Integer.MAX_VALUE)
- throw new IllegalArgumentException(
- JGitText.get().hugeIndexesAreNotSupportedByJgitYet);
-
- if (cnt == 0) {
- bucketSize = Long.MAX_VALUE;
- offsetIndex = new int[1];
- nth = new int[0];
- return;
- }
-
- final long[] offsetsBySha1 = new long[(int) cnt];
-
- long maxOffset = 0;
- int ith = 0;
- for (MutableEntry me : index) {
- final long o = me.getOffset();
- offsetsBySha1[ith++] = o;
- if (o > maxOffset)
- maxOffset = o;
- }
-
- bucketSize = maxOffset / cnt + 1;
- int[] bucketIndex = new int[(int) cnt];
- int[] bucketValues = new int[(int) cnt + 1];
- for (int oi = 0; oi < offsetsBySha1.length; oi++) {
- final long o = offsetsBySha1[oi];
- final int bucket = (int) (o / bucketSize);
- final int bucketValuesPos = oi + 1;
- final int current = bucketIndex[bucket];
- bucketIndex[bucket] = bucketValuesPos;
- bucketValues[bucketValuesPos] = current;
- }
-
- int nthByOffset = 0;
- nth = new int[offsetsBySha1.length];
- offsetIndex = bucketIndex; // Reuse the allocation
- for (int bi = 0; bi < bucketIndex.length; bi++) {
- final int start = nthByOffset;
- // Insertion sort of the values in the bucket.
- for (int vi = bucketIndex[bi]; vi > 0; vi = bucketValues[vi]) {
- final int nthBySha1 = vi - 1;
- final long o = offsetsBySha1[nthBySha1];
- int insertion = nthByOffset++;
- for (; start < insertion; insertion--) {
- if (o > offsetsBySha1[nth[insertion - 1]])
- break;
- nth[insertion] = nth[insertion - 1];
- }
- nth[insertion] = nthBySha1;
- }
- offsetIndex[bi] = nthByOffset;
- }
- }
-
- /**
- * Search for object id with the specified start offset in this pack
- * (reverse) index.
- *
- * @param offset
- * start offset of object to find.
- * @return object id for this offset, or null if no object was found.
- */
- public ObjectId findObject(long offset) {
- final int ith = binarySearch(offset);
- if (ith < 0)
- return null;
- return index.getObjectId(nth[ith]);
- }
-
- /**
- * Search for the next offset to the specified offset in this pack (reverse)
- * index.
- *
- * @param offset
- * start offset of the previous object (must be the offset of an
- * existing object).
- * @param maxOffset
- * maximum offset in a pack (returned when there is no next
- * offset).
- * @return offset of the next object in a pack or maxOffset if provided
- * offset was the last one.
- * @throws org.eclipse.jgit.errors.CorruptObjectException
- * when there is no object with the provided offset.
- */
- public long findNextOffset(long offset, long maxOffset)
- throws CorruptObjectException {
- final int ith = binarySearch(offset);
- if (ith < 0)
- throw new CorruptObjectException(
- MessageFormat.format(
- JGitText.get().cantFindObjectInReversePackIndexForTheSpecifiedOffset,
- Long.valueOf(offset)));
-
- if (ith + 1 == nth.length)
- return maxOffset;
- return index.getOffset(nth[ith + 1]);
- }
-
- int findPostion(long offset) {
- return binarySearch(offset);
- }
-
- private int binarySearch(long offset) {
- int bucket = (int) (offset / bucketSize);
- int low = bucket == 0 ? 0 : offsetIndex[bucket - 1];
- int high = offsetIndex[bucket];
- while (low < high) {
- final int mid = (low + high) >>> 1;
- final long o = index.getOffset(nth[mid]);
- if (offset < o)
- high = mid;
- else if (offset == o)
- return mid;
- else
- low = mid + 1;
- }
- return -1;
- }
-
- ObjectId findObjectByPosition(int nthPosition) {
- return index.getObjectId(nth[nthPosition]);
- }
- }
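The bucketed lookup used by the class above can be sketched in isolation. The following is a simplified, self-contained illustration rather than the JGit code: it works directly on a sorted array of offsets instead of a PackIndex, builds the bucket boundaries with a counting pass instead of the linked-list insertion sort, and adds a range check on the bucket number that the original does not have. The class name BucketedOffsetLookupSketch and its methods are hypothetical, and offsets are assumed to be non-negative.

```java
import java.util.Arrays;

/** Hypothetical sketch of a bucketed offset lookup; not the JGit class. */
public class BucketedOffsetLookupSketch {
    private final long[] sortedOffsets;
    private final long bucketSize;
    /** bucketEnd[b] is the index one past the last offset that falls in bucket b. */
    private final int[] bucketEnd;

    public BucketedOffsetLookupSketch(long[] offsets) {
        sortedOffsets = offsets.clone();
        Arrays.sort(sortedOffsets);
        int n = sortedOffsets.length;
        if (n == 0) {
            bucketSize = Long.MAX_VALUE;
            bucketEnd = new int[1];
            return;
        }
        // Same sizing rule as above: roughly one object per bucket on average.
        bucketSize = sortedOffsets[n - 1] / n + 1;
        bucketEnd = new int[n];
        for (long o : sortedOffsets) {
            bucketEnd[(int) (o / bucketSize)]++;
        }
        // Prefix sums turn per-bucket counts into exclusive end positions.
        for (int b = 1; b < n; b++) {
            bucketEnd[b] += bucketEnd[b - 1];
        }
    }

    /** Position of the offset in offset order, or -1 if no object starts there. */
    public int findPosition(long offset) {
        if (offset < 0 || offset / bucketSize >= bucketEnd.length) {
            return -1; // guard added for the sketch
        }
        int bucket = (int) (offset / bucketSize);
        int low = bucket == 0 ? 0 : bucketEnd[bucket - 1];
        int high = bucketEnd[bucket];
        while (low < high) {
            int mid = (low + high) >>> 1;
            long o = sortedOffsets[mid];
            if (offset < o) {
                high = mid;
            } else if (offset == o) {
                return mid;
            } else {
                low = mid + 1;
            }
        }
        return -1;
    }

    /** Offset of the next object, or maxOffset if the given one is the last. */
    public long findNextOffset(long offset, long maxOffset) {
        int pos = findPosition(offset);
        if (pos < 0) {
            throw new IllegalArgumentException("no object at offset " + offset);
        }
        return pos + 1 == sortedOffsets.length ? maxOffset : sortedOffsets[pos + 1];
    }
}
```

The bucket step narrows the binary search to the few entries whose offsets share a bucket, so a lookup usually inspects only a handful of candidates when objects are spread fairly evenly through the pack.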