WindowCursor.java 12KB

Added read/write support for pack bitmap index.

A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter.

Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId.

Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
12 years ago
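The XOR step described in the commit message above can be illustrated with a small sketch. It assumes `java.util.BitSet` as a stand-in for the javaewah compressed bitmap, and the class and method names (`XorBitmapSketch`, `xorDelta`, `restore`) are hypothetical, not JGit API. It only shows why XORing against a similar prior bitmap leaves few set bits, and that XORing again recovers the original.

```java
import java.util.BitSet;

// Hypothetical illustration; java.util.BitSet stands in for the javaewah
// compressed bitmap, and none of these names come from JGit.
final class XorBitmapSketch {
	/** Returns the bits that differ between current and base. */
	static BitSet xorDelta(BitSet current, BitSet base) {
		BitSet delta = (BitSet) current.clone();
		delta.xor(base);
		return delta;
	}

	/** Rebuilds the original bitmap from a stored delta and its base. */
	static BitSet restore(BitSet delta, BitSet base) {
		BitSet current = (BitSet) delta.clone();
		current.xor(base);
		return current;
	}

	public static void main(String[] args) {
		// Bit i stands for the i-th object of the pack, ordered by offset.
		BitSet parent = new BitSet();
		parent.set(0, 1000); // closure of an older commit: objects 0..999
		BitSet child = (BitSet) parent.clone();
		child.set(1000, 1003); // a newer commit adds three objects

		BitSet delta = xorDelta(child, parent);
		System.out.println("bits set in child: " + child.cardinality());
		System.out.println("bits set in delta: " + delta.cardinality());
		System.out.println("round trip ok: "
				+ restore(delta, parent).equals(child));
	}
}
```

In this toy case the child bitmap has 1003 bits set while the delta has only 3. A writer following the commit message would presumably try several prior bitmaps and keep whichever encoding is smallest; that selection step is not modeled here.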
Support creating pack bitmap indexes in PackWriter.

Update the PackWriter to support writing out pack bitmap indexes, a parallel ".bitmap" file to the ".pack" file.

Bitmaps are selected at commits every 1 to 5,000 commits for each unique path from the start. The most recent 100 commits are all bitmapped. The next 19,000 commits have a bitmap every 100 commits. The remaining commits have a bitmap every 5,000 commits. Commits with more than 1 parent are preferred over ones with 1 or fewer. Furthermore, previously computed bitmaps are reused, if the previous entry had the reuse flag set, which is set when the bitmap was placed at the max allowed distance.

Bitmaps are used to speed up the counting phase when packing, for requests that are not shallow. The PackWriterBitmapWalker uses a RevFilter to proactively mark commits with RevFlag.SEEN, when they appear in a bitmap. The walker produces the full closure of reachable ObjectIds, given the collection of starting ObjectIds.

For fetch requests, two ObjectWalks are executed to compute the ObjectIds reachable from the haves and from the wants. The ObjectIds needed to be written are determined by taking all the resulting wants AND NOT the haves.

For clone requests, we get cached pack support for "free" since it is possible to determine if all of the ObjectIds in a pack file are included in the resulting list of ObjectIds to write.

On my machine, the best times for clones and fetches of the linux kernel repository (with about 2.6M objects and 300K commits) are tabulated below:

Operation                    Index V2               Index VE003
Clone                        37530ms (524.06 MiB)     82ms (524.06 MiB)
Fetch (1 commit back)           75ms                 107ms
Fetch (10 commits back)        456ms (269.51 KiB)    341ms (265.19 KiB)
Fetch (100 commits back)       449ms (269.91 KiB)    337ms (267.28 KiB)
Fetch (1000 commits back)     2229ms ( 14.75 MiB)    189ms ( 14.42 MiB)
Fetch (10000 commits back)    2177ms ( 16.30 MiB)    254ms ( 15.88 MiB)
Fetch (100000 commits back)  14340ms (185.83 MiB)   1655ms (189.39 MiB)

Change-Id: Icdb0cdd66ff168917fb9ef17b96093990cc6a98d
12 years ago
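The "wants AND NOT haves" computation from the commit message above can be sketched as follows. This is a minimal illustration that assumes `java.util.BitSet` in place of the compressed bitmaps; `WantsAndNotHavesSketch` and `objectsToSend` are hypothetical names, not part of JGit.

```java
import java.util.BitSet;

// Hypothetical illustration; not JGit API. BitSet stands in for the
// compressed bitmaps held in the pack bitmap index.
final class WantsAndNotHavesSketch {
	/** Objects reachable from the wants but not from the haves. */
	static BitSet objectsToSend(BitSet wants, BitSet haves) {
		BitSet need = (BitSet) wants.clone();
		need.andNot(haves);
		return need;
	}

	public static void main(String[] args) {
		BitSet haves = new BitSet();
		haves.set(0, 6); // the client already has objects 0..5
		BitSet wants = new BitSet();
		wants.set(0, 9); // closure of the requested tips: objects 0..8
		System.out.println(objectsToSend(wants, haves));
	}
}
```

Running `main` prints `{6, 7, 8}`, the positions of the objects that still have to be written. The inputs are cloned so that neither closure is modified by the set difference.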
Extract PackFile specific code to ObjectToPack subclass

The ObjectReader class is dual-purposed into being a factory for the ObjectToPack, permitting specific ObjectDatabase implementations to override the method and offer their own custom subclass of the generic ObjectToPack class. By allowing them to directly extend the type, each implementation can add custom fields to support tracking where an object is stored, without incurring any additional penalties like a parallel Map<ObjectId,Object> would cost.

The reader was chosen to act as a factory rather than the database, as the reader will eventually be tied more tightly with the ObjectWalk and TreeWalk. During object enumeration the reader would have had to load the object for the RevWalk, and may choose to cache object position data internally so it can later be reused and fed into the ObjectToPack instance supplied to the PackWriter. Since a reader is not thread-safe, and is scoped to this PackWriter and its internal ObjectWalk, it's a great place for the database to perform caching, if any.

Right now this change goes a bit backwards by changing what should be generic ObjectToPack references inside of PackWriter to the very PackFile specific LocalObjectToPack subclass. We will correct these in a later commit as we start to refine what the ObjectToPack API will eventually look like in order to better support the PackWriter.

Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago
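The factory arrangement described in the commit message above can be sketched with simplified stand-in classes. Every type here (`SketchReader`, `SketchObjectToPack`, and so on) is hypothetical and far smaller than the real JGit classes; the point is only that a reader subclass can return its own subtype carrying storage-specific fields, so no parallel map is needed.

```java
// Simplified, hypothetical stand-ins for the JGit types named in the commit
// message; the real classes carry more state and different signatures.
class SketchObjectToPack {
	final String id;

	SketchObjectToPack(String id) {
		this.id = id;
	}
}

class SketchLocalObjectToPack extends SketchObjectToPack {
	/** Storage-specific field kept on the object itself, not in a side map. */
	long offset = -1;

	SketchLocalObjectToPack(String id) {
		super(id);
	}
}

class SketchReader {
	/** Generic factory; storage implementations may override it. */
	SketchObjectToPack newObjectToPack(String id) {
		return new SketchObjectToPack(id);
	}
}

class SketchPackReader extends SketchReader {
	@Override
	SketchLocalObjectToPack newObjectToPack(String id) {
		return new SketchLocalObjectToPack(id);
	}
}

final class FactorySketch {
	public static void main(String[] args) {
		SketchReader reader = new SketchPackReader();
		SketchObjectToPack otp = reader.newObjectToPack("abc123");
		System.out.println(otp.getClass().getSimpleName());
	}
}
```

The override uses a covariant return type, which mirrors how `newObjectToPack` in the source file further down returns `LocalObjectToPack`.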
PackWriter: Support reuse of entire packs

The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs:

  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done

  (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs

  git repack -a -d

When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources.

With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but a slightly larger data transfer (+2.39 MiB):

Before:
  remote: Counting objects: 1861830, done
  remote: Finding sources: 100% (1861830/1861830)
  remote: Getting sizes: 100% (88243/88243)
  remote: Compressing objects: 100% (88184/88184)
  Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
  remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
  Resolving deltas: 100% (1564621/1564621), done.

  real 3m19.005s

After:
  remote: Counting objects: 1601, done
  remote: Counting objects: 1828460, done
  remote: Finding sources: 100% (50475/50475)
  remote: Getting sizes: 100% (18843/18843)
  remote: Compressing objects: 100% (7585/7585)
  remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
  Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
  Resolving deltas: 100% (1559477/1559477), done.

  real 2m2.938s

Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
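The shell snippet in the commit message above appends records to `objects/info/cached-packs` made of a `+ <tip>` line, one `P <name>` line per pack, and a blank line. A reader for that layout might look like the sketch below. The record format is inferred only from that snippet, and `CachedPackListSketch`, `Entry`, and `parse` are hypothetical names rather than JGit's own parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser sketch; the record layout is inferred only from the
// shell snippet in the commit message, not from JGit's own reader.
final class CachedPackListSketch {
	static final class Entry {
		final List<String> tips = new ArrayList<>();
		final List<String> packNames = new ArrayList<>();
	}

	static List<Entry> parse(String text) {
		List<Entry> entries = new ArrayList<>();
		Entry cur = null;
		for (String line : text.split("\n", -1)) {
			if (line.isEmpty()) { // a blank line closes the current record
				if (cur != null) {
					entries.add(cur);
					cur = null;
				}
				continue;
			}
			if (cur == null)
				cur = new Entry();
			if (line.startsWith("+ "))
				cur.tips.add(line.substring(2));
			else if (line.startsWith("P "))
				cur.packNames.add(line.substring(2));
		}
		if (cur != null)
			entries.add(cur); // tolerate a missing trailing blank line
		return entries;
	}

	public static void main(String[] args) {
		String sample = "+ 1111111111111111111111111111111111111111\n"
				+ "P aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n"
				+ "\n";
		List<Entry> entries = parse(sample);
		System.out.println(entries.size() + " record(s), "
				+ entries.get(0).packNames.size() + " pack(s)");
	}
}
```

Lines with any other prefix are silently ignored in this sketch; a real reader would need to decide how strict to be.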
/*
 * Copyright (C) 2008-2009, Google Inc.
 * Copyright (C) 2006-2008, Shawn O. Pearce <spearce@spearce.org>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.internal.storage.file;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import org.eclipse.jgit.annotations.Nullable;
import org.eclipse.jgit.errors.IncorrectObjectTypeException;
import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.errors.StoredObjectRepresentationNotAvailableException;
import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.internal.storage.pack.CachedPack;
import org.eclipse.jgit.internal.storage.pack.ObjectReuseAsIs;
import org.eclipse.jgit.internal.storage.pack.ObjectToPack;
import org.eclipse.jgit.internal.storage.pack.PackOutputStream;
import org.eclipse.jgit.internal.storage.pack.PackWriter;
import org.eclipse.jgit.lib.AbbreviatedObjectId;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.BitmapIndex;
import org.eclipse.jgit.lib.BitmapIndex.BitmapBuilder;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.InflaterCache;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectInserter;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.lib.ProgressMonitor;

/** Active handle to a ByteWindow. */
final class WindowCursor extends ObjectReader implements ObjectReuseAsIs {
	/** Temporary buffer large enough for at least one raw object id. */
	final byte[] tempId = new byte[Constants.OBJECT_ID_LENGTH];

	private Inflater inf;

	private ByteWindow window;

	private DeltaBaseCache baseCache;

	@Nullable
	private final ObjectInserter createdFromInserter;

	final FileObjectDatabase db;

	WindowCursor(FileObjectDatabase db) {
		this.db = db;
		this.createdFromInserter = null;
		this.streamFileThreshold = WindowCache.getStreamFileThreshold();
	}

	WindowCursor(FileObjectDatabase db,
			@Nullable ObjectDirectoryInserter createdFromInserter) {
		this.db = db;
		this.createdFromInserter = createdFromInserter;
		this.streamFileThreshold = WindowCache.getStreamFileThreshold();
	}

	DeltaBaseCache getDeltaBaseCache() {
		if (baseCache == null)
			baseCache = new DeltaBaseCache();
		return baseCache;
	}

	/** {@inheritDoc} */
	@Override
	public ObjectReader newReader() {
		return new WindowCursor(db);
	}

	/** {@inheritDoc} */
	@Override
	public BitmapIndex getBitmapIndex() throws IOException {
		for (PackFile pack : db.getPacks()) {
			PackBitmapIndex index = pack.getBitmapIndex();
			if (index != null)
				return new BitmapIndexImpl(index);
		}
		return null;
	}

	/** {@inheritDoc} */
	@Override
	public Collection<CachedPack> getCachedPacksAndUpdate(
			BitmapBuilder needBitmap) throws IOException {
		for (PackFile pack : db.getPacks()) {
			PackBitmapIndex index = pack.getBitmapIndex();
			if (needBitmap.removeAllOrNone(index))
				return Collections.<CachedPack> singletonList(
						new LocalCachedPack(Collections.singletonList(pack)));
		}
		return Collections.emptyList();
	}

	/** {@inheritDoc} */
	@Override
	public Collection<ObjectId> resolve(AbbreviatedObjectId id)
			throws IOException {
		if (id.isComplete())
			return Collections.singleton(id.toObjectId());
		HashSet<ObjectId> matches = new HashSet<>(4);
		db.resolve(matches, id);
		return matches;
	}

	/** {@inheritDoc} */
	@Override
	public boolean has(AnyObjectId objectId) throws IOException {
		return db.has(objectId);
	}

	/** {@inheritDoc} */
	@Override
	public ObjectLoader open(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException {
		final ObjectLoader ldr = db.openObject(this, objectId);
		if (ldr == null) {
			if (typeHint == OBJ_ANY)
				throw new MissingObjectException(objectId.copy(),
						JGitText.get().unknownObjectType2);
			throw new MissingObjectException(objectId.copy(), typeHint);
		}
		if (typeHint != OBJ_ANY && ldr.getType() != typeHint)
			throw new IncorrectObjectTypeException(objectId.copy(), typeHint);
		return ldr;
	}

	/** {@inheritDoc} */
	@Override
	public Set<ObjectId> getShallowCommits() throws IOException {
		return db.getShallowCommits();
	}

	/** {@inheritDoc} */
	@Override
	public long getObjectSize(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException {
		long sz = db.getObjectSize(this, objectId);
		if (sz < 0) {
			if (typeHint == OBJ_ANY)
				throw new MissingObjectException(objectId.copy(),
						JGitText.get().unknownObjectType2);
			throw new MissingObjectException(objectId.copy(), typeHint);
		}
		return sz;
	}

	/** {@inheritDoc} */
	@Override
	public LocalObjectToPack newObjectToPack(AnyObjectId objectId, int type) {
		return new LocalObjectToPack(objectId, type);
	}

	/** {@inheritDoc} */
	@Override
	public void selectObjectRepresentation(PackWriter packer,
			ProgressMonitor monitor, Iterable<ObjectToPack> objects)
			throws IOException, MissingObjectException {
		for (ObjectToPack otp : objects) {
			db.selectObjectRepresentation(packer, otp, this);
			monitor.update(1);
		}
	}

	/** {@inheritDoc} */
	@Override
	public void copyObjectAsIs(PackOutputStream out, ObjectToPack otp,
			boolean validate) throws IOException,
			StoredObjectRepresentationNotAvailableException {
		LocalObjectToPack src = (LocalObjectToPack) otp;
		src.pack.copyAsIs(out, src, validate, this);
	}

	/** {@inheritDoc} */
	@Override
	public void writeObjects(PackOutputStream out, List<ObjectToPack> list)
			throws IOException {
		for (ObjectToPack otp : list)
			out.writeObject(otp);
	}

	/**
	 * Copy bytes from the window to a caller supplied buffer.
	 *
	 * @param pack
	 *            the file the desired window is stored within.
	 * @param position
	 *            position within the file to read from.
	 * @param dstbuf
	 *            destination buffer to copy into.
	 * @param dstoff
	 *            offset within <code>dstbuf</code> to start copying into.
	 * @param cnt
	 *            number of bytes to copy. This value may exceed the number of
	 *            bytes remaining in the window starting at offset
	 *            <code>position</code>.
	 * @return number of bytes actually copied; this may be less than
	 *         <code>cnt</code> if <code>cnt</code> exceeded the number of bytes
	 *         available.
	 * @throws IOException
	 *             this cursor does not match the provider or id and the proper
	 *             window could not be acquired through the provider's cache.
	 */
	int copy(final PackFile pack, long position, final byte[] dstbuf,
			int dstoff, final int cnt) throws IOException {
		final long length = pack.length;
		int need = cnt;
		while (need > 0 && position < length) {
			pin(pack, position);
			final int r = window.copy(position, dstbuf, dstoff, need);
			position += r;
			dstoff += r;
			need -= r;
		}
		return cnt - need;
	}

	/** {@inheritDoc} */
	@Override
	public void copyPackAsIs(PackOutputStream out, CachedPack pack)
			throws IOException {
		((LocalCachedPack) pack).copyAsIs(out, this);
	}

	void copyPackAsIs(final PackFile pack, final long length,
			final PackOutputStream out) throws IOException {
		// Skip the 12-byte pack header and stop before the 20-byte
		// trailing checksum; only the object entries are copied.
		long position = 12;
		long remaining = length - (12 + 20);
		while (0 < remaining) {
			pin(pack, position);

			int ptr = (int) (position - window.start);
			int n = (int) Math.min(window.size() - ptr, remaining);
			window.write(out, position, n);
			position += n;
			remaining -= n;
		}
	}

	/**
	 * Inflate a region of the pack starting at {@code position}.
	 *
	 * @param pack
	 *            the file the desired window is stored within.
	 * @param position
	 *            position within the file to read from.
	 * @param dstbuf
	 *            destination buffer the inflater should output decompressed
	 *            data to. Must be large enough to store the entire stream,
	 *            unless headerOnly is true.
	 * @param headerOnly
	 *            if true the caller wants only {@code dstbuf.length} bytes.
	 * @return number of bytes inflated into <code>dstbuf</code>.
	 * @throws IOException
	 *             this cursor does not match the provider or id and the proper
	 *             window could not be acquired through the provider's cache.
	 * @throws DataFormatException
	 *             the inflater encountered an invalid chunk of data. Data
	 *             stream corruption is likely.
	 */
	int inflate(final PackFile pack, long position, final byte[] dstbuf,
			boolean headerOnly) throws IOException, DataFormatException {
		prepareInflater();
		pin(pack, position);
		position += window.setInput(position, inf);
		for (int dstoff = 0;;) {
			int n = inf.inflate(dstbuf, dstoff, dstbuf.length - dstoff);
			dstoff += n;
			if (inf.finished() || (headerOnly && dstoff == dstbuf.length))
				return dstoff;
			if (inf.needsInput()) {
				pin(pack, position);
				position += window.setInput(position, inf);
			} else if (n == 0)
				throw new DataFormatException();
		}
	}

	ByteArrayWindow quickCopy(PackFile p, long pos, long cnt)
			throws IOException {
		pin(p, pos);
		if (window instanceof ByteArrayWindow
				&& window.contains(p, pos + (cnt - 1)))
			return (ByteArrayWindow) window;
		return null;
	}

	Inflater inflater() {
		prepareInflater();
		return inf;
	}

	private void prepareInflater() {
		if (inf == null)
			inf = InflaterCache.get();
		else
			inf.reset();
	}

	void pin(PackFile pack, long position)
			throws IOException {
		final ByteWindow w = window;
		if (w == null || !w.contains(pack, position)) {
			// If memory is low, we may need what is in our window field to
			// be cleaned up by the GC during the get for the next window.
			// So we always clear it, even though we are just going to set
			// it again.
			//
			window = null;
			window = WindowCache.get(pack, position);
		}
	}

	/** {@inheritDoc} */
	@Override
	@Nullable
	public ObjectInserter getCreatedFromInserter() {
		return createdFromInserter;
	}

	/**
	 * {@inheritDoc}
	 * <p>
	 * Release the current window cursor.
	 */
	@Override
	public void close() {
		window = null;
		baseCache = null;
		try {
			InflaterCache.release(inf);
		} finally {
			inf = null;
		}
	}
}
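
The loop in `copy` above pins one window at a time and copies only what that window holds, so a single request may span several windows and may return fewer bytes than asked for at the end of the file. The standalone sketch below models that behavior under simplifying assumptions: a `byte[]` plays the pack file, `WINDOW_SIZE` is an arbitrary illustrative value, and `WindowedCopySketch` is a hypothetical class rather than anything in JGit.

```java
// Hypothetical model of the pin-then-copy loop; a byte[] plays the pack file
// and WINDOW_SIZE is an arbitrary value, not JGit's configured window size.
final class WindowedCopySketch {
	static final int WINDOW_SIZE = 4;

	/** Copies up to cnt bytes; each step stays within a single window. */
	static int copy(byte[] file, long position, byte[] dst, int dstoff,
			int cnt) {
		int need = cnt;
		while (need > 0 && position < file.length) {
			// "Pin" the window that contains position.
			long windowStart = position - (position % WINDOW_SIZE);
			long windowEnd = Math.min(windowStart + WINDOW_SIZE, file.length);
			int n = (int) Math.min(windowEnd - position, need);
			System.arraycopy(file, (int) position, dst, dstoff, n);
			position += n;
			dstoff += n;
			need -= n;
		}
		return cnt - need;
	}

	public static void main(String[] args) {
		byte[] file = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
		byte[] dst = new byte[8];
		int copied = copy(file, 3, dst, 0, 8);
		System.out.println("copied " + copied + " bytes");
	}
}
```

Starting at position 3 and asking for 8 bytes, the sketch copies 1, then 4, then 2 bytes across three windows and reports 7, since the ten-byte file ends before the request is satisfied.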