
PackFile.java 35KB

Added read/write support for pack bitmap index.

A pack bitmap index is an additional index of compressed bitmaps of the object graph. A logical API for the index functionality is also included, as it is expected to be used by the PackWriter.

Compressed bitmaps are created using the javaewah library, a word-aligned compressed variant of the Java BitSet class based on run-length encoding. The library only works with positive integer values, so the maximum number of ObjectIds in a pack file that this index can currently support is Integer.MAX_VALUE.

Every ObjectId is given an integer mapping: its position in the pack file's complete ObjectId list, sorted by offset. That integer is what the bitmaps use to reference the ObjectId.

Currently the new index format can only be used with pack files that contain a complete closure of the object graph, e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types (commits, trees, blobs, and tags). In addition, it contains a collection of bitmaps keyed by ObjectId; the bitmap for each entry represents the full closure of ObjectIds reachable from the keyed ObjectId, including the keyed ObjectId itself.

The bitmaps are further compressed by XORing the current bitmap against prior bitmaps in the index and selecting the smallest representation. The XOR'd bitmap, together with the offset from the current entry back to the entry it was XORed against, is what is actually stored in the index file. Each entry also contains one flag byte, currently used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
11 years ago
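The XOR trick described in the entry above can be demonstrated with the javaewah library directly. The following is a small illustrative sketch (not code from the change itself): it builds two heavily overlapping reachability bitmaps over the integer position mapping, XORs the newer one against the older one, and keeps whichever encoding is smaller, mirroring how an index entry stores either the raw or the XOR'd bitmap plus the offset back to the entry it was XORed against.

  import com.googlecode.javaewah.EWAHCompressedBitmap;

  public class BitmapXorSketch {
      public static void main(String[] args) {
          // Positions (offset-sorted order in the pack) of objects reachable from two commits.
          EWAHCompressedBitmap older = EWAHCompressedBitmap.bitmapOf(0, 1, 2, 3, 4, 5, 6, 7);
          EWAHCompressedBitmap newer = EWAHCompressedBitmap.bitmapOf(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);

          // XOR keeps only the positions that differ; for large, mostly overlapping
          // closures this usually compresses far better than the raw bitmap.
          EWAHCompressedBitmap xor = newer.xor(older);

          // Store whichever form is smaller, together with how far back the XOR base sits.
          boolean storeXor = xor.sizeInBytes() < newer.sizeInBytes();
          System.out.println("store XOR'd form: " + storeXor
                  + " (xor=" + xor.sizeInBytes() + " bytes, raw=" + newer.sizeInBytes() + " bytes)");
      }
  }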
Don't use interruptible pread() to access pack files

The J2SE NIO APIs require that FileChannel close the underlying file descriptor if a thread is interrupted while it is inside a read or write operation on that channel. This is insane, because it means we cannot share the file descriptor between threads.

If a thread is in the middle of the FileChannel variant of IO.readFully() and it receives an interrupt, the pack is automatically closed on us. This causes the other threads using that same FileChannel to receive IOExceptions, which leads to the pack being marked invalid. Once the pack is marked invalid, JGit loses access to its entire contents and starts to report MissingObjectExceptions.

Because PackWriter must ensure that the chosen pack file stays available until the current object's data is fully copied to the output, JGit cannot simply reopen the pack when it is automatically closed due to an interrupt arriving at the wrong time. The pack may have been deleted by a concurrent `git gc` process, and that open file descriptor might be the last reference to the inode on disk. Once it is closed, the PackWriter loses access to that object representation and cannot finish sending the object to the client.

Fortunately, RandomAccessFile's readFully method does not have this problem. Interrupts during readFully() are ignored. However, it requires us to first seek to the offset we need to read and then issue the read call, which means locking around the file descriptor to prevent concurrent threads from moving the pointer before the read.

This reduces the concurrency level, as now only one window can be paged in at a time from each pack. However, the WindowCache should already be holding most of the pages required for a process's working set, and its own internal locking was already limiting the number of concurrent loads possible. Provided that most concurrent accesses hit the WindowCache, or are for different repositories on the same server, we shouldn't see a major performance hit from the more serialized loading.

I would have preferred to use a pool of RandomAccessFiles for each pack, with threads borrowing an instance dedicated to that thread whenever they needed to page in a window. This would permit much higher levels of concurrency by using multiple file descriptors (and file pointers) per pack. However, the code became too complex to develop in any reasonable period of time, so I've chosen to retrofit the existing code with more serialization instead.

Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago
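The workaround described above reduces to serializing a seek-then-read sequence on a shared RandomAccessFile, because RandomAccessFile.readFully() ignores thread interrupts while FileChannel reads do not. A minimal sketch of that access pattern (hypothetical class, not the actual WindowCache or pack code):

  import java.io.IOException;
  import java.io.RandomAccessFile;

  // Pages windows out of one pack file through a single shared descriptor.
  class PackWindowReaderSketch {
      private final RandomAccessFile file; // shared by all reader threads

      PackWindowReaderSketch(RandomAccessFile file) {
          this.file = file;
      }

      // The lock stops a concurrent caller from moving the file pointer between
      // seek() and readFully(); only one window per pack can be loaded at a time.
      void readFully(long pos, byte[] dst, int len) throws IOException {
          synchronized (file) {
              file.seek(pos);
              file.readFully(dst, 0, len);
          }
      }
  }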
PackWriter: Support reuse of entire packs

The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this reaches the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs:

  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done
  (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs
  git repack -a -d

When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB [1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to one full copy of the compressed project sources.

With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, at the cost of a slightly larger data transfer (+2.39 MiB):

  Before:
    remote: Counting objects: 1861830, done
    remote: Finding sources: 100% (1861830/1861830)
    remote: Getting sizes: 100% (88243/88243)
    remote: Compressing objects: 100% (88184/88184)
    Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
    remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
    Resolving deltas: 100% (1564621/1564621), done.
    real 3m19.005s

  After:
    remote: Counting objects: 1601, done
    remote: Counting objects: 1828460, done
    remote: Finding sources: 100% (50475/50475)
    remote: Getting sizes: 100% (18843/18843)
    remote: Compressing objects: 100% (7585/7585)
    remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
    Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
    Resolving deltas: 100% (1559477/1559477), done.
    real 2m2.938s

Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered a normal Git maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
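For reference, after the script above runs once, $GIT_DIR/objects/info/cached-packs holds one record per cached pack, built from the echo lines in the subshell: a '+' line naming each tip commit, a 'P' line naming each pack, and a blank line terminating the record. Roughly:

  + <SHA-1 printed by git rev-parse master>
  P <pack name printed by git pack-objects>
  <blank line ends the record>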
Fix PackInvalidException when fetch and repack run concurrently

We are running several servers with JGit. We need to run repack from time to time to keep the repos performant, i.e. after a push we test how many small packs are in the repo and, when a threshold is reached, we run the repack.

After upgrading the JGit version we found that if someone does a clone while the repack is running, the clone sometimes (not always) fails because the repack removes the .pack file used by the clone. Server exception and client error attached. I've tracked down the cause: it seems to have been introduced between JGit 5.2 (which we upgraded from) and 5.3, by this commit:

  Move throw of PackInvalidException outside the catch -
  https://github.com/eclipse/jgit/commit/afef866a44cd65fef292c174cad445b3fb526400

The problem is that when the throw was inside the try block, the last catch block caught the exception and called openFail(false). It is true that it called it with invalidate = false, which is wrong. The real problem, though, is that with the throw outside the try block, openFail is not called at all and the fields activeWindows and activeCopyRawData are not reset to 0. This affects later checks such as: if (++activeCopyRawData == 1 && activeWindows == 0).

The fix is relatively simple: keep the throw outside the try block while still setting the invalid field to true. I did exhaustive testing of the change, running concurrent clones and pushes indefinitely; with the patch applied it never fails, while without the patch it takes relatively little time to hit the error.

See: https://www.eclipse.org/lists/jgit-dev/msg04014.html

Bug: 569349
Change-Id: I9dbf8801c8d3131955ad7124f42b62095d96da54
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
3 years ago
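In code terms, the fix described above means the error-path bookkeeping must still run even though the PackInvalidException is now thrown outside the try block. A simplified, hypothetical sketch of the pattern (the names invalid, openFail, activeWindows and activeCopyRawData are taken from the bug report, not verbatim from the JGit sources):

  import java.io.File;
  import java.io.IOException;

  // Hypothetical sketch of the error-path bookkeeping described above; it mirrors
  // the bug report, not the actual JGit PackFile/WindowCache classes.
  class PackOpenSketch {
      private final File packFile = new File("objects/pack/pack-example.pack");
      private int activeWindows;
      private int activeCopyRawData;
      private boolean invalid;

      void open() throws IOException {
          if (invalid) {
              // Thrown outside the try block, so the catch below never swallows it;
              // the counters were already reset when the pack was marked invalid.
              throw new IOException("pack invalid: " + packFile); // stands in for PackInvalidException
          }
          try {
              readPackHeader();
          } catch (IOException e) {
              // Reset the counters AND invalidate; otherwise later checks like
              // (++activeCopyRawData == 1 && activeWindows == 0) misbehave.
              openFail(true);
              throw e;
          }
      }

      private void openFail(boolean invalidate) {
          activeWindows = 0;
          activeCopyRawData = 0;
          invalid = invalidate;
      }

      private void readPackHeader() throws IOException {
          // placeholder for the real open/read work
      }
  }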
Increase core.streamFileThreshold default to 50 MiB

Projects like org.eclipse.mdt contain large XML files about 6 MiB in size, as does the Android project platform/frameworks/base. Doing a clone of either project with JGit takes forever to check out the files into the working directory, because delta decompression tends to be very expensive when we need to constantly reposition the base stream for each copy instruction. This can be made worse by a very bad ordering of offsets, possibly due to an XML editor that doesn't preserve the order of elements in the file very well.

Increasing the threshold to the same limit PackWriter uses when doing delta compression (50 MiB) permits a default-configured JGit to decompress these XML file objects using the faster random-access arrays, rather than re-seeking through an inflate stream, significantly reducing checkout time after a clone.

Since this new limit may be dangerously close to the JVM maximum heap size, every allocation attempt is now wrapped in a try/catch so that JGit can degrade by switching to the large object stream mode when the allocation is refused. It will run slower, but the operation will still complete.

The large stream mode will run very well for big objects that aren't delta compressed, and is acceptable for delta compressed objects that use only forward-referencing copy instructions. Copies using prior offsets are still going to be horrible, and there is nothing we can do about it except increase core.streamFileThreshold.

In the future we might want to consider changing the way the delta generators work in JGit and native C Git to avoid prior offsets once an object reaches a certain size, even if that causes the delta instruction stream to be slightly larger. Unfortunately native C Git won't want to do that until it's also able to stream objects rather than malloc them as contiguous blocks.

Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
13 years ago
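The degrade-on-allocation-failure behaviour described above amounts to trying the whole-object buffer first and falling back to the streaming path if the JVM refuses the allocation. A minimal hypothetical sketch of that pattern (not the actual JGit loader code; names and the 50 MiB constant mirror the description):

  // Hypothetical sketch of the "fast array path, else stream" fallback described above.
  class InflateBufferSketch {
      private static final long STREAM_FILE_THRESHOLD = 50L * 1024 * 1024; // 50 MiB default

      // Returns a whole-object buffer, or null when the caller must use the
      // slower large-object stream mode instead.
      static byte[] tryAllocate(long size) {
          if (size > STREAM_FILE_THRESHOLD || size > Integer.MAX_VALUE)
              return null; // too large for the fast random-access array path
          try {
              return new byte[(int) size];
          } catch (OutOfMemoryError notEnoughHeap) {
              // The threshold may sit close to the JVM heap limit; degrade to
              // streaming rather than failing the whole operation.
              return null;
          }
      }
  }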
Increase core.streamFileThreshold default to 50 MiB Projects like org.eclipse.mdt contain large XML files about 6 MiB in size. So does the Android project platform/frameworks/base. Doing a clone of either project with JGit takes forever to checkout the files into the working directory, because delta decompression tends to be very expensive as we need to constantly reposition the base stream for each copy instruction. This can be made worse by a very bad ordering of offsets, possibly due to an XML editor that doesn't preserve the order of elements in the file very well. Increasing the threshold to the same limit PackWriter uses when doing delta compression (50 MiB) permits a default configured JGit to decompress these XML file objects using the faster random-access arrays, rather than re-seeking through an inflate stream, significantly reducing checkout time after a clone. Since this new limit may be dangerously close to the JVM maximum heap size, every allocation attempt is now wrapped in a try/catch so that JGit can degrade by switching to the large object stream mode when the allocation is refused. It will run slower, but the operation will still complete. The large stream mode will run very well for big objects that aren't delta compressed, and is acceptable for delta compressed objects that are using only forward referencing copy instructions. Copies using prior offsets are still going to be horrible, and there is nothing we can do about it except increase core.streamFileThreshold. We might in the future want to consider changing the way the delta generators work in JGit and native C Git to avoid prior offsets once an object reaches a certain size, even if that causes the delta instruction stream to be slightly larger. Unfortunately native C Git won't want to do that until its also able to stream objects rather than malloc them as contiguous blocks. Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
13 years ago
Increase core.streamFileThreshold default to 50 MiB Projects like org.eclipse.mdt contain large XML files about 6 MiB in size. So does the Android project platform/frameworks/base. Doing a clone of either project with JGit takes forever to checkout the files into the working directory, because delta decompression tends to be very expensive as we need to constantly reposition the base stream for each copy instruction. This can be made worse by a very bad ordering of offsets, possibly due to an XML editor that doesn't preserve the order of elements in the file very well. Increasing the threshold to the same limit PackWriter uses when doing delta compression (50 MiB) permits a default configured JGit to decompress these XML file objects using the faster random-access arrays, rather than re-seeking through an inflate stream, significantly reducing checkout time after a clone. Since this new limit may be dangerously close to the JVM maximum heap size, every allocation attempt is now wrapped in a try/catch so that JGit can degrade by switching to the large object stream mode when the allocation is refused. It will run slower, but the operation will still complete. The large stream mode will run very well for big objects that aren't delta compressed, and is acceptable for delta compressed objects that are using only forward referencing copy instructions. Copies using prior offsets are still going to be horrible, and there is nothing we can do about it except increase core.streamFileThreshold. We might in the future want to consider changing the way the delta generators work in JGit and native C Git to avoid prior offsets once an object reaches a certain size, even if that causes the delta instruction stream to be slightly larger. Unfortunately native C Git won't want to do that until its also able to stream objects rather than malloc them as contiguous blocks. Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
13 years ago
Added read/write support for pack bitmap index. A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter. Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE. Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection. The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused. Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
11 years ago
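As a loose sketch of the XOR compression idea described above (illustrative only, not the PackWriter's actual code), using the javaewah library the commit mentions. The bit positions are invented values standing in for the offset-sorted ObjectId indexes:

import com.googlecode.javaewah.EWAHCompressedBitmap;

public class BitmapXorSketch {
    public static void main(String[] args) {
        // Reachability bitmap of an earlier commit (positions are the integer
        // mappings of ObjectIds described above; values are made up).
        EWAHCompressedBitmap prior = EWAHCompressedBitmap.bitmapOf(0, 1, 2, 3, 4, 5);

        // Reachability bitmap of a later commit: the same objects plus two new ones.
        EWAHCompressedBitmap current = EWAHCompressedBitmap.bitmapOf(0, 1, 2, 3, 4, 5, 6, 7);

        // XOR keeps only the positions where the two bitmaps differ, which is
        // typically a much smaller, better-compressing set.
        EWAHCompressedBitmap xored = current.xor(prior);

        // Store whichever form is smaller, plus the offset back to the bitmap
        // that was XOR'd against, much like the index entry format described above.
        boolean useXor = xored.sizeInBytes() < current.sizeInBytes();
        System.out.println("plain=" + current.sizeInBytes() + " bytes, xor'd="
                + xored.sizeInBytes() + " bytes, prefer XOR form: " + useXor);
    }
}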
  1. /*
  2. * Copyright (C) 2008-2009, Google Inc.
  3. * Copyright (C) 2007, Robin Rosenberg <robin.rosenberg@dewire.com>
  4. * Copyright (C) 2006-2008, Shawn O. Pearce <spearce@spearce.org>
  5. * and other copyright owners as documented in the project's IP log.
  6. *
  7. * This program and the accompanying materials are made available
  8. * under the terms of the Eclipse Distribution License v1.0 which
  9. * accompanies this distribution, is reproduced below, and is
  10. * available at http://www.eclipse.org/org/documents/edl-v10.php
  11. *
  12. * All rights reserved.
  13. *
  14. * Redistribution and use in source and binary forms, with or
  15. * without modification, are permitted provided that the following
  16. * conditions are met:
  17. *
  18. * - Redistributions of source code must retain the above copyright
  19. * notice, this list of conditions and the following disclaimer.
  20. *
  21. * - Redistributions in binary form must reproduce the above
  22. * copyright notice, this list of conditions and the following
  23. * disclaimer in the documentation and/or other materials provided
  24. * with the distribution.
  25. *
  26. * - Neither the name of the Eclipse Foundation, Inc. nor the
  27. * names of its contributors may be used to endorse or promote
  28. * products derived from this software without specific prior
  29. * written permission.
  30. *
  31. * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
  32. * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
  33. * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  34. * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  35. * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
  36. * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  37. * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  38. * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  39. * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  40. * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  41. * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  42. * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  43. * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  44. */
  45. package org.eclipse.jgit.internal.storage.file;
  46. import static org.eclipse.jgit.internal.storage.pack.PackExt.BITMAP_INDEX;
  47. import static org.eclipse.jgit.internal.storage.pack.PackExt.INDEX;
  48. import static org.eclipse.jgit.internal.storage.pack.PackExt.KEEP;
  49. import java.io.EOFException;
  50. import java.io.File;
  51. import java.io.FileNotFoundException;
  52. import java.io.IOException;
  53. import java.io.InterruptedIOException;
  54. import java.io.RandomAccessFile;
  55. import java.nio.MappedByteBuffer;
  56. import java.nio.channels.FileChannel.MapMode;
  57. import java.nio.file.AccessDeniedException;
  58. import java.nio.file.NoSuchFileException;
  59. import java.text.MessageFormat;
  60. import java.time.Instant;
  61. import java.util.Arrays;
  62. import java.util.Collections;
  63. import java.util.Comparator;
  64. import java.util.Iterator;
  65. import java.util.Set;
  66. import java.util.concurrent.atomic.AtomicInteger;
  67. import java.util.zip.CRC32;
  68. import java.util.zip.DataFormatException;
  69. import java.util.zip.Inflater;
  70. import org.eclipse.jgit.errors.CorruptObjectException;
  71. import org.eclipse.jgit.errors.LargeObjectException;
  72. import org.eclipse.jgit.errors.MissingObjectException;
  73. import org.eclipse.jgit.errors.NoPackSignatureException;
  74. import org.eclipse.jgit.errors.PackInvalidException;
  75. import org.eclipse.jgit.errors.PackMismatchException;
  76. import org.eclipse.jgit.errors.StoredObjectRepresentationNotAvailableException;
  77. import org.eclipse.jgit.errors.UnpackException;
  78. import org.eclipse.jgit.errors.UnsupportedPackIndexVersionException;
  79. import org.eclipse.jgit.errors.UnsupportedPackVersionException;
  80. import org.eclipse.jgit.internal.JGitText;
  81. import org.eclipse.jgit.internal.storage.pack.BinaryDelta;
  82. import org.eclipse.jgit.internal.storage.pack.ObjectToPack;
  83. import org.eclipse.jgit.internal.storage.pack.PackExt;
  84. import org.eclipse.jgit.internal.storage.pack.PackOutputStream;
  85. import org.eclipse.jgit.lib.AbbreviatedObjectId;
  86. import org.eclipse.jgit.lib.AnyObjectId;
  87. import org.eclipse.jgit.lib.Constants;
  88. import org.eclipse.jgit.lib.ObjectId;
  89. import org.eclipse.jgit.lib.ObjectLoader;
  90. import org.eclipse.jgit.util.LongList;
  91. import org.eclipse.jgit.util.NB;
  92. import org.eclipse.jgit.util.RawParseUtils;
  93. import org.slf4j.Logger;
  94. import org.slf4j.LoggerFactory;
  95. /**
  96. * A Git version 2 pack file representation. A pack file contains Git objects in
97. * delta packed format, yielding high compression when many of the stored
98. * objects are similar to one another.
  99. */
  100. public class PackFile implements Iterable<PackIndex.MutableEntry> {
  101. private final static Logger LOG = LoggerFactory.getLogger(PackFile.class);
102. /** Sorts PackFiles from most recently created to least recently created. */
  103. public static final Comparator<PackFile> SORT = new Comparator<PackFile>() {
  104. @Override
  105. public int compare(PackFile a, PackFile b) {
  106. return b.packLastModified.compareTo(a.packLastModified);
  107. }
  108. };
  109. private final File packFile;
  110. private final int extensions;
  111. private File keepFile;
  112. private volatile String packName;
  113. final int hash;
  114. private RandomAccessFile fd;
  115. /** Serializes reads performed against {@link #fd}. */
  116. private final Object readLock = new Object();
  117. long length;
  118. private int activeWindows;
  119. private int activeCopyRawData;
  120. Instant packLastModified;
  121. private PackFileSnapshot fileSnapshot;
  122. private volatile boolean invalid;
  123. private volatile Exception invalidatingCause;
  124. private boolean invalidBitmap;
  125. private AtomicInteger transientErrorCount = new AtomicInteger();
  126. private byte[] packChecksum;
  127. private volatile PackIndex loadedIdx;
  128. private PackReverseIndex reverseIdx;
  129. private PackBitmapIndex bitmapIdx;
  130. /**
  131. * Objects we have tried to read, and discovered to be corrupt.
  132. * <p>
  133. * The list is allocated after the first corruption is found, and filled in
  134. * as more entries are discovered. Typically this list is never used, as
  135. * pack files do not usually contain corrupt objects.
  136. */
  137. private volatile LongList corruptObjects;
  138. /**
  139. * Construct a reader for an existing, pre-indexed packfile.
  140. *
  141. * @param packFile
  142. * path of the <code>.pack</code> file holding the data.
  143. * @param extensions
  144. * additional pack file extensions with the same base as the pack
  145. */
  146. public PackFile(File packFile, int extensions) {
  147. this.packFile = packFile;
  148. this.fileSnapshot = PackFileSnapshot.save(packFile);
  149. this.packLastModified = fileSnapshot.lastModifiedInstant();
  150. this.extensions = extensions;
  151. // Multiply by 31 here so we can more directly combine with another
  152. // value in WindowCache.hash(), without doing the multiply there.
  153. //
  154. hash = System.identityHashCode(this) * 31;
  155. length = Long.MAX_VALUE;
  156. }
  157. private PackIndex idx() throws IOException {
  158. PackIndex idx = loadedIdx;
  159. if (idx == null) {
  160. synchronized (this) {
  161. idx = loadedIdx;
  162. if (idx == null) {
  163. if (invalid) {
  164. throw new PackInvalidException(packFile, invalidatingCause);
  165. }
  166. try {
  167. long start = System.currentTimeMillis();
  168. idx = PackIndex.open(extFile(INDEX));
  169. if (LOG.isDebugEnabled()) {
  170. LOG.debug(String.format(
  171. "Opening pack index %s, size %.3f MB took %d ms", //$NON-NLS-1$
  172. extFile(INDEX).getAbsolutePath(),
  173. Float.valueOf(extFile(INDEX).length()
  174. / (1024f * 1024)),
  175. Long.valueOf(System.currentTimeMillis()
  176. - start)));
  177. }
  178. if (packChecksum == null) {
  179. packChecksum = idx.packChecksum;
  180. fileSnapshot.setChecksum(
  181. ObjectId.fromRaw(packChecksum));
  182. } else if (!Arrays.equals(packChecksum,
  183. idx.packChecksum)) {
  184. throw new PackMismatchException(MessageFormat
  185. .format(JGitText.get().packChecksumMismatch,
  186. packFile.getPath(),
  187. ObjectId.fromRaw(packChecksum)
  188. .name(),
  189. ObjectId.fromRaw(idx.packChecksum)
  190. .name()));
  191. }
  192. loadedIdx = idx;
  193. } catch (InterruptedIOException e) {
  194. // don't invalidate the pack, we are interrupted from
  195. // another thread
  196. throw e;
  197. } catch (IOException e) {
  198. invalid = true;
  199. invalidatingCause = e;
  200. throw e;
  201. }
  202. }
  203. }
  204. }
  205. return idx;
  206. }
  207. /**
  208. * Get the File object which locates this pack on disk.
  209. *
  210. * @return the File object which locates this pack on disk.
  211. */
  212. public File getPackFile() {
  213. return packFile;
  214. }
  215. /**
  216. * Get the index for this pack file.
  217. *
  218. * @return the index for this pack file.
  219. * @throws java.io.IOException
  220. */
  221. public PackIndex getIndex() throws IOException {
  222. return idx();
  223. }
  224. /**
  225. * Get name extracted from {@code pack-*.pack} pattern.
  226. *
  227. * @return name extracted from {@code pack-*.pack} pattern.
  228. */
  229. public String getPackName() {
  230. String name = packName;
  231. if (name == null) {
  232. name = getPackFile().getName();
  233. if (name.startsWith("pack-")) //$NON-NLS-1$
  234. name = name.substring("pack-".length()); //$NON-NLS-1$
  235. if (name.endsWith(".pack")) //$NON-NLS-1$
  236. name = name.substring(0, name.length() - ".pack".length()); //$NON-NLS-1$
  237. packName = name;
  238. }
  239. return name;
  240. }
  241. /**
  242. * Determine if an object is contained within the pack file.
  243. * <p>
  244. * For performance reasons only the index file is searched; the main pack
  245. * content is ignored entirely.
  246. * </p>
  247. *
  248. * @param id
  249. * the object to look for. Must not be null.
  250. * @return true if the object is in this pack; false otherwise.
  251. * @throws java.io.IOException
  252. * the index file cannot be loaded into memory.
  253. */
  254. public boolean hasObject(AnyObjectId id) throws IOException {
  255. final long offset = idx().findOffset(id);
  256. return 0 < offset && !isCorrupt(offset);
  257. }
  258. /**
  259. * Determines whether a .keep file exists for this pack file.
  260. *
261. * @return true if a .keep file exists.
  262. */
  263. public boolean shouldBeKept() {
  264. if (keepFile == null)
  265. keepFile = extFile(KEEP);
  266. return keepFile.exists();
  267. }
  268. /**
  269. * Get an object from this pack.
  270. *
  271. * @param curs
  272. * temporary working space associated with the calling thread.
  273. * @param id
  274. * the object to obtain from the pack. Must not be null.
  275. * @return the object loader for the requested object if it is contained in
  276. * this pack; null if the object was not found.
  277. * @throws IOException
  278. * the pack file or the index could not be read.
  279. */
  280. ObjectLoader get(WindowCursor curs, AnyObjectId id)
  281. throws IOException {
  282. final long offset = idx().findOffset(id);
  283. return 0 < offset && !isCorrupt(offset) ? load(curs, offset) : null;
  284. }
  285. void resolve(Set<ObjectId> matches, AbbreviatedObjectId id, int matchLimit)
  286. throws IOException {
  287. idx().resolve(matches, id, matchLimit);
  288. }
  289. /**
  290. * Close the resources utilized by this repository
  291. */
  292. public void close() {
  293. WindowCache.purge(this);
  294. synchronized (this) {
  295. loadedIdx = null;
  296. reverseIdx = null;
  297. }
  298. }
  299. /**
  300. * {@inheritDoc}
  301. * <p>
302. * Provide an iterator over entries in the associated pack index; these should
303. * also exist in this pack file. Objects returned by the iterator are mutable
  304. * during iteration.
  305. * <p>
  306. * Iterator returns objects in SHA-1 lexicographical order.
  307. * </p>
  308. *
  309. * @see PackIndex#iterator()
  310. */
  311. @Override
  312. public Iterator<PackIndex.MutableEntry> iterator() {
  313. try {
  314. return idx().iterator();
  315. } catch (IOException e) {
  316. return Collections.<PackIndex.MutableEntry> emptyList().iterator();
  317. }
  318. }
  319. /**
  320. * Obtain the total number of objects available in this pack. This method
321. * relies on the pack index, giving the number of effectively available objects.
322. *
323. * @return number of objects in the index of this pack, and thus in this pack
  324. * @throws IOException
  325. * the index file cannot be loaded into memory.
  326. */
  327. long getObjectCount() throws IOException {
  328. return idx().getObjectCount();
  329. }
  330. /**
  331. * Search for object id with the specified start offset in associated pack
  332. * (reverse) index.
  333. *
  334. * @param offset
  335. * start offset of object to find
  336. * @return object id for this offset, or null if no object was found
  337. * @throws IOException
  338. * the index file cannot be loaded into memory.
  339. */
  340. ObjectId findObjectForOffset(long offset) throws IOException {
  341. return getReverseIdx().findObject(offset);
  342. }
  343. /**
344. * Return the {@link FileSnapshot} associated with the underlying packfile
345. * that was used when this object was created.
346. *
347. * @return the packfile {@link FileSnapshot} that the object was loaded from.
  348. */
  349. PackFileSnapshot getFileSnapshot() {
  350. return fileSnapshot;
  351. }
  352. AnyObjectId getPackChecksum() {
  353. return ObjectId.fromRaw(packChecksum);
  354. }
  355. private final byte[] decompress(final long position, final int sz,
  356. final WindowCursor curs) throws IOException, DataFormatException {
  357. byte[] dstbuf;
  358. try {
  359. dstbuf = new byte[sz];
  360. } catch (OutOfMemoryError noMemory) {
  361. // The size may be larger than our heap allows, return null to
  362. // let the caller know allocation isn't possible and it should
  363. // use the large object streaming approach instead.
  364. //
  365. // For example, this can occur when sz is 640 MB, and JRE
  366. // maximum heap size is only 256 MB. Even if the JRE has
  367. // 200 MB free, it cannot allocate a 640 MB byte array.
  368. return null;
  369. }
  370. if (curs.inflate(this, position, dstbuf, false) != sz)
  371. throw new EOFException(MessageFormat.format(
  372. JGitText.get().shortCompressedStreamAt,
  373. Long.valueOf(position)));
  374. return dstbuf;
  375. }
  376. void copyPackAsIs(PackOutputStream out, WindowCursor curs)
  377. throws IOException {
  378. // Pin the first window, this ensures the length is accurate.
  379. curs.pin(this, 0);
  380. curs.copyPackAsIs(this, length, out);
  381. }
  382. final void copyAsIs(PackOutputStream out, LocalObjectToPack src,
  383. boolean validate, WindowCursor curs) throws IOException,
  384. StoredObjectRepresentationNotAvailableException {
  385. beginCopyAsIs(src);
  386. try {
  387. copyAsIs2(out, src, validate, curs);
  388. } finally {
  389. endCopyAsIs();
  390. }
  391. }
  392. private void copyAsIs2(PackOutputStream out, LocalObjectToPack src,
  393. boolean validate, WindowCursor curs) throws IOException,
  394. StoredObjectRepresentationNotAvailableException {
  395. final CRC32 crc1 = validate ? new CRC32() : null;
  396. final CRC32 crc2 = validate ? new CRC32() : null;
  397. final byte[] buf = out.getCopyBuffer();
  398. // Rip apart the header so we can discover the size.
  399. //
  400. readFully(src.offset, buf, 0, 20, curs);
  401. int c = buf[0] & 0xff;
  402. final int typeCode = (c >> 4) & 7;
  403. long inflatedLength = c & 15;
  404. int shift = 4;
  405. int headerCnt = 1;
  406. while ((c & 0x80) != 0) {
  407. c = buf[headerCnt++] & 0xff;
  408. inflatedLength += ((long) (c & 0x7f)) << shift;
  409. shift += 7;
  410. }
  411. if (typeCode == Constants.OBJ_OFS_DELTA) {
  412. do {
  413. c = buf[headerCnt++] & 0xff;
  414. } while ((c & 128) != 0);
  415. if (validate) {
  416. assert(crc1 != null && crc2 != null);
  417. crc1.update(buf, 0, headerCnt);
  418. crc2.update(buf, 0, headerCnt);
  419. }
  420. } else if (typeCode == Constants.OBJ_REF_DELTA) {
  421. if (validate) {
  422. assert(crc1 != null && crc2 != null);
  423. crc1.update(buf, 0, headerCnt);
  424. crc2.update(buf, 0, headerCnt);
  425. }
  426. readFully(src.offset + headerCnt, buf, 0, 20, curs);
  427. if (validate) {
  428. assert(crc1 != null && crc2 != null);
  429. crc1.update(buf, 0, 20);
  430. crc2.update(buf, 0, 20);
  431. }
  432. headerCnt += 20;
  433. } else if (validate) {
  434. assert(crc1 != null && crc2 != null);
  435. crc1.update(buf, 0, headerCnt);
  436. crc2.update(buf, 0, headerCnt);
  437. }
  438. final long dataOffset = src.offset + headerCnt;
  439. final long dataLength = src.length;
  440. final long expectedCRC;
  441. final ByteArrayWindow quickCopy;
  442. // Verify the object isn't corrupt before sending. If it is,
  443. // we report it missing instead.
  444. //
  445. try {
  446. quickCopy = curs.quickCopy(this, dataOffset, dataLength);
  447. if (validate && idx().hasCRC32Support()) {
  448. assert(crc1 != null);
  449. // Index has the CRC32 code cached, validate the object.
  450. //
  451. expectedCRC = idx().findCRC32(src);
  452. if (quickCopy != null) {
  453. quickCopy.crc32(crc1, dataOffset, (int) dataLength);
  454. } else {
  455. long pos = dataOffset;
  456. long cnt = dataLength;
  457. while (cnt > 0) {
  458. final int n = (int) Math.min(cnt, buf.length);
  459. readFully(pos, buf, 0, n, curs);
  460. crc1.update(buf, 0, n);
  461. pos += n;
  462. cnt -= n;
  463. }
  464. }
  465. if (crc1.getValue() != expectedCRC) {
  466. setCorrupt(src.offset);
  467. throw new CorruptObjectException(MessageFormat.format(
  468. JGitText.get().objectAtHasBadZlibStream,
  469. Long.valueOf(src.offset), getPackFile()));
  470. }
  471. } else if (validate) {
  472. // We don't have a CRC32 code in the index, so compute it
  473. // now while inflating the raw data to get zlib to tell us
  474. // whether or not the data is safe.
  475. //
  476. Inflater inf = curs.inflater();
  477. byte[] tmp = new byte[1024];
  478. if (quickCopy != null) {
  479. quickCopy.check(inf, tmp, dataOffset, (int) dataLength);
  480. } else {
  481. assert(crc1 != null);
  482. long pos = dataOffset;
  483. long cnt = dataLength;
  484. while (cnt > 0) {
  485. final int n = (int) Math.min(cnt, buf.length);
  486. readFully(pos, buf, 0, n, curs);
  487. crc1.update(buf, 0, n);
  488. inf.setInput(buf, 0, n);
  489. while (inf.inflate(tmp, 0, tmp.length) > 0)
  490. continue;
  491. pos += n;
  492. cnt -= n;
  493. }
  494. }
  495. if (!inf.finished() || inf.getBytesRead() != dataLength) {
  496. setCorrupt(src.offset);
  497. throw new EOFException(MessageFormat.format(
  498. JGitText.get().shortCompressedStreamAt,
  499. Long.valueOf(src.offset)));
  500. }
  501. assert(crc1 != null);
  502. expectedCRC = crc1.getValue();
  503. } else {
  504. expectedCRC = -1;
  505. }
  506. } catch (DataFormatException dataFormat) {
  507. setCorrupt(src.offset);
  508. CorruptObjectException corruptObject = new CorruptObjectException(
  509. MessageFormat.format(
  510. JGitText.get().objectAtHasBadZlibStream,
  511. Long.valueOf(src.offset), getPackFile()),
  512. dataFormat);
  513. throw new StoredObjectRepresentationNotAvailableException(src,
  514. corruptObject);
  515. } catch (IOException ioError) {
  516. throw new StoredObjectRepresentationNotAvailableException(src,
  517. ioError);
  518. }
  519. if (quickCopy != null) {
  520. // The entire object fits into a single byte array window slice,
  521. // and we have it pinned. Write this out without copying.
  522. //
  523. out.writeHeader(src, inflatedLength);
  524. quickCopy.write(out, dataOffset, (int) dataLength);
  525. } else if (dataLength <= buf.length) {
  526. // Tiny optimization: Lots of objects are very small deltas or
  527. // deflated commits that are likely to fit in the copy buffer.
  528. //
  529. if (!validate) {
  530. long pos = dataOffset;
  531. long cnt = dataLength;
  532. while (cnt > 0) {
  533. final int n = (int) Math.min(cnt, buf.length);
  534. readFully(pos, buf, 0, n, curs);
  535. pos += n;
  536. cnt -= n;
  537. }
  538. }
  539. out.writeHeader(src, inflatedLength);
  540. out.write(buf, 0, (int) dataLength);
  541. } else {
  542. // Now we are committed to sending the object. As we spool it out,
  543. // check its CRC32 code to make sure there wasn't corruption between
  544. // the verification we did above, and us actually outputting it.
  545. //
  546. out.writeHeader(src, inflatedLength);
  547. long pos = dataOffset;
  548. long cnt = dataLength;
  549. while (cnt > 0) {
  550. final int n = (int) Math.min(cnt, buf.length);
  551. readFully(pos, buf, 0, n, curs);
  552. if (validate) {
  553. assert(crc2 != null);
  554. crc2.update(buf, 0, n);
  555. }
  556. out.write(buf, 0, n);
  557. pos += n;
  558. cnt -= n;
  559. }
  560. if (validate) {
  561. assert(crc2 != null);
  562. if (crc2.getValue() != expectedCRC) {
  563. throw new CorruptObjectException(MessageFormat.format(
  564. JGitText.get().objectAtHasBadZlibStream,
  565. Long.valueOf(src.offset), getPackFile()));
  566. }
  567. }
  568. }
  569. }
  570. boolean invalid() {
  571. return invalid;
  572. }
  573. void setInvalid() {
  574. invalid = true;
  575. }
  576. int incrementTransientErrorCount() {
  577. return transientErrorCount.incrementAndGet();
  578. }
  579. void resetTransientErrorCount() {
  580. transientErrorCount.set(0);
  581. }
  582. private void readFully(final long position, final byte[] dstbuf,
  583. int dstoff, final int cnt, final WindowCursor curs)
  584. throws IOException {
  585. if (curs.copy(this, position, dstbuf, dstoff, cnt) != cnt)
  586. throw new EOFException();
  587. }
  588. private synchronized void beginCopyAsIs(ObjectToPack otp)
  589. throws StoredObjectRepresentationNotAvailableException {
  590. if (++activeCopyRawData == 1 && activeWindows == 0) {
  591. try {
  592. doOpen();
  593. } catch (IOException thisPackNotValid) {
  594. throw new StoredObjectRepresentationNotAvailableException(otp,
  595. thisPackNotValid);
  596. }
  597. }
  598. }
  599. private synchronized void endCopyAsIs() {
  600. if (--activeCopyRawData == 0 && activeWindows == 0)
  601. doClose();
  602. }
  603. synchronized boolean beginWindowCache() throws IOException {
  604. if (++activeWindows == 1) {
  605. if (activeCopyRawData == 0)
  606. doOpen();
  607. return true;
  608. }
  609. return false;
  610. }
  611. synchronized boolean endWindowCache() {
  612. final boolean r = --activeWindows == 0;
  613. if (r && activeCopyRawData == 0)
  614. doClose();
  615. return r;
  616. }
  617. private void doOpen() throws IOException {
  618. if (invalid) {
  619. openFail(true, invalidatingCause);
  620. throw new PackInvalidException(packFile, invalidatingCause);
  621. }
  622. try {
  623. synchronized (readLock) {
  624. fd = new RandomAccessFile(packFile, "r"); //$NON-NLS-1$
  625. length = fd.length();
  626. onOpenPack();
  627. }
  628. } catch (InterruptedIOException e) {
  629. // don't invalidate the pack, we are interrupted from another thread
  630. openFail(false, e);
  631. throw e;
  632. } catch (FileNotFoundException fn) {
  633. // don't invalidate the pack if opening an existing file failed
  634. // since it may be related to a temporary lack of resources (e.g.
  635. // max open files)
  636. openFail(!packFile.exists(), fn);
  637. throw fn;
  638. } catch (EOFException | AccessDeniedException | NoSuchFileException
  639. | CorruptObjectException | NoPackSignatureException
  640. | PackMismatchException | UnpackException
  641. | UnsupportedPackIndexVersionException
  642. | UnsupportedPackVersionException pe) {
  643. // exceptions signaling permanent problems with a pack
  644. openFail(true, pe);
  645. throw pe;
  646. } catch (IOException | RuntimeException ge) {
  647. // generic exceptions could be transient so we should not mark the
  648. // pack invalid to avoid false MissingObjectExceptions
  649. openFail(false, ge);
  650. throw ge;
  651. }
  652. }
  653. private void openFail(boolean invalidate, Exception cause) {
  654. activeWindows = 0;
  655. activeCopyRawData = 0;
  656. invalid = invalidate;
  657. invalidatingCause = cause;
  658. doClose();
  659. }
  660. private void doClose() {
  661. synchronized (readLock) {
  662. if (fd != null) {
  663. try {
  664. fd.close();
  665. } catch (IOException err) {
  666. // Ignore a close event. We had it open only for reading.
  667. // There should not be errors related to network buffers
  668. // not flushed, etc.
  669. }
  670. fd = null;
  671. }
  672. }
  673. }
  674. ByteArrayWindow read(long pos, int size) throws IOException {
  675. synchronized (readLock) {
  676. if (invalid || fd == null) {
677. // Due to a race between this read and a concurrent packfile invalidation,
678. // a thread can reach this point after the descriptor was closed and would
679. // otherwise fail with an NPE. Detect the situation and throw a proper
680. // exception so the main packfile search loop can handle it and the Git
681. // client does not see a failure.
  682. throw new PackInvalidException(packFile, invalidatingCause);
  683. }
  684. if (length < pos + size)
  685. size = (int) (length - pos);
  686. final byte[] buf = new byte[size];
  687. fd.seek(pos);
  688. fd.readFully(buf, 0, size);
  689. return new ByteArrayWindow(this, pos, buf);
  690. }
  691. }
  692. ByteWindow mmap(long pos, int size) throws IOException {
  693. synchronized (readLock) {
  694. if (length < pos + size)
  695. size = (int) (length - pos);
  696. MappedByteBuffer map;
  697. try {
  698. map = fd.getChannel().map(MapMode.READ_ONLY, pos, size);
  699. } catch (IOException ioe1) {
  700. // The most likely reason this failed is the JVM has run out
  701. // of virtual memory. We need to discard quickly, and try to
  702. // force the GC to finalize and release any existing mappings.
  703. //
  704. System.gc();
  705. System.runFinalization();
  706. map = fd.getChannel().map(MapMode.READ_ONLY, pos, size);
  707. }
  708. if (map.hasArray())
  709. return new ByteArrayWindow(this, pos, map.array());
  710. return new ByteBufferWindow(this, pos, map);
  711. }
  712. }
  713. private void onOpenPack() throws IOException {
  714. final PackIndex idx = idx();
  715. final byte[] buf = new byte[20];
  716. fd.seek(0);
  717. fd.readFully(buf, 0, 12);
  718. if (RawParseUtils.match(buf, 0, Constants.PACK_SIGNATURE) != 4) {
  719. throw new NoPackSignatureException(JGitText.get().notAPACKFile);
  720. }
  721. final long vers = NB.decodeUInt32(buf, 4);
  722. final long packCnt = NB.decodeUInt32(buf, 8);
  723. if (vers != 2 && vers != 3) {
  724. throw new UnsupportedPackVersionException(vers);
  725. }
  726. if (packCnt != idx.getObjectCount()) {
  727. throw new PackMismatchException(MessageFormat.format(
  728. JGitText.get().packObjectCountMismatch,
  729. Long.valueOf(packCnt), Long.valueOf(idx.getObjectCount()),
  730. getPackFile()));
  731. }
  732. fd.seek(length - 20);
  733. fd.readFully(buf, 0, 20);
  734. if (!Arrays.equals(buf, packChecksum)) {
  735. throw new PackMismatchException(MessageFormat.format(
  736. JGitText.get().packChecksumMismatch,
  737. getPackFile(),
  738. ObjectId.fromRaw(buf).name(),
  739. ObjectId.fromRaw(idx.packChecksum).name()));
  740. }
  741. }
  742. ObjectLoader load(WindowCursor curs, long pos)
  743. throws IOException, LargeObjectException {
  744. try {
  745. final byte[] ib = curs.tempId;
  746. Delta delta = null;
  747. byte[] data = null;
  748. int type = Constants.OBJ_BAD;
  749. boolean cached = false;
  750. SEARCH: for (;;) {
  751. readFully(pos, ib, 0, 20, curs);
  752. int c = ib[0] & 0xff;
  753. final int typeCode = (c >> 4) & 7;
  754. long sz = c & 15;
  755. int shift = 4;
  756. int p = 1;
  757. while ((c & 0x80) != 0) {
  758. c = ib[p++] & 0xff;
  759. sz += ((long) (c & 0x7f)) << shift;
  760. shift += 7;
  761. }
  762. switch (typeCode) {
  763. case Constants.OBJ_COMMIT:
  764. case Constants.OBJ_TREE:
  765. case Constants.OBJ_BLOB:
  766. case Constants.OBJ_TAG: {
  767. if (delta != null || sz < curs.getStreamFileThreshold())
  768. data = decompress(pos + p, (int) sz, curs);
  769. if (delta != null) {
  770. type = typeCode;
  771. break SEARCH;
  772. }
  773. if (data != null)
  774. return new ObjectLoader.SmallObject(typeCode, data);
  775. else
  776. return new LargePackedWholeObject(typeCode, sz, pos, p,
  777. this, curs.db);
  778. }
  779. case Constants.OBJ_OFS_DELTA: {
780. c = ib[p++] & 0xff;
781. long base = c & 127; // ofs-delta: distance back to the base, base-128 varint
782. while ((c & 128) != 0) {
783. base += 1; // +1 keeps encodings of different lengths from overlapping
784. c = ib[p++] & 0xff;
785. base <<= 7;
786. base += (c & 127);
  787. }
  788. base = pos - base;
  789. delta = new Delta(delta, pos, (int) sz, p, base);
  790. if (sz != delta.deltaSize)
  791. break SEARCH;
  792. DeltaBaseCache.Entry e = curs.getDeltaBaseCache().get(this, base);
  793. if (e != null) {
  794. type = e.type;
  795. data = e.data;
  796. cached = true;
  797. break SEARCH;
  798. }
  799. pos = base;
  800. continue SEARCH;
  801. }
  802. case Constants.OBJ_REF_DELTA: {
  803. readFully(pos + p, ib, 0, 20, curs);
  804. long base = findDeltaBase(ObjectId.fromRaw(ib));
  805. delta = new Delta(delta, pos, (int) sz, p + 20, base);
  806. if (sz != delta.deltaSize)
  807. break SEARCH;
  808. DeltaBaseCache.Entry e = curs.getDeltaBaseCache().get(this, base);
  809. if (e != null) {
  810. type = e.type;
  811. data = e.data;
  812. cached = true;
  813. break SEARCH;
  814. }
  815. pos = base;
  816. continue SEARCH;
  817. }
  818. default:
  819. throw new IOException(MessageFormat.format(
  820. JGitText.get().unknownObjectType,
  821. Integer.valueOf(typeCode)));
  822. }
  823. }
  824. // At this point there is at least one delta to apply to data.
  825. // (Whole objects with no deltas to apply return early above.)
  826. if (data == null)
  827. throw new IOException(JGitText.get().inMemoryBufferLimitExceeded);
  828. assert(delta != null);
  829. do {
  830. // Cache only the base immediately before desired object.
  831. if (cached)
  832. cached = false;
  833. else if (delta.next == null)
  834. curs.getDeltaBaseCache().store(this, delta.basePos, data, type);
  835. pos = delta.deltaPos;
  836. final byte[] cmds = decompress(pos + delta.hdrLen,
  837. delta.deltaSize, curs);
  838. if (cmds == null) {
  839. data = null; // Discard base in case of OutOfMemoryError
  840. throw new LargeObjectException.OutOfMemory(new OutOfMemoryError());
  841. }
  842. final long sz = BinaryDelta.getResultSize(cmds);
  843. if (Integer.MAX_VALUE <= sz)
  844. throw new LargeObjectException.ExceedsByteArrayLimit();
  845. final byte[] result;
  846. try {
  847. result = new byte[(int) sz];
  848. } catch (OutOfMemoryError tooBig) {
  849. data = null; // Discard base in case of OutOfMemoryError
  850. throw new LargeObjectException.OutOfMemory(tooBig);
  851. }
  852. BinaryDelta.apply(data, cmds, result);
  853. data = result;
  854. delta = delta.next;
  855. } while (delta != null);
  856. return new ObjectLoader.SmallObject(type, data);
  857. } catch (DataFormatException dfe) {
  858. throw new CorruptObjectException(
  859. MessageFormat.format(
  860. JGitText.get().objectAtHasBadZlibStream,
  861. Long.valueOf(pos), getPackFile()),
  862. dfe);
  863. }
  864. }
  865. private long findDeltaBase(ObjectId baseId) throws IOException,
  866. MissingObjectException {
  867. long ofs = idx().findOffset(baseId);
  868. if (ofs < 0)
  869. throw new MissingObjectException(baseId,
  870. JGitText.get().missingDeltaBase);
  871. return ofs;
  872. }
  873. private static class Delta {
  874. /** Child that applies onto this object. */
  875. final Delta next;
  876. /** Offset of the delta object. */
  877. final long deltaPos;
  878. /** Size of the inflated delta stream. */
  879. final int deltaSize;
  880. /** Total size of the delta's pack entry header (including base). */
  881. final int hdrLen;
  882. /** Offset of the base object this delta applies onto. */
  883. final long basePos;
  884. Delta(Delta next, long ofs, int sz, int hdrLen, long baseOffset) {
  885. this.next = next;
  886. this.deltaPos = ofs;
  887. this.deltaSize = sz;
  888. this.hdrLen = hdrLen;
  889. this.basePos = baseOffset;
  890. }
  891. }
  892. byte[] getDeltaHeader(WindowCursor wc, long pos)
  893. throws IOException, DataFormatException {
  894. // The delta stream starts as two variable length integers. If we
  895. // assume they are 64 bits each, we need 16 bytes to encode them,
  896. // plus 2 extra bytes for the variable length overhead. So 18 is
  897. // the longest delta instruction header.
  898. //
  899. final byte[] hdr = new byte[18];
  900. wc.inflate(this, pos, hdr, true /* headerOnly */);
  901. return hdr;
  902. }
  903. int getObjectType(WindowCursor curs, long pos) throws IOException {
  904. final byte[] ib = curs.tempId;
  905. for (;;) {
  906. readFully(pos, ib, 0, 20, curs);
  907. int c = ib[0] & 0xff;
  908. final int type = (c >> 4) & 7;
  909. switch (type) {
  910. case Constants.OBJ_COMMIT:
  911. case Constants.OBJ_TREE:
  912. case Constants.OBJ_BLOB:
  913. case Constants.OBJ_TAG:
  914. return type;
  915. case Constants.OBJ_OFS_DELTA: {
  916. int p = 1;
  917. while ((c & 0x80) != 0)
  918. c = ib[p++] & 0xff;
  919. c = ib[p++] & 0xff;
  920. long ofs = c & 127;
  921. while ((c & 128) != 0) {
  922. ofs += 1;
  923. c = ib[p++] & 0xff;
  924. ofs <<= 7;
  925. ofs += (c & 127);
  926. }
  927. pos = pos - ofs;
  928. continue;
  929. }
  930. case Constants.OBJ_REF_DELTA: {
  931. int p = 1;
  932. while ((c & 0x80) != 0)
  933. c = ib[p++] & 0xff;
  934. readFully(pos + p, ib, 0, 20, curs);
  935. pos = findDeltaBase(ObjectId.fromRaw(ib));
  936. continue;
  937. }
  938. default:
  939. throw new IOException(
  940. MessageFormat.format(JGitText.get().unknownObjectType,
  941. Integer.valueOf(type)));
  942. }
  943. }
  944. }
  945. long getObjectSize(WindowCursor curs, AnyObjectId id)
  946. throws IOException {
  947. final long offset = idx().findOffset(id);
  948. return 0 < offset ? getObjectSize(curs, offset) : -1;
  949. }
  950. long getObjectSize(WindowCursor curs, long pos)
  951. throws IOException {
  952. final byte[] ib = curs.tempId;
  953. readFully(pos, ib, 0, 20, curs);
  954. int c = ib[0] & 0xff;
  955. final int type = (c >> 4) & 7;
  956. long sz = c & 15;
  957. int shift = 4;
  958. int p = 1;
  959. while ((c & 0x80) != 0) {
  960. c = ib[p++] & 0xff;
  961. sz += ((long) (c & 0x7f)) << shift;
  962. shift += 7;
  963. }
  964. long deltaAt;
  965. switch (type) {
  966. case Constants.OBJ_COMMIT:
  967. case Constants.OBJ_TREE:
  968. case Constants.OBJ_BLOB:
  969. case Constants.OBJ_TAG:
  970. return sz;
  971. case Constants.OBJ_OFS_DELTA:
  972. c = ib[p++] & 0xff;
  973. while ((c & 128) != 0)
  974. c = ib[p++] & 0xff;
  975. deltaAt = pos + p;
  976. break;
  977. case Constants.OBJ_REF_DELTA:
  978. deltaAt = pos + p + 20;
  979. break;
  980. default:
  981. throw new IOException(MessageFormat.format(
  982. JGitText.get().unknownObjectType, Integer.valueOf(type)));
  983. }
  984. try {
  985. return BinaryDelta.getResultSize(getDeltaHeader(curs, deltaAt));
  986. } catch (DataFormatException e) {
  987. throw new CorruptObjectException(MessageFormat.format(
  988. JGitText.get().objectAtHasBadZlibStream, Long.valueOf(pos),
  989. getPackFile()));
  990. }
  991. }
  992. LocalObjectRepresentation representation(final WindowCursor curs,
  993. final AnyObjectId objectId) throws IOException {
  994. final long pos = idx().findOffset(objectId);
  995. if (pos < 0)
  996. return null;
  997. final byte[] ib = curs.tempId;
  998. readFully(pos, ib, 0, 20, curs);
  999. int c = ib[0] & 0xff;
  1000. int p = 1;
  1001. final int typeCode = (c >> 4) & 7;
  1002. while ((c & 0x80) != 0)
  1003. c = ib[p++] & 0xff;
  1004. long len = (findEndOffset(pos) - pos);
  1005. switch (typeCode) {
  1006. case Constants.OBJ_COMMIT:
  1007. case Constants.OBJ_TREE:
  1008. case Constants.OBJ_BLOB:
  1009. case Constants.OBJ_TAG:
  1010. return LocalObjectRepresentation.newWhole(this, pos, len - p);
  1011. case Constants.OBJ_OFS_DELTA: {
  1012. c = ib[p++] & 0xff;
  1013. long ofs = c & 127;
  1014. while ((c & 128) != 0) {
  1015. ofs += 1;
  1016. c = ib[p++] & 0xff;
  1017. ofs <<= 7;
  1018. ofs += (c & 127);
  1019. }
  1020. ofs = pos - ofs;
  1021. return LocalObjectRepresentation.newDelta(this, pos, len - p, ofs);
  1022. }
  1023. case Constants.OBJ_REF_DELTA: {
  1024. len -= p;
  1025. len -= Constants.OBJECT_ID_LENGTH;
  1026. readFully(pos + p, ib, 0, 20, curs);
  1027. ObjectId id = ObjectId.fromRaw(ib);
  1028. return LocalObjectRepresentation.newDelta(this, pos, len, id);
  1029. }
  1030. default:
  1031. throw new IOException(
  1032. MessageFormat.format(JGitText.get().unknownObjectType,
  1033. Integer.valueOf(typeCode)));
  1034. }
  1035. }
  1036. private long findEndOffset(long startOffset)
  1037. throws IOException, CorruptObjectException {
  1038. final long maxOffset = length - 20;
  1039. return getReverseIdx().findNextOffset(startOffset, maxOffset);
  1040. }
  1041. synchronized PackBitmapIndex getBitmapIndex() throws IOException {
  1042. if (invalid || invalidBitmap)
  1043. return null;
  1044. if (bitmapIdx == null && hasExt(BITMAP_INDEX)) {
  1045. final PackBitmapIndex idx;
  1046. try {
  1047. idx = PackBitmapIndex.open(extFile(BITMAP_INDEX), idx(),
  1048. getReverseIdx());
  1049. } catch (FileNotFoundException e) {
  1050. // Once upon a time this bitmap file existed. Now it
  1051. // has been removed. Most likely an external gc has
1052. // removed this packfile and the bitmap.
  1053. invalidBitmap = true;
  1054. return null;
  1055. }
  1056. // At this point, idx() will have set packChecksum.
  1057. if (Arrays.equals(packChecksum, idx.packChecksum))
  1058. bitmapIdx = idx;
  1059. else
  1060. invalidBitmap = true;
  1061. }
  1062. return bitmapIdx;
  1063. }
  1064. private synchronized PackReverseIndex getReverseIdx() throws IOException {
  1065. if (reverseIdx == null)
  1066. reverseIdx = new PackReverseIndex(idx());
  1067. return reverseIdx;
  1068. }
  1069. private boolean isCorrupt(long offset) {
  1070. LongList list = corruptObjects;
  1071. if (list == null)
  1072. return false;
  1073. synchronized (list) {
  1074. return list.contains(offset);
  1075. }
  1076. }
  1077. private void setCorrupt(long offset) {
  1078. LongList list = corruptObjects;
  1079. if (list == null) {
  1080. synchronized (readLock) {
  1081. list = corruptObjects;
  1082. if (list == null) {
  1083. list = new LongList();
  1084. corruptObjects = list;
  1085. }
  1086. }
  1087. }
  1088. synchronized (list) {
  1089. list.add(offset);
  1090. }
  1091. }
  1092. private File extFile(PackExt ext) {
  1093. String p = packFile.getName();
  1094. int dot = p.lastIndexOf('.');
  1095. String b = (dot < 0) ? p : p.substring(0, dot);
  1096. return new File(packFile.getParentFile(), b + '.' + ext.getExtension());
  1097. }
  1098. private boolean hasExt(PackExt ext) {
  1099. return (extensions & ext.getBit()) != 0;
  1100. }
  1101. @SuppressWarnings("nls")
  1102. @Override
  1103. public String toString() {
  1104. return "PackFile [packFileName=" + packFile.getName() + ", length="
  1105. + packFile.length() + ", packChecksum="
  1106. + ObjectId.fromRaw(packChecksum).name() + "]";
  1107. }
  1108. }