
Don't use interruptible pread() to access pack files

The J2SE NIO APIs require that FileChannel close the underlying file descriptor if a thread is interrupted while it is inside a read or write operation on that channel. This is insane, because it means we cannot share the file descriptor between threads. If a thread is in the middle of the FileChannel variant of IO.readFully() and it receives an interrupt, the pack will be automatically closed on us. This causes the other threads trying to use that same FileChannel to receive IOExceptions, which leads to the pack getting marked as invalid. Once the pack is marked invalid, JGit loses access to its entire contents and starts to report MissingObjectExceptions.

Because PackWriter must ensure that the chosen pack file stays available until the current object's data is fully copied to the output, JGit cannot simply reopen the pack when it is automatically closed due to an interrupt being sent at the wrong time. The pack may have been deleted by a concurrent `git gc` process, and that open file descriptor might be the last reference to the inode on disk. Once it is closed, the PackWriter loses access to that object representation, and it cannot complete sending the object to the client.

Fortunately, RandomAccessFile's readFully method does not have this problem. Interrupts during readFully() are ignored. However, it requires us to first seek to the offset we need to read, then issue the read call. This requires locking around the file descriptor to prevent concurrent threads from moving the pointer before the read.

This reduces the concurrency level, as now only one window can be paged in at a time from each pack. However, the WindowCache should already be holding most of the pages required to handle the working set for a process, and its own internal locking was already limiting the number of concurrent loads possible. Provided that most concurrent accesses are getting hits in the WindowCache, or are for different repositories on the same server, we shouldn't see a major performance hit due to the more serialized loading.

I would have preferred to use a pool of RandomAccessFiles for each pack, with threads borrowing an instance dedicated to that thread whenever they needed to page in a window. This would permit much higher levels of concurrency by using multiple file descriptors (and file pointers) for each pack. However, the code became too complex to develop in any reasonable period of time, so I've chosen to retrofit the existing code with more serialization instead.

Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago
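For illustration only, here is a minimal Java sketch of the approach the commit above describes; it is not JGit's actual window-loading code, and the class and method names are invented. It pages a window of a pack file in with RandomAccessFile.readFully(), which ignores thread interrupts, and holds a lock across the seek and the read so that concurrent threads sharing the descriptor cannot move the file pointer in between:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;

    class PackWindowReader {
        private final RandomAccessFile raf;

        PackWindowReader(File packFile) throws IOException {
            // One shared descriptor per pack; reads are serialized below.
            this.raf = new RandomAccessFile(packFile, "r");
        }

        byte[] readWindow(long offset, int size) throws IOException {
            byte[] window = new byte[size];
            synchronized (raf) {
                // seek + readFully must be atomic with respect to other
                // threads, otherwise another thread could move the shared
                // file pointer between the two calls.
                raf.seek(offset);
                raf.readFully(window, 0, size);
            }
            return window;
        }
    }

Unlike a FileChannel read, an interrupt delivered while readFully() is blocked does not close the descriptor, so other threads paging from the same pack are unaffected; the cost is that only one window per pack can be loaded at a time, as the commit explains.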
Added read/write support for pack bitmap index.

A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter.

Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId.

Currently, the new index format can only be used with pack files that contain a complete closure of the object graph, e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types, i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index and selecting the smallest representation. The XOR'd bitmap and the offset from the current entry to the position of the bitmap to XOR against are the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
11 years ago
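As a rough illustration of the XOR trick described above, the sketch below (not JGit's bitmap-index code; it only uses the javaewah EWAHCompressedBitmap class the commit mentions) stores an entry either as the plain bitmap or as its XOR difference against an earlier bitmap, whichever is smaller:

    import com.googlecode.javaewah.EWAHCompressedBitmap;

    class XorSelectionSketch {
        // Return the smaller representation: the bitmap itself, or its
        // XOR difference against a previously written bitmap.
        static EWAHCompressedBitmap smaller(EWAHCompressedBitmap current,
                EWAHCompressedBitmap prior) {
            EWAHCompressedBitmap xored = current.xor(prior);
            return xored.sizeInBytes() < current.sizeInBytes() ? xored : current;
        }

        public static void main(String[] args) {
            // Bit positions stand in for the offset-sorted integer mapping
            // of ObjectIds described in the commit message.
            EWAHCompressedBitmap prior = EWAHCompressedBitmap.bitmapOf(1, 2, 3, 4, 5);
            EWAHCompressedBitmap current = EWAHCompressedBitmap.bitmapOf(1, 2, 3, 4, 5, 6);
            System.out.println(smaller(current, prior).sizeInBytes());
        }
    }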
PackWriter: Support reuse of entire packs

The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs:

  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done
  (echo "+ $root";
   for n in $names; do echo "P $n"; done;
   echo) >>objects/info/cached-packs
  git repack -a -d

When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources.

With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but a slightly larger data transfer (+2.39 MiB):

  Before:
    remote: Counting objects: 1861830, done
    remote: Finding sources: 100% (1861830/1861830)
    remote: Getting sizes: 100% (88243/88243)
    remote: Compressing objects: 100% (88184/88184)
    Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
    remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
    Resolving deltas: 100% (1564621/1564621), done.

    real 3m19.005s

  After:
    remote: Counting objects: 1601, done
    remote: Counting objects: 1828460, done
    remote: Finding sources: 100% (50475/50475)
    remote: Getting sizes: 100% (18843/18843)
    remote: Compressing objects: 100% (7585/7585)
    remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
    Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
    Resolving deltas: 100% (1559477/1559477), done.

    real 2m2.938s

Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
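For readers who want to see the record layout the script above appends to objects/info/cached-packs, here is a hedged Java sketch of a parser; the class and record names are invented for illustration and are not JGit's API. Each record lists tip commits on "+ <sha1>" lines and pack names on "P <name>" lines, and ends with a blank line, exactly as the shell snippet writes them:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.util.ArrayList;
    import java.util.List;

    class CachedPackList {
        record CachedPack(List<String> tips, List<String> packNames) {}

        static List<CachedPack> parse(Reader in) throws IOException {
            List<CachedPack> result = new ArrayList<>();
            List<String> tips = new ArrayList<>();
            List<String> packs = new ArrayList<>();
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) {
                    // A blank line terminates the current record.
                    if (!tips.isEmpty() || !packs.isEmpty()) {
                        result.add(new CachedPack(List.copyOf(tips), List.copyOf(packs)));
                        tips.clear();
                        packs.clear();
                    }
                } else if (line.startsWith("+ ")) {
                    tips.add(line.substring(2).trim());   // tip commit SHA-1
                } else if (line.startsWith("P ")) {
                    packs.add(line.substring(2).trim());  // cached pack name
                }
            }
            return result;
        }
    }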
Fix PackInvalidException when fetch and repack run concurrently

We are running several servers with JGit. We need to run repack from time to time to keep the repos performant, i.e. after a push we test how many small packs are in the repo, and when a threshold is reached we run the repack.

After upgrading the JGit version we found that if someone does a clone while the repack is running, the clone sometimes (not always) fails because the repack removes the .pack file used by the clone. Server exception and client error attached.

I've tracked down the cause, and it seems to have been introduced between JGit 5.2 (which we upgraded from) and 5.3 by this commit: Move throw of PackInvalidException outside the catch - https://github.com/eclipse/jgit/commit/afef866a44cd65fef292c174cad445b3fb526400

The problem is that when the throw was inside the try block, the last catch block caught the exception and called the openFail(false) method. It is true that it called it with invalidate = false, which is wrong. The real problem, though, is that with the throw outside the try block, openFail is not called at all, and the fields activeWindows and activeCopyRawData are not reset to 0. This affects tests made later, such as: if (++activeCopyRawData == 1 && activeWindows == 0).

The fix is relatively simple: keep the throw outside the try block while still setting the invalid field to true.

I did exhaustive testing of the change, running concurrent clones and pushes indefinitely; with the patch applied it never fails, while without the patch it takes relatively little time to hit the error. See: https://www.eclipse.org/lists/jgit-dev/msg04014.html

Bug: 569349
Change-Id: I9dbf8801c8d3131955ad7124f42b62095d96da54
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
3 years ago
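The following is a simplified sketch of the control-flow issue described above, with invented names that only mirror the commit message; it is not JGit's PackFile source. When cleanup such as resetting activeWindows/activeCopyRawData and marking the pack invalid lives only in a catch block, a throw placed outside the try bypasses it, so the failure path has to set the invalid flag explicitly:

    import java.io.IOException;

    class PackAccessSketch {
        private volatile boolean invalid;
        private int activeWindows;
        private int activeCopyRawData;

        synchronized void beginWindowAccess() throws IOException {
            try {
                if (++activeCopyRawData == 1 && activeWindows == 0) {
                    doOpen(); // may fail if a concurrent gc deleted the pack
                }
            } catch (IOException cannotOpen) {
                // The catch block is the only place the counters are reset,
                // so the error path must run through it.
                openFail(true);
                throw cannotOpen;
            }
            if (invalid) {
                // Thrown outside the try: it must not rely on the catch block
                // for state, because openFail(true) already marked the pack.
                throw new IOException("pack is invalid");
            }
        }

        private void openFail(boolean invalidate) {
            activeWindows = 0;
            activeCopyRawData = 0;
            if (invalidate) {
                invalid = true;
            }
        }

        private void doOpen() throws IOException {
            // placeholder for opening the pack file
        }
    }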
Increase core.streamFileThreshold default to 50 MiB

Projects like org.eclipse.mdt contain large XML files about 6 MiB in size. So does the Android project platform/frameworks/base. Doing a clone of either project with JGit takes forever to check out the files into the working directory, because delta decompression tends to be very expensive as we need to constantly reposition the base stream for each copy instruction. This can be made worse by a very bad ordering of offsets, possibly due to an XML editor that doesn't preserve the order of elements in the file very well.

Increasing the threshold to the same limit PackWriter uses when doing delta compression (50 MiB) permits a default-configured JGit to decompress these XML file objects using the faster random-access arrays, rather than re-seeking through an inflate stream, significantly reducing checkout time after a clone.

Since this new limit may be dangerously close to the JVM maximum heap size, every allocation attempt is now wrapped in a try/catch so that JGit can degrade by switching to the large object stream mode when the allocation is refused. It will run slower, but the operation will still complete.

The large stream mode will run very well for big objects that aren't delta compressed, and is acceptable for delta compressed objects that are using only forward-referencing copy instructions. Copies using prior offsets are still going to be horrible, and there is nothing we can do about it except increase core.streamFileThreshold.

We might in the future want to consider changing the way the delta generators work in JGit and native C Git to avoid prior offsets once an object reaches a certain size, even if that causes the delta instruction stream to be slightly larger. Unfortunately native C Git won't want to do that until it's also able to stream objects rather than malloc them as contiguous blocks.

Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
13 years ago
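A minimal sketch of the degradation strategy described above; the class and method names are illustrative, not JGit's actual loader API. It tries to allocate a contiguous buffer for the whole object and, if the JVM refuses the allocation, signals the caller to fall back to the slower large-object streaming path instead of failing:

    class AllocationFallbackSketch {
        // Returns a whole-object buffer, or null if the caller must stream.
        static byte[] tryAllocateWhole(long size, long streamFileThreshold) {
            if (size > streamFileThreshold || size > Integer.MAX_VALUE) {
                return null; // over the configured limit: use stream mode
            }
            try {
                return new byte[(int) size];
            } catch (OutOfMemoryError heapRefused) {
                // Allocation refused: degrade to large-object stream mode.
                return null;
            }
        }
    }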
  1. /*
  2. * Copyright (C) 2008-2009, Google Inc.
  3. * Copyright (C) 2007, Robin Rosenberg <robin.rosenberg@dewire.com>
  4. * Copyright (C) 2006-2008, Shawn O. Pearce <spearce@spearce.org> and others
  5. *
  6. * This program and the accompanying materials are made available under the
  7. * terms of the Eclipse Distribution License v. 1.0 which is available at
  8. * https://www.eclipse.org/org/documents/edl-v10.php.
  9. *
  10. * SPDX-License-Identifier: BSD-3-Clause
  11. */
  12. package org.eclipse.jgit.internal.storage.file;
  13. import static org.eclipse.jgit.internal.storage.pack.PackExt.INDEX;
  14. import static org.eclipse.jgit.internal.storage.pack.PackExt.KEEP;
  15. import java.io.EOFException;
  16. import java.io.File;
  17. import java.io.FileNotFoundException;
  18. import java.io.IOException;
  19. import java.io.InterruptedIOException;
  20. import java.io.RandomAccessFile;
  21. import java.nio.MappedByteBuffer;
  22. import java.nio.channels.FileChannel.MapMode;
  23. import java.nio.file.AccessDeniedException;
  24. import java.nio.file.NoSuchFileException;
  25. import java.text.MessageFormat;
  26. import java.time.Instant;
  27. import java.util.Arrays;
  28. import java.util.Collections;
  29. import java.util.Comparator;
  30. import java.util.Iterator;
  31. import java.util.Set;
  32. import java.util.concurrent.atomic.AtomicInteger;
  33. import java.util.zip.CRC32;
  34. import java.util.zip.DataFormatException;
  35. import java.util.zip.Inflater;
  36. import org.eclipse.jgit.annotations.Nullable;
  37. import org.eclipse.jgit.errors.CorruptObjectException;
  38. import org.eclipse.jgit.errors.LargeObjectException;
  39. import org.eclipse.jgit.errors.MissingObjectException;
  40. import org.eclipse.jgit.errors.NoPackSignatureException;
  41. import org.eclipse.jgit.errors.PackInvalidException;
  42. import org.eclipse.jgit.errors.PackMismatchException;
  43. import org.eclipse.jgit.errors.StoredObjectRepresentationNotAvailableException;
  44. import org.eclipse.jgit.errors.StoredPackRepresentationNotAvailableException;
  45. import org.eclipse.jgit.errors.UnpackException;
  46. import org.eclipse.jgit.errors.UnsupportedPackIndexVersionException;
  47. import org.eclipse.jgit.errors.UnsupportedPackVersionException;
  48. import org.eclipse.jgit.internal.JGitText;
  49. import org.eclipse.jgit.internal.storage.pack.BinaryDelta;
  50. import org.eclipse.jgit.internal.storage.pack.ObjectToPack;
  51. import org.eclipse.jgit.internal.storage.pack.PackOutputStream;
  52. import org.eclipse.jgit.lib.AbbreviatedObjectId;
  53. import org.eclipse.jgit.lib.AnyObjectId;
  54. import org.eclipse.jgit.lib.Constants;
  55. import org.eclipse.jgit.lib.ObjectId;
  56. import org.eclipse.jgit.lib.ObjectLoader;
  57. import org.eclipse.jgit.util.LongList;
  58. import org.eclipse.jgit.util.NB;
  59. import org.eclipse.jgit.util.RawParseUtils;
  60. import org.slf4j.Logger;
  61. import org.slf4j.LoggerFactory;
  62. /**
  63. * A Git version 2 pack file representation. A pack file contains Git objects in
  64. * delta packed format yielding high compression of lots of objects where
  65. * some are similar.
  66. */
  67. public class Pack implements Iterable<PackIndex.MutableEntry> {
  68. private static final Logger LOG = LoggerFactory.getLogger(Pack.class);
  69. /**
  70. * Sorts Packs from most recently created to least recently created.
  71. */
  72. public static final Comparator<Pack> SORT = (a, b) -> b.packLastModified
  73. .compareTo(a.packLastModified);
  74. private final PackFile packFile;
  75. private PackFile keepFile;
  76. final int hash;
  77. private RandomAccessFile fd;
  78. /** Serializes reads performed against {@link #fd}. */
  79. private final Object readLock = new Object();
  80. long length;
  81. private int activeWindows;
  82. private int activeCopyRawData;
  83. Instant packLastModified;
  84. private PackFileSnapshot fileSnapshot;
  85. private volatile boolean invalid;
  86. private volatile Exception invalidatingCause;
  87. @Nullable
  88. private PackFile bitmapIdxFile;
  89. private AtomicInteger transientErrorCount = new AtomicInteger();
  90. private byte[] packChecksum;
  91. private volatile PackIndex loadedIdx;
  92. private PackReverseIndex reverseIdx;
  93. private PackBitmapIndex bitmapIdx;
  94. /**
  95. * Objects we have tried to read, and discovered to be corrupt.
  96. * <p>
  97. * The list is allocated after the first corruption is found, and filled in
  98. * as more entries are discovered. Typically this list is never used, as
  99. * pack files do not usually contain corrupt objects.
  100. */
  101. private volatile LongList corruptObjects;
  102. /**
  103. * Construct a reader for an existing, pre-indexed packfile.
  104. *
  105. * @param packFile
  106. * path of the <code>.pack</code> file holding the data.
  107. * @param bitmapIdxFile
  108. * existing bitmap index file with the same base as the pack
  109. */
  110. public Pack(File packFile, @Nullable PackFile bitmapIdxFile) {
  111. this.packFile = new PackFile(packFile);
  112. this.fileSnapshot = PackFileSnapshot.save(packFile);
  113. this.packLastModified = fileSnapshot.lastModifiedInstant();
  114. this.bitmapIdxFile = bitmapIdxFile;
  115. // Multiply by 31 here so we can more directly combine with another
  116. // value in WindowCache.hash(), without doing the multiply there.
  117. //
  118. hash = System.identityHashCode(this) * 31;
  119. length = Long.MAX_VALUE;
  120. }
  121. private PackIndex idx() throws IOException {
  122. PackIndex idx = loadedIdx;
  123. if (idx == null) {
  124. synchronized (this) {
  125. idx = loadedIdx;
  126. if (idx == null) {
  127. if (invalid) {
  128. throw new PackInvalidException(packFile,
  129. invalidatingCause);
  130. }
  131. try {
  132. long start = System.currentTimeMillis();
  133. PackFile idxFile = packFile.create(INDEX);
  134. idx = PackIndex.open(idxFile);
  135. if (LOG.isDebugEnabled()) {
  136. LOG.debug(String.format(
  137. "Opening pack index %s, size %.3f MB took %d ms", //$NON-NLS-1$
  138. idxFile.getAbsolutePath(),
  139. Float.valueOf(idxFile.length()
  140. / (1024f * 1024)),
  141. Long.valueOf(System.currentTimeMillis()
  142. - start)));
  143. }
  144. if (packChecksum == null) {
  145. packChecksum = idx.packChecksum;
  146. fileSnapshot.setChecksum(
  147. ObjectId.fromRaw(packChecksum));
  148. } else if (!Arrays.equals(packChecksum,
  149. idx.packChecksum)) {
  150. throw new PackMismatchException(MessageFormat
  151. .format(JGitText.get().packChecksumMismatch,
  152. packFile.getPath(),
  153. ObjectId.fromRaw(packChecksum)
  154. .name(),
  155. ObjectId.fromRaw(idx.packChecksum)
  156. .name()));
  157. }
  158. loadedIdx = idx;
  159. } catch (InterruptedIOException e) {
  160. // don't invalidate the pack, we are interrupted from
  161. // another thread
  162. throw e;
  163. } catch (IOException e) {
  164. invalid = true;
  165. invalidatingCause = e;
  166. throw e;
  167. }
  168. }
  169. }
  170. }
  171. return idx;
  172. }
  173. /**
  174. * Get the File object which locates this pack on disk.
  175. *
  176. * @return the File object which locates this pack on disk.
  177. */
  178. public PackFile getPackFile() {
  179. return packFile;
  180. }
  181. /**
  182. * Get the index for this pack file.
  183. *
  184. * @return the index for this pack file.
  185. * @throws java.io.IOException the index file cannot be loaded into memory.
  186. */
  187. public PackIndex getIndex() throws IOException {
  188. return idx();
  189. }
  190. /**
  191. * Get name extracted from {@code pack-*.pack} pattern.
  192. *
  193. * @return name extracted from {@code pack-*.pack} pattern.
  194. */
  195. public String getPackName() {
  196. return packFile.getId();
  197. }
  198. /**
  199. * Determine if an object is contained within the pack file.
  200. * <p>
  201. * For performance reasons only the index file is searched; the main pack
  202. * content is ignored entirely.
  203. * </p>
  204. *
  205. * @param id
  206. * the object to look for. Must not be null.
  207. * @return true if the object is in this pack; false otherwise.
  208. * @throws java.io.IOException
  209. * the index file cannot be loaded into memory.
  210. */
  211. public boolean hasObject(AnyObjectId id) throws IOException {
  212. final long offset = idx().findOffset(id);
  213. return 0 < offset && !isCorrupt(offset);
  214. }
  215. /**
  216. * Determines whether a .keep file exists for this pack file.
  217. *
  218. * @return true if a .keep file exists.
  219. */
  220. public boolean shouldBeKept() {
  221. if (keepFile == null) {
  222. keepFile = packFile.create(KEEP);
  223. }
  224. return keepFile.exists();
  225. }
  226. /**
  227. * Get an object from this pack.
  228. *
  229. * @param curs
  230. * temporary working space associated with the calling thread.
  231. * @param id
  232. * the object to obtain from the pack. Must not be null.
  233. * @return the object loader for the requested object if it is contained in
  234. * this pack; null if the object was not found.
  235. * @throws IOException
  236. * the pack file or the index could not be read.
  237. */
  238. ObjectLoader get(WindowCursor curs, AnyObjectId id)
  239. throws IOException {
  240. final long offset = idx().findOffset(id);
  241. return 0 < offset && !isCorrupt(offset) ? load(curs, offset) : null;
  242. }
  243. void resolve(Set<ObjectId> matches, AbbreviatedObjectId id, int matchLimit)
  244. throws IOException {
  245. idx().resolve(matches, id, matchLimit);
  246. }
  247. /**
  248. * Close the resources utilized by this pack
  249. */
  250. public void close() {
  251. WindowCache.purge(this);
  252. synchronized (this) {
  253. loadedIdx = null;
  254. reverseIdx = null;
  255. }
  256. }
  257. /**
  258. * {@inheritDoc}
  259. * <p>
  260. * Provide an iterator over entries in the associated pack index; these
  261. * entries should also exist in this pack file. Objects returned by the
  262. * iterator are mutable during iteration.
  263. * <p>
  264. * The iterator returns objects in SHA-1 lexicographical order.
  265. * </p>
  266. *
  267. * @see PackIndex#iterator()
  268. */
  269. @Override
  270. public Iterator<PackIndex.MutableEntry> iterator() {
  271. try {
  272. return idx().iterator();
  273. } catch (IOException e) {
  274. return Collections.<PackIndex.MutableEntry> emptyList().iterator();
  275. }
  276. }
  277. /**
  278. * Obtain the total number of objects available in this pack. This method
  279. * relies on the pack index, which gives the number of effectively available objects.
  280. *
  281. * @return number of objects in the index of this pack, and likewise in the pack itself
  282. * @throws IOException
  283. * the index file cannot be loaded into memory.
  284. */
  285. long getObjectCount() throws IOException {
  286. return idx().getObjectCount();
  287. }
  288. /**
  289. * Search for object id with the specified start offset in associated pack
  290. * (reverse) index.
  291. *
  292. * @param offset
  293. * start offset of object to find
  294. * @return object id for this offset, or null if no object was found
  295. * @throws IOException
  296. * the index file cannot be loaded into memory.
  297. */
  298. ObjectId findObjectForOffset(long offset) throws IOException {
  299. return getReverseIdx().findObject(offset);
  300. }
  301. /**
  302. * Return the {@link FileSnapshot} of the underlying packfile that was
  303. * captured when this object was created.
  304. *
  305. * @return the packfile {@link FileSnapshot} this object was loaded from.
  306. */
  307. PackFileSnapshot getFileSnapshot() {
  308. return fileSnapshot;
  309. }
  310. AnyObjectId getPackChecksum() {
  311. return ObjectId.fromRaw(packChecksum);
  312. }
  313. private final byte[] decompress(final long position, final int sz,
  314. final WindowCursor curs) throws IOException, DataFormatException {
  315. byte[] dstbuf;
  316. try {
  317. dstbuf = new byte[sz];
  318. } catch (OutOfMemoryError noMemory) {
  319. // The size may be larger than our heap allows, return null to
  320. // let the caller know allocation isn't possible and it should
  321. // use the large object streaming approach instead.
  322. //
  323. // For example, this can occur when sz is 640 MB, and JRE
  324. // maximum heap size is only 256 MB. Even if the JRE has
  325. // 200 MB free, it cannot allocate a 640 MB byte array.
  326. return null;
  327. }
  328. if (curs.inflate(this, position, dstbuf, false) != sz)
  329. throw new EOFException(MessageFormat.format(
  330. JGitText.get().shortCompressedStreamAt,
  331. Long.valueOf(position)));
  332. return dstbuf;
  333. }
  334. void copyPackAsIs(PackOutputStream out, WindowCursor curs)
  335. throws IOException, StoredPackRepresentationNotAvailableException {
  336. // Pin the first window, this ensures the length is accurate.
  337. curs.pin(this, 0);
  338. curs.copyPackAsIs(this, length, out);
  339. }
  340. final void copyAsIs(PackOutputStream out, LocalObjectToPack src,
  341. boolean validate, WindowCursor curs) throws IOException,
  342. StoredObjectRepresentationNotAvailableException {
  343. beginCopyAsIs(src);
  344. try {
  345. copyAsIs2(out, src, validate, curs);
  346. } finally {
  347. endCopyAsIs();
  348. }
  349. }
  350. private void copyAsIs2(PackOutputStream out, LocalObjectToPack src,
  351. boolean validate, WindowCursor curs) throws IOException,
  352. StoredObjectRepresentationNotAvailableException {
  353. final CRC32 crc1 = validate ? new CRC32() : null;
  354. final CRC32 crc2 = validate ? new CRC32() : null;
  355. final byte[] buf = out.getCopyBuffer();
  356. // Rip apart the header so we can discover the size.
  357. //
  358. readFully(src.offset, buf, 0, 20, curs);
  359. int c = buf[0] & 0xff;
  360. final int typeCode = (c >> 4) & 7;
  361. long inflatedLength = c & 15;
  362. int shift = 4;
  363. int headerCnt = 1;
  364. while ((c & 0x80) != 0) {
  365. c = buf[headerCnt++] & 0xff;
  366. inflatedLength += ((long) (c & 0x7f)) << shift;
  367. shift += 7;
  368. }
  369. if (typeCode == Constants.OBJ_OFS_DELTA) {
  370. do {
  371. c = buf[headerCnt++] & 0xff;
  372. } while ((c & 128) != 0);
  373. if (validate) {
  374. assert(crc1 != null && crc2 != null);
  375. crc1.update(buf, 0, headerCnt);
  376. crc2.update(buf, 0, headerCnt);
  377. }
  378. } else if (typeCode == Constants.OBJ_REF_DELTA) {
  379. if (validate) {
  380. assert(crc1 != null && crc2 != null);
  381. crc1.update(buf, 0, headerCnt);
  382. crc2.update(buf, 0, headerCnt);
  383. }
  384. readFully(src.offset + headerCnt, buf, 0, 20, curs);
  385. if (validate) {
  386. assert(crc1 != null && crc2 != null);
  387. crc1.update(buf, 0, 20);
  388. crc2.update(buf, 0, 20);
  389. }
  390. headerCnt += 20;
  391. } else if (validate) {
  392. assert(crc1 != null && crc2 != null);
  393. crc1.update(buf, 0, headerCnt);
  394. crc2.update(buf, 0, headerCnt);
  395. }
  396. final long dataOffset = src.offset + headerCnt;
  397. final long dataLength = src.length;
  398. final long expectedCRC;
  399. final ByteArrayWindow quickCopy;
  400. // Verify the object isn't corrupt before sending. If it is,
  401. // we report it missing instead.
  402. //
  403. try {
  404. quickCopy = curs.quickCopy(this, dataOffset, dataLength);
  405. if (validate && idx().hasCRC32Support()) {
  406. assert(crc1 != null);
  407. // Index has the CRC32 code cached, validate the object.
  408. //
  409. expectedCRC = idx().findCRC32(src);
  410. if (quickCopy != null) {
  411. quickCopy.crc32(crc1, dataOffset, (int) dataLength);
  412. } else {
  413. long pos = dataOffset;
  414. long cnt = dataLength;
  415. while (cnt > 0) {
  416. final int n = (int) Math.min(cnt, buf.length);
  417. readFully(pos, buf, 0, n, curs);
  418. crc1.update(buf, 0, n);
  419. pos += n;
  420. cnt -= n;
  421. }
  422. }
  423. if (crc1.getValue() != expectedCRC) {
  424. setCorrupt(src.offset);
  425. throw new CorruptObjectException(MessageFormat.format(
  426. JGitText.get().objectAtHasBadZlibStream,
  427. Long.valueOf(src.offset), getPackFile()));
  428. }
  429. } else if (validate) {
  430. // We don't have a CRC32 code in the index, so compute it
  431. // now while inflating the raw data to get zlib to tell us
  432. // whether or not the data is safe.
  433. //
  434. Inflater inf = curs.inflater();
  435. byte[] tmp = new byte[1024];
  436. if (quickCopy != null) {
  437. quickCopy.check(inf, tmp, dataOffset, (int) dataLength);
  438. } else {
  439. assert(crc1 != null);
  440. long pos = dataOffset;
  441. long cnt = dataLength;
  442. while (cnt > 0) {
  443. final int n = (int) Math.min(cnt, buf.length);
  444. readFully(pos, buf, 0, n, curs);
  445. crc1.update(buf, 0, n);
  446. inf.setInput(buf, 0, n);
  447. while (inf.inflate(tmp, 0, tmp.length) > 0)
  448. continue;
  449. pos += n;
  450. cnt -= n;
  451. }
  452. }
  453. if (!inf.finished() || inf.getBytesRead() != dataLength) {
  454. setCorrupt(src.offset);
  455. throw new EOFException(MessageFormat.format(
  456. JGitText.get().shortCompressedStreamAt,
  457. Long.valueOf(src.offset)));
  458. }
  459. assert(crc1 != null);
  460. expectedCRC = crc1.getValue();
  461. } else {
  462. expectedCRC = -1;
  463. }
  464. } catch (DataFormatException dataFormat) {
  465. setCorrupt(src.offset);
  466. CorruptObjectException corruptObject = new CorruptObjectException(
  467. MessageFormat.format(
  468. JGitText.get().objectAtHasBadZlibStream,
  469. Long.valueOf(src.offset), getPackFile()),
  470. dataFormat);
  471. throw new StoredObjectRepresentationNotAvailableException(src,
  472. corruptObject);
  473. } catch (IOException ioError) {
  474. throw new StoredObjectRepresentationNotAvailableException(src,
  475. ioError);
  476. }
  477. if (quickCopy != null) {
  478. // The entire object fits into a single byte array window slice,
  479. // and we have it pinned. Write this out without copying.
  480. //
  481. out.writeHeader(src, inflatedLength);
  482. quickCopy.write(out, dataOffset, (int) dataLength);
  483. } else if (dataLength <= buf.length) {
  484. // Tiny optimization: Lots of objects are very small deltas or
  485. // deflated commits that are likely to fit in the copy buffer.
  486. //
  487. if (!validate) {
  488. long pos = dataOffset;
  489. long cnt = dataLength;
  490. while (cnt > 0) {
  491. final int n = (int) Math.min(cnt, buf.length);
  492. readFully(pos, buf, 0, n, curs);
  493. pos += n;
  494. cnt -= n;
  495. }
  496. }
  497. out.writeHeader(src, inflatedLength);
  498. out.write(buf, 0, (int) dataLength);
  499. } else {
  500. // Now we are committed to sending the object. As we spool it out,
  501. // check its CRC32 code to make sure there wasn't corruption between
  502. // the verification we did above, and us actually outputting it.
  503. //
  504. out.writeHeader(src, inflatedLength);
  505. long pos = dataOffset;
  506. long cnt = dataLength;
  507. while (cnt > 0) {
  508. final int n = (int) Math.min(cnt, buf.length);
  509. readFully(pos, buf, 0, n, curs);
  510. if (validate) {
  511. assert(crc2 != null);
  512. crc2.update(buf, 0, n);
  513. }
  514. out.write(buf, 0, n);
  515. pos += n;
  516. cnt -= n;
  517. }
  518. if (validate) {
  519. assert(crc2 != null);
  520. if (crc2.getValue() != expectedCRC) {
  521. throw new CorruptObjectException(MessageFormat.format(
  522. JGitText.get().objectAtHasBadZlibStream,
  523. Long.valueOf(src.offset), getPackFile()));
  524. }
  525. }
  526. }
  527. }
  528. boolean invalid() {
  529. return invalid;
  530. }
  531. void setInvalid() {
  532. invalid = true;
  533. }
  534. int incrementTransientErrorCount() {
  535. return transientErrorCount.incrementAndGet();
  536. }
  537. void resetTransientErrorCount() {
  538. transientErrorCount.set(0);
  539. }
  540. private void readFully(final long position, final byte[] dstbuf,
  541. int dstoff, final int cnt, final WindowCursor curs)
  542. throws IOException {
  543. if (curs.copy(this, position, dstbuf, dstoff, cnt) != cnt)
  544. throw new EOFException();
  545. }
  546. private synchronized void beginCopyAsIs(ObjectToPack otp)
  547. throws StoredObjectRepresentationNotAvailableException {
  548. if (++activeCopyRawData == 1 && activeWindows == 0) {
  549. try {
  550. doOpen();
  551. } catch (IOException thisPackNotValid) {
  552. throw new StoredObjectRepresentationNotAvailableException(otp,
  553. thisPackNotValid);
  554. }
  555. }
  556. }
  557. private synchronized void endCopyAsIs() {
  558. if (--activeCopyRawData == 0 && activeWindows == 0)
  559. doClose();
  560. }
  561. synchronized boolean beginWindowCache() throws IOException {
  562. if (++activeWindows == 1) {
  563. if (activeCopyRawData == 0)
  564. doOpen();
  565. return true;
  566. }
  567. return false;
  568. }
  569. synchronized boolean endWindowCache() {
  570. final boolean r = --activeWindows == 0;
  571. if (r && activeCopyRawData == 0)
  572. doClose();
  573. return r;
  574. }
  575. private void doOpen() throws IOException {
  576. if (invalid) {
  577. openFail(true, invalidatingCause);
  578. throw new PackInvalidException(packFile, invalidatingCause);
  579. }
  580. try {
  581. synchronized (readLock) {
  582. fd = new RandomAccessFile(packFile, "r"); //$NON-NLS-1$
  583. length = fd.length();
  584. onOpenPack();
  585. }
  586. } catch (InterruptedIOException e) {
  587. // don't invalidate the pack, we are interrupted from another thread
  588. openFail(false, e);
  589. throw e;
  590. } catch (FileNotFoundException fn) {
  591. // don't invalidate the pack if opening an existing file failed
  592. // since it may be related to a temporary lack of resources (e.g.
  593. // max open files)
  594. openFail(!packFile.exists(), fn);
  595. throw fn;
  596. } catch (EOFException | AccessDeniedException | NoSuchFileException
  597. | CorruptObjectException | NoPackSignatureException
  598. | PackMismatchException | UnpackException
  599. | UnsupportedPackIndexVersionException
  600. | UnsupportedPackVersionException pe) {
  601. // exceptions signaling permanent problems with a pack
  602. openFail(true, pe);
  603. throw pe;
  604. } catch (IOException | RuntimeException ge) {
  605. // generic exceptions could be transient so we should not mark the
  606. // pack invalid to avoid false MissingObjectExceptions
  607. openFail(false, ge);
  608. throw ge;
  609. }
  610. }
  611. private void openFail(boolean invalidate, Exception cause) {
  612. activeWindows = 0;
  613. activeCopyRawData = 0;
  614. invalid = invalidate;
  615. invalidatingCause = cause;
  616. doClose();
  617. }
  618. private void doClose() {
  619. synchronized (readLock) {
  620. if (fd != null) {
  621. try {
  622. fd.close();
  623. } catch (IOException err) {
  624. // Ignore a close event. We had it open only for reading.
  625. // There should not be errors related to network buffers
  626. // not flushed, etc.
  627. }
  628. fd = null;
  629. }
  630. }
  631. }
  632. ByteArrayWindow read(long pos, int size) throws IOException {
  633. synchronized (readLock) {
  634. if (invalid || fd == null) {
  635. // Due to a race between this read and another thread invalidating the
  636. // packfile, a thread could reach this point and then fail with an NPE.
  637. // Detect the situation and throw a proper exception so that it can be
  638. // handled by the main packfile search loop and the Git client won't
  639. // receive any failures.
  640. throw new PackInvalidException(packFile, invalidatingCause);
  641. }
  642. if (length < pos + size)
  643. size = (int) (length - pos);
  644. final byte[] buf = new byte[size];
  645. fd.seek(pos);
  646. fd.readFully(buf, 0, size);
  647. return new ByteArrayWindow(this, pos, buf);
  648. }
  649. }
  650. ByteWindow mmap(long pos, int size) throws IOException {
  651. synchronized (readLock) {
  652. if (length < pos + size)
  653. size = (int) (length - pos);
  654. MappedByteBuffer map;
  655. try {
  656. map = fd.getChannel().map(MapMode.READ_ONLY, pos, size);
  657. } catch (IOException ioe1) {
  658. // The most likely reason this failed is the JVM has run out
  659. // of virtual memory. We need to discard quickly, and try to
  660. // force the GC to finalize and release any existing mappings.
  661. //
  662. System.gc();
  663. System.runFinalization();
  664. map = fd.getChannel().map(MapMode.READ_ONLY, pos, size);
  665. }
  666. if (map.hasArray())
  667. return new ByteArrayWindow(this, pos, map.array());
  668. return new ByteBufferWindow(this, pos, map);
  669. }
  670. }
  671. private void onOpenPack() throws IOException {
  672. final PackIndex idx = idx();
  673. final byte[] buf = new byte[20];
  674. fd.seek(0);
  675. fd.readFully(buf, 0, 12);
  676. if (RawParseUtils.match(buf, 0, Constants.PACK_SIGNATURE) != 4) {
  677. throw new NoPackSignatureException(JGitText.get().notAPACKFile);
  678. }
  679. final long vers = NB.decodeUInt32(buf, 4);
  680. final long packCnt = NB.decodeUInt32(buf, 8);
  681. if (vers != 2 && vers != 3) {
  682. throw new UnsupportedPackVersionException(vers);
  683. }
  684. if (packCnt != idx.getObjectCount()) {
  685. throw new PackMismatchException(MessageFormat.format(
  686. JGitText.get().packObjectCountMismatch,
  687. Long.valueOf(packCnt), Long.valueOf(idx.getObjectCount()),
  688. getPackFile()));
  689. }
  690. fd.seek(length - 20);
  691. fd.readFully(buf, 0, 20);
  692. if (!Arrays.equals(buf, packChecksum)) {
  693. throw new PackMismatchException(MessageFormat.format(
  694. JGitText.get().packChecksumMismatch,
  695. getPackFile(),
  696. ObjectId.fromRaw(buf).name(),
  697. ObjectId.fromRaw(idx.packChecksum).name()));
  698. }
  699. }
  700. ObjectLoader load(WindowCursor curs, long pos)
  701. throws IOException, LargeObjectException {
  702. try {
  703. final byte[] ib = curs.tempId;
  704. Delta delta = null;
  705. byte[] data = null;
  706. int type = Constants.OBJ_BAD;
  707. boolean cached = false;
  708. SEARCH: for (;;) {
  709. readFully(pos, ib, 0, 20, curs);
  710. int c = ib[0] & 0xff;
  711. final int typeCode = (c >> 4) & 7;
  712. long sz = c & 15;
  713. int shift = 4;
  714. int p = 1;
  715. while ((c & 0x80) != 0) {
  716. c = ib[p++] & 0xff;
  717. sz += ((long) (c & 0x7f)) << shift;
  718. shift += 7;
  719. }
  720. switch (typeCode) {
  721. case Constants.OBJ_COMMIT:
  722. case Constants.OBJ_TREE:
  723. case Constants.OBJ_BLOB:
  724. case Constants.OBJ_TAG: {
  725. if (delta != null || sz < curs.getStreamFileThreshold()) {
  726. data = decompress(pos + p, (int) sz, curs);
  727. }
  728. if (delta != null) {
  729. type = typeCode;
  730. break SEARCH;
  731. }
  732. if (data != null) {
  733. return new ObjectLoader.SmallObject(typeCode, data);
  734. }
  735. return new LargePackedWholeObject(typeCode, sz, pos, p,
  736. this, curs.db);
  737. }
  738. case Constants.OBJ_OFS_DELTA: {
  739. c = ib[p++] & 0xff;
  740. long base = c & 127;
  741. while ((c & 128) != 0) {
  742. base += 1;
  743. c = ib[p++] & 0xff;
  744. base <<= 7;
  745. base += (c & 127);
  746. }
  747. base = pos - base;
  748. delta = new Delta(delta, pos, (int) sz, p, base);
  749. if (sz != delta.deltaSize)
  750. break SEARCH;
  751. DeltaBaseCache.Entry e = curs.getDeltaBaseCache().get(this, base);
  752. if (e != null) {
  753. type = e.type;
  754. data = e.data;
  755. cached = true;
  756. break SEARCH;
  757. }
  758. pos = base;
  759. continue SEARCH;
  760. }
  761. case Constants.OBJ_REF_DELTA: {
  762. readFully(pos + p, ib, 0, 20, curs);
  763. long base = findDeltaBase(ObjectId.fromRaw(ib));
  764. delta = new Delta(delta, pos, (int) sz, p + 20, base);
  765. if (sz != delta.deltaSize)
  766. break SEARCH;
  767. DeltaBaseCache.Entry e = curs.getDeltaBaseCache().get(this, base);
  768. if (e != null) {
  769. type = e.type;
  770. data = e.data;
  771. cached = true;
  772. break SEARCH;
  773. }
  774. pos = base;
  775. continue SEARCH;
  776. }
  777. default:
  778. throw new IOException(MessageFormat.format(
  779. JGitText.get().unknownObjectType,
  780. Integer.valueOf(typeCode)));
  781. }
  782. }
  783. // At this point there is at least one delta to apply to data.
  784. // (Whole objects with no deltas to apply return early above.)
  785. if (data == null)
  786. throw new IOException(JGitText.get().inMemoryBufferLimitExceeded);
  787. assert(delta != null);
  788. do {
  789. // Cache only the base immediately before desired object.
  790. if (cached)
  791. cached = false;
  792. else if (delta.next == null)
  793. curs.getDeltaBaseCache().store(this, delta.basePos, data, type);
  794. pos = delta.deltaPos;
  795. final byte[] cmds = decompress(pos + delta.hdrLen,
  796. delta.deltaSize, curs);
  797. if (cmds == null) {
  798. data = null; // Discard base in case of OutOfMemoryError
  799. throw new LargeObjectException.OutOfMemory(new OutOfMemoryError());
  800. }
  801. final long sz = BinaryDelta.getResultSize(cmds);
  802. if (Integer.MAX_VALUE <= sz)
  803. throw new LargeObjectException.ExceedsByteArrayLimit();
  804. final byte[] result;
  805. try {
  806. result = new byte[(int) sz];
  807. } catch (OutOfMemoryError tooBig) {
  808. data = null; // Discard base in case of OutOfMemoryError
  809. throw new LargeObjectException.OutOfMemory(tooBig);
  810. }
  811. BinaryDelta.apply(data, cmds, result);
  812. data = result;
  813. delta = delta.next;
  814. } while (delta != null);
  815. return new ObjectLoader.SmallObject(type, data);
  816. } catch (DataFormatException dfe) {
  817. throw new CorruptObjectException(
  818. MessageFormat.format(
  819. JGitText.get().objectAtHasBadZlibStream,
  820. Long.valueOf(pos), getPackFile()),
  821. dfe);
  822. }
  823. }
  824. private long findDeltaBase(ObjectId baseId) throws IOException,
  825. MissingObjectException {
  826. long ofs = idx().findOffset(baseId);
  827. if (ofs < 0)
  828. throw new MissingObjectException(baseId,
  829. JGitText.get().missingDeltaBase);
  830. return ofs;
  831. }
  832. private static class Delta {
  833. /** Child that applies onto this object. */
  834. final Delta next;
  835. /** Offset of the delta object. */
  836. final long deltaPos;
  837. /** Size of the inflated delta stream. */
  838. final int deltaSize;
  839. /** Total size of the delta's pack entry header (including base). */
  840. final int hdrLen;
  841. /** Offset of the base object this delta applies onto. */
  842. final long basePos;
  843. Delta(Delta next, long ofs, int sz, int hdrLen, long baseOffset) {
  844. this.next = next;
  845. this.deltaPos = ofs;
  846. this.deltaSize = sz;
  847. this.hdrLen = hdrLen;
  848. this.basePos = baseOffset;
  849. }
  850. }
  851. byte[] getDeltaHeader(WindowCursor wc, long pos)
  852. throws IOException, DataFormatException {
  853. // The delta stream starts as two variable length integers. If we
  854. // assume they are 64 bits each, we need 16 bytes to encode them,
  855. // plus 2 extra bytes for the variable length overhead. So 18 is
  856. // the longest delta instruction header.
  857. //
  858. final byte[] hdr = new byte[18];
  859. wc.inflate(this, pos, hdr, true /* headerOnly */);
  860. return hdr;
  861. }
  862. int getObjectType(WindowCursor curs, long pos) throws IOException {
  863. final byte[] ib = curs.tempId;
  864. for (;;) {
  865. readFully(pos, ib, 0, 20, curs);
  866. int c = ib[0] & 0xff;
  867. final int type = (c >> 4) & 7;
  868. switch (type) {
  869. case Constants.OBJ_COMMIT:
  870. case Constants.OBJ_TREE:
  871. case Constants.OBJ_BLOB:
  872. case Constants.OBJ_TAG:
  873. return type;
  874. case Constants.OBJ_OFS_DELTA: {
  875. int p = 1;
  876. while ((c & 0x80) != 0)
  877. c = ib[p++] & 0xff;
  878. c = ib[p++] & 0xff;
  879. long ofs = c & 127;
  880. while ((c & 128) != 0) {
  881. ofs += 1;
  882. c = ib[p++] & 0xff;
  883. ofs <<= 7;
  884. ofs += (c & 127);
  885. }
  886. pos = pos - ofs;
  887. continue;
  888. }
  889. case Constants.OBJ_REF_DELTA: {
  890. int p = 1;
  891. while ((c & 0x80) != 0)
  892. c = ib[p++] & 0xff;
  893. readFully(pos + p, ib, 0, 20, curs);
  894. pos = findDeltaBase(ObjectId.fromRaw(ib));
  895. continue;
  896. }
  897. default:
  898. throw new IOException(
  899. MessageFormat.format(JGitText.get().unknownObjectType,
  900. Integer.valueOf(type)));
  901. }
  902. }
  903. }
  904. long getObjectSize(WindowCursor curs, AnyObjectId id)
  905. throws IOException {
  906. final long offset = idx().findOffset(id);
  907. return 0 < offset ? getObjectSize(curs, offset) : -1;
  908. }
  909. long getObjectSize(WindowCursor curs, long pos)
  910. throws IOException {
  911. final byte[] ib = curs.tempId;
  912. readFully(pos, ib, 0, 20, curs);
  913. int c = ib[0] & 0xff;
  914. final int type = (c >> 4) & 7;
  915. long sz = c & 15;
  916. int shift = 4;
  917. int p = 1;
  918. while ((c & 0x80) != 0) {
  919. c = ib[p++] & 0xff;
  920. sz += ((long) (c & 0x7f)) << shift;
  921. shift += 7;
  922. }
  923. long deltaAt;
  924. switch (type) {
  925. case Constants.OBJ_COMMIT:
  926. case Constants.OBJ_TREE:
  927. case Constants.OBJ_BLOB:
  928. case Constants.OBJ_TAG:
  929. return sz;
  930. case Constants.OBJ_OFS_DELTA:
  931. c = ib[p++] & 0xff;
  932. while ((c & 128) != 0)
  933. c = ib[p++] & 0xff;
  934. deltaAt = pos + p;
  935. break;
  936. case Constants.OBJ_REF_DELTA:
  937. deltaAt = pos + p + 20;
  938. break;
  939. default:
  940. throw new IOException(MessageFormat.format(
  941. JGitText.get().unknownObjectType, Integer.valueOf(type)));
  942. }
  943. try {
  944. return BinaryDelta.getResultSize(getDeltaHeader(curs, deltaAt));
  945. } catch (DataFormatException e) {
  946. throw new CorruptObjectException(MessageFormat.format(
  947. JGitText.get().objectAtHasBadZlibStream, Long.valueOf(pos),
  948. getPackFile()), e);
  949. }
  950. }
  951. LocalObjectRepresentation representation(final WindowCursor curs,
  952. final AnyObjectId objectId) throws IOException {
  953. final long pos = idx().findOffset(objectId);
  954. if (pos < 0)
  955. return null;
  956. final byte[] ib = curs.tempId;
  957. readFully(pos, ib, 0, 20, curs);
  958. int c = ib[0] & 0xff;
  959. int p = 1;
  960. final int typeCode = (c >> 4) & 7;
  961. while ((c & 0x80) != 0)
  962. c = ib[p++] & 0xff;
  963. long len = (findEndOffset(pos) - pos);
  964. switch (typeCode) {
  965. case Constants.OBJ_COMMIT:
  966. case Constants.OBJ_TREE:
  967. case Constants.OBJ_BLOB:
  968. case Constants.OBJ_TAG:
  969. return LocalObjectRepresentation.newWhole(this, pos, len - p);
  970. case Constants.OBJ_OFS_DELTA: {
  971. c = ib[p++] & 0xff;
  972. long ofs = c & 127;
  973. while ((c & 128) != 0) {
  974. ofs += 1;
  975. c = ib[p++] & 0xff;
  976. ofs <<= 7;
  977. ofs += (c & 127);
  978. }
  979. ofs = pos - ofs;
  980. return LocalObjectRepresentation.newDelta(this, pos, len - p, ofs);
  981. }
  982. case Constants.OBJ_REF_DELTA: {
  983. len -= p;
  984. len -= Constants.OBJECT_ID_LENGTH;
  985. readFully(pos + p, ib, 0, 20, curs);
  986. ObjectId id = ObjectId.fromRaw(ib);
  987. return LocalObjectRepresentation.newDelta(this, pos, len, id);
  988. }
  989. default:
  990. throw new IOException(
  991. MessageFormat.format(JGitText.get().unknownObjectType,
  992. Integer.valueOf(typeCode)));
  993. }
  994. }
  995. private long findEndOffset(long startOffset)
  996. throws IOException, CorruptObjectException {
  997. final long maxOffset = length - 20;
  998. return getReverseIdx().findNextOffset(startOffset, maxOffset);
  999. }
  1000. synchronized PackBitmapIndex getBitmapIndex() throws IOException {
  1001. if (invalid || bitmapIdxFile == null) {
  1002. return null;
  1003. }
  1004. if (bitmapIdx == null) {
  1005. final PackBitmapIndex idx;
  1006. try {
  1007. idx = PackBitmapIndex.open(bitmapIdxFile, idx(),
  1008. getReverseIdx());
  1009. } catch (FileNotFoundException e) {
  1010. // Once upon a time this bitmap file existed. Now it
  1011. // has been removed. Most likely an external gc has
  1012. // removed this packfile and the bitmap
  1013. bitmapIdxFile = null;
  1014. return null;
  1015. }
  1016. // At this point, idx() will have set packChecksum.
  1017. if (Arrays.equals(packChecksum, idx.packChecksum)) {
  1018. bitmapIdx = idx;
  1019. } else {
  1020. bitmapIdxFile = null;
  1021. }
  1022. }
  1023. return bitmapIdx;
  1024. }
  1025. private synchronized PackReverseIndex getReverseIdx() throws IOException {
  1026. if (reverseIdx == null)
  1027. reverseIdx = new PackReverseIndex(idx());
  1028. return reverseIdx;
  1029. }
  1030. private boolean isCorrupt(long offset) {
  1031. LongList list = corruptObjects;
  1032. if (list == null)
  1033. return false;
  1034. synchronized (list) {
  1035. return list.contains(offset);
  1036. }
  1037. }
  1038. private void setCorrupt(long offset) {
  1039. LongList list = corruptObjects;
  1040. if (list == null) {
  1041. synchronized (readLock) {
  1042. list = corruptObjects;
  1043. if (list == null) {
  1044. list = new LongList();
  1045. corruptObjects = list;
  1046. }
  1047. }
  1048. }
  1049. synchronized (list) {
  1050. list.add(offset);
  1051. }
  1052. }
  1053. @SuppressWarnings("nls")
  1054. @Override
  1055. public String toString() {
  1056. return "Pack [packFileName=" + packFile.getName() + ", length="
  1057. + packFile.length() + ", packChecksum="
  1058. + ObjectId.fromRaw(packChecksum).name() + "]";
  1059. }
  1060. }
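For orientation, here is a short, hypothetical usage sketch of the public surface shown above (the constructor, getPackName(), hasObject() and close()). PackProbe is an invented name, and real applications typically reach pack data through JGit's higher-level object reading APIs rather than by instantiating this internal class directly.

import java.io.File;

import org.eclipse.jgit.internal.storage.file.Pack;
import org.eclipse.jgit.lib.ObjectId;

public class PackProbe {
    public static void main(String[] args) throws Exception {
        // args[0]: path to a pack-*.pack file; args[1]: a 40-character object id.
        Pack pack = new Pack(new File(args[0]), null); // no bitmap index supplied
        try {
            ObjectId id = ObjectId.fromString(args[1]);
            System.out.println("pack " + pack.getPackName()
                    + (pack.hasObject(id) ? " contains " : " does not contain ")
                    + id.name());
        } finally {
            pack.close(); // purge window cache entries and drop loaded indexes
        }
    }
}

Passing null for the bitmap index file simply means getBitmapIndex() will return null; everything else behaves as in the listing above.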