
Added read/write support for pack bitmap index.

A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API for the index functionality is included, as it is expected to be used by the PackWriter.

Compressed bitmaps are created using the javaewah library, a word-aligned compressed variant of Java's BitSet class based on run-length encoding. The library only works with positive integer values, so the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list for the pack file, sorted by offset. That integer is what the bitmaps use to reference the ObjectId.

Currently, the new index format can only be used with pack files that contain a complete closure of the object graph, e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types, i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmap against prior bitmaps in the index and selecting the smallest representation. The XOR'd bitmap, together with the offset from the current entry back to the entry it was XOR'd against, is the actual representation of the entry in the index file. Each entry also contains one flag byte, currently used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
11 years ago
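To make the XOR-delta selection concrete, here is a minimal sketch using plain java.util.BitSet rather than the javaewah EWAHCompressedBitmap that the commit actually uses; the XorChoice class, the encodeSmallest method, and the use of cardinality() as a stand-in for compressed size are illustrative assumptions, not JGit code.

  import java.util.ArrayList;
  import java.util.BitSet;
  import java.util.List;

  // Illustrative sketch of the XOR-delta selection described above, using
  // java.util.BitSet instead of javaewah's EWAHCompressedBitmap.
  class XorChoice {
      final BitSet bits;   // raw bitmap, or its XOR against a prior bitmap
      final int xorOffset; // 0 = stored as-is; otherwise distance back to the base entry

      XorChoice(BitSet bits, int xorOffset) {
          this.bits = bits;
          this.xorOffset = xorOffset;
      }

      // Pick the smallest representation: the bitmap itself, or the XOR
      // difference against one of the previously written bitmaps.
      static XorChoice encodeSmallest(BitSet current, List<BitSet> priorBitmaps) {
          BitSet best = (BitSet) current.clone();
          int bestOffset = 0; // 0 means "no XOR base"
          for (int i = 0; i < priorBitmaps.size(); i++) {
              BitSet diff = (BitSet) current.clone();
              diff.xor(priorBitmaps.get(i));                 // difference against a prior entry
              if (diff.cardinality() < best.cardinality()) { // proxy for "smaller when compressed"
                  best = diff;
                  bestOffset = priorBitmaps.size() - i;      // entries back to the XOR base
              }
          }
          return new XorChoice(best, bestOffset);
      }

      public static void main(String[] args) {
          BitSet prior = new BitSet();
          prior.set(0, 1000);          // closure of an earlier commit
          BitSet current = (BitSet) prior.clone();
          current.set(1000, 1005);     // a few new objects on top of it
          List<BitSet> history = new ArrayList<>();
          history.add(prior);
          XorChoice c = encodeSmallest(current, history);
          // Prints xorOffset=1 setBits=5: only the 5 new objects need to be stored.
          System.out.println("xorOffset=" + c.xorOffset + " setBits=" + c.bits.cardinality());
      }
  }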
Don't use interruptible pread() to access pack files

The J2SE NIO APIs require that FileChannel close the underlying file descriptor if a thread is interrupted while it is inside of a read or write operation on that channel. This is insane, because it means we cannot share the file descriptor between threads.

If a thread is in the middle of the FileChannel variant of IO.readFully() and it receives an interrupt, the pack will be automatically closed on us. This causes the other threads trying to use that same FileChannel to receive IOExceptions, which leads to the pack getting marked as invalid. Once the pack is marked invalid, JGit loses access to its entire contents and starts to report MissingObjectExceptions.

Because PackWriter must ensure that the chosen pack file stays available until the current object's data is fully copied to the output, JGit cannot simply reopen the pack when it is automatically closed due to an interrupt being sent at the wrong time. The pack may have been deleted by a concurrent `git gc` process, and that open file descriptor might be the last reference to the inode on disk. Once it is closed, the PackWriter loses access to that object representation, and it cannot complete sending the object to the client.

Fortunately, RandomAccessFile's readFully method does not have this problem. Interrupts during readFully() are ignored. However, it requires us to first seek to the offset we need to read, then issue the read call. This requires locking around the file descriptor to prevent concurrent threads from moving the pointer before the read.

This reduces the concurrency level, as now only one window can be paged in at a time from each pack. However, the WindowCache should already be holding most of the pages required to handle the working set for a process, and its own internal locking was already limiting the number of concurrent loads possible. Provided that most concurrent accesses are getting hits in the WindowCache, or are for different repositories on the same server, we shouldn't see a major performance hit due to the more serialized loading.

I would have preferred to use a pool of RandomAccessFiles for each pack, with threads borrowing an instance dedicated to that thread whenever they needed to page in a window. This would permit much higher levels of concurrency by using multiple file descriptors (and file pointers) for each pack. However, the code became too complex to develop in any reasonable period of time, so I've chosen to retrofit the existing code with more serialization instead.

Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago
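A minimal sketch of the seek-then-readFully pattern described above, with synchronization serializing use of the shared file pointer; the PackFileReader class and its read method are illustrative names for this example, not JGit's actual window-loading code.

  import java.io.File;
  import java.io.IOException;
  import java.io.RandomAccessFile;

  // Illustrative sketch: serialize seek() + readFully() on a shared
  // RandomAccessFile so a thread interrupt cannot close the descriptor and
  // concurrent threads cannot move the file pointer between seek and read.
  class PackFileReader {
      private final RandomAccessFile file;

      PackFileReader(File packFile) throws IOException {
          this.file = new RandomAccessFile(packFile, "r");
      }

      // Page in one window: dst.length bytes starting at the given pack offset.
      // The monitor guards the single file pointer shared by all callers.
      synchronized void read(long offset, byte[] dst) throws IOException {
          file.seek(offset);
          file.readFully(dst, 0, dst.length);
      }

      synchronized void close() throws IOException {
          file.close();
      }
  }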
PackWriter: Support reuse of entire packs

The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs:

  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done
  (echo "+ $root";
   for n in $names; do echo "P $n"; done;
   echo) >>objects/info/cached-packs
  git repack -a -d

When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB [1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to one full copy of the compressed project sources.

With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but with a slightly larger data transfer (+2.39 MiB):

  Before:
    remote: Counting objects: 1861830, done
    remote: Finding sources: 100% (1861830/1861830)
    remote: Getting sizes: 100% (88243/88243)
    remote: Compressing objects: 100% (88184/88184)
    Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
    remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
    Resolving deltas: 100% (1564621/1564621), done.

    real    3m19.005s

  After:
    remote: Counting objects: 1601, done
    remote: Counting objects: 1828460, done
    remote: Finding sources: 100% (50475/50475)
    remote: Getting sizes: 100% (18843/18843)
    remote: Compressing objects: 100% (7585/7585)
    remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
    Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
    Resolving deltas: 100% (1559477/1559477), done.

    real    2m2.938s

Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
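For illustration, the registration file the script above appends to ($GIT_DIR/objects/info/cached-packs) consists of records of one or more "+ <tip commit>" lines followed by "P <pack name>" lines, each record terminated by a blank line. Below is a hedged Java sketch of a reader for that layout; the CachedPackEntry type and its fields are invented for this example and are not JGit's API.

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.Reader;
  import java.util.ArrayList;
  import java.util.List;

  // Illustrative reader for the objects/info/cached-packs layout produced by
  // the script in the commit message: "+ <tip commit>" lines, then
  // "P <pack name>" lines, with a blank line ending each record.
  class CachedPackEntry {
      final List<String> tips = new ArrayList<>();      // commits whose closure the pack covers
      final List<String> packNames = new ArrayList<>(); // packs to stream as-is for those tips

      static List<CachedPackEntry> parse(Reader in) throws IOException {
          List<CachedPackEntry> entries = new ArrayList<>();
          BufferedReader br = new BufferedReader(in);
          CachedPackEntry cur = new CachedPackEntry();
          String line;
          while ((line = br.readLine()) != null) {
              if (line.isEmpty()) {                      // blank line closes the record
                  if (!cur.tips.isEmpty() || !cur.packNames.isEmpty()) {
                      entries.add(cur);
                      cur = new CachedPackEntry();
                  }
              } else if (line.startsWith("+ ")) {
                  cur.tips.add(line.substring(2));
              } else if (line.startsWith("P ")) {
                  cur.packNames.add(line.substring(2));
              }
          }
          if (!cur.tips.isEmpty() || !cur.packNames.isEmpty())
              entries.add(cur);
          return entries;
      }
  }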
Increase core.streamFileThreshold default to 50 MiB

Projects like org.eclipse.mdt contain large XML files about 6 MiB in size. So does the Android project platform/frameworks/base. Doing a clone of either project with JGit takes forever to check out the files into the working directory, because delta decompression tends to be very expensive as we need to constantly reposition the base stream for each copy instruction. This can be made worse by a very bad ordering of offsets, possibly due to an XML editor that doesn't preserve the order of elements in the file very well.

Increasing the threshold to the same limit PackWriter uses when doing delta compression (50 MiB) permits a default-configured JGit to decompress these XML file objects using the faster random-access arrays, rather than re-seeking through an inflate stream, significantly reducing checkout time after a clone.

Since this new limit may be dangerously close to the JVM maximum heap size, every allocation attempt is now wrapped in a try/catch so that JGit can degrade by switching to the large object stream mode when the allocation is refused. It will run slower, but the operation will still complete.

The large stream mode will run very well for big objects that aren't delta compressed, and is acceptable for delta compressed objects that are using only forward-referencing copy instructions. Copies using prior offsets are still going to be horrible, and there is nothing we can do about it except increase core.streamFileThreshold.

We might in the future want to consider changing the way the delta generators work in JGit and native C Git to avoid prior offsets once an object reaches a certain size, even if that causes the delta instruction stream to be slightly larger. Unfortunately, native C Git won't want to do that until it's also able to stream objects rather than malloc them as contiguous blocks.

Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
13 years ago
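A minimal sketch of the allocation fallback the commit describes: try to materialize the whole object in a byte array, and if the JVM refuses the allocation, fall back to a streaming representation. The ObjectContentLoader class, its method names, and the placeholder inflate/stream helpers mirror the description above but are not JGit's actual ObjectLoader code.

  import java.io.ByteArrayInputStream;
  import java.io.InputStream;

  // Illustrative sketch of degrading from whole-object byte arrays to a
  // streaming mode when the allocation is refused, as described in the commit.
  class ObjectContentLoader {
      static final long STREAM_FILE_THRESHOLD = 50 * 1024 * 1024; // 50 MiB default

      InputStream open(long size) {
          if (size <= STREAM_FILE_THRESHOLD) {
              try {
                  byte[] buf = new byte[(int) size];   // fast path: random-access array
                  inflateInto(buf);
                  return new ByteArrayInputStream(buf);
              } catch (OutOfMemoryError notEnoughHeap) {
                  // Allocation refused: degrade to the slower large-object stream mode.
              }
          }
          return openStream(size);                     // streaming fallback, runs slower
      }

      // Placeholders standing in for the real inflate/stream machinery.
      void inflateInto(byte[] dst) { /* inflate the object's content into dst */ }
      InputStream openStream(long size) { return new ByteArrayInputStream(new byte[0]); }
  }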
/*
 * Copyright (C) 2008-2009, Google Inc.
 * Copyright (C) 2007, Robin Rosenberg <robin.rosenberg@dewire.com>
 * Copyright (C) 2006-2008, Shawn O. Pearce <spearce@spearce.org>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.internal.storage.file;

import static org.eclipse.jgit.internal.storage.pack.PackExt.BITMAP_INDEX;
import static org.eclipse.jgit.internal.storage.pack.PackExt.INDEX;

import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel.MapMode;
import java.text.MessageFormat;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Set;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import org.eclipse.jgit.errors.CorruptObjectException;
import org.eclipse.jgit.errors.LargeObjectException;
import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.errors.PackInvalidException;
import org.eclipse.jgit.errors.PackMismatchException;
import org.eclipse.jgit.errors.StoredObjectRepresentationNotAvailableException;
import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.internal.storage.pack.BinaryDelta;
import org.eclipse.jgit.internal.storage.pack.ObjectToPack;
import org.eclipse.jgit.internal.storage.pack.PackExt;
import org.eclipse.jgit.internal.storage.pack.PackOutputStream;
import org.eclipse.jgit.lib.AbbreviatedObjectId;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.util.LongList;
import org.eclipse.jgit.util.NB;
import org.eclipse.jgit.util.RawParseUtils;
/**
 * A Git version 2 pack file representation. A pack file contains Git objects in
 * delta packed format yielding high compression of lots of objects where some
 * objects are similar.
 */
public class PackFile implements Iterable<PackIndex.MutableEntry> {
	/** Sorts PackFiles to be most recently created to least recently created. */
	public static final Comparator<PackFile> SORT = new Comparator<PackFile>() {
		public int compare(final PackFile a, final PackFile b) {
			return b.packLastModified - a.packLastModified;
		}
	};

	private final File packFile;

	private final int extensions;

	private File keepFile;

	private volatile String packName;

	final int hash;

	private RandomAccessFile fd;

	/** Serializes reads performed against {@link #fd}. */
	private final Object readLock = new Object();

	long length;

	private int activeWindows;

	private int activeCopyRawData;

	private int packLastModified;

	private volatile boolean invalid;

	private boolean invalidBitmap;

	private byte[] packChecksum;

	private PackIndex loadedIdx;

	private PackReverseIndex reverseIdx;

	private PackBitmapIndex bitmapIdx;

	/**
	 * Objects we have tried to read, and discovered to be corrupt.
	 * <p>
	 * The list is allocated after the first corruption is found, and filled in
	 * as more entries are discovered. Typically this list is never used, as
	 * pack files do not usually contain corrupt objects.
	 */
	private volatile LongList corruptObjects;

	/**
	 * Construct a reader for an existing, pre-indexed packfile.
	 *
	 * @param packFile
	 *            path of the <code>.pack</code> file holding the data.
	 * @param extensions
	 *            additional pack file extensions with the same base as the pack
	 */
	public PackFile(final File packFile, int extensions) {
		this.packFile = packFile;
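		// File.lastModified() is in milliseconds; shifting right by 10 divides
		// by 1024, roughly converting to seconds, so the timestamp fits in the
		// int compared by the SORT comparator above.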
		this.packLastModified = (int) (packFile.lastModified() >> 10);
		this.extensions = extensions;

		// Multiply by 31 here so we can more directly combine with another
		// value in WindowCache.hash(), without doing the multiply there.
		//
		hash = System.identityHashCode(this) * 31;
		length = Long.MAX_VALUE;
	}

	private synchronized PackIndex idx() throws IOException {
		if (loadedIdx == null) {
			if (invalid)
				throw new PackInvalidException(packFile);
			try {
				final PackIndex idx = PackIndex.open(extFile(INDEX));
				if (packChecksum == null) {
					packChecksum = idx.packChecksum;
				} else if (!Arrays.equals(packChecksum, idx.packChecksum)) {
					throw new PackMismatchException(MessageFormat.format(
							JGitText.get().packChecksumMismatch,
							packFile.getPath()));
				}
				loadedIdx = idx;
			} catch (InterruptedIOException e) {
				// don't invalidate the pack, we are interrupted from another thread
				throw e;
			} catch (IOException e) {
				invalid = true;
				throw e;
			}
		}
		return loadedIdx;
	}

	/** @return the File object which locates this pack on disk. */
	public File getPackFile() {
		return packFile;
	}

	/**
	 * @return the index for this pack file.
	 * @throws IOException
	 */
	public PackIndex getIndex() throws IOException {
		return idx();
	}

	/** @return name extracted from {@code pack-*.pack} pattern. */
	public String getPackName() {
		String name = packName;
		if (name == null) {
			name = getPackFile().getName();
			if (name.startsWith("pack-")) //$NON-NLS-1$
				name = name.substring("pack-".length()); //$NON-NLS-1$
			if (name.endsWith(".pack")) //$NON-NLS-1$
				name = name.substring(0, name.length() - ".pack".length()); //$NON-NLS-1$
			packName = name;
		}
		return name;
	}

	/**
	 * Determine if an object is contained within the pack file.
	 * <p>
	 * For performance reasons only the index file is searched; the main pack
	 * content is ignored entirely.
	 * </p>
	 *
	 * @param id
	 *            the object to look for. Must not be null.
	 * @return true if the object is in this pack; false otherwise.
	 * @throws IOException
	 *             the index file cannot be loaded into memory.
	 */
	public boolean hasObject(final AnyObjectId id) throws IOException {
		final long offset = idx().findOffset(id);
		return 0 < offset && !isCorrupt(offset);
	}

	/**
	 * Determines whether a .keep file exists for this pack file.
	 *
	 * @return true if a .keep file exists.
	 */
	public boolean shouldBeKept() {
		if (keepFile == null)
			keepFile = new File(packFile.getPath() + ".keep"); //$NON-NLS-1$
		return keepFile.exists();
	}

	/**
	 * Get an object from this pack.
	 *
	 * @param curs
	 *            temporary working space associated with the calling thread.
	 * @param id
	 *            the object to obtain from the pack. Must not be null.
	 * @return the object loader for the requested object if it is contained in
	 *         this pack; null if the object was not found.
	 * @throws IOException
	 *             the pack file or the index could not be read.
	 */
	ObjectLoader get(final WindowCursor curs, final AnyObjectId id)
			throws IOException {
		final long offset = idx().findOffset(id);
		return 0 < offset && !isCorrupt(offset) ? load(curs, offset) : null;
	}

	void resolve(Set<ObjectId> matches, AbbreviatedObjectId id, int matchLimit)
			throws IOException {
		idx().resolve(matches, id, matchLimit);
	}

	/**
	 * Close the resources utilized by this repository
	 */
	public void close() {
		WindowCache.purge(this);
		synchronized (this) {
			loadedIdx = null;
			reverseIdx = null;
		}
	}

	/**
	 * Provide iterator over entries in associated pack index, that should also
	 * exist in this pack file. Objects returned by such iterator are mutable
	 * during iteration.
	 * <p>
	 * Iterator returns objects in SHA-1 lexicographical order.
	 * </p>
	 *
	 * @return iterator over entries of associated pack index
	 *
	 * @see PackIndex#iterator()
	 */
	public Iterator<PackIndex.MutableEntry> iterator() {
		try {
			return idx().iterator();
		} catch (IOException e) {
			return Collections.<PackIndex.MutableEntry> emptyList().iterator();
		}
	}

	/**
	 * Obtain the total number of objects available in this pack. This method
	 * relies on pack index, giving number of effectively available objects.
	 *
	 * @return number of objects in index of this pack, likewise in this pack
	 * @throws IOException
	 *             the index file cannot be loaded into memory.
	 */
	long getObjectCount() throws IOException {
		return idx().getObjectCount();
	}

	/**
	 * Search for object id with the specified start offset in associated pack
	 * (reverse) index.
	 *
	 * @param offset
	 *            start offset of object to find
	 * @return object id for this offset, or null if no object was found
	 * @throws IOException
	 *             the index file cannot be loaded into memory.
	 */
	ObjectId findObjectForOffset(final long offset) throws IOException {
		return getReverseIdx().findObject(offset);
	}

	private final byte[] decompress(final long position, final int sz,
			final WindowCursor curs) throws IOException, DataFormatException {
		byte[] dstbuf;
		try {
			dstbuf = new byte[sz];
		} catch (OutOfMemoryError noMemory) {
			// The size may be larger than our heap allows, return null to
			// let the caller know allocation isn't possible and it should
			// use the large object streaming approach instead.
			//
			// For example, this can occur when sz is 640 MB, and JRE
			// maximum heap size is only 256 MB. Even if the JRE has
			// 200 MB free, it cannot allocate a 640 MB byte array.
			return null;
		}
		if (curs.inflate(this, position, dstbuf, false) != sz)
			throw new EOFException(MessageFormat.format(
					JGitText.get().shortCompressedStreamAt,
					Long.valueOf(position)));
		return dstbuf;
	}

	void copyPackAsIs(PackOutputStream out, WindowCursor curs)
			throws IOException {
		// Pin the first window, this ensures the length is accurate.
		curs.pin(this, 0);
		curs.copyPackAsIs(this, length, out);
	}

	final void copyAsIs(PackOutputStream out, LocalObjectToPack src,
			boolean validate, WindowCursor curs) throws IOException,
			StoredObjectRepresentationNotAvailableException {
		beginCopyAsIs(src);
		try {
			copyAsIs2(out, src, validate, curs);
		} finally {
			endCopyAsIs();
		}
	}

	private void copyAsIs2(PackOutputStream out, LocalObjectToPack src,
			boolean validate, WindowCursor curs) throws IOException,
			StoredObjectRepresentationNotAvailableException {
		final CRC32 crc1 = validate ? new CRC32() : null;
		final CRC32 crc2 = validate ? new CRC32() : null;
		final byte[] buf = out.getCopyBuffer();
		// Rip apart the header so we can discover the size.
		//
		readFully(src.offset, buf, 0, 20, curs);
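		// Pack entry header: bits 4-6 of the first byte hold the type code and
		// its low 4 bits start the inflated size; while the high bit is set,
		// each following byte contributes 7 more size bits.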
		int c = buf[0] & 0xff;
		final int typeCode = (c >> 4) & 7;
		long inflatedLength = c & 15;
		int shift = 4;
		int headerCnt = 1;
		while ((c & 0x80) != 0) {
			c = buf[headerCnt++] & 0xff;
			inflatedLength += ((long) (c & 0x7f)) << shift;
			shift += 7;
		}
		if (typeCode == Constants.OBJ_OFS_DELTA) {
			do {
				c = buf[headerCnt++] & 0xff;
			} while ((c & 128) != 0);
			if (validate) {
				assert(crc1 != null && crc2 != null);
				crc1.update(buf, 0, headerCnt);
				crc2.update(buf, 0, headerCnt);
			}
		} else if (typeCode == Constants.OBJ_REF_DELTA) {
			if (validate) {
				assert(crc1 != null && crc2 != null);
				crc1.update(buf, 0, headerCnt);
				crc2.update(buf, 0, headerCnt);
			}
			readFully(src.offset + headerCnt, buf, 0, 20, curs);
			if (validate) {
				assert(crc1 != null && crc2 != null);
				crc1.update(buf, 0, 20);
				crc2.update(buf, 0, 20);
			}
			headerCnt += 20;
		} else if (validate) {
			assert(crc1 != null && crc2 != null);
			crc1.update(buf, 0, headerCnt);
			crc2.update(buf, 0, headerCnt);
		}
		final long dataOffset = src.offset + headerCnt;
		final long dataLength = src.length;
		final long expectedCRC;
		final ByteArrayWindow quickCopy;
		// Verify the object isn't corrupt before sending. If it is,
		// we report it missing instead.
		//
		try {
			quickCopy = curs.quickCopy(this, dataOffset, dataLength);
			if (validate && idx().hasCRC32Support()) {
				assert(crc1 != null);
				// Index has the CRC32 code cached, validate the object.
				//
				expectedCRC = idx().findCRC32(src);
				if (quickCopy != null) {
					quickCopy.crc32(crc1, dataOffset, (int) dataLength);
				} else {
					long pos = dataOffset;
					long cnt = dataLength;
					while (cnt > 0) {
						final int n = (int) Math.min(cnt, buf.length);
						readFully(pos, buf, 0, n, curs);
						crc1.update(buf, 0, n);
						pos += n;
						cnt -= n;
					}
				}
				if (crc1.getValue() != expectedCRC) {
					setCorrupt(src.offset);
					throw new CorruptObjectException(MessageFormat.format(
							JGitText.get().objectAtHasBadZlibStream,
							Long.valueOf(src.offset), getPackFile()));
				}
			} else if (validate) {
				// We don't have a CRC32 code in the index, so compute it
				// now while inflating the raw data to get zlib to tell us
				// whether or not the data is safe.
				//
				Inflater inf = curs.inflater();
				byte[] tmp = new byte[1024];
				if (quickCopy != null) {
					quickCopy.check(inf, tmp, dataOffset, (int) dataLength);
				} else {
					assert(crc1 != null);
					long pos = dataOffset;
					long cnt = dataLength;
					while (cnt > 0) {
						final int n = (int) Math.min(cnt, buf.length);
						readFully(pos, buf, 0, n, curs);
						crc1.update(buf, 0, n);
						inf.setInput(buf, 0, n);
						while (inf.inflate(tmp, 0, tmp.length) > 0)
							continue;
						pos += n;
						cnt -= n;
					}
				}
				if (!inf.finished() || inf.getBytesRead() != dataLength) {
					setCorrupt(src.offset);
					throw new EOFException(MessageFormat.format(
							JGitText.get().shortCompressedStreamAt,
							Long.valueOf(src.offset)));
				}
				assert(crc1 != null);
				expectedCRC = crc1.getValue();
			} else {
				expectedCRC = -1;
			}
		} catch (DataFormatException dataFormat) {
			setCorrupt(src.offset);
			CorruptObjectException corruptObject = new CorruptObjectException(
					MessageFormat.format(
							JGitText.get().objectAtHasBadZlibStream,
							Long.valueOf(src.offset), getPackFile()));
			corruptObject.initCause(dataFormat);
			StoredObjectRepresentationNotAvailableException gone;
			gone = new StoredObjectRepresentationNotAvailableException(src);
			gone.initCause(corruptObject);
			throw gone;
		} catch (IOException ioError) {
			StoredObjectRepresentationNotAvailableException gone;
			gone = new StoredObjectRepresentationNotAvailableException(src);
			gone.initCause(ioError);
			throw gone;
		}
		if (quickCopy != null) {
			// The entire object fits into a single byte array window slice,
			// and we have it pinned. Write this out without copying.
			//
			out.writeHeader(src, inflatedLength);
			quickCopy.write(out, dataOffset, (int) dataLength);
		} else if (dataLength <= buf.length) {
			// Tiny optimization: Lots of objects are very small deltas or
			// deflated commits that are likely to fit in the copy buffer.
			//
			if (!validate) {
				long pos = dataOffset;
				long cnt = dataLength;
				while (cnt > 0) {
					final int n = (int) Math.min(cnt, buf.length);
					readFully(pos, buf, 0, n, curs);
					pos += n;
					cnt -= n;
				}
			}
			out.writeHeader(src, inflatedLength);
			out.write(buf, 0, (int) dataLength);
		} else {
			// Now we are committed to sending the object. As we spool it out,
			// check its CRC32 code to make sure there wasn't corruption between
			// the verification we did above, and us actually outputting it.
			//
			out.writeHeader(src, inflatedLength);
			long pos = dataOffset;
			long cnt = dataLength;
			while (cnt > 0) {
				final int n = (int) Math.min(cnt, buf.length);
				readFully(pos, buf, 0, n, curs);
				if (validate) {
					assert(crc2 != null);
					crc2.update(buf, 0, n);
				}
				out.write(buf, 0, n);
				pos += n;
				cnt -= n;
			}
			if (validate) {
				assert(crc2 != null);
				if (crc2.getValue() != expectedCRC) {
					throw new CorruptObjectException(MessageFormat.format(
							JGitText.get().objectAtHasBadZlibStream,
							Long.valueOf(src.offset), getPackFile()));
				}
			}
		}
	}

	boolean invalid() {
		return invalid;
	}

	void setInvalid() {
		invalid = true;
	}

	private void readFully(final long position, final byte[] dstbuf,
			int dstoff, final int cnt, final WindowCursor curs)
			throws IOException {
		if (curs.copy(this, position, dstbuf, dstoff, cnt) != cnt)
			throw new EOFException();
	}

	private synchronized void beginCopyAsIs(ObjectToPack otp)
			throws StoredObjectRepresentationNotAvailableException {
		if (++activeCopyRawData == 1 && activeWindows == 0) {
			try {
				doOpen();
			} catch (IOException thisPackNotValid) {
				StoredObjectRepresentationNotAvailableException gone;
				gone = new StoredObjectRepresentationNotAvailableException(otp);
				gone.initCause(thisPackNotValid);
				throw gone;
			}
		}
	}

	private synchronized void endCopyAsIs() {
		if (--activeCopyRawData == 0 && activeWindows == 0)
			doClose();
	}

	synchronized boolean beginWindowCache() throws IOException {
		if (++activeWindows == 1) {
			if (activeCopyRawData == 0)
				doOpen();
			return true;
		}
		return false;
	}

	synchronized boolean endWindowCache() {
		final boolean r = --activeWindows == 0;
		if (r && activeCopyRawData == 0)
			doClose();
		return r;
	}

	private void doOpen() throws IOException {
		try {
			if (invalid)
				throw new PackInvalidException(packFile);
			synchronized (readLock) {
				fd = new RandomAccessFile(packFile, "r"); //$NON-NLS-1$
				length = fd.length();
				onOpenPack();
			}
		} catch (InterruptedIOException e) {
			// don't invalidate the pack, we are interrupted from another thread
			openFail(false);
			throw e;
		} catch (IOException ioe) {
			openFail(true);
			throw ioe;
		} catch (RuntimeException re) {
			openFail(true);
			throw re;
		} catch (Error re) {
			openFail(true);
			throw re;
		}
	}

	private void openFail(boolean invalidate) {
		activeWindows = 0;
		activeCopyRawData = 0;
		invalid = invalidate;
		doClose();
	}

	private void doClose() {
		synchronized (readLock) {
			if (fd != null) {
				try {
					fd.close();
				} catch (IOException err) {
					// Ignore a close event. We had it open only for reading.
					// There should not be errors related to network buffers
					// not flushed, etc.
				}
				fd = null;
			}
		}
	}
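	// Paging in a window needs a seek followed by readFully on the shared
	// RandomAccessFile, so both steps run under readLock to keep the file
	// pointer consistent across threads.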
	ByteArrayWindow read(final long pos, int size) throws IOException {
		synchronized (readLock) {
			if (length < pos + size)
				size = (int) (length - pos);
			final byte[] buf = new byte[size];
			fd.seek(pos);
			fd.readFully(buf, 0, size);
			return new ByteArrayWindow(this, pos, buf);
		}
	}

	ByteWindow mmap(final long pos, int size) throws IOException {
		synchronized (readLock) {
			if (length < pos + size)
				size = (int) (length - pos);
			MappedByteBuffer map;
			try {
				map = fd.getChannel().map(MapMode.READ_ONLY, pos, size);
			} catch (IOException ioe1) {
				// The most likely reason this failed is the JVM has run out
				// of virtual memory. We need to discard quickly, and try to
				// force the GC to finalize and release any existing mappings.
				//
				System.gc();
				System.runFinalization();
				map = fd.getChannel().map(MapMode.READ_ONLY, pos, size);
			}
			if (map.hasArray())
				return new ByteArrayWindow(this, pos, map.array());
			return new ByteBufferWindow(this, pos, map);
		}
	}

	private void onOpenPack() throws IOException {
		final PackIndex idx = idx();
		final byte[] buf = new byte[20];
		fd.seek(0);
		fd.readFully(buf, 0, 12);
		if (RawParseUtils.match(buf, 0, Constants.PACK_SIGNATURE) != 4)
			throw new IOException(JGitText.get().notAPACKFile);
		final long vers = NB.decodeUInt32(buf, 4);
		final long packCnt = NB.decodeUInt32(buf, 8);
		if (vers != 2 && vers != 3)
			throw new IOException(MessageFormat.format(
					JGitText.get().unsupportedPackVersion, Long.valueOf(vers)));
		if (packCnt != idx.getObjectCount())
			throw new PackMismatchException(MessageFormat.format(
					JGitText.get().packObjectCountMismatch,
					Long.valueOf(packCnt), Long.valueOf(idx.getObjectCount()),
					getPackFile()));
		fd.seek(length - 20);
		fd.readFully(buf, 0, 20);
		if (!Arrays.equals(buf, packChecksum))
			throw new PackMismatchException(MessageFormat.format(
					JGitText.get().packObjectCountMismatch
					, ObjectId.fromRaw(buf).name()
					, ObjectId.fromRaw(idx.packChecksum).name()
					, getPackFile()));
	}

	ObjectLoader load(final WindowCursor curs, long pos)
			throws IOException, LargeObjectException {
		try {
			final byte[] ib = curs.tempId;
			Delta delta = null;
			byte[] data = null;
			int type = Constants.OBJ_BAD;
			boolean cached = false;
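			// Walk backwards through the delta chain: each OFS/REF delta records
			// where its base lives, and the walk stops at a whole object or at a
			// base already present in the DeltaBaseCache. The collected deltas
			// are then applied in order after the loop.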
			SEARCH: for (;;) {
				readFully(pos, ib, 0, 20, curs);
				int c = ib[0] & 0xff;
				final int typeCode = (c >> 4) & 7;
				long sz = c & 15;
				int shift = 4;
				int p = 1;
				while ((c & 0x80) != 0) {
					c = ib[p++] & 0xff;
					sz += ((long) (c & 0x7f)) << shift;
					shift += 7;
				}
				switch (typeCode) {
				case Constants.OBJ_COMMIT:
				case Constants.OBJ_TREE:
				case Constants.OBJ_BLOB:
				case Constants.OBJ_TAG: {
					if (delta != null || sz < curs.getStreamFileThreshold())
						data = decompress(pos + p, (int) sz, curs);
					if (delta != null) {
						type = typeCode;
						break SEARCH;
					}
					if (data != null)
						return new ObjectLoader.SmallObject(typeCode, data);
					else
						return new LargePackedWholeObject(typeCode, sz, pos, p,
								this, curs.db);
				}
				case Constants.OBJ_OFS_DELTA: {
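					// The base is stored as a distance back from this delta's
					// position: 7 bits per byte, and each continuation byte
					// first adds one before shifting so the encoding has no
					// redundant forms. The distance is subtracted from pos.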
					c = ib[p++] & 0xff;
					long base = c & 127;
					while ((c & 128) != 0) {
						base += 1;
						c = ib[p++] & 0xff;
						base <<= 7;
						base += (c & 127);
					}
					base = pos - base;
					delta = new Delta(delta, pos, (int) sz, p, base);
					if (sz != delta.deltaSize)
						break SEARCH;
					DeltaBaseCache.Entry e = curs.getDeltaBaseCache().get(this, base);
					if (e != null) {
						type = e.type;
						data = e.data;
						cached = true;
						break SEARCH;
					}
					pos = base;
					continue SEARCH;
				}
				case Constants.OBJ_REF_DELTA: {
					readFully(pos + p, ib, 0, 20, curs);
					long base = findDeltaBase(ObjectId.fromRaw(ib));
					delta = new Delta(delta, pos, (int) sz, p + 20, base);
					if (sz != delta.deltaSize)
						break SEARCH;
					DeltaBaseCache.Entry e = curs.getDeltaBaseCache().get(this, base);
					if (e != null) {
						type = e.type;
						data = e.data;
						cached = true;
						break SEARCH;
					}
					pos = base;
					continue SEARCH;
				}
				default:
					throw new IOException(MessageFormat.format(
							JGitText.get().unknownObjectType,
							Integer.valueOf(typeCode)));
				}
			}
			// At this point there is at least one delta to apply to data.
			// (Whole objects with no deltas to apply return early above.)
			if (data == null)
				throw new IOException(JGitText.get().inMemoryBufferLimitExceeded);
			assert(delta != null);
			do {
				// Cache only the base immediately before desired object.
				if (cached)
					cached = false;
				else if (delta.next == null)
					curs.getDeltaBaseCache().store(this, delta.basePos, data, type);
				pos = delta.deltaPos;
				final byte[] cmds = decompress(pos + delta.hdrLen,
						delta.deltaSize, curs);
				if (cmds == null) {
					data = null; // Discard base in case of OutOfMemoryError
					throw new LargeObjectException.OutOfMemory(new OutOfMemoryError());
				}
				final long sz = BinaryDelta.getResultSize(cmds);
				if (Integer.MAX_VALUE <= sz)
					throw new LargeObjectException.ExceedsByteArrayLimit();
				final byte[] result;
				try {
					result = new byte[(int) sz];
				} catch (OutOfMemoryError tooBig) {
					data = null; // Discard base in case of OutOfMemoryError
					throw new LargeObjectException.OutOfMemory(tooBig);
				}
				BinaryDelta.apply(data, cmds, result);
				data = result;
				delta = delta.next;
			} while (delta != null);
			return new ObjectLoader.SmallObject(type, data);
		} catch (DataFormatException dfe) {
			CorruptObjectException coe = new CorruptObjectException(
					MessageFormat.format(
							JGitText.get().objectAtHasBadZlibStream,
							Long.valueOf(pos), getPackFile()));
			coe.initCause(dfe);
			throw coe;
		}
	}

	private long findDeltaBase(ObjectId baseId) throws IOException,
			MissingObjectException {
		long ofs = idx().findOffset(baseId);
		if (ofs < 0)
			throw new MissingObjectException(baseId,
					JGitText.get().missingDeltaBase);
		return ofs;
	}

	private static class Delta {
		/** Child that applies onto this object. */
		final Delta next;

		/** Offset of the delta object. */
		final long deltaPos;

		/** Size of the inflated delta stream. */
		final int deltaSize;

		/** Total size of the delta's pack entry header (including base). */
		final int hdrLen;

		/** Offset of the base object this delta applies onto. */
		final long basePos;

		Delta(Delta next, long ofs, int sz, int hdrLen, long baseOffset) {
			this.next = next;
			this.deltaPos = ofs;
			this.deltaSize = sz;
			this.hdrLen = hdrLen;
			this.basePos = baseOffset;
		}
	}

	byte[] getDeltaHeader(WindowCursor wc, long pos)
			throws IOException, DataFormatException {
		// The delta stream starts as two variable length integers. If we
		// assume they are 64 bits each, we need 16 bytes to encode them,
		// plus 2 extra bytes for the variable length overhead. So 18 is
		// the longest delta instruction header.
		//
		final byte[] hdr = new byte[18];
		wc.inflate(this, pos, hdr, true /* headerOnly */);
		return hdr;
	}

	int getObjectType(final WindowCursor curs, long pos) throws IOException {
		final byte[] ib = curs.tempId;
		for (;;) {
			readFully(pos, ib, 0, 20, curs);
			int c = ib[0] & 0xff;
			final int type = (c >> 4) & 7;
			switch (type) {
			case Constants.OBJ_COMMIT:
			case Constants.OBJ_TREE:
			case Constants.OBJ_BLOB:
			case Constants.OBJ_TAG:
				return type;
			case Constants.OBJ_OFS_DELTA: {
				int p = 1;
				while ((c & 0x80) != 0)
					c = ib[p++] & 0xff;
				c = ib[p++] & 0xff;
				long ofs = c & 127;
				while ((c & 128) != 0) {
					ofs += 1;
					c = ib[p++] & 0xff;
					ofs <<= 7;
					ofs += (c & 127);
				}
				pos = pos - ofs;
				continue;
			}
			case Constants.OBJ_REF_DELTA: {
				int p = 1;
				while ((c & 0x80) != 0)
					c = ib[p++] & 0xff;
				readFully(pos + p, ib, 0, 20, curs);
				pos = findDeltaBase(ObjectId.fromRaw(ib));
				continue;
			}
			default:
				throw new IOException(
						MessageFormat.format(JGitText.get().unknownObjectType,
								Integer.valueOf(type)));
			}
		}
	}

	long getObjectSize(final WindowCursor curs, final AnyObjectId id)
			throws IOException {
		final long offset = idx().findOffset(id);
		return 0 < offset ? getObjectSize(curs, offset) : -1;
	}

	long getObjectSize(final WindowCursor curs, final long pos)
			throws IOException {
		final byte[] ib = curs.tempId;
		readFully(pos, ib, 0, 20, curs);
		int c = ib[0] & 0xff;
		final int type = (c >> 4) & 7;
		long sz = c & 15;
		int shift = 4;
		int p = 1;
		while ((c & 0x80) != 0) {
			c = ib[p++] & 0xff;
			sz += ((long) (c & 0x7f)) << shift;
			shift += 7;
		}
		long deltaAt;
		switch (type) {
		case Constants.OBJ_COMMIT:
		case Constants.OBJ_TREE:
		case Constants.OBJ_BLOB:
		case Constants.OBJ_TAG:
			return sz;
		case Constants.OBJ_OFS_DELTA:
			c = ib[p++] & 0xff;
			while ((c & 128) != 0)
				c = ib[p++] & 0xff;
			deltaAt = pos + p;
			break;
		case Constants.OBJ_REF_DELTA:
			deltaAt = pos + p + 20;
			break;
		default:
			throw new IOException(MessageFormat.format(
					JGitText.get().unknownObjectType, Integer.valueOf(type)));
		}
		try {
			return BinaryDelta.getResultSize(getDeltaHeader(curs, deltaAt));
		} catch (DataFormatException e) {
			throw new CorruptObjectException(MessageFormat.format(
					JGitText.get().objectAtHasBadZlibStream, Long.valueOf(pos),
					getPackFile()));
		}
	}

	LocalObjectRepresentation representation(final WindowCursor curs,
			final AnyObjectId objectId) throws IOException {
		final long pos = idx().findOffset(objectId);
		if (pos < 0)
			return null;
		final byte[] ib = curs.tempId;
		readFully(pos, ib, 0, 20, curs);
		int c = ib[0] & 0xff;
		int p = 1;
		final int typeCode = (c >> 4) & 7;
		while ((c & 0x80) != 0)
			c = ib[p++] & 0xff;
		long len = (findEndOffset(pos) - pos);
		switch (typeCode) {
		case Constants.OBJ_COMMIT:
		case Constants.OBJ_TREE:
		case Constants.OBJ_BLOB:
		case Constants.OBJ_TAG:
			return LocalObjectRepresentation.newWhole(this, pos, len - p);
		case Constants.OBJ_OFS_DELTA: {
			c = ib[p++] & 0xff;
			long ofs = c & 127;
			while ((c & 128) != 0) {
				ofs += 1;
				c = ib[p++] & 0xff;
				ofs <<= 7;
				ofs += (c & 127);
			}
			ofs = pos - ofs;
			return LocalObjectRepresentation.newDelta(this, pos, len - p, ofs);
		}
		case Constants.OBJ_REF_DELTA: {
			len -= p;
			len -= Constants.OBJECT_ID_LENGTH;
			readFully(pos + p, ib, 0, 20, curs);
			ObjectId id = ObjectId.fromRaw(ib);
			return LocalObjectRepresentation.newDelta(this, pos, len, id);
		}
		default:
			throw new IOException(
					MessageFormat.format(JGitText.get().unknownObjectType,
							Integer.valueOf(typeCode)));
		}
	}

	private long findEndOffset(final long startOffset)
			throws IOException, CorruptObjectException {
		final long maxOffset = length - 20;
		return getReverseIdx().findNextOffset(startOffset, maxOffset);
	}

	synchronized PackBitmapIndex getBitmapIndex() throws IOException {
		if (invalid || invalidBitmap)
			return null;
		if (bitmapIdx == null && hasExt(BITMAP_INDEX)) {
			final PackBitmapIndex idx = PackBitmapIndex.open(
					extFile(BITMAP_INDEX), idx(), getReverseIdx());
			// At this point, idx() will have set packChecksum.
			if (Arrays.equals(packChecksum, idx.packChecksum))
				bitmapIdx = idx;
			else
				invalidBitmap = true;
		}
		return bitmapIdx;
	}

	private synchronized PackReverseIndex getReverseIdx() throws IOException {
		if (reverseIdx == null)
			reverseIdx = new PackReverseIndex(idx());
		return reverseIdx;
	}

	private boolean isCorrupt(long offset) {
		LongList list = corruptObjects;
		if (list == null)
			return false;
		synchronized (list) {
			return list.contains(offset);
		}
	}

	private void setCorrupt(long offset) {
		LongList list = corruptObjects;
		if (list == null) {
			synchronized (readLock) {
				list = corruptObjects;
				if (list == null) {
					list = new LongList();
					corruptObjects = list;
				}
			}
		}
		synchronized (list) {
			list.add(offset);
		}
	}

	private File extFile(PackExt ext) {
		String p = packFile.getName();
		int dot = p.lastIndexOf('.');
		String b = (dot < 0) ? p : p.substring(0, dot);
		return new File(packFile.getParentFile(), b + '.' + ext.getExtension());
	}

	private boolean hasExt(PackExt ext) {
		return (extensions & ext.getBit()) != 0;
	}
}