
DfsReader.java 22KB

DFS: A storage layer for JGit

In practice the DHT storage layer has not been performing as well as large scale server environments want to see from a Git server.

The performance of the DHT schema degrades rapidly as small changes are pushed into the repository, because the chunk size ends up being less than 1/3 of the pushed pack size. Small chunks cause poor prefetch performance during reading, and require significantly longer prefetch lists inside of the chunk meta field to work around the small size.

The DHT code is very complex (>17,000 lines of code) and is very sensitive to the underlying database round-trip time, as well as to the way objects were written into the pack stream that was chunked and stored on the database. A poor pack layout (from any version of C Git prior to Junio reworking it) can leave the DHT code unable to enumerate the objects of the linux-2.6 repository in a completable time scale.

Performing a clone from a DHT stored repository of 2 million objects takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row for each object being cloned. This is very difficult for some DHTs to scale; even at 5000 rows/second the lookup stage alone takes more than 6 minutes (on the local filesystem, this is almost too fast to bother measuring). Some servers like Apache Cassandra just fall over and cannot complete the 2 million lookups in rapid fire.

On a ~400 MiB repository, the DHT schema has an extra 25 MiB of redundant data that gets downloaded to the JGit process, and that is before you consider the cost of the OBJECT_INDEX table also being fully loaded, which is at least 223 MiB of data for the linux kernel repository. In the DHT schema, answering a `git clone` of the ~400 MiB linux kernel needs to load 248 MiB of "index" data from the DHT, in addition to the ~400 MiB of pack data that gets sent to the client. This is 193 MiB more data to be accessed than with the native filesystem format, and it needs to come over a much smaller pipe (typically local Ethernet) than the local SATA disk drive.

I also never got around to writing the "repack" support for the DHT schema, as it turns out to be fairly complex to safely repack data in the repository while also trying to minimize the amount of changes made to the database, due to very common limitations on database mutation rates.

This new DFS storage layer fixes a lot of those issues by taking the simple approach of storing relatively standard Git pack and index files on an abstract filesystem. Packs are accessed through an in-process buffer cache, similar to the WindowCache used by the local filesystem storage layer. Unlike the local file IO, the DFS code assumes that the storage system has relatively high latency and no concept of "file handles". Instead it treats the file more like a target for HTTP byte range requests, where a read channel is simply a thunk to trigger a read request over the network.

The DFS code in this change is still abstract; it does not store on any particular filesystem, but is fairly well suited to Amazon S3 or Apache Hadoop HDFS. Storing packs directly on HDFS rather than HBase removes a layer of abstraction, as most HBase row reads turn into an HDFS read.

Most of the DFS code in this change was blatantly copied from the local filesystem code. Most parts should be refactored to be shared between the two storage systems, but right now I am hesitant to do this due to how well tuned the local filesystem code currently is.

Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb
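The read path described in the commit message (an in-process buffer cache sitting in front of a handle-free, byte-range style read channel) can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration only: `RangeReader`, `BlockCache`, and `inMemory` are not JGit classes, the LRU policy is an assumption, and a real channel would report network failures with `IOException`, which is omitted to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class BlockCacheSketch {
    /** Hypothetical handle-free read: every call is an independent range request. */
    interface RangeReader {
        byte[] read(long position, int length);
    }

    /** Small LRU cache of block-aligned byte arrays, keyed by block start offset. */
    static final class BlockCache {
        private final RangeReader source;
        private final int blockSize;
        private final Map<Long, byte[]> blocks;
        int remoteReads; // number of trips made to the storage system

        BlockCache(RangeReader source, int blockSize, final int maxBlocks) {
            this.source = source;
            this.blockSize = blockSize;
            // Access-ordered LinkedHashMap evicts the least recently used block.
            this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> e) {
                    return size() > maxBlocks;
                }
            };
        }

        /** Copies up to dst.length bytes starting at pos; returns the count copied. */
        int read(long pos, byte[] dst) {
            int n = 0;
            while (n < dst.length) {
                long start = (pos + n) / blockSize * blockSize; // align to block
                byte[] block = blocks.get(start);
                if (block == null) {
                    block = source.read(start, blockSize); // one range request
                    remoteReads++;
                    blocks.put(start, block);
                }
                int off = (int) (pos + n - start);
                if (off >= block.length)
                    break; // reached the end of the file
                int len = Math.min(block.length - off, dst.length - n);
                System.arraycopy(block, off, dst, n, len);
                n += len;
            }
            return n;
        }
    }

    /** In-memory byte array standing in for a remote pack file. */
    static RangeReader inMemory(final byte[] file) {
        return (position, length) -> {
            int from = (int) Math.min(position, file.length);
            int to = (int) Math.min(position + length, file.length);
            return Arrays.copyOfRange(file, from, to);
        };
    }

    public static void main(String[] args) {
        byte[] file = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        BlockCache cache = new BlockCache(inMemory(file), 4, 8);
        byte[] dst = new byte[6];
        int n = cache.read(3, dst); // spans three 4-byte blocks
        System.out.println(new String(dst, 0, n, StandardCharsets.US_ASCII)
                + " after " + cache.remoteReads + " remote reads");
    }
}
```

Aligning every request to a block boundary means that repeated reads of nearby offsets are served from memory, which matters when each miss costs a network round trip rather than a local disk seek.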
13 years ago
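The cost figures quoted in the commit message can be checked with a few lines of arithmetic. This is only a sanity check of the numbers stated there; the class and method names are hypothetical, and the final figure (the index data implied for the native filesystem format) is an inference from the message's own numbers, not a value it states.

```java
class DhtCostCheck {
    /** Seconds needed to perform one row lookup per object at a fixed rate. */
    static long lookupSeconds(long objects, long rowsPerSecond) {
        return objects / rowsPerSecond;
    }

    /** Total "index" data loaded from the DHT for a clone, in MiB. */
    static int dhtIndexMiB(int redundantMiB, int objectIndexMiB) {
        return redundantMiB + objectIndexMiB;
    }

    public static void main(String[] args) {
        // 2 million objects at 5000 rows/second, as quoted in the message.
        long secs = lookupSeconds(2_000_000L, 5_000L);
        System.out.println(secs / 60 + " min " + secs % 60 + " s"); // prints "6 min 40 s"

        // 25 MiB redundant data plus 223 MiB of OBJECT_INDEX rows.
        int dhtIndex = dhtIndexMiB(25, 223);
        System.out.println(dhtIndex + " MiB of index data"); // prints "248 MiB of index data"

        // Inference only: the message says this is 193 MiB more than the
        // native format, which implies roughly 55 MiB for the native index.
        System.out.println((dhtIndex - 193) + " MiB implied for the native format");
    }
}
```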
/*
 * Copyright (C) 2008-2011, Google Inc.
 * Copyright (C) 2006-2008, Shawn O. Pearce <spearce@spearce.org>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.storage.dfs;

import static org.eclipse.jgit.lib.Constants.OBJECT_ID_LENGTH;
import static org.eclipse.jgit.lib.Constants.OBJ_BLOB;
import static org.eclipse.jgit.lib.Constants.OBJ_TREE;

import java.io.IOException;
import java.io.InterruptedIOException;
import java.security.MessageDigest;
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import org.eclipse.jgit.JGitText;
import org.eclipse.jgit.errors.IncorrectObjectTypeException;
import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.errors.StoredObjectRepresentationNotAvailableException;
import org.eclipse.jgit.lib.AbbreviatedObjectId;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.AsyncObjectLoaderQueue;
import org.eclipse.jgit.lib.AsyncObjectSizeQueue;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.InflaterCache;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.lib.ProgressMonitor;
import org.eclipse.jgit.revwalk.ObjectWalk;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevObject;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.storage.pack.CachedPack;
import org.eclipse.jgit.storage.pack.ObjectReuseAsIs;
import org.eclipse.jgit.storage.pack.ObjectToPack;
import org.eclipse.jgit.storage.pack.PackOutputStream;
import org.eclipse.jgit.storage.pack.PackWriter;
import org.eclipse.jgit.util.BlockList;

final class DfsReader extends ObjectReader implements ObjectReuseAsIs {
	/** Temporary buffer large enough for at least one raw object id. */
	final byte[] tempId = new byte[OBJECT_ID_LENGTH];

	/** Database this reader loads objects from. */
	final DfsObjDatabase db;

	private Inflater inf;

	private DfsBlock block;

	private DeltaBaseCache baseCache;

	private DfsPackFile last;

	private boolean wantReadAhead;

	private List<ReadAheadTask.BlockFuture> pendingReadAhead;

	DfsReader(DfsObjDatabase db) {
		this.db = db;
	}

	DfsReaderOptions getOptions() {
		return db.getReaderOptions();
	}

	DeltaBaseCache getDeltaBaseCache() {
		if (baseCache == null)
			baseCache = new DeltaBaseCache(this);
		return baseCache;
	}

	int getStreamFileThreshold() {
		return getOptions().getStreamFileThreshold();
	}

	@Override
	public ObjectReader newReader() {
		return new DfsReader(db);
	}

	@Override
	public Collection<ObjectId> resolve(AbbreviatedObjectId id)
			throws IOException {
		if (id.isComplete())
			return Collections.singleton(id.toObjectId());
		HashSet<ObjectId> matches = new HashSet<ObjectId>(4);
		for (DfsPackFile pack : db.getPacks()) {
			pack.resolve(this, matches, id, 256);
			if (256 <= matches.size())
				break;
		}
		return matches;
	}

	@Override
	public boolean has(AnyObjectId objectId) throws IOException {
		if (last != null && last.hasObject(this, objectId))
			return true;
		for (DfsPackFile pack : db.getPacks()) {
			if (last == pack)
				continue;
			if (pack.hasObject(this, objectId)) {
				last = pack;
				return true;
			}
		}
		return false;
	}

	@Override
	public ObjectLoader open(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException {
		if (last != null) {
			ObjectLoader ldr = last.get(this, objectId);
			if (ldr != null)
				return ldr;
		}

		for (DfsPackFile pack : db.getPacks()) {
			if (pack == last)
				continue;
			ObjectLoader ldr = pack.get(this, objectId);
			if (ldr != null) {
				last = pack;
				return ldr;
			}
		}

		if (typeHint == OBJ_ANY)
			throw new MissingObjectException(objectId.copy(), "unknown");
		throw new MissingObjectException(objectId.copy(), typeHint);
	}

	private static final Comparator<FoundObject<?>> FOUND_OBJECT_SORT = new Comparator<FoundObject<?>>() {
		public int compare(FoundObject<?> a, FoundObject<?> b) {
			int cmp = a.packIndex - b.packIndex;
			if (cmp == 0)
				cmp = Long.signum(a.offset - b.offset);
			return cmp;
		}
	};

	private static class FoundObject<T extends ObjectId> {
		final T id;

		final DfsPackFile pack;

		final long offset;

		final int packIndex;

		FoundObject(T objectId, int packIdx, DfsPackFile pack, long offset) {
			this.id = objectId;
			this.pack = pack;
			this.offset = offset;
			this.packIndex = packIdx;
		}

		FoundObject(T objectId) {
			this.id = objectId;
			this.pack = null;
			this.offset = 0;
			this.packIndex = 0;
		}
	}

	private <T extends ObjectId> Iterable<FoundObject<T>> findAll(
			Iterable<T> objectIds) throws IOException {
		ArrayList<FoundObject<T>> r = new ArrayList<FoundObject<T>>();
		DfsPackFile[] packList = db.getPacks();
		if (packList.length == 0) {
			for (T t : objectIds)
				r.add(new FoundObject<T>(t));
			return r;
		}

		int lastIdx = 0;
		DfsPackFile lastPack = packList[lastIdx];

		OBJECT_SCAN: for (T t : objectIds) {
			try {
				long p = lastPack.findOffset(this, t);
				if (0 < p) {
					r.add(new FoundObject<T>(t, lastIdx, lastPack, p));
					continue;
				}
			} catch (IOException e) {
				// Fall through and try to examine other packs.
			}

			for (int i = 0; i < packList.length; i++) {
				if (i == lastIdx)
					continue;
				DfsPackFile pack = packList[i];
				try {
					long p = pack.findOffset(this, t);
					if (0 < p) {
						r.add(new FoundObject<T>(t, i, pack, p));
						lastIdx = i;
						lastPack = pack;
						continue OBJECT_SCAN;
					}
				} catch (IOException e) {
					// Examine other packs.
				}
			}

			// Not found in any pack; record it as missing.
			r.add(new FoundObject<T>(t));
		}

		Collections.sort(r, FOUND_OBJECT_SORT);
		last = lastPack;
		return r;
	}

	@Override
	public <T extends ObjectId> AsyncObjectLoaderQueue<T> open(
			Iterable<T> objectIds, final boolean reportMissing) {
		wantReadAhead = true;

		Iterable<FoundObject<T>> order;
		IOException error = null;
		try {
			order = findAll(objectIds);
		} catch (IOException e) {
			order = Collections.emptyList();
			error = e;
		}

		final Iterator<FoundObject<T>> idItr = order.iterator();
		final IOException findAllError = error;
		return new AsyncObjectLoaderQueue<T>() {
			private FoundObject<T> cur;

			public boolean next() throws MissingObjectException, IOException {
				if (idItr.hasNext()) {
					cur = idItr.next();
					return true;
				} else if (findAllError != null) {
					throw findAllError;
				} else {
					cancelReadAhead();
					return false;
				}
			}

			public T getCurrent() {
				return cur.id;
			}

			public ObjectId getObjectId() {
				return cur.id;
			}

			public ObjectLoader open() throws IOException {
				if (cur.pack == null)
					throw new MissingObjectException(cur.id, "unknown");
				return cur.pack.load(DfsReader.this, cur.offset);
			}

			public boolean cancel(boolean mayInterruptIfRunning) {
				cancelReadAhead();
				return true;
			}

			public void release() {
				cancelReadAhead();
			}
		};
	}

	@Override
	public <T extends ObjectId> AsyncObjectSizeQueue<T> getObjectSize(
			Iterable<T> objectIds, final boolean reportMissing) {
		wantReadAhead = true;

		Iterable<FoundObject<T>> order;
		IOException error = null;
		try {
			order = findAll(objectIds);
		} catch (IOException e) {
			order = Collections.emptyList();
			error = e;
		}

		final Iterator<FoundObject<T>> idItr = order.iterator();
		final IOException findAllError = error;
		return new AsyncObjectSizeQueue<T>() {
			private FoundObject<T> cur;

			private long sz;

			public boolean next() throws MissingObjectException, IOException {
				if (idItr.hasNext()) {
					cur = idItr.next();
					if (cur.pack == null)
						throw new MissingObjectException(cur.id, "unknown");
					sz = cur.pack.getObjectSize(DfsReader.this, cur.offset);
					return true;
				} else if (findAllError != null) {
					throw findAllError;
				} else {
					cancelReadAhead();
					return false;
				}
			}

			public T getCurrent() {
				return cur.id;
			}

			public ObjectId getObjectId() {
				return cur.id;
			}

			public long getSize() {
				return sz;
			}

			public boolean cancel(boolean mayInterruptIfRunning) {
				cancelReadAhead();
				return true;
			}

			public void release() {
				cancelReadAhead();
			}
		};
	}

	@Override
	public void walkAdviceBeginCommits(RevWalk walk, Collection<RevCommit> roots) {
		wantReadAhead = true;
	}

	@Override
	public void walkAdviceBeginTrees(ObjectWalk ow, RevCommit min, RevCommit max) {
		wantReadAhead = true;
	}

	@Override
	public void walkAdviceEnd() {
		cancelReadAhead();
	}

	@Override
	public long getObjectSize(AnyObjectId objectId, int typeHint)
			throws MissingObjectException, IncorrectObjectTypeException,
			IOException {
		if (last != null) {
			long sz = last.getObjectSize(this, objectId);
			if (0 <= sz)
				return sz;
		}

		for (DfsPackFile pack : db.getPacks()) {
			if (pack == last)
				continue;
			long sz = pack.getObjectSize(this, objectId);
			if (0 <= sz) {
				last = pack;
				return sz;
			}
		}

		if (typeHint == OBJ_ANY)
			throw new MissingObjectException(objectId.copy(), "unknown");
		throw new MissingObjectException(objectId.copy(), typeHint);
	}

	public DfsObjectToPack newObjectToPack(RevObject obj) {
		return new DfsObjectToPack(obj);
	}

	private static final Comparator<DfsObjectRepresentation> REPRESENTATION_SORT = new Comparator<DfsObjectRepresentation>() {
		public int compare(DfsObjectRepresentation a, DfsObjectRepresentation b) {
			int cmp = a.packIndex - b.packIndex;
			if (cmp == 0)
				cmp = Long.signum(a.offset - b.offset);
			return cmp;
		}
	};

	public void selectObjectRepresentation(PackWriter packer,
			ProgressMonitor monitor, Iterable<ObjectToPack> objects)
			throws IOException, MissingObjectException {
		DfsPackFile[] packList = db.getPacks();
		if (packList.length == 0) {
			Iterator<ObjectToPack> itr = objects.iterator();
			if (itr.hasNext())
				throw new MissingObjectException(itr.next(), "unknown");
			return;
		}

		int objectCount = 0;
		int updated = 0;
		int posted = 0;
		List<DfsObjectRepresentation> all = new BlockList<DfsObjectRepresentation>();
		for (ObjectToPack otp : objects) {
			boolean found = false;
			for (int packIndex = 0; packIndex < packList.length; packIndex++) {
				DfsPackFile pack = packList[packIndex];
				long p = pack.findOffset(this, otp);
				if (0 < p) {
					DfsObjectRepresentation r = new DfsObjectRepresentation(otp);
					r.pack = pack;
					r.packIndex = packIndex;
					r.offset = p;
					all.add(r);
					found = true;
				}
			}
			if (!found)
				throw new MissingObjectException(otp, otp.getType());
			if ((++updated & 1) == 1) {
				monitor.update(1); // Update by 50%, the other 50% is below.
				posted++;
			}
			objectCount++;
		}
		Collections.sort(all, REPRESENTATION_SORT);

		try {
			wantReadAhead = true;
			for (DfsObjectRepresentation r : all) {
				r.pack.representation(this, r);
				packer.select(r.object, r);
				if ((++updated & 1) == 1 && posted < objectCount) {
					monitor.update(1);
					posted++;
				}
			}
		} finally {
			cancelReadAhead();
		}
		if (posted < objectCount)
			monitor.update(objectCount - posted);
	}

	public void copyObjectAsIs(PackOutputStream out, ObjectToPack otp,
			boolean validate) throws IOException,
			StoredObjectRepresentationNotAvailableException {
		DfsObjectToPack src = (DfsObjectToPack) otp;
		src.pack.copyAsIs(out, src, validate, this);
	}

	private static final Comparator<ObjectToPack> WRITE_SORT = new Comparator<ObjectToPack>() {
		public int compare(ObjectToPack o1, ObjectToPack o2) {
			DfsObjectToPack a = (DfsObjectToPack) o1;
			DfsObjectToPack b = (DfsObjectToPack) o2;
			int cmp = a.packIndex - b.packIndex;
			if (cmp == 0)
				cmp = Long.signum(a.offset - b.offset);
			return cmp;
		}
	};

	public void writeObjects(PackOutputStream out, List<ObjectToPack> list)
			throws IOException {
		if (list.isEmpty())
			return;

		// Sorting objects by order in the current packs is usually
		// worthwhile. Most packs are going to be OFS_DELTA style,
		// where the base must appear before the deltas. If both base
		// and delta are to be reused, this ensures the base writes in
		// the output first without the recursive write-base-first logic
		// used by PackWriter to ensure OFS_DELTA can be used.
		//
		// Sorting by pack also ensures newer objects go first, which
		// typically matches the desired order.
		//
		// Only do this sorting for OBJ_TREE and OBJ_BLOB. Commits
		// are very likely to already be sorted in a good order in the
		// incoming list, and if they aren't, JGit's PackWriter has fixed
		// the order to be more optimal for readers, so honor that.
		switch (list.get(0).getType()) {
		case OBJ_TREE:
		case OBJ_BLOB:
			Collections.sort(list, WRITE_SORT);
		}

		try {
			wantReadAhead = true;
			for (ObjectToPack otp : list)
				out.writeObject(otp);
		} finally {
			cancelReadAhead();
		}
	}

	public Collection<CachedPack> getCachedPacks() throws IOException {
		DfsPackFile[] packList = db.getPacks();
		List<CachedPack> cached = new ArrayList<CachedPack>(packList.length);
		for (DfsPackFile pack : packList) {
			DfsPackDescription desc = pack.getPackDescription();
			if (canBeCachedPack(desc))
				cached.add(new DfsCachedPack(pack));
		}
		return cached;
	}

	private static boolean canBeCachedPack(DfsPackDescription desc) {
		return desc.getTips() != null && !desc.getTips().isEmpty();
	}

	public void copyPackAsIs(PackOutputStream out, CachedPack pack,
			boolean validate) throws IOException {
		try {
			wantReadAhead = true;
			((DfsCachedPack) pack).copyAsIs(out, validate, this);
		} finally {
			cancelReadAhead();
		}
	}

	/**
	 * Copy bytes from the window to a caller supplied buffer.
	 *
	 * @param pack
	 *            the file the desired window is stored within.
	 * @param position
	 *            position within the file to read from.
	 * @param dstbuf
	 *            destination buffer to copy into.
	 * @param dstoff
	 *            offset within <code>dstbuf</code> to start copying into.
	 * @param cnt
	 *            number of bytes to copy. This value may exceed the number of
	 *            bytes remaining in the window starting at offset
	 *            <code>position</code>.
	 * @return number of bytes actually copied; this may be less than
	 *         <code>cnt</code> if <code>cnt</code> exceeded the number of bytes
	 *         available.
	 * @throws IOException
	 *             this cursor does not match the provider or id and the proper
	 *             window could not be acquired through the provider's cache.
	 */
	int copy(DfsPackFile pack, long position, byte[] dstbuf, int dstoff, int cnt)
			throws IOException {
		if (cnt == 0)
			return 0;

		long length = pack.length;
		if (0 <= length && length <= position)
			return 0;

		int need = cnt;
		do {
			pin(pack, position);
			int r = block.copy(position, dstbuf, dstoff, need);
			position += r;
			dstoff += r;
			need -= r;
			if (length < 0)
				length = pack.length;
		} while (0 < need && position < length);
		return cnt - need;
	}

	void copyPackAsIs(DfsPackFile pack, long length, boolean validate,
			PackOutputStream out) throws IOException {
		MessageDigest md = null;
		if (validate) {
			// Hash the 12 byte pack header so the trailer can be verified.
			md = Constants.newMessageDigest();
			byte[] buf = out.getCopyBuffer();
			pin(pack, 0);
			if (block.copy(0, buf, 0, 12) != 12) {
				pack.setInvalid();
				throw new IOException(JGitText.get().packfileIsTruncated);
			}
			md.update(buf, 0, 12);
		}

		// Copy everything between the 12 byte header and 20 byte trailer.
		long position = 12;
		long remaining = length - (12 + 20);
		while (0 < remaining) {
			pin(pack, position);

			int ptr = (int) (position - block.start);
			int n = (int) Math.min(block.size() - ptr, remaining);
			block.write(out, position, n, md);
			position += n;
			remaining -= n;
		}

		if (md != null) {
			// Compare the computed hash against the 20 byte pack trailer.
			byte[] buf = new byte[20];
			byte[] actHash = md.digest();

			pin(pack, position);
			if (block.copy(position, buf, 0, 20) != 20) {
				pack.setInvalid();
				throw new IOException(JGitText.get().packfileIsTruncated);
			}
			if (!Arrays.equals(actHash, buf)) {
				pack.setInvalid();
				throw new IOException(MessageFormat.format(
						JGitText.get().packfileCorruptionDetected,
						pack.getPackDescription().getPackName()));
			}
		}
	}

	/**
	 * Inflate a region of the pack starting at {@code position}.
	 *
	 * @param pack
	 *            the file the desired window is stored within.
	 * @param position
	 *            position within the file to read from.
	 * @param dstbuf
	 *            destination buffer the inflater should output decompressed
	 *            data to.
	 * @param headerOnly
	 *            if true the caller wants only {@code dstbuf.length} bytes.
	 * @return updated <code>dstoff</code> based on the number of bytes
	 *         successfully inflated into <code>dstbuf</code>.
	 * @throws IOException
	 *             this cursor does not match the provider or id and the proper
	 *             window could not be acquired through the provider's cache.
	 * @throws DataFormatException
	 *             the inflater encountered an invalid chunk of data. Data
	 *             stream corruption is likely.
	 */
	int inflate(DfsPackFile pack, long position, byte[] dstbuf,
			boolean headerOnly) throws IOException, DataFormatException {
		prepareInflater();
		pin(pack, position);
		int dstoff = 0;
		for (;;) {
			dstoff = block.inflate(inf, position, dstbuf, dstoff);

			if (headerOnly && dstoff == dstbuf.length)
				return dstoff;
			if (inf.needsInput()) {
				position += block.remaining(position);
				pin(pack, position);
			} else if (inf.finished())
				return dstoff;
			else
				throw new DataFormatException();
		}
	}

	DfsBlock quickCopy(DfsPackFile p, long pos, long cnt)
			throws IOException {
		pin(p, pos);
		if (block.contains(p.key, pos + (cnt - 1)))
			return block;
		return null;
	}

	Inflater inflater() {
		prepareInflater();
		return inf;
	}

	private void prepareInflater() {
		if (inf == null)
			inf = InflaterCache.get();
		else
			inf.reset();
	}

	void pin(DfsPackFile pack, long position) throws IOException {
		DfsBlock b = block;
		if (b == null || !b.contains(pack.key, position)) {
			// If memory is low, we may need what is in our window field to
			// be cleaned up by the GC during the get for the next window.
			// So we always clear it, even though we are just going to set
			// it again.
			//
			block = null;

			if (pendingReadAhead != null)
				waitForBlock(pack.key, position);
			block = pack.getOrLoadBlock(position, this);
		}
	}

	boolean wantReadAhead() {
		return wantReadAhead;
	}

	void startedReadAhead(List<ReadAheadTask.BlockFuture> blocks) {
		if (pendingReadAhead == null)
			pendingReadAhead = new LinkedList<ReadAheadTask.BlockFuture>();
		pendingReadAhead.addAll(blocks);
	}

	private void cancelReadAhead() {
		if (pendingReadAhead != null) {
			for (ReadAheadTask.BlockFuture f : pendingReadAhead)
				f.cancel(true);
			pendingReadAhead = null;
		}
		wantReadAhead = false;
	}

	private void waitForBlock(DfsPackKey key, long position)
			throws InterruptedIOException {
		Iterator<ReadAheadTask.BlockFuture> itr = pendingReadAhead.iterator();
		while (itr.hasNext()) {
			ReadAheadTask.BlockFuture f = itr.next();
			if (f.contains(key, position)) {
				try {
					f.get();
				} catch (InterruptedException e) {
					throw new InterruptedIOException();
				} catch (ExecutionException e) {
					// Exceptions should never be thrown by get(). Ignore
					// this and let the normal load paths identify any error.
				}
				itr.remove();
				if (pendingReadAhead.isEmpty())
					pendingReadAhead = null;
				break;
			}
		}
	}

	/** Release the current window cursor. */
	@Override
	public void release() {
		cancelReadAhead();
		last = null;
		block = null;
		baseCache = null;
		try {
			InflaterCache.release(inf);
		} finally {
			inf = null;
		}
	}
}
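
The three comparators in this file (FOUND_OBJECT_SORT, REPRESENTATION_SORT and WRITE_SORT) all order entries by pack index first and by offset within the pack second, so that objects are visited in roughly the order they are stored. The sketch below shows that ordering in isolation. Located and readOrder are hypothetical names, and the sketch swaps in Comparator.comparingInt and thenComparingLong in place of the subtraction and Long.signum used in the file; for small non-negative pack indexes the resulting order is the same.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustration only: hypothetical types, not part of JGit. */
final class PackOrderSketch {
	/** Where an object was found: which pack, and at what byte offset. */
	static final class Located {
		final String name;
		final int packIndex;
		final long offset;

		Located(String name, int packIndex, long offset) {
			this.name = name;
			this.packIndex = packIndex;
			this.offset = offset;
		}
	}

	/** Pack index first, then offset within that pack. */
	static final Comparator<Located> PACK_THEN_OFFSET = Comparator
			.<Located> comparingInt(l -> l.packIndex)
			.thenComparingLong(l -> l.offset);

	/** Return object names in the order a sequential reader would visit them. */
	static List<String> readOrder(List<Located> found) {
		List<Located> sorted = new ArrayList<>(found);
		sorted.sort(PACK_THEN_OFFSET);
		List<String> names = new ArrayList<>();
		for (Located l : sorted)
			names.add(l.name);
		return names;
	}
}
```

Reading in this order keeps consecutive requests inside the same pack and moving forward through it, which presumably is what lets the read-ahead requested via wantReadAhead pay off on a high-latency store.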