mirrors/jgit - jgit - source @ dussan.org

Commit graph

Author	SHA1	Message	Date
Minh Thai	513d80db10	Lazily open ReadableChannel in BlockBasedFile.getOrLoadBlock To avoid opening the readable channel in case of DfsBlockCache hits. Also cleaning up typos around DfsBlockCache. Change-Id: I615e349cb4838387c1e6743cdc384d1b81b54369 Signed-off-by: Minh Thai <mthai@google.com>	5 years ago
Minh Thai	8bc9acf264	DfsBlockCache to lock while loading object references We see the same index being loaded by multiple threads. Each is hundreds of MB and takes several seconds to load, causing the server to run out of memory. This change introduces a lock to avoid this duplicate work. It uses a new set of locks similar in implementation to the loadLocks for getOrLoad of blocks. The locks are kept separate to prevent long-running index loading from blocking out fast block loading. The cache instance can be configured with a consumer to monitor the wait time of the new locks. Change-Id: I44962fe84093456962d5981545e3f7851ecb6e43 Signed-off-by: Minh Thai <mthai@google.com>	5 years ago
Han-Wen Nienhuys	6d370d837c	Remove 'final' in parameter lists Change-Id: Id924f79c8b2c720297ebc49bf9c5d4ddd6d52547 Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>	6 years ago
Dave Borowitz	14167272c2	Enforce DFS blockLimit is a multiple of blockSize Change-Id: I2821124ff88d7d1812a846ed20f3828fc9123b38	6 years ago
Matthias Sohn	a224b78675	Fix javadoc in org.eclipse.jgit dfs package Change-Id: I1f5e3dc3ba34b323ee7244dbefee207ce19e6021 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	6 years ago
Jonathan Nieder	061d24f6d5	Remove @since tags from internal packages These packages don't use @since tags because they are not part of the stable public API. Some @since tags snuck in, though. Remove them to make the convention easier to find for new contributors and the expectations clearer for users. Change-Id: I6c17d3cfc93657f1b33cf5c5708f2b1c712b0d31	7 years ago
Philipp Marx	8adbfe4da6	Check that DfsBlockCache#blockSize is a power of 2 In case a value is used which isn't a power of 2 there will be a high chance of java.lang.ArrayIndexOutOfBoundsException and org.eclipse.jgit.errors.CorruptObjectException due to a mismatching assumption for the DfsBlockCache#blockSizeShift parameter. Change-Id: Ib348b3704edf10b5f93a3ffab4fa6f09cbbae231 Signed-off-by: Philipp Marx <smigfu@googlemail.com>	7 years ago
Philipp Marx	ccc899773e	Add "concurrencyLevel" option to DfsBlockCache Allow for higher concurrency on DfsBlockCache by adding a configuration for the estimated number of concurrent requests. Change-Id: Ia65e58ecb2c459b6d9c9697a2f715d933270f7e6 Signed-off-by: Philipp Marx <smigfu@googlemail.com>	7 years ago
Shawn Pearce	c761c8bb5c	Avoid storing large packs in block cache during reuse When a large pack (> 30% of the block cache) is being reused by copying, it pollutes the block cache with noise by storing blocks that are never referenced again. Avoid this by streaming the file directly from its channel onto the output stream. Change-Id: I2e53de27f3dcfb93de68b1fad45f75ab23e79fe7	9 years ago
Shawn Pearce	56497be34d	Delete broken DFS read-ahead support This implementation has been proven to deadlock in production server loads. Google has been running with it disabled for quite a while, as the bugs have been difficult to identify and fix. Instead of suggesting it works and is useful, drop the code. JGit should not advertise support for functionality that is known to be broken. In a few of the places where read-ahead was enabled by DfsReader there is more information about what blocks should be loaded when. During object representation selection, or size lookup, or sending an object as-is to a PackWriter, or sending an entire pack as-is, the reader knows exactly which blocks are required in the cache, and it also can compute when those will be needed. The broken read-ahead code was stupid and just read a fixed amount ahead of the current offset, which can waste IOs if more precise data was available. DFS systems are usually slow to respond, so read-ahead is still a desired feature, but it needs to be rebuilt from scratch and make better use of the offset information. Change-Id: Ibaed8288ec3340cf93eb269dc0f1f23ab5ab1aea	11 years ago
Shawn Pearce	f32b861243	JGit 3.0: move internal classes into an internal subpackage This breaks all existing callers once. Applications are not supposed to build against the internal storage API unless they can accept API churn and make necessary updates as versions change. Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9	11 years ago
Robin Rosenberg	c310fa0c80	Mark non-externalizable strings as such A few classes such as Constants are marked with @SuppressWarnings, as are toString() methods with many literals, but otherwise $NLS-n$ is used for strings containing text that should not be translated. A few literals may fall into the gray zone, but mostly I've tried to only tag the obvious ones. Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590	11 years ago
Shawn O. Pearce	fa4cc2475f	DFS: A storage layer for JGit In practice the DHT storage layer has not been performing as well as large scale server environments want to see from a Git server. The performance of the DHT schema degrades rapidly as small changes are pushed into the repository due to the chunk size being less than 1/3 of the pushed pack size. Small chunks cause poor prefetch performance during reading, and require significantly longer prefetch lists inside of the chunk meta field to work around the small size. The DHT code is very complex (>17,000 lines of code) and is very sensitive to the underlying database round-trip time, as well as the way objects were written into the pack stream that was chunked and stored on the database. A poor pack layout (from any version of C Git prior to Junio reworking it) can cause the DHT code to be unable to enumerate the objects of the linux-2.6 repository in a completable time scale. Performing a clone from a DHT stored repository of 2 million objects takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row for each object being cloned. This is very difficult for some DHTs to scale, even at 5000 rows/second the lookup stage alone takes 6 minutes (on local filesystem, this is almost too fast to bother measuring). Some servers like Apache Cassandra just fall over and cannot complete the 2 million lookups in rapid fire. On a ~400 MiB repository, the DHT schema has an extra 25 MiB of redundant data that gets downloaded to the JGit process, and that is before you consider the cost of the OBJECT_INDEX table also being fully loaded, which is at least 223 MiB of data for the linux kernel repository. In the DHT schema answering a `git clone` of the ~400 MiB linux kernel needs to load 248 MiB of "index" data from the DHT, in addition to the ~400 MiB of pack data that gets sent to the client. This is 193 MiB more data to be accessed than the native filesystem format, but it needs to come over a much smaller pipe (local Ethernet typically) than the local SATA disk drive. I also never got around to writing the "repack" support for the DHT schema, as it turns out to be fairly complex to safely repack data in the repository while also trying to minimize the amount of changes made to the database, due to very common limitations on database mutation rates. This new DFS storage layer fixes a lot of those issues by taking the simple approach for storing relatively standard Git pack and index files on an abstract filesystem. Packs are accessed by an in-process buffer cache, similar to the WindowCache used by the local filesystem storage layer. Unlike the local file IO, there are some assumptions that the storage system has relatively high latency and no concept of "file handles". Instead it looks at the file more like HTTP byte range requests, where a read channel is simply a thunk to trigger a read request over the network. The DFS code in this change is still abstract, it does not store on any particular filesystem, but is fairly well suited to the Amazon S3 or Apache Hadoop HDFS. Storing packs directly on HDFS rather than HBase removes a layer of abstraction, as most HBase row reads turn into an HDFS read. Most of the DFS code in this change was blatantly copied from the local filesystem code. Most parts should be refactored to be shared between the two storage systems, but right now I am hesitant to do this due to how well tuned the local filesystem code currently is. Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb	13 years ago

11 Commits (3cea3676c75127dd720ea4c0b86d92ed040f7fa7)
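Commits 8adbfe4da6 and 14167272c2 add two sanity checks on the block cache configuration: blockSize must be a power of 2, and blockLimit must be a multiple of blockSize. The following is a minimal Java sketch of those checks, not JGit's actual DfsBlockCacheConfig. The class name BlockCacheConfigSketch and its methods are hypothetical, and rejecting a non-positive blockLimit is an added assumption. It illustrates why the power-of-2 rule matters: the block containing a file position is located by shifting with blockSizeShift, and a shift only matches division by blockSize when blockSize is a power of 2.

```java
/**
 * Hypothetical sketch, NOT JGit's DfsBlockCacheConfig: validates a block size
 * and a cache limit the way the two commits above describe.
 */
public final class BlockCacheConfigSketch {
    private final int blockSize;
    private final long blockLimit;
    private final int blockSizeShift;

    public BlockCacheConfigSketch(int blockSize, long blockLimit) {
        // A power of 2 has exactly one bit set, so (n & (n - 1)) == 0.
        if (blockSize <= 0 || (blockSize & (blockSize - 1)) != 0) {
            throw new IllegalArgumentException(
                    "blockSize must be a power of 2: " + blockSize);
        }
        // Assumption: a zero or negative limit is rejected as well.
        if (blockLimit <= 0 || blockLimit % blockSize != 0) {
            throw new IllegalArgumentException(
                    "blockLimit must be a positive multiple of blockSize: " + blockLimit);
        }
        this.blockSize = blockSize;
        this.blockLimit = blockLimit;
        // For a power of 2, the shift equals log2(blockSize).
        this.blockSizeShift = Integer.numberOfTrailingZeros(blockSize);
    }

    /** Start offset of the block containing pos; correct only for power-of-2 sizes. */
    public long alignToBlock(long pos) {
        return (pos >>> blockSizeShift) << blockSizeShift;
    }

    /** Number of whole blocks that fit in the cache; exact because of the multiple check. */
    public long maxBlocks() {
        return blockLimit / blockSize;
    }

    public int blockSizeShift() {
        return blockSizeShift;
    }
}
```

With a 64 KiB block size and a 32 MiB limit, the shift is 16, position 70000 falls in the block starting at 65536, and the cache holds 512 blocks. A 48 KiB block size is rejected up front rather than producing misaligned reads later.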
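Commit 8bc9acf264 describes a set of locks, similar to the loadLocks used for blocks, so that only one thread loads a given index while the others wait, plus a consumer that reports how long threads waited. A minimal Java sketch of that lock-striping pattern follows, assuming a ConcurrentHashMap as the cache and a java.util.function.LongConsumer for wait times. StripedLoader is a hypothetical name, not JGit's API, and the sketch omits eviction and the separate lock set that JGit keeps for block loads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;
import java.util.function.LongConsumer;

/** Hypothetical sketch, NOT JGit code: get-or-load guarded by striped locks. */
public final class StripedLoader<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ReentrantLock[] locks;
    private final LongConsumer waitTimeNanos; // assumed hook for monitoring lock waits

    public StripedLoader(int stripes, LongConsumer waitTimeNanos) {
        this.locks = new ReentrantLock[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock(true); // fair, so waiters are served in order
        }
        this.waitTimeNanos = waitTimeNanos;
    }

    // The same key always maps to the same stripe; different keys may share one.
    private ReentrantLock lockFor(K key) {
        return locks[Math.floorMod(key.hashCode(), locks.length)];
    }

    /** Returns the cached value, loading it at most once per key. The loader must not return null. */
    public V getOrLoad(K key, Function<? super K, ? extends V> loader) {
        V v = cache.get(key);
        if (v != null) {
            return v; // fast path: a cache hit takes no lock
        }
        ReentrantLock lock = lockFor(key);
        long start = System.nanoTime();
        lock.lock();
        try {
            waitTimeNanos.accept(System.nanoTime() - start);
            v = cache.get(key); // another thread may have loaded it while we waited
            if (v == null) {
                v = loader.apply(key);
                cache.put(key, v);
            }
            return v;
        } finally {
            lock.unlock();
        }
    }
}
```

The second lookup after acquiring the lock is what removes the duplicate load: threads that queued behind the first loader find the value already cached instead of reading another copy of a large index into memory.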