path: root/org.eclipse.jgit.storage.dht
Commit message | Author | Age | Files | Lines
* DHT: Use a proper HashMap for RecentChunk lookups | Shawn O. Pearce | 2011-06-09 | 1 | -12/+16
|
| A linear search is somewhat acceptable for only 4 recent chunks, but a HashMap based lookup would be better. The table will have 16 slots by default, and given that the hashCode() of ChunkKey is derived from the SHA-1 of the chunk, each chunk will usually fall into its own bucket within the table and thus evaluate only 1 entry during lookup instead of 4.
|
| Some users may also want to devote more memory to the recent chunks, in which case expanding this list to a longer length will help to reduce chunk faults, but would increase search time. Using a HashMap will help this code to scale to larger sizes better.
|
| Change-Id: Ia41b7a1cc69ad27b85749e3b74cbf8d0aa338044
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
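The lookup change described in this commit can be illustrated with a minimal sketch. This is not the JGit code: `RecentChunkCache` is a hypothetical stand-in for `RecentChunks`, the key type is left generic rather than using `ChunkKey`, and an access-ordered `LinkedHashMap` (a `HashMap` subclass) is swapped in so that eviction of the least recently used chunk comes for free.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a small per-reader cache of recently used chunks. */
class RecentChunkCache<K, V> {
    private final Map<K, V> chunks;

    RecentChunkCache(final int maxSize) {
        // Access-ordered LinkedHashMap: a lookup hashes straight to a bucket
        // instead of scanning a list, and the least recently used entry is
        // dropped as soon as the map grows beyond maxSize.
        this.chunks = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Returns the cached chunk, or null if it is not among the recent ones. */
    V get(K key) {
        return chunks.get(key);
    }

    /** Remembers a chunk, evicting the least recently used one if needed. */
    void put(K key, V chunk) {
        chunks.put(key, chunk);
    }

    int size() {
        return chunks.size();
    }
}
```

With this shape, raising `maxSize` to devote more memory to recent chunks does not lengthen the lookup path, which is the scaling argument made in the commit message.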
* DHT: Always have at least one recent chunk in DhtReader | Shawn O. Pearce | 2011-06-09 | 1 | -1/+1
|
| The RecentChunks cache assumes there is always at least one recent chunk in the maxSize that it receives from the DhtReaderOptions. Ensure that is true by requiring the size to be at least 1.
|
| Running with a recent chunk cache of size 0 is a very bad idea. Often during commit walking the parents of a commit will be found on the same chunk as the commit that was just accessed. In these cases it's a good idea to keep that last chunk around so the parents can be quickly accessed.
|
| Change-Id: I33b65286e8a4cbf6ef4ced28c547837f173e065d
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* DHT: Fix NPE during prefetch | Shawn O. Pearce | 2011-06-09 | 1 | -1/+1
|
| The Prefetcher may have loaded a chunk that is a fragment. If the DhtReader is scanning the Prefetcher's chunks for a particular object, fragment chunks will be missing the index and will throw an NPE during the findOffset() call into the index itself.
|
| Change-Id: Ie2823724c289f745655076c5209acec32361a1ea
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
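The failure mode can be sketched as follows. The class `SketchChunk` and its fields are hypothetical and only mirror the description above (a fragment chunk carries no index); the one-line fix in the actual commit may look different.

```java
import java.util.Map;

/** Hypothetical chunk: fragment chunks are modelled as having a null index. */
class SketchChunk {
    private final Map<String, Integer> index; // null for fragment chunks

    SketchChunk(Map<String, Integer> index) {
        this.index = index;
    }

    boolean hasIndex() {
        return index != null;
    }

    /** Returns the offset of the object inside this chunk, or -1 if unknown. */
    int findOffset(String objectId) {
        if (index == null) {
            // A fragment has no index to consult; report "not here"
            // instead of dereferencing null.
            return -1;
        }
        Integer offset = index.get(objectId);
        return offset != null ? offset : -1;
    }
}
```

A reader scanning prefetched chunks can then skip fragments safely, either by checking `hasIndex()` first or by treating `-1` as a miss.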
* DHT: Drop leading hash digits from row keys | Shawn O. Pearce | 2011-06-09 | 2 | -20/+12
|
| Originally I put the first two digits of the object SHA-1 into the start of a row key to try and spread the load of objects around a DHT service. Unfortunately this tends to not work as well as I had hoped.
|
| Servers reading a repository need to contact every node in a DHT cluster if the cluster tries to evenly distribute the object rows. This is a lot of connections, especially if the cluster has many backend storage servers. If the library has an open connection limit (possibly due to JVM file descriptor limitations) it may need to open and close a lot of connections to access a repository, rather than being able to reuse the same connection to a handful of backend servers. This results in a lot of connection thrashing for some DHT type databases, and is inefficient.
|
| Some DHTs are able to operate even if part of the database space is currently unavailable. For example, a DHT service might assign some section of the key space to a node, and then fail that section over to another node when the primary is noticed as being offline. During that failover period that section of the key space is not available, but other sections hosted by other backends are still ready for service. Spreading keys all over the cluster makes it likely that any single backend being temporarily down means the entire cluster is down, rather than only some.
|
| This is a massive schema change, but it should improve reliability and performance for any DHT system.
|
| Change-Id: I6b65bfb4c14b6f7bd323c2bd0638b49d429245be
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
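To see why the leading hash digits scatter a repository, consider a store that partitions rows by sorted key range. The sketch below is illustrative only: the key layouts (`hh.repo.sha1` versus `repo.sha1`) and the class `RowKeySketch` are assumptions, not the actual JGit row key format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class RowKeySketch {
    /** Old style (hypothetical layout): two leading hash digits come first. */
    static String prefixedKey(String repo, String objectSha1) {
        return objectSha1.substring(0, 2) + "." + repo + "." + objectSha1;
    }

    /** New style (hypothetical layout): the repository comes first. */
    static String plainKey(String repo, String objectSha1) {
        return repo + "." + objectSha1;
    }

    /**
     * Builds a sorted key space and returns the owning repository of each
     * row in scan order, showing how rows of one repository are laid out.
     */
    static List<String> scanOrder(boolean prefixed, String[] repos, String[] sha1s) {
        TreeMap<String, String> rows = new TreeMap<>();
        for (String repo : repos) {
            for (String sha1 : sha1s) {
                String key = prefixed ? prefixedKey(repo, sha1) : plainKey(repo, sha1);
                rows.put(key, repo);
            }
        }
        return new ArrayList<>(rows.values());
    }
}
```

With two repositories and object names starting with `00` and `ff`, the prefixed layout interleaves the repositories across the key space, while the plain layout keeps each repository's rows adjacent, so a reader touches fewer key ranges and therefore fewer backend nodes.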
* Merge branch 'stable-1.0' | Matthias Sohn | 2011-06-09 | 3 | -1/+62
|\
| |
| | * stable-1.0:
| |   Prepare post JGit v1.0.0.201106090707-r builds
| |   JGit v1.0.0.201106090707-r
| |   Include about.html files in maven build
| |   Prepare post v1.0.0.201106081625-r builds
| |   JGit v1.0.0.201106081625-r
| |   Add missing about.html files to all shipped bundles
| |   Prepare post v1.0.0.201106071701-r builds
| |   JGit v1.0.0.201106071701-r
| * Prepare post JGit v1.0.0.201106090707-r builds [stable-1.0] | Matthias Sohn | 2011-06-09 | 2 | -13/+13
| |
| | Change-Id: I35292f9f6fb5ebc591308fdd2d069203413e189d
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * JGit v1.0.0.201106090707-r [v1.0.0.201106090707-r] | Matthias Sohn | 2011-06-09 | 2 | -13/+13
| |
| | Change-Id: Iba44e71b6441a0e39122ca8666b51989e605f25f
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * Include about.html files in maven build | Matthias Sohn | 2011-06-09 | 1 | -0/+1
| |
| | Change-Id: Ifa96090eb0fc336ee8080385f48212b5158dd9f7
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * Prepare post v1.0.0.201106081625-r builds | Matthias Sohn | 2011-06-09 | 2 | -13/+13
| |
| | Change-Id: I5e6994844405f7839ad3b3439f98bcadb59d329b
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * JGit v1.0.0.201106081625-r [v1.0.0.201106081625-r] | Matthias Sohn | 2011-06-08 | 2 | -13/+13
| |
| | Change-Id: I629990189083bab4737938ad712080fba7917582
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * Add missing about.html files to all shipped bundles | Matthias Sohn | 2011-06-08 | 2 | -1/+61
| |
| | Change-Id: I5a4ad9493da3816f21d9fdd0b5b977388d074500
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * Prepare post v1.0.0.201106071701-r builds | Matthias Sohn | 2011-06-08 | 2 | -13/+13
| |
| | Change-Id: I67ee2912ef54462cf860dc4ec0a6334e9c619384
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * JGit v1.0.0.201106071701-r [v1.0.0.201106071701-r] | Matthias Sohn | 2011-06-07 | 2 | -13/+13
| |
| | Change-Id: Ic8f49336ba96c8dcf4bab2f74c0f1efc1ab55131
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * Prepare post v1.0.0.201106051725-r builds | Matthias Sohn | 2011-06-06 | 2 | -13/+13
| |
| | Change-Id: I4839877e1a6fa7782f37423213af8d579727a494
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
| * JGit v1.0.0.201106051725-r [v1.0.0.201106051725-r] | Matthias Sohn | 2011-06-05 | 2 | -13/+13
| |
| | Change-Id: I39f4a23cf284505395d511dfedf02b7f5608df95
| | Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* | Prepare 1.1.0 builds | Matthias Sohn | 2011-06-06 | 2 | -18/+18
|/
|
| Change-Id: I4cf017cd567543846839612ab3ace6d26233e01d
| Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* Prepare post v1.0.0.201106011211-rc3 builds | Matthias Sohn | 2011-06-01 | 2 | -13/+13
|
| Change-Id: I4dec8eba7e35858aef65fcc10f91fad3fe5b52b9
| Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* JGit v1.0.0.201106011211-rc3 [v1.0.0.201106011211-rc3] | Matthias Sohn | 2011-06-01 | 2 | -13/+13
|
| Change-Id: I574a05200471c431b3a02ac6ff208dc6aa90f539
| Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* Remove incubation marker | Matthias Sohn | 2011-05-31 | 1 | -1/+1
|
| Change-Id: I6018ce0cd3b7c8137e137848fe1f04551b257538
| Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* DHT: Support removing a repository name | Shawn O. Pearce | 2011-05-31 | 3 | -0/+43
|
| The first step to deleting a repository from the DHT storage is to remove the name binding in the RepositoryIndexTable, making the repository unavailable for lookup.
|
| Change-Id: I469bf92f4bf2f555a15949569b21937c14cb142b
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* DHT: Fix thread-safety issue in AbstractWriteBuffer | Shawn O. Pearce | 2011-05-31 | 1 | -7/+18
|
| There is a data corruption issue with the 'running' list if a background thread schedules something onto the buffer while the application thread is also using it.
|
| Change-Id: I5ba78b98b6632965d677a9c8f209f0cf8320cc3d
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
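The commit message does not say how the race was fixed, so the following is only one common way to guard a shared list, assuming the 'running' list tracks in-flight operations as `Future` objects. The class name `RunningOps` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

/** Hypothetical holder for in-flight operations shared between threads. */
class RunningOps {
    private final Object lock = new Object();
    private final List<Future<?>> running = new ArrayList<>();

    /** May be called from the application thread or a background thread. */
    void add(Future<?> op) {
        synchronized (lock) {
            running.add(op);
        }
    }

    /** Atomically takes the current operations, leaving an empty list behind. */
    List<Future<?>> drain() {
        synchronized (lock) {
            List<Future<?>> copy = new ArrayList<>(running);
            running.clear();
            return copy;
        }
    }
}
```

Taking a snapshot under the lock and waiting on the copy outside of it keeps the critical section short, so a background thread that schedules more work is never blocked behind a slow flush.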
* DHT: Add sequence RefData | Shawn O. Pearce | 2011-05-25 | 3 | -129/+197
|
| RefData now uses a sequence number as part of the field, ensuring that updates always increase the sequence number by one whenever a reference is modified.
|
| Attaching a sequence number to RefData will help with storing reference log entries during updates. As the sequence number should be unique within the reference name space, log entries can be keyed by the sequence number and remain unique.
|
| Making this work over reference delete-create cycles will require an additional RefTable API to return the oldest sequence number previously used in the reference log to seed the recreated reference.
|
| Change-Id: I11cfff2a96ef962e57f29925a3eef41bdbf9f9bb
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
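The sequence number idea can be sketched as a compare-and-put that only succeeds when the caller has seen the latest sequence, and that bumps the sequence by one on success. Everything here is hypothetical (`SequencedRefTable`, the `name@sequence` log key, and the convention that sequence 0 means "does not exist yet"); it is not the RefData or RefTable API, and it ignores the delete-create case the commit message calls out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reference table where every update bumps a sequence number. */
class SequencedRefTable {
    /** Immutable snapshot of one reference. */
    static final class Entry {
        final long sequence;
        final String target;

        Entry(long sequence, String target) {
            this.sequence = sequence;
            this.target = target;
        }
    }

    private final Map<String, Entry> refs = new HashMap<>();
    // Log entries keyed by "name@sequence"; keys stay unique because the
    // sequence only ever grows for a given reference name.
    private final Map<String, String> log = new HashMap<>();

    synchronized Entry get(String name) {
        return refs.get(name);
    }

    /** Updates the reference only if expectedSequence is still current. */
    synchronized boolean compareAndPut(String name, long expectedSequence, String newTarget) {
        Entry cur = refs.get(name);
        long curSequence = cur == null ? 0 : cur.sequence; // assumption: 0 = absent
        if (curSequence != expectedSequence) {
            return false; // another writer modified the reference first
        }
        Entry next = new Entry(curSequence + 1, newTarget);
        refs.put(name, next);
        log.put(name + "@" + next.sequence, newTarget);
        return true;
    }

    synchronized String logEntry(String name, long sequence) {
        return log.get(name + "@" + sequence);
    }
}
```

A writer reads the current entry, computes the new target, and retries from the read if `compareAndPut` returns false.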
* DHT: Replace TinyProtobuf with Google Protocol Buffers | Shawn O. Pearce | 2011-05-25 | 41 | -2668/+994
|
| The standard Google distribution of Protocol Buffers in Java is better maintained than TinyProtobuf, and should be faster for most uses. It does use slightly more memory due to many of our key types being stored as strings in protobuf messages, but this is probably worth the small hit to memory in exchange for better maintained code that is easier to reuse in other applications.
|
| Exposing all of our data members to the underlying implementation makes it easier to develop reporting and data mining tools, or to expand out a nested structure like RefData into a flat format in a SQL database table.
|
| Since the C++ `protoc` tool is necessary to convert the protobuf script into Java code, the generated files are committed as part of the source repository to make it easier for developers who do not have this tool installed to still build the overall JGit package and make use of it. Reviewers will need to be careful to ensure that any edits made to a *.proto file come in a commit that also updates the generated code to match.
|
| CQ: 5135
| Change-Id: I53e11e82c186b9cf0d7b368e0276519e6a0b2893
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
* DHT: Remove per-process ChunkCache | Shawn O. Pearce | 2011-05-25 | 6 | -626/+23
|
| Performance testing has indicated the per-process ChunkCache isn't very effective for the DHT storage implementation. If a server is using the DHT storage backend, it is most likely part of a larger cluster where requests are distributed in a round-robin fashion between the member servers. In such a scenario there is insufficient data locality between requests to get a good hit ratio on the per-process ChunkCache.
|
| A low hit ratio means the cache is actually hurting performance by eating up memory that could otherwise be used for transient request data, and increasing pressure on the GC when it needs to find free space.
|
| Remove all of the ChunkCache code. Installations that want to cache (to reduce database usage) should wrap their Database with a CacheDatabase and use a network based CacheServer.
|
| I left the ChunkCache in the original DHT storage commit because I wanted to document in the history of the project that it's probably worth *not* having, but leave open a door for someone to revert this change if they find otherwise at a later date.
|
| Change-Id: I364d0725c46c5a19f7443642a40c89ba4d3fdd29
| Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
* Store Git on any DHT | Shawn O. Pearce | 2011-05-05 | 98 | -0/+19928

  jgit.storage.dht is a storage provider implementation for JGit that permits storing the Git repository in a distributed hashtable, NoSQL system, or other database. The actual underlying storage system is undefined, and can be plugged in by implementing 7 small interfaces:

  * Database
  * RepositoryIndexTable
  * RepositoryTable
  * RefTable
  * ChunkTable
  * ObjectIndexTable
  * WriteBuffer

  The storage provider interface tries to assume very little about the underlying storage system, and requires only three key features:

  * key -> value lookup (a hashtable is suitable)
  * atomic updates on single rows
  * asynchronous operations (Java's ExecutorService is easy to use)

  Most NoSQL database products offer all 3 of these features in their clients, and so does any decent network based cache system like the open source memcached product. Relying only on key equality for data retrieval makes it simple for the storage engine to distribute across multiple machines. Traditional SQL systems could also be used with a JDBC based spi implementation.

  Before submitting this change I have implemented six storage systems for the spi layer:

  * Apache HBase[1]
  * Apache Cassandra[2]
  * Google Bigtable[3]
  * an in-memory implementation for unit testing
  * a JDBC implementation for SQL
  * a generic cache provider that can ride on top of memcached

  All six systems came in with an spi layer around 1000 lines of code to implement the above 7 interfaces. This is a huge reduction in size compared to prior attempts to implement a new JGit storage layer. As this package shows, a complete JGit storage implementation is more than 17,000 lines of fairly complex code.

  A simple cache is provided in storage.dht.spi.cache. Implementers can use CacheDatabase to wrap any other type of Database and perform fast reads against a network based cache service, such as the open source memcached[4]. An implementation of CacheService must be provided to glue this spi onto the network cache.

  [1] https://github.com/spearce/jgit_hbase
  [2] https://github.com/spearce/jgit_cassandra
  [3] http://labs.google.com/papers/bigtable.html
  [4] http://memcached.org/

  Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
  Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
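The three features the storage provider interface requires (key -> value lookup, atomic single-row updates, asynchronous operations) can be shown with a small in-memory sketch. `InMemoryRowStore` and its method names are hypothetical and do not correspond to any of the 7 spi interfaces; `CompletableFuture` is used on top of an `ExecutorService` for brevity, which is a modern convenience and not what the 2011 code would have used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;

/** Hypothetical in-memory row store showing the three required features. */
class InMemoryRowStore {
    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();
    private final ExecutorService executor;

    InMemoryRowStore(ExecutorService executor) {
        this.executor = executor;
    }

    /** key -> value lookup, executed asynchronously; yields null if absent. */
    CompletableFuture<String> get(String key) {
        return CompletableFuture.supplyAsync(() -> rows.get(key), executor);
    }

    /**
     * Atomic update of a single row: succeeds only if the row still holds
     * oldValue, where a null oldValue means "the row must not exist yet".
     */
    CompletableFuture<Boolean> compareAndPut(String key, String oldValue, String newValue) {
        return CompletableFuture.supplyAsync(() -> {
            if (oldValue == null) {
                return rows.putIfAbsent(key, newValue) == null;
            }
            return rows.replace(key, oldValue, newValue);
        }, executor);
    }
}
```

Because every operation addresses exactly one key and needs nothing beyond equality on that key, the same shape maps onto a distributed hashtable, a NoSQL client, or a network cache without requiring cross-row transactions.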