| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Change-Id: I7a1ae73783c95041b59f047a7330e62e7f642149
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: I9ec247135d93ef28d732e94f18d0ec1d0e2e6d44
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: Ib099ec93d8243b238641d79328216874532ab5eb
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: Iadcec7e5973600e005cbdeb837fa197d3ae2ea86
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: I1244f6639263d156a6f9e4530167e5eb1826a535
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: I1b989d3101272632eacabe25a0b111ad0ff5bb3b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
|
|
|
|
|
| |
We should use a template for Mylyn commit messages that matches with our
guidelines for commit messages.
http://wiki.eclipse.org/EGit/Contributor_Guide#Commit_message_guidelines
Bug: 337401
Change-Id: I05812abf0eb0651d22c439142640f173fc2f2ba0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
| |
Change-Id: I8dda83cdbe88beba4a480df9846848bf3aceb9e2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: Ie6d65fe45ad92c813ce3a227729aa43681922249
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally I put the first two digits of the object SHA-1 into the
start of a row key to try and spread the load of objects around a DHT
service. Unfortunately this tends to not work as well as I had hoped.
Servers reading a repository need to contact every node in a DHT
cluster if the cluster tries to evenly distribute the object rows.
This is a lot of connections, especially if the cluster has many
backend storage servers. If the library has an open connection
limit (possibly due to JVM file descriptor limitations) it may need
to open and close a lot of connections to access a repository,
rather than being able to reuse the same connection to a handful
of backend servers. This results in a lot of connection thrashing
for some DHT type databases, and is inefficient.
Some DHTs are able to operate even if part of the database space
is currently unavailable. For example, a DHT service might assign
some section of the key space to a node, and then fail that section
over to another node when the primary is noticed as being offline.
During that failover period that section of the key space is not
available, but other sections hosted by other backends are still
ready for service. Spreading keys all over the cluster makes it
likely that any single backend being temporarily down means the
entire cluster is down, rather than only some.
This is a massive schema change, but it should improve relability
and performance for any DHT system.
Change-Id: I6b65bfb4c14b6f7bd323c2bd0638b49d429245be
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
| |
Change-Id: I4cf017cd567543846839612ab3ace6d26233e01d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: I4dec8eba7e35858aef65fcc10f91fad3fe5b52b9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: I574a05200471c431b3a02ac6ff208dc6aa90f539
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
|
|
|
| |
Change-Id: I6018ce0cd3b7c8137e137848fe1f04551b257538
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
|
|
jgit.storage.dht is a storage provider implementation for JGit that
permits storing the Git repository in a distributed hashtable, NoSQL
system, or other database. The actual underlying storage system is
undefined, and can be plugged in by implementing 7 small interfaces:
* Database
* RepositoryIndexTable
* RepositoryTable
* RefTable
* ChunkTable
* ObjectIndexTable
* WriteBuffer
The storage provider interface tries to assume very little about the
underlying storage system, and requires only three key features:
* key -> value lookup (a hashtable is suitable)
* atomic updates on single rows
* asynchronous operations (Java's ExecutorService is easy to use)
Most NoSQL database products offer all 3 of these features in their
clients, and so does any decent network based cache system like the
open source memcache product. Relying only on key equality for data
retrevial makes it simple for the storage engine to distribute across
multiple machines. Traditional SQL systems could also be used with a
JDBC based spi implementation.
Before submitting this change I have implemented six storage systems
for the spi layer:
* Apache HBase[1]
* Apache Cassandra[2]
* Google Bigtable[3]
* an in-memory implementation for unit testing
* a JDBC implementation for SQL
* a generic cache provider that can ride on top of memcache
All six systems came in with an spi layer around 1000 lines of code to
implement the above 7 interfaces. This is a huge reduction in size
compared to prior attempts to implement a new JGit storage layer. As
this package shows, a complete JGit storage implementation is more
than 17,000 lines of fairly complex code.
A simple cache is provided in storage.dht.spi.cache. Implementers can
use CacheDatabase to wrap any other type of Database and perform fast
reads against a network based cache service, such as the open source
memcached[4]. An implementation of CacheService must be provided to
glue this spi onto the network cache.
[1] https://github.com/spearce/jgit_hbase
[2] https://github.com/spearce/jgit_cassandra
[3] http://labs.google.com/papers/bigtable.html
[4] http://memcached.org/
Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|