mirrors/jgit - jgit - source @ dussan.org

1617 Ревизии

49 Клонове

Клон: stable-1.1

Commit Graph

Автор	SHA1	Съобщение	Дата
Shawn O. Pearce	d34ec12019	DHT: Change DhtReadher caches to be dynamic by workload Instead of fixing the prefetch queue and recent chunk queue as different sizes, allow these to share the same limit but be scaled based on the work being performed. During walks about 20% of the space will be given to the prefetcher, and the other 80% will be used by the recent chunks cache. This should improve cases where there is bad locality between chunks. During writing of a pack stream, 90-100% of the space should be made available to the prefetcher, as the prefetch plan is usually very accurate about the order chunks will be needed in. Change-Id: I1ca7acb4518e66eb9d4138fb753df38e7254704d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	7cad0adc7d	DHT: Remove per-process ChunkCache Performance testing has indicated the per-process ChunkCache isn't very effective for the DHT storage implementation. If a server is using the DHT storage backend, it is most likely part of a larger cluster where requests are distributed in a round-robin fashion between the member servers. In such a scenario there is insufficient data locality between requests to get a good hit ratio on the per-process ChunkCache. A low hit ratio means the cache is actually hurting performance by eating up memory that could otherwise be used for transient request data, and increasing pressure on the GC when it needs to find free space. Remove all of the ChunkCache code. Installations that want to cache (to reduce database usage) should wrap their Database with a CacheDatabase and use a network based CacheServer. I left the ChunkCache in the original DHT storage commit because I wanted to document in the history of the project that its probably worth not having, but leave open a door for someone to revert this change if they find otherwise at a later date. Change-Id: I364d0725c46c5a19f7443642a40c89ba4d3fdd29 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	преди 13 години
Shawn O. Pearce	de8946c0c2	Store Git on any DHT jgit.storage.dht is a storage provider implementation for JGit that permits storing the Git repository in a distributed hashtable, NoSQL system, or other database. The actual underlying storage system is undefined, and can be plugged in by implementing 7 small interfaces: * Database * RepositoryIndexTable * RepositoryTable * RefTable * ChunkTable * ObjectIndexTable * WriteBuffer The storage provider interface tries to assume very little about the underlying storage system, and requires only three key features: * key -> value lookup (a hashtable is suitable) * atomic updates on single rows * asynchronous operations (Java's ExecutorService is easy to use) Most NoSQL database products offer all 3 of these features in their clients, and so does any decent network based cache system like the open source memcache product. Relying only on key equality for data retrevial makes it simple for the storage engine to distribute across multiple machines. Traditional SQL systems could also be used with a JDBC based spi implementation. Before submitting this change I have implemented six storage systems for the spi layer: * Apache HBase[1] * Apache Cassandra[2] * Google Bigtable[3] * an in-memory implementation for unit testing * a JDBC implementation for SQL * a generic cache provider that can ride on top of memcache All six systems came in with an spi layer around 1000 lines of code to implement the above 7 interfaces. This is a huge reduction in size compared to prior attempts to implement a new JGit storage layer. As this package shows, a complete JGit storage implementation is more than 17,000 lines of fairly complex code. A simple cache is provided in storage.dht.spi.cache. Implementers can use CacheDatabase to wrap any other type of Database and perform fast reads against a network based cache service, such as the open source memcached[4]. An implementation of CacheService must be provided to glue this spi onto the network cache. [1] https://github.com/spearce/jgit_hbase [2] https://github.com/spearce/jgit_cassandra [3] http://labs.google.com/papers/bigtable.html [4] http://memcached.org/ Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години

Автор

SHA1

Съобщение

Дата

Shawn O. Pearce

d34ec12019

DHT: Change DhtReadher caches to be dynamic by workload

Instead of fixing the prefetch queue and recent chunk queue as
different sizes, allow these to share the same limit but be scaled
based on the work being performed.

During walks about 20% of the space will be given to the prefetcher,
and the other 80% will be used by the recent chunks cache. This
should improve cases where there is bad locality between chunks.

During writing of a pack stream, 90-100% of the space should be
made available to the prefetcher, as the prefetch plan is usually
very accurate about the order chunks will be needed in.

Change-Id: I1ca7acb4518e66eb9d4138fb753df38e7254704d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

преди 13 години

Shawn O. Pearce

7cad0adc7d

DHT: Remove per-process ChunkCache

Performance testing has indicated the per-process ChunkCache isn't
very effective for the DHT storage implementation.  If a server is
using the DHT storage backend, it is most likely part of a larger
cluster where requests are distributed in a round-robin fashion
between the member servers.

In such a scenario there is insufficient data locality between
requests to get a good hit ratio on the per-process ChunkCache.  A low
hit ratio means the cache is actually hurting performance by eating up
memory that could otherwise be used for transient request data, and
increasing pressure on the GC when it needs to find free space.

Remove all of the ChunkCache code.  Installations that want to cache
(to reduce database usage) should wrap their Database with a
CacheDatabase and use a network based CacheServer.

I left the ChunkCache in the original DHT storage commit because I
wanted to document in the history of the project that its probably
worth *not* having, but leave open a door for someone to revert this
change if they find otherwise at a later date.

Change-Id: I364d0725c46c5a19f7443642a40c89ba4d3fdd29
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>

преди 13 години

Shawn O. Pearce

de8946c0c2

Store Git on any DHT

jgit.storage.dht is a storage provider implementation for JGit that
permits storing the Git repository in a distributed hashtable, NoSQL
system, or other database.  The actual underlying storage system is
undefined, and can be plugged in by implementing 7 small interfaces:

  *  Database
  *  RepositoryIndexTable
  *  RepositoryTable
  *  RefTable
  *  ChunkTable
  *  ObjectIndexTable
  *  WriteBuffer

The storage provider interface tries to assume very little about the
underlying storage system, and requires only three key features:

  *  key -> value lookup (a hashtable is suitable)
  *  atomic updates on single rows
  *  asynchronous operations (Java's ExecutorService is easy to use)

Most NoSQL database products offer all 3 of these features in their
clients, and so does any decent network based cache system like the
open source memcache product.  Relying only on key equality for data
retrevial makes it simple for the storage engine to distribute across
multiple machines.  Traditional SQL systems could also be used with a
JDBC based spi implementation.

Before submitting this change I have implemented six storage systems
for the spi layer:

  * Apache HBase[1]
  * Apache Cassandra[2]
  * Google Bigtable[3]
  * an in-memory implementation for unit testing
  * a JDBC implementation for SQL
  * a generic cache provider that can ride on top of memcache

All six systems came in with an spi layer around 1000 lines of code to
implement the above 7 interfaces.  This is a huge reduction in size
compared to prior attempts to implement a new JGit storage layer.  As
this package shows, a complete JGit storage implementation is more
than 17,000 lines of fairly complex code.

A simple cache is provided in storage.dht.spi.cache.  Implementers can
use CacheDatabase to wrap any other type of Database and perform fast
reads against a network based cache service, such as the open source
memcached[4].  An implementation of CacheService must be provided to
glue this spi onto the network cache.

[1] https://github.com/spearce/jgit_hbase
[2] https://github.com/spearce/jgit_cassandra
[3] http://labs.google.com/papers/bigtable.html
[4] http://memcached.org/

Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

преди 13 години

3 Ревизии (stable-1.1)