
git_store.proto 8.5KB

Store Git on any DHT

jgit.storage.dht is a storage provider implementation for JGit that
permits storing the Git repository in a distributed hashtable, NoSQL
system, or other database. The actual underlying storage system is
undefined, and can be plugged in by implementing 7 small interfaces:

* Database
* RepositoryIndexTable
* RepositoryTable
* RefTable
* ChunkTable
* ObjectIndexTable
* WriteBuffer

The storage provider interface tries to assume very little about the
underlying storage system, and requires only three key features:

* key -> value lookup (a hashtable is suitable)
* atomic updates on single rows
* asynchronous operations (Java's ExecutorService is easy to use)

Most NoSQL database products offer all 3 of these features in their
clients, and so does any decent network-based cache system like the
open source memcache product. Relying only on key equality for data
retrieval makes it simple for the storage engine to distribute across
multiple machines. Traditional SQL systems could also be used with a
JDBC-based spi implementation.

Before submitting this change I have implemented six storage systems
for the spi layer:

* Apache HBase[1]
* Apache Cassandra[2]
* Google Bigtable[3]
* an in-memory implementation for unit testing
* a JDBC implementation for SQL
* a generic cache provider that can ride on top of memcache

All six systems came in with an spi layer of around 1000 lines of code
to implement the above 7 interfaces. This is a huge reduction in size
compared to prior attempts to implement a new JGit storage layer. As
this package shows, a complete JGit storage implementation is more
than 17,000 lines of fairly complex code.

A simple cache is provided in storage.dht.spi.cache. Implementers can
use CacheDatabase to wrap any other type of Database and perform fast
reads against a network-based cache service, such as the open source
memcached[4]. An implementation of CacheService must be provided to
glue this spi onto the network cache.

[1] https://github.com/spearce/jgit_hbase
[2] https://github.com/spearce/jgit_cassandra
[3] http://labs.google.com/papers/bigtable.html
[4] http://memcached.org/

Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
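The three features the commit message asks of a storage system (key -> value lookup, atomic updates on single rows, asynchronous operations) can be illustrated with a small in-memory sketch. Everything named here (InMemoryTableSketch, get, compareAndPut) is hypothetical and chosen for illustration; it is not the actual jgit.storage.dht spi, whose interfaces are not shown on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical in-memory table for illustration only; it is not one of
// the seven jgit.storage.dht spi interfaces.
class InMemoryTableSketch {
    private final ConcurrentHashMap<String, byte[]> rows = new ConcurrentHashMap<>();
    private final ExecutorService executor;

    InMemoryTableSketch(ExecutorService executor) {
        this.executor = executor;
    }

    // Key -> value lookup, performed asynchronously on the executor.
    CompletableFuture<byte[]> get(String key) {
        return CompletableFuture.supplyAsync(() -> rows.get(key), executor);
    }

    // Atomic update of a single row: store "update" only if the row still
    // holds "expected" (null means the row must not exist yet).
    boolean compareAndPut(String key, byte[] expected, byte[] update) {
        boolean[] swapped = { false };
        rows.compute(key, (k, current) -> {
            if (Arrays.equals(current, expected)) {
                swapped[0] = true;
                return update;
            }
            return current;
        });
        return swapped[0];
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            InMemoryTableSketch table = new InMemoryTableSketch(pool);
            byte[] v1 = "v1".getBytes(StandardCharsets.UTF_8);
            byte[] v2 = "v2".getBytes(StandardCharsets.UTF_8);
            // The first writer creates the row; the second loses the race.
            System.out.println(table.compareAndPut("refs/heads/master", null, v1));
            System.out.println(table.compareAndPut("refs/heads/master", null, v2));
            byte[] stored = table.get("refs/heads/master").join();
            System.out.println(new String(stored, StandardCharsets.UTF_8));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the sketch relies only on key equality and a per-row compare-and-put, the same shape could plausibly be backed by a hashtable, a NoSQL client, or a cache service, which is the property the commit message emphasizes.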
// Copyright (C) 2011, Google Inc.
// and other copyright owners as documented in the project's IP log.
//
// This program and the accompanying materials are made available
// under the terms of the Eclipse Distribution License v1.0 which
// accompanies this distribution, is reproduced below, and is
// available at http://www.eclipse.org/org/documents/edl-v10.php
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or
// without modification, are permitted provided that the following
// conditions are met:
//
// - Redistributions of source code must retain the above copyright
//   notice, this list of conditions and the following disclaimer.
//
// - Redistributions in binary form must reproduce the above
//   copyright notice, this list of conditions and the following
//   disclaimer in the documentation and/or other materials provided
//   with the distribution.
//
// - Neither the name of the Eclipse Foundation, Inc. nor the
//   names of its contributors may be used to endorse or promote
//   products derived from this software without specific prior
//   written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
// CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
// INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
// OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
// NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
// ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
//
// WARNING: If you edit this file, run generate.sh
//
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

syntax = "proto2";

package org.eclipse.jgit.storage.dht;
option java_generate_equals_and_hash = true;
option java_package = "org.eclipse.jgit.generated.storage.dht.proto";

// Entry in RefTable describing the target of the reference.
// Either symref *OR* target must be populated, but never both.
//
message RefData {
  // Incrementing counter updated each time the RefData changes.
  // Should always start at 1.
  //
  required uint32 sequence = 5 [default = 0];

  // An ObjectId with an optional hint about where it can be found.
  //
  message Id {
    required string object_name = 1;
    optional string chunk_key = 2;
  }

  // Name of another reference this reference inherits its target
  // from. The target is inherited on-the-fly at runtime by reading
  // the other reference. Typically only "HEAD" uses symref.
  //
  optional string symref = 1;

  // ObjectId this reference currently points at.
  //
  optional Id target = 2;

  // True if the correct value for peeled is stored.
  //
  optional bool is_peeled = 3;

  // If is_peeled is true, this field is accurate. This field
  // exists only if target points to annotated tag object, then
  // this field stores the "object" field for that tag.
  //
  optional Id peeled = 4;
}

// Entry in ObjectIndexTable, describes how an object appears in a chunk.
//
message ObjectInfo {
  // Type of Git object.
  //
  enum ObjectType {
    COMMIT = 1;
    TREE = 2;
    BLOB = 3;
    TAG = 4;
  }
  optional ObjectType object_type = 1;

  // Position of the object's header within its chunk.
  //
  required int32 offset = 2;

  // Total number of compressed data bytes, not including the pack
  // header. For fragmented objects this is the sum of all chunks.
  //
  required int64 packed_size = 3;

  // Total number of bytes of the uncompressed object. For a
  // delta this is the size after applying the delta onto its base.
  //
  required int64 inflated_size = 4;

  // ObjectId of the delta base, if this object is stored as a delta.
  // The base is stored in raw binary.
  //
  optional bytes delta_base = 5;

  // True if the object requires more than one chunk to be stored.
  //
  optional bool is_fragmented = 6;
}

// Describes at a high-level the information about a chunk.
// A repository can use this summary to determine how much
// data is stored, or when garbage collection should occur.
//
message ChunkInfo {
  // Source of the chunk (what code path created it).
  //
  enum Source {
    RECEIVE = 1;  // Came in over the network from external source.
    INSERT = 2;   // Created in this repository (e.g. a merge).
    REPACK = 3;   // Generated during a repack of this repository.
  }
  optional Source source = 1;

  // Type of Git object stored in this chunk.
  //
  enum ObjectType {
    MIXED = 0;
    COMMIT = 1;
    TREE = 2;
    BLOB = 3;
    TAG = 4;
  }
  optional ObjectType object_type = 2;

  // True if this chunk is a member of a fragmented object.
  //
  optional bool is_fragment = 3;

  // If present, key of the CachedPackInfo object
  // that this chunk is a member of.
  //
  optional string cached_pack_key = 4;

  // Summary description of the objects stored here.
  //
  message ObjectCounts {
    // Number of objects stored in this chunk.
    //
    optional int32 total = 1;

    // Number of objects stored in whole (non-delta) form.
    //
    optional int32 whole = 2;

    // Number of objects stored in OFS_DELTA format.
    // The delta base appears in the same chunk, or
    // may appear in an earlier chunk through the
    // ChunkMeta.base_chunk link.
    //
    optional int32 ofs_delta = 3;

    // Number of objects stored in REF_DELTA format.
    // The delta base is at an unknown location.
    //
    optional int32 ref_delta = 4;
  }
  optional ObjectCounts object_counts = 5;

  // Size in bytes of the chunk's compressed data column.
  //
  optional int32 chunk_size = 6;

  // Size in bytes of the chunk's index.
  //
  optional int32 index_size = 7;

  // Size in bytes of the meta information.
  //
  optional int32 meta_size = 8;
}

// Describes meta information about a chunk, stored inline with it.
//
message ChunkMeta {
  // Enumerates the other chunks this chunk depends upon by OFS_DELTA.
  // Entries are sorted by relative_start ascending, enabling search. Thus
  // the earliest chunk is at the end of the list.
  //
  message BaseChunk {
    // Bytes between start of the base chunk and start of this chunk.
    // Although the value is positive, it's a negative offset.
    //
    required int64 relative_start = 1;
    required string chunk_key = 2;
  }
  repeated BaseChunk base_chunk = 1;

  // If this chunk is part of a fragment, key of every chunk that
  // makes up the fragment, including this chunk.
  //
  repeated string fragment = 2;

  // Chunks that should be prefetched if reading the current chunk.
  //
  message PrefetchHint {
    repeated string edge = 1;
    repeated string sequential = 2;
  }
  optional PrefetchHint commit_prefetch = 51;
  optional PrefetchHint tree_prefetch = 52;
}

// Describes a CachedPack, for efficient bulk clones.
//
message CachedPackInfo {
  // Unique name of the cached pack. This is the SHA-1 hash of
  // all of the objects that make up the cached pack, sorted and
  // in binary form. (Same rules as Git on the filesystem.)
  //
  required string name = 1;

  // SHA-1 of all chunk keys, which are themselves SHA-1s of the
  // raw chunk data. If any bit differs in compression (due to
  // repacking) the version will differ.
  //
  required string version = 2;

  // Total number of objects in the cached pack. This must be known
  // in order to set the final resulting pack header correctly before it
  // is sent to clients.
  //
  required int64 objects_total = 3;

  // Number of objects stored as deltas, rather than deflated whole.
  //
  optional int64 objects_delta = 4;

  // Total size of the chunks, in bytes, not including the chunk footer.
  //
  optional int64 bytes_total = 5;

  // Objects this pack starts from.
  //
  message TipObjectList {
    repeated string object_name = 1;
  }
  required TipObjectList tip_list = 6;

  // Chunks, in order of occurrence in the stream.
  //
  message ChunkList {
    repeated string chunk_key = 1;
  }
  required ChunkList chunk_list = 7;
}
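
The RefData comments state two rules that the proto2 schema cannot enforce by itself: either symref or target is populated but never both, and sequence starts at 1 and is incremented on each change. A minimal sketch of checking those rules in application code follows. It assumes a hand-written mirror class (Ref) instead of the protoc-generated classes, and the names isValid and retarget are hypothetical; treating a sequence of 0 as invalid is one reading of "should always start at 1".

```java
// Hypothetical plain-Java mirror of a few RefData fields; the real
// classes are generated by protoc and are not used in this sketch.
class RefDataCheckSketch {
    static final class Ref {
        final int sequence;
        final String symref;            // null when the field is unset
        final String targetObjectName;  // null when the field is unset

        Ref(int sequence, String symref, String targetObjectName) {
            this.sequence = sequence;
            this.symref = symref;
            this.targetObjectName = targetObjectName;
        }
    }

    // Exactly one of symref/target must be set, and (by assumption)
    // the sequence counter must have been started at 1 or later.
    static boolean isValid(Ref ref) {
        boolean hasSymref = ref.symref != null;
        boolean hasTarget = ref.targetObjectName != null;
        return (hasSymref != hasTarget) && ref.sequence >= 1;
    }

    // Point the reference at a new object, bumping the sequence counter.
    static Ref retarget(Ref old, String newObjectName) {
        return new Ref(old.sequence + 1, null, newObjectName);
    }

    public static void main(String[] args) {
        Ref head = new Ref(1, "refs/heads/master", null);
        Ref both = new Ref(1, "refs/heads/master",
                "0123456789abcdef0123456789abcdef01234567");
        System.out.println(isValid(head));
        System.out.println(isValid(both));
        System.out.println(retarget(head, "0123456789abcdef0123456789abcdef01234567").sequence);
    }
}
```

A storage implementation would presumably pair a check like this with an atomic single-row update on RefTable, so that a stale sequence value causes the write to be rejected; that pairing is an assumption, not something this file specifies.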