
ObjectDirectoryPackParser.java 14KB

Store Git on any DHT

jgit.storage.dht is a storage provider implementation for JGit that permits storing the Git repository in a distributed hashtable, NoSQL system, or other database. The actual underlying storage system is undefined, and can be plugged in by implementing 7 small interfaces:

* Database
* RepositoryIndexTable
* RepositoryTable
* RefTable
* ChunkTable
* ObjectIndexTable
* WriteBuffer

The storage provider interface tries to assume very little about the underlying storage system, and requires only three key features:

* key -> value lookup (a hashtable is suitable)
* atomic updates on single rows
* asynchronous operations (Java's ExecutorService is easy to use)

Most NoSQL database products offer all 3 of these features in their clients, and so does any decent network-based cache system like the open source memcache product. Relying only on key equality for data retrieval makes it simple for the storage engine to distribute across multiple machines. Traditional SQL systems could also be used with a JDBC-based SPI implementation.

Before submitting this change I implemented six storage systems for the SPI layer:

* Apache HBase [1]
* Apache Cassandra [2]
* Google Bigtable [3]
* an in-memory implementation for unit testing
* a JDBC implementation for SQL
* a generic cache provider that can ride on top of memcache

All six systems came in with an SPI layer of around 1,000 lines of code to implement the above 7 interfaces. This is a huge reduction in size compared to prior attempts to implement a new JGit storage layer. As this package shows, a complete JGit storage implementation is more than 17,000 lines of fairly complex code.

A simple cache is provided in storage.dht.spi.cache. Implementers can use CacheDatabase to wrap any other type of Database and perform fast reads against a network-based cache service, such as the open source memcached [4]. An implementation of CacheService must be provided to glue this SPI onto the network cache.

[1] https://github.com/spearce/jgit_hbase
[2] https://github.com/spearce/jgit_cassandra
[3] http://labs.google.com/papers/bigtable.html
[4] http://memcached.org/

Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
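
The commit message above reduces a Git storage backend to seven small interfaces built on three primitives: key -> value lookup, atomic updates on single rows, and asynchronous operations. As a rough illustration of the shape such an SPI takes, here is a minimal sketch; the method names and signatures are hypothetical, not the actual org.eclipse.jgit.storage.dht.spi API:

// Hypothetical sketch of a DHT-style storage SPI. Names and signatures are
// illustrative only; see org.eclipse.jgit.storage.dht.spi for the real API.
import java.util.concurrent.Future;

interface Database {
	RepositoryIndexTable repositoryIndex(); // repository name -> key
	RepositoryTable repository();           // repository metadata rows
	RefTable ref();                         // ref name -> ObjectId rows
	ChunkTable chunk();                     // pack data, split into chunks
	ObjectIndexTable objectIndex();         // object id -> chunk location
	WriteBuffer newWriteBuffer();           // batches asynchronous writes
}

interface ChunkTable {
	// Plain key -> value lookup; a hashtable semantic is sufficient.
	Future<byte[]> get(byte[] chunkKey);

	void put(byte[] chunkKey, byte[] chunkData, WriteBuffer buffer);
}

interface RefTable {
	// Atomic update of a single row, e.g. compare-and-swap of one ref.
	boolean compareAndPut(String refName, byte[] oldValue, byte[] newValue);
}

interface RepositoryIndexTable { /* repository name -> repository key */ }

interface RepositoryTable { /* lookup/create repositories by key */ }

interface ObjectIndexTable { /* object id -> chunk key entries */ }

interface WriteBuffer {
	void flush(); // block until buffered writes are durable
}

The point of the sketch is that each table is a plain key-value namespace, so any backend that can answer get/put by exact key can host a repository.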
/*
 * Copyright (C) 2008-2011, Google Inc.
 * Copyright (C) 2007-2008, Robin Rosenberg <robin.rosenberg@dewire.com>
 * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org> and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

package org.eclipse.jgit.internal.storage.file;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.text.MessageFormat;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

import org.eclipse.jgit.errors.LockFailedException;
import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.CoreConfig;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ProgressMonitor;
import org.eclipse.jgit.storage.pack.PackConfig;
import org.eclipse.jgit.transport.PackParser;
import org.eclipse.jgit.transport.PackedObjectInfo;
import org.eclipse.jgit.util.FileUtils;
import org.eclipse.jgit.util.NB;

/**
 * Consumes a pack stream and stores as a pack file in
 * {@link org.eclipse.jgit.internal.storage.file.ObjectDirectory}.
 * <p>
 * To obtain an instance of a parser, applications should use
 * {@link org.eclipse.jgit.lib.ObjectInserter#newPackParser(InputStream)}.
 */
public class ObjectDirectoryPackParser extends PackParser {
	private final FileObjectDatabase db;

	/** CRC-32 computation for objects that are appended onto the pack. */
	private final CRC32 crc;

	/** Running SHA-1 of any base objects appended after {@link #origEnd}. */
	private final MessageDigest tailDigest;

	/** Preferred format version of the pack-*.idx file to generate. */
	private int indexVersion;

	/** If true, pack with 0 objects will be stored. Usually these are deleted. */
	private boolean keepEmpty;

	/** Path of the temporary file holding the pack data. */
	private File tmpPack;

	/**
	 * Path of the index created for the pack, to find objects quickly at read
	 * time.
	 */
	private File tmpIdx;

	/** Read/write handle to {@link #tmpPack} while it is being parsed. */
	private RandomAccessFile out;

	/** Length of the original pack stream, before missing bases were appended. */
	private long origEnd;

	/** The original checksum of data up to {@link #origEnd}. */
	private byte[] origHash;

	/** Current end of the pack file. */
	private long packEnd;

	/** Checksum of the entire pack file. */
	private byte[] packHash;

	/** Compresses delta bases when completing a thin pack. */
	private Deflater def;

	/** The pack that was created, if parsing was successful. */
	private Pack newPack;

	private PackConfig pconfig;

	ObjectDirectoryPackParser(FileObjectDatabase odb, InputStream src) {
		super(odb, src);
		this.db = odb;
		this.pconfig = new PackConfig(odb.getConfig());
		this.crc = new CRC32();
		this.tailDigest = Constants.newMessageDigest();

		indexVersion = db.getConfig().get(CoreConfig.KEY).getPackIndexVersion();
	}

	/**
	 * Set the pack index file format version this instance will create.
	 *
	 * @param version
	 *            the version to write. The special version 0 designates the
	 *            oldest (most compatible) format available for the objects.
	 * @see PackIndexWriter
	 */
	public void setIndexVersion(int version) {
		indexVersion = version;
	}
	/**
	 * Configure this index pack instance to keep an empty pack.
	 * <p>
	 * By default an empty pack (a pack with no objects) is not kept, as doing
	 * so is completely pointless. With no objects in the pack there is no data
	 * stored by it, so the pack is unnecessary.
	 *
	 * @param empty
	 *            true to enable keeping an empty pack.
	 */
	public void setKeepEmpty(boolean empty) {
		keepEmpty = empty;
	}

	/**
	 * Get the imported {@link org.eclipse.jgit.internal.storage.file.Pack}.
	 * <p>
	 * This method is supplied only to support testing; applications shouldn't
	 * be using it directly to access the imported data.
	 *
	 * @return the imported PackFile, if parsing was successful.
	 */
	public Pack getPack() {
		return newPack;
	}

	/** {@inheritDoc} */
	@Override
	public long getPackSize() {
		if (newPack == null)
			return super.getPackSize();

		File pack = newPack.getPackFile();
		long size = pack.length();
		String p = pack.getAbsolutePath();
		String i = p.substring(0, p.length() - ".pack".length()) + ".idx"; //$NON-NLS-1$ //$NON-NLS-2$
		File idx = new File(i);
		if (idx.exists() && idx.isFile())
			size += idx.length();
		return size;
	}

	/** {@inheritDoc} */
	@Override
	public PackLock parse(ProgressMonitor receiving, ProgressMonitor resolving)
			throws IOException {
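		// Overall flow: stream the pack into a temporary file while the
		// superclass parses it, append the trailing checksum, write the
		// companion .idx file, then atomically rename both into objects/pack.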
		tmpPack = File.createTempFile("incoming_", ".pack", db.getDirectory()); //$NON-NLS-1$ //$NON-NLS-2$
		tmpIdx = new File(db.getDirectory(), baseName(tmpPack) + ".idx"); //$NON-NLS-1$
		try {
			out = new RandomAccessFile(tmpPack, "rw"); //$NON-NLS-1$

			super.parse(receiving, resolving);

			out.seek(packEnd);
			out.write(packHash);
			out.getChannel().force(true);
			out.close();

			writeIdx();

			tmpPack.setReadOnly();
			tmpIdx.setReadOnly();

			return renameAndOpenPack(getLockMessage());
		} finally {
			if (def != null)
				def.end();
			try {
				if (out != null && out.getChannel().isOpen())
					out.close();
			} catch (IOException closeError) {
				// Ignored. We want to delete the file.
			}
			cleanupTemporaryFiles();
		}
	}

	/** {@inheritDoc} */
	@Override
	protected void onPackHeader(long objectCount) throws IOException {
		// Ignored, the count is not required.
	}

	/** {@inheritDoc} */
	@Override
	protected void onBeginWholeObject(long streamPosition, int type,
			long inflatedSize) throws IOException {
		crc.reset();
	}

	/** {@inheritDoc} */
	@Override
	protected void onEndWholeObject(PackedObjectInfo info) throws IOException {
		info.setCRC((int) crc.getValue());
	}

	/** {@inheritDoc} */
	@Override
	protected void onBeginOfsDelta(long streamPosition,
			long baseStreamPosition, long inflatedSize) throws IOException {
		crc.reset();
	}

	/** {@inheritDoc} */
	@Override
	protected void onBeginRefDelta(long streamPosition, AnyObjectId baseId,
			long inflatedSize) throws IOException {
		crc.reset();
	}

	/** {@inheritDoc} */
	@Override
	protected UnresolvedDelta onEndDelta() throws IOException {
		UnresolvedDelta delta = new UnresolvedDelta();
		delta.setCRC((int) crc.getValue());
		return delta;
	}

	/** {@inheritDoc} */
	@Override
	protected void onInflatedObjectData(PackedObjectInfo obj, int typeCode,
			byte[] data) throws IOException {
		// ObjectDirectory ignores this event.
	}

	/** {@inheritDoc} */
	@Override
	protected void onObjectHeader(Source src, byte[] raw, int pos, int len)
			throws IOException {
		crc.update(raw, pos, len);
	}

	/** {@inheritDoc} */
	@Override
	protected void onObjectData(Source src, byte[] raw, int pos, int len)
			throws IOException {
		crc.update(raw, pos, len);
	}

	/** {@inheritDoc} */
	@Override
	protected void onStoreStream(byte[] raw, int pos, int len)
			throws IOException {
		out.write(raw, pos, len);
	}

	/** {@inheritDoc} */
	@Override
	protected void onPackFooter(byte[] hash) throws IOException {
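		// Record where the client's stream ended and its checksum. If this
		// turns out to be a thin pack, appended bases will grow the file past
		// origEnd and the trailer hash is recomputed in onEndThinPack().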
		packEnd = out.getFilePointer();
		origEnd = packEnd;
		origHash = hash;
		packHash = hash;
	}

	/** {@inheritDoc} */
	@Override
	protected ObjectTypeAndSize seekDatabase(UnresolvedDelta delta,
			ObjectTypeAndSize info) throws IOException {
		out.seek(delta.getOffset());
		crc.reset();
		return readObjectHeader(info);
	}

	/** {@inheritDoc} */
	@Override
	protected ObjectTypeAndSize seekDatabase(PackedObjectInfo obj,
			ObjectTypeAndSize info) throws IOException {
		out.seek(obj.getOffset());
		crc.reset();
		return readObjectHeader(info);
	}

	/** {@inheritDoc} */
	@Override
	protected int readDatabase(byte[] dst, int pos, int cnt) throws IOException {
		return out.read(dst, pos, cnt);
	}

	/** {@inheritDoc} */
	@Override
	protected boolean checkCRC(int oldCRC) {
		return oldCRC == (int) crc.getValue();
	}

	private static String baseName(File tmpPack) {
		String name = tmpPack.getName();
		return name.substring(0, name.lastIndexOf('.'));
	}

	private void cleanupTemporaryFiles() {
		if (tmpIdx != null && !tmpIdx.delete() && tmpIdx.exists())
			tmpIdx.deleteOnExit();
		if (tmpPack != null && !tmpPack.delete() && tmpPack.exists())
			tmpPack.deleteOnExit();
	}

	/** {@inheritDoc} */
	@Override
	protected boolean onAppendBase(final int typeCode, final byte[] data,
			final PackedObjectInfo info) throws IOException {
		info.setOffset(packEnd);

		final byte[] buf = buffer();
		int sz = data.length;
		int len = 0;
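		// Encode the pack object header: the 3-bit type code plus the low
		// 4 bits of the inflated size in the first byte, remaining size bits
		// in 7-bit groups, with the high bit of each byte as a continuation
		// flag.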
		buf[len++] = (byte) ((typeCode << 4) | (sz & 15));
		sz >>>= 4;
		while (sz > 0) {
			buf[len - 1] |= (byte) 0x80;
			buf[len++] = (byte) (sz & 0x7f);
			sz >>>= 7;
		}

		tailDigest.update(buf, 0, len);
		crc.reset();
		crc.update(buf, 0, len);
		out.seek(packEnd);
		out.write(buf, 0, len);
		packEnd += len;

		if (def == null)
			def = new Deflater(Deflater.DEFAULT_COMPRESSION, false);
		else
			def.reset();
		def.setInput(data);
		def.finish();

		while (!def.finished()) {
			len = def.deflate(buf);
			tailDigest.update(buf, 0, len);
			crc.update(buf, 0, len);
			out.write(buf, 0, len);
			packEnd += len;
		}

		info.setCRC((int) crc.getValue());
		return true;
	}

	/** {@inheritDoc} */
	@Override
	protected void onEndThinPack() throws IOException {
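		// The object count in the header changed when bases were appended, so
		// patch it and re-read the whole file to recompute the pack checksum,
		// verifying along the way that the original stream and the appended
		// tail reached disk intact.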
		final byte[] buf = buffer();
		final MessageDigest origDigest = Constants.newMessageDigest();
		final MessageDigest tailDigest2 = Constants.newMessageDigest();
		final MessageDigest packDigest = Constants.newMessageDigest();

		long origRemaining = origEnd;
		out.seek(0);
		out.readFully(buf, 0, 12);
		origDigest.update(buf, 0, 12);
		origRemaining -= 12;

		NB.encodeInt32(buf, 8, getObjectCount());
		out.seek(0);
		out.write(buf, 0, 12);
		packDigest.update(buf, 0, 12);

		for (;;) {
			final int n = out.read(buf);
			if (n < 0)
				break;
			if (origRemaining != 0) {
				final int origCnt = (int) Math.min(n, origRemaining);
				origDigest.update(buf, 0, origCnt);
				origRemaining -= origCnt;
				if (origRemaining == 0)
					tailDigest2.update(buf, origCnt, n - origCnt);
			} else
				tailDigest2.update(buf, 0, n);
			packDigest.update(buf, 0, n);
		}

		if (!Arrays.equals(origDigest.digest(), origHash) || !Arrays
				.equals(tailDigest2.digest(), this.tailDigest.digest()))
			throw new IOException(
					JGitText.get().packCorruptedWhileWritingToFilesystem);

		packHash = packDigest.digest();
	}

	private void writeIdx() throws IOException {
		List<PackedObjectInfo> list = getSortedObjectList(null /* by ObjectId */);
		try (FileOutputStream os = new FileOutputStream(tmpIdx)) {
			final PackIndexWriter iw;
			if (indexVersion <= 0)
				iw = PackIndexWriter.createOldestPossible(os, list);
			else
				iw = PackIndexWriter.createVersion(os, indexVersion);
			iw.write(list, packHash);
			os.getChannel().force(true);
		}
	}

	private PackLock renameAndOpenPack(String lockMessage)
			throws IOException {
		if (!keepEmpty && getObjectCount() == 0) {
			cleanupTemporaryFiles();
			return null;
		}
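		// Name the final pack after a digest over all object ids it contains,
		// so the name is deterministic for this set of objects.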
		final MessageDigest d = Constants.newMessageDigest();
		final byte[] oeBytes = new byte[Constants.OBJECT_ID_LENGTH];
		for (int i = 0; i < getObjectCount(); i++) {
			final PackedObjectInfo oe = getObject(i);
			oe.copyRawTo(oeBytes, 0);
			d.update(oeBytes);
		}

		final String name = ObjectId.fromRaw(d.digest()).name();
		final File packDir = new File(db.getDirectory(), "pack"); //$NON-NLS-1$
		final File finalPack = new File(packDir, "pack-" + name + ".pack"); //$NON-NLS-1$ //$NON-NLS-2$
		final File finalIdx = new File(packDir, "pack-" + name + ".idx"); //$NON-NLS-1$ //$NON-NLS-2$
		final PackLock keep = new PackLock(finalPack, db.getFS());

		if (!packDir.exists() && !packDir.mkdir() && !packDir.exists()) {
			// The objects/pack directory isn't present, and we are unable
			// to create it. There is no way to move this pack in.
			//
			cleanupTemporaryFiles();
			throw new IOException(MessageFormat.format(
					JGitText.get().cannotCreateDirectory, packDir
							.getAbsolutePath()));
		}

		if (finalPack.exists()) {
			// If the pack is already present we should never replace it.
			//
			cleanupTemporaryFiles();
			return null;
		}

		if (lockMessage != null) {
			// If we have a reason to create a keep file for this pack, do
			// so, or fail fast and don't put the pack in place.
			//
			try {
				if (!keep.lock(lockMessage))
					throw new LockFailedException(finalPack,
							MessageFormat.format(
									JGitText.get().cannotLockPackIn, finalPack));
			} catch (IOException e) {
				cleanupTemporaryFiles();
				throw e;
			}
		}

		try {
			FileUtils.rename(tmpPack, finalPack,
					StandardCopyOption.ATOMIC_MOVE);
		} catch (IOException e) {
			cleanupTemporaryFiles();
			keep.unlock();
			throw new IOException(MessageFormat.format(
					JGitText.get().cannotMovePackTo, finalPack), e);
		}

		try {
			FileUtils.rename(tmpIdx, finalIdx, StandardCopyOption.ATOMIC_MOVE);
		} catch (IOException e) {
			cleanupTemporaryFiles();
			keep.unlock();
			if (!finalPack.delete())
				finalPack.deleteOnExit();
			throw new IOException(MessageFormat.format(
					JGitText.get().cannotMoveIndexTo, finalIdx), e);
		}

		boolean interrupted = false;
		try {
			FileSnapshot snapshot = FileSnapshot.save(finalPack);
			if (pconfig.doWaitPreventRacyPack(snapshot.size())) {
				snapshot.waitUntilNotRacy();
			}
		} catch (InterruptedException e) {
			interrupted = true;
		}
		try {
			newPack = db.openPack(finalPack);
		} catch (IOException err) {
			keep.unlock();
			if (finalPack.exists())
				FileUtils.delete(finalPack);
			if (finalIdx.exists())
				FileUtils.delete(finalIdx);
			throw err;
		} finally {
			if (interrupted) {
				// Re-set interrupted flag
				Thread.currentThread().interrupt();
			}
		}

		return lockMessage != null ? keep : null;
	}
}
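
For reference, and as the class Javadoc above notes, applications obtain this parser through ObjectInserter#newPackParser rather than by constructing it directly. A minimal usage sketch follows; the repository and input stream are placeholders, not part of the file above:

import java.io.InputStream;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.ObjectInserter;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.transport.PackParser;

class PackImport {
	// Feeds a received pack stream into the repository's object database.
	// For a file-based repository the inserter hands back an
	// ObjectDirectoryPackParser, which stores the pack as shown above.
	static void importPack(Repository repo, InputStream packStream)
			throws Exception {
		try (ObjectInserter ins = repo.newObjectInserter()) {
			PackParser parser = ins.newPackParser(packStream);
			parser.parse(NullProgressMonitor.INSTANCE,
					NullProgressMonitor.INSTANCE);
			ins.flush();
		}
	}
}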