
ObjectDirectoryPackParser.java 14KB

Store Git on any DHT

jgit.storage.dht is a storage provider implementation for JGit that permits storing the Git repository in a distributed hashtable, NoSQL system, or other database. The actual underlying storage system is undefined, and can be plugged in by implementing 7 small interfaces:

* Database
* RepositoryIndexTable
* RepositoryTable
* RefTable
* ChunkTable
* ObjectIndexTable
* WriteBuffer

The storage provider interface tries to assume very little about the underlying storage system, and requires only three key features:

* key -> value lookup (a hashtable is suitable)
* atomic updates on single rows
* asynchronous operations (Java's ExecutorService is easy to use)

Most NoSQL database products offer all 3 of these features in their clients, and so does any decent network-based cache system like the open source memcache product. Relying only on key equality for data retrieval makes it simple for the storage engine to distribute across multiple machines. Traditional SQL systems could also be used with a JDBC-based spi implementation.

Before submitting this change I have implemented six storage systems for the spi layer:

* Apache HBase[1]
* Apache Cassandra[2]
* Google Bigtable[3]
* an in-memory implementation for unit testing
* a JDBC implementation for SQL
* a generic cache provider that can ride on top of memcache

All six systems came in with an spi layer around 1000 lines of code to implement the above 7 interfaces. This is a huge reduction in size compared to prior attempts to implement a new JGit storage layer. As this package shows, a complete JGit storage implementation is more than 17,000 lines of fairly complex code.

A simple cache is provided in storage.dht.spi.cache. Implementers can use CacheDatabase to wrap any other type of Database and perform fast reads against a network-based cache service, such as the open source memcached[4]. An implementation of CacheService must be provided to glue this spi onto the network cache.

[1] https://github.com/spearce/jgit_hbase
[2] https://github.com/spearce/jgit_cassandra
[3] http://labs.google.com/papers/bigtable.html
[4] http://memcached.org/

Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
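Below is a minimal sketch of the kind of backend the spi layer targets, illustrating the three required features (key -> value lookup, atomic updates on single rows, and asynchronous operations via Java's ExecutorService) with an in-memory map. The KeyValueStore and InMemoryStore names and their methods are hypothetical stand-ins for illustration only; the real jgit.storage.dht spi is the 7 interfaces listed above, whose exact signatures differ.

import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a DHT backend; the real spi splits this
// role across Database, RefTable, ChunkTable, ObjectIndexTable, etc.
interface KeyValueStore {
	// key -> value lookup, answered asynchronously.
	Future<byte[]> get(String key);

	// Atomic update of a single row: succeeds only if the current
	// value still matches `expect` (null means "row absent").
	boolean compareAndPut(String key, byte[] expect, byte[] update);
}

class InMemoryStore implements KeyValueStore {
	private final ConcurrentHashMap<String, byte[]> rows = new ConcurrentHashMap<>();
	private final ExecutorService executor = Executors.newFixedThreadPool(4);

	@Override
	public Future<byte[]> get(String key) {
		// Asynchronous operation via a plain ExecutorService, as the
		// commit message suggests.
		return executor.submit(() -> rows.get(key));
	}

	@Override
	public boolean compareAndPut(String key, byte[] expect, byte[] update) {
		// compute() runs atomically per key, giving the single-row
		// compare-and-set semantics a real backend would need.
		boolean[] swapped = { false };
		rows.compute(key, (k, cur) -> {
			if (cur == expect || (cur != null && Arrays.equals(cur, expect))) {
				swapped[0] = true;
				return update;
			}
			return cur;
		});
		return swapped[0];
	}
}

Any backend that can honor these three guarantees over the network, as HBase, Cassandra, Bigtable, and memcached all can, should be adaptable to the spi in a comparably small amount of code.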
/*
 * Copyright (C) 2008-2011, Google Inc.
 * Copyright (C) 2007-2008, Robin Rosenberg <robin.rosenberg@dewire.com>
 * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org> and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

package org.eclipse.jgit.internal.storage.file;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.text.MessageFormat;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

import org.eclipse.jgit.errors.LockFailedException;
import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.internal.storage.pack.PackExt;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.CoreConfig;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ProgressMonitor;
import org.eclipse.jgit.storage.pack.PackConfig;
import org.eclipse.jgit.transport.PackParser;
import org.eclipse.jgit.transport.PackedObjectInfo;
import org.eclipse.jgit.util.FileUtils;
import org.eclipse.jgit.util.NB;

/**
 * Consumes a pack stream and stores as a pack file in
 * {@link org.eclipse.jgit.internal.storage.file.ObjectDirectory}.
 * <p>
 * To obtain an instance of a parser, applications should use
 * {@link org.eclipse.jgit.lib.ObjectInserter#newPackParser(InputStream)}.
 */
public class ObjectDirectoryPackParser extends PackParser {
	private final FileObjectDatabase db;

	/** CRC-32 computation for objects that are appended onto the pack. */
	private final CRC32 crc;

	/** Running SHA-1 of any base objects appended after {@link #origEnd}. */
	private final MessageDigest tailDigest;

	/** Preferred format version of the pack-*.idx file to generate. */
	private int indexVersion;

	/** If true, pack with 0 objects will be stored. Usually these are deleted. */
	private boolean keepEmpty;

	/** Path of the temporary file holding the pack data. */
	private File tmpPack;

	/**
	 * Path of the index created for the pack, to find objects quickly at read
	 * time.
	 */
	private File tmpIdx;

	/** Read/write handle to {@link #tmpPack} while it is being parsed. */
	private RandomAccessFile out;

	/** Length of the original pack stream, before missing bases were appended. */
	private long origEnd;

	/** The original checksum of data up to {@link #origEnd}. */
	private byte[] origHash;

	/** Current end of the pack file. */
	private long packEnd;

	/** Checksum of the entire pack file. */
	private byte[] packHash;

	/** Compresses delta bases when completing a thin pack. */
	private Deflater def;

	/** The pack that was created, if parsing was successful. */
	private Pack newPack;

	private PackConfig pconfig;

	ObjectDirectoryPackParser(FileObjectDatabase odb, InputStream src) {
		super(odb, src);
		this.db = odb;
		this.pconfig = new PackConfig(odb.getConfig());
		this.crc = new CRC32();
		this.tailDigest = Constants.newMessageDigest();

		indexVersion = db.getConfig().get(CoreConfig.KEY).getPackIndexVersion();
	}

	/**
	 * Set the pack index file format version this instance will create.
	 *
	 * @param version
	 *            the version to write. The special version 0 designates the
	 *            oldest (most compatible) format available for the objects.
	 * @see PackIndexWriter
	 */
	public void setIndexVersion(int version) {
		indexVersion = version;
	}
	/**
	 * Configure this index pack instance to keep an empty pack.
	 * <p>
	 * By default an empty pack (a pack with no objects) is not kept, as doing
	 * so is completely pointless. With no objects in the pack there is no data
	 * stored by it, so the pack is unnecessary.
	 *
	 * @param empty
	 *            true to enable keeping an empty pack.
	 */
	public void setKeepEmpty(boolean empty) {
		keepEmpty = empty;
	}

	/**
	 * Get the imported {@link org.eclipse.jgit.internal.storage.file.Pack}.
	 * <p>
	 * This method is supplied only to support testing; applications shouldn't
	 * be using it directly to access the imported data.
	 *
	 * @return the imported PackFile, if parsing was successful.
	 */
	public Pack getPack() {
		return newPack;
	}

	/** {@inheritDoc} */
	@Override
	public long getPackSize() {
		if (newPack == null)
			return super.getPackSize();

		File pack = newPack.getPackFile();
		long size = pack.length();
		String p = pack.getAbsolutePath();
		String i = p.substring(0, p.length() - ".pack".length()) + ".idx"; //$NON-NLS-1$ //$NON-NLS-2$
		File idx = new File(i);
		if (idx.exists() && idx.isFile())
			size += idx.length();
		return size;
	}

	/** {@inheritDoc} */
	@Override
	public PackLock parse(ProgressMonitor receiving, ProgressMonitor resolving)
			throws IOException {
		tmpPack = File.createTempFile("incoming_", ".pack", db.getDirectory()); //$NON-NLS-1$ //$NON-NLS-2$
		tmpIdx = new File(db.getDirectory(), baseName(tmpPack) + ".idx"); //$NON-NLS-1$
		try {
			out = new RandomAccessFile(tmpPack, "rw"); //$NON-NLS-1$

			super.parse(receiving, resolving);
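			// Parsing is complete; append the trailing pack checksum
			// and force the pack contents to disk before the index is
			// written, so a reader never sees an index without its pack.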
			out.seek(packEnd);
			out.write(packHash);
			out.getChannel().force(true);
			out.close();

			writeIdx();

			tmpPack.setReadOnly();
			tmpIdx.setReadOnly();

			return renameAndOpenPack(getLockMessage());
		} finally {
			if (def != null)
				def.end();
			try {
				if (out != null && out.getChannel().isOpen())
					out.close();
			} catch (IOException closeError) {
				// Ignored. We want to delete the file.
			}
			cleanupTemporaryFiles();
		}
	}

	/** {@inheritDoc} */
	@Override
	protected void onPackHeader(long objectCount) throws IOException {
		// Ignored, the count is not required.
	}

	/** {@inheritDoc} */
	@Override
	protected void onBeginWholeObject(long streamPosition, int type,
			long inflatedSize) throws IOException {
		crc.reset();
	}

	/** {@inheritDoc} */
	@Override
	protected void onEndWholeObject(PackedObjectInfo info) throws IOException {
		info.setCRC((int) crc.getValue());
	}

	/** {@inheritDoc} */
	@Override
	protected void onBeginOfsDelta(long streamPosition,
			long baseStreamPosition, long inflatedSize) throws IOException {
		crc.reset();
	}

	/** {@inheritDoc} */
	@Override
	protected void onBeginRefDelta(long streamPosition, AnyObjectId baseId,
			long inflatedSize) throws IOException {
		crc.reset();
	}

	/** {@inheritDoc} */
	@Override
	protected UnresolvedDelta onEndDelta() throws IOException {
		UnresolvedDelta delta = new UnresolvedDelta();
		delta.setCRC((int) crc.getValue());
		return delta;
	}

	/** {@inheritDoc} */
	@Override
	protected void onInflatedObjectData(PackedObjectInfo obj, int typeCode,
			byte[] data) throws IOException {
		// ObjectDirectory ignores this event.
	}

	/** {@inheritDoc} */
	@Override
	protected void onObjectHeader(Source src, byte[] raw, int pos, int len)
			throws IOException {
		crc.update(raw, pos, len);
	}

	/** {@inheritDoc} */
	@Override
	protected void onObjectData(Source src, byte[] raw, int pos, int len)
			throws IOException {
		crc.update(raw, pos, len);
	}

	/** {@inheritDoc} */
	@Override
	protected void onStoreStream(byte[] raw, int pos, int len)
			throws IOException {
		out.write(raw, pos, len);
	}

	/** {@inheritDoc} */
	@Override
	protected void onPackFooter(byte[] hash) throws IOException {
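		// Remember where the original stream ended and its trailing
		// checksum; if missing delta bases are appended later, packEnd
		// and packHash are updated to cover the longer file.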
		packEnd = out.getFilePointer();
		origEnd = packEnd;
		origHash = hash;
		packHash = hash;
	}

	/** {@inheritDoc} */
	@Override
	protected ObjectTypeAndSize seekDatabase(UnresolvedDelta delta,
			ObjectTypeAndSize info) throws IOException {
		out.seek(delta.getOffset());
		crc.reset();
		return readObjectHeader(info);
	}

	/** {@inheritDoc} */
	@Override
	protected ObjectTypeAndSize seekDatabase(PackedObjectInfo obj,
			ObjectTypeAndSize info) throws IOException {
		out.seek(obj.getOffset());
		crc.reset();
		return readObjectHeader(info);
	}

	/** {@inheritDoc} */
	@Override
	protected int readDatabase(byte[] dst, int pos, int cnt) throws IOException {
		return out.read(dst, pos, cnt);
	}

	/** {@inheritDoc} */
	@Override
	protected boolean checkCRC(int oldCRC) {
		return oldCRC == (int) crc.getValue();
	}

	private static String baseName(File tmpPack) {
		String name = tmpPack.getName();
		return name.substring(0, name.lastIndexOf('.'));
	}

	private void cleanupTemporaryFiles() {
		if (tmpIdx != null && !tmpIdx.delete() && tmpIdx.exists())
			tmpIdx.deleteOnExit();
		if (tmpPack != null && !tmpPack.delete() && tmpPack.exists())
			tmpPack.deleteOnExit();
	}

	/** {@inheritDoc} */
	@Override
	protected boolean onAppendBase(final int typeCode, final byte[] data,
			final PackedObjectInfo info) throws IOException {
		info.setOffset(packEnd);

		final byte[] buf = buffer();
		int sz = data.length;
		int len = 0;
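		// Write the standard pack object header: bits 6..4 of the first
		// byte hold the type, the low 4 bits hold the least significant
		// size bits, and each following byte carries 7 more size bits,
		// flagged by the 0x80 continuation bit on the previous byte.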
		buf[len++] = (byte) ((typeCode << 4) | (sz & 15));
		sz >>>= 4;
		while (sz > 0) {
			buf[len - 1] |= (byte) 0x80;
			buf[len++] = (byte) (sz & 0x7f);
			sz >>>= 7;
		}

		tailDigest.update(buf, 0, len);
		crc.reset();
		crc.update(buf, 0, len);
		out.seek(packEnd);
		out.write(buf, 0, len);
		packEnd += len;

		if (def == null)
			def = new Deflater(Deflater.DEFAULT_COMPRESSION, false);
		else
			def.reset();
		def.setInput(data);
		def.finish();
		while (!def.finished()) {
			len = def.deflate(buf);
			tailDigest.update(buf, 0, len);
			crc.update(buf, 0, len);
			out.write(buf, 0, len);
			packEnd += len;
		}
		info.setCRC((int) crc.getValue());
		return true;
	}

	/** {@inheritDoc} */
	@Override
	protected void onEndThinPack() throws IOException {
		final byte[] buf = buffer();

		final MessageDigest origDigest = Constants.newMessageDigest();
		final MessageDigest tailDigest2 = Constants.newMessageDigest();
		final MessageDigest packDigest = Constants.newMessageDigest();
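		// Rescan the whole file: patch the object count in the 12-byte
		// header (it grew when bases were appended), recompute digests
		// over the original stream and the appended tail to detect
		// on-disk corruption, and build the checksum of the final pack.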
		long origRemaining = origEnd;
		out.seek(0);
		out.readFully(buf, 0, 12);
		origDigest.update(buf, 0, 12);
		origRemaining -= 12;

		NB.encodeInt32(buf, 8, getObjectCount());
		out.seek(0);
		out.write(buf, 0, 12);
		packDigest.update(buf, 0, 12);

		for (;;) {
			final int n = out.read(buf);
			if (n < 0)
				break;
			if (origRemaining != 0) {
				final int origCnt = (int) Math.min(n, origRemaining);
				origDigest.update(buf, 0, origCnt);
				origRemaining -= origCnt;
				if (origRemaining == 0)
					tailDigest2.update(buf, origCnt, n - origCnt);
			} else
				tailDigest2.update(buf, 0, n);
			packDigest.update(buf, 0, n);
		}

		if (!Arrays.equals(origDigest.digest(), origHash) || !Arrays
				.equals(tailDigest2.digest(), this.tailDigest.digest()))
			throw new IOException(
					JGitText.get().packCorruptedWhileWritingToFilesystem);

		packHash = packDigest.digest();
	}

	private void writeIdx() throws IOException {
		List<PackedObjectInfo> list = getSortedObjectList(null /* by ObjectId */);
		try (FileOutputStream os = new FileOutputStream(tmpIdx)) {
			final PackIndexWriter iw;
			if (indexVersion <= 0)
				iw = PackIndexWriter.createOldestPossible(os, list);
			else
				iw = PackIndexWriter.createVersion(os, indexVersion);
			iw.write(list, packHash);
			os.getChannel().force(true);
		}
	}

	private PackLock renameAndOpenPack(String lockMessage)
			throws IOException {
		if (!keepEmpty && getObjectCount() == 0) {
			cleanupTemporaryFiles();
			return null;
		}
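		// Name the pack by hashing all object names (already sorted by
		// the getSortedObjectList() call in writeIdx() above), so the
		// same set of objects always yields the same file name.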
		final MessageDigest d = Constants.newMessageDigest();
		final byte[] oeBytes = new byte[Constants.OBJECT_ID_LENGTH];
		for (int i = 0; i < getObjectCount(); i++) {
			final PackedObjectInfo oe = getObject(i);
			oe.copyRawTo(oeBytes, 0);
			d.update(oeBytes);
		}

		ObjectId id = ObjectId.fromRaw(d.digest());
		File packDir = new File(db.getDirectory(), "pack"); //$NON-NLS-1$
		PackFile finalPack = new PackFile(packDir, id, PackExt.PACK);
		PackFile finalIdx = finalPack.create(PackExt.INDEX);
		final PackLock keep = new PackLock(finalPack, db.getFS());

		if (!packDir.exists() && !packDir.mkdir() && !packDir.exists()) {
			// The objects/pack directory isn't present, and we are unable
			// to create it. There is no way to move this pack in.
			//
			cleanupTemporaryFiles();
			throw new IOException(MessageFormat.format(
					JGitText.get().cannotCreateDirectory, packDir
							.getAbsolutePath()));
		}

		if (finalPack.exists()) {
			// If the pack is already present we should never replace it.
			//
			cleanupTemporaryFiles();
			return null;
		}

		if (lockMessage != null) {
			// If we have a reason to create a keep file for this pack, do
			// so, or fail fast and don't put the pack in place.
			//
			try {
				if (!keep.lock(lockMessage))
					throw new LockFailedException(finalPack,
							MessageFormat.format(
									JGitText.get().cannotLockPackIn, finalPack));
			} catch (IOException e) {
				cleanupTemporaryFiles();
				throw e;
			}
		}

		try {
			FileUtils.rename(tmpPack, finalPack,
					StandardCopyOption.ATOMIC_MOVE);
		} catch (IOException e) {
			cleanupTemporaryFiles();
			keep.unlock();
			throw new IOException(MessageFormat.format(
					JGitText.get().cannotMovePackTo, finalPack), e);
		}

		try {
			FileUtils.rename(tmpIdx, finalIdx, StandardCopyOption.ATOMIC_MOVE);
		} catch (IOException e) {
			cleanupTemporaryFiles();
			keep.unlock();
			if (!finalPack.delete())
				finalPack.deleteOnExit();
			throw new IOException(MessageFormat.format(
					JGitText.get().cannotMoveIndexTo, finalIdx), e);
		}

		boolean interrupted = false;
		try {
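			// Guard against "racy Git" timestamps: if the pack was
			// written too recently, optionally wait until its
			// modification time is old enough that later changes cannot
			// be mistaken for this same snapshot.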
			FileSnapshot snapshot = FileSnapshot.save(finalPack);
			if (pconfig.doWaitPreventRacyPack(snapshot.size())) {
				snapshot.waitUntilNotRacy();
			}
		} catch (InterruptedException e) {
			interrupted = true;
		}
		try {
			newPack = db.openPack(finalPack);
		} catch (IOException err) {
			keep.unlock();
			if (finalPack.exists())
				FileUtils.delete(finalPack);
			if (finalIdx.exists())
				FileUtils.delete(finalIdx);
			throw err;
		} finally {
			if (interrupted) {
				// Re-set interrupted flag
				Thread.currentThread().interrupt();
			}
		}

		return lockMessage != null ? keep : null;
	}
}