IO.java

DFS: A storage layer for JGit

In practice the DHT storage layer has not been performing as well as large scale server environments want to see from a Git server. The performance of the DHT schema degrades rapidly as small changes are pushed into the repository, because the chunk size is less than 1/3 of the pushed pack size. Small chunks cause poor prefetch performance during reading, and require significantly longer prefetch lists inside of the chunk meta field to work around the small size.

The DHT code is very complex (>17,000 lines of code) and is very sensitive to the underlying database round-trip time, as well as the way objects were written into the pack stream that was chunked and stored in the database. A poor pack layout (from any version of C Git prior to Junio reworking it) can leave the DHT code unable to enumerate the objects of the linux-2.6 repository in any reasonable time.

Performing a clone from a DHT-stored repository of 2 million objects takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row for each object being cloned. This is very difficult for some DHTs to scale; even at 5,000 rows/second the lookup stage alone takes over 6 minutes (on a local filesystem, this is almost too fast to bother measuring). Some servers, like Apache Cassandra, simply fall over and cannot complete the 2 million lookups in rapid fire.

On a ~400 MiB repository, the DHT schema has an extra 25 MiB of redundant data that gets downloaded to the JGit process, and that is before considering the cost of the OBJECT_INDEX table also being fully loaded, which is at least 223 MiB of data for the linux kernel repository. In the DHT schema, answering a `git clone` of the ~400 MiB linux kernel needs to load 248 MiB of "index" data from the DHT, in addition to the ~400 MiB of pack data that gets sent to the client. This is 193 MiB more data to access than the native filesystem format, and it has to come over a much smaller pipe (typically local Ethernet) than a local SATA disk drive.

I also never got around to writing "repack" support for the DHT schema, as it turns out to be fairly complex to safely repack data in the repository while also minimizing the amount of changes made to the database, due to very common limitations on database mutation rates.

This new DFS storage layer fixes many of those issues by taking the simple approach of storing relatively standard Git pack and index files on an abstract filesystem. Packs are accessed through an in-process buffer cache, similar to the WindowCache used by the local filesystem storage layer. Unlike local file IO, the code assumes the storage system has relatively high latency and no concept of "file handles". Instead it treats the file more like HTTP byte-range requests, where a read channel is simply a thunk that triggers a read request over the network.

The DFS code in this change is still abstract; it does not store on any particular filesystem, but it is fairly well suited to Amazon S3 or Apache Hadoop HDFS. Storing packs directly on HDFS rather than HBase removes a layer of abstraction, as most HBase row reads turn into an HDFS read.

Most of the DFS code in this change was blatantly copied from the local filesystem code. Most parts should be refactored to be shared between the two storage systems, but right now I am hesitant to do this because of how well tuned the local filesystem code currently is.

Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb
13 years ago
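The byte-range read model described in the commit message can be sketched in a few lines of Java. This is a hypothetical illustration of the idea only, not the actual DFS API; the names RangeReader and RemotePackChannel are invented for the example:

import java.io.IOException;
import java.nio.ByteBuffer;

// A thunk over remote storage: no open file handle, just enough state to
// issue an independent byte-range request (e.g. an HTTP Range GET) per read.
interface RangeReader {
    int read(long position, ByteBuffer dst) throws IOException;
}

final class RemotePackChannel {
    private final RangeReader reader;
    private long position;

    RemotePackChannel(RangeReader reader) {
        this.reader = reader;
    }

    // Each call issues one ranged request; nothing stays open between calls,
    // which suits high-latency stores such as Amazon S3 or HDFS.
    int read(ByteBuffer dst) throws IOException {
        int n = reader.read(position, dst);
        if (n > 0)
            position += n;
        return n;
    }
}
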
/*
 * Copyright (C) 2008-2009, Google Inc.
 * Copyright (C) 2009, Robin Rosenberg <robin.rosenberg@dewire.com>
 * Copyright (C) 2006-2008, Shawn O. Pearce <spearce@spearce.org>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.util;

import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;

import org.eclipse.jgit.internal.JGitText;

/**
 * Input/Output utilities
 */
public class IO {
    /**
     * Read an entire local file into memory as a byte array.
     *
     * @param path
     *            location of the file to read.
     * @return complete contents of the requested local file.
     * @throws FileNotFoundException
     *             the file does not exist.
     * @throws IOException
     *             the file exists, but its contents cannot be read.
     */
    public static final byte[] readFully(final File path)
            throws FileNotFoundException, IOException {
        return IO.readFully(path, Integer.MAX_VALUE);
    }

    /**
     * Read at most limit bytes from the local file into memory as a byte array.
     *
     * @param path
     *            location of the file to read.
     * @param limit
     *            maximum number of bytes to read; if the file is larger,
     *            only the first limit bytes are returned.
     * @return contents of the requested local file. If the file exceeds the
     *         limit, only the first limit bytes are returned.
     * @throws FileNotFoundException
     *             the file does not exist.
     * @throws IOException
     *             the file exists, but its contents cannot be read.
     */
    public static final byte[] readSome(final File path, final int limit)
            throws FileNotFoundException, IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            byte[] buf = new byte[limit];
            int cnt = 0;
            for (;;) {
                int n = in.read(buf, cnt, buf.length - cnt);
                if (n <= 0)
                    break;
                cnt += n;
            }
            if (cnt == buf.length)
                return buf;
            byte[] res = new byte[cnt];
            System.arraycopy(buf, 0, res, 0, cnt);
            return res;
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // do nothing
            }
        }
    }

    /**
     * Read an entire local file into memory as a byte array.
     *
     * @param path
     *            location of the file to read.
     * @param max
     *            maximum number of bytes to read; if the file is larger than
     *            this limit, an IOException is thrown.
     * @return complete contents of the requested local file.
     * @throws FileNotFoundException
     *             the file does not exist.
     * @throws IOException
     *             the file exists, but its contents cannot be read.
     */
    public static final byte[] readFully(final File path, final int max)
            throws FileNotFoundException, IOException {
        final FileInputStream in = new FileInputStream(path);
        try {
            long sz = Math.max(path.length(), 1);
            if (sz > max)
                throw new IOException(MessageFormat.format(
                        JGitText.get().fileIsTooLarge, path));

            byte[] buf = new byte[(int) sz];
            int valid = 0;
            for (;;) {
                if (buf.length == valid) {
                    if (buf.length == max) {
                        int next = in.read();
                        if (next < 0)
                            break;
                        throw new IOException(MessageFormat.format(
                                JGitText.get().fileIsTooLarge, path));
                    }
                    byte[] nb = new byte[Math.min(buf.length * 2, max)];
                    System.arraycopy(buf, 0, nb, 0, valid);
                    buf = nb;
                }
                int n = in.read(buf, valid, buf.length - valid);
                if (n < 0)
                    break;
                valid += n;
            }
            if (valid < buf.length) {
                byte[] nb = new byte[valid];
                System.arraycopy(buf, 0, nb, 0, valid);
                buf = nb;
            }
            return buf;
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // ignore any close errors, this was a read only stream
            }
        }
    }
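
    // Usage note (illustrative, not part of the original file): the buffer is
    // sized from File.length() and doubled as needed, so a file that grows
    // between stat and read is still captured in full, while anything beyond
    // max fails fast with the fileIsTooLarge message. The file here is a
    // placeholder:
    //
    //   byte[] all = IO.readFully(someFile, 10 * 1024 * 1024); // 10 MiB cap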

    /**
     * Read an entire input stream into memory as a ByteBuffer.
     *
     * Note: The stream is read to its end and is not usable after calling this
     * method. The caller is responsible for closing the stream.
     *
     * @param in
     *            input stream to be read.
     * @param sizeHint
     *            a hint on the approximate number of bytes contained in the
     *            stream, used to allocate temporary buffers more efficiently
     * @return complete contents of the input stream. The ByteBuffer always has
     *         a writable backing array, with {@code position() == 0} and
     *         {@code limit()} equal to the actual length read. Callers may rely
     *         on obtaining the underlying array for efficient data access. If
     *         {@code sizeHint} was too large, the array may be over-allocated,
     *         resulting in {@code limit() < array().length}.
     * @throws IOException
     *             there was an error reading from the stream.
     */
    public static ByteBuffer readWholeStream(InputStream in, int sizeHint)
            throws IOException {
        byte[] out = new byte[sizeHint];
        int pos = 0;
        while (pos < out.length) {
            int read = in.read(out, pos, out.length - pos);
            if (read < 0)
                return ByteBuffer.wrap(out, 0, pos);
            pos += read;
        }

        int last = in.read();
        if (last < 0)
            return ByteBuffer.wrap(out, 0, pos);

        TemporaryBuffer.Heap tmp = new TemporaryBuffer.Heap(Integer.MAX_VALUE);
        tmp.write(out);
        tmp.write(last);
        tmp.copy(in);
        return ByteBuffer.wrap(tmp.toByteArray());
    }
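
    // Usage note (illustrative, not part of the original file): if sizeHint
    // is exact or an over-estimate, the bytes are returned in the original
    // array without an extra copy; only when the stream outgrows the hint is
    // everything respooled through a TemporaryBuffer.Heap. A good hint
    // therefore avoids both the spill and the final copy:
    //
    //   ByteBuffer buf = IO.readWholeStream(in, expectedLen);
    //   int n = buf.limit(); // number of bytes actually read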

    /**
     * Read the entire byte array into memory, or throw an exception.
     *
     * @param fd
     *            input stream to read the data from.
     * @param dst
     *            buffer that must be fully populated, [off, off+len).
     * @param off
     *            position within the buffer to start writing to.
     * @param len
     *            number of bytes that must be read.
     * @throws EOFException
     *             the stream ended before dst was fully populated.
     * @throws IOException
     *             there was an error reading from the stream.
     */
    public static void readFully(final InputStream fd, final byte[] dst,
            int off, int len) throws IOException {
        while (len > 0) {
            final int r = fd.read(dst, off, len);
            if (r <= 0)
                throw new EOFException(JGitText.get().shortReadOfBlock);
            off += r;
            len -= r;
        }
    }

    /**
     * Read as much of the array as possible from a channel.
     *
     * @param channel
     *            channel to read data from.
     * @param dst
     *            buffer that should be populated, [off, off+len).
     * @param off
     *            position within the buffer to start writing to.
     * @param len
     *            number of bytes that should be read.
     * @return number of bytes actually read; -1 if the channel was already
     *         at end-of-stream.
     * @throws IOException
     *             there was an error reading from the channel.
     */
    public static int read(ReadableByteChannel channel, byte[] dst, int off,
            int len) throws IOException {
        if (len == 0)
            return 0;
        int cnt = 0;
        while (0 < len) {
            int r = channel.read(ByteBuffer.wrap(dst, off, len));
            if (r <= 0)
                break;
            off += r;
            len -= r;
            cnt += r;
        }
        return cnt != 0 ? cnt : -1;
    }

    /**
     * Read from the stream until the buffer is full, or the stream ends.
     *
     * @param fd
     *            input stream to read the data from.
     * @param dst
     *            buffer to populate, from off to the end of the array.
     * @param off
     *            position within the buffer to start writing to.
     * @return number of bytes actually read.
     * @throws IOException
     *             there was an error reading from the stream.
     */
    public static int readFully(InputStream fd, byte[] dst, int off)
            throws IOException {
        int r;
        int len = 0;
        // Check off first so a full buffer never triggers a zero-length
        // read, which would return 0 and loop forever on a live stream.
        while (off < dst.length
                && (r = fd.read(dst, off, dst.length - off)) >= 0) {
            off += r;
            len += r;
        }
        return len;
    }

    /**
     * Skip an entire region of an input stream.
     * <p>
     * The input stream's position is moved forward by the number of requested
     * bytes, discarding them from the input. This method does not return until
     * the exact number of bytes requested has been skipped.
     *
     * @param fd
     *            the stream to skip bytes from.
     * @param toSkip
     *            total number of bytes to be discarded. Must be >= 0.
     * @throws EOFException
     *             the stream ended before the requested number of bytes were
     *             skipped.
     * @throws IOException
     *             there was an error reading from the stream.
     */
    public static void skipFully(final InputStream fd, long toSkip)
            throws IOException {
        while (toSkip > 0) {
            final long r = fd.skip(toSkip);
            if (r <= 0)
                throw new EOFException(JGitText.get().shortSkipOfBlock);
            toSkip -= r;
        }
    }

    /**
     * Divides the given string into lines.
     *
     * @param s
     *            the string to read
     * @return the string divided into lines
     */
    public static List<String> readLines(final String s) {
        List<String> l = new ArrayList<String>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n') {
                l.add(sb.toString());
                sb.setLength(0);
                continue;
            }
            if (c == '\r') {
                if (i + 1 < s.length()) {
                    // consume the \n of a \r\n pair; a bare \r also ends
                    // a line, so keep the following character if it is
                    // not \n.
                    c = s.charAt(++i);
                    l.add(sb.toString());
                    sb.setLength(0);
                    if (c != '\n')
                        sb.append(c);
                    continue;
                } else { // EOF
                    l.add(sb.toString());
                    break;
                }
            }
            sb.append(c);
        }
        l.add(sb.toString());
        return l;
    }
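
    // Illustrative example (not in the original source): readLines treats
    // "\n", "\r\n" and bare "\r" as terminators, and a trailing terminator
    // yields a final empty element:
    //
    //   IO.readLines("a\r\nb\rc\n")  =>  ["a", "b", "c", ""]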

    private IO() {
        // Don't create instances of a static only utility.
    }
}
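
For reference, a small usage sketch of the utilities above. The file name example.txt and the UTF-8 assumption are placeholders for this example only:

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.List;

import org.eclipse.jgit.util.IO;

public class IOExample {
    public static void main(String[] args) throws Exception {
        // Read a whole file, refusing anything larger than 1 MiB.
        byte[] data = IO.readFully(new File("example.txt"), 1024 * 1024);

        // Split on \n, \r\n, or bare \r line endings.
        List<String> lines = IO.readLines(new String(data, "UTF-8"));

        // Drain a stream whose size is only approximately known; the hint
        // sizes the initial buffer, and any overflow spills to a
        // TemporaryBuffer behind the scenes.
        InputStream in = new ByteArrayInputStream(data);
        ByteBuffer buf = IO.readWholeStream(in, data.length);
        System.out.println(lines.size() + " lines, " + buf.limit() + " bytes");
    }
}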