
TemporaryBuffer.java 16KB

Buffer very large delta streams to reduce explosion of CPU work

Large delta streams are unpacked incrementally, but because a delta can seek to a random position in the base to perform a copy, we may need to inflate the base repeatedly just to complete one delta. Work around this by copying the base to a temporary file, which we can then read with random seeks. It's far more efficient because we now only need to inflate the base once.

This is still really ugly because we have to dump to a temporary file, but at least the code can successfully process a large file without throwing OutOfMemoryError. If speed is an issue, the user will need to increase the JVM heap and ensure core.streamFileThreshold is set to a higher value, so we don't use this code path as often.

Unfortunately we lose the "optimization" of skipping over portions of a delta base that we don't actually need in the final result. This causes us to inflate and write to disk useless regions that were deleted and do not appear in the final result. We could later improve on this by flattening delta instruction streams before we touch the bottom base object, and then storing only the portions of the base we really need for the final result and that appear out of order. Since that is some pretty complex code, I'm punting on it for now and just doing this simple whole-object buffering.

Because the process umask might permit other users to read files we create, we put the temporary buffers into $GIT_DIR/objects. We can reasonably assume that if a reader can read our temporary buffer file in that directory, they can also read the base pack file we are pulling it from, and therefore it's not a security breach to expose the inflated content in a file. This requires a reader to have write access to the repository, but only if the file is really big. I'd rather err on the side of caution here and refuse to read a very big file into /tmp than possibly expose secured content, because the Java 5 JVM won't let us create a protected temporary file that only the current user can access.

Change-Id: I66fb80b08cbcaf0f65f2db0462c546a495a160dd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago
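The source below is the buffering primitive this commit relies on for that workaround. A minimal sketch of the intended call pattern, using only the API defined in this file; the gitDir, deltaBase, copyOffset, copyLen, and buf variables are illustrative stand-ins, not part of the file, and this is not the actual delta-application code:

    // Spill into $GIT_DIR/objects rather than /tmp, per the commit message.
    File objectsDir = new File(gitDir, "objects");
    TemporaryBuffer.LocalFile base = new TemporaryBuffer.LocalFile(objectsDir);
    try {
        base.copy(deltaBase); // inflate the delta base exactly once
        base.close();         // must complete before reading back
        InputStream in = base.openInputStream();
        try {
            in.skip(copyOffset);      // cheap random seek into the buffered base
            in.read(buf, 0, copyLen); // one delta copy instruction (a real loop
                                      // would handle short reads)
        } finally {
            in.close();
        }
    } finally {
        base.destroy(); // remove the temporary file
    }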
/*
 * Copyright (C) 2008-2009, Google Inc.
 * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 *   notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 *   copyright notice, this list of conditions and the following
 *   disclaimer in the documentation and/or other materials provided
 *   with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 *   names of its contributors may be used to endorse or promote
 *   products derived from this software without specific prior
 *   written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.util;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;

import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.ProgressMonitor;
import org.eclipse.jgit.util.io.SafeBufferedOutputStream;

/**
 * A fully buffered output stream.
 * <p>
 * Subclasses determine the behavior when the in-memory buffer capacity has been
 * exceeded and additional bytes are still being received for output.
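 * <p>
 * The general contract is: write (or {@link #copy(InputStream)}), then
 * {@link #close()}, then read back via {@link #toByteArray()} or
 * {@link #openInputStream()}, and finally {@link #destroy()}. A minimal
 * sketch using the {@link Heap} subclass; the {@code data} array and the
 * limit value are illustrative only:
 *
 * <pre>
 * TemporaryBuffer b = new TemporaryBuffer.Heap(64 * 1024);
 * try {
 *     b.write(data); // throws IOException if the limit is exceeded
 *     b.close();     // required before reading back
 *     byte[] all = b.toByteArray();
 * } finally {
 *     b.destroy();
 * }
 * </pre>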
 */
public abstract class TemporaryBuffer extends OutputStream {
    /** Default limit for in-core storage. */
    protected static final int DEFAULT_IN_CORE_LIMIT = 1024 * 1024;

    /** Chain of data, if we are still completely in-core; otherwise null. */
    private ArrayList<Block> blocks;

    /**
     * Maximum number of bytes we will permit storing in memory.
     * <p>
     * When this limit is reached the data will be shifted to a file on disk,
     * preventing the JVM heap from growing out of control.
     */
    private int inCoreLimit;

    /** If {@link #inCoreLimit} has been reached, remainder goes here. */
    private OutputStream overflow;

    /**
     * Create a new empty temporary buffer.
     *
     * @param limit
     *            maximum number of bytes to store in memory before entering
     *            the overflow output path.
     */
    protected TemporaryBuffer(final int limit) {
        inCoreLimit = limit;
        reset();
    }

    @Override
    public void write(final int b) throws IOException {
        if (overflow != null) {
            overflow.write(b);
            return;
        }

        Block s = last();
        if (s.isFull()) {
            if (reachedInCoreLimit()) {
                overflow.write(b);
                return;
            }
            s = new Block();
            blocks.add(s);
        }
        s.buffer[s.count++] = (byte) b;
    }

    @Override
    public void write(final byte[] b, int off, int len) throws IOException {
        if (overflow == null) {
            while (len > 0) {
                Block s = last();
                if (s.isFull()) {
                    if (reachedInCoreLimit())
                        break;
                    s = new Block();
                    blocks.add(s);
                }

                final int n = Math.min(s.buffer.length - s.count, len);
                System.arraycopy(b, off, s.buffer, s.count, n);
                s.count += n;
                len -= n;
                off += n;
            }
        }

        if (len > 0)
            overflow.write(b, off, len);
    }

    /**
     * Dumps the entire buffer into the overflow stream, and flushes it.
     *
     * @throws IOException
     *             the overflow stream cannot be started, or the buffer
     *             contents cannot be written to it, or it failed to flush.
     */
    protected void doFlush() throws IOException {
        if (overflow == null)
            switchToOverflow();
        overflow.flush();
    }

    /**
     * Copy all bytes remaining on the input stream into this buffer.
     *
     * @param in
     *            the stream to read from, until EOF is reached.
     * @throws IOException
     *             an error occurred reading from the input stream, or while
     *             writing to a local temporary file.
     */
    public void copy(final InputStream in) throws IOException {
        if (blocks != null) {
            for (;;) {
                Block s = last();
                if (s.isFull()) {
                    if (reachedInCoreLimit())
                        break;
                    s = new Block();
                    blocks.add(s);
                }

                int n = in.read(s.buffer, s.count, s.buffer.length - s.count);
                if (n < 1)
                    return;
                s.count += n;
            }
        }

        final byte[] tmp = new byte[Block.SZ];
        int n;
        while ((n = in.read(tmp)) > 0)
            overflow.write(tmp, 0, n);
    }

    /**
     * Obtain the length (in bytes) of the buffer.
     * <p>
     * The length is only accurate after {@link #close()} has been invoked.
     *
     * @return total length of the buffer, in bytes.
     */
    public long length() {
        return inCoreLength();
    }

    private long inCoreLength() {
        final Block last = last();
        return ((long) blocks.size() - 1) * Block.SZ + last.count;
    }

    /**
     * Convert this buffer's contents into a contiguous byte array.
     * <p>
     * The buffer is only complete after {@link #close()} has been invoked.
     *
     * @return the complete byte array; length matches {@link #length()}.
     * @throws IOException
     *             an error occurred reading from a local temporary file
     * @throws OutOfMemoryError
     *             the buffer cannot fit in memory
     */
    public byte[] toByteArray() throws IOException {
        final long len = length();
        if (Integer.MAX_VALUE < len)
            throw new OutOfMemoryError(JGitText.get().lengthExceedsMaximumArraySize);
        final byte[] out = new byte[(int) len];
        int outPtr = 0;
        for (final Block b : blocks) {
            System.arraycopy(b.buffer, 0, out, outPtr, b.count);
            outPtr += b.count;
        }
        return out;
    }

    /**
     * Send this buffer to an output stream.
     * <p>
     * This method may only be invoked after {@link #close()} has completed
     * normally, to ensure all data is completely transferred.
     *
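     * <p>
     * A minimal sketch of the expected caller setup; the {@code buffer},
     * {@code out}, and task title below are illustrative, not part of this
     * API:
     *
     * <pre>
     * pm.beginTask("Writing buffer", (int) (buffer.length() / 1024));
     * buffer.writeTo(out, pm);
     * pm.endTask();
     * </pre>
     *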
     * @param os
     *            stream to send this buffer's complete content to.
     * @param pm
     *            if not null progress updates are sent here. Caller should
     *            initialize the task and the number of work units to <code>
     *            {@link #length()}/1024</code>.
     * @throws IOException
     *             an error occurred reading from a temporary file on the
     *             local system, or writing to the output stream.
     */
    public void writeTo(final OutputStream os, ProgressMonitor pm)
            throws IOException {
        if (pm == null)
            pm = NullProgressMonitor.INSTANCE;
        for (final Block b : blocks) {
            os.write(b.buffer, 0, b.count);
            pm.update(b.count / 1024);
        }
    }

    /**
     * Open an input stream to read from the buffered data.
     * <p>
     * This method may only be invoked after {@link #close()} has completed
     * normally, to ensure all data is completely transferred.
     *
     * @return a stream to read from the buffer. The caller must close the
     *         stream when it is no longer useful.
     * @throws IOException
     *             an error occurred opening the temporary file.
     */
    public InputStream openInputStream() throws IOException {
        return new BlockInputStream();
    }

    /** Reset this buffer for reuse, purging all buffered content. */
    public void reset() {
        if (overflow != null) {
            destroy();
        }
        if (inCoreLimit < Block.SZ) {
            blocks = new ArrayList<Block>(1);
            blocks.add(new Block(inCoreLimit));
        } else {
            blocks = new ArrayList<Block>(inCoreLimit / Block.SZ);
            blocks.add(new Block());
        }
    }

    /**
     * Open the overflow output stream, so the remaining output can be stored.
     *
     * @return the output stream to receive the buffered content, followed by
     *         the remaining output.
     * @throws IOException
     *             the buffer cannot create the overflow stream.
     */
    protected abstract OutputStream overflow() throws IOException;

    private Block last() {
        return blocks.get(blocks.size() - 1);
    }

    private boolean reachedInCoreLimit() throws IOException {
        if (inCoreLength() < inCoreLimit)
            return false;
        switchToOverflow();
        return true;
    }

    private void switchToOverflow() throws IOException {
        overflow = overflow();

        final Block last = blocks.remove(blocks.size() - 1);
        for (final Block b : blocks)
            overflow.write(b.buffer, 0, b.count);
        blocks = null;

        overflow = new SafeBufferedOutputStream(overflow, Block.SZ);
        overflow.write(last.buffer, 0, last.count);
    }

    public void close() throws IOException {
        if (overflow != null) {
            try {
                overflow.close();
            } finally {
                overflow = null;
            }
        }
    }

    /** Clear this buffer so it has no data, and cannot be used again. */
    public void destroy() {
        blocks = null;

        if (overflow != null) {
            try {
                overflow.close();
            } catch (IOException err) {
                // We shouldn't encounter an error closing the file.
            } finally {
                overflow = null;
            }
        }
    }

    /**
     * A fully buffered output stream using local disk storage for large data.
     * <p>
     * Initially this output stream buffers to memory and is therefore similar
     * to ByteArrayOutputStream, but it shifts to using an on disk temporary
     * file if the output gets too large.
     * <p>
     * The content of this buffered stream may be sent to another OutputStream
     * only after this stream has been properly closed by {@link #close()}.
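     * <p>
     * A minimal usage sketch; the {@code dir}, {@code in}, and {@code out}
     * variables are assumed to be supplied by the caller:
     *
     * <pre>
     * TemporaryBuffer.LocalFile b = new TemporaryBuffer.LocalFile(dir);
     * try {
     *     b.copy(in);  // spills to a file under dir once the limit is hit
     *     b.close();
     *     b.writeTo(out, null);
     * } finally {
     *     b.destroy(); // deletes the on-disk temporary, if one was made
     * }
     * </pre>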
     */
    public static class LocalFile extends TemporaryBuffer {
        /** Directory to store the temporary file under. */
        private final File directory;

        /**
         * Location of our temporary file if we are on disk; otherwise null.
         * <p>
         * If we exceeded the {@link #inCoreLimit} we nulled out {@link #blocks}
         * and created this file instead. All output goes here through
         * {@link #overflow}.
         */
        private File onDiskFile;

        /**
         * Create a new temporary buffer.
         *
         * @deprecated Use the {@code File} overload to supply a directory.
         */
        @Deprecated
        public LocalFile() {
            this(null, DEFAULT_IN_CORE_LIMIT);
        }

        /**
         * Create a new temporary buffer, limiting memory usage.
         *
         * @param inCoreLimit
         *            maximum number of bytes to store in memory. Storage
         *            beyond this limit will use the local file.
         * @deprecated Use the {@code File,int} overload to supply a directory.
         */
        @Deprecated
        public LocalFile(final int inCoreLimit) {
            this(null, inCoreLimit);
        }

        /**
         * Create a new temporary buffer, limiting memory usage.
         *
         * @param directory
         *            if the buffer has to spill over into a temporary file,
         *            the directory where the file should be saved. If null
         *            the system default temporary directory (for example
         *            /tmp) will be used instead.
         */
        public LocalFile(final File directory) {
            this(directory, DEFAULT_IN_CORE_LIMIT);
        }

        /**
         * Create a new temporary buffer, limiting memory usage.
         *
         * @param directory
         *            if the buffer has to spill over into a temporary file,
         *            the directory where the file should be saved. If null
         *            the system default temporary directory (for example
         *            /tmp) will be used instead.
         * @param inCoreLimit
         *            maximum number of bytes to store in memory. Storage
         *            beyond this limit will use the local file.
         */
        public LocalFile(final File directory, final int inCoreLimit) {
            super(inCoreLimit);
            this.directory = directory;
        }

        protected OutputStream overflow() throws IOException {
            onDiskFile = File.createTempFile("jgit_", ".buf", directory); //$NON-NLS-1$ //$NON-NLS-2$
            return new FileOutputStream(onDiskFile);
        }

        public long length() {
            if (onDiskFile == null) {
                return super.length();
            }
            return onDiskFile.length();
        }

        public byte[] toByteArray() throws IOException {
            if (onDiskFile == null) {
                return super.toByteArray();
            }

            final long len = length();
            if (Integer.MAX_VALUE < len)
                throw new OutOfMemoryError(JGitText.get().lengthExceedsMaximumArraySize);
            final byte[] out = new byte[(int) len];
            final FileInputStream in = new FileInputStream(onDiskFile);
            try {
                IO.readFully(in, out, 0, (int) len);
            } finally {
                in.close();
            }
            return out;
        }

        public void writeTo(final OutputStream os, ProgressMonitor pm)
                throws IOException {
            if (onDiskFile == null) {
                super.writeTo(os, pm);
                return;
            }
            if (pm == null)
                pm = NullProgressMonitor.INSTANCE;
            final FileInputStream in = new FileInputStream(onDiskFile);
            try {
                int cnt;
                final byte[] buf = new byte[Block.SZ];
                while ((cnt = in.read(buf)) >= 0) {
                    os.write(buf, 0, cnt);
                    pm.update(cnt / 1024);
                }
            } finally {
                in.close();
            }
        }

        @Override
        public InputStream openInputStream() throws IOException {
            if (onDiskFile == null)
                return super.openInputStream();
            return new FileInputStream(onDiskFile);
        }

        @Override
        public void destroy() {
            super.destroy();

            if (onDiskFile != null) {
                try {
                    if (!onDiskFile.delete())
                        onDiskFile.deleteOnExit();
                } finally {
                    onDiskFile = null;
                }
            }
        }
    }

    /**
     * A temporary buffer that will never exceed its in-memory limit.
     * <p>
     * If the in-memory limit is reached an IOException is thrown, rather than
     * attempting to spool to local disk.
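     * <p>
     * A minimal sketch of the failure mode (the limit value is illustrative):
     *
     * <pre>
     * TemporaryBuffer.Heap b = new TemporaryBuffer.Heap(8);
     * b.write(new byte[8]); // fits exactly
     * b.write(1);           // throws IOException: in-memory limit exceeded
     * </pre>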
     */
    public static class Heap extends TemporaryBuffer {
        /**
         * Create a new heap buffer with a maximum storage limit.
         *
         * @param limit
         *            maximum number of bytes that can be stored in this
         *            buffer. Storing beyond this many will cause an
         *            IOException to be thrown during write.
         */
        public Heap(final int limit) {
            super(limit);
        }

        @Override
        protected OutputStream overflow() throws IOException {
            throw new IOException(JGitText.get().inMemoryBufferLimitExceeded);
        }
    }

    static class Block {
        static final int SZ = 8 * 1024;

        final byte[] buffer;

        int count;

        Block() {
            buffer = new byte[SZ];
        }

        Block(int sz) {
            buffer = new byte[sz];
        }

        boolean isFull() {
            return count == buffer.length;
        }
    }

    private class BlockInputStream extends InputStream {
        private byte[] singleByteBuffer;

        private int blockIndex;

        private Block block;

        private int blockPos;

        BlockInputStream() {
            block = blocks.get(blockIndex);
        }

        @Override
        public int read() throws IOException {
            if (singleByteBuffer == null)
                singleByteBuffer = new byte[1];
            int n = read(singleByteBuffer);
            return n == 1 ? singleByteBuffer[0] & 0xff : -1;
        }

        @Override
        public long skip(long cnt) throws IOException {
            long skipped = 0;
            while (0 < cnt) {
                int n = (int) Math.min(block.count - blockPos, cnt);
                if (0 < n) {
                    blockPos += n;
                    skipped += n;
                    cnt -= n;
                } else if (nextBlock())
                    continue;
                else
                    break;
            }
            return skipped;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0)
                return 0;
            int copied = 0;
            while (0 < len) {
                int c = Math.min(block.count - blockPos, len);
                if (0 < c) {
                    System.arraycopy(block.buffer, blockPos, b, off, c);
                    blockPos += c;
                    off += c;
                    len -= c;
                    copied += c;
                } else if (nextBlock())
                    continue;
                else
                    break;
            }
            return 0 < copied ? copied : -1;
        }

        private boolean nextBlock() {
            if (++blockIndex < blocks.size()) {
                block = blocks.get(blockIndex);
                blockPos = 0;
                return true;
            }
            return false;
        }
    }
}