
TemporaryBuffer.java 18KB

Buffer very large delta streams to reduce explosion of CPU work

Large delta streams are unpacked incrementally, but because a delta can seek to a random position in the base to perform a copy, we may need to inflate the base repeatedly just to complete one delta. Work around this by copying the base to a temporary file, and then reading from that temporary file using random seeks instead. It's far more efficient because we now only need to inflate the base once.

This is still really ugly because we have to dump to a temporary file, but at least the code can successfully process a large file without throwing OutOfMemoryError. If speed is an issue, the user will need to increase the JVM heap and ensure core.streamFileThreshold is set to a higher value, so we don't use this code path as often.

Unfortunately we lose the "optimization" of skipping over portions of a delta base that we don't actually need in the final result. This is going to cause us to inflate and write to disk useless regions that were deleted and do not appear in the final result. We could later improve on this code by trying to flatten delta instruction streams before we touch the bottom base object, and then only store the portions of the base we really need for the final result and that appear out of order. Since that is some pretty complex code, I'm punting on it for now and just doing this simple whole-object buffering.

Because the process umask might be permitting other users to read files we create, we put the temporary buffers into $GIT_DIR/objects. We can reasonably assume that if a reader can read our temporary buffer file in that directory, they can also read the base pack file we are pulling it from, and therefore it's not a security breach to expose the inflated content in a file. This requires a reader to have write access to the repository, but only if the file is really big. I'd rather err on the side of caution here and refuse to read a very big file into /tmp than possibly expose secured content because the Java 5 JVM won't let us create a protected temporary file that only the current user can access.

Change-Id: I66fb80b08cbcaf0f65f2db0462c546a495a160dd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
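The class below is what this commit leans on for the spill-to-disk behavior it describes. As a rough illustration only (not the commit's actual pack-resolution code), a caller might buffer a large inflated delta base like this; the names DeltaBaseSpillExample, bufferBase, objectsDir and baseStream are hypothetical stand-ins, while the TemporaryBuffer.LocalFile calls are the ones defined in the file shown further down:

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.eclipse.jgit.util.TemporaryBuffer;

public class DeltaBaseSpillExample {
    /**
     * Copies an already-inflated delta base into a TemporaryBuffer.LocalFile.
     * Small bases stay in memory; anything past the in-core limit spills to a
     * temporary file under the supplied directory (e.g. $GIT_DIR/objects), so
     * the base only has to be inflated once.
     */
    static InputStream bufferBase(File objectsDir, InputStream baseStream)
            throws IOException {
        TemporaryBuffer.LocalFile buf = new TemporaryBuffer.LocalFile(objectsDir);
        try {
            buf.copy(baseStream); // drain the inflated base into the buffer
            buf.close();          // must close before reading the content back
            // The caller can now replay copy instructions against this stream;
            // openInputStreamWithAutoDestroy() would additionally delete the
            // temporary file when the returned stream is closed.
            return buf.openInputStream();
        } catch (IOException e) {
            buf.destroy();        // remove any temporary file on failure
            throw e;
        }
    }
}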
/*
 * Copyright (C) 2008-2009, Google Inc.
 * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org> and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

package org.eclipse.jgit.util;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;

import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.ProgressMonitor;
/**
 * A fully buffered output stream.
 * <p>
 * Subclasses determine the behavior when the in-memory buffer capacity has been
 * exceeded and additional bytes are still being received for output.
 */
public abstract class TemporaryBuffer extends OutputStream {
    /** Default limit for in-core storage. */
    protected static final int DEFAULT_IN_CORE_LIMIT = 1024 * 1024;

    /** Chain of data, if we are still completely in-core; otherwise null. */
    ArrayList<Block> blocks;

    /**
     * Maximum number of bytes we will permit storing in memory.
     * <p>
     * When this limit is reached the data will be shifted to a file on disk,
     * preventing the JVM heap from growing out of control.
     */
    private int inCoreLimit;

    /** Initial size of block list. */
    private int initialBlocks;

    /** If {@link #inCoreLimit} has been reached, remainder goes here. */
    private OutputStream overflow;

    /**
     * Create a new empty temporary buffer.
     *
     * @param limit
     *            maximum number of bytes to store in memory before entering the
     *            overflow output path; also used as the estimated size.
     */
    protected TemporaryBuffer(int limit) {
        this(limit, limit);
    }

    /**
     * Create a new empty temporary buffer.
     *
     * @param estimatedSize
     *            estimated size of storage used, to size the initial list of
     *            block pointers.
     * @param limit
     *            maximum number of bytes to store in memory before entering the
     *            overflow output path.
     * @since 4.0
     */
    protected TemporaryBuffer(int estimatedSize, int limit) {
        if (estimatedSize > limit)
            throw new IllegalArgumentException();
        this.inCoreLimit = limit;
        this.initialBlocks = (estimatedSize - 1) / Block.SZ + 1;
        reset();
    }
    /** {@inheritDoc} */
    @Override
    public void write(int b) throws IOException {
        if (overflow != null) {
            overflow.write(b);
            return;
        }
        Block s = last();
        if (s.isFull()) {
            if (reachedInCoreLimit()) {
                overflow.write(b);
                return;
            }
            s = new Block();
            blocks.add(s);
        }
        s.buffer[s.count++] = (byte) b;
    }

    /** {@inheritDoc} */
    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (overflow == null) {
            while (len > 0) {
                Block s = last();
                if (s.isFull()) {
                    if (reachedInCoreLimit())
                        break;
                    s = new Block();
                    blocks.add(s);
                }
                final int n = Math.min(s.buffer.length - s.count, len);
                System.arraycopy(b, off, s.buffer, s.count, n);
                s.count += n;
                len -= n;
                off += n;
            }
        }
        if (len > 0)
            overflow.write(b, off, len);
    }

    /**
     * Dumps the entire buffer into the overflow stream, and flushes it.
     *
     * @throws java.io.IOException
     *             the overflow stream cannot be started, or the buffer contents
     *             cannot be written to it, or it failed to flush.
     */
    protected void doFlush() throws IOException {
        if (overflow == null)
            switchToOverflow();
        overflow.flush();
    }

    /**
     * Copy all bytes remaining on the input stream into this buffer.
     *
     * @param in
     *            the stream to read from, until EOF is reached.
     * @throws java.io.IOException
     *             an error occurred reading from the input stream, or while
     *             writing to a local temporary file.
     */
    public void copy(InputStream in) throws IOException {
        if (blocks != null) {
            for (;;) {
                Block s = last();
                if (s.isFull()) {
                    if (reachedInCoreLimit())
                        break;
                    s = new Block();
                    blocks.add(s);
                }
                int n = in.read(s.buffer, s.count, s.buffer.length - s.count);
                if (n < 1)
                    return;
                s.count += n;
            }
        }
        final byte[] tmp = new byte[Block.SZ];
        int n;
        while ((n = in.read(tmp)) > 0)
            overflow.write(tmp, 0, n);
    }
    /**
     * Obtain the length (in bytes) of the buffer.
     * <p>
     * The length is only accurate after {@link #close()} has been invoked.
     *
     * @return total length of the buffer, in bytes.
     */
    public long length() {
        return inCoreLength();
    }

    private long inCoreLength() {
        final Block last = last();
        return ((long) blocks.size() - 1) * Block.SZ + last.count;
    }

    /**
     * Convert this buffer's contents into a contiguous byte array.
     * <p>
     * The buffer is only complete after {@link #close()} has been invoked.
     *
     * @return the complete byte array; length matches {@link #length()}.
     * @throws java.io.IOException
     *             an error occurred reading from a local temporary file
     */
    public byte[] toByteArray() throws IOException {
        final long len = length();
        if (Integer.MAX_VALUE < len)
            throw new OutOfMemoryError(JGitText.get().lengthExceedsMaximumArraySize);
        final byte[] out = new byte[(int) len];
        int outPtr = 0;
        for (Block b : blocks) {
            System.arraycopy(b.buffer, 0, out, outPtr, b.count);
            outPtr += b.count;
        }
        return out;
    }

    /**
     * Convert first {@code limit} number of bytes of the buffer content to
     * String.
     *
     * @param limit
     *            the maximum number of bytes to be converted to String
     * @return first {@code limit} number of bytes of the buffer content
     *         converted to String.
     * @since 5.12
     */
    public String toString(int limit) {
        try {
            return RawParseUtils.decode(toByteArray(limit));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
    /**
     * Convert this buffer's contents into a contiguous byte array. If the size
     * of the buffer exceeds the limit, only the first {@code limit} bytes are
     * returned.
     * <p>
     * The buffer is only complete after {@link #close()} has been invoked.
     *
     * @param limit
     *            the maximum number of bytes to be returned
     * @return the byte array limited to {@code limit} bytes.
     * @throws java.io.IOException
     *             an error occurred reading from a local temporary file
     * @since 4.2
     */
    public byte[] toByteArray(int limit) throws IOException {
        final long len = Math.min(length(), limit);
        if (Integer.MAX_VALUE < len)
            throw new OutOfMemoryError(
                    JGitText.get().lengthExceedsMaximumArraySize);
        int length = (int) len;
        final byte[] out = new byte[length];
        int outPtr = 0;
        for (Block b : blocks) {
            int toCopy = Math.min(length - outPtr, b.count);
            System.arraycopy(b.buffer, 0, out, outPtr, toCopy);
            outPtr += toCopy;
            if (outPtr == length) {
                break;
            }
        }
        return out;
    }
    /**
     * Send this buffer to an output stream.
     * <p>
     * This method may only be invoked after {@link #close()} has completed
     * normally, to ensure all data is completely transferred.
     *
     * @param os
     *            stream to send this buffer's complete content to.
     * @param pm
     *            if not null progress updates are sent here. Caller should
     *            initialize the task and the number of work units to <code>
     *            {@link #length()}/1024</code>.
     * @throws java.io.IOException
     *             an error occurred reading from a temporary file on the local
     *             system, or writing to the output stream.
     */
    public void writeTo(OutputStream os, ProgressMonitor pm)
            throws IOException {
        if (pm == null)
            pm = NullProgressMonitor.INSTANCE;
        for (Block b : blocks) {
            os.write(b.buffer, 0, b.count);
            pm.update(b.count / 1024);
        }
    }

    /**
     * Open an input stream to read from the buffered data.
     * <p>
     * This method may only be invoked after {@link #close()} has completed
     * normally, to ensure all data is completely transferred.
     *
     * @return a stream to read from the buffer. The caller must close the
     *         stream when it is no longer useful.
     * @throws java.io.IOException
     *             an error occurred opening the temporary file.
     */
    public InputStream openInputStream() throws IOException {
        return new BlockInputStream();
    }

    /**
     * Same as {@link #openInputStream()} but handling destruction of any
     * associated resources automatically when closing the returned stream.
     *
     * @return an InputStream which will automatically destroy any associated
     *         temporary file on {@link #close()}
     * @throws IOException
     *             in case of an error.
     * @since 4.11
     */
    public InputStream openInputStreamWithAutoDestroy() throws IOException {
        return new BlockInputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                destroy();
            }
        };
    }

    /**
     * Reset this buffer for reuse, purging all buffered content.
     */
    public void reset() {
        if (overflow != null) {
            destroy();
        }
        if (blocks != null)
            blocks.clear();
        else
            blocks = new ArrayList<>(initialBlocks);
        blocks.add(new Block(Math.min(inCoreLimit, Block.SZ)));
    }
    /**
     * Open the overflow output stream, so the remaining output can be stored.
     *
     * @return the output stream to receive the buffered content, followed by
     *         the remaining output.
     * @throws java.io.IOException
     *             the buffer cannot create the overflow stream.
     */
    protected abstract OutputStream overflow() throws IOException;

    private Block last() {
        return blocks.get(blocks.size() - 1);
    }

    private boolean reachedInCoreLimit() throws IOException {
        if (inCoreLength() < inCoreLimit)
            return false;
        switchToOverflow();
        return true;
    }

    private void switchToOverflow() throws IOException {
        overflow = overflow();
        final Block last = blocks.remove(blocks.size() - 1);
        for (Block b : blocks)
            overflow.write(b.buffer, 0, b.count);
        blocks = null;
        overflow = new BufferedOutputStream(overflow, Block.SZ);
        overflow.write(last.buffer, 0, last.count);
    }

    /** {@inheritDoc} */
    @Override
    public void close() throws IOException {
        if (overflow != null) {
            try {
                overflow.close();
            } finally {
                overflow = null;
            }
        }
    }

    /**
     * Clear this buffer so it has no data, and cannot be used again.
     */
    public void destroy() {
        blocks = null;
        if (overflow != null) {
            try {
                overflow.close();
            } catch (IOException err) {
                // We shouldn't encounter an error closing the file.
            } finally {
                overflow = null;
            }
        }
    }

    /**
     * A fully buffered output stream using local disk storage for large data.
     * <p>
     * Initially this output stream buffers to memory and is therefore similar
     * to ByteArrayOutputStream, but it shifts to using an on disk temporary
     * file if the output gets too large.
     * <p>
     * The content of this buffered stream may be sent to another OutputStream
     * only after this stream has been properly closed by {@link #close()}.
     */
    public static class LocalFile extends TemporaryBuffer {
        /** Directory to store the temporary file under. */
        private final File directory;

        /**
         * Location of our temporary file if we are on disk; otherwise null.
         * <p>
         * If we exceeded the {@link #inCoreLimit} we nulled out {@link #blocks}
         * and created this file instead. All output goes here through
         * {@link #overflow}.
         */
        private File onDiskFile;

        /**
         * Create a new temporary buffer, limiting memory usage.
         *
         * @param directory
         *            if the buffer has to spill over into a temporary file, the
         *            directory where the file should be saved. If null the
         *            system default temporary directory (for example /tmp) will
         *            be used instead.
         */
        public LocalFile(File directory) {
            this(directory, DEFAULT_IN_CORE_LIMIT);
        }

        /**
         * Create a new temporary buffer, limiting memory usage.
         *
         * @param directory
         *            if the buffer has to spill over into a temporary file, the
         *            directory where the file should be saved. If null the
         *            system default temporary directory (for example /tmp) will
         *            be used instead.
         * @param inCoreLimit
         *            maximum number of bytes to store in memory. Storage beyond
         *            this limit will use the local file.
         */
        public LocalFile(File directory, int inCoreLimit) {
            super(inCoreLimit);
            this.directory = directory;
        }

        @Override
        protected OutputStream overflow() throws IOException {
            onDiskFile = File.createTempFile("jgit_", ".buf", directory); //$NON-NLS-1$ //$NON-NLS-2$
            return new BufferedOutputStream(new FileOutputStream(onDiskFile));
        }

        @Override
        public long length() {
            if (onDiskFile == null) {
                return super.length();
            }
            return onDiskFile.length();
        }

        @Override
        public byte[] toByteArray() throws IOException {
            if (onDiskFile == null) {
                return super.toByteArray();
            }
            final long len = length();
            if (Integer.MAX_VALUE < len)
                throw new OutOfMemoryError(JGitText.get().lengthExceedsMaximumArraySize);
            final byte[] out = new byte[(int) len];
            try (FileInputStream in = new FileInputStream(onDiskFile)) {
                IO.readFully(in, out, 0, (int) len);
            }
            return out;
        }
        @Override
        public byte[] toByteArray(int limit) throws IOException {
            if (onDiskFile == null) {
                return super.toByteArray(limit);
            }
            final long len = Math.min(length(), limit);
            if (Integer.MAX_VALUE < len) {
                throw new OutOfMemoryError(
                        JGitText.get().lengthExceedsMaximumArraySize);
            }
            final byte[] out = new byte[(int) len];
            try (FileInputStream in = new FileInputStream(onDiskFile)) {
                int read = 0;
                int chunk;
                while ((chunk = in.read(out, read, out.length - read)) >= 0) {
                    read += chunk;
                    if (read == out.length) {
                        break;
                    }
                }
            }
            return out;
        }

        @Override
        public void writeTo(OutputStream os, ProgressMonitor pm)
                throws IOException {
            if (onDiskFile == null) {
                super.writeTo(os, pm);
                return;
            }
            if (pm == null)
                pm = NullProgressMonitor.INSTANCE;
            try (FileInputStream in = new FileInputStream(onDiskFile)) {
                int cnt;
                final byte[] buf = new byte[Block.SZ];
                while ((cnt = in.read(buf)) >= 0) {
                    os.write(buf, 0, cnt);
                    pm.update(cnt / 1024);
                }
            }
        }

        @Override
        public InputStream openInputStream() throws IOException {
            if (onDiskFile == null)
                return super.openInputStream();
            return new FileInputStream(onDiskFile);
        }

        @Override
        public InputStream openInputStreamWithAutoDestroy() throws IOException {
            if (onDiskFile == null) {
                return super.openInputStreamWithAutoDestroy();
            }
            return new FileInputStream(onDiskFile) {
                @Override
                public void close() throws IOException {
                    super.close();
                    destroy();
                }
            };
        }

        @Override
        public void destroy() {
            super.destroy();
            if (onDiskFile != null) {
                try {
                    if (!onDiskFile.delete())
                        onDiskFile.deleteOnExit();
                } finally {
                    onDiskFile = null;
                }
            }
        }
    }
    /**
     * A temporary buffer that will never exceed its in-memory limit.
     * <p>
     * If the in-memory limit is reached an IOException is thrown, rather than
     * attempting to spool to local disk.
     */
    public static class Heap extends TemporaryBuffer {
        /**
         * Create a new heap buffer with a maximum storage limit.
         *
         * @param limit
         *            maximum number of bytes that can be stored in this buffer;
         *            also used as the estimated size. Storing beyond this many
         *            will cause an IOException to be thrown during write.
         */
        public Heap(int limit) {
            super(limit);
        }

        /**
         * Create a new heap buffer with a maximum storage limit.
         *
         * @param estimatedSize
         *            estimated size of storage used, to size the initial list of
         *            block pointers.
         * @param limit
         *            maximum number of bytes that can be stored in this buffer.
         *            Storing beyond this many will cause an IOException to be
         *            thrown during write.
         * @since 4.0
         */
        public Heap(int estimatedSize, int limit) {
            super(estimatedSize, limit);
        }

        @Override
        protected OutputStream overflow() throws IOException {
            throw new IOException(JGitText.get().inMemoryBufferLimitExceeded);
        }
    }
    static class Block {
        static final int SZ = 8 * 1024;

        final byte[] buffer;

        int count;

        Block() {
            buffer = new byte[SZ];
        }

        Block(int sz) {
            buffer = new byte[sz];
        }

        boolean isFull() {
            return count == buffer.length;
        }
    }
    private class BlockInputStream extends InputStream {
        private byte[] singleByteBuffer;

        private int blockIndex;

        private Block block;

        private int blockPos;

        BlockInputStream() {
            block = blocks.get(blockIndex);
        }

        @Override
        public int read() throws IOException {
            if (singleByteBuffer == null)
                singleByteBuffer = new byte[1];
            int n = read(singleByteBuffer);
            return n == 1 ? singleByteBuffer[0] & 0xff : -1;
        }

        @Override
        public long skip(long cnt) throws IOException {
            long skipped = 0;
            while (0 < cnt) {
                int n = (int) Math.min(block.count - blockPos, cnt);
                if (0 < n) {
                    blockPos += n;
                    skipped += n;
                    cnt -= n;
                } else if (nextBlock())
                    continue;
                else
                    break;
            }
            return skipped;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0)
                return 0;
            int copied = 0;
            while (0 < len) {
                int c = Math.min(block.count - blockPos, len);
                if (0 < c) {
                    System.arraycopy(block.buffer, blockPos, b, off, c);
                    blockPos += c;
                    off += c;
                    len -= c;
                    copied += c;
                } else if (nextBlock())
                    continue;
                else
                    break;
            }
            return 0 < copied ? copied : -1;
        }

        private boolean nextBlock() {
            if (++blockIndex < blocks.size()) {
                block = blocks.get(blockIndex);
                blockPos = 0;
                return true;
            }
            return false;
        }
    }
}
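For contrast with LocalFile, the Heap variant above refuses to overflow: once its in-core limit is reached, write() fails with an IOException instead of spooling to disk. The following is a minimal sketch of both behaviors; the class name BufferLimitDemo, the byte counts, and the choice of java.io.tmpdir as the spill directory are arbitrary illustration choices, not anything prescribed by the library:

import java.io.File;
import java.io.IOException;

import org.eclipse.jgit.util.TemporaryBuffer;

public class BufferLimitDemo {
    public static void main(String[] args) throws IOException {
        byte[] chunk = new byte[64 * 1024];

        // Heap: hard 128 KiB ceiling, so the third 64 KiB write must fail.
        TemporaryBuffer.Heap heap = new TemporaryBuffer.Heap(128 * 1024);
        heap.write(chunk);
        heap.write(chunk);
        try {
            heap.write(chunk);
        } catch (IOException limitExceeded) {
            System.out.println("Heap buffer refused to grow past its limit");
        }

        // LocalFile: same ceiling, but the overflow spills into a temporary
        // file under the given directory instead of failing.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        TemporaryBuffer.LocalFile onDisk = new TemporaryBuffer.LocalFile(dir, 128 * 1024);
        try {
            for (int i = 0; i < 8; i++)
                onDisk.write(chunk);
            onDisk.close(); // required before length() or reading back
            System.out.println("Buffered " + onDisk.length() + " bytes");
        } finally {
            onDisk.destroy(); // removes the spilled temporary file
        }
    }
}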