You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

TeeInputStream.java 3.9KB

Buffer very large delta streams to reduce explosion of CPU work Large delta streams are unpacked incrementally, but because a delta can seek to a random position in the base to perform a copy we may need to inflate the base repeatedly just to complete one delta. So work around it by copying the base to a temporary file, and then we can read from that temporary file using random seeks instead. Its far more efficient because we now only need to inflate the base once. This is still really ugly because we have to dump to a temporary file, but at least the code can successfully process a large file without throwing OutOfMemoryError. If speed is an issue, the user will need to increase the JVM heap and ensure core.streamFileThreshold is set to a higher value, so we don't use this code path as often. Unfortunately we lose the "optimization" of skipping over portions of a delta base that we don't actually need in the final result. This is going to cause us to inflate and write to disk useless regions that were deleted and do not appear in the final result. We could later improve on our code by trying to flatten delta instruction streams before we touch the bottom base object, and then only store the portions of the base we really need for the final result and that appear out-of-order. Since that is some pretty complex code I'm punting on it for now and just doing this simple whole-object buffering. Because the process umask might be permitting other users to read files we create, we put the temporary buffers into $GIT_DIR/objects. We can reasonably assume that if a reader can read our temporary buffer file in that directory, they can also read the base pack file we are pulling it from and therefore its not a security breach to expose the inflated content in a file. This requires a reader to have write access to the repository, but only if the file is really big. I'd rather err on the side of caution here and refuse to read a very big file into /tmp than to possibly expose a secured content because the Java 5 JVM won't let us create a protected temporary file that only the current user can access. Change-Id: I66fb80b08cbcaf0f65f2db0462c546a495a160dd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago
Buffer very large delta streams to reduce explosion of CPU work Large delta streams are unpacked incrementally, but because a delta can seek to a random position in the base to perform a copy we may need to inflate the base repeatedly just to complete one delta. So work around it by copying the base to a temporary file, and then we can read from that temporary file using random seeks instead. Its far more efficient because we now only need to inflate the base once. This is still really ugly because we have to dump to a temporary file, but at least the code can successfully process a large file without throwing OutOfMemoryError. If speed is an issue, the user will need to increase the JVM heap and ensure core.streamFileThreshold is set to a higher value, so we don't use this code path as often. Unfortunately we lose the "optimization" of skipping over portions of a delta base that we don't actually need in the final result. This is going to cause us to inflate and write to disk useless regions that were deleted and do not appear in the final result. We could later improve on our code by trying to flatten delta instruction streams before we touch the bottom base object, and then only store the portions of the base we really need for the final result and that appear out-of-order. Since that is some pretty complex code I'm punting on it for now and just doing this simple whole-object buffering. Because the process umask might be permitting other users to read files we create, we put the temporary buffers into $GIT_DIR/objects. We can reasonably assume that if a reader can read our temporary buffer file in that directory, they can also read the base pack file we are pulling it from and therefore its not a security breach to expose the inflated content in a file. This requires a reader to have write access to the repository, but only if the file is really big. I'd rather err on the side of caution here and refuse to read a very big file into /tmp than to possibly expose a secured content because the Java 5 JVM won't let us create a protected temporary file that only the current user can access. Change-Id: I66fb80b08cbcaf0f65f2db0462c546a495a160dd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134
  1. /*
  2. * Copyright (C) 2010, Google Inc.
  3. * and other copyright owners as documented in the project's IP log.
  4. *
  5. * This program and the accompanying materials are made available
  6. * under the terms of the Eclipse Distribution License v1.0 which
  7. * accompanies this distribution, is reproduced below, and is
  8. * available at http://www.eclipse.org/org/documents/edl-v10.php
  9. *
  10. * All rights reserved.
  11. *
  12. * Redistribution and use in source and binary forms, with or
  13. * without modification, are permitted provided that the following
  14. * conditions are met:
  15. *
  16. * - Redistributions of source code must retain the above copyright
  17. * notice, this list of conditions and the following disclaimer.
  18. *
  19. * - Redistributions in binary form must reproduce the above
  20. * copyright notice, this list of conditions and the following
  21. * disclaimer in the documentation and/or other materials provided
  22. * with the distribution.
  23. *
  24. * - Neither the name of the Eclipse Foundation, Inc. nor the
  25. * names of its contributors may be used to endorse or promote
  26. * products derived from this software without specific prior
  27. * written permission.
  28. *
  29. * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
  30. * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
  31. * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  32. * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  33. * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
  34. * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  35. * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  36. * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  37. * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  38. * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  39. * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  40. * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  41. * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  42. */
  43. package org.eclipse.jgit.util.io;
  44. import java.io.IOException;
  45. import java.io.InputStream;
  46. import java.io.OutputStream;
  47. import org.eclipse.jgit.util.TemporaryBuffer;
  48. /**
  49. * Input stream that copies data read to another output stream.
  50. *
  51. * This stream is primarily useful with a {@link TemporaryBuffer}, where any
  52. * data read or skipped by the caller is also duplicated into the temporary
  53. * buffer. Later the temporary buffer can then be used instead of the original
  54. * source stream.
  55. *
  56. * During close this stream copies any remaining data from the source stream
  57. * into the destination stream.
  58. */
  59. public class TeeInputStream extends InputStream {
  60. private byte[] skipBuffer;
  61. private InputStream src;
  62. private OutputStream dst;
  63. /**
  64. * Initialize a tee input stream.
  65. *
  66. * @param src
  67. * source stream to consume.
  68. * @param dst
  69. * destination to copy the source to as it is consumed. Typically
  70. * this is a {@link TemporaryBuffer}.
  71. */
  72. public TeeInputStream(InputStream src, OutputStream dst) {
  73. this.src = src;
  74. this.dst = dst;
  75. }
  76. @Override
  77. public int read() throws IOException {
  78. byte[] b = skipBuffer();
  79. int n = read(b, 0, 1);
  80. return n == 1 ? b[0] & 0xff : -1;
  81. }
  82. @Override
  83. public long skip(long cnt) throws IOException {
  84. long skipped = 0;
  85. byte[] b = skipBuffer();
  86. while (0 < cnt) {
  87. int n = src.read(b, 0, (int) Math.min(b.length, cnt));
  88. if (n <= 0)
  89. break;
  90. dst.write(b, 0, n);
  91. skipped += n;
  92. cnt -= n;
  93. }
  94. return skipped;
  95. }
  96. @Override
  97. public int read(byte[] b, int off, int len) throws IOException {
  98. if (len == 0)
  99. return 0;
  100. int n = src.read(b, off, len);
  101. if (0 < n)
  102. dst.write(b, off, n);
  103. return n;
  104. }
  105. public void close() throws IOException {
  106. byte[] b = skipBuffer();
  107. for (;;) {
  108. int n = src.read(b);
  109. if (n <= 0)
  110. break;
  111. dst.write(b, 0, n);
  112. }
  113. dst.close();
  114. src.close();
  115. }
  116. private byte[] skipBuffer() {
  117. if (skipBuffer == null)
  118. skipBuffer = new byte[2048];
  119. return skipBuffer;
  120. }
  121. }