Throw BinaryBlobException from RawParseUtils#lineMap.
This makes detection of binaries exact for ResolveMerger and
DiffFormatter: they will classify files as binary regardless of where
the '\0' occurs in the text.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: Id4342a199628d9406bfa04af1b023c27a47d4014
Avoid bad rounding "1 year, 12 months" in date formatter
Round first, then calculate the labels. This avoids "x years, 12 months"
and instead produces "x+1 years".
One test case has been added for the original example the bug was found
with, and one assertion has been moved from an existing test case to the
new test case, since it also triggered the bug.
Bug: 525907
Change-Id: I3270af3850c4fb7bae9123a0a6582f93055c9780
Signed-off-by: Michael Keppler <Michael.Keppler@gmx.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Avoid bad rounding "1 year, 12 months" in date formatter
Round first, then calculate the labels. This avoids "x years, 12 months"
and instead produces "x+1 years".
One test case has been added for the original example the bug was found
with, and one assertion has been moved from an existing test case to the
new test case, since it also triggered the bug.
Bug: 525907
Change-Id: I3270af3850c4fb7bae9123a0a6582f93055c9780
Signed-off-by: Michael Keppler <Michael.Keppler@gmx.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Ensure EOL stream type is DIRECT when -text attribute is present
Otherwise fancy combinations of attributes (binary or -text in
combination with crlf or eol) may result in the corruption of binary
data.
Bug: 520910
Change-Id: I3ffc666c13d1b9d2ed987b69a67bfc7f42ccdbfc
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Treat RawText of binary data as file with one single line.
This avoids executing mergeAlgorithm.merge on binary data, which is
unlikely to be useful.
Arguably, binary data should not make it to
ResolveMerger#contentMerge, but this approach has the following
advantages:
* binary detection is exact, since it doesn't only look at the start
of the blob.
* it is cheap, as we have to iterate over the bytes anyway to find
'\n'.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: I424295df1dc60a719859d9d7c599067891b15792
Callers may estimate the size, and their estimate may be zero. Silently
allow this, rather than throwing IndexOutOfBoundsException later during
add.
Change-Id: Ife236f9f4ce469c57b18e76cf4fad6feb52cb2b0
Fix null return from FS.readPipe when command fails to launch
When a command invoked from readPipe fails to launch (i.e. the exec call
fails due to a missing command executable), Process.start() throws,
which gets caught by the generic IOException handler, resulting in a
null return. This change detects this case and rethrows a
CommandFailedException instead.
Additionally, this change uses /bin/sh instead of bash for its posix
command failure test, to accomodate building in environments where bash
is unavailable.
Change-Id: Ifae51e457e5718be610c0a0914b18fe35ea7b008
Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Android wants them to work, and we're only interested in them for bare
repos, so add them just for that.
Make sure to use symlinks instead of just using the copyfile
implementation. Some scripts look up where they're actually located in
order to find related files, so they need the link back to their
project.
Change-Id: I929b69b2505f03036f69e25a55daf93842871f30
Signed-off-by: Dan Willemsen <dwillemsen@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jeff Gaston <jeffrygaston@google.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Move SHA1 compress/recompress files to resource folder
This fixes Bazel build:
in srcs attribute of java_library rule //org.eclipse.jgit:jgit:
file '//org.eclipse.jgit:src/org/eclipse/jgit/util/sha1/SHA1.recompress'
is misplaced here (expected .java, .srcjar or .properties).
Another option that was considered is to exclude the non source files.
Change-Id: I7083f27a4a49bf6681c85c7cf7b08a83c9a70c77
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
Update SHA1 class to include a Java port of sha1dc[1]'s ubc_check,
which can detect the attack pattern used by the SHAttered[2] authors.
Given the shattered example files that have the same SHA-1, this
modified implementation can identify there is risk of collision given
only one file in the pair:
$ jgit ...
[main] WARN org.eclipse.jgit.util.sha1.SHA1 - SHA-1 collision 38762cf7f5
When JGit detects probability of a collision the SHA1 class now warns
on the logger, reporting the object's SHA-1 hash, and then throws a
Sha1CollisionException to the caller.
From the paper[3] by Marc Stevens, the probability of a false positive
identification of a collision is about 14 * 2^(-160), sufficiently low
enough for any detected collision to likely be a real collision.
git-core[4] may adopt sha1dc before the system migrates to an entirely
new hash function. This commit enables JGit to remain compatible with
that move to sha1dc, and help protect users by warning if similar
attacks as SHAttered are identified.
Performance declined about 8% (detection off), now:
MessageDigest 238.41 MiB/s
MessageDigest 244.52 MiB/s
MessageDigest 244.06 MiB/s
MessageDigest 242.58 MiB/s
SHA1 216.77 MiB/s (was ~240.83 MiB/s)
SHA1 220.98 MiB/s
SHA1 221.76 MiB/s
SHA1 221.34 MiB/s
This decline in throughput is attributed to the step loop unrolling in
compress(), which was necessary to easily fit the UbcCheck logic into
the hash function. Using helper functions s1-s4 reduces the code
explosion, providing acceptable throughput.
With detection enabled (default):
SHA1 detectCollision 180.12 MiB/s
SHA1 detectCollision 181.59 MiB/s
SHA1 detectCollision 181.64 MiB/s
SHA1 detectCollision 182.24 MiB/s
sha1dc (native C) ~206.28 MiB/s
sha1dc (native C) ~204.47 MiB/s
sha1dc (native C) ~203.74 MiB/s
Average time across 100,000 calls to hash 4100 bytes (such as a commit
or tree) for the various algorithms available to JGit also shows SHA1
is slower than MessageDigest, but by an acceptable margin:
MessageDigest 17 usec
SHA1 18 usec
SHA1 detectCollision 22 usec
Time to index-pack for git.git (217982 objects, 69 MiB) has increased:
MessageDigest SHA1 w/ detectCollision
------------- -----------------------
20.12s 25.25s
19.87s 25.48s
20.04s 25.26s
avg 20.01s 25.33s +26%
Being implemented in Java with these additional safety checks is
clearly a penalty, but throughput is still acceptable given the
increased security against object name collisions.
[1] https://github.com/cr-marcstevens/sha1collisiondetection
[2] https://shattered.it/
[3] https://marc-stevens.nl/research/papers/C13-S.pdf
[4] https://public-inbox.org/git/20170223230621.43anex65ndoqbgnf@sigill.intra.peff.net/
Change-Id: I9fe4c6d8fc5e5a661af72cd3246c9e67b1b9fee6
Allow SHA1 instances to be reused to compute another hash value, and
resume caching them in ObjectInserter and PackParser. This shaves a
small amount of running time off parsing git.git's pack file:
before after
------ ------
25.25s 25.55s
25.48s 25.06s
25.26s 24.94s
Almost noise (small difference), but recycling the instances reduces
some stress on the memory allocator finding two 80 word message block
arrays needed for hashing and collision detection.
Change-Id: I4af88a720e81460293bc5c5d1d3db1a831e7e228
Generate names for objects using only the pure Java SHA1
implementation, but continue using MessageDigest in tests.
This opens the possibility of changing the hashing function
to incorporate additional safety measures, such as those
used in sha1dc[1].
Since MessageDigest has higher throughput, continue using
MessageDigest for computing pack, idx and DirCache trailers.
These are less likely to be sensitive to SHAttered[2] types
of attacks, as Git uses them to detect random bit flips
during transfer, and not for content identity.
[1] https://github.com/cr-marcstevens/sha1collisiondetection
[2] https://shattered.it/
Change-Id: If6da98334201f7f20cb916e46f782c45f373784e
This implementation is derived straight from the description written
in RFC 3174. On Mac OS X with Java 1.8.0_91 it offers similar
throughput as MessageDigest SHA-1:
system 239.75 MiB/s
system 244.71 MiB/s
system 245.00 MiB/s
system 244.92 MiB/s
sha1 234.08 MiB/s
sha1 244.50 MiB/s
sha1 242.99 MiB/s
sha1 241.73 MiB/s
This is the fastest implementation I could come up with. Common SHA-1
implementation tricks such as unrolling loops creates a method too
large for the JIT to effectively optimize, resulting in lower overall
hashing throughput. Using a preprocessor to perform the register
renaming of A-E also didn't help, as again the method was too large
for the JIT to effectively optimize.
Fortunately the fastest version is a naive, straight-forward
implementation very close to the description in RFC 3174.
Change-Id: I228b05c4a294ca2ad51386cf0e47978c68e1aa42
Enable and fix warnings about redundant specification of type arguments
Since the introduction of generic type parameter inference in Java 7,
it's not necessary to explicitly specify the type of generic parameters.
Enable the warning in Eclipse, and fix all occurrences.
Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Enable and fix 'Should be tagged with @Override' warning
Set missingOverrideAnnotation=warning in Eclipse compiler preferences
which enables the warning:
The method <method> of type <type> should be tagged with @Override
since it actually overrides a superclass method
Justification for this warning is described in:
http://stackoverflow.com/a/94411/381622
Enabling this causes in excess of 1000 warnings across the entire
code-base. They are very easy to fix automatically with Eclipse's
"Quick Fix" tool.
Fix all of them except 2 which cause compilation failure when the
project is built with mvn; add TODO comments on those for further
investigation.
Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Cover the case where the exception is wrapped up as a
cause, e.g., PackIndex#open(File).
Change-Id: I0df5b1e9c2ff886bdd84dee3658b6a50866699d1
Signed-off-by: Hongkai Liu <hongkai.liu@ericsson.com>
Don't rely on default locale when using toUpperCase() and toLowerCase()
Otherwise these methods may produce unexpected results if used for
strings that are intended to be interpreted locale independently.
Examples are programming language identifiers, protocol keys, and HTML
tags. For instance, "TITLE".toLowerCase() in a Turkish locale returns
"t\u0131tle", where '\u0131' is the LATIN SMALL LETTER DOTLESS I
character.
See
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toLowerCase--http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html
Bug: 511238
Change-Id: Id8d8f37d84d62239c918b81f8d883ed798d87656
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change StreamGobbler to Runnable to avoid unused Future
It can be considered a programming error to create a Future<T>
but do nothing with that object. There is an async computation
happening and without holding and checking the Future for done
or exception the caller has no idea if it has completed.
FS doesn't really care about these StreamGobblers finishing.
Instead use Runnable with execute(Runnable), which doesn't
return a Future.
Change-Id: I93b66d1f6c869e66be5c1169d8edafe781e601f6
dump HTTP: Avoid being confused by Content-Length of a gzipped stream
TransportHttp sets 'Accept-Encoding: gzip' to allow the server to
compress HTTP responses. When fetching a loose object over HTTP, it
uses the following code to read the response:
InputStream in = openInputStream(c);
int len = c.getContentLength();
return new FileStream(in, len);
If the content is gzipped, openInputStream decompresses it and produces
the correct content for the object. Unfortunately the Content-Length
header contains the length of the compressed stream instead of the
actual content length. Use a length of -1 instead since we don't know
the actual length.
Loose objects are already compressed, so the gzip encoding typically
produces a longer compressed payload. The value from the Content-Length
is too high, producing EOFException: Short read of block.
Change-Id: I8d5284dad608e3abd8217823da2b365e8cd998b0
Signed-off-by: Zhen Chen <czhen@google.com>
Helped-by: Jonathan Nieder <jrn@google.com>
Define MonotonicClock interface for advanced timestamps
MonotonicClock can be implemented to provide more certainity about
time than the standard System.currentTimeMillis() can provide. This
can be used by classes such as PersonIdent and Ketch to rely on
more certainity about time moving in a strictly ascending order.
Gerrit Code Review can also leverage this interface through its
embedding of JGit and use MonotonicClock and ProposedTimestamp to
provide stronger assurance that NoteDb time is moving forward.
Change-Id: I1a3cbd49a39b150a0d49b36d572da113ca83a786
Java 8 fixed the silent flush during close issue by
FilterOutputStream (base class of BufferedOutputStream)
using try-with-resources to close the stream, getting a
behavior matching what JGit's SafeBufferedOutputStream
was doing:
try {
flush();
} finally {
out.close();
}
With Java 8 as the minimum required version to run JGit
it is no longer necessary to override close() or have
this class. Deprecate the class, and use the JRE's version
of close.
Change-Id: Ic0584c140010278dbe4062df2e71be5df9a797b3
Because flush calls interrupt with writeLock held, it cannot interrupt
a write. Simplify by no longer defending against that.
Change-Id: Ib0b39b425335ff7b0ea1b1733562da5392576a15
StreamCopyThread#run consistently interrupts itself whenever it
discovers it has been interrupted by StreamCopyThread#flush while not
reading. The flushCount is not needed to avoid lost flushes.
All in-tree users of StreamCopyThread never flush. As a nice side
benefit, this avoids the expense of atomic operations that have no
purpose for those users.
Change-Id: I1afe415cd09a67f1891c3baf712a9003ad553062
Switch JSchSession to simple isolated OutputStream
Work around issues with JSch not handling interrupts by
isolating the JSch interactions onto another thread.
Run write and flush on a single threaded Executor using
simple Callable operations wrapping the method calls,
waiting on the future to determine the outcome before
allowing the caller to continue.
If any operation was interrupted the state of the stream
becomes fuzzy at close time. The implementation tries to
interrupt the pending write or flush, but this is very
likely to corrupt the stream object, so exceptions are
ignored during such a dirty close.
Change-Id: I42e3ba3d8c35a2e40aad340580037ebefbb99b53
StreamCopyThread: Do not drop data when flush is observed before writing
StreamCopyThread.flush was introduced in
61645b938b (Add timeouts to smart
transport clients, 2009-06-19) to support timeouts on write in JSch.
The commit message from that change explains:
JSch made a timeout on write difficult because they explicitly do
a catch for InterruptedException inside of their OutputStream. We
have to work around that by creating an additional thread that just
shuttles data between our own OutputStream and the real JSch stream.
The code that runs on that thread is structured as follows:
while (!done) {
int n = src.read(buf);
dst.write(buf, 0, n);
}
with src being a PipedInputStream representing the data to be written
to JSch. To add flush support, that change wanted to add an extra step
if (wantFlush)
dst.flush();
but to handle the case where the thread is blocked in the read() call
waiting for new input, it needs to interrupt the read. So that is how
it works: the caller runs
pipeOut.write(some data);
pipeOut.flush();
copyThread.flush();
to write some data and force it to flush by interrupting the read.
After the pipeOut.flush(), the StreamCopyThread reads the data that was
written and prepares to copy it out. If the copyThread.flush() call
interrupts the copyThread before it acquires writeLock and starts
writing, we throw away the data we just read to fulfill the flush.
Oops.
Noticed during the review of e67d59df3f
(StreamCopyThread: Do not let flush interrupt a write, 2016-11-04),
which introduced this bug.
Change-Id: I4aceb5610e1bfb251046097adf46bca54bc1d998
StreamCopyThread: Do not let flush interrupt a write
flush calls interrupt() to interrupt a pending read and trigger a
flush. Unfortunately that interrupt() call can also interrupt a
pending write, putting Jsch in a bad state and triggering "Short read
of block" errors. Add locking to ensure the flush only interrupts
reads as intended.
Change-Id: Ib105d9e107ae43549ced7e6da29c22ee41cde9d8
If there was a new flush() call during flush previous bytes, we need to
catch it in order to process the new bytes between the two flush()
calls instead of going to last catch IOException clause and end the
thread.
Change-Id: Ibc58a1fa97559238c13590aedbb85e482d85e465
Signed-off-by: Zhen Chen <czhen@google.com>
FS: Fix lazy initialization of non-volatile static field
The 'factory' field is lazy initialized in the detect() method.
According to FindBugs:
Because the compiler or processor may reorder instructions, threads
are not guaranteed to see a completely initialized object, if the
method can be called by multiple threads.
Fix this by declaring the member as 'volatile'.
Change-Id: Ib32663bb28c9564584256e01f625b4e7875e6223
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Don't log error if system git config does not exist
- enhance FS.readPipe to throw an exception if the external command
fails to enable the caller to handle the command failure
- reduce log level to warning if system git config does not exist
- improve log message
Bug: 476639
Change-Id: I94ae3caec22150dde81f1ea8e1e665df55290d42
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
AddCommandTest is flaky because IOException is thrown sometimes.
Caused by: java.io.IOException: Stream closed
at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
at java.io.OutputStream.write(OutputStream.java:116)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at org.eclipse.jgit.util.FS.runProcess(FS.java:993)
at org.eclipse.jgit.util.FS.execute(FS.java:1102)
at org.eclipse.jgit.treewalk.WorkingTreeIterator.filterClean(WorkingTreeIterator.java:470)
... 22 more
OpenJDK replaces the underlying OutputStream with NullOutputStream when
the process exits. This throws IOException for all write operation. When
it exits before JGit writes the input to the pipe buffer, the input
stays in BufferedOutputStream. The close method tries to write it again,
and IOException is thrown.
Since we ignore IOException in StreamGobbler, we also ignore it when
we close the stream.
Fixes Bug 499633.
Change-Id: I30c7ac78e05b00bd0152f697848f4d17d53efd17
Signed-off-by: Masaya Suzuki <draftcode@gmail.com>
[findBugs] Prevent potential NPE in FS_POSIX.readUmask()
BufferedReader.readLine() returns null if the end of the stream has been
reached.
Change-Id: I83102bbfb1316407247e0f29023077af9e8d9606
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
There was a bug in JGit which caused NPEs being thrown when Transport
errors should be reported. Avoid the NPE to let the original error show
up.
Change-Id: I9e1e2b0195bd61b7e531a09d0fc7bce109bd6515
TreeWalk provides the new method getEolStreamType. This new method can
be used with EolStreamTypeUtil in order to create a wrapped InputStream
or OutputStream when reading / writing files. The implementation
implements support for the git configuration options core.crlf, core.eol
and the .gitattributes "text", "eol" and "binary"
CQ: 10896
Bug: 486563
Change-Id: Ie4f6367afc2a6aec1de56faf95120fff0339a358
Signed-off-by: Ivan Motsch <ivan.motsch@bsiag.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Allow to reuse disableSslVerify method, move it to HttpSupport
The disableSslVerify method will be used in the follow up change.
Change-Id: Ie00b5e14244a9a036cbdef94768007f1c25aa8d3
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The Large File Storage extension specified by GitHub [1] uses SHA-256 to
compute the ID of large files stored by the extension. Hence implement a
SHA-256 abstraction similar to the SHA-1 abstraction used by JGit.
[1] https://git-lfs.github.com/
Bug: 470333
Change-Id: I3a95954543c8570d73929e55f4a884b55dbf1b7a
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This could have only happened during the getBytes call. Instead, use
Constants.encode, which is a non-throwing implementation.
This change is binary compatible with existing code compiled against
older versions of JGit, although it might break compilation of
previously compiling code due to dead catch blocks.
Change-Id: I191fec5cac718657407230de141440e86d0151fb
RevCommit: Better support invalid encoding headers
With this support we no longer need the 'utf-8' alias. UTF-8 will be
automatically tried when the encoding header is not recognized and used
if the character sequence cleanly decodes as UTF-8.
Modernize some of the references to use StandardCharsets.
Change-Id: I4c0c88750475560e1f2263180c4a98eb8febeca0