Jacek Centkowski [Thu, 24 Nov 2016 09:58:14 +0000 (10:58 +0100)]
Expose getObjectToTransfer method of FileLfsServlet
Providing one's own implementation of the doGet/doPut methods is
troublesome when this method is private.
Change-Id: I098cdc5cb90410eaaebc56c88c2d9e168584dd6d
Signed-off-by: Jacek Centkowski <geminica.programs@gmail.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Fix JGit's merge-base calculation in case of inconsistent commit times.
JGit was potentially failing to compute correct merge-bases when the
commit times were inconsistent (a parent commit was younger than a
child commit). The code in MergeBaseGenerator was aware of the fact that
sometimes the discovery of a merge base x can occur after the parents of
x have been seen (see comment in #carryOntoOne()). But in the light of
inconsistent commit times it was possible that these parents of a
merge-base have already been returned as a merge-base.
This commit fixes the bug by buffering all commits generated by
MergeBaseGenerator. It is expected that this buffer will be small
because the number of merge-bases will be small. Additionally, a new
flag is used to mark the ancestors of merge-bases, which allows the
unwanted commits to be filtered out.
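For reference, a minimal usage sketch of computing a merge base with
JGit's RevWalk, the API backed by MergeBaseGenerator (repo, commitA and
commitB are assumed to exist):
  try (RevWalk walk = new RevWalk(repo)) {
    walk.setRevFilter(RevFilter.MERGE_BASE);
    walk.markStart(walk.parseCommit(commitA));
    walk.markStart(walk.parseCommit(commitB));
    RevCommit base = walk.next(); // null if there is no common ancestor
  }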
David Pursehouse [Fri, 18 Nov 2016 01:30:27 +0000 (17:30 -0800)]
Enable error-prone for the Maven build
- update maven-compiler-plugin to 3.6.0
- exclude InsecureCipherFactory from errorprone checks
See
https://maven.apache.org/guides/mini/guide-default-execution-ids.html#Example:_Configuring_compile_to_run_twice
https://groups.google.com/d/topic/error-prone-discuss/pzT45ZMCQOc/discussion
Change-Id: Ic16d3f15cf2ef40de62fe6bfe4b8b35f0c1edc4e
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Zhen Chen [Tue, 22 Nov 2016 23:53:06 +0000 (15:53 -0800)]
dumb HTTP: Avoid being confused by Content-Length of a gzipped stream
TransportHttp sets 'Accept-Encoding: gzip' to allow the server to
compress HTTP responses. When fetching a loose object over HTTP, it
uses the following code to read the response:
  InputStream in = openInputStream(c);
  int len = c.getContentLength();
  return new FileStream(in, len);
If the content is gzipped, openInputStream decompresses it and produces
the correct content for the object. Unfortunately the Content-Length
header contains the length of the compressed stream instead of the
actual content length. Use a length of -1 instead since we don't know
the actual length.
Loose objects are already compressed, so the gzip encoding typically
produces a longer compressed payload. The value from the Content-Length
header is therefore too high, producing an EOFException: Short read of
block.
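A sketch of the fix as described above, using the same variables as the
snippet before:
  InputStream in = openInputStream(c);
  // The Content-Length header reflects the gzipped stream, not the
  // inflated object, so report the length as unknown.
  return new FileStream(in, -1);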
Change-Id: I8d5284dad608e3abd8217823da2b365e8cd998b0
Signed-off-by: Zhen Chen <czhen@google.com>
Helped-by: Jonathan Nieder <jrn@google.com>
Zhen Chen [Tue, 22 Nov 2016 01:15:15 +0000 (17:15 -0800)]
Fix content length in HttpClientConnection
Per the interface specification, the getContentLength method should
return -1 if content length is unknown or greater than
Integer.MAX_VALUE.
For chunked transfer encoding, the content length is not included in
the headers, so trying to parse the Content-Length header caused a
NullPointerException.
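A hedged sketch of the contract described above (the header lookup
method is assumed, not JGit's exact code):
  public int getContentLength() {
    String v = getHeaderField("Content-Length"); // absent for chunked
    if (v == null)
      return -1; // unknown length, per the interface specification
    long len = Long.parseLong(v);
    return len > Integer.MAX_VALUE ? -1 : (int) len;
  }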
Shawn Pearce [Wed, 16 Nov 2016 01:34:07 +0000 (17:34 -0800)]
Define MonotonicClock interface for advanced timestamps
MonotonicClock can be implemented to provide more certainty about
time than the standard System.currentTimeMillis() can provide. This
can be used by classes such as PersonIdent and Ketch to rely on
time moving in a strictly ascending order.
Gerrit Code Review can also leverage this interface through its
embedding of JGit and use MonotonicClock and ProposedTimestamp to
provide stronger assurance that NoteDb time is moving forward.
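The core idea, as a hedged sketch (not JGit's actual API): readings
never move backwards, even if the system clock is stepped:
  class StrictlyAscendingClock {
    private long last;

    synchronized long now() {
      long t = System.currentTimeMillis();
      if (t <= last)
        t = last + 1; // never report a time at or before the last one
      last = t;
      return t;
    }
  }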
Shawn Pearce [Sun, 13 Nov 2016 19:24:41 +0000 (11:24 -0800)]
Deprecate SafeBufferedOutputStream
Java 8 fixed the silent flush-during-close issue: FilterOutputStream
(the base class of BufferedOutputStream) now uses try-with-resources
to close the stream, matching the behavior of JGit's
SafeBufferedOutputStream:
  try {
    flush();
  } finally {
    out.close();
  }
With Java 8 as the minimum required version to run JGit
it is no longer necessary to override close() or have
this class. Deprecate the class, and use the JRE's version
of close.
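With that JRE fix, a plain BufferedOutputStream in try-with-resources
is enough (illustrative sketch; path and data are assumed):
  try (OutputStream out = new BufferedOutputStream(
      Files.newOutputStream(path))) {
    out.write(data); // close() now flushes before closing the target
  }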
Shawn Pearce [Mon, 14 Nov 2016 21:59:59 +0000 (13:59 -0800)]
Support {get,set}GitwebDescription on InMemoryRepository
This simplifies testing for Gerrit Code Review where
application code is updating the repository description
and the test harness uses InMemoryRepository.
Shawn Pearce [Mon, 14 Nov 2016 19:00:34 +0000 (11:00 -0800)]
Add {get,set}GitwebDescription to Repository
This method pair allows the caller to read and modify the description
file that is traditionally used by gitweb and cgit when rendering a
repository on the web.
Gerrit Code Review has offered this feature for years as part of
its GitRepositoryManager interface, but it's fundamentally a feature
of JGit and its Repository abstraction.
git-core typically initializes a repository with a default value
inside the description file. During getDescription() this string
is converted to null as it is never a useful description.
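Usage sketch (the method names come from the subject above):
  Repository repo = ...; // any Repository implementation
  repo.setGitwebDescription("My project"); // may throw IOException
  String desc = repo.getGitwebDescription(); // null if unset or default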
Shawn Pearce [Sun, 13 Nov 2016 19:52:12 +0000 (11:52 -0800)]
Extract insecure Cipher factory
Bazel runs ErrorProne by default and ErrorProne rightly complains that
allowing the user to specify any Cipher can lead to insecure code
(in particular, getCipher("AES") operates in ECB mode). Unfortunately
this is required to support existing repositories insecurely stored
on S3.
Extract the insecure factory code to its own class so this can be built
as a java_library() with this check disabled.
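For illustration, the JCE behavior ErrorProne warns about: passing a
bare algorithm name to javax.crypto.Cipher selects ECB mode.
  // "AES" expands to "AES/ECB/PKCS5Padding"; ECB leaks plaintext
  // patterns. An explicit mode such as GCM avoids that.
  Cipher insecure = Cipher.getInstance("AES");
  Cipher explicit = Cipher.getInstance("AES/GCM/NoPadding");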
Jonathan Nieder [Sun, 13 Nov 2016 21:16:48 +0000 (13:16 -0800)]
StreamCopyThread: Remove unnecessary flushCount
StreamCopyThread#run consistently interrupts itself whenever it
discovers it has been interrupted by StreamCopyThread#flush while not
reading. The flushCount is not needed to avoid lost flushes.
No in-tree user of StreamCopyThread ever flushes. As a nice side
benefit, this avoids the expense of atomic operations that have no
purpose for those users.
Hugo Arès [Wed, 9 Nov 2016 13:49:41 +0000 (08:49 -0500)]
Get rid of SoftReference in RepositoryCache
Now that RepositoryCache has a time-based eviction strategy, get rid
of the strategy to evict cache entries if heap memory is running low,
i.e. soft references. The main reason why time-based eviction was
implemented was to offer an alternative to the unpredictable soft
references.
Relying on soft references does not work well, especially with large
heaps. The JVM GC will collect soft references only as a last resort
before throwing an out-of-memory error. For example, for an application
like Gerrit configured with a 128GB heap, the GC will wait until all
128GB are filled before collecting the soft references, by which time
the application will already have been suffering long GC pauses. In
other words, you will have to restart the application because it has
become unusable before JVM eviction kicks in.
Keeping the SoftReference in RepositoryCache is causing more harm than
good. If you use the time-based eviction (which is the default strategy)
and want to tune the JVM to release soft references more aggressively,
it will release repositories from the cache even though they are not
expired, which defeats the purpose of the repository cache.
Gerrit uses the Lucene library, which uses soft references, and this
causes a "memory leak" unless you configure the JVM to release soft
references more aggressively, which has the nasty side effect of
evicting non-expired repositories from the cache.
Change-Id: I9940bd800464c7f007696d0ccde52ea617b2ebce
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
Shawn Pearce [Sun, 13 Nov 2016 01:06:39 +0000 (17:06 -0800)]
Switch JSchSession to simple isolated OutputStream
Work around issues with JSch not handling interrupts by
isolating the JSch interactions onto another thread.
Run write and flush on a single-threaded Executor using
simple Callable operations wrapping the method calls,
waiting on the future to determine the outcome before
allowing the caller to continue.
If any operation was interrupted the state of the stream
becomes fuzzy at close time. The implementation tries to
interrupt the pending write or flush, but this is very
likely to corrupt the stream object, so exceptions are
ignored during such a dirty close.
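A minimal sketch of this pattern (names are hypothetical, not JGit's
exact code):
  private final ExecutorService jschThread =
      Executors.newSingleThreadExecutor();

  void write(byte[] buf, int off, int len) throws IOException {
    Future<?> f = jschThread.submit(() -> {
      jschOut.write(buf, off, len); // isolated from caller interrupts
      return null;
    });
    try {
      f.get(); // the caller continues only once the outcome is known
    } catch (InterruptedException e) {
      f.cancel(true); // stream state is now fuzzy; dirty close later
      Thread.currentThread().interrupt();
      throw new InterruptedIOException();
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }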
Philipp Marx [Fri, 11 Nov 2016 09:43:09 +0000 (10:43 +0100)]
Check that DfsBlockCache#blockSize is a power of 2
In case a value is used which isn't a power of 2, there will be a high
chance of java.lang.ArrayIndexOutOfBoundsException and
org.eclipse.jgit.errors.CorruptObjectException due to a mismatching
assumption for the DfsBlockCache#blockSizeShift parameter.
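The standard bit trick for this check (a sketch; JGit's actual message
may differ):
  if (blockSize <= 0 || (blockSize & (blockSize - 1)) != 0)
    throw new IllegalArgumentException(
        "blockSize must be a power of 2: " + blockSize);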
Change-Id: Ib348b3704edf10b5f93a3ffab4fa6f09cbbae231
Signed-off-by: Philipp Marx <smigfu@googlemail.com>
Matthias Sohn [Mon, 7 Nov 2016 21:31:10 +0000 (22:31 +0100)]
Fix loop in auto gc
* GC.tooManyLooseObjects() always returned true since the loop failed
to advance the iterator, so the count kept incrementing until the
threshold was exceeded (see the sketch after this list).
* Also fix the loop exit criterion, which was off by 1.
* Add some tests.
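An assumed shape of the corrected count (looseObjects and threshold are
hypothetical; this is not JGit's exact code):
  Iterator<?> it = looseObjects.iterator();
  int n = 0;
  while (it.hasNext()) {
    it.next(); // the missing advance; without it hasNext() stayed true
    if (++n > threshold)
      return true;
  }
  return false;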
Change-Id: I70976dfaa026efbcf3c46bd45941f37277a18e04
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Jonathan Nieder [Fri, 4 Nov 2016 22:59:32 +0000 (15:59 -0700)]
StreamCopyThread: Do not drop data when flush is observed before writing
StreamCopyThread.flush was introduced in 61645b938bc934fda3b0624c5bac1e3495634750 (Add timeouts to smart
transport clients, 2009-06-19) to support timeouts on write in JSch.
The commit message from that change explains:
JSch made a timeout on write difficult because they explicitly do
a catch for InterruptedException inside of their OutputStream. We
have to work around that by creating an additional thread that just
shuttles data between our own OutputStream and the real JSch stream.
The code that runs on that thread is structured as follows:
  while (!done) {
    int n = src.read(buf);
    dst.write(buf, 0, n);
  }
with src being a PipedInputStream representing the data to be written
to JSch. To add flush support, that change wanted to add an extra step
  if (wantFlush)
    dst.flush();
but to handle the case where the thread is blocked in the read() call
waiting for new input, it needs to interrupt the read. So that is how
it works: the caller runs
  pipeOut.write(data);
  pipeOut.flush();
to write some data and force it to flush by interrupting the read.
After the pipeOut.flush(), the StreamCopyThread reads the data that was
written and prepares to copy it out. If the copyThread.flush() call
interrupts the copyThread before it acquires writeLock and starts
writing, we throw away the data we just read to fulfill the flush.
Oops.
Noticed during the review of e67d59df3fb8ae1efe94a54efce36784f7cc83e8
(StreamCopyThread: Do not let flush interrupt a write, 2016-11-04),
which introduced this bug.
Jonathan Nieder [Fri, 4 Nov 2016 18:48:38 +0000 (11:48 -0700)]
StreamCopyThread: Do not let flush interrupt a write
flush calls interrupt() to interrupt a pending read and trigger a
flush. Unfortunately that interrupt() call can also interrupt a
pending write, putting JSch in a bad state and triggering "Short read
of block" errors. Add locking to ensure the flush only interrupts
reads as intended.
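A hedged sketch of the locking idea, reusing the copy-loop names from
the previous entry (not JGit's exact code):
  private final Object writeLock = new Object();

  void copyLoop() throws IOException {
    while (!done) {
      int n = src.read(buf); // flush() may interrupt this read
      synchronized (writeLock) {
        dst.write(buf, 0, n); // an interrupt cannot land mid-write
      }
    }
  }

  public void flush() {
    synchronized (writeLock) {
      interrupt(); // only observed while the thread is reading
    }
  }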
Zhen Chen [Mon, 31 Oct 2016 21:27:36 +0000 (14:27 -0700)]
Fix flush call race condition in StreamCopyThread
If there was a new flush() call while previous bytes were still being
flushed, we need to catch it in order to process the new bytes between
the two flush() calls instead of falling through to the last catch
IOException clause and ending the thread.
Thomas Wolf [Mon, 24 Oct 2016 07:29:30 +0000 (09:29 +0200)]
Don't serialize internal hash collision chain link
ObjectId is serializable, and so are its subtypes. Ensure that
serialization does not follow the hash collision chain internal to the
ObjectIdOwnerMap, otherwise completely unrelated objects may get
serialized when a RevObject is serialized.
Note that serializing a RevCommit or RevTag may serialize quite a few
objects due to the parent/object links they contain. A user has no real
control over how many objects will be written when a RevCommit is
serialized; cf. [1]. This change does not resolve that, but in any case
this internal hash collision chain link should not participate in
serialization.
[1] https://github.com/gitblit/gitblit/pull/1141
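A sketch of the principle (the actual field layout in ObjectIdOwnerMap
may differ): marking the chain link transient keeps Java serialization
from traversing it.
  public abstract static class Entry extends ObjectId {
    // Collision chain used only inside ObjectIdOwnerMap; transient so
    // serialization does not drag in unrelated map entries.
    transient Entry next;
  }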
Change-Id: Ice331a9dc80a59ca360fcc04adaff8b5e750d847
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Matthias Sohn [Mon, 24 Oct 2016 11:26:58 +0000 (13:26 +0200)]
Speedup CleanFilter by transferring data in chunks of 8k
Transferring data byte by byte is slow: running add with CleanFilter on
a 2.9MB file takes 20 seconds. Using a buffer of 8k shrinks this time to
70ms.
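The classic chunked copy behind this speedup (illustrative; in and out
are assumed streams):
  byte[] buf = new byte[8192]; // 8k chunks instead of byte-per-byte
  int n;
  while ((n = in.read(buf)) != -1)
    out.write(buf, 0, n);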
Change-Id: I3bc2d8c11fe6cfaffcc99dc2a00643e01ac4e9cc
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Kevin Corcoran [Wed, 13 Apr 2016 22:55:08 +0000 (15:55 -0700)]
Make streamFileThreshold configurable
Previously, the streamFileThreshold, the threshold at which a file
would be streamed rather than loaded entirely into memory, was only
configurable on a global basis.
This commit makes this threshold configurable on a per-loader basis.
Bug: 490404
Change-Id: I492c18c3155dbf56eedda9044a61d76120fd75f9
Signed-off-by: Kevin Corcoran <kevin.corcoran@puppetlabs.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Matthias Sohn [Wed, 13 Apr 2016 12:54:09 +0000 (14:54 +0200)]
Implement auto gc
With the auto option, gc checks whether any housekeeping is required; if
not, it exits without performing any work. Some JGit commands run gc
--auto after performing operations that could create many loose objects.
Housekeeping is required if there are too many loose objects or too many
packs in the repository.
If the number of loose objects exceeds the value of the gc.auto option,
JGit's GC consolidates all existing packs into a single pack (equivalent
to -A option), whereas git-core would combine all loose objects into a
single pack using repack -d -l. Setting the value of gc.auto to 0
disables automatic packing of loose objects.
If the number of packs exceeds the value of gc.autoPackLimit, then
existing packs (except those marked with a .keep file) are consolidated
into a single pack by using the -A option of repack. Setting
gc.autoPackLimit to 0 disables automatic consolidation of packs.
Like git, the following JGit commands run auto gc:
- fetch
- merge
- rebase
- receive-pack
The auto gc for receive-pack can be suppressed by setting the config
option receive.autogc = false, as sketched below.
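For example, these options can be set through JGit's config API (a
sketch using the standard StoredConfig methods; repo is assumed):
  StoredConfig cfg = repo.getConfig();
  cfg.setInt("gc", null, "auto", 0); // disable loose-object packing
  cfg.setInt("gc", null, "autoPackLimit", 0); // no pack consolidation
  cfg.setBoolean("receive", null, "autogc", false); // no gc on receive
  cfg.save();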
Change-Id: I68a2a051b39ec2c53cb7c4b8f6c596ba65eeba5d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Ned Twigg [Fri, 18 Mar 2016 09:46:38 +0000 (02:46 -0700)]
Checkout: Add the ability to checkout all paths.
Change-Id: Ie1e59c566b63d0dfac231e44e7ebd7f3f08f3e9f
Signed-off-by: Ned Twigg <ned.twigg@diffplug.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Marc Strapetz [Mon, 22 Aug 2016 07:53:18 +0000 (09:53 +0200)]
Fix possible SIOOBE in RefDirectory.parsePackedRefs
This SIOOBE (StringIndexOutOfBoundsException) happens reproducibly when
trying to access a repository containing Cygwin symlinks.
Change-Id: I25f103fcc723bac7bfaaeee333a86f11627a92c7
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Remove the assumption that the local repository is a file-based one.
Change-Id: I8f10fe7a54e9fc07f2a23d7901e52b65aa570d45
Signed-off-by: Thomas Meyer <thomas.mey@web.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Turner [Mon, 26 Sep 2016 20:41:19 +0000 (16:41 -0400)]
TreeFormatter: disallow empty filenames in trees
Git barfs on these (and they don't make any sense), so we certainly
shouldn't write them.
Change-Id: I3faf8554a05f0fd147be2e63fbe55987d3f88099
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
David Pursehouse [Wed, 19 Oct 2016 03:54:46 +0000 (12:54 +0900)]
ArchiveTest: Don't use string concatenation in loop
According to FindBugs:
In each iteration, the String is converted to a StringBuffer/
StringBuilder, appended to, and converted back to a String. This
can lead to a cost quadratic in the number of iterations, as the
growing string is recopied in each iteration.
Replace string concatenation with StringBuffer.
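The replacement pattern, for illustration (entries is a hypothetical
loop source):
  StringBuffer sb = new StringBuffer();
  for (String entry : entries) // was: expected += entry + "\n";
    sb.append(entry).append('\n');
  String expected = sb.toString();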
Change-Id: I60e09f274bed6722f4e0e4d096b0f2b1b31ec1b4
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
David Turner [Fri, 14 Oct 2016 20:44:51 +0000 (16:44 -0400)]
Config: do not add spaces before units
Adding a space before the unit ('g', 'm', or 'k') causes git to fail with
the error:
fatal: bad numeric config value
Change-Id: I57f11d3a1cdcca4549858e773af1a2a80fc0369f
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
David Pursehouse [Tue, 18 Oct 2016 04:51:27 +0000 (13:51 +0900)]
FS: Fix lazy initialization of non-volatile static field
The 'factory' field is lazily initialized in the detect() method.
According to FindBugs:
Because the compiler or processor may reorder instructions, threads
are not guaranteed to see a completely initialized object, if the
method can be called by multiple threads.
Fix this by declaring the member as 'volatile'.
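The resulting pattern, sketched with hypothetical names:
  private static volatile Factory factory; // 'volatile' publishes the
                                           // fully initialized object

  public static FS detect() {
    if (factory == null)
      factory = new Factory(); // benign race: at worst built twice
    return factory.detect();
  }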
Change-Id: Ib32663bb28c9564584256e01f625b4e7875e6223
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Fix JGit CLI to follow native git's interpretation of http_proxy...
Native git (like many other tools) interprets the environment variables
http_proxy, HTTP_PROXY, ... in a specific way. "http_proxy" has to be
lowercase while "https_proxy" can be lowercase or uppercase (i.e.
"HTTPS_PROXY"). Lowercase takes precedence. This can be looked up in the
"ENVIRONMENT" section of [1]. Teach JGit CLI to behave similarly.
Additionally, teach JGit not to interpret the environment variables if
the java process was explicitly started with system properties telling
the JVM which proxy to use. A call like "http_proxy=proxy1 java
-Dhttp.proxyHost=proxy2 ..." should use proxy2 as the proxy.
Hugo Arès [Wed, 12 Oct 2016 10:54:52 +0000 (06:54 -0400)]
Fix eviction of repositories with negative usage count
If the repository close method was called twice (or more) for one open,
the usage count became negative and the repository was never evicted
from the cache because the method checking whether a repository is
expired was not considering negative usage counts.
Change-Id: I18a80c415c54c37d1b9def2b311ff2d0afa455ca
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
Matthias Sohn [Mon, 10 Oct 2016 22:34:42 +0000 (00:34 +0200)]
Merge branch 'stable-4.5'
* stable-4.5:
pgm: Fix misspelled key of an externalized string
Add missing online help for Ketch server type option in CLI daemon
Remove wrong junit dependencies in org.eclipse.jgit.pgm
Change-Id: I9cac024c0b488101b539c713b0f5ffc8c01b55bd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Thomas Wolf [Mon, 21 Dec 2015 18:43:56 +0000 (19:43 +0100)]
Fix symlink content comparison on MacOS in tree walk
Symlinks on MacOS are written as UTF-8 NFD, but
readSymbolicLink().toString() converts to NFC, with potentially fewer
bytes. This may occur in particular if the link target has non-ASCII
characters for which the NFC and NFD encodings differ, and may lead
to an EOFException: Short read of block.
This causes all kinds of weird effects in EGit, ranging from failing
rebases (which report the exception to the user) to EGit decorations in
the navigator silently disappearing (and never coming back).
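To see the size difference, compare the two normalization forms of an
accented character with java.text.Normalizer:
  String nfc = Normalizer.normalize("\u00e9", Normalizer.Form.NFC);
  String nfd = Normalizer.normalize("\u00e9", Normalizer.Form.NFD);
  // nfc.length() == 1 ("é"); nfd.length() == 2 ("e" + combining
  // accent), so byte-for-byte comparison of the two forms fails.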
* Rename readContentAsNormalizedString() to readSymlinkTarget() as it's
called only for symlinks. Also make it protected.
* Fix by allowing the read to succeed even if fewer than the expected
number of bytes are returned by the entry's input stream.
* Override in FileTreeIterator to use fs.readSymlink() directly.
Includes a new MacOS-only test.
Change-Id: I264c5972d67b1cbb1ed690580f5706e671b9affd
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>