Jonathan Nieder [Thu, 14 May 2015 23:24:15 +0000 (16:24 -0700)]
Allow ObjectWalk to be filtered by an arbitrary predicate
This will make it possible to declare a collection of objects as
ineligible for the walk en masse, for example if they are known to be
uninteresting via a bitmap.
Change-Id: I637008b25bf9fb57df60ebb2133a70214930546a Signed-off-by: Jonathan Nieder <jrn@google.com>
Shawn Pearce [Sun, 10 May 2015 02:53:53 +0000 (19:53 -0700)]
Remove SoftReference from dfs.DeltaBaseCache
The Java GC doesn't always clear these before running out of memory
and failing allocations. In practice OpenJDK 7 is leaving these live,
removing any advantage of the SoftReference to attempt to shed memory
when the GC is unable to continue allocating.
Instead follow the pattern of the DfsBlockCache and use hard refs
to the object data. Require applications to configure the cache
size more accurately given expected memory usage.
Shawn Pearce [Sun, 10 May 2015 01:11:20 +0000 (18:11 -0700)]
Fix memory leak in dfs.DeltaBaseCase
The LRU chain management code was broken leading to situations where
the chain was incomplete. This prevented the cache from removing
items when it exceeded its memory target, causing a leak.
One case was repeated hit on the head of the chain. moveToHead(e)
was invoked linking the head back to itself in a cycle orphaning
the rest of the table.
Add some unit tests to cover this and a few other paths.
Matthias Sohn [Sun, 12 Apr 2015 23:01:13 +0000 (01:01 +0200)]
Use ANY_DIFF filter in ResolveMerger only for bare repositories
As Chris pointed out change I822721c76c64e614f87a080ced2457941f53adcd
slowed down merge since ANY_DIFF filter is much less efficient than the
manual detection of diffs done in ResolveMerger.processEntry() since it
avoids unnecessary filesystem calls using the git index. Hence only set
the ANY_DIFF filter on bare repositories which don't have a working tree
to scan.
To test performance I used the setup described in Chris' comment on
change I822721c76c64e614f87a080ced2457941f53adcd and modified
ResolveMerger.mergeTrees() to not add the working tree in order to
simulate merging in a bare repository.
At least on Mac I couldn't detect a speedup, with and without the
ANY_DIFF filter merge test takes an average 0.67sec.
Change-Id: I17b3a06f369cee009490f54ad1a2deb6c145c7cf Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Sun, 10 May 2015 19:19:12 +0000 (12:19 -0700)]
FS_POSIX: Rework umask detection to make it settable
Avoid always calling `sh -c umask` on startup, instead deferring
the invocation until the first time a working tree file needs to
use the execute bit. This allows servers using bare repos to avoid
a costly fork+exec for a value that is never used.
Store the umask as an int instead of two Boolean. This is slightly
smaller memory (one int vs. two references) and makes it easier for
an application to force setting the umask to a value that overrides
whatever the shell told JGit.
Simplify the code to bail by returning early when canExecute is
false, which is the common case for working tree files.
Shawn Pearce [Sun, 10 May 2015 17:42:09 +0000 (10:42 -0700)]
Expose disposeBody() on RevCommit and RevTag
Applications that use a commit message once and do not
need it again can free the body to save memory. Expose
the disposeBody() methods to support this and use it in
pgm.Log which only visits each commit once.
Shawn Pearce [Sat, 9 May 2015 05:26:28 +0000 (22:26 -0700)]
ObjectReader: remove the walkAdvice API
This was added a very long time ago to support the failed
DHT storage implementation. Since then no storage system
was able to make use of this API, but it pollutes internals
of the walkers.
Kill the API on ObjectReader and drop the invocations from
the walker code.
Previously using an ObjectWalk meant uninteresting commits may keep
their commit message buffers in memory just in case they were found to
be on the boundary and were output as UNINTERESTING for the caller.
This was incorrect inside StartGenerator. ObjectWalk hides these
internal UNINTERESTING cases from its caller unless RevSort.BOUNDARY
was explicitly set, and its false by default. Callers never see one
of these saved uninteresting commits.
Change the test to allow early dispose unless the application has
explicitly asked for RevSort.BOUNDARY. This allows uninteresting
commit buffers to be discarded and garbage collected in ObjectWalks
when the caller will never be given the RevCommit.
Shawn Pearce [Sat, 9 May 2015 17:47:13 +0000 (10:47 -0700)]
ObjectWalk: make setRetainBody(false) the default
Despite being the primary author of RevWalk and ObjectWalk I still
fail to remember to setRetainBody(false) in application code using
an ObjectWalk to examine the graph.
Document the default for RevWalk is setRetainBody(true), where the
application usually wants the commit bodies to display or inspect.
Change the default for ObjectWalk to setRetainBody(false), as nearly
all callers want only the graph shape and do not need the larger text
inside a commit body. This allows some code in JGit to be simplified.
FS_Win32: Avoid an IOException on Windows if bash is not in PATH
Change-Id: I3145f74ecee9f5b368e7f4b9fd7cb906f407eff5 Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Wed, 6 May 2015 23:02:27 +0000 (16:02 -0700)]
Skip logging stack trace on corrupt objects
Instead of dumping a full stack trace when a client sends an invalid
commit, record only a short line explaining the attempt:
Cannot receive Invalid commit c0ff33...: invalid author into /tmp/jgit.git
The text alone is sufficient to explain the problem and the stack
trace does not lend any additional useful information. ObjectChecker
is quite clear about its rejection cases.
Shawn Pearce [Wed, 6 May 2015 22:47:34 +0000 (15:47 -0700)]
Add repository name to failures in HTTP server log
If UploadPack or ReceivePack has an exception record an identifier
associated with the repository as part of the log message. This can
help the HTTP admin track down the offending repository and take
action to repair the root cause.
When DirCacheTree.contains() is called and 'aOff' is greater than 'aLen'
an ArrayIndexOutOfBoundsException was thrown. This fix makes
DirCacheTree.contains() more robust and allows parsing such index files
without throwing AIOOB.
I couldn't create a test case leading to this situation but I have seen
such situations while inspecting Bug: 465393. It seems that such
situations are created on Windows when there are invalid pathes in the
index. There may be a not yet known bug leading to such situations in
combination with invalid pathes.
Matthias Sohn [Mon, 4 May 2015 06:56:28 +0000 (08:56 +0200)]
Use CBI eclipse-jarsigner-plugin 1.1.2-SNAPSHOT
CBI recently fixed a couple of resource leaks which probably caused
jar signing failures on Hudson (bug 464947). JGit builds were also
affected. Hence use version 1.1.2-SNAPSHOT until a new release is
available.
Bug:464947
Change-Id: I7fb4a65f888194f7209c866cd58551891c89fb7a Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since git-core ff5effd (v1.7.12.1) the native wire protocol transmits
the server and client implementation and version strings using
capability "agent=git/1.7.12.1" or similar.
Support this in JGit and hang the implementation data off UploadPack
and ReceivePack. On HTTP transports default to the User-Agent HTTP
header until the client overrides this with the optional capability
string in the first line.
Extract the user agent string into a UserAgent class under transport
where it can be specified to a different value if the application's
build process has broken the Implementation-Version header in the
JGit package.
Add fsck.allowInvalidPersonIdent to accept invalid author/committers
A larger than expected number of real-world repositories found on
the Internet contain invalid author, committer and tagger lines
in their history. Many of these seem to be caused by users misusing
the user.name and user.email fields, e.g.:
[user]
name = Au Thor <author@example.com>
email = author@example.com
that some version of Git (or a reimplementation thereof) copied
directly into the object header. These headers are not valid and
are rejected by a strict fsck, making it impossible to transfer
the repository with JGit/EGit.
Another form is an invalid committer line with double negative for
the time zone, e.g.
Allow callers and users to weaken the fsck settings to accept these
sorts of breakages if they really want to work on a repo that has
broken history. Most routines will still function fine, however
commit timestamp sorting in RevWalk may become confused by a corrupt
committer line and sort commits out of order. This is mostly fine if
the corrupted chain is shorter than the slop window.
Hugo Arès [Tue, 7 Apr 2015 15:17:06 +0000 (11:17 -0400)]
Remove pack from list when file handle is stale
This error happens on nfs file system when you try to read a file that
was deleted or replaced.
When the error happens because the file was deleted, removing it from
the list is the proper way to handle the error, same use case as
FileNotFoundException. When the error happens because the file was
replaced, removing the file from the list will cause the file to be
re-read so it will get the latest version of the file.
Bug: 462868
Change-Id: I368af61a6cf73706601a3e4df4ef24f0aa0465c5 Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
Hugo Arès [Fri, 10 Apr 2015 13:08:07 +0000 (09:08 -0400)]
Lower log level to warn for handled pack errors
Pack not found and pack corrupted/invalid are handled by the code (pack
is removed from the list) so logging an error and the stacktrace is
misleading because it implies that there is an action to take to fix the
error.
Lower the log level to warn and remove the stacktrace for those 2 types
of errors and keep the error log statement for any other.
Change-Id: I2400fe5fec07ac6d6c244b852cce615663774e6e Signed-off-by: Hugo Arès <hugo.ares@ericsson.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Ostrovsky [Fri, 24 Apr 2015 16:55:31 +0000 (18:55 +0200)]
Revert "Let ObjectWalk.markUninteresting also mark the root tree as"
The Iff2de881 tried to fix missing tree ..." but introduced severe
performance degradation (>10x in some cases) when acting as server
(git push) and as client (replication). IOW cure is worse than the
disease.
Cached packs are only used when writing over the network or to
a bundle file and reuse validation is always disabled in these
two contexts. The client/consumer of the stream will be SHA-1
checksumming every object.
Reuse validation is most critical during local GC to avoid silently
ignoring corruption by stopping as soon as a problem is found and
leaving everything alone for the end-user to debug and salvage.
Cached packs are not supported during local GC as the bitmap rebuild
logic does not support including a cached pack in the result.
Strip out the validation and force PackWriter to always disable the
cached pack feature if reuseValidation is enabled.
Avoid storing large packs in block cache during reuse
When a large pack (> 30% of the block cache) is being reused by
copying it pollutes the block cache with noise by storing blocks
that are never referenced again.
Avoid this by streaming the file directly from its channel onto
the output stream.
Matthias Sohn [Wed, 22 Apr 2015 10:53:35 +0000 (12:53 +0200)]
Restore AwtCredentialsProvider to enable debugging pgm in Eclipse
In 6c1f7393882baf8464859136a70199ea96fcae0f the AWT based credentials
provider was dropped because we don't support Java 5 any longer so we
can always use the ConsoleCredentialsProvider which requires Java 6.
This broke debugging org.eclipse.jgit.pgm since Eclipse doesn't support
using a system console authenticator [1].
[1] see https://bugs.eclipse.org/bugs/show_bug.cgi?id=148831
David Pletcher [Thu, 16 Apr 2015 19:47:15 +0000 (12:47 -0700)]
Expose public getDepth method
The clone or fetch depth is a valuable bit of information
for access logging. Create a public getter to faciliate access.
A precondition check prevents unintentional misuse when the
data isn't valid yet.
Change-Id: I4603d5fd3bd4a767e3e2419b0f2da3664cfbd7f8 Signed-off-by: David Pletcher <dpletcher@google.com>
When a user tried to use a service not enabled in the remote server
a misleading error message was given:
fatal: remote error: Git access forbidden
This patch modifies the error message to make the cause clearer
to the user. Now, when the user tries to use a not enabled service,
the message error clearly states it: