DfsGarbageCollector: support disabling conversion to reftable
When a repository is initially created using only reftable but doesn't
yet have a GC pack, the garbage collector shouldn't scan the ref
database. Support disabling the reftable conversion path.
Remove dead warning about minUpdaeIndex and maxUpdateIndex affecting refresh
DfsGarbageCollector always performs refreshes. This warning was from
a prior iteration of the patch set and should have been removed before
the change was merged.
Change-Id: Ida7b9ddc991515ab233763f2cb985853c9143a3c Signed-off-by: David Pursehouse <david.pursehouse@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Thu, 10 Aug 2017 23:41:26 +0000 (16:41 -0700)]
dfs: write reftable from DfsGarbageCollector
If a ReftableConfig has been supplied by the caller, write out a
reftable as a sibling of the the GC pack, alongside the heads.
To bootstrap from a non-reftable system, the refs are read from the
DfsRefDatabase if no GC reftables are present. Its assumed the
references are fully current, and do not need to be merged with any
other reftables. Any non-GC reftables will be pruned at the end of
the GC cycle, just like any packs that were replaced.
If a GC reftable is present, all existing reftables are compacted, and
references from DfsRefDatabase are only used to seed the packer. Its
assumed these are consistent with each other.
Thomas Wolf [Mon, 4 Sep 2017 09:02:37 +0000 (11:02 +0200)]
Fix Daemon.stop() to actually stop the listener thread
ServerSocket.accept() is not interruptible: a thread busy in accept()
may not react to Thread.interrupt() and may not return from accept()
via an InterruptedException. Close the socket instead to make the
daemon's listener thread terminate.
* Close the listening socket to get the listening thread to exit
instead of interrupting it.
* Add a stopAndWait() method that stops the listening thread and
then waits until it has indeed finished.
* Set SO_REUSE_ADDRESS on the listening socket.
Bug: 376369
Change-Id: I9d6014103e6dcb0173daea134feb44dc52c5c69a Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Matthias Sohn [Sat, 2 Sep 2017 22:06:59 +0000 (00:06 +0200)]
Remove workaround for bug in Java's ReferenceQueue
Sun's Java 5, 6, 7 implementation had a bug [1] where a Reference can be
enqueued and dequeued twice on the same reference queue due to a race
condition within ReferenceQueue.enqueue(Reference).
This bug was fixed for Java 8 [2] hence remove the workaround.
Thomas Wolf [Tue, 29 Aug 2017 07:37:30 +0000 (09:37 +0200)]
Don't assume name = path in .gitmodules
While parsing .gitmodules, the name of the submodule subsection is
purely arbitrary: it frequently is the path of the submodule, but
there's no requirement for it to be. By building a map of paths to
the section name in .gitmodules, we can more accurately return
the submodule URL.
Bug: 508801
Change-Id: I8399ccada1834d4cc5d023344b97dcf8d5869b16 Also-by: Doug Kelly <dougk.ff7@gmail.com> Signed-off-by: Doug Kelly <dougk.ff7@gmail.com> Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch> Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Matthias Sohn [Wed, 30 Aug 2017 00:17:51 +0000 (02:17 +0200)]
Add org.apache.commons.codec 1.9.0 to target platform
This is needed to run tests in org.eclipse.jgit.http.test from Eclipse.
The change 7ac1bfc8 which added this dependency to
org.eclipse.jgit.http.test was already merged.
Restrict dependency to org.apache.commons.codec to the
version range [1.6.0,2.0.0).
CQ: 14048
Change-Id: I461a5f6bfc114757061d68992f9bc7ab38622328 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Thomas Wolf [Thu, 31 Aug 2017 13:48:10 +0000 (15:48 +0200)]
Fix some tests for running in bazel
Some tests call out to external cgit. Those tests all failed for me
locally on Mac. Turned out that the reason was that the system git
config used by the git in the bazel run contained paths with ~/ but
somehow $HOME was not set. As a result the external git returned
with exit code 128.
Fix this by passing along $HOME explicitly. Also improve assertions
to make sure we do get the stderr of the external command in the
test log.
I hadn't noticed that until now because apparently the maven build
does pass along $HOME.
Change-Id: I7069676d5cc7b23a71e79a4866fe8acab5a405f4 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Shawn Pearce [Wed, 30 Aug 2017 12:42:37 +0000 (08:42 -0400)]
Merge changes from topic 'fsck'
* changes:
DfsFsck: reduce memory usage during verifyIndex
DfsFsck: refactor pack verify into its own method
DfsFsck: run connectivity check pass exactly once
Shawn Pearce [Wed, 30 Aug 2017 01:40:03 +0000 (18:40 -0700)]
DfsFsck: reduce memory usage during verifyIndex
Don't convert a lot of ObjectId to String stored in generic
java.util.HashSet. This is a very expensive way to store objects.
Instead rely on "this" from the FsckPackParser to lookup information
about the objects in this pack file, which lets the verify code avoid
sorting the object list.
Use ObjectIdOwnerMap, which is the most efficient format JGit has
for storing lots of objects.
Shawn Pearce [Wed, 30 Aug 2017 01:14:51 +0000 (18:14 -0700)]
DfsFsck: run connectivity check pass exactly once
The simpler algorithm is to load all branch tips into an ObjectWalk
and run that walk exactly once. This avoids redoing work related to
parsing and considering trees reused across side branches.
Move the connectivity check into its own helper method. This moves it
left one level of identation, and makes it easier to fit the method's
logic with less line wrapping.
Add a "Counting objects..." progress monitor around this phase. Its
what is used when a server receives a push and is also trying to
verify the client sent all required objects.
Robin Stocker [Thu, 18 Jul 2013 17:08:29 +0000 (19:08 +0200)]
Fix compilation errors with args4j 2.0.23 and later
The multiValued attribute on @Option was removed. When the field is a
List, it's not actually needed (even with earlier versions of args4j),
see RmTest. In other cases, we have a custom handler, where it's also
not needed.
Matthias Sohn [Wed, 30 Aug 2017 01:07:18 +0000 (03:07 +0200)]
Partially revert c0ad77d8 "Enhance Eclipse save actions"
Do not automatically organize imports using a save action since this
seems to be buggy and removed some annotations org.eclipse.jgit.pgm
needs to use args4j.
Change-Id: I5a91292c3b9241ce2dde3e4ecce14ad460097129 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Tue, 29 Aug 2017 21:41:16 +0000 (23:41 +0200)]
Partially revert c0ad77d8 "Enhance Eclipse save actions"
Revert the following save actions which were introduced in c0ad77d8:
- always use braces around blocks
- remove unused imports
Other than I expected save actions are run globally on edited files -
and not only on edited code lines only.
Hence revert the save action "Convert control statement bodies to
blocks" which would affect a large number of code lines not affected by
the change editing some small part of a class. This would generate a
large number of changes which may lead to many unnecessary conflicts.
Total number of affected lines across jgit would be around 10k lines.
Also revert "Remove unused imports" since it erroneously removes imports
of some annotations needed by pgm classes using args4j.
Change-Id: I879a47f68e664129e6124cf25c1ae1f6a2d7a5aa Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Reftable storage in DFS is related to pack storage. Reftables are
stored in the same namespace, but with PackExt.REFTABLE. Include
the set of DfsReftable instances in the PackList and export some
helpers to access the tables.
dfs: support reading reftables through DfsBlockCache
DfsBlockCache directly shares its internal byte[] with ReftableReader,
avoding copying between the DfsBlockCache and the BlockReader
instances used by ReftableReader.
Matthias Sohn [Sat, 26 Aug 2017 20:19:30 +0000 (22:19 +0200)]
Enhance Eclipse save actions
Add the following Eclipse save actions executed when saving modified
lines. This should help to reduce manual work needed to maintain a clean
and consistent code style:
- organize imports
- always use braces around blocks
- add missing annotations
- @Override including implementation of interface methods
- @Deprecated
- remove
- unused imports
- unnecessary $NON-NLS$ tags
- redundant type arguments
Also add default values for new settings that were introduced in recent
Eclipse versions up to Neon since we updated save rules the last time.
Change-Id: Idc90b249df044d0552f04edf01a5f607c4846f50 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Masaya Suzuki [Thu, 10 Aug 2017 06:30:03 +0000 (23:30 -0700)]
Add BlobObjectChecker
Some repositories can have a policy that do not accept certain blobs. To
check if the incoming pack file contains such blobs, ObjectChecker can
be used. However, this ObjectChecker is not called by PackParser if the
blob is stored as a whole. This is because the object can be so large
that it doesn't fit in memory.
This change introduces BlobObjectChecker. This interface takes chunks of
a blob instead of the entire object. ObjectChecker can optionally return
a BlobObjectChecker. This won't change existing ObjectChecker
implementation; existing implementation continues to receive deltified
blob objects only.
Thomas Wolf [Sun, 27 Aug 2017 13:35:35 +0000 (15:35 +0200)]
FetchCommand: pass on CredentialsProvider to submodule fetches
When a JGit API command is implemented in terms of other API
commands, the child command must "inherit" all relevant settings.
Calling configure() ensures that the CredentialsProvider and the
connection timeout are propagated correctly.
Bug: 515325
Change-Id: I948e306693a9edb7b199a735877413b6eddcfba4 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Sun, 13 Aug 2017 13:08:16 +0000 (15:08 +0200)]
Exclude file matching: fix backtracking on match failures after **
** matching always tries the empty match first. If a mismatch occurs
later, the ** must be extended by exactly one segment and matching must
resume with the matcher following the ** matcher.
Bug: 520920
Change-Id: Id019ad1c773bd645ae92e398021952f8e961f45c Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Sat, 12 Aug 2017 19:51:52 +0000 (21:51 +0200)]
Fix path pattern matching to work also for gitattributes
Path pattern matching for attribute rules is different than matching
for excluded files.
The first difference concerns patterns without slashes. For
gitattributes those must match on the last component only, not on
any earlier segment. This is true also for directory-only patterns.
The second difference concerns directory-only patterns. Those also
must not match on a prefix or segment except the last one. They do
not apply recursively to all files beneath.
And third, matches only on a prefix must match for gitattributes
only if the last matcher was "/**".
Add a new parameter for such path matching to IMatcher.matches() and
pass it through as appropriate (false for gitignore, true for
gitattributes). As far as gitignore is concerned, there is no change.
New tests have been added, and some existing attribute matching tests
have been fixed since they operated on wrong assumptions.
Bug: 508568
Change-Id: Ie825dc2cac8a85a72a7eeb0abb888f3193d21dd2 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Mon, 14 Aug 2017 06:16:04 +0000 (08:16 +0200)]
Add new tests for gitignore/gitattribute pattern matching
These tests verify that JGit matches the same as C git, for
both attribute matching (.gitattributes) and file exclusion matching
(.gitignore). These tests work by setting up a test repository and
test rules, and then determine excluded files or attributes both with
JGit and with the native C git, and then compare the results.
For .gitignore tests, we run
git ls-files --ignored --exclude-standard -o
and for attribute tests we use
git check-attr --stdin --all
and pass the list of all files in the repository via stdin.
Change-Id: I5b40946e04ff4a97456be7dffe09374323b7c89d Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Tue, 22 Aug 2017 07:02:31 +0000 (09:02 +0200)]
Add a getter for a list of RefSpecs to Config
Reading RefSpecs from a Config can be seen as another typed value
conversion, so add a getter to Config and to TypedConfigGetter. Use
it in RemoteConfig.
Doing this allows clients of the JGit library to customize the
handling of invalid RefSpecs in git config files by installing a
custom TypedConfigGetter.
Bug: 517314
Change-Id: I0ebc0f073fabc85c2a693b43f5ba5962d8a795ff Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Tue, 15 Aug 2017 12:52:44 +0000 (14:52 +0200)]
Improve getting typed values from a Config
Make the handling of typed values somewhat configurable by using
a separate converter. The default converter is the same as before;
just the implementations of the getters were moved. They also still
raise IllegalArgumentException on invalid values as before.
The converter can be set globally via Config.setTypedConfigGetter(),
which EGit can use in its core Activator to plug in a variant that
catches the IllegalArgumentException, logs the problem, and then
returns the default value.
In this way the behavior for other users of the JGit library is
unchanged, while EGit can deal gracefully with invalid git configs.
Bug: 520978
Change-Id: Ie8f81d206e358b6cc57aa29b9d7ad2a5d34b86a1 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Sat, 10 Jun 2017 12:26:32 +0000 (14:26 +0200)]
Do most %-token substitutions in OpenSshConfig
Except for %p and %r and partially %C, we can do token substitutions
as defined by OpenSSH inside the config file parser. %p and %r can
be replaced only if specified in the config; if not, it would be the
caller's responsibility to replace them with values obtained from the
URI to connect to.
Jsch doesn't know about token substitutions at all. By doing the
replacements as good as we can in the config file parser, we can
make Jsch support most of these tokens.
%i is not handled at all as Java has no concept of a "user ID".
Includes unit tests.
Bug: 496170
Change-Id: If9d324090707de5d50c740b0d4455aefa8db46ee Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Wed, 7 Jun 2017 16:39:19 +0000 (18:39 +0200)]
Let Jsch know about ~/.ssh/config
Ensure the Jsch instance used knows about ~/.ssh/config. This
enables Jsch to honor more user configurations (see
com.jcraft.jsch.Session.applyConfig()), in particular also the
UserKnownHostsFile configuration, or additional identities given
via multiple IdentityFile entries.
Turn JGit's OpenSshConfig into a full parser that can be a
Jsch-compliant ConfigRepository. This avoids a few bugs
in Jsch's OpenSSHConfig and keeps the JGit-facing interface
unchanged. At the same time we can supply a JGit OpenSshConfig
instance as a ConfigRepository to Jsch. And since they'll both
work from the same object, we can also be sure that the parsing
behavior is identical.
The parser does not handle the "Match" and "Include" keys, and it
doesn't do %-token substitutions (yet).
Note that Jsch doesn't handle multi-valued UserKnownHostFile
entries as known by modern OpenSSH.[1]
Additional tests for new features are provided in OpenSshConfigTest.
Bug: 490939
Change-Id: Ic683bd412fa8c5632142aebba4a07fad4c64c637 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Masaya Suzuki [Fri, 25 Aug 2017 22:05:45 +0000 (15:05 -0700)]
Consume request body before flushing the buffer
This is continuation from https://git.eclipse.org/r/#/c/94249/. When an
error happens, we might not read the entire stream. Consume the request
body before we flush the buffer.
Thomas Wolf [Wed, 23 Aug 2017 09:50:05 +0000 (11:50 +0200)]
Cleanup: message reporting for HTTP redirect handling
The addition of "tooManyRedirects" in commit 7ac1bfc ("Do
authentication re-tries on HTTP POST") was an error I didn't
catch after rebasing that change. That message had been renamed
in the earlier commit e17bfc9 ("Add support to follow HTTP
redirects") to "redirectLimitExceeded".
Also make sure we always use the TransportException(URIish, ...)
constructor; it'll prefix the message given with the sanitized URI.
Change messages to remove the explicit mention of that URI inside the
message. Adapt tests that check the expected exception message text.
For the info logging of redirects, remove a potentially present
password component in the URI to avoid leaking it into the log.
Change-Id: I517112404757a9a947e92aaace743c6541dce6aa Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Fri, 16 Jun 2017 08:25:53 +0000 (10:25 +0200)]
Do authentication re-tries on HTTP POST
There is at least one git server out there (GOGS) that does
not require authentication on the initial GET for
info/refs?service=git-receive-pack but that _does_ require
authentication for the subsequent POST to actually do the push.
This occurs on GOGS with public repositories; for private
repositories it wants authentication up front.
Handle this behavior by adding 401 handling to our POST request.
Note that this is suboptimal; we'll re-send the push data at
least twice if an authentication failure on POST occurs. It
would be much better if the server required authentication
up-front in the GET request.
Added authentication unit tests (using BASIC auth) to the
SmartClientSmartServerTest:
- clone with authentication
- clone with authentication but lacking CredentialsProvider
- clone with authentication and wrong password
- clone with authentication after redirect
- clone with authentication only on POST, but not on GET
Also tested manually in the wild using repositories at try.gogs.io.
That server offers only BASIC auth, so the other paths
(DIGEST, NEGOTIATE, fall back from DIGEST to BASIC) are untested
and I have no way to test them.
* public repository: GET unauthenticated, POST authenticated
Also tested after clearing the credentials and then entering a
wrong password: correctly asks three times during the HTTP
POST for user name and password, then gives up.
* private repository: authentication already on GET; then gets
applied correctly initially to the POST request, which succeeds.
Also fix the authentication to use the credentials for the redirected
URI if redirects had occurred. We must not present the credentials
for the original URI in that case. Consider a malicious redirect A->B:
this would allow server B to harvest the user credentials for server
A. The unit test for authentication after a redirect also tests for
this.
Bug: 513043
Change-Id: I97ee5058569efa1545a6c6f6edfd2b357c40592a Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Sat, 19 Aug 2017 18:28:34 +0000 (11:28 -0700)]
reftable: explicitly store update_index per ref
Add an update_index to every reference in a reftable, storing the
exact transaction that last modified the reference. This is necessary
to fix some merge race conditions.
Consider updates at T1, T3 are present in two reftables. Compacting
these will create a table with range [T1,T3]. If T2 arrives during
or after the compaction its impossible for readers to know how to
merge the [T1,T3] table with the T2 table.
With an explicit update_index per reference, MergedReftable is able to
individually sort each reference, merging individual entries at T3
from [T1,T3] ahead of identically named entries appearing in T2.
David Pursehouse [Fri, 18 Aug 2017 21:05:41 +0000 (17:05 -0400)]
Merge changes Id3994e2d,I5e2a2868,I255af794
* changes:
LongObjectIdTest: Add back self comparison test
Format BUILD files with buildifier
Bazel: Add missing dependency in org.eclipse.jgit.http.test
resolve(Ref) helps callers recursively chase symbolic references and
is a useful function when wrapping a Reftable inside a RefDatabase, as
RefCursor does not resolve symbolic references during iteration.
Transactions may wish to merge several tables together as part of an
operation. Setting a byte limit allows the transaction to consider
only some recent tables, bounding the cost of the compaction.
MergedReftable combines multiple reference tables together in a stack,
allowing higher/later tables to shadow earlier/lower tables. This
forms the basis of a transaction system, where each transaction writes
a new reftable containing only the modified references, and readers
perform a merge on the fly to get the latest value.
Add additional test cases for looking up entries within a namespace
such as refs/heads/ or refs/tags/, where the seek is passed a name
that ends with '/'.
ReftableReader provides sequential scanning support over all
references, a range of references within a subtree (such as
"refs/heads/"), and lookup of a single reference. Reads can be
accelerated by an index block, if it was created by the writer.
The BlockSource interface provides an abstraction to read from the
reftable's backing storage, supporting a future commit to connect
to JGit DFS and the DfsBlockCache.
This is a simple writer to create reftable formatted files. Follow-up
commits will add support for reading from reftable, debugging
utilities, and tests.
Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:
- Near constant time lookup for any single reference, even when the
repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.
Thomas Wolf [Wed, 22 Apr 2015 15:05:12 +0000 (17:05 +0200)]
Add support to follow HTTP redirects
git-core follows HTTP redirects so JGit should also provide this.
Implement config setting http.followRedirects with possible values
"false" (= never), "true" (= always), and "initial" (only on GET, but
not on POST).[1]
We must do our own redirect handling and cannot rely on the support
that the underlying real connection may offer. At least the JDK's
HttpURLConnection has two features that get in the way:
* it does not allow cross-protocol redirects and thus fails on
http->https redirects (for instance, on Github).
* it translates a redirect after a POST to a GET unless the system
property "http.strictPostRedirect" is set to true. We don't want
to manipulate that system setting nor require it.
Additionally, git has its own rules about what redirects it accepts;[2]
for instance, it does not allow a redirect that adds query arguments.
We handle response codes 301, 302, 303, and 307 as per RFC 2616.[3]
On POST we do not handle 303, and we follow redirects only if
http.followRedirects == true.
Redirects are followed only a certain number of times. There are two
ways to control that limit:
* by default, the limit is given by the http.maxRedirects system
property that is also used by the JDK. If the system property is
not set, the default is 5. (This is much lower than the JDK default
of 20, but I don't see the value of following so many redirects.)
* this can be overwritten by a http.maxRedirects git config setting.
The JGit http.* git config settings are currently all global; JGit has
no support yet for URI-specific settings "http.<pattern>.name". Adding
support for that is well beyond the scope of this change.
Like git-core, we log every redirect attempt (LOG.info) so that users
may know about the redirection having occurred.
Extends the test framework to configure an AppServer with HTTPS support
so that we can test cloning via HTTPS and redirections involving HTTPS.
CQ: 13987
Bug: 465167
Change-Id: I86518cb76842f7d326b51f8715e3bbf8ada89859 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com> Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Thomas Wolf [Fri, 7 Jul 2017 09:26:02 +0000 (11:26 +0200)]
Send a detailed event on working tree modifications
Currently there is no way to determine the precise changes done
to the working tree by a JGit command. Only the CheckoutCommand
actually provides access to the lists of modified, deleted, and
to-be-deleted files, but those lists may be inaccurate (since they
are determined up-front before the working tree is modified) if
the actual checkout then fails halfway through. Moreover, other
JGit commands that modify the working tree do not offer any way to
figure out which files were changed.
This poses problems for EGit, which may need to refresh parts of the
Eclipse workspace when JGit has done java.io file operations.
Provide the foundations for better file change tracking: the working
tree is modified exclusively in DirCacheCheckout. Make it emit a new
type of RepositoryEvent that lists all files that were modified or
deleted, even if the checkout failed halfway through. We update the
'updated' and 'removed' lists determined up-front in case of file
system problems to reflect the actual state of changes made.
EGit thus can register a listener for these events and then knows
exactly which parts of the Eclipse workspace may need to be refreshed.
Two commands manage checking out individual DirCacheEntries themselves:
checkout specific paths, and applying a stash with untracked files.
Make those two also emit such a new WorkingTreeModifiedEvent.
Furthermore, merges may modify files, and clean, rm, and stash create
may delete files.
CQ: 13969
Bug: 500106
Change-Id: I7a100aee315791fa1201f43bbad61fbae60b35cb Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>