Jonathan Nieder [Sat, 6 Aug 2016 00:36:08 +0000 (20:36 -0400)]
Merge changes I27961679,I91be6165,If0dbd562
* changes:
LfsProtocolServlet: Allow access to objects in request
LfsProtocolServlet: Allow getLargeFileRepository to raise exceptions
Remove references to org.eclipse.jgit.java7
Terry Parker [Thu, 4 Aug 2016 18:14:33 +0000 (11:14 -0700)]
Shallow fetch/clone: Make --depth mean the total history depth
cgit changed the --depth parameter to mean the total depth of history
rather than the depth of ancestors to be returned [1]. JGit still uses
the latter meaning, so update it to match cgit.
depth=0 still means a non-shallow clone. depth=1 now means only the
wants rather than the wants and their direct parents.
This is accomplished by changing the semantic meaning of "depth" in
UploadPack and PackWriter to mean the total depth of history desired,
while keeping "depth" in DepthWalk.{RevWalk,ObjectWalk} to mean
the depth of traversal. Thus UploadPack and PackWriter always
initialize their DepthWalks with "depth-1".
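The relationship between the two meanings of "depth" can be sketched on a linear history. This is a simplified model, not JGit code: the class and method names are hypothetical, and depth=0 (a non-shallow clone) is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of a linear history; not JGit code. */
final class DepthSemantics {
    /** Convert the total history depth requested by the client into a traversal depth. */
    static int toWalkDepth(int totalDepth) {
        return totalDepth - 1;
    }

    /**
     * Collect commits from a linear history (newest first), starting at the
     * "want" at index 0 and following at most walkDepth parent links.
     */
    static List<String> walk(List<String> newestFirst, int walkDepth) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < newestFirst.size() && i <= walkDepth; i++) {
            result.add(newestFirst.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> history = Arrays.asList("D", "C", "B", "A");
        System.out.println(walk(history, toWalkDepth(1))); // [D]
        System.out.println(walk(history, toWalkDepth(2))); // [D, C]
    }
}
```

With a total depth of 1 the traversal depth is 0, so only the want itself is returned.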
Terry Parker [Wed, 3 Aug 2016 16:01:22 +0000 (09:01 -0700)]
Shallow fetch: Respect "shallow" lines
When fetching from a shallow clone, the client sends "have" lines
to tell the server about objects it already has and "shallow" lines
to tell where its local history terminates. In some circumstances,
the server fails to honor the shallow lines and fails to return
objects that the client needs.
UploadPack passes the "have" lines to PackWriter so PackWriter can
omit them from the generated pack. UploadPack processes "shallow"
lines by calling RevWalk.assumeShallow() with the set of shallow
commits. RevWalk creates and caches RevCommits for these shallow
commits, clearing out their parents. That way, walks correctly
terminate at the shallow commits instead of assuming the client has
history going back behind them. UploadPack converts its RevWalk to an
ObjectWalk, maintaining the cached RevCommits, and passes it to
PackWriter.
Unfortunately, to support shallow fetches the PackWriter does the
following:
    if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk))
        walk = new DepthWalk.ObjectWalk(reader, depth);
That is, when the client sends a "deepen" line (fetch --depth=<n>)
and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter
throws away the RevWalk that was passed in and makes a new one. The
cleared parent lists prepared by RevWalk.assumeShallow() are lost.
Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk.
It tries to create it by calling toObjectWalkWithSameObjects() on
a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk
does not override the standard RevWalk#toObjectWalkWithSameObjects
implementation, the result is a plain ObjectWalk instead of an
instance of DepthWalk.ObjectWalk.
The result is that the "shallow" information is thrown away and
objects reachable from the shallow commits can be omitted from the
pack sent when fetching with --depth from a shallow clone.
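The missing override described above can be illustrated with stand-in classes. All names below are hypothetical and only mirror the shape of the problem: a subclass that inherits a conversion method gets the base result type back and loses its extra state.

```java
/** Hypothetical stand-ins for the walk classes; not the JGit types. */
class PlainWalk {
    PlainObjectWalk toObjectWalk() {
        return new PlainObjectWalk();
    }
}

class PlainObjectWalk extends PlainWalk {
}

class DepthLimitedObjectWalk extends PlainObjectWalk {
    final int depth;

    DepthLimitedObjectWalk(int depth) {
        this.depth = depth;
    }
}

/** Inherits toObjectWalk(), so the conversion drops the depth. */
class DepthLimitedWalkBroken extends PlainWalk {
    final int depth;

    DepthLimitedWalkBroken(int depth) {
        this.depth = depth;
    }
}

/** Overrides toObjectWalk() with a covariant return type that keeps the depth. */
class DepthLimitedWalkFixed extends PlainWalk {
    final int depth;

    DepthLimitedWalkFixed(int depth) {
        this.depth = depth;
    }

    @Override
    DepthLimitedObjectWalk toObjectWalk() {
        return new DepthLimitedObjectWalk(depth);
    }
}

final class ConversionDemo {
    /** True if converting the walk yields the depth-aware object walk. */
    static boolean keepsDepth(PlainWalk walk) {
        return walk.toObjectWalk() instanceof DepthLimitedObjectWalk;
    }

    public static void main(String[] args) {
        System.out.println(keepsDepth(new DepthLimitedWalkBroken(1)));
        System.out.println(keepsDepth(new DepthLimitedWalkFixed(1)));
    }
}
```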
Multiple factors collude to limit the circumstances under which this
bug can be observed:
1. Commits with depth != 0 don't enter DepthGenerator's pending queue.
That means a "have" cannot have any effect on DepthGenerator unless
it is also a "want".
2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the
uninteresting flag is not propagated to ancestors there even if a
"have" is also a "want".
3. JGit treats a depth of 1 as "1 past the wants".
Because of (2), the only place the UNINTERESTING flag can leak to a
shallow commit's parents is in the carryFlags() call from
markUninteresting(). carryFlags() only traverses commits that have
already been parsed: commits yet to be parsed are supposed to inherit
correct flags from their parent in PendingGenerator#next (which
doesn't happen here --- that is (2)). So the list of commits that have
already been parsed becomes relevant.
When we hit the markUninteresting() call, all "want"s, "have"s, and
commits to be unshallowed have been parsed. carryFlags() only
affects the parsed commits. If the "want" is a direct parent of a
"have", then carryFlags() marks it as uninteresting. If the "have"
was also a "shallow", then its parent pointer should have been null
and the "want" shouldn't have been marked, so we see the bug. If the
"want" is a more distant ancestor then (2) keeps the uninteresting
state from propagating to the "want" and we don't see the bug. If the
"shallow" is not also a "have" then the shallow commit isn't parsed,
so (2) keeps the uninteresting state from propagating to the "want"
and we don't see the bug.
Here is a reproduction case (time flowing left to right, arrows
pointing to parents). "C" must be a commit that the client
reports as a "have" during negotiation. That can only happen if the
server reports it as an existing branch or tag in the first round of
negotiation:
    A <-- B <-- C <-- D
First do
    git clone --depth 1 <repo>
which yields D as a "have" and C as a "shallow" commit. Then try
    git fetch --depth 1 <repo> B:refs/heads/B
Negotiation sets up: have D, shallow C, have C, want B.
But due to this bug B is marked as uninteresting and is not sent.
Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440 Signed-off-by: Terry Parker <tparker@google.com>
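The reproduction case can be approximated with a toy model of flag propagation. This is a sketch under stated assumptions, not JGit's implementation: the Commit class, the parsed flag, and carryFlags() below are hypothetical simplifications that only propagate the uninteresting flag to ancestors that are already parsed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of flag propagation; the classes here are hypothetical, not JGit types. */
final class ShallowFlagModel {
    static final class Commit {
        final String name;
        final List<Commit> parents = new ArrayList<>();
        boolean parsed;
        boolean uninteresting;

        Commit(String name, Commit... parents) {
            this.name = name;
            this.parents.addAll(Arrays.asList(parents));
        }
    }

    /** Push the uninteresting flag to ancestors that are already parsed. */
    static void carryFlags(Commit c) {
        for (Commit p : c.parents) {
            if (p.parsed && !p.uninteresting) {
                p.uninteresting = true;
                carryFlags(p);
            }
        }
    }

    /** Returns true if the "want" B would still be sent to the client. */
    static boolean wantIsSent(boolean honorShallow) {
        Commit a = new Commit("A");
        Commit b = new Commit("B", a);
        Commit c = new Commit("C", b);
        Commit d = new Commit("D", c);
        b.parsed = c.parsed = d.parsed = true; // want B, haves C and D
        if (honorShallow) {
            c.parents.clear(); // "shallow C": the client's history ends here
        }
        for (Commit have : Arrays.asList(d, c)) {
            have.uninteresting = true;
            carryFlags(have);
        }
        return !b.uninteresting;
    }

    public static void main(String[] args) {
        System.out.println("shallow honored: " + wantIsSent(true));
        System.out.println("shallow lost:    " + wantIsSent(false));
    }
}
```

When the cleared parent list of C is kept, the flag stops at C and B is sent. When it is lost, B inherits the flag from C and is omitted.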
David Pursehouse [Fri, 29 Jul 2016 03:37:48 +0000 (12:37 +0900)]
LfsProtocolServlet: Allow getLargeFileRepository to raise exceptions
According to the specification [1] the server may return the following
HTTP error responses:
- 403: The user has read, but not write access.
- 404: The repository does not exist for the user.
- 422: Validation error with one or more of the objects in the request.
In the current implementation, however, getLargeFileRepository can only
return null to indicate an error. This results in the error code:
- 503: Service Unavailable
being returned to the client regardless of what the actual reason was.
Add exception classes to cover these cases, derived from a common base
exception, and change the specification of getLargeFileRepository to throw
the base exception.
In LfsProtocolServlet#post, handle the new exceptions and send back the
appropriate HTTP responses as mentioned above.
The specification also mentions several other optional response codes (406,
429, 501, and 509) but these are not implemented in this commit. It should
be trivial to implement them in follow-up commits.
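A minimal sketch of the exception-to-status mapping follows. The exception class names, the RepositoryLookup interface, and statusFor() are hypothetical and are not taken from the JGit sources. Keeping null mapped to 503 is also an assumption of this sketch.

```java
/** Hypothetical base exception carrying the HTTP status to send. */
class LfsRequestException extends Exception {
    private final int status;

    LfsRequestException(int status, String message) {
        super(message);
        this.status = status;
    }

    int getStatus() {
        return status;
    }
}

/** 403: the user has read, but not write access. */
class LfsReadOnlyException extends LfsRequestException {
    LfsReadOnlyException(String message) {
        super(403, message);
    }
}

/** 404: the repository does not exist for the user. */
class LfsRepositoryNotFoundException extends LfsRequestException {
    LfsRepositoryNotFoundException(String message) {
        super(404, message);
    }
}

/** 422: validation error with one or more objects in the request. */
class LfsValidationException extends LfsRequestException {
    LfsValidationException(String message) {
        super(422, message);
    }
}

final class LfsStatusMapping {
    /** Hypothetical stand-in for a lookup that may now raise exceptions. */
    interface RepositoryLookup {
        Object open(String path) throws LfsRequestException;
    }

    /** Translate the lookup outcome into an HTTP status code. */
    static int statusFor(RepositoryLookup lookup, String path) {
        try {
            return lookup.open(path) == null ? 503 : 200;
        } catch (LfsRequestException e) {
            return e.getStatus();
        }
    }
}
```

Deriving the specific exceptions from one base type lets the servlet handle all of them in a single catch clause.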
Terry Parker [Wed, 3 Aug 2016 15:36:55 +0000 (08:36 -0700)]
RevWalk: Make fields available to DepthWalk
DepthWalk needs to override toObjectWalkWithSameObjects() and thus
needs to be able to directly set the objects and freeFlags fields, so
make them package private.
Change-Id: I24561b82c54ba3d6522582ca25105b204d777074 Signed-off-by: Terry Parker <tparker@google.com>
Terry Parker [Tue, 2 Aug 2016 15:53:06 +0000 (08:53 -0700)]
Shallow fetch: avoid sending unneeded blobs
When doing an incremental fetch from JGit, "have" commits are marked
as "uninteresting". In a non-shallow fetch, when the RevWalk hits an
"uninteresting" commit it marks the commit's corresponding tree as
uninteresting. That has the effect of dropping those trees and all the
trees and blobs they reference out of the thin pack returned to the
client.
However, shallow fetches use a DepthWalk to limit the RevWalk, which
nearly always causes the RevWalk to terminate before encountering the
"have" commits. As a result the pack created for the incremental fetch
never encounters "uninteresting" tree objects and thus includes
duplicate objects that it knows the client already has.
Change-Id: I7b1f7c3b0d83e04d34cd2fa676f1ad4fec904c05 Signed-off-by: Terry Parker <tparker@google.com>
David Pursehouse [Fri, 22 Jul 2016 08:21:20 +0000 (17:21 +0900)]
FileLfsServlet: Return HTTP 422 instead of 400
According to the specification [1], the error response status code
should be 422 when there is a validation error with one or more of
the objects in the request.
David Pursehouse [Fri, 22 Jul 2016 05:05:19 +0000 (14:05 +0900)]
Repository: Log negative useCnt message together with stack trace
The message "close() called when useCnt is already zero" is logged with
level warning, and then if debug logging is enabled, the stack trace is
logged separately with level debug.
Log the message and the stack trace in the same call, so that they always
appear together in the output rather than potentially interleaved with
other log statements.
Change-Id: I1b5c1557ddc2d19f3f5b29baec96e62bc467d88a Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
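Logging the message and the stack trace in one call can be sketched as follows. This uses java.util.logging purely for illustration. The logging framework, the log level, and the method name are assumptions and may differ from what JGit actually uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class UseCountLogging {
    /**
     * Emit a single log record that carries both the message and a stack
     * trace, so the two cannot be separated by other log statements.
     */
    static void warnNegativeUseCount(Logger log) {
        log.log(Level.WARNING,
                "close() called when useCnt is already zero",
                new IllegalStateException());
    }

    public static void main(String[] args) {
        warnNegativeUseCount(Logger.getLogger(UseCountLogging.class.getName()));
    }
}
```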
David Pursehouse [Tue, 26 Jul 2016 01:16:18 +0000 (10:16 +0900)]
MergeFormatter: Suppress warning about unchecked conversion
The warning can be fixed by adding a type to the argument, but doing so
breaks the API and previous attempts to fix it in that way [1, 2] were
reverted [3, 4].
Dan Wang [Fri, 3 Jun 2016 23:39:45 +0000 (16:39 -0700)]
Push implementation of option strings
Example usage:
    $ ./jgit push \
        --push-option "Reviewer=j.doe@example.org" \
        --push-option "<arbitrary string>" \
        origin HEAD:refs/for/master
Stefan Beller has also made an equivalent change to CGit:
http://thread.gmane.org/gmane.comp.version-control.git/299872
Change-Id: I6797e50681054dce3bd179e80b731aef5e200d77 Signed-off-by: Dan Wang <dwwang@google.com>
Dave Borowitz [Thu, 14 Jul 2016 16:11:51 +0000 (12:11 -0400)]
DfsObjectDatabase: Expose PackList and move markDirty there
What's invalidated when an object database is "dirty" is not the whole
database, but rather a specific list of packs. If there is a race
between getting the pack list and setting the volatile dirty flag
where the packs are rescanned, we don't need to mark the new pack list
as dirty.
This is a fine point that only really applies if the decision of
whether or not to mark dirty actually requires introspecting the pack
list (say, its timestamps). The general operation of "take whatever
is the current pack list and mark it dirty" may still be inherently
racy, but the cost is not so high.
This variable has been populated and never used since it was
introduced in commit 5cf53fdacf28d5cabe7ad1ed154fe7f4971225a9
(Speed up clone/fetch with large number of refs, 2013-02-18).
Noted by FindBugs:
"BatchRefUpdate.java:359, UC_USELESS_OBJECT, Priority: Normal"
Change-Id: I7aacb49540aaee4a83db3d38b15633bb6c4773d0 Signed-off-by: Dan Wang <dwwang@google.com>
Matthias Sohn [Thu, 14 Jul 2016 20:59:14 +0000 (22:59 +0200)]
Fix AppServer build errors in Eclipse with <4.6 target platforms
9aa3748 added dummy implementations for loadRoleInfo() and
loadUserInfo() to class MappedLoginService to fix compile errors in
Eclipse when using 4.6 target platform which brings Jetty 9.3 adding
these two methods. Unfortunately this causes errors when using non 4.6
target platform coming with an older Jetty version. Fix this by
extracting the anonymous subclass of MappedLoginService, which makes it
possible to suppress the unused private method errors in Eclipse.
Change-Id: I75baeea7ff4502ce9ef2b541b3c0555da5535d79 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Dave Borowitz [Thu, 14 Jul 2016 15:40:55 +0000 (11:40 -0400)]
Invalidate DfsObjDatabase pack list when refs are updated
Currently, there is a race where a user of a DfsRepository in a single
thread may get unexpected MissingObjectExceptions trying to look up an
object that appears as the current value of a ref:
1. Thread A scans packs before scanning refs, for example by reading
an object by SHA-1.
2. Thread B flushes an object and updates a ref to point to that
object.
3. Thread A looks up the ref updated in (2). Since it is scanning refs
for the first time, it sees the new object SHA-1.
4. Thread A tries to read the object it found in (3), using the cached
pack list it got from (1). The object appears missing.
Allow implementations to work around this by marking the object
database's current pack list as "dirty." A dirty pack list means that
DfsReader will rescan packs and try again if a requested object is
missing. Implementations should mark the pack list as dirty any time the ref
database reads or scans refs that might be newer than a previously
cached pack list.
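The rescan-on-miss behavior can be modeled in a few lines. This is a single-threaded sketch that replays the interleaving from steps 1 to 4. The class and its methods are hypothetical and do not correspond to the DFS classes in JGit.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a cached pack list with a dirty flag. */
final class DirtyPackListModel {
    private final Set<String> storage = new HashSet<>(); // objects actually stored
    private Set<String> cachedPackList;                  // snapshot seen by the reader
    private boolean dirty;

    /** Another writer flushes an object into storage. */
    void flush(String objectId) {
        storage.add(objectId);
    }

    /** Called when refs are read that may be newer than the cached list. */
    void markDirty() {
        dirty = true;
    }

    private void scan() {
        cachedPackList = new HashSet<>(storage);
        dirty = false;
    }

    /** Look up an object, rescanning once if the cached list is dirty. */
    boolean has(String objectId) {
        if (cachedPackList == null) {
            scan();
        }
        if (cachedPackList.contains(objectId)) {
            return true;
        }
        if (dirty) {
            scan();
            return cachedPackList.contains(objectId);
        }
        return false;
    }
}
```

Without the markDirty() call the reader keeps answering from its stale snapshot, which matches the missing-object symptom in step 4.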
Matthias Sohn [Tue, 12 Jul 2016 15:12:41 +0000 (17:12 +0200)]
Merge branch 'stable-4.4'
* stable-4.4:
Log if Repository.useCnt becomes negative
Time based eviction strategy for repository cache
Add method to read time unit from config
Align include.path max depth with native git
Config load should not fail on unsupported or nonexistent include path
Allow using JDK 7 bootclasspath when compiling JGit using Java 8
Extract work queue to allow reusing it
Change-Id: I6aeedb1cb8b0c3068af344a719c80a03ae68fc23 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When Repository.close() decrements useCnt to 0, the cache currently
evicts the repository from WindowCache and RepositoryCache immediately.
This leads to I/O overhead on busy repositories because pack files and
references are frequently inserted into and deleted from the cache.
This commit defers the eviction of a repository from the caches until
the last use of the repository is older than the time to live. The
eviction is handled by a background task running periodically.
Add two new configuration parameters:
* core.repositoryCacheExpireAfter: cache entries are evicted if they
have not been accessed for longer than this time in milliseconds
* core.repositoryCacheCleanupDelay: defines the interval in milliseconds
for running a background task evicting expired cache entries. If set to
-1 the delay is set to min(repositoryCacheExpireAfter, 10 minutes). If
set to 0 the time based eviction is switched off and no background task
is started. If time based eviction is switched off the JVM can still
evict cache entries if heap memory is running low.
Change-Id: I4a0214ad8b4a193985dda6a0ade63b70bdb948d7 Also-by: Matthias Sohn <matthias.sohn@sap.com> Also-by: Hugo Arès <hugo.ares@ericsson.com> Also-by: Sasa Zivkov <sasa.zivkov@sap.com>
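The rules for the two repository cache parameters above can be written down as a small helper. This is a sketch of the described semantics only. The class and method names are hypothetical, and a return value of 0 is used here to mean "no background task".

```java
import java.util.concurrent.TimeUnit;

final class CacheCleanupDelay {
    static final long TEN_MINUTES_MS = TimeUnit.MINUTES.toMillis(10);

    /**
     * Resolve core.repositoryCacheCleanupDelay:
     * -1 selects min(expireAfter, 10 minutes), 0 disables time based
     * eviction, any other value is used as the interval in milliseconds.
     */
    static long effectiveDelayMillis(long expireAfterMs, long cleanupDelayMs) {
        if (cleanupDelayMs == -1) {
            return Math.min(expireAfterMs, TEN_MINUTES_MS);
        }
        return cleanupDelayMs;
    }

    /** An entry expires once it has been idle for longer than expireAfterMs. */
    static boolean isExpired(long lastAccessMs, long nowMs, long expireAfterMs) {
        return nowMs - lastAccessMs > expireAfterMs;
    }
}
```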
This functionality is implemented in Gerrit ConfigUtil class. Add it to
JGit so it can eventually be removed from Gerrit.
Change-Id: I2d6564ff656b6ab9424a9360624061c94fd5f413 Signed-off-by: Hugo Arès <hugo.ares@ericsson.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Sun, 10 Jul 2016 22:27:56 +0000 (00:27 +0200)]
Implement new abstract MappedLoginService methods added in Jetty 9.3
Eclipse Neon comes with Jetty 9.3 which is causing unimplemented
abstract method errors in test class AppServer when using the JGit or
EGit Neon target platform. Fix this by adding dummy implementations.
Change-Id: Ie49107d814a846997de95f149e91fe1ec2fbe4d8 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Jonathan Nieder [Thu, 7 Jul 2016 23:08:02 +0000 (16:08 -0700)]
DfsGarbageCollector: avoid closing idx and bitmap streams twice
These try-with-resources blocks close the underlying output stream
twice: once when closing the CountingOutputStream wrapper, then again
when closing the DfsOutputStream out.
Simplify by only closing the CountingOutputStream.
In practice this shouldn't matter because the close() method of a
Closeable is required to be idempotent, but avoiding the redundant
extra close makes the code simpler to read and understand.
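The double close is easy to observe with standard streams. In this sketch FilterOutputStream stands in for the CountingOutputStream wrapper and a small counting subclass of ByteArrayOutputStream stands in for the DfsOutputStream. Both substitutions are assumptions made to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class DoubleCloseDemo {
    /** Underlying stream that counts how often close() is invoked. */
    static final class CountingCloseStream extends ByteArrayOutputStream {
        int closeCalls;

        @Override
        public void close() throws IOException {
            closeCalls++;
            super.close();
        }
    }

    /** Declares both streams as resources: the underlying stream is closed twice. */
    static int closeBoth() {
        CountingCloseStream out = new CountingCloseStream();
        try (OutputStream raw = out;
                OutputStream wrapper = new FilterOutputStream(raw)) {
            wrapper.write(1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.closeCalls;
    }

    /** Declares only the wrapper: the underlying stream is closed once. */
    static int closeWrapperOnly() {
        CountingCloseStream out = new CountingCloseStream();
        try (OutputStream wrapper = new FilterOutputStream(out)) {
            wrapper.write(1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.closeCalls;
    }
}
```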
ReceivePack: report protocol parsing failures on channel 3
If the client sent a well-formed enough request to see it wants to use
side-band-64k for status reporting (meaning it's a modern client), but
any other command record was somehow invalid (e.g. corrupt SHA-1)
report the parsing exception using channel 3. This allows clients to
see the failure and know the server will not be continuing.
git-core and JGit clients send all commands and then start a sideband
demux before sending the pack. By consuming all commands first we get
the client into a state where it can see and respond to the channel 3
server failure.
This behavior is useful on HTTPS connections when the client is buggy
and sent a corrupt command, but still managed to request side-band-64k
in the first line.
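For orientation, a side-band packet is a pkt-line whose first payload byte names the channel: 1 for pack data, 2 for progress, 3 for a fatal error. The encoder below is a minimal sketch of that framing. The class and method names are hypothetical and this is not the ReceivePack code.

```java
import java.nio.charset.StandardCharsets;

final class SideBandPacket {
    static final int CH_DATA = 1;
    static final int CH_PROGRESS = 2;
    static final int CH_ERROR = 3;

    /** Largest pkt-line, including the 4 byte header, under side-band-64k. */
    static final int MAX_PACKET = 65520;

    /** Frame a message as: 4 hex digit length, channel byte, payload. */
    static byte[] encode(int channel, String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        int length = 4 + 1 + payload.length; // header + channel + payload
        if (length > MAX_PACKET) {
            throw new IllegalArgumentException("message too long for one packet");
        }
        byte[] header = String.format("%04x", length)
                .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[length];
        System.arraycopy(header, 0, out, 0, 4);
        out[4] = (byte) channel;
        System.arraycopy(payload, 0, out, 5, payload.length);
        return out;
    }
}
```

A client that has already started its side-band demultiplexer reads the channel byte and treats anything on channel 3 as a server failure.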
ReceivePack: catch InvalidObjectIdException while parsing shallow
The "shallow $id" parsing can also throw InvalidObjectIdException,
just like parseCommand. Move it into its own method with a proper
try-catch block to convert to the checked PackProtocolException.
ReceivePack: enable capabilities immediately on first line
Instead of deferring until after command parsing, enable the
capabilities after the first pkt-line has been read from the client.
This allows the server to setup the side-band-64k channel immediately.
push: Report fatal server errors during pack writing
If the push client has requested side-band support the server can
signal a fatal error parsing the pack using the error channel (3)
and then hang up. This may cause the PackWriter to fail to write
data onto the network socket, which throws a misleading error back
up to the application and the user.
During a write failure poll the input to see if the side band system
can parse out an error message off channel 3. This should be fast as
there will either be an error present in the buffer, or the remote will
also have hung up on the side band channel. In the case of a hang-up
just rethrow the original IOException as it's a network error.
This roughly matches what C git does; once commands are sent and the
packer is started a new thread runs in the background to decode any
possible server error during unpacking on the remote peer.
ReceivePack: Catch InvalidObjectIdException instead of IAE
The more specific type InvalidObjectIdException is thrown by
ObjectId.fromString(). Use it here in ReceivePack as the more
generic IAE is never thrown by the body of the try-catch block.
A RefAdvertiser writing to the network includes both the reference's
ObjectId and its peeled ObjectId in the advertised set. In smart HTTP
negotiation requests may bypass the RefAdvertiser and quickly build
the set based on current refs; include the peeled ObjectIds to match
behavior with the normal bidirectional protocols on git:// and SSH.
This field was being set twice within the block. Setting it just once
is sufficient. writeString() does not examine the field so it is fine
to set it after the call.
Shawn Pearce [Mon, 27 Jun 2016 15:52:00 +0000 (11:52 -0400)]
Merge changes from topic 'dfs-gc'
* changes:
Prune UNREACHABLE_GARBAGE packs when they expire
Use try-with-resources in DfsGarbageCollector.writePack
Fix lastModified to be consistent in DfsGarbageCollector
Add GC_REST PackSource to better order DFS packs
Mike Williams [Fri, 17 Jun 2016 15:34:36 +0000 (11:34 -0400)]
Prune UNREACHABLE_GARBAGE packs when they expire
DfsGarbageCollector will now enforce a maximum time to live (TTL) for
UNREACHABLE_GARBAGE packs. The default TTL is 1 day, which should be
enough time to avoid races with other processes that are inserting
data into the repository.
Change-Id: Id719e6e2a03cfc9a0c0aef8ed71d261dda14bd0c Signed-off-by: Mike Williams <miwilliams@google.com>
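The expiry rule for garbage packs amounts to a timestamp comparison. The helper below is a sketch with hypothetical names. Only the 1 day default comes from the description above.

```java
import java.util.concurrent.TimeUnit;

final class GarbagePackTtl {
    /** Default time to live for UNREACHABLE_GARBAGE packs: 1 day. */
    static final long DEFAULT_TTL_MS = TimeUnit.DAYS.toMillis(1);

    /** A garbage pack may be pruned once it is at least ttlMs old. */
    static boolean isExpired(long packLastModifiedMs, long nowMs, long ttlMs) {
        return nowMs - packLastModifiedMs >= ttlMs;
    }
}
```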
Hugo Arès [Wed, 15 Jun 2016 18:40:05 +0000 (14:40 -0400)]
Config load should not fail on unsupported or nonexistent include path
1f86350 added initial support for include.path. Relative path and path
with tilde are not yet supported but config load was failing if one of
those 2 unsupported options was encountered. Another problem was that
config load was failing if the include.path file did not exist.
Change the behavior to be consistent with native git. Ignore unsupported
or nonexistent include.path.
Bug: 495505
Bug: 496732
Change-Id: I7285d0e7abb6389ba6983e9c46021bea4344af68 Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
Shawn Pearce [Sun, 26 Jun 2016 18:18:59 +0000 (11:18 -0700)]
Fix lastModified to be consistent in DfsGarbageCollector
Set all packs written by the DfsGarbageCollector to use the same
starting timestamp as lastModified. This makes it easier to see
which packs came from the same DfsGarbageCollector run, as they
share the same timestamp.
This has provided decent performance for object lookups. Starting
from an arbitrary reference may find the content in a newer pack
created by DfsObjectInserter or a ReceivePack server. Compaction of
recent packs also contains newer content, and then most interesting
data is in the "main" GC pack. As the GC pack is self-contained (has
no edges leading outside) readers typically do not need to go further.
Adding a new GC_REST PackSource allows the DfsGarbageCollector to
identify to the pack ordering code which pack is which, so the
non-heads are scanned second during reads. This removes a hack that
was unique to Google's implementation that enforced this behavior by
fixing up the lastModified timestamp.
Renumber the PackSource's categories to reflect this search ordering.
Fix TreeWalk to reset attributes cache for each entry
TreeWalk has a member 'attr' which caches the attributes for the current
entry. The cache was not always reset when moving to the next entry. The
effect was that when there were no attributes for an entry 'a' but 'a'
was skipped by a TreeWalk filter, TreeWalk stopped looking for
attributes until TreeWalk.next() was called again.
Fix DirCacheCheckout to return CheckoutConflictException
Problem occurs when the checkout wants to create a file 'd/f' but
the workingtree contains a dirty file 'd'. In order to create d/f the
file 'd' would have to be deleted and since the file is dirty that
content would be lost. This should lead to a CheckoutConflictException
for d/f when failOnConflict was set to true.
This fix also changes jgit checkout semantics to be more like native
git's checkout semantics. If during a checkout jgit wants to delete a
folder but finds that the working tree contains a dirty file at this
path then JGit will now throw an exception instead of silently keeping
the dirty file. Like in this example:
    git init
    touch b
    git add b
    git commit -m addB
    mkdir a
    touch a/c
    git add a/c
    git commit -m addAC
    rm -fr a
    touch a
    git checkout HEAD~
Matthias Sohn [Sun, 19 Jun 2016 10:45:37 +0000 (12:45 +0200)]
Allow using JDK 7 bootclasspath when compiling JGit using Java 8
When compiling jgit using Java 8, set the system property JDK_HOME to
the JAVA_HOME path of a JDK 7 installation to compile against the JDK 7
class libraries. Otherwise jgit may hit runtime exceptions when running on
Java 7 (e.g. return type of ConcurrentHashMap.keySet() in JDK 8 class
library doesn't exist in JDK 7).
Shawn Pearce [Sun, 19 Jun 2016 04:40:35 +0000 (21:40 -0700)]
Optimize RefAdvertiser for wire protocol
The native wire protocol sends ref advertisements in the pkt-line
format, which requires encoding the ObjectId and ref name onto a byte
sequence. Busy servers show this is a very high source of garbage,
which pushes the garbage collector harder when there are many refs in
the repository (e.g. 70k, in a Gerrit managed repository).
Optimize the side band advertiser by retaining the CharsetEncoder,
minimizing the amount of temporary garbage built during encoding.
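Retaining a CharsetEncoder and its output buffer across lines can be sketched as follows. This is not the RefAdvertiser code: the class name, the buffer size, and the helper encodeAll() are assumptions chosen to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class ReusableLineEncoder {
    // Created once and reused for every line to avoid per-line allocations.
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final ByteBuffer buffer = ByteBuffer.allocate(4096);

    /** Encode one line into the shared buffer and write it to out. */
    void writeLine(CharSequence line, OutputStream out) throws IOException {
        encoder.reset();
        buffer.clear();
        CoderResult r = encoder.encode(CharBuffer.wrap(line), buffer, true);
        if (r.isOverflow()) {
            throw new IOException("line too long for buffer");
        }
        if (r.isError()) {
            r.throwException();
        }
        if (encoder.flush(buffer).isOverflow()) {
            throw new IOException("line too long for buffer");
        }
        out.write(buffer.array(), 0, buffer.position());
    }

    /** Convenience wrapper for demos: encode several lines with one encoder. */
    static byte[] encodeAll(String... lines) {
        ReusableLineEncoder enc = new ReusableLineEncoder();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (String line : lines) {
                enc.writeLine(line, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The same encoder and buffer serve every line, so the number of temporary objects no longer grows with the number of refs being advertised.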
Marco Miller [Wed, 25 May 2016 20:45:13 +0000 (16:45 -0400)]
RefDirectory: remove ref lock file for following ref dir removal
Before this fix, ref directory removal did not work. That was because
the ref lock file was still in the leaf directory at deletion time.
Hence no deep ref directories were ever deleted, which hurt
performance for repositories with large ref directory structures.
This fix removes the ref lock file before attempting to delete the ref
directory (which includes it). The other deep parent directories are
therefore now successfully deleted in turn, since leaf's content
(lock file) gets removed first.
So, given a structure such as refs/any/directory[/**], this fix now
deletes all empty directories up to -and including- 'directory'. The
'any' directory (e.g.) does not get deleted even if empty, as before.
The ref lock file is still also removed in the calling block's finally
clause, just in case, as before. Such double-unlock brought by this
fix is harmless (a no-op).
A new (private) RefDirectory#delete method is introduced to support
this #pack-specific case; other RefDirectory#delete callers remain
untouched.
Change-Id: I47ba1eeb9bcf0cb93d2ed105d84fea2dac756a5a Signed-off-by: Marco Miller <marco.miller@ericsson.com>
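The order of operations (remove the lock file first, then prune empty directories upward) can be sketched with java.nio.file. The class, the method names, and the stopAt parameter are hypothetical. The real RefDirectory logic may differ in detail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;

final class EmptyRefDirCleanup {
    /**
     * Delete the lock file, then remove empty directories from its parent
     * upward, stopping before stopAt or at the first non-empty directory.
     */
    static void unlockAndPrune(Path lockFile, Path stopAt) throws IOException {
        Files.deleteIfExists(lockFile);
        Path dir = lockFile.getParent();
        while (dir != null && !dir.equals(stopAt) && dir.startsWith(stopAt)) {
            try {
                Files.delete(dir);
            } catch (DirectoryNotEmptyException e) {
                return; // still in use by other refs
            }
            dir = dir.getParent();
        }
    }

    /** Builds refs/any/directory/deep/ref.lock in a temp dir and prunes it. */
    static boolean runDemo() {
        try {
            Path root = Files.createTempDirectory("refs-demo");
            Path any = root.resolve("refs").resolve("any");
            Path leaf = any.resolve("directory").resolve("deep");
            Files.createDirectories(leaf);
            Path lock = Files.createFile(leaf.resolve("ref.lock"));
            unlockAndPrune(lock, any);
            return Files.notExists(any.resolve("directory"))
                    && Files.isDirectory(any);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the lock file were left in place, the first Files.delete() call would fail on the non-empty leaf directory and nothing above it would be removed.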
http transport does not use authentication fallback
Git servers supporting HTTP transport can send multiple WWW-Authenticate
challenges [1] for different authentication schemes the server supports.
If authentication fails, now retry all authentication types proposed by
the server.
[1] https://tools.ietf.org/html/rfc2617#page-3
Bug: 492057
Change-Id: I01d438a5896f9b1008bd6b751ad9c7cbf780af1a Signed-off-by: Christian Pontesegger <christian.pontesegger@web.de> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Thu, 2 Jun 2016 04:48:21 +0000 (21:48 -0700)]
DfsBlock: throw DataFormatException on 0 bytes
setInput should always push at least 1 byte into the Inflater. If 0
bytes (or negative!) are being sent the DfsBlock is inconsistent with
the position passed in. This indicates a severe programming problem
in the caller, and may cause an infinite loop in DfsReader.
Today we saw a handful of live examples of this but don't know what
the cause is. Guard against this error condition and throw with a
more verbose failure, which may prevent an infinite loop. Callers
will eventually catch DataFormatException and rethrow with more detail
about the object that cannot be inflated, with the DFE in the chain.
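A guard of this kind might look like the sketch below. The method signature and the way the offset is derived from the block start are assumptions for illustration. Only Inflater and DataFormatException are real JDK classes.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class InflaterGuard {
    /**
     * Feed the remainder of a block, starting at stream position pos, into
     * the inflater. Throws if that would push zero or fewer bytes.
     */
    static int setInput(Inflater inf, byte[] block, long blockStart, long pos)
            throws DataFormatException {
        int offset = (int) (pos - blockStart);
        int n = block.length - offset;
        if (n <= 0) {
            throw new DataFormatException("block at " + blockStart
                    + " has no data for position " + pos
                    + " (would push " + n + " bytes)");
        }
        inf.setInput(block, offset, n);
        return n;
    }
}
```

Failing fast here turns a potential endless retry loop in the caller into an exception that names the inconsistent block and position.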
When using a DfsInserter for high-throughput insertion of many
objects (analogous to git-fast-import), we don't necessarily want to
do a random object lookup for each. It'll be faster from the
inserter's perspective to insert the duplicate objects and let a later
GC handle the deduplication.