Dave Borowitz [Fri, 7 Jul 2017 15:24:51 +0000 (11:24 -0400)]
Fix deleting symrefs
The RefDirectory implementation of doDelete never considered whether to
delete a symref or its leaf, because the detachingSymbolicRef bit was
never exposed from RefUpdate. The behavior was thus incorrectly to
always delete the symref, never the leaf.
There was no test for this behavior. The only thing that attempted to be
a test was testDeleteHeadInBareRepo, but this test was broken for
reasons unrelated to this bug. Specifically, it set the leaf to point to
a completely nonexistent object, and then asserted that deleting HEAD
resulted in NO_CHANGE. The only reason this test ever passed is because
of a quirk of updateImpl, which treats a missing object as the same as
null. This quirk aside, the test wasn't really testing the right thing.
Turn this into a real test by writing out a real object and pointing the
leaf at that.
Also, add a test for the detachingSymbolicRef case, i.e. deleting the
symref and leaving the leaf alone.
Dave Borowitz [Wed, 5 Jul 2017 18:19:36 +0000 (14:19 -0400)]
RefDirectory: Throw exception if CAS of packed ref list fails
The contents of the packedRefList AtomicReference should never differ
from what we expect prior to writing, because this segment of the code
is protected by the packed-refs lock file on disk. If it does happen,
whether due to programmer error or a rogue process not respecting the
locking protocol, it's better to let the caller know than to silently
drop the whole commit operation on the floor.
The existing concurrentOnlyOneWritesPackedRefs test is inherently
nondeterministic as written, and was already about 6% flaky as measured
by bazel:
$ bazel test --runs_per_test=200 //org.eclipse.jgit.test:org_eclipse_jgit_internal_storage_file_GcPackRefsTest
...
INFO: Elapsed time: 42.608s, Critical Path: 10.35s
//org.eclipse.jgit.test:org_eclipse_jgit_internal_storage_file_GcPackRefsTest FAILED in 12 out of 200 in 1.6s
Stats over 200 runs: max = 1.6s, min = 1.1s, avg = 1.3s, dev = 0.1s
This flakiness was caused by the assumption that exactly one of the 2
threads would fail, when both might actually succeed in practice due to
racing on the compare-and-swap.
For whatever reason, this change affected the interleaving behavior in
such a way that the flakiness jumped to around 50%. Making the
interleaving of the test fully deterministic is beyond the scope of this
change, but a simple tweak to the assertion is enough to make it pass
consistently 200+ times both before and after this change.
Some downstream code checks whether a ReceiveCommand is a create or a
delete based on the type field. Other downstream code (in particular a
good chunk of Gerrit code I wrote) checks the same thing by comparing
oldId/newId to zeroId. Unfortunately, there were no strict checks in the
constructor that ensures that zeroId is only set for oldId/newId if the
type argument corresponds, so a caller that passed mismatched IDs and
types would observe completely undefined behavior as a result. This is
and always has been a misuse of the API; throw IllegalArgumentException
so the caller knows that it is a misuse.
Similarly, throw from the constructor if oldId/newId are null. The
non-nullness requirement was already documented. Fix RefDirectoryTest to
not do the wrong thing.
Dave Borowitz [Fri, 7 Jul 2017 17:51:25 +0000 (13:51 -0400)]
Merge changes from topic 'packed-batch-ref-update'
* changes:
RefList: Support capacity <= 0 on new builders
Short-circuit writing packed-refs if no refs were packed
BatchRefUpdate: Clarify some ref prefix calls
Zhen Chen [Tue, 20 Jun 2017 21:28:25 +0000 (14:28 -0700)]
Make possible to overwrite the object count
Right now, PackParser relies on the object count from the pack header.
However, when creating Dfs INSERT packs, the object count is not known
at the beginning of the operation. And when we append the base to a
RECEIVE pack, we can't modify the pack header for object count in most
Dfs implementations.
Make it possible to tell PackParser the expected object count by adding
a setter for expectedObjectCount, implementation can overwrite the
object count in onPackHeader function.
Dave Borowitz [Wed, 5 Jul 2017 13:57:01 +0000 (09:57 -0400)]
BatchRefUpdate: Clarify some ref prefix calls
Inline the old addRefToPrefixes, since it was just a glorified addAll.
Split getPrefixes into a variant, addPrefixesTo, that doesn't allocate a
small Collection on every invocation. Use this in the tight loop of
getTakenPrefixes.
Shawn Pearce [Tue, 27 Jun 2017 16:51:39 +0000 (09:51 -0700)]
Use read ahead during copyPackThroughCache
If a block is missing from the block cache, open the pack stream,
retain the ReadableChannel, and turn on read-ahead. This should help
to load a medium sized pack into a cold cache more quickly from a
slower IO stream, as the pack is scanned sequentially and missing
blocks are more likely to be available through the read-ahead.
Mathieu Cartaud [Mon, 22 May 2017 08:33:52 +0000 (10:33 +0200)]
Support -merge attribute in binary macro
The merger is now able to react to the use of the merge attribute.
The value unset and the custom value 'binary' are handled (-merge
and merge=binary)
Since the specification of the merge attribute states that when the
attribute is unset, ours version must be kept in case of a conflict, we
don't overwrite the file but keep the local version.
David Turner [Wed, 21 Jun 2017 19:23:03 +0000 (15:23 -0400)]
Add a test for parsing fsck config options and expose FsckMode enum
These config options allow overriding the message type (error, warn or
ignore) of a specific message ID such as missingEmail.
The supported fsck message IDs are defined in ObjectChecker.ErrorType.
Since TransferConfig.FsckMode wasn't public parsing fsck configuration
options like e.g. fsck.missingEmail=ignore failed with an
IllegalAccessException. Fix this by declaring this enum public.
Change-Id: I3f41ff7a76a846250a63ce92a9fd111eb347269f Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Oliver Lockwood [Wed, 21 Jun 2017 16:25:19 +0000 (17:25 +0100)]
Fix bug in multiple tag handling on DescribeCommand
In the case of multiple tags on the same commit, jgit previously
only ever looked at the last of those tags; git behaviour is to
return the first tag (or first matching one if --match is
specified).
Bug: 518377
Change-Id: I3b6b58ad9f8aa3879ae35b84542b7bddc74a27d6 Signed-off-by: Oliver Lockwood <oliver.lockwood@cantab.net>
David Pursehouse [Mon, 12 Jun 2017 04:41:20 +0000 (13:41 +0900)]
Add tests for SubmoduleConfig
Change-Id: Idcc93c2ca95938995d489cffda649c7d7b26c50e Signed-off-by: David Pursehouse <david.pursehouse@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: I32024b36093eb095539e02b1788d074bc5237d9f Signed-off-by: David Pursehouse <david.pursehouse@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Terry Parker [Wed, 14 Jun 2017 21:02:45 +0000 (14:02 -0700)]
Add a new singlePack option to PackConfig
If set, "singlePack" will create a single GC pack file for all
objects reachable from refs/*. If not set, the GC pack will contain
object reachable from refs/heads/* and refs/tags/*, and the GC_REST
pack will contain all other reachable objects.
Change-Id: I56bcb6a9da2c10a0909c2f940c025db6f3acebcb Signed-off-by: Terry Parker <tparker@google.com>
David Pursehouse [Sun, 11 Jun 2017 11:24:12 +0000 (20:24 +0900)]
Merge branch 'stable-4.8'
* stable-4.8:
Use a dedicated executor to run auto-gc in command line interface
Allow to use an external ExecutorService for background auto-gc
Fetch: Add --recurse-submodules and --no-recurse-submodules options
Fix capitalization of command help summaries
Change-Id: I7c85f11daa34c11c7f6389de885a2183a686197e Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Matthias Sohn [Sun, 11 Jun 2017 09:51:59 +0000 (11:51 +0200)]
Use a dedicated executor to run auto-gc in command line interface
WorkQueue uses daemon threads so auto-gc would not be executed after
short-lived commands run in command line. Hence use a dedicated executor
which we shutdown when the command finishes.
Change-Id: I0c2429ecfa04387389d159168ba78a020a696228 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Sat, 10 Jun 2017 23:20:34 +0000 (01:20 +0200)]
Allow to use an external ExecutorService for background auto-gc
If set use the external executor, otherwise use JGit's own simple
WorkQueue. Move WorkQueue to an internal package so we can reuse it
without exposing it in the public API.
Change-Id: I060d62ffd6692362a88b4bf13ee07b0dc857abe9 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Pursehouse [Fri, 24 Mar 2017 01:26:44 +0000 (10:26 +0900)]
Fetch: Add --recurse-submodules and --no-recurse-submodules options
Add options to control recursion into submodules on fetch.
Add a callback interface on FetchCommand, to allow Fetch to display
an update "Fetching submodule XYZ" for each submodule.
Change-Id: Id805044b57289ee0f384b434aba1dbd2fd317e5b Signed-off-by: David Pursehouse <david.pursehouse@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Pursehouse [Sat, 10 Jun 2017 05:14:12 +0000 (14:14 +0900)]
Merge branch 'stable-4.8'
* stable-4.8:
SubmoduleUpdateCommand#setCallback should return 'this'
CloneCommand#setCallback should return 'this'
Prepare 4.7.2-SNAPSHOT builds
JGit v4.7.1.201706071930-r
ArchiveCommand: Create prefix entry with commit time
Run auto GC in the background
Update Orbit to the Oxygen version R20170516192513
Change-Id: Ibf90b4899d097474e7836e6baab8829e66fca524 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Matthias Sohn [Fri, 9 Jun 2017 22:58:23 +0000 (00:58 +0200)]
SubmoduleUpdateCommand#setCallback should return 'this'
The other methods in this class follow the builder pattern, and
return 'this', allowing multiple method calls to be chained in a
single statement.
Update the setCallback method to do the same.
Change-Id: I4ddaacd6d50601f47f61eb6be8b62c8d59cce062 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Zhen Chen [Thu, 8 Jun 2017 23:55:26 +0000 (16:55 -0700)]
Defer object collision check until pack stream is done
Object collision check requires read from local storage which may be
slow. We already delay this check for blobs, this change will also delay
other objects until the pack stream is closed. In this way, there is no
readCurs call until the pack stream is closed.
David Turner [Wed, 8 Feb 2017 20:07:18 +0000 (15:07 -0500)]
Run auto GC in the background
When running an automatic GC on a FileRepository, when the caller
passes a NullProgressMonitor, run the GC in a background thread. Use a
thread pool of size 1 to limit the number of background threads spawned
for background gc in the same application. In the next minor release we
can make the thread pool configurable.
In some cases, the auto GC limit is lower than the true number of
unreachable loose objects, so auto GC will run after every (e.g) fetch
operation. This leads to the appearance of poor fetch performance.
Since these GCs will never make progress (until either the objects
become referenced, or the two week timeout expires), blocking on them
simply reduces throughput.
In the event that an auto GC would make progress, it's still OK if it
runs in the background. The progress will still happen.
This matches the behavior of regular git.
Git (and now jgit) uses the lock file for gc.log to prevent simultaneous
runs of background gc. Further, it writes errors to gc.log, and won't
run background gc if that file is present and recent. If gc.log is too
old (according to the config gc.logexpiry), it will be ignored.
Change-Id: I3870cadb4a0a6763feff252e6eaef99f4aa8d0df Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Sun, 4 Jun 2017 20:58:16 +0000 (13:58 -0700)]
fetch: Accept any SHA-1 on lhs of refspec
Allow fetch to accept a SHA-1 on the left hand side of a RefSpec,
enabling callers to pass a specific SHA-1 they want that may not have
been advertised by the remote repository. This can be passed along to
the network protocol to be sent in a "want" line.
Rest of the plumbing only cares about the ObjectId of the Ref in
the askFor map, so make up a fake name using ObjectId.name() to
pass the desired ObjectId into the network code.
Matthias Sohn [Tue, 30 May 2017 11:01:53 +0000 (13:01 +0200)]
Merge branch 'master' into stable-4.8
* master:
Fix out-of-bounds exception in RepoCommand#relative
Fix null return from FS.readPipe when command fails to launch
RenameDetector: Clarify rename limits <= 0
Remove unnecessary cast for DfsReader
Allow DfsReader to be subclassed
Track read IO for DfsReader
Fix javadoc of TooLargeObjectInPackException
Exclude refs/tags from bitmap commit selection
Bryan Donlan [Mon, 22 May 2017 18:37:14 +0000 (11:37 -0700)]
Fix null return from FS.readPipe when command fails to launch
When a command invoked from readPipe fails to launch (i.e. the exec call
fails due to a missing command executable), Process.start() throws,
which gets caught by the generic IOException handler, resulting in a
null return. This change detects this case and rethrows a
CommandFailedException instead.
Additionally, this change uses /bin/sh instead of bash for its posix
command failure test, to accomodate building in environments where bash
is unavailable.
Change-Id: Ifae51e457e5718be610c0a0914b18fe35ea7b008 Signed-off-by: Bryan Donlan <bdonlan@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Fri, 19 May 2017 19:23:02 +0000 (12:23 -0700)]
Track read IO for DfsReader
Compute how much disk IO a DfsReader is performing, and how long the
sum of those operations took on this reader instance. Implementations
of DFS and interested applications can get the stats by calling the
new DfsReader.getIoStats() method at or after close().
Terry Parker [Thu, 18 May 2017 08:30:14 +0000 (01:30 -0700)]
Exclude refs/tags from bitmap commit selection
Commit db77610 ensured that all refs/tags commits are added to the
primary GC pack. It did that by adding all of the refs/tags commits
to the primary GC pack PackWriter's "interesting" object set.
Unfortunately, all commit objects in the "interesting" set are
selected as commits for which bitmap indices will be built. In a
repository like chromium with lots of tags, this changed the number of
bitmaps created from <700 to >10000. That puts huge memory pressure on
the GC task.
This change restores the original behavior of ignoring tags when
selecting commits for bitmaps.
In the "uninteresting" set, commits for refs/heads and refs/tags for
unannotated tags can not be differentiated. We instead identify
refs/tags commits by passing their ObjectIds as a new "noBitmaps"
parameter to the PackWriter.preparePack() methods.
PackWriterBitmapPreparer.setupTipCommitBitmaps() can then use that
"noBitmaps" parameter to exclude those commits.
Change-Id: Icd287c6b04fc1e48de773033fe432a9b0e904ac5 Signed-off-by: Terry Parker <tparker@google.com>
Thomas Wolf [Mon, 8 May 2017 06:48:40 +0000 (08:48 +0200)]
Clean up the disk when cloning fails
CloneCommand.call() has three stages: preparation, then the actual
clone (init/fetch), and finally maybe checking out the working
directory.
Restructure such that if we fail or are cancelled during the actual
clone (middle phase), we do clean up the disk again. This prevents
leaving behind a partial clone in an inconsistent state: either we
have a fully successfully built clone, or nothing at all.
Bug: 516303
Change-Id: I9b18c60f8f99816d42a3deb7d4a33a9f22eeb709 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
DirCacheCheckout is generating names for temporary files. It was not checking
the length of this filenames. It may happen that a generated filename is
longer than 255 chars which causes problems on certain platforms. Make sure
that filenames for temporary files do not exceed 255 chars.
Zhen Chen [Wed, 3 May 2017 21:56:40 +0000 (14:56 -0700)]
Reset ObjectWalker when it starts a new walk
The ObjectWalker in PackWriterBitmapWalker needs to be reset whenever it
starts a new walk. Move this responsibility from the caller to the
method when the new walk starts.
Some repository topologies can cause carryOntoHistory to overflow the
thread stack, due to its strategy of recursing into the 2nd+ parents
of a merge commit. This can easily happen if a project maintains a
local fork, and frequently pulls from the upstream repository, which
itself may have a branchy history.
Rewrite the carryOntoHistory algorithm to use a fixed amount of thread
stack, pushing the save points onto the heap. By using heap space the
thread stack depth is no longer a concern. Repositories are instead
limited by available memory.
The algorithm is now structured as two loops:
carryOntoHistory: This outer loop pops saved commits off the top of
the stack, allowing the inner loop algorithm to dive down that path
and carry bits onto commits along that part of the graph. The loop
ends when there are no more stack elements.
carryOntoHistoryInner: The inner loop walks along a single path of
the graph. For a string of pearls (commits with one parent each)
r <- s <- t <- u
the algorithm walks backwards from u to r by iteratively updating
its local variable 'c'. This avoids heap allocation along a simple
path that does not require remembering state.
The inner loop breaks in the HAVE_ALL case, when all bits have been
found to be previously set on the commit. This occurs when a prior
iteration of the outer loop (carryOntoHistory) explored a different
path to this same commit, and copied the bits onto it.
When the inner loop encounters a merge commit, it pushes all parents
onto the heap based stack by allocating individual CarryStack
elements for each parent. Parents are pushed in order, allowing
side branches to be explored first.
A small optimization is taken for the last parent, avoiding pushing
it and instead updating 'c', allowing the side branch to be entered
without allocating a CarryStack.
Delete expired garbage even when there is no GC pack present.
Delete the condition to check whether the garbage pack creation time
is older than the last GC operation, because it's not possible to
find the last GC operation time when there is no GC pack.
Add additional tests to make sure the contents of the expired garbage
packs are considered during the GC operation and any actively
referenced objects from the garbage packs are copied successfully
into the GC pack before deleting the garbage pack.
Martin Fick [Fri, 31 Mar 2017 17:31:47 +0000 (11:31 -0600)]
Add parseCommit(AnyObjectId) method to Repository.
It is quite common to want to parse a commit without already having a
RevWalk. Provide a shortcut to do so to make it more convenient, and to
ensure that the RevWalk is released afterwards.
Signed-off-by: Martin Fick<mfick@codeaurora.org>
Change-Id: I9528e80063122ac318f115900422a24ae49a920e
Dan Willemsen [Tue, 15 Nov 2016 23:33:53 +0000 (15:33 -0800)]
RepoCommand: Add linkfile support.
Android wants them to work, and we're only interested in them for bare
repos, so add them just for that.
Make sure to use symlinks instead of just using the copyfile
implementation. Some scripts look up where they're actually located in
order to find related files, so they need the link back to their
project.
Change-Id: I929b69b2505f03036f69e25a55daf93842871f30 Signed-off-by: Dan Willemsen <dwillemsen@google.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jeff Gaston <jeffrygaston@google.com> Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Jonathan Nieder [Mon, 17 Apr 2017 18:16:50 +0000 (11:16 -0700)]
Process all "have"s even when MissingObjectException is encountered
Because objects described by the client using "have" lines do not need
to be reachable by any ref on the server, it is possible for them to
point to missing objects in the reachability graph. When such an
object is encountered, I1097a2defa4a9dcf502ca8baca5d32880378818f (Only
throw MissingObjectException when necessary, 2017-03-29) aborts the
"have" walk early to salvage the fetch. The downside of that change
is that remaining "have"s are ignored unless they pointed directly to
an object with a bitmap. In the worst case this can increase the
bandwidth cost of a fetch to the cost of a clone because most "have"s
are ignored.
Avoid this cost by bypassing the failed "have" completely and moving
on to the remaining "have"s.
Martin Fick [Tue, 24 Jan 2017 18:00:54 +0000 (11:00 -0700)]
Prevent alternates loop
When looping through alternates, prevent visiting the same object
directory twice. This could happen when the objects/info/alternates file
includes itself directly or indirectly via a another repo and its
alternates file.
Change-Id: I79bb3da099ebc3c262d2e6c61ed4578eb1aa3474 Signed-off-by: James Melvin <jmelvin@codeaurora.org> Signed-off-by: Martin Fick <mfick@codeaurora.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Pursehouse [Thu, 13 Apr 2017 05:14:55 +0000 (14:14 +0900)]
Bazel: Restrict src globs to Java source files
Generating the src list with an unrestricted wildcard causes all
files in the source tree to be included. This results in junk files
such as .orig (generated during merge conflict resolution) to be
included, which causes in a build error:
in srcs attribute of java_library rule //org.eclipse.jgit:jgit:
file '//org.eclipse.jgit:src/org/eclipse/jgit/gitrepo/RepoCommand.java.orig'
is misplaced here (expected .java, .srcjar or .properties).
Modify the globs to only include Java source files.
Change-Id: Iaef3db33ac71d71047cd28acb0378e15cb09ece9 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Han-Wen Nienhuys [Tue, 28 Mar 2017 12:00:38 +0000 (14:00 +0200)]
Fix RepoCommand to allow for relative URLs
This is necessary for deploying submodules on android.googlesource.com.
* Allow an empty base URL. This is useful if the 'fetch' field is "."
and all names are relative to some host root.
* The URLs in the resulting superproject are relative to the
superproject's URL. Add RepoCommand#setDestinationURI to
set this. If unset, the existing behavior is maintained.
* Add two tests for the Android and Gerrit case, checking the URL
format in .gitmodules; the tests use a custom RemoteReader which is
representative of the use of this class in Gerrit's Supermanifest
plugin.
Change-Id: Ia75530226120d75aa0017c5410fd65d0563e91b Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>