David Turner [Wed, 4 Jan 2017 04:56:08 +0000 (23:56 -0500)]
push: support per-ref force-with-lease
When rebasing, force-pushing has a race condition: someone else might
have pushed a commit since the one you just rewrote. The force-with-lease
option prevents this by ensuring that the ref's old value is the one
that you expected.
Change-Id: I97ca9f8395396c76332bdd07c486e60549ca4401 Signed-off-by: David Turner <dturner@twosigma.com>
Shawn Pearce [Wed, 8 Feb 2017 23:34:00 +0000 (15:34 -0800)]
Assume GC_REST and GC_TXN also attempted deltas during packing
In a DFS repository the DfsGarbageCollector will typically attempt
delta compression while creating the three main pack files: GC,
GC_REST and GC_TXN. Include all of these in the wasDeltaAttempted()
decision so that future packers can bypass delta compression of
non-delta objects.
Shawn Pearce [Tue, 7 Feb 2017 00:04:29 +0000 (16:04 -0800)]
Prefer smaller GC files during DFS garbage collection
In 8ac65d33ed7a94f77cb066271669feebf9b882fc PackWriter changed its
behavior to always prefer the last object representation presented
to it by the ObjectReuseAsIs implementation. This was a fix to avoid
delta chain cycles.
Unfortunately it can lead to suboptimal compression when concurrent
GCs are run on the same repository. One case is automatic GC running
(with default settings) in parallel to a manual GC that has disabled
delta reuse in order to generate new smaller deltas for the entire
history of the repository.
Running GC with no-reuse generally requires more CPU time, which
also translates to a longer running time. This can lead to a race
where the automatic GC completes before the no-reuse GC, leaving
the repository in a state such as:
With the default sort ordering, the smaller no-reuse GC pack is
sorted earlier in the pack list, due to its more recent mtime.
During object reuse in a future GC, these smaller representations
are considered first by PackWriter, but are all discarded when the
auto GC file from 17:30 is examined second (due to its older mtime).
Work around this in two ways.
Well formed DFS repositories should have at most 1 GC pack. If
2 or more GC packs exist, break the sorting tie by selecting the
smaller file earlier in the pack list. This allows all normal read
code paths to favor the smaller file, which places less pressure
on the DfsBlockCache. If any GC race happens, readers serving clone
requests will prefer the file that is smaller.
During object reuse, flip this ordering so that the smaller file is
last. This allows PackWriter to see smaller deltas last, replacing
larger representations that were previously considered from other
pack files.
Shawn Pearce [Wed, 8 Feb 2017 05:00:30 +0000 (21:00 -0800)]
Fix missing deltas near type boundaries
Delta search was discarding discovered deltas if an object appeared
near a type boundary in the delta search window. This has caused JGit
to produce larger pack files than other implementations of the packing
algorithm.
Delta search works by pushing prior objects into a search window, an
ordered list of objects to attempt to delta compress the next object
against. (The window size is bounded, avoiding O(N^2) behavior.)
For implementation reasons multiple object types can appear in the
input list, and the window. PackWriter commonly passes both trees and
blobs in the input list handed to the DeltaWindow algorithm. The pack
file format requires an object to only delta compress against the same
type, so the DeltaWindow algorithm must stop doing comparisions if a
blob would be compared to a tree.
Because the input list is sorted by object type and the window is
recently considered prior objects, once a wrong type is discovered in
the window the search algorithm stops and uses the current result.
Unfortunately the termination condition was discarding any found
delta by setting deltaBase and deltaBuf to null when it was trying
to break the window search.
When this bug occurs, the state of the DeltaWindow looks like this:
current
|
\ /
input list: tree0 tree1 blob1 blob2
window: blob1 tree1 tree0
/ \
|
res.prev
As the loop iterates to the right across the window, it first finds
that blob1 is a suitable delta base for blob2, and temporarily holds
this in the bestDelta/deltaBuf fields. It then considers tree1, but
tree1 has the wrong type (blob != tree), so the window loop must give
up and fall through the remaining code.
Moving the condition up and discarding the window contents allows
the bestDelta/deltaBuf to be kept, letting the final file delta
compress blob1 against blob0.
The impact of this bug (and its fix) on real world repositories is
likely minimal. The boundary from blob to tree happens approximately
once in the search, as the input list is sorted by type. Only the
first window size worth of blobs (e.g. 10 or 250) were failing to
produce a delta in the final file.
This bug fix does produce significantly different results for small
test repositories created in the unit test suite, such as when a pack
may contains 6 objects (2 commits, 2 trees, 2 blobs). Packing test
cases can now better sample different output pack file sizes depending
on delta compression and object reuse flags in PackConfig.
Disabling the garbage pack coalescing when garbageTtl > 0 can result in
lot of garbage packs if they are created within the garbageTtl time.
To avoid a large number of garbage packs, re-introducing garbage pack
coalescing for the packs that are created within a single calendar day
when the garbageTtl is more than one day or one third of the garbageTtl.
Matthias Sohn [Mon, 30 Jan 2017 00:24:45 +0000 (01:24 +0100)]
GC: delete empty directories after purging loose objects
In order to limit the number of directories we check for emptiness only
consider fanout directories which contained unreferenced loose objects
we deleted in the same gc run.
Change-Id: Idf8d512867ee1c8ed40bd55752122ce83a98ffa2 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Pursehouse [Thu, 26 Jan 2017 04:38:50 +0000 (13:38 +0900)]
Merge FileTreeIteratorJava7Test into FileTreeIteratorTest
JGit now requires Java 8, so it is no longer necessary to have a
separate class for Java 7 specific tests. Remove it and merge its
tests into the existing FileTreeIteratorTest.
FileTreeIteratorTest has an @Before annotated method that sets up
some files in the git, which breaks the tests which have assumptions
on the file names. Add adjustments.
Change-Id: I14f88d8e079e1677c8dfbc1fcbf4444ea8265365 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Hongkai Liu [Tue, 24 Jan 2017 20:14:40 +0000 (15:14 -0500)]
Rename FileUtilTest to FileUtilsTest and merge in FileUtils7Test
Rename the test class to match the name of the class under test.
JGit now requires Java 8 so it is no longer necessary to have a
separate class (FileUtils7Test) for Java 7 tests. Merge those into
FileUtilsTest.
Change-Id: I39dd7e76a2e4ce97319c7d52261b0a1546879788 Signed-off-by: Hongkai Liu <hongkai.liu@ericsson.com> Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Hector Caballero [Thu, 26 Jan 2017 16:24:49 +0000 (11:24 -0500)]
Make GC cancellable when called programmatically
Sometimes, it is necessary to cancel a garbage collection operation.
When GC is called using the standalone executable, i.e., from a command
line, Control-Cing the process does the trick. When calling GC
programmatically, though, there is no mechanism to do it.
Add checks in the GC process so that a custom cancellable progress
monitor could be passed in order to cancel the operation at specific
points. In this case, the calling process set the cancel flag in the
progress monitor and the GC process will throw an exception that can
be caught and handled by the caller accordingly.
Hongkai Liu [Mon, 23 Jan 2017 16:33:40 +0000 (11:33 -0500)]
Clean up orphan files in GC
An orphan file is either a bitmap or an idx file in pack folder,
and its corresponding pack file is missing.
Change-Id: I3c4cb1f7aa99dd7b398bdb8d513f528d7761edff Signed-off-by: Hongkai Liu <hongkai.liu@ericsson.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Pursehouse [Fri, 27 Jan 2017 10:33:20 +0000 (19:33 +0900)]
RefUpdateTest: Don't call createBareRepository in try-with-resource
createBareRepository adds the created repo to the list of repos to be
closed in the superclass's teardown. Wrapping it in try-with-resource
causes it to be closed too many times, resulting in a corrupt use
count.
Change-Id: I4c70630bf6008544324dda453deb141f4f89472c Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
David Pursehouse [Fri, 27 Jan 2017 05:44:14 +0000 (14:44 +0900)]
RepoCommand#readFile: Don't call Git#getRepository() in try-with-resource
Using try-with-resource means that close() will automatically be
called on the Repository object. However, according to the javadoc
of Git#close():
If the repository was opened by a static factory method in this class,
then this method calls Repository#close() on the underlying repository
instance.
This means that Repository#close() is called twice, by Git.close()
and in the outer try-with-resource, leading to a corrupt use count.
Change-Id: I37ba517eb2cc67d1cd36813598772c70208d0bc9 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
David Pursehouse [Fri, 27 Jan 2017 06:51:55 +0000 (15:51 +0900)]
RepoCommandTest: Don't wrap create{Bare,Work}Directory in t-w-r
These methods add the created Repository into "toClose", and they are
then closed by LocalDiskRepositoryTestCase's tearDown method.
Calling them in try-with-resource causes them to first be closed in
the test method, and then again in tearDown, which results in the use
count going negative and a log message on the console.
While this is not a serious problem, having so many false positives
in the logs will potentially drown out real cases of Repository being
closed too many times.
Change-Id: Ib374445e101dc11cb840957b8b19ee1caf777392 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
David Pursehouse [Fri, 27 Jan 2017 03:05:12 +0000 (12:05 +0900)]
LocalDiskRepositoryTestCase: Only add to toClose through access method
Only using the access method means we only have one place where the
toClose set is modified, making it easier to debug either by adding
log statements or by setting a breakpoint.
Change-Id: I4f9f1774d5f2e10bcab381edfd84bb6ee0499a11 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Matthias Sohn [Sat, 28 Jan 2017 14:06:15 +0000 (15:06 +0100)]
Don't rely on default locale when using toUpperCase() and toLowerCase()
Otherwise these methods may produce unexpected results if used for
strings that are intended to be interpreted locale independently.
Examples are programming language identifiers, protocol keys, and HTML
tags. For instance, "TITLE".toLowerCase() in a Turkish locale returns
"t\u0131tle", where '\u0131' is the LATIN SMALL LETTER DOTLESS I
character.
See
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toLowerCase--
http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html
Bug: 511238
Change-Id: Id8d8f37d84d62239c918b81f8d883ed798d87656 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Pursehouse [Fri, 27 Jan 2017 02:32:18 +0000 (11:32 +0900)]
LocalDiskRepositoryTestCase: Prevent duplicates in list of repos to close
Change the "toClose" list to a set, which will not allow duplicate
entries. This reduces the number of false positive logs about corrupt
use count due to the same repository being closed more than once during
teardown.
Change-Id: I5ab0ff8b56e7f2b2c7aab5274d957708d26f42c5 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The Compacter and Garbage Collector will record the estimated size of
the newly going to be created compact, gc or garbage packs. This
information can be used by the clients to better make a call on how to
actually store the pack based on the approximated expected size.
Added a new protected method DfsObjDatabase.newPack(PackSource
packSource, long estimatedPackSize), so that the clients can override
this method to make use of the estimatedPackSize while creating a new
PackDescription object. The default implementation of this method is
equivalent to
newPack(packSource).setEstimatedPackSize(estimatedPackSize). I didn't
make it abstract because that would force all the existing sub classes
of DfsObjDatabase to implement this method. Due to this default
implementation, the estimatedPackSize is added to DfsPackDescription
using a setter instead of a constructor parameter (even though
constructor parameter would be a better choice as this value is set only
during the object creation).
The package is not used by the plugin and seems to be missing in the
platform anyway under some conditions, see bug 508321 (newer
org.apache.httpcomponents.httpclient_4.5.2 does NOT include the package,
org.apache.httpcomponents.httpclient_4.3.6 does).
Jonathan Nieder [Tue, 24 Jan 2017 18:16:19 +0000 (10:16 -0800)]
Remove @since tags from internal packages
These packages don't use @since tags because they are not part of the
stable public API. Some @since tags snuck in, though. Remove them to
make the convention easier to find for new contributors and the
expectations clearer for users.
David Turner [Tue, 24 Jan 2017 16:31:22 +0000 (11:31 -0500)]
gc: loosen unreferenced objects
An unreferenced object might appear in a pack. This could only happen
because it was previously referenced, and then later that reference
was removed. When we gc, we copy the referenced objects into a new
pack, and delete the old pack. This would remove the unreferenced
object. Now we first create a loose object from any unreferenced
object in the doomed pack. This kicks off the two-week grace period
for that object, after which it will be collected if it's not
referenced.
This matches the behavior of regular git.
Change-Id: I59539aca1d0d83622c41aa9bfbdd72fa868ee9fb Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Jonathan Nieder <jrn@google.com>
David Pursehouse [Fri, 20 Jan 2017 03:58:46 +0000 (12:58 +0900)]
Implement Bazel build for http-apache, lfs, lfs-server
Test plan:
$ bazel build all
$ unzip -t bazel-genfiles/all.zip
Archive: bazel-genfiles/all.zip
testing: libhttp-apache.jar OK
testing: libjgit-archive.jar OK
testing: libjgit-lfs-server.jar OK
testing: libjgit-lfs.jar OK
testing: libjgit-servlet.jar OK
testing: libjgit.jar OK
testing: libjunit.jar OK
No errors detected in compressed data of bazel-genfiles/all.zip.
Change-Id: I9e6c60898ccc6d2a4557ec7544c297442a9702b4 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Shawn Pearce [Fri, 20 Jan 2017 16:41:29 +0000 (08:41 -0800)]
Change StreamGobbler to Runnable to avoid unused Future
It can be considered a programming error to create a Future<T>
but do nothing with that object. There is an async computation
happening and without holding and checking the Future for done
or exception the caller has no idea if it has completed.
FS doesn't really care about these StreamGobblers finishing.
Instead use Runnable with execute(Runnable), which doesn't
return a Future.
James Melvin [Tue, 3 Jan 2017 18:41:56 +0000 (11:41 -0700)]
gc: Add options to preserve and prune old pack files
The new --preserve-oldpacks option moves old pack files into the
preserved subdirectory instead of deleting them after repacking.
The new --prune-preserved option prunes old pack files from the
preserved subdirectory after repacking, but before potentially
moving the latest old packfiles to this subdirectory.
These options are designed to prevent stale file handle exceptions
during git operations which can happen on users of NFS repos when
repacking is done on them. The strategy is to preserve old pack files
around until the next repack with the hopes that they will become
unreferenced by then and not cause any exceptions to running processes
when they are finally deleted (pruned).
Change-Id: If3f729f0d9ce920ee2c3e6acdde46f2068be61d2 Signed-off-by: James Melvin <jmelvin@codeaurora.org>
David Ostrovsky [Sun, 6 Nov 2016 08:06:54 +0000 (09:06 +0100)]
Implement initial framework of Bazel build
The initial implementation only builds the packages consumed by
Gerrit Code Review.
Test build and execution is not implemented.
We prefer to consume maven_jar custom rule from bazlets repository,
for the same reasons as in the Gerrit project:
* Caching artifacts across different clones and projects
* Exposing source classifiers and neverlink artifact
TEST PLAN:
$ bazel build :all
$ unzip -t bazel-genfiles/all.zip
Archive: bazel-genfiles/all.zip
testing: libjgit-archive.jar OK
testing: libjgit-servlet.jar OK
testing: libjgit.jar OK
testing: libjunit.jar OK
No errors detected in compressed data of bazel-genfiles/all.zip.
Change-Id: Ia837ce95d9829fe2515f37b7a04a71a4598672a0 Signed-off-by: David Ostrovsky <david@ostrovsky.org> Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Wim Jongman [Fri, 6 Jan 2017 01:27:43 +0000 (02:27 +0100)]
Normalizer creating a valid branch name from a string
Generic normalization method for a possible invalid branch name.
The method compresses dividers between spaces, then replaces spaces
and non word characters with underscores.
This method is needed in preparation for subsequent EGit changes.
Thomas Wolf [Sun, 15 Jan 2017 20:35:50 +0000 (21:35 +0100)]
Fix StashApplyCommand for stashes containing untracked changes.
If there are untracked changes, apply only the untracked tree
after a successful merge. The merge tree from merging untracked
with HEAD would also contain files already reset before (changes
in tracked files) and try to reset those again,leading to false
checkout conflicts.
Bug: 505804
Change-Id: Iaced4d277623334d11e3d1cca5969590d7c5093e Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Marc Strapetz [Tue, 10 Jan 2017 15:28:54 +0000 (16:28 +0100)]
Fix possible InvalidObjectIdException in ObjectDirectory
ObjectDirectory.getShallowCommits should throw an IOException
instead of an InvalidArgumentException if invalid SHAs are present
in .git/shallow (as this file is usually edited by a human).
Change-Id: Ia3a39d38f7aec4282109c7698438f0795fbec905 Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com> Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Zhen Chen [Sat, 14 Jan 2017 06:02:37 +0000 (22:02 -0800)]
Skip pack header bytes in DfsPackFile
The 12 bytes `PACK...` header is written in PackWriter before reading
CachedPack files. In DfsPackFile#copyPackBypassCache, the header was not
skipped when the first block is not in cache.
David Pursehouse [Fri, 13 Jan 2017 01:41:27 +0000 (10:41 +0900)]
LfsProtocolServlet: Improve error on getLargeFileRepository failure
Externalize the error message and make it more specific. Also emit
an error log.
Change-Id: Ie7b1c90c54673bfb8e133fb0aa19a117d4ca6587 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Pursehouse [Fri, 13 Jan 2017 01:08:29 +0000 (10:08 +0900)]
Add support for refusing LFS request due to invalid authorization
Add a new exception type that server implementations can throw when a
client attempts to make an unauthorized LFS operation, which will result
in HTTP 401 Unauthorized being returned to the client.
An example of this is a Gerrit server that rejects a request to perform
an LFS operation on a ref that is not visible to the caller.
As defined in the LFS spec [1] the request may include authentication,
and per RFC 2616 [2], "401 response indicates that authorization has been
refused for those credentials".
Shawn Pearce [Tue, 3 Jan 2017 22:23:57 +0000 (14:23 -0800)]
Pack refs/tags/ with refs/heads/
This fixes a nasty performance issue for repositories that have many
objects referenced through refs/tags/, but not in refs/heads/.
Situations like this can arise when a project has made releases like
refs/tags/v1.0, and then decides to orphan history and start over for
version 2. The v1.0 objects are not reachable from master anymore,
but are still live due to the v1.0 tag.
When tags are packed in the GC_OTHER pack, bitmaps are not able to
cover the repository's contents. This may cause very slow counting
times during git clone, as the server must enumerate the ancient
history under refs/tags/ to respond to the client.
Clients by default always ask for all tags when asking for all heads
during clone. This has been true since git-core commit 8434c2f1afedb
(Apr 27 2008), when clone was converted to a builtin. Including tags
in the main GC pack should still allow servers to benefit from the
fast full pack reuse path when serving a clone to a client.
James Melvin [Tue, 27 Dec 2016 21:08:56 +0000 (14:08 -0700)]
Fix keep pack filename
Previously it was looking for a keep file with the name of a pack file
(extenstion included) appended with a '.keep'. However, the keep file
name should be the pack file name with a '.keep' extension
Change-Id: I9dc4c7c393ae20aefa0b9507df8df83610ce4d42 Signed-off-by: James Melvin <jmelvin@codeaurora.org>
Matthias Sohn [Sun, 18 Dec 2016 09:37:47 +0000 (10:37 +0100)]
Update maven-source-plugin to 3.0.1 to fix OOM during build
Recently we frequently suffer from OutOfMemoryError when creating source
archives in the Maven build. maven-source-plugin 3.0.0 has a bug [1]
causing OOM which is fixed in 3.0.1.
When a repository is being GCed and a concurrent push is received, there
is the possibility of having a missing object. This is due to the fact
that after the list of objects to delete is built, there is a window of
time when an unreferenced and ready to delete object can be referenced
by the incoming push. In that case, the object would be deleted because
there is no way to know it is no longer unreferenced. This will leave
the repository in an inconsistent state and most of the operations fail
with a missing tree/object error.
Given the incoming push change the last modified date for the now
referenced object, verify this one is still a candidate to delete
before actually performing the delete operation.
FileSnapshot.isModified may have reported a file to be clean although it
was actually dirty.
Imagine you have a FileSnapshot on file f. lastmodified and lastread are
both t0. Now time is t1 and you
1) modify the file
2) update the FileSnapshot of the file (lastModified=t1, lastRead=t1)
3) modify the file again
4) wait 3 seconds
5) ask the Filesnapshot whether the file is dirty or not. It erroneously
answered it's clean.
Any file which has been modified longer than 2.5 seconds ago was
reported to be clean. As the test shows that's not always correct.
The real-world problem fixed by this change is the following:
* A gerrit server using JGit to serve git repositories is processing
fetch requests while simultaneously a native git garbage collection
runs on the repo.
* At time t1 native git writes temporary files in the pack folder
setting the mtime of the pack folder to t1.
* A fetch request causes JGit to search for new packfiles and JGit
remembers this scan in a Filesnapshot on the packs folder. Since the gc
is not finished JGit doesn't see any new packfiles.
* The fetch is processed and the gc ends while the filesystem timer is
still t1. GC writes a new packfile and deletes the old packfile.
* 3 seconds later another request arrives. JGit does not yet know about
the new packfile but is also not rescanning the pack folder because it
cached that the last scan happened at time t1 and pack folder's mtime is
also t1. Now JGit will not be able to resolve any object contained in
this new pack. This behavior may be persistent if objects referenced by
the ref/meta/config branch are affected so gerrit can't read permissions
stored in the refs/meta/config branch anymore and will not allow any
pushes anymore. The pack folder will not change its mtime and therefore
no rescan will take place.
Change-Id: I3efd0ccffeb97b01207dc3e7a6b85c6b06928fad Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Zhen Chen [Wed, 23 Nov 2016 00:39:27 +0000 (16:39 -0800)]
Decide whether to "Accept-Encoding: gzip" on a request-by-request basis
When the reply is already compressed (e.g. a packfile fetched using dumb
HTTP), "Content-Encoding: gzip" wastes bandwidth relative to sending the
content raw. So don't "Accept-Encoding: gzip" for such requests.