Marco Miller [Thu, 21 Jun 2018 18:18:48 +0000 (14:18 -0400)]
ResolveMerger: Fix encoding with string; use bytes
This change fixes the issue [1]. Before this fix, a merge involving
the caching of consecutive yet similar filenames with Norwegian
characters [2] used to throw an IllegalStateException: Duplicate
stages not allowed. This was caused by inaccurate decoding of the
filenames, using string values assuming default encoding. In the
toString method of DirCacheEntry, used before through getPathString,
UTF-8 encoding is used, but the end result becomes default encoding,
through Object's default toString usage. The special characters in
those two consecutive (particular) filenames [2] were becoming the
very same decoded /single character, lending consecutive -but then
identical- filenames. Thus the perceived duplicate 0-staging of the
file(s).
Replace getPathString usage with getRawPath for this specific case,
or use byte array representations of cached entries instead of string.
Adding a test for this change is not possible, as there is no known
way to change the default encoding for filenames such as [2] (e.g.).
JGitTestUtil does write file contents through UTF-8, but encoding like
so does not apply to the actual file name. Hence there is no way to
create files with names properly made of special characters such as
[2]'s. And the test that is necessary for this case assumes such
Norwegian (or similar characters) filenames. Changing the default
locale programmatically in a test has no effect either. And changing
the LANG value passed to the JVM is only possible upon starting it.
On a local non-NFS filesystem the .git/config file will be orphaned if
it is replaced by a new process while the current process is reading the
old file. The current process successfully continues to read the
orphaned file until it closes the file handle.
Since NFS servers do not keep track of open files, instead of orphaning
the old .git/config file, such a replacement on an NFS filesystem will
instead cause the old file to be garbage collected (deleted). A stale
file handle exception will be raised on NFS clients if the file is
garbage collected (deleted) on the server while it is being read. Since
we no longer have access to the old file in these cases, the previous
code would just fail. However, in these cases, reopening the file and
rereading it will succeed (since it will open the new replacement file).
Since retrying the read is a viable strategy to deal with stale file
handles on the .git/config file, implement such a strategy.
Since it is possible that the .git/config file could be replaced again
while rereading it, loop on stale file handle exceptions, up to 5 extra
times, trying to read the .git/config file again, until we either read
the new file, or find that the file no longer exists. The limit of 5 is
arbitrary, and provides a safe upper bounds to prevent infinite loops
consuming resources in a potential unforeseen persistent error
condition.
Change-Id: I6901157b9dfdbd3013360ebe3eb40af147a8c626 Signed-off-by: Nasser Grainawi <nasser@codeaurora.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When running on NFS there was a chance that JGits LockFile
semantic is broken because File#createNewFile() may allow
multiple clients to create the same file in parallel. This
change provides a fix which is only used when the new config
option core.supportsAtomicCreateNewFile is set to false. The
default for this option is true. This option can only be set in the
global or the system config file. The repository config file is not
taken into account in this case.
If the config option core.supportsAtomicCreateNewFile is true
then File#createNewFile() is trusted and the behaviour doesn't
change.
But if core.supportsAtomicCreateNewFile is set to false then after
successful creation of the lock file a hardlink to that lock file is
created and the attribute nlink of the lock file is checked to be 2. If
multiple clients manage to create the same lock file nlink would be
greater than 2 showing the error.
This expensive workaround is described in
https://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
section III.d) "Exclusive File Creation"
Honor trustFolderStats also when reading packed-refs
Then list of packed refs was cached in RefDirectory based on mtime of
the packed-refs file. This may fail on NFS when attributes are cached.
A cached mtime of the packed-refs file could cause JGit to trust the
cached content of this file and to overlook that the file is modified.
Honor the config option trustFolderStats and always read the packed-refs
content if the option is false. By default this option is set to true
and this fix is not active.
Fix exception handling for opening bitmap index files
When creating a new PackFile instance it is specified whether this pack
has an associated bitmap index file or not. This information is cached
and the public method getBitmapIndex() will always assume a bitmap index
file must exist if the cached data tells so. But it may happen that the
packfiles are repacked during a gc in a different process causing the
packfile, bitmap-index and index file to be deleted. Since JGit still
has an open FileHandle on the packfile this file is not really deleted
and can still be accessed. But index and bitmap index file are deleted.
Fix getBitmapIndex() to invalidate the cached packfile instance if such
a situation occurs.
This problem showed up when a gerrit server was serving repositories
which where garbage collected with native git regularly. Fetch and
clone commands for certain repositories failed permanently after a
native git gc had deleted old bitmap index files.
Change-Id: I8e620bec74dd3f310ba42024f9a657062f868f0e Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Turner [Wed, 8 Feb 2017 20:07:18 +0000 (15:07 -0500)]
Run auto GC in the background
When running an automatic GC on a FileRepository, when the caller
passes a NullProgressMonitor, run the GC in a background thread. Use a
thread pool of size 1 to limit the number of background threads spawned
for background gc in the same application. In the next minor release we
can make the thread pool configurable.
In some cases, the auto GC limit is lower than the true number of
unreachable loose objects, so auto GC will run after every (e.g) fetch
operation. This leads to the appearance of poor fetch performance.
Since these GCs will never make progress (until either the objects
become referenced, or the two week timeout expires), blocking on them
simply reduces throughput.
In the event that an auto GC would make progress, it's still OK if it
runs in the background. The progress will still happen.
This matches the behavior of regular git.
Git (and now jgit) uses the lock file for gc.log to prevent simultaneous
runs of background gc. Further, it writes errors to gc.log, and won't
run background gc if that file is present and recent. If gc.log is too
old (according to the config gc.logexpiry), it will be ignored.
Change-Id: I3870cadb4a0a6763feff252e6eaef99f4aa8d0df Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Mon, 27 Mar 2017 08:52:36 +0000 (10:52 +0200)]
Merge branch 'stable-4.6'
* stable-4.6:
Only mark packfile invalid if exception signals permanent problem
Don't flag a packfile invalid if opening existing file failed
Prepare 4.5.2-SNAPSHOT builds
Change-Id: Ife4efad1135d3870a5a0fb71e60b9524fb8777ab Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
David Pursehouse [Mon, 27 Mar 2017 01:14:50 +0000 (10:14 +0900)]
Merge branch 'stable-4.5' into stable-4.6
* stable-4.5:
Only mark packfile invalid if exception signals permanent problem
Don't flag a packfile invalid if opening existing file failed
Prepare 4.5.2-SNAPSHOT builds
Change-Id: I20b50981adc54c426666015ff04fe3bb1db9abd9 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Matthias Sohn [Sat, 25 Mar 2017 01:33:06 +0000 (02:33 +0100)]
Only mark packfile invalid if exception signals permanent problem
Add NoPackSignatureException and UnsupportedPackVersionException to
explicitly mark permanent unrecoverable problems with a pack
Assume problem with a pack is permanent only if we are sure the
exception signals a non-transient problem we can't recover from:
- AccessDeniedException: we lack permissions
- CorruptObjectException: we detected corruption
- EOFException: file ended unexpectedly
- NoPackSignatureException: pack has no pack signature
- NoSuchFileException: file has gone missing
- PackMismatchException: pack no longer matches its index
- UnpackException: unpacking failed
- UnsupportedPackIndexVersionException: unsupported pack index version
- UnsupportedPackVersionException: unsupported pack version
Do not attempt to handle Errors since they are thrown for serious
problems applications should not try to recover from.
Change-Id: I2c416ce2b0e23255c4fb03a3f9a0ee237f7a484a Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Luca Milanesio [Fri, 24 Mar 2017 00:18:12 +0000 (00:18 +0000)]
Don't flag a packfile invalid if opening existing file failed
A packfile random file open operation may fail with a
FileNotFoundException even if the file exists, possibly
for the temporary lack of resources.
Instead of managing the FileNotFoundException as any generic
IOException it is best to rethrow the exception but prevent
the packfile for being flagged as invalid until it is actually
opened and read successfully or unsuccessfully.
David Ostrovsky [Thu, 23 Mar 2017 05:44:51 +0000 (06:44 +0100)]
bazel: Consume hamcrest through transitive dependency
In I3ab958ce8 explicit dependency in lib/BUILD were defined and most
of the bazel build implementation was switched to using it. Switch
test.bzl test implementation to using explicit dependencies as well.
Change-Id: I4413d1a45addeeb2a980d07669fa034c2eebb3a4 Signed-off-by: David Ostrovsky <david@ostrovsky.org>
bazel test --test_tag_filters=api,dfs,revplot,treewalk //...
Change-Id: Ic41b05a79d855212e67b1b4707e9c6b4dc9ea70d Signed-off-by: David Ostrovsky <david@ostrovsky.org> Signed-off-by: Jonathan Nieder <jrn@google.com>
Jonathan Nieder [Mon, 20 Mar 2017 01:52:55 +0000 (18:52 -0700)]
bazel: Mark junit targets testonly
Only testonly targets (such as tests) need to use junit.
In particular this involves making the toplevel :all rule testonly.
It's not clear to me what that rule is for --- "bazel build //..."
already works to build all targets. In any case it appears to be for
testing, so marking it as testonly shouldn't be harmful.
Jonathan Nieder [Mon, 20 Mar 2017 00:41:26 +0000 (17:41 -0700)]
bazel: Add explicit targets for library dependencies
This provides a place to declare visibility restrictions and
transitive dependencies for each library.
Other targets should only declare dependencies on what they directly
use, making dependencies easier to maintain.
Trim the dependencies of org.eclipse.jgit:jgit to follow that rule.
It declares dependencies on Apache httpcomponents and the servlet
API but doesn't use them.
Tested:
* 'bazel build //...' succeeds
* applying the change https://gerrit-review.googlesource.com/90843
to a copy of Gerrit, following the instructions there, and running
'bazel test //...' in that copy of Gerrit still succeeds
David Pursehouse [Sun, 19 Mar 2017 01:23:29 +0000 (10:23 +0900)]
Fix test configuration to run RacyGitTests, and fix testRacyGitDetection
With the filename suffix "Tests", the module was not included in tests
when building with Maven, and without the @Test annotations the tests
didn't get executed under Eclipse or buck test.
testRacyGitDetection was failing because the index file did not exist.
Add the missing configuration, the missing annotations, and add a call
to reset() in testRacyGitDetection to force creation of the index file.
Change-Id: I29dd8f89c36fef4ab40bedce7f4a26bd9b2390e4 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
David Ostrovsky [Sat, 18 Mar 2017 10:29:26 +0000 (11:29 +0100)]
RevFlagSetTest: Fix compilation error flagged by error prone
This fixes error flagged by error prone:
Java compilation in rule '//org.eclipse.jgit.test:jgit' failed: Worker
process sent response with exit code: 1.
org.eclipse.jgit.test/tst/org/eclipse/jgit/revwalk/RevFlagSetTest.java:149:
error: [CollectionIncompatibleType] Argument '"bob"' should not be
passed to this method; its type String is not compatible with its
collection's type argument RevFlag
assertFalse(set.contains("bob"));
Change-Id: I4a971ce92fee55e28b2ab0c7b716ac20fa9c6709 Signed-off-by: David Ostrovsky <david@ostrovsky.org>
David Ostrovsky [Sat, 18 Mar 2017 09:41:29 +0000 (10:41 +0100)]
Move SHA1 compress/recompress files to resource folder
This fixes Bazel build:
in srcs attribute of java_library rule //org.eclipse.jgit:jgit:
file '//org.eclipse.jgit:src/org/eclipse/jgit/util/sha1/SHA1.recompress'
is misplaced here (expected .java, .srcjar or .properties).
Another option that was considered is to exclude the non source files.
Change-Id: I7083f27a4a49bf6681c85c7cf7b08a83c9a70c77 Signed-off-by: David Ostrovsky <david@ostrovsky.org>
Luca Milanesio [Fri, 10 Mar 2017 00:20:23 +0000 (00:20 +0000)]
Don't remove pack when FileNotFoundException is transient
The FileNotFoundException is typically raised in three conditions:
1. file doesn't exist
2. incompatible read vs. read/write open modes
3. filesystem locking
4. temporary lack of resources (e.g. too many open files)
1. is already managed, 2. would never happen as packs are not
overwritten while with 3. and 4. it is worth logging the exception and
retrying to read the pack again.
Log transient errors using an exponential backoff strategy to avoid
flooding the logs with the same error if consecutive retries to access
the pack fail repeatedly.
FetchCommand: Fix detection of submodule recursion mode
The submodule.name.fetchRecurseSubmodules value was being read from the
configuration of the submodule, but it should be read from the config
of the parent repository.
Also, the fetch.recurseSubmodules value from the parent repository's
configuration was not being considered at all.
Fix both of these and add tests. Now the precedence of the recurse mode
is determined as follows:
1. Value passed to the API
2. Value configured in submodule.name.fetchRecurseSubmodules
3. Value configured in fetch.recurseSubmodules
4. Default to "on demand"
* stable-4.6:
Update Jetty to 9.4.1.v20170120 in buck build
Update Jetty to 9.4.1.v20170120
Update build to use Tycho 1.0.0
Update minimum JDK version in README
Change-Id: I735697c112094e883986ce13026d967291d88494 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Jonathan Nieder [Sun, 26 Feb 2017 23:09:04 +0000 (15:09 -0800)]
Update Jetty to 9.4.1.v20170120 in buck build
5e8e2179b218ede7d14b69dc5149b0691b5859cf (Update Jetty to
9.4.1.v201470120, 2017-01-26) updated Jetty in the maven build.
Update the buck build to match so buck builds work again.
The buck build will go away soon, but in the meantime (until the bazel
build gets the same level of support) it is convenient as a faster way
of running tests than using maven.
The bazel build doesn't need this change since it doesn't build or run
http tests yet.
David Pursehouse [Mon, 13 Feb 2017 12:37:30 +0000 (21:37 +0900)]
FetchCommand: Add basic support for recursing into submodules
Extend FetchCommand to expose a new method, setRecurseSubmodules(mode),
which allows to set the mode to ON, OFF or ON_DEMAND.
After fetching a repository, its submodules are recursively fetched:
- When the mode is YES, submodules are always fetched.
- When the mode is NO, submodules are not fetched.
- When the mode is ON_DEMAND, submodules are only fetched when the
parent repository receives an update of the submodule and the new
revision is not already in the submodule.
The mode is determined in the following order of precedence:
- Value specified in the API call using setRecurseSubmodules.
- Value specified in the repository's config under the key
submodule.name.fetchRecurseSubmodules
- Defaults to ON_DEMAND if neither of the previous is set.
Extend FetchResult to recursively include results for submodules, as
a map of the submodule path to an instance of FetchResult.
Test setup is based on testCloneRepositoryWithNestedSubmodules.
Change-Id: Ibc841683763307cb76e78e142e0da5b11b1add2a Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Thomas Wolf [Thu, 23 Feb 2017 21:49:43 +0000 (22:49 +0100)]
Make Repository.normalizeBranchName less strict
This operation was added recently with the goal to provide some
way to auto-correct invalid user input, or to provide a correction
suggestion to the user -- EGit uses it now that way. But the initial
implementation was very restrictive; it removed all non-ASCII
characters and even slashes.
Understandably end users were not happy with that. Git has no such
restriction to ASCII-only; nor does JGit. Branch names should be
meaningful to the end user, and if a user-supplied branch name is
invalid for technical reasons, a "normalized" name should still
be meaningful to the user.
Rewrite to attempt a minimal fix such that the result will pass
isValidRefName.
* Replace all Unicode whitespace by underscore.
* Replace troublesome special characters by dash.
* Collapse sequences of underscores, dots, and dashes.
* Remove underscores, dots, and dashes following slashes, and
collapse sequences of slashes.
* Strip leading and trailing sequences of slashes, dots, dashes,
and underscores.
* Avoid the ".lock" extension.
* Avoid the Windows reserved device names.
* If input name is null return an empty String so callers don't need to
check for null.
This still allows branch names with single slashes as separators
between components, avoids some pitfalls that isValidRefName() tests
for, and leaves other character untouched and thus allows non-ASCII
branch names.
Also move the function from the bottom of the file up to where
isValidRefName is implemented.
Bug: 512508
Change-Id: Ia0576d9b2489162208c05e51c6d54e9f0c88c3a7 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Sat, 25 Feb 2017 19:43:42 +0000 (11:43 -0800)]
SHA-1: collision detection support
Update SHA1 class to include a Java port of sha1dc[1]'s ubc_check,
which can detect the attack pattern used by the SHAttered[2] authors.
Given the shattered example files that have the same SHA-1, this
modified implementation can identify there is risk of collision given
only one file in the pair:
When JGit detects probability of a collision the SHA1 class now warns
on the logger, reporting the object's SHA-1 hash, and then throws a
Sha1CollisionException to the caller.
From the paper[3] by Marc Stevens, the probability of a false positive
identification of a collision is about 14 * 2^(-160), sufficiently low
enough for any detected collision to likely be a real collision.
git-core[4] may adopt sha1dc before the system migrates to an entirely
new hash function. This commit enables JGit to remain compatible with
that move to sha1dc, and help protect users by warning if similar
attacks as SHAttered are identified.
Performance declined about 8% (detection off), now:
This decline in throughput is attributed to the step loop unrolling in
compress(), which was necessary to easily fit the UbcCheck logic into
the hash function. Using helper functions s1-s4 reduces the code
explosion, providing acceptable throughput.
sha1dc (native C) ~206.28 MiB/s
sha1dc (native C) ~204.47 MiB/s
sha1dc (native C) ~203.74 MiB/s
Average time across 100,000 calls to hash 4100 bytes (such as a commit
or tree) for the various algorithms available to JGit also shows SHA1
is slower than MessageDigest, but by an acceptable margin:
Being implemented in Java with these additional safety checks is
clearly a penalty, but throughput is still acceptable given the
increased security against object name collisions.
Magnus Vigerlöf [Sat, 18 Feb 2017 18:28:39 +0000 (19:28 +0100)]
Correct the boolean logic for filtering paths
The TreeWalk filtering classes need to support the three different
meanings of the return value the path comparison generates.
A new path comparison method (isPathMatch) is created with
three distinct return values (isPathPrefix use value '0' to
encode two of these) which will makes it possible for the logical
operators (especially NOT) to aggregate a correct verdict.
A filter like: AND(Path("path"), NOT(Path("path/to/other")))
Should filter out 'path/to/other/file', but not 'path/to/my/file'.
The path-limiting feature when testing path/to/my/file, would
result to run test for the following paths:
path
path/to
path/to/my
path/to/my/file
isPathPrefix('path/to/other') will return '0' for the first two
and since there is no way for NOT to distinguish between an exact
match and a match indicating that the tested path is a 'parent',
it will incorrectly return false and thus remove everything below
'path' immediately.
isPathMatch has a distinguished value for 'parent' matches that
will be preserved through the logic operators and should not
cause an over-eager removal of paths.
The functionality of isPathPrefix is required by other parts
and is untouched.
Unit tests are included to ensure that the logical functionality
is correct and can be preserved.
Change-Id: Ice2ca9406f09f1b179569e99b86a0e5d77baa20d Signed-off-by: Magnus Vigerlöf <magnus.vigerlof@gmail.com>
Shawn Pearce [Sun, 26 Feb 2017 19:44:51 +0000 (11:44 -0800)]
SHA1: support reset() and reuse instances
Allow SHA1 instances to be reused to compute another hash value, and
resume caching them in ObjectInserter and PackParser. This shaves a
small amount of running time off parsing git.git's pack file:
before after
------ ------
25.25s 25.55s
25.48s 25.06s
25.26s 24.94s
Almost noise (small difference), but recycling the instances reduces
some stress on the memory allocator finding two 80 word message block
arrays needed for hashing and collision detection.