Replace usage of ArrayIndexOutOfBoundsException in treewalk
Using exceptions during normal operations - for example with the
desire of expanding an array in the failure case - can have a
severe performance impact. When exceptions are instantiated,
a stack trace is collected. Generating stack trace can be expensive.
Compared to that, checking an array for length - even if done many
times - is cheap since this is a check that can run in just a
handful of CPU cycles.
Change-Id: Ifaf10623f6a876c9faecfa44654c9296315adfcb
Signed-off-by: Patrick Hiesel <hiesel@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
WorkingTreeIterator: handle different timestamp resolutions
Older JGit stored only milliseconds timestamps in the index. Newer
JGit may get finer timestamps from the file system. This leads to
slow index diffs when a new JGit runs against an index produced
by older JGit because many timestamps will differ and JGit will
then do many content checks. See [1].
Handle this migration case by only comparing milliseconds if the
index entry has only millisecond precision.
The inverse may also occur; also compare only milliseconds if the
file timestamp has only millisecond precision.
Do the same also for microsecond resolution. On Windows, NTFS may
provide 100ns resolution and may be used by external programs writing
the index, but Java's WindowsFileAttributes may provide only
microseconds.
File timestamp precision in Java depends not only on the Java APIs
used by different JGit versions but may also change when running the
same Java code on different VMs. And of course the resolution may
vary among operating and file systems. Moreover, timestamp precision
in the index depends on the program that wrote the index. Canonical
git may use a different resolution, maybe even different between git
versions.
[1] https://www.eclipse.org/forums/index.php/t/1100344/
Change-Id: Idfd08606c883cb98787b2138f9baf0cc89a57b56
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix WorkingTreeIterator.compareMetadata() for CheckStat.MINIMAL
If CheckStat is MINIMAL or timestamps have no nanosecond part
WorkingTreeIterator.compareMetaData only checks the second part of
timestamps and ignores nanoseconds which may have ended up in the index
by using native git.
If
fileLastModified.getEpochSecond() == cacheLastModified.getEpochSecond()
we currently proceed comparing fileLastModified and cacheLastModified
with full precision which is wrong since we determined that we detected
reduced timestamp resolution.
Fix this and also handle smudged index entries for CheckStat.MINIMAL.
Change-Id: I6149885903ac63d79b42d234cc02aa4e19578f3c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Use Instant instead of milliseconds for filesystem timestamp handling
This enables higher file timestamp resolution on filesystems like ext4,
Mac APFS (1ns) or NTFS (100ns) providing high timestamp resolution on
filesystem level.
Note:
- on some OSes Java 8,9 truncate milliseconds, see
https://bugs.openjdk.java.net/browse/JDK-8177809, fixed in Java 10
- UnixFileAttributes truncates timestamp resolution to microseconds when
converting the internal representation to FileTime exposed in the API,
see https://bugs.openjdk.java.net/browse/JDK-8181493
- WindowsFileAttributes also provides only microsecond resolution
Change-Id: I25ffff31a3c6f725fc345d4ddc2f26da3b88f6f2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
After cloning a repo with a submodule, non-recursively, JGit would
encounter in its TreeWalk in IndexDiff:
* first, a missing gitlink (in index & HEAD, not in working tree)
* second, the untracked folder (not in index and head, in working tree)
As a result, it would report the submodule as missing. Canonical git
reports a clean workspace.
The root cause of this is that the path of a gitlink "x" did
not compare equal to the path of a tree "x" in JGit.
Correct Paths.compare() to account for that. If two paths are otherwise
equal, then let gitlinks match both trees and files. Matching trees
solves the bug. Matching files is necessary to handle the case where
the gitlink directory was replaced by a file; see the new test case
IndexDiffSubmoduleTest.testSubmoduleReplacedByFile(). Comparisons of
unequal paths are left untouched, so the sort order is unchanged.
After the fix, another bug(?) in WorkingTreeIterator became apparent:
with core.dirNoGitLinks = true, it was no longer possible to overwrite
a gitlink in the index. This is now fixed in WorkingTreeIterator.
Add new test cases for the bug itself and for some related cases
(submodule directory deleted or replaced by a file) in
IndexDiffSubmoduleTest. Add a test for missing files in IndexDiffTest,
and adapt the PathsTest to test matching gitlinks.
Bug: 467631
Change-Id: I0549d10d46b1858e5eec3cc15095aa9f1d5f5280
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Fix replacement quoting for replaceAll in filter command
According to String.replaceAll JavaDoc:
"Note that backslashes (\) and dollar signs ($) in the replacement
string may cause the results to be different than if it were being
treated as a literal replacement string; see Matcher.replaceAll. Use
java.util.regex.Matcher.quoteReplacement to suppress the special meaning
of these characters, if desired."
Bug: 536318
Change-Id: Ib70cfec41bf73e14d23d94d14aee05a25b1e87f6
Signed-off-by: Markus Duft <markus.duft@ssi-schaefer.com>
Make FileTreeIterator not enter ignored directories by default. We
only need to enter ignored directories if we do some operation against
git, and there is at least one tracked file underneath an ignored
directory.
Walking ignored directories should be avoided as much as possible as
it is a potential performance bottleneck. Some projects have a lot of
files or very deep hierarchies in ignored directories; walking those
may be costly (especially so on Windows). See for instance also bug
500106.
Provide a FileTreeIterator.setWalkIgnoredDirectories() operation to
force the iterator to iterate also through otherwise ignored
directories. Useful for tests (IgnoreNodeTest, CGitIgnoreTest), or
to implement things like "git ls-files --ignored".
Add tests in DirCacheCheckoutTest, and amend IndexDiffTest to test a
little bit more.
Bug: 388582
Change-Id: I6ff584a42c55a07120a4369fd308409431bdb94a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Remove it from
* package private functions.
* try blocks
* for loops
this was done with the following python script:
$ cat f.py
import sys
import re
import os
def replaceFinal(m):
return m.group(1) + "(" + m.group(2).replace('final ', '') + ")"
methodDecl = re.compile(r"^([\t ]*[a-zA-Z_ ]+)\(([^)]*)\)")
def subst(fn):
input = open(fn)
os.rename(fn, fn + "~")
dest = open(fn, 'w')
for l in input:
l = methodDecl.sub(replaceFinal, l)
dest.write(l)
dest.close()
for root, dirs, files in os.walk(".", topdown=False):
for f in files:
if not f.endswith('.java'):
continue
full = os.path.join(root, f)
print full
subst(full)
Change-Id: If533a75a417594fc893e7c669d2c1f0f6caeb7ca
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Significantly speed up FileTreeIterator on Windows
Getting attributes of files on Windows is an expensive operation.
Windows stores file attributes in the directory, so they are
basically available "for free" when a directory is listed. The
implementation of Java's Files.walkFileTree() takes advantage of
that (at least in the OpenJDK implementation for Windows) and
provides the attributes from the directory to a FileVisitor.
Using Files.walkFileTree() with a maximum depth of 1 is thus a
good approach on Windows to get both the file names and the
attributes in one go.
In my tests, this gives a significant speed-up of FileTreeIterator
over the "normal" way: using File.listFiles() and then reading the
attributes of each file individually. The speed-up is hard to
quantify exactly, but in my tests I've observed consistently 30-40%
for staging 500 files one after another, each individually, and up
to 50% for individual TreeWalks with a FileTreeIterator.
On Unix, this technique is detrimental. Unix stores file attributes
differently, and getting attributes of individual files is not costly.
On Unix, the old way of doing a listFiles() and getting individual
attributes (both native operations) is about three times faster than
using walkFileTree, which is implemented in Java.
Therefore, move the operation to FS/FS_Win32 and call it from
FileTreeIterator, so that we can have different implementations
depending on the file system.
A little performance test program is included as a JUnit test (to be
run manually).
While this does speed up things on Windows, it doesn't solve the basic
problem of bug 532300: the iterator always gets the full directory
listing and the attributes of all files, and the more files there are
the longer that takes.
Bug: 532300
Change-Id: Ic5facb871c725256c2324b0d97b95e6efc33282a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Open auto-closeable resources in try-with-resource
When an auto-closeable resources is not opened in try-with-resource,
the warning "should be managed by try-with-resource" is emitted by
Eclipse.
Fix the ones that can be silenced simply by moving the declaration of
the variable into a try-with-resource.
In cases where we explicitly call the close() method, for example in
tests where we are testing specific behavior caused by the close(),
suppress the warning.
Leave the ones that will require more significant refcactoring to fix.
They can be done in separate commits that can be reviewed and tested
in isolation.
Change-Id: I9682cd20fb15167d3c7f9027cecdc82bc50b83c4
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Use TreeWalk#getEolStreamType(OperationType) instead.
Change-Id: I0f102ddf36102ff55a71448e376ed08743da5d1f
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
As it is right now some streams leak out of the filter construct. This
change clarifies responsibilities and fixes stream leaks
Change-Id: Ib9717d43a701a06a502434d64214d13a392de5ab
Signed-off-by: Markus Duft <markus.duft@ssi-schaefer.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Processing of negated rules, like !bin/ was not working correctly: they
were interpreted too broad, resulting in unexpected untracked files
which should actually be ignored
Bug: 409664
Change-Id: I0a422fd6607941461bf2175c9105a0311612efa0
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Transfer data in chunks of 8k Transferring data byte per byte is slow,
running checkout with CleanFilter on a 2.9MB file takes 20 seconds.
Using a buffer of 8k shrinks this time to 70ms.
Also register the filter commands in a way that the native GIT LFS can
be used alongside with JGit.
Implements auto-discovery of LFS server URL when cloning from a Gerrit
LFS server.
Change-Id: I452a5aa177dcb346d92af08b27c2e35200f246fd
Also-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Markus Duft <markus.duft@ssi-schaefer.com>
This doesn't handle the really hard thing, which is merging spurious
conflicts inside .gitmodules files. That's OK: git.git doesn't
either. Users can resolve the conflict themselves and then commit
the merge.
Previously, jgit would crash when attempting to merge conflicting
submodule changes. Even if there was no conflict, after a merge which
adds submodules, the repository would have been missing empty
directories for newly-added submodules.
This patch fixes the crash, and adds the empty directories where
necessary. It ensures that the index is in a conflicted state when
submodule changes conflict.
Reported-by: Alexey Korobkov
Bug: 494551
Change-Id: I79db6798c2bdcc1159b5b2589b02da198dc906a1
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
TreeWalk: Make getEolStreamType(OperationType) public
and deprecate getEolStreamType().
This resolves a TODO that was apparently supposed to be done in
version 4.4.
Change-Id: I5c9861aedabdc3f99dcf47519b3959a979e6a591
Happily, most anonymous SectionParser implementations can be replaced
with FooConfig::new, as long as the constructor takes a single Config
arg. Many of these, the non-public ones, can in turn be inlined. A few
remaining SectionParsers can be lambdas.
Change-Id: I3f563e752dfd2007dd3a48d6d313d20e2685943a
This is passed directly to the super constructor, where it is also
@Nullable. Marking it here saves the reader a jump.
Change-Id: Icc8db2f2dc6aae6e591aa4f09a3c283336a5424c
The TreeWalk filtering classes need to support the three different
meanings of the return value the path comparison generates.
A new path comparison method (isPathMatch) is created with
three distinct return values (isPathPrefix use value '0' to
encode two of these) which will makes it possible for the logical
operators (especially NOT) to aggregate a correct verdict.
A filter like: AND(Path("path"), NOT(Path("path/to/other")))
Should filter out 'path/to/other/file', but not 'path/to/my/file'.
The path-limiting feature when testing path/to/my/file, would
result to run test for the following paths:
path
path/to
path/to/my
path/to/my/file
isPathPrefix('path/to/other') will return '0' for the first two
and since there is no way for NOT to distinguish between an exact
match and a match indicating that the tested path is a 'parent',
it will incorrectly return false and thus remove everything below
'path' immediately.
isPathMatch has a distinguished value for 'parent' matches that
will be preserved through the logic operators and should not
cause an over-eager removal of paths.
The functionality of isPathPrefix is required by other parts
and is untouched.
Unit tests are included to ensure that the logical functionality
is correct and can be preserved.
Change-Id: Ice2ca9406f09f1b179569e99b86a0e5d77baa20d
Signed-off-by: Magnus Vigerlöf <magnus.vigerlof@gmail.com>
Generate names for objects using only the pure Java SHA1
implementation, but continue using MessageDigest in tests.
This opens the possibility of changing the hashing function
to incorporate additional safety measures, such as those
used in sha1dc[1].
Since MessageDigest has higher throughput, continue using
MessageDigest for computing pack, idx and DirCache trailers.
These are less likely to be sensitive to SHAttered[2] types
of attacks, as Git uses them to detect random bit flips
during transfer, and not for content identity.
[1] https://github.com/cr-marcstevens/sha1collisiondetection
[2] https://shattered.it/
Change-Id: If6da98334201f7f20cb916e46f782c45f373784e
Enable and fix warnings about redundant specification of type arguments
Since the introduction of generic type parameter inference in Java 7,
it's not necessary to explicitly specify the type of generic parameters.
Enable the warning in Eclipse, and fix all occurrences.
Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Enable and fix 'Should be tagged with @Override' warning
Set missingOverrideAnnotation=warning in Eclipse compiler preferences
which enables the warning:
The method <method> of type <type> should be tagged with @Override
since it actually overrides a superclass method
Justification for this warning is described in:
http://stackoverflow.com/a/94411/381622
Enabling this causes in excess of 1000 warnings across the entire
code-base. They are very easy to fix automatically with Eclipse's
"Quick Fix" tool.
Fix all of them except 2 which cause compilation failure when the
project is built with mvn; add TODO comments on those for further
investigation.
Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Fix symlink content comparison on MacOS in tree walk
Symlinks on MacOS are written as UTF-8 NFD, but
readSymbolicLink().toString() converts to NFC with potentially fewer
bytes. May occur in particular if the link target has non-ASCII
characters for which the NFC and NFD encodings differ. This may lead
to an EOFException: Short read of block.
This causes all kinds of weird effects in EGit, ranging from failing
rebases (which report the exception to the user) to EGit decorations in
the navigator silently disappearing (and never coming back).
* Rename readContentAsNormalizedString() to readSymlinkTarget() as it's
called only for symlinks. Also make it protected.
* Fix by allowing the read to succeed even if less than the expected
number of bytes are returned by the entry's input stream.
* Override in FileTreeIterator to use fs.readSymlink() directly.
Includes a new MacOS-only test.
Change-Id: I264c5972d67b1cbb1ed690580f5706e671b9affd
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Adds a JGit built-in implementation of the "git lfs smudge" filter. This
filter should do the same as the one described in [1] besides that it
only supports the local case when the lfs objects are already present in
the media directory. Remote cases where download of LFS objects from an
LFS server is needed will be done in a later commit.
[1] https://github.com/github/git-lfs/blob/master/docs/man/git-lfs-smudge.1.ronn
Change-Id: I8ff661d4edd3667ef7f86f3b4fa33e568eb4c8f4
Add configuration parameter to enable built-in hooks/filters
If the configuration parameter filter.<filterDriverName>.useJGitBuiltin
is set to true then for all corresponding filters JGit will try to
execute the built-in filter instead of the filter-command which is
defined in git configuration. It will fallback to the non-built-in
filters if no built-in filters are registered or if constructing them
leads to exceptions. If set to false JGit will not try to execute
built-in filters for the specified filter driver.
Example: The configuration contains the following lines
[filter "lfs"]
clean = git-lfs clean -- %f
smudge = git-lfs smudge -- %f
useJGitBuiltin = true
Addtionally the .gitattributes file in the root of the working tree
contains:
*.bin filter=lfs
In this case when new content is added similar to "git add 1.bin" then
the following will happen:
- jgit will check whether a built-in command factory was registered
for the command "jgit://builtin/lfs/clean". If that is true the
factory is used to create a built-in filter command and that
command is used to filter the content
- Otherwise jgit will call the external program "git lfs clean ..."
to do the filtering
Change-Id: Idadb1db06b1e89e7031d7ed6319904973c367d38
JGit supports clean filters defined in repository configuration. The
filters are implemented as external programs filtering content by
accepting the original content (as seen in the working tree) on stdin
and which emit the filtered content on stdout. To run such a filter JGit
has to start an external process and pump data into/from this process.
This commit adds support for clean filters which are implemented
in Java and which are executed by jgit's main thread. When a filter is
defined in the configuration as
"jgit://builtin/<filterDriverName>/clean" then JGit will lookup in a
static map whether a filter is registered under this name. If found
such a filter is called to do the filtering.
The functionality in this commit requires that a program using JGit
explicitly calls the JGit API to register built-in implementations for
specific clean filters. In follow-up commits configuration parameters
will be added which trigger such registrations. Other commits will add
implementations for lfs filters.
Change-Id: I0344d3c54801c9a46e5a606c5df17e5f2e17b2be
Don't check lastModified, length on folders for submodules
The metadata comparison of submodules is not reliable because of the
last modified timestamp and directory length.
Bug: 498759
Change-Id: If5db69ef3868e475ac477d3e8a7750b268799b0c
Fix TreeWalk to reset attributes cache for each entry
Treewalk has a member 'attr' which caches the attributes for the current
entry. We did not reset the cache always when moving to next entry. The
effect was that when there are no attributes for an entry 'a' but 'a'
was skipped by a Treewalk filter then Treewalk stopped looking for
attributes until TreeWalk.next() was called again.
Change-Id: Ied39b7fb5f56afe7a237da17801003d0abe6b1c7
Fix computation of id in WorkingTreeIterator with autocrlf and smudging
JGit failed to do checkouts when the index contained smudged entries and
autocrlf was on. In such cases the WorkingTreeIterator calculated the
SHA1 sometimes on content which was not correctly filtered. The SHA1 was
computed on content which two times went through a lf->crlf conversion.
We used to tell the treewalk whether it is a checkin or checkout
operation and always use the related filters when reading any content.
If on windows and autocrlf is true and we do a checkout operation then
we always used a lf->crlf conversion on any text content. That's not
correct. Even during a checkout we sometimes need the crlf->lf
conversion. E.g. when calculating the content-id for working-tree
content we need to use crlf->lf filtering although the overall operation
type is checkout.
Often this bug does not have effects because we seldom compute the
content-id of filesystem content during a checkout. But we do need to
know whether a file is dirty or not before we overwrite it during a
checkout. And if the index entries are smudged we don't trust the index
and compute filesystem-content-sha1's explicitly.
This caused EGit not to be able to switch branches anymore on Windows
when autocrlf was true. EGit denied the checkout because it thought
workingtree files are dirty because content-sha1 are computed on wrongly
filtered content.
Bug: 493360
Change-Id: I1072a57b4c529ba3aaa50b7b02d2b816bb64a9b8
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Implement the DIR_NO_GITLINKS setting with the same functionality
it provides in cGit.
Bug: 436200
Change-Id: I8304e42df2d7e8d7925f515805e075a92ff6ce28
Signed-off-by: Preben Ingvaldsen <preben@puppetlabs.com>
This commit introduces a FileModeStrategy to
the FileTreeIterator class. This provides a way to
allow different modes of traversing a file tree;
for example, to control whether or not a nested
.git directory should be treated as a gitlink.
Bug: 436200
Change-Id: Ibf85defee28cdeec1e1463e596d0dcd03090dddd
Signed-off-by: Preben Ingvaldsen <preben@puppetlabs.com>
TreeWalk provides the new method getEolStreamType. This new method can
be used with EolStreamTypeUtil in order to create a wrapped InputStream
or OutputStream when reading / writing files. The implementation
implements support for the git configuration options core.crlf, core.eol
and the .gitattributes "text", "eol" and "binary"
CQ: 10896
Bug: 486563
Change-Id: Ie4f6367afc2a6aec1de56faf95120fff0339a358
Signed-off-by: Ivan Motsch <ivan.motsch@bsiag.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>