Change-Id: I0a86ce0e393dfde9bb27f0b29e036e76c856396e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
DirCacheCheckout and CanonicalTreeParser cooperate. CanonicalTreeParser
can detect malformed, potentially malicious tree entries and sets a
flag, while DirCacheCheckout refuses to work with such paths.
Malicious tree entries are ".", "..", ".git" (case insensitive), any
name containing '/' (and, on Windows, '\'), and also (on Windows)
any path ending in a combination of '.' or space, or containing a ':'.
We also forbid all special names like "con" etc. on Windows.
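The rules above can be sketched as a single check on one path component.
This is only an illustration, not the code in CanonicalTreeParser or
DirCacheCheckout: the class and method names are hypothetical, the list
of Windows device names is an assumption (the text only says "con" etc.),
and rejecting empty names and device names with an extension are extra
assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not the actual JGit implementation. */
final class PathComponentCheck {
    // Assumed list of reserved Windows device names.
    private static final Set<String> WINDOWS_DEVICES = new HashSet<>(Arrays.asList(
            "con", "prn", "aux", "nul",
            "com1", "com2", "com3", "com4", "com5", "com6", "com7", "com8", "com9",
            "lpt1", "lpt2", "lpt3", "lpt4", "lpt5", "lpt6", "lpt7", "lpt8", "lpt9"));

    /**
     * @param name    a single tree entry name (one path component)
     * @param windows true to apply the Windows-only rules; passing this
     *                explicitly lets the check run on any platform
     */
    static boolean isValid(String name, boolean windows) {
        // Assumption: an empty name is never acceptable.
        if (name.isEmpty() || name.equals(".") || name.equals(".."))
            return false;
        if (name.equalsIgnoreCase(".git"))
            return false;
        if (name.indexOf('/') >= 0)
            return false;
        if (!windows)
            return true;

        if (name.indexOf('\\') >= 0 || name.indexOf(':') >= 0)
            return false;
        // A name ending in any mix of '.' and ' ' necessarily ends in one of them.
        char last = name.charAt(name.length() - 1);
        if (last == '.' || last == ' ')
            return false;
        // Assumption: a device name followed by an extension ("con.txt") is also rejected.
        int dot = name.indexOf('.');
        String base = (dot < 0 ? name : name.substring(0, dot)).toLowerCase(Locale.ROOT);
        return !WINDOWS_DEVICES.contains(base);
    }
}
```

Taking the platform as a parameter, rather than detecting it, is one way
to exercise the Windows rules from a test running on another system.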
Some of the test can execute on any platform by enabling partial
platform emulation.
A new runtime exception, InvalidPathException, is introduced. For
backwards compatibility it extends IllegalArgumentException.
Change-Id: I86199105814b63d4340e5de0e471d0da6b579ead
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
There is no point in pushing all of the files within the edge
commits into the delta search when making a thin pack. This floods
the delta search window with objects that are unlikely to be useful
bases for the objects that will be written out, resulting in lower
data compression and higher transfer sizes.
Instead observe the path of a tree or blob that is being pushed
into the outgoing set, and use that path to locate up to WINDOW
ancestor versions from the edge commits. Push only those objects
into the edgeObjects set, reducing the number of objects seen by the
search window. This allows PackWriter to only look at ancestors
for the modified files, rather than all files in the project.
Limiting the search to WINDOW size makes sense, because more than
WINDOW edge objects will just skip through the window search as
none of them need to be delta compressed.
To further improve compression, sort edge objects into the front
of the window list, rather than randomly throughout. This puts
non-edges later in the window and gives them a better chance at
finding their base, since they search backwards through the window.
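The ordering idea can be sketched as a stable sort that moves edge
objects to the front while keeping the relative order inside each group.
This is only an illustration under assumptions: the Candidate type and
the edgesFirst method are hypothetical, and the real window ordering in
PackWriter considers more than this single flag.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; not the actual PackWriter ordering. */
final class EdgeFirstOrder {
    /** Hypothetical stand-in for an object queued for delta search. */
    static final class Candidate {
        final String name;
        final boolean edge;

        Candidate(String name, boolean edge) {
            this.name = name;
            this.edge = edge;
        }
    }

    /** Returns a copy of the list with all edge objects first. */
    static List<Candidate> edgesFirst(List<Candidate> in) {
        List<Candidate> out = new ArrayList<>(in);
        // Boolean.FALSE sorts before Boolean.TRUE, so the key "is not an edge"
        // puts edges at the front. List.sort is stable, so the order within
        // the edge group and within the non-edge group is preserved.
        out.sort(Comparator.comparing((Candidate c) -> !c.edge));
        return out;
    }
}
```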
These changes make a significant difference in the thin-pack:
Before:
remote: Counting objects: 144190, done
remote: Finding sources: 100% (50275/50275)
remote: Getting sizes: 100% (101405/101405)
remote: Compressing objects: 100% (7587/7587)
Receiving objects: 100% (50275/50275), 24.67 MiB | 9.90 MiB/s, done.
Resolving deltas: 100% (40339/40339), completed with 2218 local objects.
real 0m30.267s
After:
remote: Counting objects: 61549, done
remote: Finding sources: 100% (50275/50275)
remote: Getting sizes: 100% (18862/18862)
remote: Compressing objects: 100% (7588/7588)
Receiving objects: 100% (50275/50275), 11.04 MiB | 3.51 MiB/s, done.
Resolving deltas: 100% (43160/43160), completed with 5014 local objects.
real 0m22.170s
The resulting pack is 13.63 MiB smaller, even though it contains the
same exact objects. 82,543 fewer objects had to have their sizes
looked up, which saved about 8s of server CPU time. 2,796 more
objects from the client were used as part of the base object set,
which contributed to the smaller transfer size.
Change-Id: Id01271950432c6960897495b09deab70e33993a9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Fix TreeWalk bug comparing DirCache and WorkingTree with ANY_DIFF
When comparing a DirCache and a WorkingTree using ANY_DIFF we
sometimes didn't recurse into a subtree if both sides gave us
zeroId() back for the identity of a subtree. This happens when the
DirCache doesn't have a valid cache tree for the subtree, as then
it uses zeroId() for the ObjectId of the subtree, which then appears
to be equal to the zeroId() of the WorkingTreeIterator's subtree.
We work around this by adding a hasId() method that returns true
only if this iterator has a valid ObjectId. The idEquals method
on TreeWalk then only performs a comparison between two iterators if
both iterators have a valid id.
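A minimal sketch of the guard, using a hypothetical Iter type in place
of the real iterator classes. Treating "no valid id" as "not equal" is
an assumption made here so that the caller descends into the subtree.

```java
import java.util.Arrays;

/** Hypothetical sketch; not the actual TreeWalk code. */
final class IdEqualsSketch {
    /** All-zero id, standing in for zeroId(). */
    static final byte[] ZERO = new byte[20];

    /** Hypothetical stand-in for a tree iterator positioned on a subtree. */
    static final class Iter {
        final byte[] id;
        final boolean hasId;

        Iter(byte[] id, boolean hasId) {
            this.id = id;
            this.hasId = hasId;
        }
    }

    static boolean idEquals(Iter a, Iter b) {
        // Two placeholder zero ids must not be mistaken for identical content,
        // so compare only when both sides really know their id.
        if (!a.hasId || !b.hasId)
            return false;
        return Arrays.equals(a.id, b.id);
    }
}
```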
Change-Id: I695f7fafbeb452e8c0703a05c02921fae0822d3f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This allows callers to force the iterator back to its starting point,
so it can be traversed again. The default way to do this is to use
back(1) until first() is true, but this isn't very efficient for any
iterator. All current implementations have better ways to implement
reset without needing to seek backwards.
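The difference between the default and a specialized reset can be
sketched with a hypothetical cursor type. The names below are
illustrative only and do not mirror the real iterator classes; the steps
counter exists just to make the cost visible.

```java
/** Hypothetical sketch; not the actual tree iterator API. */
final class ResetSketch {
    /** Cursor whose reset() uses the generic step-backwards default. */
    static class Cursor {
        protected int pos;
        int steps; // number of back() calls made so far

        boolean first() {
            return pos == 0;
        }

        void next(int delta) {
            pos += delta;
        }

        void back(int delta) {
            pos -= delta;
            steps++;
        }

        /** Default: walk backwards one entry at a time until the start. */
        void reset() {
            while (!first())
                back(1);
        }
    }

    /** Cursor that knows how to jump straight back to its start. */
    static final class FastCursor extends Cursor {
        @Override
        void reset() {
            pos = 0;
        }
    }
}
```

With the default, the cost of reset grows with the distance from the
start; the override does the same job in one assignment.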
Change-Id: Ia26e6c852fdac8a0e9c80ac72c8cca9d897463f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Remove gitIgnoreTimestamp from abstract iterator API
This never should have been exposed on the top of the
AbstractTreeIterator type hierarchy. There is no concept of a
timestamp in a canonical tree read from the object database, and
the time in the DirCache isn't what we want here either.
Actually all that we need is to find the files whose names are
".gitignore" and are below the root directory. We can accomplish
that with a suffix filter, and process them immediately.
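As a rough sketch of the suffix idea, using plain strings rather than a
tree walk: the class and method names are hypothetical, and matching on
"/.gitignore" assumes that "below the root directory" means files inside
a subdirectory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual filter used by the change. */
final class IgnoreFileScan {
    /** Returns the paths that name a .gitignore file inside some subdirectory. */
    static List<String> nestedIgnoreFiles(List<String> paths) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            // The leading '/' rules out names such as "src/a.gitignore" and,
            // under the assumption above, the .gitignore at the root.
            if (p.endsWith("/.gitignore"))
                out.add(p);
        }
        return out;
    }
}
```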
Change-Id: Ib09cbf81a9e038452ce491385c65498312e2916b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Charley Wang <chwang@redhat.com>
CC: Chris Aniszczyk <caniszczyk@gmail.com>
CC: Stefan Lay <stefan.lay@sap.com>
CC: Matthias Sohn <matthias.sohn@sap.com>
This patch adds ignore compatibility to JGit. It encompasses
exclude files as well as .gitignore. It uses TreeWalk and
FileTreeIterator to find nodes and parses .gitignore
files when required. The patch includes a simple cache that
can be used to save results and avoid excessive gitignore
parsing.
CQ: 4302
Bug: 303925
Change-Id: Iebd7e5bb534accca4bf00d25bbc1f561d7cad11b
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
PackWriter wants to categorize objects that are similar in path name,
so blobs that are probably from the same file (or same sort of file)
can be delta compressed against each other. Avoid converting into
a string by performing the hashing directly against the path buffer
in the tree iterator.
We only hash the last 16 bytes of the path, and we try to avoid any
spaces, as we want the suffix of a file such as ".java" to be more
important than the directory it is in, like "src".
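One way such a hash could look is sketched below. The class name, the
String convenience overload, and the exact mixing step are assumptions
for illustration, not necessarily what the tree iterator does. Each step
shifts the accumulated value right and adds the new byte at the top, so
bytes nearer the end of the path carry more weight.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a suffix-weighted path hash. */
final class PathHashSketch {
    /** Hashes at most the last 16 bytes of path[ptr, end), skipping spaces. */
    static int pathHash(byte[] path, int ptr, int end) {
        int hash = 0;
        for (int i = Math.max(ptr, end - 16); i < end; i++) {
            byte c = path[i];
            if (c != ' ')
                hash = (hash >>> 2) + (c << 24);
        }
        return hash;
    }

    /** Convenience overload for experiments; real callers would pass the buffer. */
    static int pathHash(String path) {
        byte[] raw = path.getBytes(StandardCharsets.UTF_8);
        return pathHash(raw, 0, raw.length);
    }
}
```

Under this sketch, two paths that share their last 16 bytes hash to the
same value regardless of the directories in front of them.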
Change-Id: I31770ee711526306769a6f534afb19f937e0ba85
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We don't actually need a Repository object here, just an ObjectReader
that can load content for us. So change the API to depend on that.
However, this breaks the asCommit and asTag legacy translation methods
on RevCommit and RevTag, so we still have to keep the Repository
inside of RevWalk for those two types. Hopefully we can drop those in
the future, and then drop the Repository off the RevWalk.
Change-Id: Iba983e48b663790061c43ae9ffbb77dfe6f4818e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The WindowCache is an implementation detail of PackFile and how it's
used by ObjectDirectory. Let's start to hide it and replace the public
API with a more generic concept, ObjectReader.
Because PackedObjectLoader is also considered a private detail of
PackFile, we have to make PackWriter temporarily dependent upon the
WindowCursor and thus FileRepository and ObjectDirectory in order to
just start the refactoring. In later changes we will clean up the
APIs more, exposing sufficient support to PackWriter without needing
the file specific implementation details.
Change-Id: I676be12b57f3534f1285854ee5de1aa483895398
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As discussed on the egit-dev mailing list, we prefer not to have
trailing whitespace in our source code. Correct all currently
offending lines by trimming them.
Change-Id: I002b1d1980071084c0bc53242c8f5900970e6845
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>