UploadPack: Verify clients send only commits for shallow lines
If a client mistakenly tries to send a tag object as a shallow line
JGit blindly assumes this is a commit and tries to parse the tag
buffer using the commit parser. This can cause an obtuse error like:
InvalidObjectIdException: Invalid id: t c0ff331234...
The "t" comes from the "object c0ff331234..." line of the tag being
parsed as though it were the "tree" line of a commit.
Run any client supplied shallow lines through the RevWalk to look up
the object types. Fail fast with a protocol exception if any of them
are non-commit.
Skip objects not known to this repository. This matches behavior
with git-core's upload-pack, which silently skips over any shallow
line object named by the client but not known by the server.
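The check described above can be sketched in plain Java. This is only an illustration: the `ShallowLineCheck` class, the `ObjType` enum and the in-memory map standing in for the object database and RevWalk are hypothetical, and an unchecked exception stands in for the protocol exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types; these are not JGit classes.
class ShallowLineCheck {
    enum ObjType { COMMIT, TREE, BLOB, TAG }

    /**
     * Returns the shallow ids that are known commits.
     * Ids unknown to the repository are skipped; a known id that is
     * not a commit makes the whole request fail immediately.
     */
    static List<String> verify(List<String> shallowIds,
            Map<String, ObjType> knownObjects) {
        List<String> commits = new ArrayList<>();
        for (String id : shallowIds) {
            ObjType type = knownObjects.get(id);
            if (type == null) {
                continue; // not known to this repository: skip silently
            }
            if (type != ObjType.COMMIT) {
                // Stand-in for the protocol exception mentioned above.
                throw new IllegalStateException(
                        "invalid shallow object " + id + ", expected commit");
            }
            commits.add(id);
        }
        return commits;
    }
}
```

Failing on the first non-commit keeps the error close to its cause, instead of surfacing later as a confusing parse error.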
When we have a URI that contains an empty path component (that is,
it only contains a "/"), we want to fall back to the host as the
humanish name.
This change is according to the behavior of upstream git, which
falls back on the hostname when guessing directory names for
newly cloned repositories (see [1] for the discussion).
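A rough sketch of the fallback, assuming a URI that java.net.URI can parse. The `HumanishName` class and its `guess` method are hypothetical and much simpler than the real URIish logic; they only show the "empty path means use the host" rule.

```java
import java.net.URI;

// Hypothetical helper; not the URIish implementation.
class HumanishName {
    static String guess(String uri) {
        URI u = URI.create(uri);
        String path = u.getPath() == null ? "" : u.getPath();
        // Drop trailing slashes so "/" and "" are treated alike.
        while (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        if (path.isEmpty()) {
            return u.getHost(); // empty path component: fall back to the host
        }
        String last = path.substring(path.lastIndexOf('/') + 1);
        if (last.endsWith(".git")) {
            last = last.substring(0, last.length() - ".git".length());
        }
        return last;
    }
}
```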
Matthias Sohn [Mon, 13 Jul 2015 23:55:58 +0000 (01:55 +0200)]
Update org.apache.httpcomponents
- update org.apache.httpcomponents.httpcore to 4.3.3
- update org.apache.httpcomponents.httpclient to 4.3.6; 4.3.5 and later
are reported to fix vulnerability CVE-2014-3577
Dave Borowitz [Wed, 2 Sep 2015 19:04:33 +0000 (15:04 -0400)]
PushCertificateStore: Don't add no-op command to batch
If no refs match the input list and we are writing to a batch,
the returned new commit from write() will match the current commit.
Adding a command to the batch for this case is harmless as it will
succeed, but it's more straightforward to just skip adding a command
in this case.
Add tests for the combination of saving matching refs and saving to a
batch.
Matthias Sohn [Mon, 31 Aug 2015 07:38:18 +0000 (09:38 +0200)]
Update uses-clauses in OSGi manifests
In Bug 476164 it was reported that EGit doesn't start when the platform
comes with jsch 0.1.51 while this version of EGit/JGit brings jsch
0.1.53. This could be caused by outdated uses-clauses. Hence recompute
them using PDE tooling.
Bug: 476164
Change-Id: I185ba097884ead9cd034eba842bd3bf34181a99b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Fri, 28 Aug 2015 11:09:50 +0000 (13:09 +0200)]
Use java.io.File to check existence of loose objects in ObjectDirectory
It was reported in [1] that 197e3393a51424fae45e51dce4a649ba26e5a368 led
to a performance regression in a BFG benchmark. Analysis showed that
this is caused by the exists() method in FS_POSIX, now overriding the
default implementation in FS. The default implementation of FS.exists()
uses java.io.File.exists(), while the new implementation in FS_POSIX
uses java.nio.file.Files.exists() - by simply removing the override in
FS_POSIX, performance was restored.
Profiling showed that java.nio.file.Files.exists() is substantially
slower than java.io.File.exists(), to the point where the exists() call
doubles the average cost of a call to
ObjectDirectory.insertUnpackedObject() - which the BFG uses a lot,
because it's rewriting history. Average times measured on Ubuntu were:
The loose object exists test should be using java.io.File and not FS.
ObjectDirectory uses FS.resolve() to traverse symlinks to objects but
then once inside objects all 256 sharded directories should be real
directories, and the object files should be real files, not dangling
symlinks. java.io.File.exists() is sufficient here, and faster.
Change ObjectDirectory to use File.exists() once it has computed the File
handle.
This does mean JGit cannot run ObjectDirectory code on an abstract
virtual filesystem plugged into NIO2. If you really want to run JGit on
an esoteric non-standard filesystem like "in memory" you should look at
the DFS storage backend, which has fewer abstraction points to deal
with. Or write your own from scratch.
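As an illustration of the loose object test, the sketch below maps an object id to its sharded path and asks java.io.File whether it exists. The `LooseObjectCheck` class and its method names are hypothetical, not the ObjectDirectory code.

```java
import java.io.File;

// Illustrative helper; not the ObjectDirectory implementation.
class LooseObjectCheck {
    /** Maps a 40-character hex id to objectsDir/xx/yyyy... (256 shards). */
    static File fileFor(File objectsDir, String hexId) {
        File shard = new File(objectsDir, hexId.substring(0, 2));
        return new File(shard, hexId.substring(2));
    }

    /** Plain java.io.File.exists(); no symlink-aware FS abstraction. */
    static boolean hasLooseObject(File objectsDir, String hexId) {
        return fileFor(objectsDir, hexId).exists();
    }
}
```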
Martin Fick [Wed, 19 Aug 2015 21:05:54 +0000 (15:05 -0600)]
Handle stale file handles on packed-refs file
On a local filesystem the packed-refs file will be orphaned if it is
replaced by another client while the current client is reading the old
one. However, since NFS servers do not keep track of open files, instead
of orphaning the old packed-refs file, such a replacement will cause the
old file to be garbage collected instead. A stale file handle exception
will be raised on NFS servers if the file is garbage collected (deleted)
on the server while it is being read. Since we no longer have access to
the old file in these cases, the previous code would just fail. However,
in these cases, reopening the file and rereading it will succeed (since
it will reopen the new replacement file). So retrying the read is a
viable strategy to deal with stale file handles on the packed-refs file;
implement such a strategy.
Since it is possible that the packed-refs file could be replaced again
while rereading it (multiple consecutive updates can easily occur with
ref deletions), loop on stale file handle exceptions, up to 5 extra
times, trying to read the packed-refs file again, until we either read
the new file, or find that the file no longer exists. The limit of 5 is
arbitrary, and provides a safe upper bound to prevent infinite loops
consuming resources in a potential unforeseen persistent error
condition.
Change-Id: I085c472bafa6e2f32f610a33ddc8368bb4ab1814
Signed-off-by: Martin Fick <mfick@codeaurora.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
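The bounded retry can be sketched as follows. This is a minimal illustration and not the RefDirectory code: the `StaleHandleRetry` class, its `Reader` interface and the message-based `isStale` test are assumptions made for the example.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical sketch of "retry on stale file handle, at most 5 extra times".
class StaleHandleRetry {
    static final int MAX_STALE_RETRIES = 5;

    /** Hypothetical callback that opens and reads the file from scratch. */
    interface Reader<T> {
        T read() throws IOException;
    }

    // Assumed detection rule: look at the exception message only.
    static boolean isStale(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        String lower = msg.toLowerCase(Locale.ROOT);
        return lower.contains("stale") && lower.contains("file handle");
    }

    static <T> T readWithRetries(Reader<T> reader) throws IOException {
        int retries = 0;
        while (true) {
            try {
                return reader.read(); // reopens the (possibly replaced) file
            } catch (IOException e) {
                if (isStale(e) && retries < MAX_STALE_RETRIES) {
                    retries++;
                    continue;
                }
                throw e; // not stale, or the retry budget is used up
            }
        }
    }
}
```

With this shape a persistent error costs at most six read attempts before the exception reaches the caller.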
Martin Fick [Tue, 25 Aug 2015 13:48:50 +0000 (07:48 -0600)]
Add public isStaleFileHandle() API, improve detection.
Add a public API to the FileUtils to determine if an IOException is a
stale NFS file handle exception. This will make it easier to detect
such errors, and interpret them consistently throughout the codebase.
This new API is a bit more lenient in its detection than the previous
detection, and should be able to detect some errors which previously
were not identified as stale file handle exceptions because they had the
word NFS in the error message. Adjust the packfile handling code to use
this new API for detection.
Change-Id: I21f80014546ba1afec7335890e5ae79e7f521412
Signed-off-by: Martin Fick <mfick@codeaurora.org>
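One way such a lenient check could look is sketched below. The matching rule is an assumption made for illustration (accept the message with or without the word NFS, in any letter case); the actual FileUtils implementation may differ, and the `StaleFileHandleCheck` class is hypothetical.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical sketch; the matching rule is assumed, not taken from FileUtils.
class StaleFileHandleCheck {
    static boolean isStaleFileHandle(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        String lower = msg.toLowerCase(Locale.ROOT);
        // Accept both spellings, with and without "NFS".
        return lower.contains("stale file handle")
                || lower.contains("stale nfs file handle");
    }
}
```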
Set "potentialNullReference" to "error" level and fixed all issues
There should be no functional change; the logic was updated only to
simplify the code so that the compiler can understand what is going on. Removed
all @SuppressWarnings("null") annotations since they cannot be used if
"org.eclipse.jdt.core.compiler.problem.potentialNullReference" option is
set to the "error" level.
Matthias Sohn [Wed, 19 Aug 2015 13:48:12 +0000 (15:48 +0200)]
Update com.jcraft.jsch to 0.1.53
Update target platform to Orbit M20150818205559 for Mars in order to
update com.jcraft.jsch to 0.1.53. Also update pom.xml to use Mars target
platform profile by default.
CQ: 10045
Bug: 463580
Change-Id: I1bf151fbee7b00c7bd38cf1236c9bad50e3c64bd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Sun, 16 Aug 2015 21:10:16 +0000 (14:10 -0700)]
Expose the set of root commits in PackStatistics
Root commits are commits with zero parents. If a commit has no
parents it is the first commit in the repository. In general the root
commits should be unique for any given project, as the first commit
will be created at a different time, by a different user with its own
message. These root commits can be used as a "fingerprint" to
identify disjoint histories.
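The idea can be shown on a toy commit graph. The sketch assumes the graph is given as a map from commit id to its parent ids; the `RootCommits` class is hypothetical and unrelated to the PackStatistics API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch on a toy graph; not the PackStatistics API.
class RootCommits {
    /** Root commits are the commits with zero parents. */
    static Set<String> find(Map<String, List<String>> parentsByCommit) {
        Set<String> roots = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : parentsByCommit.entrySet()) {
            if (e.getValue().isEmpty()) {
                roots.add(e.getKey());
            }
        }
        return roots;
    }

    /** Two histories with no root in common are treated as disjoint. */
    static boolean looksDisjoint(Set<String> rootsA, Set<String> rootsB) {
        return Collections.disjoint(rootsA, rootsB);
    }
}
```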
Change FS not to throw NPE when facing InMemory databases
The FS class and its subclass FS_POSIX assumed in the findHook()
method that every repository has a valid gitDir. But in tests when using
in-memory-repositories this is not true and this method was generating
NPEs. Change the method to return null if no repository directory can be
determined.
Shawn Pearce [Fri, 14 Aug 2015 04:29:30 +0000 (21:29 -0700)]
Fix NPE in DfsGarbageCollector and further reduce memory
DfsGarbageCollector asks PackWriter for the set of objects packed
after the bitmap index is written out. This is now null as it was
cleared to release memory. Instead use PackBitmapIndexBuilder to
build the set as it also has the objects.
Reduce memory in PackBitmapIndexBuilder by fully discarding the
ObjectToPack instances. This was the original intent of commit 4bb523475d44 ("PackWriter: shed memory while creating bitmaps")
but failed as the instances were still held live here.
Switch to BlockList instead of ObjectToPack[]. This allows the
JVM to allocate many smaller arrays instead of one contiguous
array with 5.2M reference pointers. In a tight heap the smaller
allocations are more feasible.
Reduce the initial EWAHCompressedBitmaps for the 4 type maps. On
average a typical repository is 30% commits, 30% trees and 30% blobs.
These bitmaps are typically very dense. PackWriter orders objects by
commit, tree, blob when writing the file so these should always be a
very dense run of 1s with some 0s before and after. So even the 1/3rd
allocation is likely too large, but the later trim() will reduce the
internal buffer anyway.
Shawn Pearce [Thu, 13 Aug 2015 05:58:26 +0000 (22:58 -0700)]
PackWriter: shed memory while creating bitmaps
Once bitmap creation begins the internal maps required for packing are
no longer necessary. On a repository with 5.2M objects this can save
more than 438 MiB of memory by allowing the ObjectToPack instances to
get garbage collected away.
Downside is the PackWriter cannot be used for any further operations
except to write the bitmap index. This is an acceptable trade-off as
in practice nobody uses the PackWriter after the bitmaps are built.
Shawn Pearce [Thu, 13 Aug 2015 06:18:24 +0000 (23:18 -0700)]
Bitmap builder: actually compress EWAH bitmaps in memory
For construction performance each new EWAHBitmap is allocated at the
roughly worst-case size the bitmap would need if all of the words must
be literal and no run length compression is available. In practice
this is far larger than is required, wasting heap memory while the
bitmaps are computed.
Trim down each bitmap to its minimum required size. This copies the
internal array to a new smaller array, allowing the GC to reclaim the
prior larger array for reuse.
A single bitmap of 5.2M bits is only 79 KiB of memory without this
trim call but 15,000 such bitmaps is 1.1 GiB. Trimming can help fit
a larger number of bitmaps during processing.
Shawn Pearce [Thu, 13 Aug 2015 05:10:35 +0000 (22:10 -0700)]
Do not retain commit body during bitmap generation
The bitmap preparer only needs commit graph topology; it does not use
the message body. Allow the RevWalk to free the body after the commit
has been parsed to save memory.
Consider original file mode while checking parent ignore rules
The WorkingTreeIterator.isEntryIgnored() should use originally requested
file mode while descending to the file tree root and checking ignore
rules. Original code asking isEntryIgnored() on a file was using
directory mode instead if the .gitignore was not located in the same
directory.
In addition to honoring the http_proxy variable to set a proxy for
HTTP, JGit should also honor the https_proxy variable to set a similar
proxy for HTTPS traffic.
Bug: 473365
Change-Id: I1002cb575e26cd842bf81ad751ec7c267b585ce2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
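A small sketch of the variable selection, assuming the proxy value is a URL with an explicit host and port. The `ProxyFromEnv` class is hypothetical, and the environment is passed in as a map so the example does not depend on the real process environment.

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.Map;

// Hypothetical helper; not the JGit proxy setup code.
class ProxyFromEnv {
    /** Picks https_proxy for https URLs and http_proxy otherwise. */
    static InetSocketAddress proxyFor(String targetUrl, Map<String, String> env) {
        String scheme = URI.create(targetUrl).getScheme();
        String var = "https".equalsIgnoreCase(scheme) ? "https_proxy" : "http_proxy";
        String value = env.get(var);
        if (value == null || value.isEmpty()) {
            return null; // no proxy configured for this scheme
        }
        URI proxy = URI.create(value);
        if (proxy.getHost() == null || proxy.getPort() < 0) {
            return null; // this sketch ignores values without host and port
        }
        return InetSocketAddress.createUnresolved(proxy.getHost(), proxy.getPort());
    }
}
```

A caller could pass `System.getenv()` as the map.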
Fix ResolveMerger when files should be replaced by folders
When, during a merge, OURS and BASE contain a file at a certain path and
THEIRS contains a folder there, a bug in JGit led to unnecessary
conflicts. This commit fixes it and adds a test for this situation.
Bug: 472693
Change-Id: I71fac5a6a2ef926c01adc266c6f9b3275e870129
Also-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Consider only escaping backslash for regular expressions in ignore rules
While checking if we should consider an ignore rule without '[]'
brackets as a regular expression, check if the backslash escapes one of
the glob special characters '?', '*', '[', '\\'. If not, backslash is
not a part of a regex and should be treated literally.
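The backslash test can be sketched like this. The `IgnoreRuleEscapes` class is a hypothetical helper, not the ignore-rule parser itself; it only answers whether some backslash in the pattern escapes one of the four listed characters.

```java
// Hypothetical helper; not the JGit ignore rule parser.
class IgnoreRuleEscapes {
    /** True if some backslash escapes one of '?', '*', '[' or '\\'. */
    static boolean hasEscapedGlobSpecial(String pattern) {
        for (int i = 0; i + 1 < pattern.length(); i++) {
            if (pattern.charAt(i) != '\\') {
                continue;
            }
            char next = pattern.charAt(i + 1);
            if (next == '?' || next == '*' || next == '[' || next == '\\') {
                return true;
            }
            // Otherwise this backslash is a literal character; keep scanning.
        }
        return false;
    }
}
```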
Dave Borowitz [Thu, 16 Jul 2015 00:55:17 +0000 (17:55 -0700)]
BaseReceivePack: Don't throw from getPushCertificate()
Rather than lazily parsing the push in this method, parse it at the
end of recvCommands(), which already contains the necessary try/catch
for handling this error. This allows later callers to avoid having to
handle this condition superfluously.
Dave Borowitz [Mon, 13 Jul 2015 19:03:42 +0000 (12:03 -0700)]
Allow saving push certs on a subset of refs
Consider a BatchRefUpdate produced by Gerrit Code Review, where the
original command pushed over the wire might refer to
"refs/for/master", but that command is ignored and replaced with some
additional commands like creating "refs/changes/34/1234/1". We do not
want to store the cert in "refs/for/master@{cert}", since that may
lead someone looking at the ref to the incorrect conclusion that that
ref exists.
Add a separate put method that takes a collection of commands, and
only stores certs on those refs that have a matching command in the
cert.
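A simplified sketch of the selection, assuming a command is identified by its ref name alone (real commands also carry old and new object ids). The `CertRefFilter` class is hypothetical.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; commands are reduced to their ref names here.
class CertRefFilter {
    /** Refs from the given commands that also appear in the certificate. */
    static Set<String> refsToStore(Collection<String> refsInCert,
            Collection<String> refsInCommands) {
        Set<String> result = new TreeSet<>(refsInCommands);
        result.retainAll(new HashSet<>(refsInCert));
        return result;
    }
}
```

In the Gerrit example above, a command for "refs/changes/34/1234/1" has no match in a cert that names only "refs/for/master", so nothing is stored for either ref.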
Matthias Sohn [Mon, 13 Jul 2015 23:55:58 +0000 (01:55 +0200)]
Update org.apache.httpcomponents
- update org.apache.httpcomponents.httpcore to 4.3.3
- update org.apache.httpcomponents.httpclient to 4.3.6; 4.3.5 and later
are reported to fix vulnerability CVE-2014-3577
Matthias Sohn [Mon, 13 Jul 2015 15:06:54 +0000 (17:06 +0200)]
Access static member LocalDiskRepositoryTestCase.CONTENT directly
37a1e4be moved this constant causing the following error message in
Eclipse: "The static field LocalDiskRepositoryTestCase.CONTENT should be
accessed directly".
Change-Id: I4ceb57a30f2e5a8f7e55109ef260a244ed5e7044
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Dave Borowitz [Tue, 30 Jun 2015 01:05:11 +0000 (18:05 -0700)]
Store push certificates in refs/meta/push-certs
Inspired by a proposal from gitolite[1], where we store a file in
a tree for each ref name, and the contents of the file is the latest
push cert to affect that ref.
The main modification from that proposal (other than lacking the
out-of-git batching) is to append "@{cert}" to filenames, which allows
storing certificates for both refs/foo and refs/foo/bar. Those
refnames cannot coexist at the same time in a repository, but we do
not want to discard the push certificate responsible for deleting the
ref, which we would have to do if refs/foo in the push cert tree
changed from a tree to a blob.
The "@{cert}" syntax is at least somewhat consistent with
gitrevisions(7) wherein @{...} describe operators on ref names.
As we cannot (currently) atomically update the push cert ref with the
refs that were updated, this operation is inherently racy. Kick the can
down the road by pushing this burden on callers.
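Why the suffix helps can be seen with a small path check. The `PushCertPaths` class and its two methods are hypothetical; they only model the file/directory conflict between tree paths.

```java
// Hypothetical sketch; models tree paths as plain strings.
class PushCertPaths {
    /** Path of the cert file for a ref, using the "@{cert}" suffix. */
    static String pathFor(String refName) {
        return refName + "@{cert}";
    }

    /** Two tree paths conflict if one would have to be both file and directory. */
    static boolean conflicts(String a, String b) {
        return a.equals(b) || a.startsWith(b + "/") || b.startsWith(a + "/");
    }
}
```

Without the suffix, "refs/foo" and "refs/foo/bar" conflict; with it, "refs/foo@{cert}" is a file next to the directory "refs/foo", so both certificates can be kept.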
Chris Price [Tue, 7 Jul 2015 12:06:50 +0000 (13:06 +0100)]
Move `RepositoryTestCase.indexState` to parent class
The test helper method `indexState` in `RepositoryTestCase` is
very useful for writing tests, even in cases where we need to
do things like create more than one repository for a test and
thus we don't want to use the built-in `db` member variable that
exists in `RepositoryTestCase`. Since the method is static,
we can move it up to the parent class `LocalDiskRepositoryTestCase`,
where it can be used by tests that aren't a great fit for inheriting
directly from `RepositoryTestCase`.
Bug: 436200
Change-Id: I2b6de75c001d2d77ddb607488af246548784a67f
Signed-off-by: Chris Price <chris@puppetlabs.com>
Dave Borowitz [Tue, 30 Jun 2015 00:26:57 +0000 (17:26 -0700)]
PushCertificateParser: Add method for parsing from a stream
We intend to store received push certificates somewhere, like a
particular ref in the repository in question. For reading data back
out, it will be useful to read push certificates (without pkt-line
framing) in a streaming fashion.
Dave Borowitz [Mon, 6 Jul 2015 19:19:42 +0000 (15:19 -0400)]
BaseReceivePack: Treat all LFs as optional
Discussion on the git mailing list has concluded[1] that the intended
behavior for all (non-sideband) portions of the receive-pack protocol
is for trailing LFs in pkt-lines to be optional. Go back to using
PacketLineIn#readString() everywhere.
For push certificates specifically, we agreed that the payload signed
by the client is always concatenated with LFs even though the client
MAY omit LFs when framing the certificate for the wire. This is still
reflected in the implementation of PushCertificate#toText().
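The two rules can be sketched on already-decoded payload strings. The `PktLineText` class is a hypothetical stand-in and does not reproduce PacketLineIn#readString() or PushCertificate#toText(); it only shows "strip an optional trailing LF on read, always join with LF for the signed text".

```java
import java.util.List;

// Hypothetical stand-ins operating on decoded pkt-line payloads.
class PktLineText {
    /** A trailing LF is optional on the wire, so drop it if present. */
    static String readString(String payload) {
        return payload.endsWith("\n")
                ? payload.substring(0, payload.length() - 1)
                : payload;
    }

    /** The signed payload always has each line terminated by LF. */
    static String toText(List<String> payloads) {
        StringBuilder sb = new StringBuilder();
        for (String p : payloads) {
            sb.append(readString(p)).append('\n');
        }
        return sb.toString();
    }
}
```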
Chris Price [Tue, 7 Jul 2015 11:30:06 +0000 (12:30 +0100)]
Use local variable in RepositoryTestCase.indexState
There is a signature of the test helper method `indexState`,
in `RepositoryTestCase`, that accepts a `Repository` object
as an argument. However, there was one line of code where
this variable was not being used, and the method was instead
referring to a member variable `db`. I believe this was
probably just an oversight in a previous refactor, and
that the correct behavior is to use the variable from
the argument list. This change also has the benefit
of making it possible to convert this method to a static
method, since it no longer relies on any state from the class.
Bug: 436200
Change-Id: Iac95b046dc5bd0b3756642e241c3637f1fad3609
Signed-off-by: Chris Price <chris@puppetlabs.com>
Jonathan Nieder [Tue, 30 Jun 2015 21:42:39 +0000 (14:42 -0700)]
Throw InvalidObjectIdException from ObjectId.fromString("tooshort")
ObjectId.fromString already throws InvalidObjectIdException for most
malformed object ids, but for this kind it previously threw
IllegalArgumentException. Since InvalidObjectIdException is a child of
IllegalArgumentException, callers that catch IllegalArgumentException
will continue to work.
Change-Id: I24e1422d51607c86a1cb816a495703279e461f01
Signed-off-by: Jonathan Nieder <jrn@google.com>
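The compatibility argument can be illustrated with stand-in classes. `ObjectIdParsing` and `InvalidIdException` below are hypothetical, not the JGit types; the point is only that a handler for IllegalArgumentException also catches the subclass.

```java
// Hypothetical stand-ins; not ObjectId or InvalidObjectIdException.
class ObjectIdParsing {
    static class InvalidIdException extends IllegalArgumentException {
        private static final long serialVersionUID = 1L;

        InvalidIdException(String id) {
            super("Invalid id: " + id);
        }
    }

    /** Accepts only 40 lowercase hex characters. */
    static String fromString(String s) {
        if (s.length() != 40) {
            throw new InvalidIdException(s); // too short or too long
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
            if (!hex) {
                throw new InvalidIdException(s);
            }
        }
        return s;
    }
}
```

A caller written as `catch (IllegalArgumentException e)` keeps working, while newer callers can catch the narrower type.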
For loose objects an expiration date can be set, which saves objects
that are too young from being deleted. Add the same for packfiles:
packfiles which are too young are not deleted.
Bug: 468024
Change-Id: I3956411d19b47aaadc215dab360d57fa6c24635e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Dave Borowitz [Wed, 17 Jun 2015 17:12:22 +0000 (13:12 -0400)]
Add a separate type for the identity in a push certificate
These differ subtly from a PersonIdent, because they can contain
anything that is a valid User ID passed to gpg --local-user. Upstream
git push --signed will just take the configuration value from
user.signingkey and pass that verbatim in both --local-user and the
pusher field of the certificate. This does not necessarily contain an
email address, which means the parsing implementation ends up being
substantially different from RawParseUtils.parsePersonIdent.
Nonetheless, we try hard to match PersonIdent behavior in
questionable cases.
Dave Borowitz [Mon, 15 Jun 2015 20:50:22 +0000 (16:50 -0400)]
PushCertificateParser: include begin/end lines in signature
The signature is intended to be passed to a verification library such
as Bouncy Castle, which expects these lines to be present in order to
parse the signature.
Dave Borowitz [Mon, 15 Jun 2015 19:48:22 +0000 (15:48 -0400)]
PushCertificateParser: throw PackProtocolException in more cases
This is the subclass of IOException already thrown by
BaseReceivePack#recvCommands when encountering an invalid value on
the wire. That's what PushCertificateParser is doing too, so use the
same subclass.
Dave Borowitz [Mon, 15 Jun 2015 19:25:14 +0000 (15:25 -0400)]
Extract a class for signed push configuration
The default behavior is to read a repository's signed push
configuration from that repo's config file, but this is not very
flexible when it comes to managing groups of repositories (e.g. with
Gerrit). Allow callers to override the configuration using a POJO.