Shawn Pearce [Sun, 16 Aug 2015 21:10:16 +0000 (14:10 -0700)]
Expose the set of root commits in PackStatistics
Root commits are commits with zero parents. If a commmit has no
parents it is the first commit in the repository. In general the root
commits should be unique for any given project, as the first commit
will be created at a different time, by a different user with its own
message. These root commits can be used as a "fingerprint" to
identify disjoint histories.
Change FS not to throw NPE when facing InMemory databases
The FS class and the subclasses FS_POSIX assumed in the findHook()
method that every repository has a valid gitDir. But in tests when using
in-memory-repositories this is not true and this method was generating
NPEs. Change the method to return null if no repository directory can be
determined.
Shawn Pearce [Fri, 14 Aug 2015 04:29:30 +0000 (21:29 -0700)]
Fix NPE in DfsGarbageCollector and further reduce memory
DfsGarbageCollector asks PackWriter for the set of objects packed
after the bitmap index is written out. This is now null as it was
cleared to release memory. Instead use PackBitmapIndexBuilder to
build the set as it also has the objects.
Reduce memory in PackBitmapIndexBuilder by fully discarding the
ObjectToPack instances. This was the original intent of commit 4bb523475d44 ("PackWriter: shed memory while creating bitmaps")
but failed as the instances were still held live here.
Switch to BlockList instead of ObjectToPack[]. This allows the
JVM to allocate many smaller arrays instead of one contiguous
array with 5.2M reference pointers. In a tight heap the smaller
allocations are more feasible.
Reduce the initial EWAHCompressedBitmaps for the 4 type maps. On
average a typical repository is 30% commits, 30% trees and 30% blobs.
These bitmaps are typically very dense. PackWriter orders objects by
commit, tree, blob when writing the file so these should always be a
very dense run of 1s with some 0s before and after. So even the 1/3rd
allocation is likely too large, but the later trim() will reduce the
internal buffer anyway.
Shawn Pearce [Thu, 13 Aug 2015 05:58:26 +0000 (22:58 -0700)]
PackWriter: shed memory while creating bitmaps
Once bitmap creation begins the internal maps required for packing are
no longer necessary. On a repository with 5.2M objects this can save
more than 438 MiB of memory by allowing the ObjectToPack instances to
get garbage collected away.
Downside is the PackWriter cannot be used for any further opertions
except to write the bitmap index. This is an acceptable trade-off as
in practice nobody uses the PackWriter after the bitmaps are built.
Shawn Pearce [Thu, 13 Aug 2015 06:18:24 +0000 (23:18 -0700)]
Bitmap builder: actually compress EWAH bitmaps in memory
For construction performance each new EWAHBitmap is allocated at the
roughly worst-case size the bitmap would need if all of the words must
be literal and no run length compression is available. In practice
this is far larger than is required, wasting heap memory while the
bitmaps are computed.
Trim down each bitmap to its minimum required size. This copies the
internal array to a new smaller array, allowing the GC to reclaim the
prior larger array for reuse.
A single bitmap of 5.2M bits is only 79 KiB of memory without this
trim call but 15,000 such bitmaps is 1.1 GiB. Trimming can help fit
a larger number of bitmaps during processing.
Shawn Pearce [Thu, 13 Aug 2015 05:10:35 +0000 (22:10 -0700)]
Do not retain commit body during bitmap generation
The bitmap preparer only needs commit graph topology; it does not use
the message body. Allow the RevWalk to free the body after the commit
has been parsed to save memory.
Consider original file mode while checking parent ignore rules
The WorkingTreeIterator.isEntryIgnored() should use originally requested
file mode while descending to the file tree root and checking ignore
rules. Original code asking isEntryIgnored() on a file was using
directory mode instead if the .gitignore was not located in the same
directory.
In addition to honor the http_proxy variable for setting a proxy for
http JGit should also honor the https_proxy variable to set a similar
proxy for https traffic
Bug: 473365
Change-Id: I1002cb575e26cd842bf81ad751ec7c267b585ce2 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix ResolveMerger when files should be replaced by folders
When during Merge for a certain path OURS & BASE contains a file and
THEIRS contains a folder there was a bug in JGit leading to unnecessary
conflicts. This commit fixes it and adds a test for this situation.
Bug: 472693
Change-Id: I71fac5a6a2ef926c01adc266c6f9b3275e870129 Also-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Consider only escaping backslash for regular expressions in ignore rules
While checking if we should consider an ignore rule without '[]'
brackets as a regular expression, check if the backslash escapes one of
the glob special characters '?', '*', '[', '\\'. If not, backslash is
not a part of a regex and should be treated literally.
Dave Borowitz [Thu, 16 Jul 2015 00:55:17 +0000 (17:55 -0700)]
BaseReceivePack: Don't throw from getPushCertificate()
Rather than lazily parsing the push in this method, parse it at the
end of recvCommands(), which already contains the necessary try/catch
for handling this error. This allows later callers to avoid having to
handle this condition superfluously.
Dave Borowitz [Mon, 13 Jul 2015 19:03:42 +0000 (12:03 -0700)]
Allow saving push certs on a subset of refs
Consider a BatchRefUpdate produced by Gerrit Code Review, where the
original command pushed over the wire might refer to
"refs/for/master", but that command is ignored and replaced with some
additional commands like creating "refs/changes/34/1234/1". We do not
want to store the cert in "refs/for/master@{cert}", since that may
lead someone looking to the ref to the incorrect conclusion that that
ref exists.
Add a separate put method that takes a collection of commands, and
only stores certs on those refs that have a matching command in the
cert.
Matthias Sohn [Mon, 13 Jul 2015 23:55:58 +0000 (01:55 +0200)]
Update org.apache.httpcomponents
- update org.apache.httpcomponents.httpcore to 4.3.3
- update org.apache.httpcomponents.httpclient to 4.3.6, 4.3.5 and later
are reported to fix vulnerability CVE-2014-3577
Matthias Sohn [Mon, 13 Jul 2015 15:06:54 +0000 (17:06 +0200)]
Access static member LocalDiskRepositoryTestCase.CONTENT directly
37a1e4be moved this constant causing the following error message in
Eclipse: "The static field LocalDiskRepositoryTestCase.CONTENT should be
accessed directly".
Change-Id: I4ceb57a30f2e5a8f7e55109ef260a244ed5e7044 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Dave Borowitz [Tue, 30 Jun 2015 01:05:11 +0000 (18:05 -0700)]
Store push certificates in refs/meta/push-certs
Inspired by a proposal from gitolite[1], where we store a file in
a tree for each ref name, and the contents of the file is the latest
push cert to affect that ref.
The main modification from that proposal (other than lacking the
out-of-git batching) is to append "@{cert}" to filenames, which allows
storing certificates for both refs/foo and refs/foo/bar. Those
refnames cannot coexist at the same time in a repository, but we do
not want to discard the push certificate responsible for deleting the
ref, which we would have to do if refs/foo in the push cert tree
changed from a tree to a blob.
The "@{cert}" syntax is at least somewhat consistent with
gitrevisions(7) wherein @{...} describe operators on ref names.
As we cannot (currently) atomically update the push cert ref with the
refs that were updated, this operation is inherently racy. Kick the can
down the road by pushing this burden on callers.
Chris Price [Tue, 7 Jul 2015 12:06:50 +0000 (13:06 +0100)]
Move `RepositoryTestCase.indexState` to parent class
The test helper method `indexState` in `RepositoryTestCase` is
very useful for writing tests, even in cases where we need to
do things like create more than one repository for a test and
thus we don't want to use the built-in `db` member variable that
exists in `RepositoryTestCase`. Since the method is static,
we can move it up to the parent class `LocalDiskRepositoryTestCase`,
where it can be used by tests that aren't a great fit for inheriting
directly from `RepositoryTestCase`.
Bug: 436200
Change-Id: I2b6de75c001d2d77ddb607488af246548784a67f Signed-off-by: Chris Price <chris@puppetlabs.com>
Dave Borowitz [Tue, 30 Jun 2015 00:26:57 +0000 (17:26 -0700)]
PushCertificateParser: Add method for parsing from a stream
We intend to store received push certificates somewhere, like a
particular ref in the repository in question. For reading data back
out, it will be useful to read push certificates (without pkt-line
framing) in a streaming fashion.
Dave Borowitz [Mon, 6 Jul 2015 19:19:42 +0000 (15:19 -0400)]
BaseReceivePack: Treat all LFs as optional
Discussion on the git mailing list has concluded[1] that the intended
behavior for all (non-sideband) portions of the receive-pack protocol
is for trailing LFs in pkt-lines to be optional. Go back to using
PacketLineIn#readString() everywhere.
For push certificates specifically, we agreed that the payload signed
by the client is always concatenated with LFs even though the client
MAY omit LFs when framing the certificate for the wire. This is still
reflected in the implementation of PushCertificate#toText().
Chris Price [Tue, 7 Jul 2015 11:30:06 +0000 (12:30 +0100)]
Use local variable in RepositoryTestCase.indexState
There is a signature of the test helper method `indexState`,
in `RepositoryTestCase`, that accepts a `Repository` object
as an argument. However, there was one line of code where
this variable was not being used, and the method was instead
referring to a member variable `db`. I believe this was
probably just an oversight in a previous refactor, and
that the correct behavior is to use the variable from
the argument list. This change also has the benefit
of making it possible to convert this method to a static
method, since it no longer relies on any state from the class.
Bug: 436200
Change-Id: Iac95b046dc5bd0b3756642e241c3637f1fad3609 Signed-off-by: Chris Price <chris@puppetlabs.com>
Jonathan Nieder [Tue, 30 Jun 2015 21:42:39 +0000 (14:42 -0700)]
Throw InvalidObjectIdException from ObjectId.fromString("tooshort")
ObjectId.fromString already throws InvalidObjectIdException for most
malformed object ids, but for this kind it previously threw
IllegalArgumentException. Since InvalidObjectIdException is a child of
IllegalArgumentException, callers that catch IllegalArgumentException
will continue to work.
Change-Id: I24e1422d51607c86a1cb816a495703279e461f01 Signed-off-by: Jonathan Nieder <jrn@google.com>
For loose objects an expiration date can be set which will save too
young objects from being deleted. Add the same for packfiles. Packfiles
which are too young are not deleted.
Bug: 468024
Change-Id: I3956411d19b47aaadc215dab360d57fa6c24635e Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Dave Borowitz [Wed, 17 Jun 2015 17:12:22 +0000 (13:12 -0400)]
Add a separate type for the identity in a push certificate
These differ subtly from a PersonIdent, because they can contain
anything that is a valid User ID passed to gpg --local-user. Upstream
git push --signed will just take the configuration value from
user.signingkey and pass that verbatim in both --local-user and the
pusher field of the certificate. This does not necessarily contain an
email address, which means the parsing implementation ends up being
substantially different from RawParseUtils.parsePersonIdent.
Nonetheless, we try hard to match PersonIdent behavior in
questionable cases.
Dave Borowitz [Mon, 15 Jun 2015 20:50:22 +0000 (16:50 -0400)]
PushCertificateParser: include begin/end lines in signature
The signature is intended to be passed to a verification library such
as Bouncy Castle, which expects these lines to be present in order to
parse the signature.
Dave Borowitz [Mon, 15 Jun 2015 19:48:22 +0000 (15:48 -0400)]
PushCertificateParser: throw PackProtocolException in more cases
This is the subclass of IOException already thrown by
BaseReceivePack#recvCommands when encountering an invalid value on
the wire. That's what PushCertificateParser is doing too, so use the
same subclass.
Dave Borowitz [Mon, 15 Jun 2015 19:25:14 +0000 (15:25 -0400)]
Extract a class for signed push configuration
The default behavior is to read a repository's signed push
configuration from that repo's config file, but this is not very
flexible when it comes to managing groups of repositories (e.g. with
Gerrit). Allow callers to override the configuration using a POJO.
Dave Borowitz [Mon, 15 Jun 2015 18:54:13 +0000 (14:54 -0400)]
BaseReceivePack: fix reading cert lines in command loop
Add a missing continues to prevent falling through to the command
parsing section. The first continue happens when the command list is
empty, so change the condition to see whether we have read the first
line, rather than any commands.
Fix comparison to BEGIN_SIGNATURE to use raw line with newline.
a85e817d is a slightly breaking API change to classes that were
technically public and technically released in 4.0. However, it is
highly unlikely that people were actually depending on public behavior,
since there were no public methods to create PushCertificates with
anything other than null field values, or a PushCertificateParser that
did anything other than infinite loop or throw exceptions when reading.
Change-Id: I1d0ba9ea0a347e8ff5a0f4af169d9bb18c5838d2 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Terry Parker [Fri, 12 Jun 2015 19:00:36 +0000 (12:00 -0700)]
Introduce PostUploadHook to replace UploadPackLogger
UploadPackLogger is incorrectly named--it can be used to trigger any
post upload action, such as GC/compaction. This change introduces
PostUploadHook/PostUploadHookChain to replace
UploadPackLogger/UploadPackLoggerChain and deprecates the latter.
It also introduces PackStatistics as a replacement for
PackWriter.Statistics, since the latter is not public API.
It changes PackWriter to use PackStatistics and reimplements
PackWriter.Statistics to delegate to PackStatistics.
Change-Id: Ic51df1613e471f568ffee25ae67e118425b38986 Signed-off-by: Terry Parker <tparker@google.com>
Jonathan Nieder [Wed, 10 Jun 2015 22:43:48 +0000 (15:43 -0700)]
Treat CloneCommand.setBranch(null) as setBranch("HEAD")
This method is documented to take a branch name (not a possibly null
string). The only way a caller could have set null without either
re-setting to a sane value afterward or producing NullPointerException
was to also call setNoCheckout(true), in which case there would have
been no reason to set the branch in the first place.
Make setBranch(null) request the default behavior (remote's default
branch) instead, imitating C git's clone --no-branch.
Change-Id: I960e7046b8d5b5bc75c7f3688f3a075d3a951b00 Signed-off-by: Jonathan Nieder <jrn@google.com>
Jonathan Nieder [Wed, 10 Jun 2015 22:43:27 +0000 (15:43 -0700)]
Treat CloneCommand.setRemote(null) as setRemote("origin")
A non-bare clone command with null remote produces a
NullPointerException when trying to produce a refspec to fetch against.
In a bare repository, a null remote name is accepted by mistake,
producing a configuration with items like 'remote.url' instead of
'remote.<remote>.url'. This was never meant to work.
Instead, let's make setRemote(null) undo any previous setRemote calls
and re-set the remote name to DEFAULT_REMOTE, imitating C git clone's
--no-origin option.
While we're here, add some tests for setRemote working normally.
Change-Id: I76f502da5e677df501d3ef387e7f61f42a7ca238 Signed-off-by: Jonathan Nieder <jrn@google.com>
Jonathan Nieder [Wed, 10 Jun 2015 20:59:48 +0000 (13:59 -0700)]
Handle null in ProgressMonitor setters
These commands' monitor fields can never be null unless someone passes
null to setProgressMonitor. Anyone passing null probably meant to
disable the ProgressMonitor, so do that (by falling back to
NullProgressMonitor.INSTANCE) instead of saving a null and eventually
producing NullPointerException.
Change-Id: I63ad93ea8ad669fd333a5fd40880e7583ba24827 Signed-off-by: Jonathan Nieder <jrn@google.com>
Dave Borowitz [Wed, 10 Jun 2015 00:23:03 +0000 (17:23 -0700)]
Rewrite push certificate parsing
- Consistently return structured data, such as actual ReceiveCommands,
which is more useful for callers that are doing things other than
verifying the signature, e.g. recording the set of commands.
- Store the certificate version field, as this is required to be part
of the signed payload.
- Add a toText() method to recreate the actual payload for signature
verification. This requires keeping track of the un-chomped command
strings from the original protocol stream.
- Separate the parser from the certificate itself, so the actual
PushCertificate object can be immutable. Make a fair attempt at deep
immutability, but this is not possible with the current mutable
ReceiveCommand structure.
- Use more detailed error messages that don't involve NON-NLS strings.
- Document null return values more thoroughly. Instead of having the
undocumented behavior of throwing NPE from certain methods if they
are not first guarded by enabled(), eliminate enabled() and return
null from those methods.
- Add tests for parsing a push cert from a section of pkt-line stream
using a real live stream captured with Wireshark (which, it should
be noted, uncovered several simply incorrect statements in C git's
Documentation/technical/pack-protocol.txt).
This is a slightly breaking API change to classes that were
technically public and technically released in 4.0. However, it is
highly unlikely that people were actually depending on public
behavior, since there were no public methods to create
PushCertificates with anything other than null field values, or a
PushCertificateParser that did anything other than infinite loop or
throw exceptions when reading.
Dave Borowitz [Wed, 10 Jun 2015 19:47:04 +0000 (12:47 -0700)]
Allow trailing newlines in receive-pack
C git's receive-pack.c strips trailing newlines in command lists when
present[1], although send-pack.c does not send them, at least in the
case of command lists[2]. Change JGit to match this behavior.
Add tests.
This also fixes parsing of commands in the push cert, which, unlike
commands sent in the non-push case, always have trailing newlines.
The new submodule layout where GITDIR of a submodule is located at
<parent-repo-GITDIR>/modules/<submodule-path> was only used during
clone. Teach SubmoduleAddCommand to use the new layout.
Bug: 469666
Change-Id: Ie97dc0607b71499560444616f362bccee9cce515 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Jonathan Nieder [Wed, 10 Jun 2015 21:48:21 +0000 (14:48 -0700)]
submodule test: Use config.unset instead of setting to null
Most relative-URL tests for SubmoduleInitCommand carry out the following
steps:
1. add a submodule at path "sub" to the index
2. set remote.origin.url in .git/config
3. configure .gitmodules, possibly using relative URLs, and see what
happens
resolveWorkingDirectoryRelativeUrl() is meant to test the fallback when
remote.origin.url is not set, to match C git which treats the URL as
relative to the cwd in that case. To do so, in step (2) it sets
remote.origin.url to null.
However, Config.setString when taking a null value does not actually
unset that value from the configuration --- it sets it to the empty
string. This means we are testing a behavior that C git never
supported. Use Config.unset instead.
Change-Id: I7af29fbbd333a2598843d62c320093c48b2ad972 Signed-off-by: Jonathan Nieder <jrn@google.com>
* changes:
Allow setting detail message and cause when constructing most exceptions
Use message from ServiceNotAuthorizedException, ServiceNotEnabledException
dumb HTTP: Clarify AsIsFilter by introducing req and res locals
Clarify description of ServiceNotAuthorizedException