ReceivePack: catch InvalidObjectIdException while parsing shallow
The "shallow $id" parsing can also throw InvalidObjectIdException,
just like parseCommand. Move it into its own method with a proper
try-catch block to convert to the checked PackProtocolException.
ReceivePack: enable capabilities immediately on first line
Instead of deferring until after command parsing, enable the
capabilities after the first pkt-line has been read from the client.
This allows the server to setup the side-band-64k channel immediately.
push: Report fatal server errors during pack writing
If the push client has requested side-band support the server can
signal a fatal error parsing the pack using the error channel (3)
and then hang up. This may cause the PackWriter to fail to write to
data onto the network socket, which throws a misleading error back
up to the application and the user.
During a write failure poll the input to see if the side band system
can parse out an error message off channel 3. This should be fast as
there will either be an error present in the buffer, or the remote will
also have hung-up on the side band channel. In the case of a hang-up
just rethrow the original IOException as its a network error.
This roughly matches what C git does; once commands are sent and the
packer is started a new thread runs in the background to decode any
possible server error during unpacking on the remote peer
ReceivePack: Catch InvalidObjectIdException instead of IAE
The more specific type InvalidObjectIdException is thrown by
ObjectId.fromString(). Use it here in ReceivePack as the more
generic IAE is never thrown by the body of the try-catch block.
A RefAdvertiser writing to the network includes both the reference's
ObjectId and its peeled ObjectId in the advertised set. In smart HTTP
negotiation requests may bypass the RefAdvertiser and quickly build
the set based on current refs; include the peeled ObjectIds to match
behavior with the normal bidirectional protocols on git:// and SSH.
This field was being set twice within the block. Setting it just once
is sufficient. writeString() does not examine the field so it is fine
to set it after the call.
Shawn Pearce [Mon, 27 Jun 2016 15:52:00 +0000 (11:52 -0400)]
Merge changes from topic 'dfs-gc'
* changes:
Prune UNREACHABLE_GARBAGE packs when they expire
Use try-with-resources in DfsGarbageCollector.writePack
Fix lastModified to be consistent in DfsGarbageCollector
Add GC_REST PackSource to better order DFS packs
Mike Williams [Fri, 17 Jun 2016 15:34:36 +0000 (11:34 -0400)]
Prune UNREACHABLE_GARBAGE packs when they expire
DfsGarbageCollector will now enforce a maximum time to live (TTL) for
UNREACHABLE_GARBAGE packs. The default TTL is 1 day, which should be
enough time to avoid races with other processes that are inserting
data into the repository.
Change-Id: Id719e6e2a03cfc9a0c0aef8ed71d261dda14bd0c Signed-off-by: Mike Williams <miwilliams@google.com>
Shawn Pearce [Sun, 26 Jun 2016 18:18:59 +0000 (11:18 -0700)]
Fix lastModified to be consistent in DfsGarbageCollector
Set all packs written by the DfsGarbageCollector to use the same
starting timestamp as lastModified. This makes it easier to see
which packs came from the same DfsGarbageCollector run, as they
share the same timestamp.
This has provided decent performance for object lookups. Starting
from an arbitrary reference may find the content in a newer pack
created by DfsObjectInserter or a ReceivePack server. Compaction of
recent packs also contains newer content, and then most interesting
data is in the "main" GC pack. As the GC pack is self-contained (has
no edges leading outside) readers typically do not need to go further.
Adding a new GC_REST PackSource allows the DfsGarbageCollector to
identify to the pack ordering code which pack is which, so the
non-heads are scanned second during reads. This removes a hack that
was unique to Google's implementation that enforced this behavior by
fixing up the lastModified timestamp.
Renumber the PackSource's categories to reflect this search ordering.
Fix TreeWalk to reset attributes cache for each entry
Treewalk has a member 'attr' which caches the attributes for the current
entry. We did not reset the cache always when moving to next entry. The
effect was that when there are no attributes for an entry 'a' but 'a'
was skipped by a Treewalk filter then Treewalk stopped looking for
attributes until TreeWalk.next() was called again.
Fix DirCacheCheckout to return CheckoutConflictException
Problem occurs when the checkout wants to create a file 'd/f' but
the workingtree contains a dirty file 'd'. In order to create d/f the
file 'd' would have to be deleted and since the file is dirty that
content would be lost. This should lead to a CheckoutConflictException
for d/f when failOnConflict was set to true.
This fix also changes jgit checkout semantics to be more like native
gits checkout semantics. If during a checkout jgit wants to delete a
folder but finds that the working tree contains a dirty file at this
path then JGit will now throw an exception instead of silently keeping
the dirty file. Like in this example:
git init
touch b
git add b
git commit -m addB
mkdir a
touch a/c
git add a/c
git commit -m addAC
rm -fr a
touch a
git checkout HEAD~
Shawn Pearce [Sun, 19 Jun 2016 04:40:35 +0000 (21:40 -0700)]
Optimize RefAdvertiser for wire protocol
The native wire protocol sends ref advertisements in the pkt-line
format, which requires encoding the ObjectId and ref name onto a byte
sequence. Busy servers show this is a very high source of garbage,
which pushes the garbage collector harder when there are many refs in
the repository (e.g. 70k, in a Gerrit managed repository).
Optimize the side band advertiser by retaining the CharsetEncoder,
minimizing the amount of temporary garbage built during encoding.
http transport does not use authentication fallback
Git servers supporting HTTP transport can send multiple WWW-Authenticate
challenges [1] for different authentication schemes the server supports.
If authentication fails now retry all authentication types proposed by
the server.
[1] https://tools.ietf.org/html/rfc2617#page-3
Bug: 492057
Change-Id: I01d438a5896f9b1008bd6b751ad9c7cbf780af1a Signed-off-by: Christian Pontesegger <christian.pontesegger@web.de> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn Pearce [Thu, 2 Jun 2016 04:48:21 +0000 (21:48 -0700)]
DfsBlock: throw DataFormatException on 0 bytes
setInput should always push at least 1 byte into the Inflater. If 0
bytes (or negative!) are being sent the DfsBlock is inconsistent with
the position passed in. This indicates a severe programming problem
in the caller, and may cause an infinite loop in DfsReader.
Today we saw a handful of live examples of this but don't know what
the cause is. Guard against this error condition and throw with a
more verbose failure, which may prevent an infinite loop. Callers
will eventually catch DataFormatException and rethrow with more detail
about the object that cannot be inflated, with the DFE in the chain.
When using a DfsInserter for high-throughput insertion of many
objects (analogous to git-fast-import), we don't necessarily want to
do a random object lookup for each. It'll be faster from the
inserter's perspective to insert the duplicate objects and let a later
GC handle the deduplication.
Matthias Sohn [Wed, 1 Jun 2016 14:09:40 +0000 (16:09 +0200)]
Merge branch 'master' into stable-4.4
* master:
Fix javadoc errors and unused imports introduced by ddd0fe25
RepoCommand: record manifest shallow recommendation in .gitmodules
RepoCommand: record manifest groups as submodule labels
Remove the deprecated TestRepository.getClock() method
Replace use of deprecated method Repository.getRef()
[findBugs] Prevent potential NPE in
FileLfsRepository.getOutputStream()
Better report on client side if push failed due to too large object
[findBugs] Prevent potential NPE in CloneCommand.init()
RepoCommand: remove --record-remote-branches
RepoCommandTest: Improve assertion message for remote branch recording
Change-Id: I4fbce4f84925a933fcc9a48058ed6793f5821b97 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Stefan Beller [Tue, 31 May 2016 22:19:52 +0000 (15:19 -0700)]
RepoCommand: record manifest shallow recommendation in .gitmodules
Git core learned about the submodule.<name>.shallow option in
.gitmodules files, which is a recommendation to clone a submodule
shallow. A repo manifest may record a clone depth recommendation as
an optional field, which contains more information than a binary
shallow/nonshallow recommendation, so any attempted conversion may be
lossy. In practice the clone depth recommendation is either '1' or doesn't
exist, which is the binary behavior we have in Git core.
Change-Id: I51aa9cb6d1d9660dae6ab6d21ad7bae9bc5325e6 Signed-off-by: Stefan Beller <sbeller@google.com>
Stefan Beller [Tue, 31 May 2016 22:18:20 +0000 (15:18 -0700)]
RepoCommand: record manifest groups as submodule labels
Git core learned about attributes in pathspecs:
pathspec: allow querying for attributes
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it
requires paths to not just match the given pattern but also have the
specified attrs attached for them to be chosen.
We intend to use these pathspec attribute patterns for submodule
grouping, similar to the grouping in repo. So the RepoCommand which
translates repo manifest files into submodules should propagate this
information along. This requires writing information to the
.gitattributes file instead of the .gitmodules file. For now we just
overwrite any existing .gitattributes file and do not care about prior
attributes set. If this becomes an issue we need to figure out how to
correctly amend the grouping information to an existing .gitattributes
file.
Change-Id: I0f55b45786b6b8fc3d5be62d7f6aab9ac00ed60e Signed-off-by: Stefan Beller <sbeller@google.com>
Matthias Sohn [Mon, 23 May 2016 23:12:20 +0000 (01:12 +0200)]
Better report on client side if push failed due to too large object
JGits PushCommand and BasePackPushConnection were throwing a generic
exception when the pushed pack-file was rejected by the server since it
contained too large objects. Teach JGit to better analyze the server's
response to detect this situation and throw a more specific exception.
Detect this situation by parsing the status line sent by the server.
This change only recognizes the response sent by a JGit based server.
All other servers which report such problems in a different way still
lead to a generic TransportExceptions.
Also see https://git.eclipse.org/r/#/c/46348/
Change-Id: I8d6d65e4585ebb3846f7207e7d1a2f82fa9cbd86 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Tue, 24 May 2016 14:59:15 +0000 (16:59 +0200)]
Merge branch 'master' into stable-4.4
* master:
JGit CLI: allow to call git init with specific directory
Redirect all Show output to outs
Support git config [include] section with absolute path(s)
Added filter for merge and non-merges commits.
[findBugs] Prevent potential NPE in FS_POSIX.readUmask()
[findBugs] Fix calculation of host header in SignerV4
Update Orbit repository to S20160518051658 for Neon RC2
Fix StashApply regarding handling of untracked files
GC should not pack objects only referenced by ORIG_HEAD,...
Make sure to overwrite files when "reset --hard" detects conflicts
Allow setting FileMode to executable when applying patches in
ApplyCommand
Fix config value get to return last instead of 1st just like git
Remove UTF-8 checking duplication in Config lib subclasses
Update Maven plugins
Fix type parameter in javadoc in TestRepository.delete(String ref)
TestRepository: Add delete() method
Make BaseReceivePack.setAtomic public
ReceivePack: Pass atomic setting from client to BatchRefUpdate
Change-Id: I5c9c5b7ccb23fb48b44b3da10b2c5d876d043d24 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Tue, 24 May 2016 14:44:03 +0000 (16:44 +0200)]
Merge branch 'stable-4.3' into stable-4.4
* stable-4.3:
Fix computation of id in WorkingTreeIterator with autocrlf and
smudging
Prepare 4.3.2-SNAPSHOT builds
JGit v4.3.1.201605051710-r
Scan loose ref before packed in case gc about to remove the loose
Fix possible NPEs when reporting transport errors
Fix calling of clean/smudge filters from Checkout,MergeCommands
Fix ApplyCommand when result of patch is an empty file
Change-Id: I829f06699f6670e519d04c927bdba4b82df29199 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix computation of id in WorkingTreeIterator with autocrlf and smudging
JGit failed to do checkouts when the index contained smudged entries and
autocrlf was on. In such cases the WorkingTreeIterator calculated the
SHA1 sometimes on content which was not correctly filtered. The SHA1 was
computed on content which two times went through a lf->crlf conversion.
We used to tell the treewalk whether it is a checkin or checkout
operation and always use the related filters when reading any content.
If on windows and autocrlf is true and we do a checkout operation then
we always used a lf->crlf conversion on any text content. That's not
correct. Even during a checkout we sometimes need the crlf->lf
conversion. E.g. when calculating the content-id for working-tree
content we need to use crlf->lf filtering although the overall operation
type is checkout.
Often this bug does not have effects because we seldom compute the
content-id of filesystem content during a checkout. But we do need to
know whether a file is dirty or not before we overwrite it during a
checkout. And if the index entries are smudged we don't trust the index
and compute filesystem-content-sha1's explicitly.
This caused EGit not to be able to switch branches anymore on Windows
when autocrlf was true. EGit denied the checkout because it thought
workingtree files are dirty because content-sha1 are computed on wrongly
filtered content.
Bug: 493360
Change-Id: I1072a57b4c529ba3aaa50b7b02d2b816bb64a9b8 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Marco Miller [Fri, 6 May 2016 17:18:38 +0000 (13:18 -0400)]
Support git config [include] section with absolute path(s)
As per [1], but limited to absolute paths indeed. No support yet for
tilde or $HOME expansion. Support for the --[no-]includes options
([1]) is not part of this commit scope either, but those options'
defaults are in effect as described in [1].
[1] https://git-scm.com/docs/git-config
Included path can be a config file that includes other path-s in turn.
An exception is thrown if too many recursions (circular includes)
happen because of ill-specified config files.
Change-Id: I700bd7b7e1625eb7de0180f220c707d8e7b0930b Signed-off-by: Marco Miller <marco.miller@ericsson.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix StashApply regarding handling of untracked files
There was a bug regarding how JGit handled untracked files when applying
a stash. Problem was that untracked files are applied by doing a merge
of HEAD and untrackedFiles commit with a merge base of the stashed HEAD.
That's wrong because the untrackedFiles commit has no parent and
contains only the untracked files. Using stashed HEAD as merge base
leads to unneccessary conflicts on files not event included in the
untrackedFiles commit.
Imagine this graph directly before you want to apply a stash which was
based on 0. You want to apply the stash on current HEAD commit 5.
5 (HEAD,master)
/
0---+
\ \
1---3 (WIP on master)
/
2 (untracked files on master)
Imagine for a specific (tracked) file f
- commit 0 contains X
- HEAD contains Y
- commit 2 (the untracked files) does not contain file f
A merge of 2 and 5 with a merge base of 0 leads to a conflict. The 5
commit wants to modify the file and the 2 commit wants to delete the
file -> conflict.
If no merge base is set then the semantic is correct.
Thanks to Bow for finding this bug and providing the test case.
GC should not pack objects only referenced by ORIG_HEAD,...
There are references which are returned by
RefDatabase.getAdditionalRefs() which are allowed to point to
non-existing objects. These refs should not save objects from being
garbage collected. Examples for these references are ORIG_HEAD,
MERGE_HEAD, FETCH_HEAD and CHERRY_PICK_HEAD. Native git will not take
these references into account when doing a gc and therefore these
references may point to non-existing objects after a gc. Teach JGit's
GC to behave the same: ignore additional refs if they don't start with
"refs/". Examples for refs returned by getAdditionalRefs() which do
start with "refs/" are the bootstrap refs when using reftree's (see
commit 115f1ad3974d1162b33d1c8eff466019d1f249f3). See also
http://article.gmane.org/gmane.comp.version-control.git/294126.
Bug: 479697
Change-Id: I10e40589f13e72aacdd9f86f3b44696fd1cd068a Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Make sure to overwrite files when "reset --hard" detects conflicts
When performing a "reset --hard" a checkout is done. The pathes are
checked for potential checkout conflicts. But in the end for all
remaining conflicts these files are simply deleted from the working
tree. That's not the right strategy to handle checkout conflicts during
"reset --hard". Instead for every conflict we should simply checkout the
merge commit's content.
This is different from native gits behavior which reports errors when
during a "checkout --hard" a file shows up where a folder was expected.
"warning: unable to unlink d/c.txt: Not a directory"
Why it is like that in native git was asked in
http://permalink.gmane.org/gmane.comp.version-control.git/279482. Unless
it is explained why native git why this error is reported JGit should
overwrite the files.
Bug: 474842
Change-Id: I08e23822a577aaf22120c5137eb169b6bd08447b Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Nadav Cohen [Mon, 2 May 2016 02:06:39 +0000 (19:06 -0700)]
Allow setting FileMode to executable when applying patches in ApplyCommand
git-apply allows modifying file modes in patched files using either
"new mode" or "new file mode" headers. This patch adds support for
setting files as executables and vice-versa.
Marco Miller [Fri, 6 May 2016 20:19:42 +0000 (16:19 -0400)]
Fix config value get to return last instead of 1st just like git
Before this fix, getting the value of 'key' below used to return
value1. This fix makes it so that value3 gets returned instead,
just like native git's get.
[section]
key = value1
key = value2
key = value3
Change-Id: Iccb24de9b63c3ad8646c909494ca3f8c9ed6e29c Signed-off-by: Marco Miller <marco.miller@ericsson.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Marco Miller [Mon, 9 May 2016 16:36:00 +0000 (12:36 -0400)]
Remove UTF-8 checking duplication in Config lib subclasses
Change-Id: Ib9f0ae8207000a36c5bf1a92fcc2c32efc4c0984 Signed-off-by: Marco Miller <marco.miller@ericsson.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Wed, 4 May 2016 23:13:35 +0000 (01:13 +0200)]
Merge branch 'stable-4.3'
* stable-4.3:
Scan loose ref before packed in case gc about to remove the loose
Fix possible NPEs when reporting transport errors
Fix calling of clean/smudge filters from Checkout,MergeCommands
Fix ApplyCommand when result of patch is an empty file
Change-Id: I0dc76b7a8c87ce8e0386a1b9e0fa92a3aa62abf7 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Marco Miller [Wed, 20 Apr 2016 21:42:34 +0000 (17:42 -0400)]
Scan loose ref before packed in case gc about to remove the loose
Before this change, jgit used to read packed-refs before scanning
loose refs. That was not a problem if gc didn't run concurrently. When
gc did run concurrently with such refs reading, that order sometimes
broke the latter. This lead to reading an older version of a ref's
tip, which meant "losing" the real tip or commit. The specific
read-Vs-gc concurrency scenario which broke reading that way follows:
1. let ref R be in packed-refs and R' be in loose
2. jgit starts reading packed-refs
3. gc also starts its business around that very time
4. jgit still has the time to read R from packed-refs
5. as gc is not done yet updating packed-refs with R'
6. jgit then starts scanning loose refs (or is about to)
7. gc quickly ends up being done moving loose R' to packed-refs
8. so gc (quickly) removes loose refs
9. -while jgit is scanning loose refs, now gone
10. so jgit assumes no loose to consider => packed-refs winning
11. so jgit wrongfully returns R (from 4.) as the tip, instead of R'.
This fix switches the order so loose refs are scanned (secured) before
taking the time to read packed-refs. This way, knowledge of the
likelier tip is guaranteed for ref reading to return the true tip
- despite concurrent gc. If there is no loose ref to scan, jgit reads
packed-refs and lands on R' (or S), which it then returns, as
expected. The gerrit issue [1] should be solved by this fix.
Change-Id: Ibd120120a361a3a6ed565f3836afc1db706fbcdd Signed-off-by: Marco Miller <marco.miller@ericsson.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Mon, 4 Apr 2016 09:12:21 +0000 (11:12 +0200)]
Prepare Neon target platform
also use the Neon target platform as the default target platform.
Neon Eclipse platform requires BREE 8 so we have to use Java 8 at least
for the JGit packaging build (for the compiler settings we still stick
to source and target 1.7 since we want to still support Java 7)
otherwise unpacking platform pack200 archives will fail since they are
built using Java 8 and hence cannot be unpacked using Java 7's
unpack200.
Update org.junit from 4.11 to 4.11 and org.apache.ant from from 1.9.2 to
1.9.6 since the older versions are not available in Neon orbit version
Ignore a couple of tests in ResourceUtilTest which now fail [1] since
bug 476585 was fixed in Neon M6.
CQ: 10694
CQ: 11308
Change-Id: I1a99a3ac2148693e21c57df5aeb848035b52b97b Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Dave Borowitz [Fri, 29 Apr 2016 15:10:36 +0000 (11:10 -0400)]
PersonIdent: Strip some special chars from external strings
The special characters <> and '\n' interfere with parsing of
identities. C git strips these special characters, so we should too.
Rather than allocating extra strings by calling String#trim(), add a
few lines to our sanitization method to perform the same trimming as
described in String's Javadoc.
Dave Borowitz [Fri, 29 Apr 2016 14:47:17 +0000 (10:47 -0400)]
PersonIdent: Document that name and email aren't trimmed
This might be somewhat surprising behavior to users who might
naturally assume the following invariant:
ident.equals(parseIdent(ident.toExternalString()))
This invariant does not hold since whitespace is only trimmed during
serialization. We don't want to mess with the strings during
initialization, as this is called during the highly-optimized commit
parsing codepath.
Dave Borowitz [Mon, 25 Apr 2016 17:36:46 +0000 (13:36 -0400)]
Expose the ObjectInserter that created an ObjectReader
We've found in Gerrit Code Review that it is common to pass around
both an ObjectReader (or more commonly a RevWalk wrapping one) and an
ObjectInserter. These code paths often assume that the ObjectReader
can read back any objects created by the ObjectInserter without
flushing. However, we previously had no way to enforce that constraint
programmatically, leading to hard-to-spot problems.
Provide a solution by exposing the ObjectInserter that created an
ObjectReader, when known. Callers can either continue passing both
objects and check:
reader.getCreatedFromInserter() == inserter
or they can just pass around ObjectReader and extract the inserter
when it's needed (checking that it's not null at usage time).
To align with the version used in Gerrit's master build.
Change-Id: I3b6e21bf367ad1fb3598dc06b968aee6187d5aed Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com> Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>