Refactor all of the push option support code to allocate the list
immediately before parsing the options section off the stream.
Move option support down to ReceivePack instead of BaseReceivePack.
Push options are specific to the ReceivePack protocol and are not
likely to appear in the 4 year old subscription proposal. These
changes are OK before JGit 4.5 ships as no consumer should be relying
on these new APIs.
Change-Id: Ib07d18c877628aba07da07cd91875f918d509c49
Example usage:
$ ./jgit push \
--push-option "Reviewer=j.doe@example.org" \
--push-option "<arbitrary string>" \
origin HEAD:refs/for/master
Stefan Beller has also made an equivalent change to CGit:
http://thread.gmane.org/gmane.comp.version-control.git/299872
Change-Id: I6797e50681054dce3bd179e80b731aef5e200d77
Signed-off-by: Dan Wang <dwwang@google.com>
ReceivePack: enable capabilities immediately on first line
Instead of deferring until after command parsing, enable the
capabilities after the first pkt-line has been read from the client.
This allows the server to setup the side-band-64k channel immediately.
Change-Id: I141b7fc92e983a41d3a58da8e1464a6917422b6c
Fix that exceptions in ReceivePack cause Invalid Channel 101 exceptions
When during a PushOperation the server hits an exception different from
UnpackException the JGit server behaved wrong. That kind of exceptions
are handled so late that the connection is already released and the
information whether to talk sideband to the client is lost. In detail:
ReceivePack.receive() will call release() and that will reset the
capabilities. But later on the stack in ReceivePackServlet.doPost() it
is tried to send a response to client now with reset capabilities (no
sideband!).
Change-Id: I0a609acc6152ab43b47a93d712deb65bb1105f75
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Implement atomic refs update, if possible by database
Inspired by the series[1], this implements the possibility to
have atomic ref transactions.
If the database supports atomic ref update capabilities, we'll
advertise these. If the client wishes to use this feature, either
all refs will be updated or none at all.
[1] http://thread.gmane.org/gmane.comp.version-control.git/259019/focus=259024
Change-Id: I7b5d19c21f3b5557e41b9bcb5d359a65ff1a493d
Signed-off-by: Stefan Beller <sbeller@google.com>
Revert "Add getPackFile to ReceivePack to make PostReceiveHook more
usable"
This reverts commit 2670fd427c.
By returning an instance of File from the ReceivePack.getPackFile the
abstraction of the persistence implementation was broken.
Change-Id: I28e3ebf3a659a7cbc94be51bba9e1ad338f2b786
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add getPackFile to ReceivePack to make PostReceiveHook more usable
Having access to the pack file that was created by the ReceivePack
may be useful for post receive hooks. For example, a hook may want
to check the size of the received pack and the created index.
Change-Id: I4d51758e4565d32c9f8892242947eb72644b847d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
A few classes such as Constanrs are marked with @SuppressWarnings, as are
toString() methods with many liternal, but otherwise $NLS-n$ is used for
string containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
None of these should have been exposed to base classes. The majority
of them are private implementation details that are not required by a
subclass in order to interact with the base protocol definition. The
few that are needed should be visible as accessor methods, so the
internals can be modified without breaking the public JGit API.
Change-Id: I874179105c9c37703307facbbf99387c52bf772c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
I have unfortunately introduced a few bugs in the native Git client
over the years. 1.7.5 is unable to send chunked requests correctly,
resulting in corrupt data at the server. Ban this client whenever
it uses chunked encoding with an error message.
Prior to some more recent versions, git push over HTTP failed to
report status information and error messages due to a race within
the client and its helper process. Check for these bad versions and
send errors as messages before the status report, enabling users
to see the failures on their terminal.
Change-Id: Ic62d6591cbd851d21dbb3e9b023d655eaecb0624
We are working on a publish/subscribe based git protocol, and we want to
reuse several parts of the ReceivePack-like code for reading commands
and processing a pack. In this new implementation, the connection
management will be very different, in particular, there may be multiple
packs received on a single open connection. So, hoist out as much as we
can from ReceivePack, mostly just leaving behind the single-connection
version in that class.
Change-Id: I5567aad6ae77951f73f59c1f91996d934ea88334
When a client POSTs to /git-{upload,receive}-pack, the first line
includes their client capabilities. As soon as the C git client sends
side-band(-64k), it goes into a state where it chokes on data not sent
in a valid sideband channel.
GitSmartHttpTools.sendError() is called early in the request, likely
before a {Upload,Receive}Pack handler is assigned or, even so, before it
has read the request. In some cases we must read the first line manually
within sendError() to tell whether sideband is needed.
Change-Id: I8277fd45a4ec3b71fa8f87404b4f5d1a09e0f384
Modify refs in UploadPack/ReceivePack using a hook interface
This is intended to replace the RefFilter interface (but does not yet,
for backwards compatibility). That interface required lots of extra
scanning and copying in filter cases such as only advertising a subtree
of the refs directory. Instead, provide a hook that can be executed
right before ref advertisement, using the public methods on
UploadPack/ReceivePack to explicitly set the map of advertised refs.
Change-Id: I0067019a191c8148af2cfb71a675f2258c5af0ca
Expose an OutputStream from ReceivePack for sending client messages
Callers may want to format and flush their own output, for example in a
PreReceiveHook that creates its own TextProgressMonitor. The actual
underlying msgOut can change over the lifetime of ReceivePack, so we
implement a small wrapper.
Change-Id: I57b6d6cad2542aaa93dcadc06cb3e933e81bcd3d
Execute ReceiveCommands via a method rather than in ReceivePack
This allows a PreReceiveHook to easily "take over" all of the
ReceiveCommands passed to it, preventing any of them from being handled
within the ReceivePack core.
Change-Id: I2a8c1fc44e8dcadf22cd97a8ec4ee79d4d9d08f1
ReceivePack (and PackParser) can be configured with the
maxObjectSizeLimit in order to prevent users from pushing too large
objects to Git. The limit check is applied to all object types
although it is most likely that a BLOB will exceed the limit. In all
cases the size of the object header is excluded from the object size
which is checked against the limit as this is the size of which a BLOB
object would take in the working tree when checked out as a file.
When an object exceeds the maxObjectSizeLimit the receive-pack will
abort immediately.
Delta objects (both offset and ref delta) are also checked against the
limit. However, for delta objects we will first check the size of the
inflated delta block against the maxObjectSizeLimit and abort
immediately if it exceeds the limit. In this case we even do not know
the exact size of the resolved delta object but we assume it will be
larger than the given maxObjectSizeLimit as delta is generally only
chosen if the delta can copy more data from the base object than the
delta needs to insert or needs to represent the copy ranges. Aborting
early, in this case, avoids unnecessary inflating of the (huge) delta
block.
Unfortunately, it is too expensive (especially for a large delta) to
compute SHA-1 of an object that causes the receive-pack to abort.
This would decrease the value of this feature whose main purpose is to
protect server resources from users pushing huge objects. Therefore
we don't report the SHA-1 in the error message.
Change-Id: I177ef24553faacda444ed5895e40ac8925ca0d1e
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If a fetch or push needs to apply more than a few references
to the local repository it may take more than 0.25 seconds to
process all of the updates. This is especially true in the DHT
storage system during an initial push of a project with many tags.
The backend database may need to use a transaction to ensure each
tag reference creation is unique, and there may be large delays
caused by these transactions.
Change-Id: Ib11a077adfbd525253e425d327f2e2c2380804c7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some servlet containers require the servlet to read the EOF marker
from the input stream before a response can be output if the stream
is using "Transfer-Encoding: chunked"... which is typical for any
sort of large push to a repository over smart HTTP.
Ensure the EOF is always read by the PackParser when it is handling
the stream, and fail fast if there is more data present than expected
since this does indicate a protocol error.
Also ensure the EOF is read by UploadPack before it starts to output
a partial response using packing progress meters.
Change-Id: I131db9dea20b2324cb7c3272a814f21296bc64bd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix ReceivePack connectivity validation with alternates
If a repository has an alternate object database, the alternate has
its references advertised as ".have" lines, which permits the client
to use these as delta base candidates when generating the pack. If
setCheckReferencedObjectsAreReachable(true) is used, these additional
have lines need to be considered in addition to the advertised refs.
Change-Id: Ie39c6696f9d3ff147ef4405cd5624f6011700ce5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As PackParser supports a progress meter for the "Resolving deltas"
phase of its work, we should export this to smart HTTP clients so
they know the server is still working on their (large) upload.
However this isn't as simple as just dropping in a binding for
the SmartOutputStream to flush when its told to. We want to
avoid spurious flushes triggered by the use of sideband, or the
status report formatting in the send-pack/receive-pack protocol.
Change-Id: Ibd88022a298c5fed0edb23dfaf2e90278807ba8b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It isn't strictly necessary to validate every reference's target
object is reachable in the repository before advertising it to a
client. This is an expensive operation when there are thousands of
references, and its very unlikely that a reference uses a missing
object, because garbage collection proceeds from the references and
walks down through the graph. So trying to hide a dangling reference
from clients is relatively pointless.
Even if we are trying to avoid giving a client a corrupt repository,
this simple check isn't sufficient. It is possible for a reference to
point to a valid commit, but that commit to have a missing blob in its
root tree. This can be caused by staging a file into the index,
waiting several weeks, then committing that file while also racing
against a prune. The prune may delete the blob, since its
modification time is more than 2 weeks ago, but retain the commit,
since its modification time is right now.
Such graph corruption is already caught during PackWriter as it
enumerates the graph from the client's want list and digs back
to the roots or common base. Leave the reference validation also
for that same phase, where we know we have to parse the object to
support the enumeration.
Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CGit push clients 1.6.6 and later support progress messages on the
side-band-64k channel during push, as this was introduced to handle
server side hook errors reported over smart HTTP.
Since JGit's delta resolution isn't always as fast as CGit's is,
a user may think the server has crashed and failed to report
status if the user pushed a lot of content and sees no feedback.
Exposing the progress monitor during the resolving deltas phase
will let the user know the server is still making forward progress.
This also helps BasePackPushConnection, which has a bounded timeout
on how long it will wait before assuming the remote server is dead.
Progress messages pushed down the side-band channel will reset the
read timer, helping the connection to stay alive and avoid timing
out before the remote side's work is complete.
Change-Id: I429c825e5a724d2f21c66f95526d9c49edcc6ca9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Refactor IndexPack to not require local filesystem
By moving the logic that parses a pack stream from the network (or
a bundle) into a type that can be constructed by an ObjectInserter,
repository implementations have a chance to inject their own logic
for storing object data received into the destination repository.
The API isn't completely generic yet, there are still quite a few
assumptions that the PackParser subclass is storing the data onto
the local filesystem as a single file. But its about the simplest
split of IndexPack I can come up with without completely ripping
the code apart.
Change-Id: I5b167c9cc6d7a7c56d0197c62c0fd0036a83ec6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
java.io.File.delete() reports failure as an exceptional
return value false. Fix the code which silently ignored
this exceptional return value. Also remove some duplicate
deletion helper methods.
Change-Id: I80ed20ca1f07a2bc6e779957a4ad0c713789c5be
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix checkReferencedIsReachable to use correct base list
When checkReferencedIsReachable is set in ReceivePack we are trying
to prove that the push client is permitted to access an object that
it did not send to us, but that the received objects link to either
via a link inside of an object (e.g. commit parent pointer or tree
member) or by a delta base reference.
To do this check we are making a list of every potential delta base,
and then ensuring that every delta base used appears on this list.
If a delta base does not appear on this list, we abort with an error,
letting the client know we are missing a particular object.
Preventing spurious errors about missing delta base objects requires
us to use the exact same list of potential delta bases as the remote
push client used. This means we must use TOPO ordering, and we
need to enable BOUNDARY sorting so that ObjectWalk will correctly
include any trees found during the enumeration back to the common
merge base between the interesting and uninteresting heads.
To ensure JGit's own push client matches this same potential delta
base list, we need to undo 60aae90d4d ("Disable topological
sorting in PackWriter") and switch back to using the conventional
TOPO ordering for commits in a pack file. This ensures that our
own push client will use the same potential base object list as
checkReferencedIsReachable uses on the receiving side.
Change-Id: I14d0a326deb62a43f987b375cfe519711031e172
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since we are only checking the links between objects we don't need
to hold onto commit messages after their headers have been parsed
by the walker. Dropping them saves a bit of memory, which is always
good when accepting huge pack files.
Change-Id: I378920409b6acf04a35cdf24f81567b1ce030e36
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ReceivePack: Rethrow exceptions caught during indexing
If we get an exception while indexing the incoming pack, its likely
a stream corruption. We already report an error to the client, but
we eat the stack trace, which makes debugging issues related to a
bug inside of JGit nearly impossible. Rethrow it under a new type
UnpackException, so embedding servers or applications can catch the
error and provide it to a human who might be able to forward such
traces onto a JGit developer for evaluation.
Change-Id: Icad41148bbc0c76f284c7033a195a6b51911beab
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Undo translation of protocol string 'unpack error'
This string is part of the network protocol, and isn't meant to
be translated into another language. Clients actually scan for
the string "unpack error " off the wire and react magically to
this information. If it were translated, they would instead have
a protocol exception, which isn't very useful when there is already
an error occurring.
Change-Id: Ia5dc8d36ba65ad2552f683bb637e80b77a7d92f0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Update a number of calling sites of RevWalk to ensure the walker's
internal ObjectReader is released after the walk is no longer used.
Because the ObjectReader is likely to hold onto a native resource
like an Inflater, we don't want to leak them outside of their
useful scope.
Where possible we also try to share ObjectReaders across several
walk pools, or between a walker and a PackWriter. This permits
the ObjectReader to actually do some caching if it felt inclined
to do so.
Not everything was updated, we'll probably need to come back and
update even more call sites, but these are some of the biggest
offenders. Test cases in particular aren't updated. My plan is to
move most storage-agnostic tests onto some purely in-memory storage
solution that doesn't do compression.
Change-Id: I04087ec79faeea208b19848939898ad7172b6672
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We don't actually need a Repository object here, just an ObjectReader
that can load content for us. So change the API to depend on that.
However, this breaks the asCommit and asTag legacy translation methods
on RevCommit and RevTag, so we still have to keep the Repository
inside of RevWalk for those two types. Hopefully we can drop those in
the future, and then drop the Repository off the RevWalk.
Change-Id: Iba983e48b663790061c43ae9ffbb77dfe6f4818e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Move FileRepository to storage.file.FileRepository
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.
Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.
Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of peeling things by hand in application level code, defer
the peeling logic into RevWalk's new peel utility method.
Change-Id: Idabd10dc41502e782f6a2eeb56f09566b97775a8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
This option was mis-named from day 1. Its not checking that the
objects provided by the client are reachable, its actually doing
a scan to prove that objects referenced by the client are already
reachable through another reference on the server, or were sent
as part of the pack from the client.
Rename it checkReferencedObjectsAreReachable, since we really are
trying to validate that objects referenced by the client's actions
are reachable to the client.
We also need to ensure we run checkConnectivity() anytime this is
enabled, even if the caller didn't turn on fsck for object formats.
Otherwise the check would be completely bypassed.
Change-Id: Ic352ddb0ca8464d407c6da5c83573093e018af19
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ReceivePack: Micro-optimize object lookup when checking connectivity
If we are checking the visibility of everything referenced in the
pack that isn't already reachable by a reference, it needs to be
in the provided set. Since the provided set lists everything that
is in this pack, we can avoid checking to see if the blob exists
on disk, because we know it should be there, it was found in the
pack we just consumed.
Change-Id: Ie3c7746f734d13077242100a68e048f1ac18c34a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a tree was referenced but not provided in the pack, report it
as a missing tree and not as a missing blob.
Change-Id: Iab05705349cdf0d30cc3f8afc6698a8d2a941343
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
IndexPack: Tighten up new and base object bookkeeping
The only current consumer of these collections is ReceivePack,
where it needs to test ObjectId equality between a RevObject and an
ObjectId. There we were copying from a traditional HashSet<ObjectId>
into an ObjectIdSubclassMap<ObjectId>, as the latter can perform
hashing using ObjectId's native value support, bypassing RevObject's
override on hashCode() and equals(). Instead of doing that copy,
directly create ObjectIdSubclassMap instances inside of ReceivePack.
We also only need to record the objects that do not appear in the
incoming pack, and were therefore copied from the local repositiory
in order to complete delta resolution. Instead of listing everything
that used an OBJ_REF_DELTA format, list only the objects that we
pulled from the destination repository via a normal ObjectLoader.
ReceivePack can now discard the IndexPack object, and all of its
other data, as soon as these collections are held by the check
connectivity method. This frees up memory for the ObjectWalk's
own RevObject pool.
Change-Id: I22ef71b45c2045a0202e7fd550a770ee1f6f38a6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ReceivePack: Remove need new,base object id properties
These are more like internal implementation details of how IndexPack
works with ReceivePack to validate the incoming object stream.
Callers who are embedding the ReceivePack logic in their own
application don't really need to know the details of which objects
were used for delta bases in the incoming thin pack, or exactly
which objects were newly transmitted.
Hide these from the API, as exposing them through ReceivePack was
an early mistake.
Change-Id: I7ee44a314fa19e6a8520472ce05de92c324ad43e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ReceivePack: Discard IndexPack as soon as possible
The IndexPack object carries a good bit of state within itself about
the objects received over the wire. The earlier we can discard it,
the sooner the GC is able to reclaim this chunk of memory for other
uses. So drop it as soon as we are certain the pack is valid and we
have no connectivity concerns.
Change-Id: I1e8bc87c2e9183733043622237a064e55957891f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>