When a client POSTs to /git-{upload,receive}-pack, the first line
includes their client capabilities. As soon as the C git client sends
side-band(-64k), it goes into a state where it chokes on data not sent
in a valid sideband channel.
GitSmartHttpTools.sendError() is called early in the request, likely
before a {Upload,Receive}Pack handler is assigned or, even so, before it
has read the request. In some cases we must read the first line manually
within sendError() to tell whether sideband is needed.
Change-Id: I8277fd45a4ec3b71fa8f87404b4f5d1a09e0f384
Check connection's error stream before reading from it
HttpURLConnection.getErrorStream can return null which is
currently not guarded against and will throw an NPE preventing
the actual error response code from bubbling up.
Change-Id: I04fb8dbda16b7df3b82fc579088a303b2fd21e87
Modify refs in UploadPack/ReceivePack using a hook interface
This is intended to replace the RefFilter interface (but does not yet,
for backwards compatibility). That interface required lots of extra
scanning and copying in filter cases such as only advertising a subtree
of the refs directory. Instead, provide a hook that can be executed
right before ref advertisement, using the public methods on
UploadPack/ReceivePack to explicitly set the map of advertised refs.
Change-Id: I0067019a191c8148af2cfb71a675f2258c5af0ca
Expose an OutputStream from ReceivePack for sending client messages
Callers may want to format and flush their own output, for example in a
PreReceiveHook that creates its own TextProgressMonitor. The actual
underlying msgOut can change over the lifetime of ReceivePack, so we
implement a small wrapper.
Change-Id: I57b6d6cad2542aaa93dcadc06cb3e933e81bcd3d
Allow creating ReceiveCommands with a specified type
This allows callers who know in advance whether a command is UPDATE or
UPDATE_NONFASTFORWARD to specify this in the constructor rather than
with a separate method call.
Change-Id: Iae483594a4ff370ff75d17a7b0648c5590b3d1bd
Execute ReceiveCommands via a method rather than in ReceivePack
This allows a PreReceiveHook to easily "take over" all of the
ReceiveCommands passed to it, preventing any of them from being handled
within the ReceivePack core.
Change-Id: I2a8c1fc44e8dcadf22cd97a8ec4ee79d4d9d08f1
Make sure all bytes are written to files on close, or get an error.
Java's BufferedOutputStream swallows any errors that occur when flushing
the buffer in close().
This class overrides close to make sure an error during the final
flush is reported back to the caller.
Change-Id: I74a82b31505fadf8378069c5f6554f1033c28f9b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add simple chain implementations of transport hooks and loggers
Allows callers to effectively run multiple hooks and loggers without
modifying the UploadPack/ReceivePack interface.
Change-Id: I5b388816b63036ffff08ef3a9b857ccb764cb8c4
Add percent-encoding of reserved characters in URIish
We do this for the the names that have an explicit scheme and
do it both ways. The URIish is parsed before decoding. Only
a few special characters are encoded for the path part of the
URI, i.e. space, non-ASCII and control characters. The percent
encoding is assumed to be a stream encoding so we interpret it
as UTF-8.
Change-Id: I82d1910df9472e21d7212a2b984ff7d8fb2cbf0f
Reset SSH connection and credentials on "Auth fail"
When SSH user/password authentication failed this may have been caused
by changed credentials on the server side. When the SSH credentials of a
user change the SSH connection needs to be re-established and
credentials which may have been stored by the credentials provider
need to be reset in order to enable prompting for the new credentials.
Bug: 356233
Change-Id: I7d64c5f39b68a9687c858bb68a961616eabbc751
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
ReceivePack (and PackParser) can be configured with the
maxObjectSizeLimit in order to prevent users from pushing too large
objects to Git. The limit check is applied to all object types
although it is most likely that a BLOB will exceed the limit. In all
cases the size of the object header is excluded from the object size
which is checked against the limit as this is the size of which a BLOB
object would take in the working tree when checked out as a file.
When an object exceeds the maxObjectSizeLimit the receive-pack will
abort immediately.
Delta objects (both offset and ref delta) are also checked against the
limit. However, for delta objects we will first check the size of the
inflated delta block against the maxObjectSizeLimit and abort
immediately if it exceeds the limit. In this case we even do not know
the exact size of the resolved delta object but we assume it will be
larger than the given maxObjectSizeLimit as delta is generally only
chosen if the delta can copy more data from the base object than the
delta needs to insert or needs to represent the copy ranges. Aborting
early, in this case, avoids unnecessary inflating of the (huge) delta
block.
Unfortunately, it is too expensive (especially for a large delta) to
compute SHA-1 of an object that causes the receive-pack to abort.
This would decrease the value of this feature whose main purpose is to
protect server resources from users pushing huge objects. Therefore
we don't report the SHA-1 in the error message.
Change-Id: I177ef24553faacda444ed5895e40ac8925ca0d1e
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Actually this is not ok according to the RFC, but this implementation is
ment to be Git compatible. A '\' is needed when the authentication
requires or allows authentication to a Windows domain where the
user name can be specified as DOMAIN\user.
Change-Id: If02f258c032486f1afd2e09592a3c7069942eb8b
Clients cache the set of advertised references at the start of a
negotiation, and keep replaying the same "want SHA1" list to the
server on each negotiation step. If another client pushes into
a branch and moves it by fast-forward, any request to obtain that
branch's prior SHA-1 is still valid, the commit is reachable from
the new position of the reference. Unfortunately the fast-forward
causes smart HTTP negotations to fail, as the server no longer is
advertising that prior SHA-1.
Instead of causing clients to fail out with a "want invalid" error
and forcing the end-user retry, possibly getting into a never ending
try-fail-retry race while other clients are pushing into the same
busy repository, allow the slightly stale want request so long as
it is still reachable.
C Git implemented this same change recently to fix races on the
smart HTTP protocol when the C Git git-http-backend is used.
The new RequestPolicy feature also allows server authors to make
an even more lenient configuration that exports any SHA-1 to the
client. This might be useful in certain settings where a server
has authenticated the client as the "repository owner" and wants
to allow them to grab any content from the server as a complete
unbroken history chain.
The new setAdvertisedRefs() method allows server authors to manually
fix the references that are advertised, possibly bypassing the
getAllRefs() call on the Repository object.
Change-Id: I7cdb563bf9c55c83653f217f6e53c3add55a0541
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Do not requeue state vector in stateless RPC fetch
If the no-done capability was enabled on the connection, don't
queue up the state vector again once the ACK %s ready message
is observed from the remote. The pack will be following in this
response stream, so the state vector is no longer required.
Change-Id: I7bd1e76957cb58c7ff1cdaeef227f1b02a7e5d24
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The client's use of UnionInputStream was broken when combined with a
8192 byte buffer used by PackParser. A smart HTTP client connection
always pushes in the execute stateless RPC input stream after the
data stream has ended from the remote peer. At the end of the pack,
PackParser asked to fill a 8192 byte buffer, but if only e.g. 1000
bytes remained UnionInputStream went to the next stream and asked
it for input, which triggered a new RPC, and failed because there
was nothing pending in the request buffer.
Change UnionInputStream to only return what it consumed from a
single InputStream without invoking the next InputStream, just in
case that second InputStream happens to be one of these magical
ones that generates an RPC invocation.
Change-Id: I0e51a8e6fea1647e4d2e08ac9cfc69c2945ce4cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This implements the server side of shallow clones only (i.e.
git-upload-pack), not the client side.
CQ: 5517
Bug: 301627
Change-Id: Ied5f501f9c8d1fe90ab2ba44fac5fa67ed0035a4
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
During parsing these are used with contains(). If they are a List
type, the contains operation is not efficient. Some callers such
as UploadPack often pass a List here, so convert to Set when the
type isn't efficient for contains().
Change-Id: If948ae3bf1f46e756bd2d5db14795e12ba7a6207
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
IndexPack: Defer the "Resolving deltas" progress meter
If delta resolution completes in < 1000 milliseconds, don't bother
showing the progress meter. This is actually very common for a Gerrit
Code Review server, where the client is probably sending 1 commit and
only a few trees/blobs modified... and the base objects are hot in the
process buffer cache.
The 1000 millisecond delay is just a guess at a reasonable time to wait.
Change-Id: I440baa64ab0dfa21be61deae8dcd3ca061bed8ce
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This progress meter never reached 100% as it did not update while
resolving the external bases in thin packs.
Instead of updating in batches at the top level, update once per delta
that is resolved. The batching progress meter type should smooth out
the frequent updates to an update rate that is more reasonable to send
to the UI, while also ensuring a successful pack parse always reaches
100% deltas resolved.
Change-Id: Ic77dcac542cfa97213a6b0194708f9d3c256d223
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The cached object databases should not require a close to release
their cached resources. Most object databases just return their
own reference for newCachedDatabase(), so a close() here kills
the real database's internal caches, and possibly underlying files,
resulting in poor performance for the callers of PackParser like
ReceivePack or FetchProcess trying to then go look up objects that
were just parsed, or that current references point to.
Change-Id: Ia4a239093866e5b9faf82744f729fb73f4373f1a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If an internal exception occurs while packing and the request
needs to abort, the HTTP response might already be committed due
to progress message having already been delivered to the client.
This prevents UploadPackServlet from resetting the response and
sending back an HTTP 500 response.
Try to catch all exceptions and report internal errors over the
sideband stream or as an ERR command during the initial ACK/NAK
negotiation phase. This allows JGit to transmit an error message
that the user will receive on their console without needing to
worry about resetting the (already gone) HTTP response.
Change-Id: Ie393fb8bb55d2b79ab1276adf71c781c1807f9fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a fetch or push needs to apply more than a few references
to the local repository it may take more than 0.25 seconds to
process all of the updates. This is especially true in the DHT
storage system during an initial push of a project with many tags.
The backend database may need to use a transaction to ensure each
tag reference creation is unique, and there may be large delays
caused by these transactions.
Change-Id: Ib11a077adfbd525253e425d327f2e2c2380804c7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some servlet containers require the servlet to read the EOF marker
from the input stream before a response can be output if the stream
is using "Transfer-Encoding: chunked"... which is typical for any
sort of large push to a repository over smart HTTP.
Ensure the EOF is always read by the PackParser when it is handling
the stream, and fail fast if there is more data present than expected
since this does indicate a protocol error.
Also ensure the EOF is read by UploadPack before it starts to output
a partial response using packing progress meters.
Change-Id: I131db9dea20b2324cb7c3272a814f21296bc64bd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Use the stored password instead of prompting for it all the time
EGit change Iba3b87293c22e5fe7d989fc312184aa7463c4387 is also required
to make this work for EGit.
Change-Id: Iedc80e133e66d72e78ff0980b6e12634f75eca36
Signed-off-by: Carsten Pfeiffer <carsten.pfeiffer@gebit.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since d1718a the method getHumanishName was broken on windows since
the URIish is not normalized anymore. For a path like
"C:\gitRepositories\egit" the whole path was returned instead of
"egit".
Bug: 343519
Change-Id: I95056009072b99d32f288966302d0f8188b47836
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
jgit.storage.dht is a storage provider implementation for JGit that
permits storing the Git repository in a distributed hashtable, NoSQL
system, or other database. The actual underlying storage system is
undefined, and can be plugged in by implementing 7 small interfaces:
* Database
* RepositoryIndexTable
* RepositoryTable
* RefTable
* ChunkTable
* ObjectIndexTable
* WriteBuffer
The storage provider interface tries to assume very little about the
underlying storage system, and requires only three key features:
* key -> value lookup (a hashtable is suitable)
* atomic updates on single rows
* asynchronous operations (Java's ExecutorService is easy to use)
Most NoSQL database products offer all 3 of these features in their
clients, and so does any decent network based cache system like the
open source memcache product. Relying only on key equality for data
retrevial makes it simple for the storage engine to distribute across
multiple machines. Traditional SQL systems could also be used with a
JDBC based spi implementation.
Before submitting this change I have implemented six storage systems
for the spi layer:
* Apache HBase[1]
* Apache Cassandra[2]
* Google Bigtable[3]
* an in-memory implementation for unit testing
* a JDBC implementation for SQL
* a generic cache provider that can ride on top of memcache
All six systems came in with an spi layer around 1000 lines of code to
implement the above 7 interfaces. This is a huge reduction in size
compared to prior attempts to implement a new JGit storage layer. As
this package shows, a complete JGit storage implementation is more
than 17,000 lines of fairly complex code.
A simple cache is provided in storage.dht.spi.cache. Implementers can
use CacheDatabase to wrap any other type of Database and perform fast
reads against a network based cache service, such as the open source
memcached[4]. An implementation of CacheService must be provided to
glue this spi onto the network cache.
[1] https://github.com/spearce/jgit_hbase
[2] https://github.com/spearce/jgit_cassandra
[3] http://labs.google.com/papers/bigtable.html
[4] http://memcached.org/
Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Smart HTTP clients may request both multi_ack_detailed and no-done in
the same request to prevent the client from needing to send a "done"
line to the server in response to a server's "ACK %s ready".
For smart HTTP, this can save 1 full HTTP RPC in the fetch exchange,
improving overall latency when incrementally updating a client that
has not diverged very far from the remote repository.
Unfortuantely this capability cannot be enabled for the traditional
bi-directional connections. multi_ack_detailed has the client sending
more "have" lines at the same time that the server is creating the
"ACK %s ready" and writing out the PACK stream, resulting in some race
conditions and/or deadlock, depending on how the pipe buffers are
implemented. For very small updates, a server might actually be able
to send "ACK %s ready", then the PACK, and disconnect before the
client even finishes sending its first batch of "have" lines. This
may cause the client to fail with a broken pipe exception. To avoid
all of these potential problems, "no-done" is restricted only to the
smart HTTP variant of the protocol.
Change-Id: Ie0d0a39320202bc096fec2e97cb58e9efd061b2d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix ReceivePack connectivity validation with alternates
If a repository has an alternate object database, the alternate has
its references advertised as ".have" lines, which permits the client
to use these as delta base candidates when generating the pack. If
setCheckReferencedObjectsAreReachable(true) is used, these additional
have lines need to be considered in addition to the advertised refs.
Change-Id: Ie39c6696f9d3ff147ef4405cd5624f6011700ce5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
UploadPack: Add a PreUploadHook to monitor and control behavior
Embedding applications can use this hook to watch actions within
UploadPack and possibly reject them. This could be useful to prevent
clones of a large repository from this server, or to stop abusive
negotiation rounds that offer thousands of objects in a single batch.
Change-Id: Id96f1885ac4d61f22c80b6418fff54184b7348ba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
PacketLineOut is already public. Make PacketLineIn partially public
in case an application needs to use some of the pkt-line protocol.
Change-Id: I5b383eca980bd9e16a7dbdb5aed040c6586d4f46
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We used to normalize URI's since it seems simple. This however causes
inconsistencies to the user and to out tests. Just pass backslashes
through and make sure our parser can handle them.
Bug: 341062
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Change-Id: I2c8e917a086faabcd8749160c2acc9dd05a42838
The RemoteSession interface operates like a simplified version of
java.lang.Runtime with a single exec method (and a disconnect
method). It returns a java.lang.Process, which should begin execution
immediately. Note that this greatly simplifies the interface for
running commands. There is no longer a connect method, and most
implementations will contain the bulk of their code inside
Process.exec, or a constructor called by Process.exec. (See the
revised implementations of JschSession and ExtSession.)
Implementations can now configure their connections properly without
either ignoring the proper use of the interface or trying to adhere
to an overly strict interface with odd rules about what methods are
called first. For example, Jsch needs to create the output stream
before executing, which it now does in the process constructor. These
changes should make it much easier to add alternate session
implementations in the future.
Also-by: John D Eblen <jdeblen@comcast.net>
Bug: 336749
CQ: 5004
Change-Id: Iece43632086afadf175af6638255041ccaf2bfbb
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Assume refs of alternates are reachable during fetch
When fetching from a remote peer, consider all of the refs of any
alternate repository to be reachable locally, in addition to the refs
of the local repository. This mirrors the push protocol and may avoid
unnecessary object transfer when the local repository is empty, but
its alternate and the remote share a lot of common history.
Junio C Hamano recently proposed a similar change to C Git's fetch
client, in order to work around a performance bug I identified when
fetching between two repositories that actually shared the same
alternate repository on the local system.
Change-Id: Iffb0b70e1223901ce2caac3b87ba7e0d6634d265
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>