| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
Our rule to enforce javodocs for public members gives us a problem
because there are some patterns where javadoc make little sense so we
make the comments as small as possible, which our formatting rules do
not like, so disable it for those source files.
Change-Id: I6e3edb1e650ed45428b89cf41e6151b6536bca8a
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
|
|
|
|
|
|
|
|
|
|
|
| |
For streams that should not be closed, i.e. don't own an underlying
stream, and in-memory streams that do not need to be closed we just
suppress the warning. This mostly apply to test cases. GC is enough.
For streams with external resources (i.e. files) we add the necessary
call to close().
Change-Id: I4d883ba2e7d07f199fe57ccb3459ece00441a570
|
|
|
|
| |
Change-Id: Ic7b8494713d59574ee8f064a5c1042b48aedf012
|
|\
| |
| |
| |
| |
| | |
* changes:
Use '406 Not Acceptable' when info/refs is disabled
Compress large /info/refs responses on HTTP
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Instead of a confusing 403 Forbidden error indicating the dumb file
service is disabled on this server, use 406 Not Acceptable to mean
the client sent a request for content (the plain info/refs file)
that this server does not want to provide.
The stock C Git client will report HTTP 406 error if it predates
1.6.6 or something goes wrong with the smart request and it tried
falling back to the dumb request. This may help to debug cases where
a broken proxy server exists between the client and the server and
has mangled a prior smart info/refs response.
Change-Id: Ic2b78ba9502e4bbdff7cc3ba1fd284cf7616412b
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Enable streaming compression for any response that is bigger than
the 32 KiB buffer used by SmartOutputStream. This is useful on the
info/refs file which can have many branches and tags listed, and
is often bigger than 32 KiB, but also compresses by at least 50%.
Disable streaming compression on large git-upload-pack responses,
as these are usually highly compressed Git pack data. Trying to
compress these with gzip will only waste CPU time and additional
transfer space with the gzip wrapper. Small git-upload-pack data
is usually text based negotiation responses and can be squeezed
smaller with a little bit of CPU usage.
Change-Id: Ia13e63ed334f594d5e1ab53a97240eb2e8f550e2
|
|/
|
|
|
|
|
|
|
| |
Invoke the wrapper types' valueOf via static imports.
For booleans used in asserts, add a new assert in
the JUnit utility package since out current version of JUnit
does not have the assert(boolean, boolean) method.
Change-Id: I9099bd8efbc8c133479344d51ce7dabed8958a2b
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have unfortunately introduced a few bugs in the native Git client
over the years. 1.7.5 is unable to send chunked requests correctly,
resulting in corrupt data at the server. Ban this client whenever
it uses chunked encoding with an error message.
Prior to some more recent versions, git push over HTTP failed to
report status information and error messages due to a race within
the client and its helper process. Check for these bad versions and
send errors as messages before the status report, enabling users
to see the failures on their terminal.
Change-Id: Ic62d6591cbd851d21dbb3e9b023d655eaecb0624
|
|
|
|
|
|
|
|
|
|
| |
This has no effect on Git clients, but for browsers, 403 Forbidden may
be more appropriate. 500 Internal Server Error implies that there is
a problem with the server, whereas ServiceMayNotContinueException is
specifically intended to cover cases where the server is functioning
correctly but has determined that the request may not proceed.
Change-Id: I825abd2a029d372060103655eabf488a0547c1e8
|
|
|
|
|
|
|
| |
This allows the use of precompiled patterns, such as those compiled with
flags.
Change-Id: I1c87fea98e246004aecbae3aabaf1d21fbf3176e
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Include some behaviors that were not clear to me until I had used it a
few times.
Warn about broken behavior for capture groups that do not match. It
would be nice to support these, but even for the cases where it's
clear what the behavior should be, it would be infeasible to
implement.
For example, consider the second group of the regex "(/a)/b(/c)?"
matched against the path "/a/b". We might want getServletPath() to
return "/a/b" and getPathInfo() to return null, but this is hard to
implement: there's no easy way to say "the substring up to the point
where (/c) would have matched if it were in the string even though
it's not." And even if we could, it's not clear there is even a right
answer in the general case.
Moreover, ideally we could warn about such broken patterns at servlet
initialization time, rather than at runtime, but even answering the
question of whether there are capture groups that might not match
requires more customized regular expression parsing than we want to
embark on. Hence, the best we can do is document how it fails.
Change-Id: I7bd5011f5bd387f9345a0e79b22a4d7ed918a190
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implementations may want to send an error message to the user, which
doesn't really fit with any of the existing exception types.
ServiceMayNotContinueException, on the other hand, is documented as
always containing a user-visible error string, so use that.
Modify the git and HTTP transport mechanisms to properly relay this
message to the end user.
Change-Id: I362e67ea46102a145bf2c6284d38788537c9735f
|
|
|
|
|
| |
Change-Id: Ie22ac47e315bff76f224214bc042fc483eb01550
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fetch-pack/upload-pack stream usually has an LF at the
end of the first "want" line. Trim this when checking to
see if side-band or side-band-64k was used.
Perform the same trim for send-pack/receive-pack, as it is
harmless in this context to ignore an LF just before doing
an error report.
Change-Id: I6ef946bb6124fa72c52bd5320187eaac3ed906e7
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a client POSTs to /git-{upload,receive}-pack, the first line
includes their client capabilities. As soon as the C git client sends
side-band(-64k), it goes into a state where it chokes on data not sent
in a valid sideband channel.
GitSmartHttpTools.sendError() is called early in the request, likely
before a {Upload,Receive}Pack handler is assigned or, even so, before it
has read the request. In some cases we must read the first line manually
within sendError() to tell whether sideband is needed.
Change-Id: I8277fd45a4ec3b71fa8f87404b4f5d1a09e0f384
|
|
|
|
|
|
|
|
|
|
|
| |
This is intended to replace the RefFilter interface (but does not yet,
for backwards compatibility). That interface required lots of extra
scanning and copying in filter cases such as only advertising a subtree
of the refs directory. Instead, provide a hook that can be executed
right before ref advertisement, using the public methods on
UploadPack/ReceivePack to explicitly set the map of advertised refs.
Change-Id: I0067019a191c8148af2cfb71a675f2258c5af0ca
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The HTTP RFCs require a server to fully consume the request body before
it can return a non-error status code, which is any code below 400.
JGit returns most Git level errors inside of an HTTP 200 OK response,
and sometimes this happens before the entire request was consumed from
the servlet container. In such cases the body must be skipped or read
until EOF is reached, ensuring the HTTP keep-alive semantics will work
for the next request on the same TCP connection.
HTTP status codes >= 400 may be returned without consuming the body,
and a servlet container must set "Connection: close" in the response
headers when this happens, since the state of the request body is not
well defined with an early abort.
With the introduction of sendError() in GitSmartHttpTools there are
only a handful of locations that need to worry about the request body
being consumed, so sprinkle the call in as necessary.
Change-Id: I5381e110585f780c01a764df8e27c80aacf5146e
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Error messages are typically short, below the 32 KiB in-memory buffer
size of the SmartOutputStream. When an error is queued up for sending
to a client and an exception is thrown up into the servlet handler we
discarded the message and sent nothing to the client, as the messages
were stuck inside of the SmartOutputStream buffer.
Hoist the creation of the output stream above the invocation of try
block of the service, and use close() in the few catch blocks that
assume there are buffered messages ready for transmission. This will
ensure errors from unpacking a stream in ReceivePack are sent off to
a client correctly, as previously these were causing no status report
to arrive at the client side as the data was stuck in the buffer.
Change-Id: I5534b560697731121f48979ae077aa7c95b8e39c
|
|
|
|
|
|
|
|
| |
The stream must be closed to ensure the native resources associated
with its internal Deflater instance are cleaned up early, instead of
waiting for GC to identify the dead object and finialize it.
Change-Id: Ic31b5df563f19404ed4682556999f4332aa61562
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GitSmartHttpTools class started as utility functions to help report
useful error messages to users of the android.googlesource.com service.
Now that the GitServlet and GitFilter classes support filters before a
git-upload-pack or git-receive-pack request, server implementors may
these routines helpful to report custom messages to clients. Using the
sendError() method to return an HTTP 200 OK with error text embedded in
the payload prevents native Git clients from retrying the action with a
dumb Git or WebDAV HTTP request.
Refactor some of the existing code to use these new error functions and
protocol constants. The new sendError() function is very close to being
identical to the old error handling code in RepositoryFilter, however we
now use the POST Content-Type rather than the Accept HTTP header to check
if the client will accept the error data in the response body rather than
using the HTTP status code. This is a more reliable way of checking for
native Git clients, as the Accept header was not always populated with the
correct string in older versions of Git smart HTTP.
Change-Id: I828ac2deb085af12b6689c10f86662ddd39bd1a2
|
|
|
|
|
|
|
|
|
| |
If removing the leading slash results in an empty string, return
with an HTTP 404 error before trying to use the RepositoryResolver.
Moving this into a loop ahead of the length check ensures there is
no empty string passed into the resolver.
Change-Id: I80e5b7cf25ae9f2164b5c396a29773e5c7d7286e
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All Git URLs operate off a suffix approach, for example the default
binding is for paths such as:
*/info/refs
*/git-upload-pack
*/git-receive-pack
These names are not common on project hosting servers, especially
one like Gerrit Code Review.
In addition to offering Git-over-HTTP as a servlet, offer it as a
filter that triggers when a matching suffix appears, but otherwise
delegates the request through the chain. This filter would permit
Gerrit Code Review to place projects at the root of the server,
rather than within the "/p/" subdirectory, making the HTTP and SSH
URL structure exactly match each other.
To prevent breakage with existing users, the MetaServlet and
GitServlet are kept as wrappers delegating to their filters,
returning 404 Not Found when the filter has no match.
Change-Id: I2465c15c086497e0faaae5941159d80c028fa8b1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an internal exception occurs while packing and the request
needs to abort, the HTTP response might already be committed due
to progress message having already been delivered to the client.
This prevents UploadPackServlet from resetting the response and
sending back an HTTP 500 response.
Try to catch all exceptions and report internal errors over the
sideband stream or as an ERR command during the initial ACK/NAK
negotiation phase. This allows JGit to transmit an error message
that the user will receive on their console without needing to
worry about resetting the (already gone) HTTP response.
Change-Id: Ie393fb8bb55d2b79ab1276adf71c781c1807f9fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
| |
The filter did not correctly match smart HTTP client requests,
so it always fell back on HTTP status codes for errors. This
usually causes a smart client to retry a dumb request, which
is not what the server wants.
Change-Id: I42592378dc42fbe308ef30a2923786c690f668a9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Smart HTTP clients may request both multi_ack_detailed and no-done in
the same request to prevent the client from needing to send a "done"
line to the server in response to a server's "ACK %s ready".
For smart HTTP, this can save 1 full HTTP RPC in the fetch exchange,
improving overall latency when incrementally updating a client that
has not diverged very far from the remote repository.
Unfortuantely this capability cannot be enabled for the traditional
bi-directional connections. multi_ack_detailed has the client sending
more "have" lines at the same time that the server is creating the
"ACK %s ready" and writing out the PACK stream, resulting in some race
conditions and/or deadlock, depending on how the pipe buffers are
implemented. For very small updates, a server might actually be able
to send "ACK %s ready", then the PACK, and disconnect before the
client even finishes sending its first batch of "have" lines. This
may cause the client to fail with a broken pipe exception. To avoid
all of these potential problems, "no-done" is restricted only to the
smart HTTP variant of the protocol.
Change-Id: Ie0d0a39320202bc096fec2e97cb58e9efd061b2d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the client is clearly making a smart HTTP request to our smart
HTTP server, return any errors like RepositoryNotFoundException or
ServiceNotEnabledException inside of the payload as a Git level ERR
message, rather than an HTTP error code.
This prevents the C Git command line client from retrying a failed
"$URL/info/refs?service=git-upload-pack" request without the smart
service URL, only to fail again with "403 Forbidden" when the dumb
as-is service has been disabled by the server configuration, or is
unavailable because the repository is not on the local filesystem.
Change-Id: I57e8756d5026e885e0ca615979bfcd729703be6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
| |
Embedding applications can use this hook to watch actions within
UploadPack and possibly reject them. This could be useful to prevent
clones of a large repository from this server, or to stop abusive
negotiation rounds that offer thousands of objects in a single batch.
Change-Id: Id96f1885ac4d61f22c80b6418fff54184b7348ba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Permit applications embedding GitServlet to wrap the
info/refs?service=$name and /$name operations with a
servlet Filter.
To help applications inspect state of the operation,
expose the UploadPack or ReceivePack object into a
request attribute. This can be useful for logging,
or to implement throttling of requests like Gerrit
Code Review uses to prevent server overload.
Change-Id: Ib8773c14e2b7a650769bd578aad745e6651210cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
| |
Change-Id: I461786baffab15515365ead1d9c350a1880443ea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As PackParser supports a progress meter for the "Resolving deltas"
phase of its work, we should export this to smart HTTP clients so
they know the server is still working on their (large) upload.
However this isn't as simple as just dropping in a binding for
the SmartOutputStream to flush when its told to. We want to
avoid spurious flushes triggered by the use of sideband, or the
status report formatting in the send-pack/receive-pack protocol.
Change-Id: Ibd88022a298c5fed0edb23dfaf2e90278807ba8b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some clients coming through proxies may advertise a different
Accept-Encoding, for example "Accept-Encoding: gzip(proxy)".
Matching by substring causes us to identify this as a false positive;
that the client understands gzip encoding and will inflate the
response before reading it.
In this particular case however it doesn't. Its the reverse proxy
server in front of JGit letting us know the proxy<->JGit link can
be gzip compressed, while the client<->proxy part of the link is not:
client <-- no gzip --> proxy <-- gzip --> JGit
Use a more standard method of parsing by splitting the value into
tokens, and only using gzip if one of the tokens is exactly the
string "gzip". Add a unit test to make sure this isn't broken in
the future.
Change-Id: I30cda8a6d11ad235b56457adf54a2d27095d964e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using a resolver and factory pattern for the anonymous git:// Daemon
class makes transport.Daemon more useful on non-file storage systems,
or in embedded applications where the caller wants more precise
control over the work tasks constructed within the daemon.
Rather than defining new interfaces, move the existing HTTP ones
into transport.resolver and make them generic on the connection
handle type. For HTTP, continue to use HttpServletRequest, and
for transport.Daemon use DaemonClient.
To remain compatible with transport.Daemon, FileResolver needs to
learn how to use multiple base directories, and how to export any
Repository instance at a fixed name.
Change-Id: I1efa6b2bd7c6567e983fbbf346947238ea2e847e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It isn't strictly necessary to validate every reference's target
object is reachable in the repository before advertising it to a
client. This is an expensive operation when there are thousands of
references, and its very unlikely that a reference uses a missing
object, because garbage collection proceeds from the references and
walks down through the graph. So trying to hide a dangling reference
from clients is relatively pointless.
Even if we are trying to avoid giving a client a corrupt repository,
this simple check isn't sufficient. It is possible for a reference to
point to a valid commit, but that commit to have a missing blob in its
root tree. This can be caused by staging a file into the index,
waiting several weeks, then committing that file while also racing
against a prune. The prune may delete the blob, since its
modification time is more than 2 weeks ago, but retain the commit,
since its modification time is right now.
Such graph corruption is already caught during PackWriter as it
enumerates the graph from the client's want list and digs back
to the roots or common base. Leave the reference validation also
for that same phase, where we know we have to parse the object to
support the enumeration.
Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update a number of calling sites of RevWalk to ensure the walker's
internal ObjectReader is released after the walk is no longer used.
Because the ObjectReader is likely to hold onto a native resource
like an Inflater, we don't want to leak them outside of their
useful scope.
Where possible we also try to share ObjectReaders across several
walk pools, or between a walker and a PackWriter. This permits
the ObjectReader to actually do some caching if it felt inclined
to do so.
Not everything was updated, we'll probably need to come back and
update even more call sites, but these are some of the biggest
offenders. Test cases in particular aren't updated. My plan is to
move most storage-agnostic tests onto some purely in-memory storage
solution that doesn't do compression.
Change-Id: I04087ec79faeea208b19848939898ad7172b6672
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.
Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.
Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some types of repositories might not be stored on local disk. For
these, they will most likely return null for getDirectory() as the
java.io.File type cannot describe where their storage is, its not
in the host's filesystem.
Document that getDirectory() can return null now, and update all
current non-test callers in JGit that might run into problems on
such repositories. For the most part, just act like its bare.
Change-Id: I061236a691372a267fd7d41f0550650e165d2066
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If UploadPack invokes flush() on the output stream we pass it, its
most likely the progress messages coming down the side band stream.
As pack generation can take a while, we want to push that down
at the client as early as we can, to keep the connection alive,
and to let the user know we are still working on their behalf.
Ensure we dump the temporary buffer whenever flush() is invoked,
otherwise the messages don't get sent in a timely fashion to the
user agent (in this case, git fetch).
We specifically don't implement flush() for ReceivePack right now,
as that protocol currently does not provide progress messages to
the user, but it does invoke flush several times, as the different
streams include '0000' type flush-pkts to denote various end points.
Change-Id: I797c90a2c562a416223dc0704785f61ac64e0220
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
| |
Change-Id: I53f71dc0e3c68839da5ff5a2e0f3eeb8340e4793
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Windows, FS_Win32_Cygwin has been used if a Cygwin Git installation
is present in the PATH. Assuming that the user works with the Cygwin
Git installation may result in unnecessary overhead if he actually
does not.
Applications built on top of jgit may have more knowledge on the
actually used Git client (Cygwin or not) and hence should be able to
configure which FS to use accordingly.
Change-Id: Ifc4278078b298781d55cf5421e9647a21fa5db24
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The J2SE NIO APIs require that FileChannel close the underlying file
descriptor if a thread is interrupted while it is inside of a read or
write operation on that channel. This is insane, because it means we
cannot share the file descriptor between threads. If a thread is in
the middle of the FileChannel variant of IO.readFully() and it
receives an interrupt, the pack will be automatically closed on us.
This causes the other threads trying to use that same FileChannel to
receive IOExceptions, which leads to the pack getting marked as
invalid. Once the pack is marked invalid, JGit loses access to its
entire contents and starts to report MissingObjectExceptions.
Because PackWriter must ensure that the chosen pack file stays
available until the current object's data is fully copied to the
output, JGit cannot simply reopen the pack when its automatically
closed due to an interrupt being sent at the wrong time. The pack may
have been deleted by a concurrent `git gc` process, and that open file
descriptor might be the last reference to the inode on disk. Once its
closed, the PackWriter loses access to that object representation, and
it cannot complete sending the object the client.
Fortunately, RandomAccessFile's readFully method does not have this
problem. Interrupts during readFully() are ignored. However, it
requires us to first seek to the offset we need to read, then issue
the read call. This requires locking around the file descriptor to
prevent concurrent threads from moving the pointer before the read.
This reduces the concurrency level, as now only one window can be
paged in at a time from each pack. However, the WindowCache should
already be holding most of the pages required to handle the working
set for a process, and its own internal locking was already limiting
us on the number of concurrent loads possible. Provided that most
concurrent accesses are getting hits in the WindowCache, or are for
different repositories on the same server, we shouldn't see a major
performance hit due to the more serialized loading.
I would have preferred to use a pool of RandomAccessFiles for each
pack, with threads borrowing an instance dedicated to that thread
whenever they needed to page in a window. This would permit much
higher levels of concurrency by using multiple file descriptors (and
file pointers) for each pack. However the code became too complex to
develop in any reasonable period of time, so I've chosen to retrofit
the existing code with more serialization instead.
Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The HTTP server side code now uses the same approach that the smart
HTTP client code uses when preparing a request body. The payload
is streamed into a TemporaryBuffer of limited size. If the entire
data fits, its compressed with gzip if the user agent supports that,
and a Content-Length header is used to transmit the fixed length
body to the peer. If however the data overflows the limited memory
segment, its streamed uncompressed to the peer.
One might initially think that larger contents which overflow
the buffer should also be compressed, rather than sent raw, since
they were deemed "large". But usually these larger contents are
actually a pack file which has been already heavily compressed by
Git specific routines. Trying to deflate that with gzip is probably
going to take up more space, not less, so the compression overhead
isn't worthwhile.
This buffer and compress optimization helps repositories with a
large number of references, as their text based advertisements
compress well. For example jgit's own native repository currently
requires 32,628 bytes for its full advertisement of 489 references.
Most repositories have fewer references, and thus could compress
their entire response in one buffer.
Change-Id: I790609c9f763339e0a1db9172aa570e29af96f42
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Don't copy and sort the set of references if they are passed through
in a RefMap or a SortedMap using the key's natural sort ordering.
Either map is already in the order we want to present the items
to the client in, so copying and sorting is a waste of local CPU
and memory.
Change-Id: I49ada7c1220e0fc2a163b9752c2b77525d9c82c1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit actually does three major changes to the way references
are handled within JGit. Unfortunately they were easier to do as
a single massive commit than to break them up into smaller units.
Disambiguate symbolic references:
---------------------------------
Reporting a symbolic reference such as HEAD as though it were
any other normal reference like refs/heads/master causes subtle
programming errors. We have been bitten by this error on several
occasions, as have some downstream applications written by myself.
Instead of reporting HEAD as a reference whose name differs from
its "original name", report it as an actual SymbolicRef object
that the application can test the type and examine the target of.
With this change, Ref is now an abstract type with different
subclasses for the different types.
In the classical example of "HEAD" being a symbolic reference to
branch "refs/heads/master", the Repository.getAllRefs() method
will now return:
Map<String, Ref> all = repository.getAllRefs();
SymbolicRef HEAD = (SymbolicRef) all.get("HEAD");
ObjectIdRef master = (ObjectIdRef) all.get("refs/heads/master");
assertSame(master, HEAD.getTarget());
assertSame(master.getObjectId(), HEAD.getObjectId());
assertEquals("HEAD", HEAD.getName());
assertEquals("refs/heads/master", master.getName());
A nice side-effect of this change is the storage type of the
symbolic reference is no longer ambiguous with the storge type
of the underlying reference it targets. In the above example,
if master was only available in the packed-refs file, then the
following is also true:
assertSame(Ref.Storage.LOOSE, HEAD.getStorage());
assertSame(Ref.Storage.PACKED, master.getStorage());
(Prior to this change we returned the ambiguous storage of
LOOSE_PACKED for HEAD, which was confusing since it wasn't
actually true on disk).
Another nice side-effect of this change is all intermediate
symbolic references are preserved, and are therefore visible
to the application when they walk the target chain. We can
now correctly inspect chains of symbolic references.
As a result of this change the Ref.getOrigName() method has been
removed from the API. Applications should identify a symbolic
reference by testing for isSymbolic() and not by using an arcane
string comparsion between properties.
Abstract the RefDatabase storage:
---------------------------------
RefDatabase is now abstract, similar to ObjectDatabase, and a
new concrete implementation called RefDirectory is used for the
traditional on-disk storage layout. In the future we plan to
support additional implementations, such as a pure in-memory
RefDatabase for unit testing purposes.
Optimize RefDirectory:
----------------------
The implementation of the in-memory reference cache, reading, and
update routines has been completely rewritten. Much of the code
was heavily borrowed or cribbed from the prior implementation,
so copyright notices have been left intact as much as possible.
The RefDirectory cache no longer confuses symbolic references
with normal references. This permits the cache to resolve the
value of a symbolic reference as late as possible, ensuring it
is always current, without needing to maintain reverse pointers.
The cache is now 2 sorted RefLists, rather than 3 HashMaps.
Using sorted lists allows the implementation to reduce the
in-memory footprint when storing many refs. Using specialized
types for the elements allows the code to avoid additional map
lookups for auxiliary stat information.
To improve scan time during getRefs(), the lists are returned via
a copy-on-write contract. Most callers of getRefs() do not modify
the returned collections, so the copy-on-write semantics improves
access on repositories with a large number of packed references.
Iterator traversals of the returned Map<String,Ref> are performed
using a simple merge-join of the two cache lists, ensuring we can
perform the entire traversal in linear time as a function of the
number of references: O(PackedRefs + LooseRefs).
Scans of the loose reference space to update the cache run in
O(LooseRefs log LooseRefs) time, as the directory contents
are sorted before being merged against the in-memory cache.
Since the majority of stable references are kept packed, there
typically are only a handful of reference names to be sorted,
so the sorting cost should not be very high.
Locking is reduced during getRefs() by taking advantage of the
copy-on-write semantics of the improved cache data structure.
This permits concurrent readers to pull back references without
blocking each other. If there is contention updating the cache
during a scan, one or more updates are simply skipped and will
get picked up again in a future scan.
Writing to the $GIT_DIR/packed-refs during reference delete is
now fully atomic. The file is locked, reparsed fresh, and written
back out if a change is necessary. This avoids all race conditions
with concurrent external updates of the packed-refs file.
The RefLogWriter class has been fully folded into RefDirectory
and is therefore deleted. Maintaining the reference's log is
the responsiblity of the database implementation, and not all
implementations will use java.io for access.
Future work still remains to be done to abstract the ReflogReader
class away from local disk IO.
Change-Id: I26b9287c45a4b2d2be35ba2849daa316f5eec85d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clients can request smart fetch support by examining the info/refs URL
with the service parameter set to the magic git-upload-pack string:
GET /$GIT_DIR/info/refs?service=git-upload-pack HTTP/1.1
The response is formatted with the upload pack capabilities, using
the standard packet line formatter. A special header line is put
in front of the standard upload-pack advertisement to let clients
know the service was recognized and is supported.
If the requested service is disabled an authorization status code is
returned, allowing the user agent to retry once they have obtained
credentials from a human, in case authentication is required by
the configured UploadPackFactory implementation.
Change-Id: Ib0f1a458c88b4b5509b0f882f55f83f5752bc57a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clients can request smart push support by examining the info/refs URL
with the service parameter set to the magic git-receive-pack string:
GET /$GIT_DIR/info/refs?service=git-receive-pack HTTP/1.1
The response is formatted with the receive pack capabilities, using
the standard packet line formatter. A special header block is put
in front of the standard receive-pack advertisement to let clients
know the service was recognized and is supported.
If the requested service is disabled an authorization status code is
returned, allowing the user agent to retry once they have obtained
credentials from a human, in case authentication is required by
the configured ReceivePackFactory implementation.
Change-Id: Ie4f6e0c7b68a68ec4b7cdd5072f91dd406210d4f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
This is a simple HTTP server that provides the minimum server side
support required for dumb (non-git aware) transport clients.
We produce the info/refs and objects/info/packs file on the fly
from the local repository state, but otherwise serve data as raw
files from the on-disk structure.
In the future we could better optimize the FileSender class and the
servlets that use it to take advantage of direct file to network
APIs in more advanced servlet containers like Jetty.
Our glue package borrows the idea of a micro embedded DSL from
Google Guice and uses it to configure a collection of Filters
and HttpServlets, all of which are matched against requests using
regular expressions. If a subgroup exists in the pattern, it is
extracted and used for the path info component of the request.
Change-Id: Ia0f1a425d07d035e344ae54faf8aeb04763e7487
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|