Since git-core ff5effd (v1.7.12.1) the native wire protocol transmits
the server and client implementation and version strings using
capability "agent=git/1.7.12.1" or similar.
Support this in JGit and hang the implementation data off UploadPack
and ReceivePack. On HTTP transports default to the User-Agent HTTP
header until the client overrides this with the optional capability
string in the first line.
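A minimal sketch of the fallback rule described above, assuming the capabilities arrive as a space-separated string. `AgentResolver` and `resolve` are hypothetical names for illustration, not JGit's API: an `agent=` capability, if present, overrides the HTTP `User-Agent` value.

```java
/**
 * Hypothetical helper, not part of JGit: chooses the peer's agent string,
 * preferring an "agent=" capability over the HTTP User-Agent header.
 */
final class AgentResolver {
    private static final String AGENT_PREFIX = "agent=";

    private AgentResolver() {
    }

    /**
     * @param capabilities  space-separated capabilities from the first
     *                      protocol line, or null if none were sent
     * @param httpUserAgent User-Agent header value, or null when the
     *                      transport is not HTTP
     * @return the advertised agent, else the header value (possibly null)
     */
    static String resolve(String capabilities, String httpUserAgent) {
        if (capabilities != null) {
            for (String cap : capabilities.trim().split("\\s+")) {
                if (cap.startsWith(AGENT_PREFIX)) {
                    // The capability overrides whatever the header said.
                    return cap.substring(AGENT_PREFIX.length());
                }
            }
        }
        return httpUserAgent;
    }
}
```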
Extract the user agent string into a UserAgent class under transport
where it can be set to a different value if the application's
build process has broken the Implementation-Version header in the
JGit package.
Change-Id: Icfc6524d84a787386d1786310b421b2f92ae9e65
The clone or fetch depth is a valuable bit of information
for access logging. Create a public getter to facilitate access.
A precondition check prevents unintentional misuse when the
data isn't valid yet.
Change-Id: I4603d5fd3bd4a767e3e2419b0f2da3664cfbd7f8
Signed-off-by: David Pletcher <dpletcher@google.com>
C Git has had this feature for some time. This will teach JGit to send symbolic refs,
too.
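A short sketch of what sending symbolic refs looks like on the wire, assuming C Git's `symref=<name>:<target>` capability form (for example `symref=HEAD:refs/heads/master`). `SymrefCapabilities` is a hypothetical helper, not JGit code.

```java
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch: renders symbolic refs as "symref=<name>:<target>"
 * capabilities, separated by spaces.
 */
final class SymrefCapabilities {
    private SymrefCapabilities() {
    }

    static String format(Map<String, String> symbolicRefs) {
        StringJoiner caps = new StringJoiner(" ");
        for (Map.Entry<String, String> e : symbolicRefs.entrySet()) {
            caps.add("symref=" + e.getKey() + ":" + e.getValue());
        }
        return caps.toString();
    }
}
```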
Change-Id: I7cb2ab4e6d31a838a0af92eac64535fdb66ed74a
Signed-off-by: Yuxuan 'fishy' Wang <fishywang@google.com>
UploadPack: Always make PackWriter.Statistics available
If the packer fails, still obtain the stats and make them available
to the logger and the caller. Failures can frequently happen when
a client disconnects in the middle of a pack stream. Server admins
may still want to examine the timing metrics from counting and
compressing phases.
Change-Id: Iceae4f68b5473f4223d85c9edfb57837fc818eed
In certain cases a JGit server updating an existing shallow client
selected a common ancestor that was behind the shallow edge of
the client. This allowed the server to assume the client had some
objects it did not have and allowed creation of pack deltas the
client could never inflate.
Any commit the client has advertised as shallow must be treated
by the UploadPack server as though it has no parents. With no parents
the walker cannot visit graph history the client does not have,
and PackWriter cannot consider delta base candidates the client
is lacking.
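The rule can be illustrated with a toy commit graph. This is a sketch under stated assumptions, not JGit's RevWalk: commits are plain strings, `parents` is a hypothetical adjacency map, and `ShallowAwareWalk` is an illustrative name. A commit in the client's shallow set is treated as a root, so the walk never enters history the client lacks.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: a commit walk that treats every commit the client
 * declared shallow as if it had no parents.
 */
final class ShallowAwareWalk {
    private ShallowAwareWalk() {
    }

    static Set<String> reachable(Map<String, List<String>> parents,
            Set<String> tips, Set<String> clientShallow) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(tips);
        while (!pending.isEmpty()) {
            String commit = pending.pop();
            if (!seen.add(commit)) {
                continue;
            }
            if (clientShallow.contains(commit)) {
                // Cut the history here: the client lacks these parents.
                continue;
            }
            for (String parent : parents.getOrDefault(commit, List.of())) {
                pending.push(parent);
            }
        }
        return seen;
    }
}
```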
Change-Id: I4922b9354df9f490966a586fb693762e897345a2
Propagate IOException where possible when getting refs.
Currently, Repository.getAllRefs() and Repository.getTags() silently
ignore an IOException and instead return an empty map. Repository
is a public API and as such cannot be changed until the next major
revision change. Where possible, update the internal JGit APIs to
use the RefDatabase directly, since it propagates the error.
Change-Id: I4e4537d8bd0fa772f388262684c5c4ca1929dc4c
Change-Id: I9754e2124c0fe6ad2dbde5597c3ed10f1c3efef5
Signed-off-by: Lars Vogel <Lars.Vogel@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Advertise capabilities with no refs in upload service.
With reference hiding, it is possible for a repository to appear
empty when all refs are hidden. This causes capabilities to not be
advertised either, since they are published with the first reference,
breaking fetch by SHA1 support.
Always advertise the capabilities by publishing the symbolic capabilities
reference when the repository has no references to advertise (similar to
the receive service).
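A sketch of the placeholder line, assuming the same shape the receive service uses for an empty repository: the all-zero object id, the name `capabilities^{}`, a NUL, then the capability list, framed as a pkt-line whose 4 hex digits count the whole line including themselves. `EmptyAdvertisement` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: when every ref is hidden, advertise capabilities on
 * a placeholder "capabilities^{}" line with the all-zero object id.
 */
final class EmptyAdvertisement {
    static final String ZERO_ID = "0".repeat(40);

    private EmptyAdvertisement() {
    }

    /** Frames a payload as a pkt-line: 4 hex digits of total length, then data. */
    static String pktLine(String payload) {
        int length = payload.getBytes(StandardCharsets.UTF_8).length + 4;
        return String.format("%04x", length) + payload;
    }

    static String capabilitiesOnly(String capabilities) {
        return pktLine(ZERO_ID + " capabilities^{}\0" + capabilities + "\n");
    }
}
```

For `multi_ack` the payload is 67 bytes, so the framed line starts with `0047` (71 bytes total).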
Change-Id: I8060e430ee03571dc51239e702864c85e888505c
UploadPack can be invoked with no capabilities selected by the
client if the client is an ancient version of Git that nobody in
their right mind should still be using. Or if the client is very
broken and does not want to use any of the newer features added to
the protocol since its inception.
Change-Id: I3baa6f90e6a41a37a8eab8449a3cc41f4efcb91a
Change RequestValidator parameter to ObjectId list
Using an ObjectId list instead of a RevObject list allows a custom request
validator to be called on SHA-1s corresponding to objects that may not exist
in repository storage.
Change-Id: I19bb667beff0d0c144150a61d7a1dc6c9703be7f
Signed-off-by: Greg Hill <greghill@google.com>
Make the existing concrete implementations public as well so custom
implementations may delegate to them where appropriate. Treat all custom
implementations as providing allow-tip-sha1 in want.
Change-Id: If386fe25c0d3b4551a97c16a22350714453b03e9
Associate each RequestPolicy with an implementation of a
RequestValidator interface that contains the validation logic. The
checkWants method is only called if there are wants that were not
advertised, since clients may always request any advertised want
according to the git protocol. Calling the method only once at the
end of parsing the want list also means policy implementations can be
stateful, unlike the previous switch statement inside a loop.
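A minimal sketch of the "call once, only for unadvertised wants" rule. `WantValidator` and `WantChecker` are hypothetical stand-ins loosely modeled on the RequestValidator idea; object ids are plain strings here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical validator, loosely modeled on the RequestValidator idea. */
interface WantValidator {
    /** Throws if any of the given non-advertised wants must be refused. */
    void checkWants(List<String> unadvertisedWants);
}

final class WantChecker {
    private WantChecker() {
    }

    static void check(List<String> wants, Set<String> advertised,
            WantValidator validator) {
        List<String> unadvertised = new ArrayList<>();
        for (String want : wants) {
            if (!advertised.contains(want)) {
                unadvertised.add(want);
            }
        }
        // Advertised wants are always allowed by the protocol; consult the
        // policy once, after the whole want list has been parsed.
        if (!unadvertised.isEmpty()) {
            validator.checkWants(unadvertised);
        }
    }
}
```

Because the validator sees the full list in one call, it can keep state (for example, a per-request budget) without caring about the loop that parsed the wants.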
For the special handling of unidirectional pipes, simply check
isBiDirectional() and delegate to other implementations if necessary.
Change-Id: I52a174999ac3a5aca46d3469cb0b81edd1710580
UploadPack: configure RequestPolicy with TransportConfig
C git 1.8.2 supports setting the equivalent of RequestPolicy.TIP with
uploadpack.allowtipsha1. Parse this into TransportConfig and use it
from UploadPack. An explicitly set RequestPolicy overrides the config,
and the policy may still be upgraded on a unidirectional connection to
avoid races.
Defer figuring out the effective RequestPolicy to later in the
process. This is a minor semantic change to fix a bug: previously,
calling setRequestPolicy(ADVERTISED) _after_ calling
setBiDirectionalPipe(true) would have reintroduced the race condition
otherwise fixed by 01888db892.
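A sketch of the deferred resolution. `SketchRequestPolicy` is a hypothetical enum: ADVERTISED and TIP are named above, while REACHABLE_COMMIT, REACHABLE_COMMIT_TIP and ANY are assumed names for the relaxed and lenient variants. Computing the policy only when the request is served makes the result independent of setter order.

```java
/** Hypothetical copy of the policy names discussed above. */
enum SketchRequestPolicy {
    ADVERTISED, REACHABLE_COMMIT, TIP, REACHABLE_COMMIT_TIP, ANY
}

final class EffectivePolicy {
    private EffectivePolicy() {
    }

    /**
     * @param explicit      policy set by the application, or null if unset
     * @param allowTipSha1  value of uploadpack.allowtipsha1 from config
     * @param biDirectional true for a bidirectional pipe, false for
     *                      stateless transports such as smart HTTP
     */
    static SketchRequestPolicy resolve(SketchRequestPolicy explicit,
            boolean allowTipSha1, boolean biDirectional) {
        // An explicitly set policy overrides the configuration.
        SketchRequestPolicy policy = explicit != null ? explicit
                : (allowTipSha1 ? SketchRequestPolicy.TIP : SketchRequestPolicy.ADVERTISED);
        if (!biDirectional) {
            // Wants may be stale across stateless requests; relax to reachability.
            if (policy == SketchRequestPolicy.ADVERTISED) {
                return SketchRequestPolicy.REACHABLE_COMMIT;
            }
            if (policy == SketchRequestPolicy.TIP) {
                return SketchRequestPolicy.REACHABLE_COMMIT_TIP;
            }
        }
        return policy;
    }
}
```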
Change-Id: I264e028a76574434cecb34904d9f5944b290df78
This protocol capability, new in C git 1.8.2, corresponds to
RequestPolicy.TIP, so advertise it if that request policy was set.
Change-Id: I0d52af8a7747e951a87f060a5124f822ce1b2b26
Add RequestPolicy.TIP to allow fetching non-advertised ref tips
Users of UploadPack may set a custom RefFilter or AdvertisedRefsHook
that limits which refs are advertised, but clients may learn of a
SHA-1 that the server should have as a ref tip through some
alternative means. Support serving such objects from the server side
with a new RequestPolicy.
As with ADVERTISED, we need a special relaxed RequestPolicy to allow
commits reachable from the set of valid tips for unidirectional
connections.
Change-Id: I0d0cc4f8ee04d265e5be8221b9384afb1b374315
Use NullOutputStream not DisabledOutputStream in UploadPack
The stream should not throw IllegalStateException if it is off.
Flush the stream after the hook runs, in case any messages need
to be sent ahead of the pack.
Change-Id: I21c7a0258ab1308406d226293fa0e7da69b4f57b
Allow PreUploadHook.onSendPack to send messages to the client
Before transmitting to the client a hook may want to send along
a text message ahead of the pack, such as a "message of the day".
Enable this usage by mirroring the message sending API from
ReceivePack on the UploadPack instance, using the side band.
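A sketch of side-band framing for such a message, assuming the usual channel numbers (1 = pack data, 2 = progress and messages, 3 = fatal error) and the 65520-byte packet limit of side-band-64k. `SideBandMessages` is a hypothetical helper, not the JGit API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of side-band framing: each pkt-line carries a
 * one-byte channel number in front of its payload.
 */
final class SideBandMessages {
    static final int CH_DATA = 1;
    static final int CH_PROGRESS = 2;
    static final int CH_ERROR = 3;
    /** Largest pkt-line allowed with side-band-64k. */
    static final int MAX_PKT = 65520;

    private SideBandMessages() {
    }

    static void write(OutputStream out, int channel, byte[] data) throws IOException {
        int length = 4 + 1 + data.length; // header + channel byte + payload
        if (length > MAX_PKT) {
            throw new IllegalArgumentException("packet too large: " + length);
        }
        out.write(String.format("%04x", length).getBytes(StandardCharsets.US_ASCII));
        out.write(channel);
        out.write(data);
    }

    /** Sends a text message, e.g. a message of the day, ahead of the pack. */
    static void sendMessage(OutputStream out, String message) throws IOException {
        write(out, CH_PROGRESS, (message + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush(); // make sure it reaches the client before pack data
    }
}
```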
Change-Id: I31cd254a4ddb816641397a3e9c2c20212471c37f
Disable CRC32 computation when no PackIndex will be created
If a server is streaming 3GiB worth of pack data to a client there
is no reason to compute the CRC32 checksum on the objects. The
CRC32 code computed by PackWriter is used only in the new index
created by writeIndex(), which is never invoked for the native Git
network protocols.
Object reuse may still compute its own CRC32 to verify the data
being copied from an existing pack has not been corrupted. This
check is done by the ObjectReader that implements ObjectReuseAsIs
and has no relationship to the CRC32 being skipped during output.
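A sketch of the idea with `java.util.zip.CRC32`: per-object checksums are only gathered when an index will be written. `SketchPackOutput` is a hypothetical class standing in for the output side of PackWriter.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/**
 * Hypothetical pack output: per-object CRC32 values are only needed for a
 * pack index, so they are computed only when an index will be written.
 */
final class SketchPackOutput {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final boolean computeCrc;
    private final CRC32 crc = new CRC32();
    private final List<Long> objectCrcs = new ArrayList<>();

    SketchPackOutput(boolean willWriteIndex) {
        this.computeCrc = willWriteIndex;
    }

    void writeObject(byte[] encoded) {
        out.write(encoded, 0, encoded.length);
        if (computeCrc) {
            // Only pay for the checksum when an index will consume it.
            crc.reset();
            crc.update(encoded, 0, encoded.length);
            objectCrcs.add(crc.getValue());
        }
    }

    List<Long> objectCrcs() {
        return objectCrcs;
    }

    int size() {
        return out.size();
    }
}
```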
Change-Id: I05626f2e0d6ce19119b57d8a27193922636d60a7
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
Avoid looking at UNREACHABLE_GARBAGE for client have lines
Clients send a bunch of unknown objects to UploadPack on each round
of negotiation. Many of these are not known to the server, which
leads the implementation to look at indexes for garbage packs.
Disable examining the index of a garbage pack, allowing servers to
avoid reading them from disk during negotiation.
The effect of this change is the server will only ACK a have line
if the object was reachable during the last garbage collection,
or was recently added to the repository. For most repositories
this behavior change has no impact.
If a repository rewinds a branch, runs GC, and then resets the
branch back to where it was before, the now current tip is going to
be skipped by this change. A client that has the commit may wind up
getting a slightly larger data transfer from the server as an older
common ancestor will be chosen during negotiation. This is fixable
on the server side by running GC again to correct the layout of
objects in pack files.
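A sketch of the lookup rule for have lines. `SketchPackSource`, `SketchPack` and `HaveLookup` are hypothetical; only UNREACHABLE_GARBAGE comes from the text above, and GC and INSERT are assumed labels. Garbage packs are skipped before their index is ever consulted.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical labels for where a pack came from. */
enum SketchPackSource { GC, INSERT, UNREACHABLE_GARBAGE }

final class SketchPack {
    final SketchPackSource source;
    final Set<String> objectIds;

    SketchPack(SketchPackSource source, Set<String> objectIds) {
        this.source = source;
        this.objectIds = objectIds;
    }
}

final class HaveLookup {
    private HaveLookup() {
    }

    /** Answers "do we have this object?" for a client have line. */
    static boolean hasForNegotiation(List<SketchPack> packs, String id) {
        for (SketchPack pack : packs) {
            if (pack.source == SketchPackSource.UNREACHABLE_GARBAGE) {
                continue; // never load a garbage pack's index for a have line
            }
            if (pack.objectIds.contains(id)) {
                return true;
            }
        }
        return false;
    }
}
```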
Change-Id: Icd550359ef70fc7b701980f9b13d923fd13c744b
Simplify UploadPack by parsing wants separately from haves
The DHT backend was very slow at parsing objects. To work around
that performance limitation I obfuscated UploadPack by folding both
the want and have sets together in a single parse queue. Since DHT
was removed, the complexity no longer benefits JGit.
Doing this refactoring prepares the code for an upcoming change
where the have lines need to be handled differently from the
want lines. Splitting the parsing into two phases makes such
a modification trivial.
Change-Id: If7aad533b82448bbb688278e21f709282e5ccf4b
A few classes such as Constants are marked with @SuppressWarnings, as are
toString() methods with many literals, but otherwise $NLS-n$ is used for
strings containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
For streams that should not be closed, i.e. don't own an underlying
stream, and in-memory streams that do not need to be closed we just
suppress the warning. This mostly applies to test cases, where GC is enough.
For streams with external resources (i.e. files) we add the necessary
call to close().
Change-Id: I4d883ba2e7d07f199fe57ccb3459ece00441a570
When a client POSTs to /git-{upload,receive}-pack, the first line
includes their client capabilities. As soon as the C git client sends
side-band(-64k), it goes into a state where it chokes on data not sent
in a valid sideband channel.
GitSmartHttpTools.sendError() is called early in the request, likely
before a {Upload,Receive}Pack handler is assigned or, even if one is, before it
has read the request. In some cases we must read the first line manually
within sendError() to tell whether sideband is needed.
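A sketch of that first-line check, assuming the pkt-line has already been decoded into a string. It handles both request shapes: upload-pack puts capabilities after `want <40-hex-id> `, receive-pack puts them after a NUL. `FirstLineCaps` is a hypothetical helper, not GitSmartHttpTools code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: extract client capabilities from a request's first line. */
final class FirstLineCaps {
    private FirstLineCaps() {
    }

    static Set<String> capabilities(String firstLine) {
        String line = firstLine.endsWith("\n")
                ? firstLine.substring(0, firstLine.length() - 1) : firstLine;
        String caps;
        int nul = line.indexOf('\0');
        if (nul >= 0) {
            caps = line.substring(nul + 1); // receive-pack: "<old> <new> <ref>\0caps"
        } else if (line.startsWith("want ") && line.length() > 46) {
            caps = line.substring(46); // after "want " + 40-hex id + " "
        } else {
            caps = "";
        }
        Set<String> result = new HashSet<>();
        for (String c : caps.trim().split("\\s+")) {
            if (!c.isEmpty()) {
                result.add(c);
            }
        }
        return result;
    }

    /** True if errors must be wrapped in a side-band channel. */
    static boolean wantsSideBand(String firstLine) {
        Set<String> caps = capabilities(firstLine);
        return caps.contains("side-band") || caps.contains("side-band-64k");
    }
}
```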
Change-Id: I8277fd45a4ec3b71fa8f87404b4f5d1a09e0f384
Modify refs in UploadPack/ReceivePack using a hook interface
This is intended to replace the RefFilter interface (but does not yet,
for backwards compatibility). That interface required lots of extra
scanning and copying in filter cases such as only advertising a subtree
of the refs directory. Instead, provide a hook that can be executed
right before ref advertisement, using the public methods on
UploadPack/ReceivePack to explicitly set the map of advertised refs.
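A sketch of why a hook beats a filter for the subtree case, assuming refs are held in a sorted map. `SketchAdvertiseRefsHook` and `SubtreeHook` are hypothetical shapes (the real hook sets refs on UploadPack/ReceivePack); here the hook returns a range view, so only refs under the prefix are ever touched.

```java
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical hook shape: called right before refs are advertised. */
interface SketchAdvertiseRefsHook {
    Map<String, String> advertisedRefs(NavigableMap<String, String> allRefs);
}

/** Advertises only refs under one prefix, e.g. "refs/heads/". */
final class SubtreeHook implements SketchAdvertiseRefsHook {
    private final String prefix;

    SubtreeHook(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Map<String, String> advertisedRefs(NavigableMap<String, String> allRefs) {
        // A range view over the sorted map: no full scan, no copy.
        return allRefs.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }
}
```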
Change-Id: I0067019a191c8148af2cfb71a675f2258c5af0ca
Clients cache the set of advertised references at the start of a
negotiation, and keep replaying the same "want SHA1" list to the
server on each negotiation step. If another client pushes into
a branch and moves it by fast-forward, any request to obtain that
branch's prior SHA-1 is still valid, as the commit is reachable from
the new position of the reference. Unfortunately the fast-forward
causes smart HTTP negotiations to fail, as the server is no longer
advertising that prior SHA-1.
Instead of causing clients to fail out with a "want invalid" error
and forcing the end user to retry, possibly getting into a never-ending
try-fail-retry race while other clients are pushing into the same
busy repository, allow the slightly stale want request so long as
it is still reachable.
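A sketch of the relaxed check on a toy graph. `StaleWantCheck` and the `parents` adjacency map are hypothetical: a want is accepted if a walk from the currently advertised tips reaches it, which covers a branch that was fast-forwarded from B to C between requests.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical check: accept a want that is no longer advertised as long as
 * it is still reachable from a currently advertised tip.
 */
final class StaleWantCheck {
    private StaleWantCheck() {
    }

    static boolean isReachable(Map<String, List<String>> parents,
            Set<String> advertisedTips, String want) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(advertisedTips);
        while (!pending.isEmpty()) {
            String commit = pending.pop();
            if (commit.equals(want)) {
                return true;
            }
            if (seen.add(commit)) {
                for (String parent : parents.getOrDefault(commit, List.of())) {
                    pending.push(parent);
                }
            }
        }
        return false;
    }
}
```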
C Git implemented this same change recently to fix races on the
smart HTTP protocol when the C Git git-http-backend is used.
The new RequestPolicy feature also allows server authors to make
an even more lenient configuration that exports any SHA-1 to the
client. This might be useful in certain settings where a server
has authenticated the client as the "repository owner" and wants
to allow them to grab any content from the server as a complete
unbroken history chain.
The new setAdvertisedRefs() method allows server authors to manually
fix the references that are advertised, possibly bypassing the
getAllRefs() call on the Repository object.
Change-Id: I7cdb563bf9c55c83653f217f6e53c3add55a0541
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This implements the server side of shallow clones only (i.e.
git-upload-pack), not the client side.
CQ: 5517
Bug: 301627
Change-Id: Ied5f501f9c8d1fe90ab2ba44fac5fa67ed0035a4
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
During parsing these are used with contains(). If they are a List
type, the contains operation is not efficient. Some callers such
as UploadPack often pass a List here, so convert to Set when the
type isn't efficient for contains().
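A minimal sketch of the conversion, with a hypothetical `ContainsFriendly.toSet` helper: a Set is passed through unchanged, anything else is copied into a HashSet so later contains() calls are expected O(1) instead of a linear List scan.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class ContainsFriendly {
    private ContainsFriendly() {
    }

    /** Returns the argument if it is already a Set, else a HashSet copy. */
    static <T> Set<T> toSet(Collection<T> items) {
        if (items instanceof Set) {
            return (Set<T>) items;
        }
        return new HashSet<>(items);
    }
}
```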
Change-Id: If948ae3bf1f46e756bd2d5db14795e12ba7a6207
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If an internal exception occurs while packing and the request
needs to abort, the HTTP response might already be committed due
to progress messages having already been delivered to the client.
This prevents UploadPackServlet from resetting the response and
sending back an HTTP 500 response.
Try to catch all exceptions and report internal errors over the
sideband stream or as an ERR command during the initial ACK/NAK
negotiation phase. This allows JGit to transmit an error message
that the user will receive on their console without needing to
worry about resetting the (already gone) HTTP response.
Change-Id: Ie393fb8bb55d2b79ab1276adf71c781c1807f9fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some servlet containers require the servlet to read the EOF marker
from the input stream before a response can be output if the stream
is using "Transfer-Encoding: chunked"... which is typical for any
sort of large push to a repository over smart HTTP.
Ensure the EOF is always read by the PackParser when it is handling
the stream, and fail fast if there is more data present than expected
since this does indicate a protocol error.
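A minimal sketch of the EOF requirement using plain `java.io.InputStream`. `RequestEof` is a hypothetical helper: one extra read() must return -1, and any byte instead is reported as a protocol error.

```java
import java.io.IOException;
import java.io.InputStream;

final class RequestEof {
    private RequestEof() {
    }

    /**
     * Consumes the end-of-stream marker so chunked request bodies are fully
     * read before a response is written; extra bytes mean a protocol error.
     */
    static void requireEof(InputStream in) throws IOException {
        int next = in.read();
        if (next != -1) {
            throw new IOException("protocol error: unexpected data after request end");
        }
    }
}
```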
Also ensure the EOF is read by UploadPack before it starts to output
a partial response using packing progress meters.
Change-Id: I131db9dea20b2324cb7c3272a814f21296bc64bd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Smart HTTP clients may request both multi_ack_detailed and no-done in
the same request to prevent the client from needing to send a "done"
line to the server in response to a server's "ACK %s ready".
For smart HTTP, this can save 1 full HTTP RPC in the fetch exchange,
improving overall latency when incrementally updating a client that
has not diverged very far from the remote repository.
Unfortunately this capability cannot be enabled for the traditional
bi-directional connections. multi_ack_detailed has the client sending
more "have" lines at the same time that the server is creating the
"ACK %s ready" and writing out the PACK stream, resulting in some race
conditions and/or deadlock, depending on how the pipe buffers are
implemented. For very small updates, a server might actually be able
to send "ACK %s ready", then the PACK, and disconnect before the
client even finishes sending its first batch of "have" lines. This
may cause the client to fail with a broken pipe exception. To avoid
all of these potential problems, "no-done" is restricted only to the
smart HTTP variant of the protocol.
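A sketch of the restriction, with a hypothetical `FetchCapabilities.advertise` method and an illustrative capability list: `no-done` is only offered when the connection is not a bidirectional pipe.

```java
import java.util.ArrayList;
import java.util.List;

final class FetchCapabilities {
    private FetchCapabilities() {
    }

    /**
     * Hypothetical sketch: offer "no-done" only on stateless (smart HTTP)
     * connections, where the client is not still streaming have lines
     * while the server writes the pack.
     */
    static List<String> advertise(boolean biDirectionalPipe) {
        List<String> caps = new ArrayList<>(List.of("multi_ack_detailed", "side-band-64k"));
        if (!biDirectionalPipe) {
            caps.add("no-done");
        }
        return caps;
    }
}
```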
Change-Id: Ie0d0a39320202bc096fec2e97cb58e9efd061b2d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
UploadPack: Add a PreUploadHook to monitor and control behavior
Embedding applications can use this hook to watch actions within
UploadPack and possibly reject them. This could be useful to prevent
clones of a large repository from this server, or to stop abusive
negotiation rounds that offer thousands of objects in a single batch.
Change-Id: Id96f1885ac4d61f22c80b6418fff54184b7348ba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of aborting hard with a server-side exception, report an error
to the client with "ERR %s" in a context where the client is expecting
ACK/NAK. Older clients will report this text to the user, but newer
ones know how to format this message in a more user-friendly way.
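A sketch of the error line, assuming standard pkt-line framing (4 hex digits counting the whole line) and a trailing newline on the text. `ErrPacket` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

final class ErrPacket {
    private ErrPacket() {
    }

    /** Frames "ERR <message>" as a pkt-line for the ACK/NAK phase. */
    static String format(String message) {
        String payload = "ERR " + message + "\n";
        int length = payload.getBytes(StandardCharsets.UTF_8).length + 4;
        return String.format("%04x", length) + payload;
    }
}
```

For example, `want invalid` produces a 17-byte payload and therefore the line `0015ERR want invalid`.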
Change-Id: I1879b38988ba66f648c069c10dbfa14c3f34adb2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Improve native Git transport when following repository
If the client is only following the remote repository and has not
created any new non-common commits, the client will wind up sending
a "have %s" line for each tag in the repository. For some projects
like git.git, this is 339 tags and growing, resulting in more than
16 KiB needing to be POSTed over 12 HTTP requests.
Teach UploadPack (server side) to always execute the okToGiveUp()
logic at least once per negotiation round to determine if the server
can compute a pack right now. If it can, shove in an "ACK %s ready"
message to tell the client this and try to prevent receiving ancient
tags in future negotiation rounds.
Teach BasePackFetchConnection (client side) to honor an "ACK %s ready"
from the remote and break out of its SEND_HAVE loop once the remote
knows it can create a pack. This avoids sending the remaining 307
tags of git.git.
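A sketch of the client-side change, under simplifying assumptions: each negotiation round is modeled as a hypothetical `roundTrip` function from a batch of have ids to the server's response lines, and `HaveNegotiation` is an illustrative name. The loop stops as soon as any response is `ACK <id> ready`.

```java
import java.util.List;
import java.util.function.Function;

final class HaveNegotiation {
    private HaveNegotiation() {
    }

    static boolean isReady(String line) {
        return line.startsWith("ACK ") && line.endsWith(" ready");
    }

    /** Sends haves in batches until the server reports it can build a pack. */
    static int sendHaves(List<String> haves, int batchSize,
            Function<List<String>, List<String>> roundTrip) {
        int sent = 0;
        while (sent < haves.size()) {
            int end = Math.min(sent + batchSize, haves.size());
            List<String> responses = roundTrip.apply(haves.subList(sent, end));
            sent = end;
            for (String r : responses) {
                if (isReady(r)) {
                    return sent; // server has enough; skip the remaining haves
                }
            }
        }
        return sent;
    }
}
```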
These two changes together reduce the number of HTTP RPCs from 13
down to 3 in order to fetch from git.git over smart HTTP. If either
side is missing the change, the older behavior (and its 13 RPCs)
is used.
Change-Id: I64736318fd0abf9ee5e56bd0b737707adb580b37
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
PackWriter: Avoid CRC-32 validation when feeding IndexPack
There is no need to validate the object contents during
copyObjectAsIs if the result is going to be parsed by unpack-objects
or index-pack. Both programs will compute the SHA-1 of the object,
and also validate most of the pack structure. For git-daemon-like
servers, this work is already done on the client end of the
connection, so the server doesn't need to repeat that work itself.
Disable object validation for the 3 transport cases where we know
the remote side will handle object validation for us (push, bundle
creation, and upload pack). This improves performance on the server
side by reducing the work that must be done.
Change-Id: Iabb78eec45898e4a17f7aab3fb94c004d8d69af6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
JGit doesn't generate deltas for commit or tag objects when it packs
a repository from scratch. This is an explicit design decision that
is (mostly) justified by the fact that these objects do not delta
compress well.
Annotated tags are made once on stable points of the project history;
it is unlikely they will ever appear again with sufficient common
text to justify using a delta over just deflating the raw content.
JGit never tries to delta compress annotated tags and I take the
stance that these are best stored as non-deltas given how frequently
they might be accessed by repository viewers.
Commits only have sufficient common text when they are cherry-picked
to forward-port or back-port a change from one branch to another.
Even in these cases the distance between the commits as returned
by the log traversal has to be small enough that they would both
appear in the delta search window at the same time in order to
delta compress one of the messages against the other. JGit never
tries to delta compress commits, as it requires a lot of CPU time
but typically does not produce a smaller pack file.
Avoid reusing deltas for either of these types when constructing a
new pack. To avoid killing performance during serving of network
clients, UploadPack disables this behavior by allowing PackWriter
to reuse delta commits. Repositories that were already repacked by
C Git will not have their delta commits decompressed and recompressed
on the fly during object writing, saving server-side CPU resources.
Change-Id: I749407e7c5c677e05e4d054b40db7656cfa7fca8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Many source browsers and network related tools like UploadPack need
to find and parse the target of all branches and annotated tags
within the repository during their startup phase. Clustering these
together into the same part of the pack file will improve locality,
reducing thrashing when an application starts and needs to load
all of these into memory at once.
To prevent bottlenecking basic log viewing tools that are scanning
backwards from the tip of a current branch (and don't need tags)
we place this cluster of older targets after 4096 newer commits
have already been placed into the pack stream. 4096 was chosen as
a rough guess, but was based on a few factors:
- log viewers typically show 5-200 commits per page
- users only view the first page or two
- DHT can cram 2200-4000 commits per 1 MiB chunk, thus these will
  fall (roughly) into the second commit chunk
Unfortunately this placement hurts history tools that are scanning
backwards through the commit graph and completely ignore tags or
branch heads when they start.
An ancient tagged commit is no longer positioned behind its first
child (it's now much earlier), resulting in a page fault for the
parser to reload this cluster of objects on demand. This may be
an acceptable loss. If a user is walking backwards and has already
scanned through more than 4096 commits of history, waiting for the
region to reload isn't really that bad compared to the amount of
time already spent.
If the repository is so small that there are fewer than 4096 commits,
this change has no impact on the placement of objects.
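A sketch of the placement rule on a list of commits in their natural output order. `TagClusterPlacement.place` is hypothetical and `refTargets` stands for the commits pointed to by branches and annotated tags. The first `threshold` commits keep their order, the remaining ref targets are pulled forward as one cluster, and a history no longer than the threshold comes back unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class TagClusterPlacement {
    static final int DEFAULT_THRESHOLD = 4096;

    private TagClusterPlacement() {
    }

    static <T> List<T> place(List<T> commits, Set<T> refTargets, int threshold) {
        List<T> out = new ArrayList<>(commits.size());
        if (commits.size() <= threshold) {
            out.addAll(commits); // small repository: placement unchanged
            return out;
        }
        out.addAll(commits.subList(0, threshold));
        List<T> rest = commits.subList(threshold, commits.size());
        for (T c : rest) {
            if (refTargets.contains(c)) {
                out.add(c); // clustered older branch and tag targets
            }
        }
        for (T c : rest) {
            if (!refTargets.contains(c)) {
                out.add(c); // everything else keeps its relative order
            }
        }
        return out;
    }
}
```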
Change-Id: If3052e430d305e17878d94145c93754f56b74c61
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>