summaryrefslogtreecommitdiffstats
path: root/org.eclipse.jgit.http.server/src
Commit message (Collapse)AuthorAgeFilesLines
* Fix error handling in RepositoryFilterShawn O. Pearce2011-05-041-3/+6
| | | | | | | | | | The filter did not correctly match smart HTTP client requests, so it always fell back on HTTP status codes for errors. This usually causes a smart client to retry a dumb request, which is not what the server wants. Change-Id: I42592378dc42fbe308ef30a2923786c690f668a9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Implement the no-done capabilityShawn O. Pearce2011-04-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Smart HTTP clients may request both multi_ack_detailed and no-done in the same request to prevent the client from needing to send a "done" line to the server in response to a server's "ACK %s ready". For smart HTTP, this can save 1 full HTTP RPC in the fetch exchange, improving overall latency when incrementally updating a client that has not diverged very far from the remote repository. Unfortuantely this capability cannot be enabled for the traditional bi-directional connections. multi_ack_detailed has the client sending more "have" lines at the same time that the server is creating the "ACK %s ready" and writing out the PACK stream, resulting in some race conditions and/or deadlock, depending on how the pipe buffers are implemented. For very small updates, a server might actually be able to send "ACK %s ready", then the PACK, and disconnect before the client even finishes sending its first batch of "have" lines. This may cause the client to fail with a broken pipe exception. To avoid all of these potential problems, "no-done" is restricted only to the smart HTTP variant of the protocol. Change-Id: Ie0d0a39320202bc096fec2e97cb58e9efd061b2d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* smart HTTP: Return errors inside payloadShawn O. Pearce2011-04-014-7/+61
| | | | | | | | | | | | | | | | When the client is clearly making a smart HTTP request to our smart HTTP server, return any errors like RepositoryNotFoundException or ServiceNotEnabledException inside of the payload as a Git level ERR message, rather than an HTTP error code. This prevents the C Git command line client from retrying a failed "$URL/info/refs?service=git-upload-pack" request without the smart service URL, only to fail again with "403 Forbidden" when the dumb as-is service has been disabled by the server configuration, or is unavailable because the repository is not on the local filesystem. Change-Id: I57e8756d5026e885e0ca615979bfcd729703be6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* UploadPack: Add a PreUploadHook to monitor and control behaviorShawn O. Pearce2011-04-012-1/+16
| | | | | | | | | | Embedding applications can use this hook to watch actions within UploadPack and possibly reject them. This could be useful to prevent clones of a large repository from this server, or to stop abusive negotiation rounds that offer thousands of objects in a single batch. Change-Id: Id96f1885ac4d61f22c80b6418fff54184b7348ba Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Allow application filters on smart HTTP operationsShawn O. Pearce2011-04-015-64/+228
| | | | | | | | | | | | | | | Permit applications embedding GitServlet to wrap the info/refs?service=$name and /$name operations with a servlet Filter. To help applications inspect state of the operation, expose the UploadPack or ReceivePack object into a request attribute. This can be useful for logging, or to implement throttling of requests like Gerrit Code Review uses to prevent server overload. Change-Id: Ib8773c14e2b7a650769bd578aad745e6651210cb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Use generics in RepositoryFilter constructorRobin Rosenberg2011-02-251-2/+2
| | | | Change-Id: I461786baffab15515365ead1d9c350a1880443ea
* smart-http: Support progress in ReceivePackShawn O. Pearce2011-02-151-1/+6
| | | | | | | | | | | | | | As PackParser supports a progress meter for the "Resolving deltas" phase of its work, we should export this to smart HTTP clients so they know the server is still working on their (large) upload. However this isn't as simple as just dropping in a binding for the SmartOutputStream to flush when its told to. We want to avoid spurious flushes triggered by the use of sideband, or the status report formatting in the send-pack/receive-pack protocol. Change-Id: Ibd88022a298c5fed0edb23dfaf2e90278807ba8b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* smart-http: Fix recognition of gzip encodingShawn O. Pearce2011-02-151-2/+17
| | | | | | | | | | | | | | | | | | | | | | Some clients coming through proxies may advertise a different Accept-Encoding, for example "Accept-Encoding: gzip(proxy)". Matching by substring causes us to identify this as a false positive; that the client understands gzip encoding and will inflate the response before reading it. In this particular case however it doesn't. Its the reverse proxy server in front of JGit letting us know the proxy<->JGit link can be gzip compressed, while the client<->proxy part of the link is not: client <-- no gzip --> proxy <-- gzip --> JGit Use a more standard method of parsing by splitting the value into tokens, and only using gzip if one of the tokens is exactly the string "gzip". Add a unit test to make sure this isn't broken in the future. Change-Id: I30cda8a6d11ad235b56457adf54a2d27095d964e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* daemon: Use HTTP's resolver and factory patternShawn O. Pearce2011-02-1416-544/+41
| | | | | | | | | | | | | | | | | | | Using a resolver and factory pattern for the anonymous git:// Daemon class makes transport.Daemon more useful on non-file storage systems, or in embedded applications where the caller wants more precise control over the work tasks constructed within the daemon. Rather than defining new interfaces, move the existing HTTP ones into transport.resolver and make them generic on the connection handle type. For HTTP, continue to use HttpServletRequest, and for transport.Daemon use DaemonClient. To remain compatible with transport.Daemon, FileResolver needs to learn how to use multiple base directories, and how to export any Repository instance at a fixed name. Change-Id: I1efa6b2bd7c6567e983fbbf346947238ea2e847e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* RefAdvertiser: Avoid object parsingShawn O. Pearce2011-02-021-31/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It isn't strictly necessary to validate every reference's target object is reachable in the repository before advertising it to a client. This is an expensive operation when there are thousands of references, and its very unlikely that a reference uses a missing object, because garbage collection proceeds from the references and walks down through the graph. So trying to hide a dangling reference from clients is relatively pointless. Even if we are trying to avoid giving a client a corrupt repository, this simple check isn't sufficient. It is possible for a reference to point to a valid commit, but that commit to have a missing blob in its root tree. This can be caused by staging a file into the index, waiting several weeks, then committing that file while also racing against a prune. The prune may delete the blob, since its modification time is more than 2 weeks ago, but retain the commit, since its modification time is right now. Such graph corruption is already caught during PackWriter as it enumerates the graph from the client's want list and digs back to the roots or common base. Leave the reference validation also for that same phase, where we know we have to parse the object to support the enumeration. Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Ensure RevWalk is released when doneShawn O. Pearce2010-06-293-24/+39
| | | | | | | | | | | | | | | | | | | | | | Update a number of calling sites of RevWalk to ensure the walker's internal ObjectReader is released after the walk is no longer used. Because the ObjectReader is likely to hold onto a native resource like an Inflater, we don't want to leak them outside of their useful scope. Where possible we also try to share ObjectReaders across several walk pools, or between a walker and a PackWriter. This permits the ObjectReader to actually do some caching if it felt inclined to do so. Not everything was updated, we'll probably need to come back and update even more call sites, but these are some of the biggest offenders. Test cases in particular aren't updated. My plan is to move most storage-agnostic tests onto some purely in-memory storage solution that doesn't do compression. Change-Id: I04087ec79faeea208b19848939898ad7172b6672 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Move FileRepository to storage.file.FileRepositoryShawn O. Pearce2010-06-263-4/+4
| | | | | | | | | | | | This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Allow Repository.getDirectory() to be nullShawn O. Pearce2010-06-252-1/+5
| | | | | | | | | | | | | | Some types of repositories might not be stored on local disk. For these, they will most likely return null for getDirectory() as the java.io.File type cannot describe where their storage is, its not in the host's filesystem. Document that getDirectory() can return null now, and update all current non-test callers in JGit that might run into problems on such repositories. For the most part, just act like its bare. Change-Id: I061236a691372a267fd7d41f0550650e165d2066 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* UploadPack: Permit flushing progress messages under smart HTTPShawn O. Pearce2010-06-231-1/+6
| | | | | | | | | | | | | | | | | | | | If UploadPack invokes flush() on the output stream we pass it, its most likely the progress messages coming down the side band stream. As pack generation can take a while, we want to push that down at the client as early as we can, to keep the connection alive, and to let the user know we are still working on their behalf. Ensure we dump the temporary buffer whenever flush() is invoked, otherwise the messages don't get sent in a timely fashion to the user agent (in this case, git fetch). We specifically don't implement flush() for ReceivePack right now, as that protocol currently does not provide progress messages to the user, but it does invoke flush several times, as the different streams include '0000' type flush-pkts to denote various end points. Change-Id: I797c90a2c562a416223dc0704785f61ac64e0220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* git-servlet: Fix comparing uploadFactory with the wrong DISABLED instanceRobin Rosenberg2010-06-141-1/+1
| | | | | Change-Id: I53f71dc0e3c68839da5ff5a2e0f3eeb8340e4793 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
* Repository can be configured with FSMarc Strapetz2010-06-041-1/+2
| | | | | | | | | | | | | On Windows, FS_Win32_Cygwin has been used if a Cygwin Git installation is present in the PATH. Assuming that the user works with the Cygwin Git installation may result in unnecessary overhead if he actually does not. Applications built on top of jgit may have more knowledge on the actually used Git client (Cygwin or not) and hence should be able to configure which FS to use accordingly. Change-Id: Ifc4278078b298781d55cf5421e9647a21fa5db24
* Don't use interruptable pread() to access pack filesShawn O. Pearce2010-05-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The J2SE NIO APIs require that FileChannel close the underlying file descriptor if a thread is interrupted while it is inside of a read or write operation on that channel. This is insane, because it means we cannot share the file descriptor between threads. If a thread is in the middle of the FileChannel variant of IO.readFully() and it receives an interrupt, the pack will be automatically closed on us. This causes the other threads trying to use that same FileChannel to receive IOExceptions, which leads to the pack getting marked as invalid. Once the pack is marked invalid, JGit loses access to its entire contents and starts to report MissingObjectExceptions. Because PackWriter must ensure that the chosen pack file stays available until the current object's data is fully copied to the output, JGit cannot simply reopen the pack when its automatically closed due to an interrupt being sent at the wrong time. The pack may have been deleted by a concurrent `git gc` process, and that open file descriptor might be the last reference to the inode on disk. Once its closed, the PackWriter loses access to that object representation, and it cannot complete sending the object the client. Fortunately, RandomAccessFile's readFully method does not have this problem. Interrupts during readFully() are ignored. However, it requires us to first seek to the offset we need to read, then issue the read call. This requires locking around the file descriptor to prevent concurrent threads from moving the pointer before the read. This reduces the concurrency level, as now only one window can be paged in at a time from each pack. However, the WindowCache should already be holding most of the pages required to handle the working set for a process, and its own internal locking was already limiting us on the number of concurrent loads possible. Provided that most concurrent accesses are getting hits in the WindowCache, or are for different repositories on the same server, we shouldn't see a major performance hit due to the more serialized loading. I would have preferred to use a pool of RandomAccessFiles for each pack, with threads borrowing an instance dedicated to that thread whenever they needed to page in a window. This would permit much higher levels of concurrency by using multiple file descriptors (and file pointers) for each pack. However the code became too complex to develop in any reasonable period of time, so I've chosen to retrofit the existing code with more serialization instead. Bug: 308945 Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Externalize strings from JGitSasa Zivkov2010-05-1912-24/+122
| | | | | | | | | | | | | | | The strings are externalized into the root resource bundles. The resource bundles are stored under the new "resources" source folder to get proper maven build. Strings from tests are, in general, not externalized. Only in cases where it was necessary to make the test pass the strings were externalized. This was typically necessary in cases where e.getMessage() was used in assert and the exception message was slightly changed due to reuse of the externalized strings. Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
* http.server: Use TemporaryBuffer and compress some responsesShawn O. Pearce2010-03-126-33/+149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The HTTP server side code now uses the same approach that the smart HTTP client code uses when preparing a request body. The payload is streamed into a TemporaryBuffer of limited size. If the entire data fits, its compressed with gzip if the user agent supports that, and a Content-Length header is used to transmit the fixed length body to the peer. If however the data overflows the limited memory segment, its streamed uncompressed to the peer. One might initially think that larger contents which overflow the buffer should also be compressed, rather than sent raw, since they were deemed "large". But usually these larger contents are actually a pack file which has been already heavily compressed by Git specific routines. Trying to deflate that with gzip is probably going to take up more space, not less, so the compression overhead isn't worthwhile. This buffer and compress optimization helps repositories with a large number of references, as their text based advertisements compress well. For example jgit's own native repository currently requires 32,628 bytes for its full advertisement of 489 references. Most repositories have fewer references, and thus could compress their entire response in one buffer. Change-Id: I790609c9f763339e0a1db9172aa570e29af96f42 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Optimize RefAdvertiser performance by avoiding sortingShawn O. Pearce2010-01-231-1/+1
| | | | | | | | | | | Don't copy and sort the set of references if they are passed through in a RefMap or a SortedMap using the key's natural sort ordering. Either map is already in the order we want to present the items to the client in, so copying and sorting is a waste of local CPU and memory. Change-Id: I49ada7c1220e0fc2a163b9752c2b77525d9c82c1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Rewrite reference handling to be abstract and accurateShawn O. Pearce2010-01-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit actually does three major changes to the way references are handled within JGit. Unfortunately they were easier to do as a single massive commit than to break them up into smaller units. Disambiguate symbolic references: --------------------------------- Reporting a symbolic reference such as HEAD as though it were any other normal reference like refs/heads/master causes subtle programming errors. We have been bitten by this error on several occasions, as have some downstream applications written by myself. Instead of reporting HEAD as a reference whose name differs from its "original name", report it as an actual SymbolicRef object that the application can test the type and examine the target of. With this change, Ref is now an abstract type with different subclasses for the different types. In the classical example of "HEAD" being a symbolic reference to branch "refs/heads/master", the Repository.getAllRefs() method will now return: Map<String, Ref> all = repository.getAllRefs(); SymbolicRef HEAD = (SymbolicRef) all.get("HEAD"); ObjectIdRef master = (ObjectIdRef) all.get("refs/heads/master"); assertSame(master, HEAD.getTarget()); assertSame(master.getObjectId(), HEAD.getObjectId()); assertEquals("HEAD", HEAD.getName()); assertEquals("refs/heads/master", master.getName()); A nice side-effect of this change is the storage type of the symbolic reference is no longer ambiguous with the storge type of the underlying reference it targets. In the above example, if master was only available in the packed-refs file, then the following is also true: assertSame(Ref.Storage.LOOSE, HEAD.getStorage()); assertSame(Ref.Storage.PACKED, master.getStorage()); (Prior to this change we returned the ambiguous storage of LOOSE_PACKED for HEAD, which was confusing since it wasn't actually true on disk). Another nice side-effect of this change is all intermediate symbolic references are preserved, and are therefore visible to the application when they walk the target chain. We can now correctly inspect chains of symbolic references. As a result of this change the Ref.getOrigName() method has been removed from the API. Applications should identify a symbolic reference by testing for isSymbolic() and not by using an arcane string comparsion between properties. Abstract the RefDatabase storage: --------------------------------- RefDatabase is now abstract, similar to ObjectDatabase, and a new concrete implementation called RefDirectory is used for the traditional on-disk storage layout. In the future we plan to support additional implementations, such as a pure in-memory RefDatabase for unit testing purposes. Optimize RefDirectory: ---------------------- The implementation of the in-memory reference cache, reading, and update routines has been completely rewritten. Much of the code was heavily borrowed or cribbed from the prior implementation, so copyright notices have been left intact as much as possible. The RefDirectory cache no longer confuses symbolic references with normal references. This permits the cache to resolve the value of a symbolic reference as late as possible, ensuring it is always current, without needing to maintain reverse pointers. The cache is now 2 sorted RefLists, rather than 3 HashMaps. Using sorted lists allows the implementation to reduce the in-memory footprint when storing many refs. Using specialized types for the elements allows the code to avoid additional map lookups for auxiliary stat information. To improve scan time during getRefs(), the lists are returned via a copy-on-write contract. Most callers of getRefs() do not modify the returned collections, so the copy-on-write semantics improves access on repositories with a large number of packed references. Iterator traversals of the returned Map<String,Ref> are performed using a simple merge-join of the two cache lists, ensuring we can perform the entire traversal in linear time as a function of the number of references: O(PackedRefs + LooseRefs). Scans of the loose reference space to update the cache run in O(LooseRefs log LooseRefs) time, as the directory contents are sorted before being merged against the in-memory cache. Since the majority of stable references are kept packed, there typically are only a handful of reference names to be sorted, so the sorting cost should not be very high. Locking is reduced during getRefs() by taking advantage of the copy-on-write semantics of the improved cache data structure. This permits concurrent readers to pull back references without blocking each other. If there is contention updating the cache during a scan, one or more updates are simply skipped and will get picked up again in a future scan. Writing to the $GIT_DIR/packed-refs during reference delete is now fully atomic. The file is locked, reparsed fresh, and written back out if a change is necessary. This avoids all race conditions with concurrent external updates of the packed-refs file. The RefLogWriter class has been fully folded into RefDirectory and is therefore deleted. Maintaining the reference's log is the responsiblity of the database implementation, and not all implementations will use java.io for access. Future work still remains to be done to abstract the ReflogReader class away from local disk IO. Change-Id: I26b9287c45a4b2d2be35ba2849daa316f5eec85d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* server side: smart fetch over HTTPShawn O. Pearce2010-01-124-0/+312
| | | | | | | | | | | | | | | | | | | | Clients can request smart fetch support by examining the info/refs URL with the service parameter set to the magic git-upload-pack string: GET /$GIT_DIR/info/refs?service=git-upload-pack HTTP/1.1 The response is formatted with the upload pack capabilities, using the standard packet line formatter. A special header line is put in front of the standard upload-pack advertisement to let clients know the service was recognized and is supported. If the requested service is disabled an authorization status code is returned, allowing the user agent to retry once they have obtained credentials from a human, in case authentication is required by the configured UploadPackFactory implementation. Change-Id: Ib0f1a458c88b4b5509b0f882f55f83f5752bc57a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* server side: smart push over HTTPShawn O. Pearce2010-01-125-5/+482
| | | | | | | | | | | | | | | | | | | | Clients can request smart push support by examining the info/refs URL with the service parameter set to the magic git-receive-pack string: GET /$GIT_DIR/info/refs?service=git-receive-pack HTTP/1.1 The response is formatted with the receive pack capabilities, using the standard packet line formatter. A special header block is put in front of the standard receive-pack advertisement to let clients know the service was recognized and is supported. If the requested service is disabled an authorization status code is returned, allowing the user agent to retry once they have obtained credentials from a human, in case authentication is required by the configured ReceivePackFactory implementation. Change-Id: Ie4f6e0c7b68a68ec4b7cdd5072f91dd406210d4f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* Simple dumb HTTP server for GitShawn O. Pearce2010-01-1225-0/+3165
This is a simple HTTP server that provides the minimum server side support required for dumb (non-git aware) transport clients. We produce the info/refs and objects/info/packs file on the fly from the local repository state, but otherwise serve data as raw files from the on-disk structure. In the future we could better optimize the FileSender class and the servlets that use it to take advantage of direct file to network APIs in more advanced servlet containers like Jetty. Our glue package borrows the idea of a micro embedded DSL from Google Guice and uses it to configure a collection of Filters and HttpServlets, all of which are matched against requests using regular expressions. If a subgroup exists in the pattern, it is extracted and used for the path info component of the request. Change-Id: Ia0f1a425d07d035e344ae54faf8aeb04763e7487 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>