mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Author	SHA1	Message	Date
Dave Borowitz	b646578d89	Add a method to DfsOutputStream to read as an InputStream Change-Id: I0ec1f17a88bc14f22c10f9bc8d6f5b5118410e3a	11 years ago
Shawn Pearce	f32b861243	JGit 3.0: move internal classes into an internal subpackage This breaks all existing callers once. Applications are not supposed to build against the internal storage API unless they can accept API churn and make necessary updates as versions change. Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9	11 years ago
Shawn O. Pearce	fa4cc2475f	DFS: A storage layer for JGit In practice the DHT storage layer has not been performing as well as large scale server environments want to see from a Git server. The performance of the DHT schema degrades rapidly as small changes are pushed into the repository due to the chunk size being less than 1/3 of the pushed pack size. Small chunks cause poor prefetch performance during reading, and require significantly longer prefetch lists inside of the chunk meta field to work around the small size. The DHT code is very complex (>17,000 lines of code) and is very sensitive to the underlying database round-trip time, as well as the way objects were written into the pack stream that was chunked and stored on the database. A poor pack layout (from any version of C Git prior to Junio reworking it) can cause the DHT code to be unable to enumerate the objects of the linux-2.6 repository in a completable time scale. Performing a clone from a DHT stored repository of 2 million objects takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row for each object being cloned. This is very difficult for some DHTs to scale; even at 5000 rows/second the lookup stage alone takes 6 minutes (on local filesystem, this is almost too fast to bother measuring). Some servers like Apache Cassandra just fall over and cannot complete the 2 million lookups in rapid fire. On a ~400 MiB repository, the DHT schema has an extra 25 MiB of redundant data that gets downloaded to the JGit process, and that is before you consider the cost of the OBJECT_INDEX table also being fully loaded, which is at least 223 MiB of data for the linux kernel repository. In the DHT schema answering a `git clone` of the ~400 MiB linux kernel needs to load 248 MiB of "index" data from the DHT, in addition to the ~400 MiB of pack data that gets sent to the client. This is 193 MiB more data to be accessed than the native filesystem format, but it needs to come over a much smaller pipe (local Ethernet typically) than the local SATA disk drive. I also never got around to writing the "repack" support for the DHT schema, as it turns out to be fairly complex to safely repack data in the repository while also trying to minimize the amount of changes made to the database, due to very common limitations on database mutation rates. This new DFS storage layer fixes a lot of those issues by taking the simple approach of storing relatively standard Git pack and index files on an abstract filesystem. Packs are accessed by an in-process buffer cache, similar to the WindowCache used by the local filesystem storage layer. Unlike the local file IO, there are some assumptions that the storage system has relatively high latency and no concept of "file handles". Instead it looks at the file more like HTTP byte range requests, where a read channel is simply a thunk to trigger a read request over the network. The DFS code in this change is still abstract; it does not store on any particular filesystem, but is fairly well suited to Amazon S3 or Apache Hadoop HDFS. Storing packs directly on HDFS rather than HBase removes a layer of abstraction, as most HBase row reads turn into an HDFS read. Most of the DFS code in this change was blatantly copied from the local filesystem code. Most parts should be refactored to be shared between the two storage systems, but right now I am hesitant to do this due to how well tuned the local filesystem code currently is. Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb	13 years ago
Shawn O. Pearce	ded8f6c721	Support large whole packed objects as streams Similar to the loose object support, whole packed objects can now be streamed back to the caller. The streaming is less efficient as we copy the data from the cached window array into the InflaterInputStream's internal buffer, then inflate it there before returning to the application. Like with unpacked objects, there is plenty of room for some optimization, especially for the copyTo method, where we don't necessarily need so much buffering to exist. Change-Id: Ie23be81289e37e24b91d17b0891e47b9da988008 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	86547022f0	Tighten up local packed object representation during packing Rather than making a loader, and then using that to fill the object representation, parse the header and set up our data directly. This saves some time, as we don't waste cycles on information we won't use right now. The weight computed for a representation is now its actual stored size in the pack file, rather than its inflated size. This accounts for changes made when the compression level is modified on the repository. It is however more costly to determine the weight of the object, since we have to find its length in the pack. To try and recover that cost we now cache the length as part of our ObjectToPack record, so it doesn't have to be found during the output phase. A LocalObjectToPack now costs us (assuming 32 bit pointers), listed as 32 bit / 64 bit: vm header: 8 / 8 bytes; ObjectId: 20 / 20 bytes; PackedObjectInfo: 12 / 12 bytes; ObjectToPack: 8 / 12 bytes; LocalOTP: 20 / 24 bytes; total: 68 / 74 bytes. Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ece88b99eb	Redo PackWriter object reuse output Output of selected reuses is refactored to use a new ObjectReuseAsIs interface that extends the ObjectReader. This interface allows the reader to control how it performs the reuse into the output stream, but also allows it to throw an exception to request the writer to find a different candidate representation. The PackFile reuse code was overhauled, cleaning up the APIs so they aren't exposed in the object loader, but instead are now a single method on the PackFile itself. The reuse algorithm was changed to do a data verification pass, followed by the copy pass to the output. This permits us to work around a corrupt object in a pack file by seeking another copy of that object when this one is bad. The reuse code was also optimized for the common case, where the in-pack representation is under 16 KiB. In these smaller cases data is sent to the pack writer more directly, avoiding some copying. Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	bf4ffff07f	Redo PackWriter object reuse selection The new selection implementation uses a public API on the ObjectReader, allowing the storage library to enumerate its candidates and select the best one for this packer without needing to build a temporary list of the candidates first. Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	6fc3ecac84	Extract PackFile specific code to ObjectToPack subclass The ObjectReader class is dual-purposed into being a factory for the ObjectToPack, permitting specific ObjectDatabase implementations to override the method and offer their own custom subclass of the generic ObjectToPack class. By allowing them to directly extend the type, each implementation can add custom fields to support tracking where an object is stored, without incurring any additional penalties like a parallel Map<ObjectId,Object> would cost. The reader was chosen to act as a factory rather than the database, as the reader will eventually be tied more tightly with the ObjectWalk and TreeWalk. During object enumeration the reader would have had to load the object for the RevWalk, and may choose to cache object position data internally so it can later be reused and fed into the ObjectToPack instance supplied to the PackWriter. Since a reader is not thread-safe, and is scoped to this PackWriter and its internal ObjectWalk, it's a great place for the database to perform caching, if any. Right now this change goes a bit backwards by changing what should be generic ObjectToPack references inside of PackWriter to the very PackFile specific LocalObjectToPack subclass. We will correct these in a later commit as we start to refine what the ObjectToPack API will eventually look like in order to better support the PackWriter. Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	5e33a1de83	Simple dumb HTTP server for Git This is a simple HTTP server that provides the minimum server side support required for dumb (non-git aware) transport clients. We produce the info/refs and objects/info/packs files on the fly from the local repository state, but otherwise serve data as raw files from the on-disk structure. In the future we could better optimize the FileSender class and the servlets that use it to take advantage of direct file to network APIs in more advanced servlet containers like Jetty. Our glue package borrows the idea of a micro embedded DSL from Google Guice and uses it to configure a collection of Filters and HttpServlets, all of which are matched against requests using regular expressions. If a subgroup exists in the pattern, it is extracted and used for the path info component of the request. Change-Id: Ia0f1a425d07d035e344ae54faf8aeb04763e7487 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Git Development Community	1a6964c827	Initial JGit contribution to eclipse.org Per CQ 3448 this is the initial contribution of the JGit project to eclipse.org. It is derived from the historical JGit repository at commit `3a2dd9921c`. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago

1 commit (b646578d89513c5db30151f0b865c7335c88d682)
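
The DFS commit message (fa4cc2475f) quotes two figures that can be checked with simple arithmetic: the time spent on 2 million row lookups at 5000 rows/second, and the 248 MiB of "index" data made up of 25 MiB of redundant data plus the 223 MiB OBJECT_INDEX table. The sketch below only reworks those stated numbers. The class and method names are hypothetical and are not part of JGit. At the stated rate the lookup stage comes to 400 seconds, slightly more than the 6 minutes the message rounds to.

```java
// Hypothetical helper, not JGit code: re-derives figures quoted in the DFS commit message.
final class DhtCloneCost {
    /** Seconds needed for one row lookup per object at a fixed lookup rate. */
    public static double lookupSeconds(long objects, double rowsPerSecond) {
        return objects / rowsPerSecond;
    }

    /** Index data in MiB: redundant data plus the OBJECT_INDEX table. */
    public static int indexMiB(int redundantMiB, int objectIndexMiB) {
        return redundantMiB + objectIndexMiB;
    }

    public static void main(String[] args) {
        // 2 million objects at 5000 rows/second, as stated in the commit message.
        double seconds = lookupSeconds(2_000_000L, 5000.0);
        System.out.println("lookup stage seconds: " + seconds);
        System.out.println("lookup stage minutes: " + (seconds / 60.0));
        // 25 MiB redundant data + 223 MiB OBJECT_INDEX table.
        System.out.println("index data MiB: " + indexMiB(25, 223));
    }
}
```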