The RefTree graph needs to be quickly accessed to read references.
It is also a distinct graph, disconnected from the rest of the
repository. Store the commit and tree objects in their own pack.
Change-Id: Icbb735be8fa91ccbf0708ca3a219b364e11a6b83
Hoist ObjectIdSet up to lib as part of the public API and add
the interface to some common types like PackIndex and JGit's custom
ObjectId map types. This cleans up wrapper code in a number of
places by allowing direct use of the types as an ObjectIdSet.
Future commits can now rely on ObjectIdSet as a simple read-only
type to check a set of objects from a number of storage options.
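A sketch of the resulting call pattern; the helper itself is
hypothetical, but it works unchanged for a PackIndex, an
ObjectIdOwnerMap, or any other implementation:

    import java.util.ArrayList;
    import java.util.List;
    import org.eclipse.jgit.lib.ObjectId;
    import org.eclipse.jgit.lib.ObjectIdSet;

    static List<ObjectId> missing(ObjectIdSet have, List<ObjectId> want) {
      List<ObjectId> r = new ArrayList<>();
      for (ObjectId id : want)
        if (!have.contains(id)) // the interface's single method
          r.add(id);
      return r;
    }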
Change-Id: Ib62b062421d475bd52abd6c84a73916ef36e084b
Introduce PostUploadHook to replace UploadPackLogger
UploadPackLogger is incorrectly named: it can be used to trigger any
post-upload action, such as GC/compaction. This change introduces
PostUploadHook/PostUploadHookChain to replace
UploadPackLogger/UploadPackLoggerChain and deprecates the latter.
It also introduces PackStatistics as a replacement for
PackWriter.Statistics, since the latter is not public API.
It changes PackWriter to use PackStatistics and reimplements
PackWriter.Statistics to delegate to PackStatistics.
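A sketch of the new hook; the logging body is only illustrative:

    import org.eclipse.jgit.storage.pack.PackStatistics;
    import org.eclipse.jgit.transport.PostUploadHook;

    PostUploadHook hook = new PostUploadHook() {
      @Override
      public void onPostUpload(PackStatistics stats) {
        System.err.println("sent " + stats.getTotalBytes() + " bytes");
      }
    };
    up.setPostUploadHook(hook); // 'up' is an existing UploadPack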
Change-Id: Ic51df1613e471f568ffee25ae67e118425b38986
Signed-off-by: Terry Parker <tparker@google.com>
Use AutoCloseable to close resources in bundle org.eclipse.jgit
- use try-with-resources where possible
- replace use of deprecated release() by close()
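A sketch of the pattern applied throughout the bundle, assuming
repo is an org.eclipse.jgit.lib.Repository:

    import org.eclipse.jgit.revwalk.RevCommit;
    import org.eclipse.jgit.revwalk.RevWalk;

    try (RevWalk rw = new RevWalk(repo)) {
      RevCommit head = rw.parseCommit(repo.resolve("HEAD"));
      // use head; rw is closed automatically, even on error
    }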
Change-Id: I0f139c3535679087b7fa09649166bca514750b81
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Propagate IOException where possible when getting refs.
Currently, Repository.getAllRefs() and Repository.getTags() silently
ignore an IOException and instead return an empty map. Repository
is a public API and as such cannot be changed until the next major
revision. Where possible, update the internal JGit APIs to
use the RefDatabase directly, since it propagates the error.
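For example (Map, Ref and RefDatabase as in org.eclipse.jgit.lib):

    // Throws IOException on failure instead of returning an empty map.
    Map<String, Ref> refs = repo.getRefDatabase().getRefs(RefDatabase.ALL);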
Change-Id: I4e4537d8bd0fa772f388262684c5c4ca1929dc4c
Garbage is randomly ordered and unlikely to delta compress against
other garbage. Disable delta compression, allowing objects to switch
to whole form when moving to the garbage pack.
Because the garbage is not well compressed, assume deltas were not
attempted during a normal GC cycle.
Override the reuse settings: garbage that can be reused should be
reused as-is into the garbage pack rather than switching something
like the compression level during a GC. It is intended that garbage
will eventually be removed from the repository so expending CPU
time on a compression switch is not worthwhile.
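Sketched as the equivalent knobs on the public PackConfig; the
collector applies these internally when writing the garbage pack:

    import org.eclipse.jgit.storage.pack.PackConfig;

    PackConfig pc = new PackConfig(repo);
    pc.setDeltaCompress(false); // no new delta search over garbage
    pc.setReuseDeltas(true);    // reuse existing deltas as-is
    pc.setReuseObjects(true);   // reuse whole objects unchanged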
Change-Id: I0e8e58ee99e5011d375d3d89c94f2957de8402b9
Instead of forcing each DFS backend implementation to ensure the
extension bits are set correctly, have the common callers in JGit
set the extension at the same time they supply the file sizes to
the pack description. This simplifies the assumptions a DFS backend
implementation has to make.
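Sketch of the call-site pattern this change establishes; method
names as on DfsPackDescription in current JGit:

    desc.addFileExt(PackExt.PACK);          // extension bit
    desc.setFileSize(PackExt.PACK, bytes);  // size of the same file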
Change-Id: I55142ad8ea08a3e2e8349f72b3714578eba9c342
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
The bitmap code in PackWriter knows exactly when to use a pack as
a "cached pack". It enables cached pack usage only when the pack
has a bitmap and its entire closure of objects needs to be sent.
This is a much simpler code path to maintain, and JGit actually
has a way to write the necessary index.
Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881
Remove objects before optimization from DfsGarbageCollector
Just counting objects is not sufficient. There are some race
conditions with receive packs and delta base completion that
may confuse such a simple algorithm.
Instead always do the larger set computations, and rely on the
PackWriter having no objects pending as the way to avoid creating
an empty pack file.
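Sketched, assuming pw is the PackWriter in use:

    // Always run the full set computation, but skip writing a pack
    // when nothing is pending.
    if (pw.getObjectCount() > 0) {
      // write pack + index, commit the new pack description
    }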
Change-Id: Ic81fefb158ed6ef8d6522062f2be0338a49f6bc4
Simplify caching of DfsPackDescription from PackWriter.Statistics
Let the pack description copy the relevant stats values. This
moves it out of the garbage collector and compactor algorithms,
co-locating with something that might care.
Remove some unnecessary code from the DfsPackCompactor; the stats
track the same information and can supply it.
Change-Id: Id64ab38d507c0ed19ae0d106862d175b7364eba3
Avoid repacking unreachable garbage in DfsGarbageCollector
If a repository has significant amounts of unreachable garbage the
final phase to coalesce it can take longer than any other part of the
garbage collection phase. Provide a setting for applications to tweak
the threshold where coalescing ends and files just remain on disk.
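A sketch of the knob from the application side; the setter name is
as on DfsGarbageCollector, and the 10 MiB value is only an example:

    import org.eclipse.jgit.lib.NullProgressMonitor;

    DfsGarbageCollector gc = new DfsGarbageCollector(repo);
    // Stop coalescing once garbage exceeds ~10 MiB; larger garbage
    // packs simply remain on disk until they expire.
    gc.setCoalesceGarbageLimit(10 << 20);
    gc.pack(NullProgressMonitor.INSTANCE);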
Change-Id: I5f11a998a7185c75ece3271d8bc6181bb83f54c1
Enable writing pack indexes with bitmaps in the GC.
Update the DFS and file GC implementations to prepare and write
bitmaps on the packs that contain the full closure of the object
graph. Update the DfsPackDescription to include the index version.
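On the writer side, a sketch of the two-step bitmap API on
PackWriter; pm is a ProgressMonitor and bitmapOut a caller-supplied
OutputStream:

    // After writePack()/writeIndex(): build bitmaps only when the
    // pack holds the full closure, then write the .bitmap file.
    if (pw.prepareBitmapIndex(pm)) {
      pw.writeBitmapIndex(bitmapOut);
    }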
Change-Id: I3f1421e9cd90fe93e7e2ef2b8179ae2f1ba819ed
Reduce memory held and speed up DfsGarbageCollector.
getObjectList() returns a list of ObjectToPack. These can hold on to a
lot of memory. Furthermore, binary searching for objects in a sorted
array can be slow. Improve the speed and reduce the memory by creating a
copy of the ObjectId and inserting it into an ObjectIdOwnerMap.
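The core of the change, sketched; objectList stands in for the
writer's internal list, and membership tests become hashing rather
than binary search:

    ObjectIdOwnerMap<ObjectIdOwnerMap.Entry> ids =
        new ObjectIdOwnerMap<ObjectIdOwnerMap.Entry>();
    for (ObjectToPack otp : objectList)
      ids.add(new ObjectIdOwnerMap.Entry(otp) {
        // anonymous subclass: Entry is abstract, adds no fields,
        // and copies only the object id out of the ObjectToPack
      });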
Change-Id: Ib5aa5b7447e05938b47fa55812a87b9872c20ea7
Update DfsGarbageCollector to not read back a pack index.
Previously, the DFS GC excluded objects from packs by passing a
previously written index to the PackWriter. Reading back a file on
DFS is slow. Instead, allow the PackWriter to expose the objects
included in a pack and forward that to invocations of excludeObjects().
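Sketch of the new flow between two writers; the writer variables
are illustrative, the method names are from PackWriter:

    // The just-written pack's objects, exposed in memory:
    ObjectIdSet packed = priorWriter.getObjectSet();
    // Exclude them from the next pack without re-reading the .idx:
    nextWriter.excludeObjects(packed);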
Change-Id: I377cb4ab07f62cf790505e1eeb0b2efe81897c79
Rename PackConstants to PackExt, a typed pack file extension.
PackConstants previously contained string values for the pack and pack
index extensions. Change PackConstants to PackExt, a typed wrapper
around the string pack file extension.
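For example, the typed constants replace the old strings:

    PackExt.PACK.getExtension();  // "pack"
    PackExt.INDEX.getExtension(); // "idx"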
Change-Id: I86ac4db6da8f33aa42d6f37cfcc119e819444318
Use file extension with DfsPackDescription get/set file size.
Previously there were explicit size getter and setter methods for the
index and the pack. Update the API to be keyed by file extension. This will
make it possible to support other extensions in the future, such as
the forthcoming bitmap extensions.
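Sketch of the extension-keyed accessors on DfsPackDescription, with
BITMAP_INDEX standing for the forthcoming extension:

    desc.setFileSize(PackExt.PACK, packBytes);
    desc.setFileSize(PackExt.INDEX, idxBytes);
    long bitmapBytes = desc.getFileSize(PackExt.BITMAP_INDEX);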
Change-Id: Iab9d4abe0af65b2fc71ad71ef1db0feb6b3b5c58
Update DfsObjDatabase API to open/write by pack extension.
Previously, the DfsObjDatabase had hardcoded getPackFile() and
getPackIndex() methods, which open a .pack and .idx file, respectively.
A future change will add a bitmap index, which needs to be stored in a
parallel .bitmap file. Update the DfsObjDatabase to support opening and
writing of files for any pack extension.
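A sketch of the generalized surface; openFile() and writeFile() are
the abstract methods on DfsObjDatabase, and the description plus the
extension pick the file:

    // Read any pack-related file, e.g. the future .bitmap:
    ReadableChannel rc = db.openFile(desc, PackExt.BITMAP_INDEX);
    // The write side is symmetric:
    DfsOutputStream out = db.writeFile(desc, PackExt.BITMAP_INDEX);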
Change-Id: I7c403b501e242096a2d435f6865d6025a9f86108
In practice the DHT storage layer has not been performing as well as
large-scale server environments want to see from a Git server.
The performance of the DHT schema degrades rapidly as small changes
are pushed into the repository due to the chunk size being less than
1/3 of the pushed pack size. Small chunks cause poor prefetch
performance during reading, and require significantly longer prefetch
lists inside of the chunk meta field to work around the small size.
The DHT code is very complex (>17,000 lines of code) and is very
sensitive to the underlying database round-trip time, as well as the
way objects were written into the pack stream that was chunked and
stored on the database. A poor pack layout (from any version of C Git
prior to Junio reworking it) can cause the DHT code to be unable to
enumerate the objects of the linux-2.6 repository in any reasonable
amount of time.
Performing a clone from a DHT stored repository of 2 million objects
takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row
for each object being cloned. This is very difficult for some DHTs to
scale; even at 5000 rows/second the lookup stage alone takes 6 minutes
(on a local filesystem, this is almost too fast to bother measuring).
Some servers like Apache Cassandra just fall over and cannot complete
the 2 million lookups in rapid fire.
On a ~400 MiB repository, the DHT schema has an extra 25 MiB of
redundant data that gets downloaded to the JGit process, and that is
before you consider the cost of the OBJECT_INDEX table also being
fully loaded, which is at least 223 MiB of data for the linux kernel
repository. In the DHT schema answering a `git clone` of the ~400 MiB
linux kernel needs to load 248 MiB of "index" data from the DHT, in
addition to the ~400 MiB of pack data that gets sent to the client.
This is 193 MiB more data to be accessed than the native filesystem
format, but it needs to come over a much smaller pipe (local Ethernet
typically) than the local SATA disk drive.
I also never got around to writing the "repack" support for the DHT
schema, as it turns out to be fairly complex to safely repack data in
the repository while also trying to minimize the number of changes
made to the database, due to very common limitations on database
mutation rates.
This new DFS storage layer fixes a lot of those issues by taking the
simple approach for storing relatively standard Git pack and index
files on an abstract filesystem. Packs are accessed by an in-process
buffer cache, similar to the WindowCache used by the local filesystem
storage layer. Unlike local file IO, the storage system is assumed
to have relatively high latency and no concept of "file handles".
Instead, reads look more like HTTP byte range requests, where a read
channel is simply a thunk that triggers a read request over the
network.
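In that spirit, a minimal sketch of such a read thunk; every type
and method below is hypothetical, not JGit API:

    interface RemoteBlob {
      int readRange(long pos, java.nio.ByteBuffer dst)
          throws java.io.IOException;
    }

    class RangedReader {
      private final RemoteBlob blob;
      private long pos;

      RangedReader(RemoteBlob blob) { this.blob = blob; }

      // Each call is an independent ranged request (think HTTP
      // "Range: bytes=..."); no file handle is held between calls.
      int read(java.nio.ByteBuffer dst) throws java.io.IOException {
        int n = blob.readRange(pos, dst);
        if (n > 0)
          pos += n;
        return n;
      }

      void position(long p) { pos = p; }
    }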
The DFS code in this change is still abstract: it does not store on
any particular filesystem, but is fairly well suited to Amazon S3
or Apache Hadoop HDFS. Storing packs directly on HDFS rather than
HBase removes a layer of abstraction, as most HBase row reads turn
into an HDFS read.
Most of the DFS code in this change was blatantly copied from the
local filesystem code. Most parts should be refactored to be shared
between the two storage systems, but right now I am hesitant to do
this due to how well tuned the local filesystem code currently is.
Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb