mirrors/jgit - jgit - source @ dussan.org

提交图

作者	SHA1	备注	提交日期
Shawn Pearce	aa8d5ac26c	Remove unnecessary inflate stride in DfsBlock OpenJDK 7 does not benefit from using an inflate stride on the input array. The implementation of java.util.zip.Inflater supplies the entire input byte[] to libz, with no regards for the bounds supplied. Slicing at 512 byte increments in DfsBlock no longer has any benefit. In OpenJDK 6 the native portion of Inflater used GetByteArrayRegion to obtain a copy of the input buffer for libz. In this use case supplying a small stride made sense, it avoided allocating space for and copying data past the end of the object's compressed stream. In OpenJDK 7 the native code uses GetPrimitiveArrayCritical, which tries to avoid copying by freezing Java garbage collection and accessing the byte[] contents in place. On OpenJDK 7 derived JVMs it is likely more efficient to supply the entire DfsBlock. Since OpenJDK 5 and 6 are deprecated and replaced by OpenJDK 7 it is reasonable to suggest any consumers running JGit with DFS support use an OpenJDK 7 derived JVM. However, JGit still targets local filesystem support on Java 5, so it is still not reasonble to apply this same simplification to the internal.storage.file package. See: JDK-6751338 (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6751338) Change-Id: Ib248b6d383da5c8aa887d9c355a0df6f3e2247a5	10 年前
Shawn Pearce	65f44bef23	Remove DFS locality ordering during packing PackWriter generally chooses the order for objects when it builds the object lists. This ordering already depends on history information to guide placing more recent objects first and historical objects last. Allow PackWriter to make the basic ordering decisions, instead of trying to override them. The old approach of sorting the list caused DfsReader to override any ordering change PackWriter might have tried to make when repacking a repository. This now better matches with WindowCursor's implementation, where PackWriter solely determines the object ordering. Change-Id: Ic17ab5631ec539f0758b962966c3a1823735b814	11 年前
Shawn Pearce	56497be34d	Delete broken DFS read-ahead support This implementation has been proven to deadlock in production server loads. Google has been running with it disabled for a quite a while, as the bugs have been difficult to identify and fix. Instead of suggesting it works and is useful, drop the code. JGit should not advertise support for functionality that is known to be broken. In a few of the places where read-ahead was enabled by DfsReader there is more information about what blocks should be loaded when. During object representation selection, or size lookup, or sending object as-is to a PackWriter, or sending an entire pack as-is the reader knows exactly which blocks are required in the cache, and it also can compute when those will be needed. The broken read-ahead code was stupid and just read a fixed amount ahead of the current offset, which can waste IOs if more precise data was available. DFS systems are usually slow to respond so read-ahead is still a desired feature, but it needs to be rebuilt from scratch and make better use of the offset information. Change-Id: Ibaed8288ec3340cf93eb269dc0f1f23ab5ab1aea	11 年前
Shawn Pearce	d72416afbb	Optimize DFS object reuse selection code Rewrite this complicated logic to examine each pack file exactly once. This reduces thrashing when there are many large pack files present and the reader needs to locate each object's header. The intermediate temporary list is now smaller, it is bounded to the same length as the input object list. In the prior version of this code the list contained one entry for every representation of every object being packed. Only one representation object is allocated, reducing the overall memory footprint to be approximately one reference per object found in the current pack file (the pointer in the BlockList). This saves considerable working set memory compared to the prior version that made and held onto a new representation for every ObjectToPack. Change-Id: I2c1f18cd6755643ac4c2cf1f23b5464ca9d91b22	11 年前
Shawn Pearce	f32b861243	JGit 3.0: move internal classes into an internal subpackage This breaks all existing callers once. Applications are not supposed to build against the internal storage API unless they can accept API churn and make necessary updates as versions change. Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9	11 年前
Shawn Pearce	3760e4319b	Remove cached_packs support in favor of bitmaps The bitmap code in PackWriter knows exactly when to use a pack as a "cached pack". It enables cached pack usage only when the pack has a bitmap and its entire closure of objects needs to be sent. This is a much simpler code path to maintain, and JGit actually has a way to write the necessary index. Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881	11 年前
Shawn Pearce	4e9fe58bb5	Avoid looking at UNREACHABLE_GARBAGE for client have lines Clients send a bunch of unknown objects to UploadPack on each round of negotiation. Many of these are not known to the server, which leads the implementation to be looking at indexes for garbage packs. Disable examining the index of a garbage pack, allowing servers to avoid reading them from disk during negotiation. The effect of this change is the server will only ACK a have line if the object was reachable during the last garbage collection, or was recently added to the repository. For most repositories there is no impact in this behavior change. If a repository rewinds a branch, runs GC, and then resets the branch back to where it was before, the now current tip is going to be skipped by this change. A client that has the commit may wind up getting a slightly larger data transfer from the server as an older common ancestor will be chosen during negotiation. This is fixable on the server side by running GC again to correct the layout of objects in pack files. Change-Id: Icd550359ef70fc7b701980f9b13d923fd13c744b	11 年前
Shawn Pearce	88c962484f	Rename DfsPackFile getBitmap method to match PackFile There is no reason for these to differ in name. Match the shorter name used by PackFile. Change-Id: I2d3a299069acc5ce276b1b5439ff2258903c6ff3	11 年前
Colby Ranger	dafcb8f6db	Support creating pack bitmap indexes in PackWriter. Update the PackWriter to support writing out pack bitmap indexes, a parallel ".bitmap" file to the ".pack" file. Bitmaps are selected at commits every 1 to 5,000 commits for each unique path from the start. The most recent 100 commits are all bitmapped. The next 19,000 commits have a bitmaps every 100 commits. The remaining commits have a bitmap every 5,000 commits. Commits with more than 1 parent are prefered over ones with 1 or less. Furthermore, previously computed bitmaps are reused, if the previous entry had the reuse flag set, which is set when the bitmap was placed at the max allowed distance. Bitmaps are used to speed up the counting phase when packing, for requests that are not shallow. The PackWriterBitmapWalker uses a RevFilter to proactively mark commits with RevFlag.SEEN, when they appear in a bitmap. The walker produces the full closure of reachable ObjectIds, given the collection of starting ObjectIds. For fetch request, two ObjectWalks are executed to compute the ObjectIds reachable from the haves and from the wants. The ObjectIds needed to be written are determined by taking all the resulting wants AND NOT the haves. For clone requests, we get cached pack support for "free" since it is possible to determine if all of the ObjectIds in a pack file are included in the resulting list of ObjectIds to write. On my machine, the best times for clones and fetches of the linux kernel repository (with about 2.6M objects and 300K commits) are tabulated below: Operation Index V2 Index VE003 Clone 37530ms (524.06 MiB) 82ms (524.06 MiB) Fetch (1 commit back) 75ms 107ms Fetch (10 commits back) 456ms (269.51 KiB) 341ms (265.19 KiB) Fetch (100 commits back) 449ms (269.91 KiB) 337ms (267.28 KiB) Fetch (1000 commits back) 2229ms ( 14.75 MiB) 189ms ( 14.42 MiB) Fetch (10000 commits back) 2177ms ( 16.30 MiB) 254ms ( 15.88 MiB) Fetch (100000 commits back) 14340ms (185.83 MiB) 1655ms (189.39 MiB) Change-Id: Icdb0cdd66ff168917fb9ef17b96093990cc6a98d	12 年前
Colby Ranger	3b325917a5	Added read/write support for pack bitmap index. A pack bitmap index is an additional index of compressed bitmaps of the object graph. Furthermore, a logical API of the index functionality is included, as it is expected to be used by the PackWriter. Compressed bitmaps are created using the javaewah library, which is a word-aligned compressed variant of the Java bitset class based on run-length encoding. The library only works with positive integer values. Thus, the maximum number of ObjectIds in a pack file that this index can currently support is limited to Integer.MAX_VALUE. Every ObjectId is given an integer mapping. The integer is the position of the ObjectId in the complete ObjectId list, sorted by offset, for the pack file. That integer is what the bitmaps use to reference the ObjectId. Currently, the new index format can only be used with pack files that contain a complete closure of the object graph e.g. the result of a garbage collection. The index file includes four bitmaps for the Git object types i.e. commits, trees, blobs, and tags. In addition, a collection of bitmaps keyed by an ObjectId is also included. The bitmap for each entry in the collection represents the full closure of ObjectIds reachable from the keyed ObjectId (including the keyed ObjectId itself). The bitmaps are further compressed by XORing the current bitmaps against prior bitmaps in the index, and selecting the smallest representation. The XOR'd bitmap and offset from the current entry to the position of the bitmap to XOR against is the actual representation of the entry in the index file. Each entry contains one byte, which is currently used to note whether the bitmap should be blindly reused. Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f	12 年前
Colby Ranger	be7a135e94	Break the dependency on RevObject when creating a newObjectToPack(). Update the ObjectReuseAsIs API to support creating new ObjectToPack with only the AnyObjectId and Git object type. This is needed to support the future pack index bitmaps, which only contain this information and do not want the overhead of creating a temporary object for every ObjectId. Change-Id: I906360b471412688bf429ecef74fd988f47875dc	12 年前
Colby Ranger	698705c754	Rename PackConstants to PackExt, a typed pack file extension. PackConstants previously contained string values for the pack and pack index extension. Change PackConstant to be PackExt, a typed wrapper around the string pack file extension. Change-Id: I86ac4db6da8f33aa42d6f37cfcc119e819444318	11 年前
Colby Ranger	5d3c2b3def	Update DfsObjDatabase API to open/write by pack extension. Previously, the DfsObjDatabase had a hardcoded getPackFile() and getPackIndex() methods which opens a .pack and .idx file, respectively. A future change to add a bitmap index will need to be stored in a parallel .bitmap file. Update the DfsObjDatabase to support opening and writing of files for any pack extension. Change-Id: I7c403b501e242096a2d435f6865d6025a9f86108	11 年前
Marc Strapetz	67edd3eda7	RevWalk support for shallow clones StartGenerator now processes .git/shallow to have the RevWalk stop for shallow commits. See RevWalkShallowTest for tests. Bug: 394543 CQ: 6908 Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe Signed-off-by: Chris Aniszczyk <zx@twitter.com>	11 年前
Shawn O. Pearce	3534fa9c61	Expose some DFS APIs as public or protected Expose class DfsReader and method DfsPackFile.hasObject() as public. Applications may want to be able to inquire about some details of the storage of a repository. Make this possible by exposing some simple accessor methods. Expose method DfsObjDatabase.clearCache() as protected, allowing implementing subclasses to dump the cache if necessary, and force it to reload on a future request. Change-Id: Ic592c82d45ace9f2fa5f8d7e4bacfdce96dea969	12 年前
Robin Rosenberg	95d311f888	Move JGitText to an internal package Change-Id: I763590a45d75f00a09097ab6f89581a3bbd3c797	12 年前
Robin Rosenberg	708febedaf	cleanup: Remove unnecessary @SuppressWarnings Change-Id: Ie22ac47e315bff76f224214bc042fc483eb01550 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	12 年前
Shawn O. Pearce	2fbf296fda	Fix duplicate objects in "thin+cached" packs from DFS The DfsReader must offer every representation of an object that exists on the local repository when PackWriter asks for them. This is necessary to identify objects in the thin pack part that are also in the cached pack that will be appended onto the end of the stream. Without looking at all alternatives, PackWriter may pack the same object twice (once in the thin section, again in the cached base pack). This may cause the command line C version to go into an infinite loop when repacking the resulting repository, as it may see a delta chain cycle with one of those duplicate copies of the object. Previously the DfsReader tried to avoid looking at packs that it might not care about, but this is insufficient, as all versions must be considered during pack generation. Change-Id: Ibf4a3e8ea5c42aef16404ffc42a5781edd97b18e	12 年前
Shawn O. Pearce	4b84186b64	Refactor DfsReader selection of cached packs Make the code more clear with a simple refactoring of the boolean logic into a method that describes the condition we are looking for on each pack file. A cached pack is possible if there exists a tips collection, and the collection is non-empty. Change-Id: I4ac42b0622b39d159a0f4f223e291c35c71f672c	12 年前
Matthias Sohn	afebe7880d	[findBugs] Prefer short-cut logic as it's more performant Change-Id: I64577f8fd19ee0d2d407479cc70e521adc367f37 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	12 年前
Shawn O. Pearce	fa4cc2475f	DFS: A storage layer for JGit In practice the DHT storage layer has not been performing as well as large scale server environments want to see from a Git server. The performance of the DHT schema degrades rapidly as small changes are pushed into the repository due to the chunk size being less than 1/3 of the pushed pack size. Small chunks cause poor prefetch performance during reading, and require significantly longer prefetch lists inside of the chunk meta field to work around the small size. The DHT code is very complex (>17,000 lines of code) and is very sensitive to the underlying database round-trip time, as well as the way objects were written into the pack stream that was chunked and stored on the database. A poor pack layout (from any version of C Git prior to Junio reworking it) can cause the DHT code to be unable to enumerate the objects of the linux-2.6 repository in a completable time scale. Performing a clone from a DHT stored repository of 2 million objects takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row for each object being cloned. This is very difficult for some DHTs to scale, even at 5000 rows/second the lookup stage alone takes 6 minutes (on local filesystem, this is almost too fast to bother measuring). Some servers like Apache Cassandra just fall over and cannot complete the 2 million lookups in rapid fire. On a ~400 MiB repository, the DHT schema has an extra 25 MiB of redundant data that gets downloaded to the JGit process, and that is before you consider the cost of the OBJECT_INDEX table also being fully loaded, which is at least 223 MiB of data for the linux kernel repository. In the DHT schema answering a `git clone` of the ~400 MiB linux kernel needs to load 248 MiB of "index" data from the DHT, in addition to the ~400 MiB of pack data that gets sent to the client. This is 193 MiB more data to be accessed than the native filesystem format, but it needs to come over a much smaller pipe (local Ethernet typically) than the local SATA disk drive. I also never got around to writing the "repack" support for the DHT schema, as it turns out to be fairly complex to safely repack data in the repository while also trying to minimize the amount of changes made to the database, due to very common limitations on database mutation rates.. This new DFS storage layer fixes a lot of those issues by taking the simple approach for storing relatively standard Git pack and index files on an abstract filesystem. Packs are accessed by an in-process buffer cache, similar to the WindowCache used by the local filesystem storage layer. Unlike the local file IO, there are some assumptions that the storage system has relatively high latency and no concept of "file handles". Instead it looks at the file more like HTTP byte range requests, where a read channel is a simply a thunk to trigger a read request over the network. The DFS code in this change is still abstract, it does not store on any particular filesystem, but is fairly well suited to the Amazon S3 or Apache Hadoop HDFS. Storing packs directly on HDFS rather than HBase removes a layer of abstraction, as most HBase row reads turn into an HDFS read. Most of the DFS code in this change was blatently copied from the local filesystem code. Most parts should be refactored to be shared between the two storage systems, but right now I am hesistent to do this due to how well tuned the local filesystem code currently is. Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb	13 年前

5 次代码提交 (aa8d5ac26c29663ef2792cd2eb1a749746c9307f)