mirrors/jgit - jgit - source @ dussan.org

Commit graph

Author	SHA1	Message	Date
Shawn O. Pearce	461b012e95	PackWriter: Support reuse of entire packs

The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs:

  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done

  (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs

  git repack -a -d

When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources.

With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but with a slightly larger data transfer (+2.39 MiB):

  Before:
    remote: Counting objects: 1861830, done
    remote: Finding sources: 100% (1861830/1861830)
    remote: Getting sizes: 100% (88243/88243)
    remote: Compressing objects: 100% (88184/88184)
    Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
    remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
    Resolving deltas: 100% (1564621/1564621), done.

    real 3m19.005s

  After:
    remote: Counting objects: 1601, done
    remote: Counting objects: 1828460, done
    remote: Finding sources: 100% (50475/50475)
    remote: Getting sizes: 100% (18843/18843)
    remote: Compressing objects: 100% (7585/7585)
    remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
    Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
    Resolving deltas: 100% (1559477/1559477), done.

    real 2m2.938s

Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
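
The shell script in the commit above appends records to objects/info/cached-packs: a "+ " line for each tip commit, a "P " line for each pack name, and a blank line to end the record. A minimal sketch of reading that layout back is shown here. The class names CachedPackEntry and CachedPackList are hypothetical, and this is not JGit's own parser; it only assumes the layout the script produces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one record of objects/info/cached-packs. */
final class CachedPackEntry {
    final List<String> tips = new ArrayList<>();      // from "+ <commit>" lines
    final List<String> packNames = new ArrayList<>(); // from "P <pack name>" lines
}

/** Hypothetical reader; assumes records are terminated by a blank line. */
final class CachedPackList {
    static List<CachedPackEntry> parse(String text) {
        List<CachedPackEntry> records = new ArrayList<>();
        CachedPackEntry current = null;
        for (String line : text.split("\n", -1)) {
            if (line.isEmpty()) {
                // A blank line closes the record in progress, if any.
                if (current != null) {
                    records.add(current);
                    current = null;
                }
                continue;
            }
            if (current == null) {
                current = new CachedPackEntry();
            }
            if (line.startsWith("+ ")) {
                current.tips.add(line.substring(2));
            } else if (line.startsWith("P ")) {
                current.packNames.add(line.substring(2));
            }
            // Any other line type is ignored in this sketch.
        }
        if (current != null) {
            records.add(current); // tolerate a missing trailing blank line
        }
        return records;
    }
}
```

A server could then check whether a requested tip appears in some record's tips list before deciding to append the corresponding packs as-is.
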
Shawn O. Pearce	71f168fcd7	PackWriter: Display totals after sending objects CGit pack-objects displays a totals line after the pack data was fully written. This can be useful to understand some of the decisions made by the packer, and has been a great tool for helping to debug some of that code. Track some of the basic values, and send it to the client when packing is done: remote: Counting objects: 1826776, done remote: Finding sources: 100% (55121/55121) remote: Getting sizes: 100% (25654/25654) remote: Compressing objects: 100% (11434/11434) remote: Total 1861830 (delta 3926), reused 1854705 (delta 38306) Receiving objects: 100% (1861830/1861830), 386.03 MiB | 30.32 MiB/s, done. Change-Id: If3b039017a984ed5d5ae80940ce32bda93652df5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	04759f3274	RefAdvertiser: Avoid object parsing It isn't strictly necessary to validate every reference's target object is reachable in the repository before advertising it to a client. This is an expensive operation when there are thousands of references, and it's very unlikely that a reference uses a missing object, because garbage collection proceeds from the references and walks down through the graph. So trying to hide a dangling reference from clients is relatively pointless. Even if we are trying to avoid giving a client a corrupt repository, this simple check isn't sufficient. It is possible for a reference to point to a valid commit, but that commit to have a missing blob in its root tree. This can be caused by staging a file into the index, waiting several weeks, then committing that file while also racing against a prune. The prune may delete the blob, since its modification time is more than 2 weeks ago, but retain the commit, since its modification time is right now. Such graph corruption is already caught during PackWriter as it enumerates the graph from the client's want list and digs back to the roots or common base. Leave the reference validation also for that same phase, where we know we have to parse the object to support the enumeration. Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	f048af3fd1	Implement async/batch lookup of object data An ObjectReader implementation may be very slow for a single object, but yet support bulk queries efficiently by batching multiple small requests into a single larger request. This easily happens when the reader is built on top of a database that is stored on another host, as the network round-trip time starts to dominate the operation cost. RevWalk, ObjectWalk, UploadPack and PackWriter are the first major users of this new bulk interface, with the goal being to support an efficient way to pack a repository for a fetch/clone client when the source repository is stored in a high-latency storage system. Processing the want/have lists is now done in bulk, to remove the high costs associated with common ancestor negotiation. PackWriter already performs object reuse selection in bulk, but it now can also do the object size lookup and object counting phases with higher efficiency. Actual object reuse, deltification, and final output are still doing sequential lookups, making them a bit more expensive to perform. Change-Id: I4c966f84917482598012074c370b9831451404ee Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	9fbce904e6	Pass PackConfig down to PackWriter when packing When we are creating a pack the higher level application should be able to override the PackConfig used, allowing it to control the number of threads used or how much memory is allocated per writer. Change-Id: I47795987bb0d161d3642082acc2f617d7cb28d8c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	bb99ec0aa0	Simplify UploadPack use of options during writing We only use these variables once, so just put them at the proper use site and avoid assigning the local variable. The code is a bit shorter and the intent is a little bit more clear. Change-Id: I70d120fb149b612ac93055ea39bc053b8d90a5db Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	515deaf7e5	Ensure RevWalk is released when done Update a number of calling sites of RevWalk to ensure the walker's internal ObjectReader is released after the walk is no longer used. Because the ObjectReader is likely to hold onto a native resource like an Inflater, we don't want to leak them outside of their useful scope. Where possible we also try to share ObjectReaders across several walk pools, or between a walker and a PackWriter. This permits the ObjectReader to actually do some caching if it felt inclined to do so. Not everything was updated; we'll probably need to come back and update even more call sites, but these are some of the biggest offenders. Test cases in particular aren't updated. My plan is to move most storage-agnostic tests onto some purely in-memory storage solution that doesn't do compression. Change-Id: I04087ec79faeea208b19848939898ad7172b6672 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	06f635a4bc	Fix minor formatting issue in UploadPack Change-Id: Ifc0c3a94dc0e16126af6cf17e9c4a7cb96e8ffab Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	6b62e53b60	Move PackWriter progress monitors onto the operations Rather than taking the ProgressMonitor objects in our constructor and carrying them around as instance fields, take them as arguments to the actual time consuming operations we need to run. Change-Id: I2b230d07e277de029b1061c807e67de5428cc1c4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	a45728d7a4	Ensure ObjectReader used by PackWriter is released The ObjectReader API demands that we release the reader when we are done with it. PackWriter contains a reader, which it uses for the entire packing session. Expose the release of the reader through a release method on the writer. This still doesn't address the RevWalk and TreeWalk users, who don't correctly release their reader. But it's a small step in the right direction. Change-Id: I5cb0b5c1b432434a799fceb21b86479e09b84a0a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	47c07e1a0d	Replace manual peel loops with RevWalk.peel Instead of peeling things by hand in application level code, defer the peeling logic into RevWalk's new peel utility method. Change-Id: Idabd10dc41502e782f6a2eeb56f09566b97775a8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	5ed96eb7f4	UploadPack: Avoid unnecessary flush in smart HTTP Under smart HTTP the biDirectionalPipe flag is false, and we return back immediately at this point in the negotiation process. There is no need to flush the stream to the client, the request is over and it will be automatically flushed out by the higher level servlet that invoked us. Avoiding flush here allows us to only use flush after a progress message is sent during pack generation. Change-Id: Id0c8b7e95e3be6ca4c1b479e096bed6b0283b828 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Sasa Zivkov	f3d8a8ecad	Externalize strings from JGit The strings are externalized into the root resource bundles. The resource bundles are stored under the new "resources" source folder to get proper maven build. Strings from tests are, in general, not externalized. Only in cases where it was necessary to make the test pass the strings were externalized. This was typically necessary in cases where e.getMessage() was used in assert and the exception message was slightly changed due to reuse of the externalized strings. Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>	14 years ago
Shawn O. Pearce	2156aa894c	Reduce multi-level buffered streams in transport code Some transports actually provide stream buffering on their own, without needing to be wrapped up inside of a BufferedInputStream in order to smooth out system calls to read or write. A great example of this is the JSch SSH client, or the Apache MINA SSHD server. Both use custom buffering to packetize the streams into the encrypted SSH channel, and wrapping them up inside of a BufferedInputStream or BufferedOutputStream is relatively pointless. Our SideBandOutputStream implementation also provides some fairly large buffering, equal to one complete side-band packet on the main data channel. Wrapping that inside of a BufferedOutputStream just to smooth out small writes from PackWriter causes extra data copies, and provides no advantage. We can save some memory and some CPU cycles by letting PackWriter dump directly into the SideBandOutputStream's internal buffer array. Instead we push the buffering streams down to be as close to the network socket (or operating system pipe) as possible. This allows us to smooth out the smaller reads/writes from pkt-line messages during advertisement and negotiation, but avoid copying altogether when the stream switches to larger writes over a side band channel. Change-Id: I2f6f16caee64783c77d3dd1b2a41b3cc0c64c159 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	0af5944cac	Refactor SideBandOutputStream to be buffered Instead of relying on our callers to wrap us up inside of a BufferedOutputStream and using the proper block sizing, do the buffering directly inside of SideBandOutputStream. This ensures we don't get large write-throughs from BufferedOutputStream that might overflow the configured packet size. The constructor of SideBandOutputStream is also beefed up to check its arguments and ensure they are within acceptable ranges for the current side-band protocol. Change-Id: Ic14567327d03c9e972f9734b8228178bc448867d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
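
The buffering described in the commit above can be illustrated with a small stand-alone stream. This is a sketch under stated assumptions, not JGit's SideBandOutputStream. The class name BufferedSideBandSketch is hypothetical. The framing assumed is the Git pkt-line layout: four hex digits giving the total packet length, then one channel byte, then the payload. The 65520 upper bound is the side-band-64k packet limit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: buffers writes and emits whole side-band packets. */
final class BufferedSideBandSketch extends OutputStream {
    static final int HDR_SIZE = 5;       // 4 hex length digits + 1 channel byte
    static final int MAX_PACKET = 65520; // side-band-64k packet limit

    private final OutputStream out;
    private final byte[] buf;            // one complete packet, header included
    private int cnt = HDR_SIZE;          // next free position in buf

    BufferedSideBandSketch(int channel, int packetSize, OutputStream out) {
        // Validate arguments up front, as the commit message describes.
        if (channel < 1 || channel > 255) {
            throw new IllegalArgumentException("channel out of range: " + channel);
        }
        if (packetSize <= HDR_SIZE || packetSize > MAX_PACKET) {
            throw new IllegalArgumentException("packet size out of range: " + packetSize);
        }
        this.out = out;
        this.buf = new byte[packetSize];
        this.buf[4] = (byte) channel;
    }

    @Override
    public void write(int b) throws IOException {
        if (cnt == buf.length) {
            writePacket();
        }
        buf[cnt++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Large writes are split so no packet exceeds the configured size.
        while (len > 0) {
            if (cnt == buf.length) {
                writePacket();
            }
            int n = Math.min(len, buf.length - cnt);
            System.arraycopy(b, off, buf, cnt, n);
            cnt += n;
            off += n;
            len -= n;
        }
    }

    @Override
    public void flush() throws IOException {
        if (cnt > HDR_SIZE) {
            writePacket(); // send the partially filled packet
        }
        out.flush();
    }

    private void writePacket() throws IOException {
        // The pkt-line length counts the 4 length digits themselves.
        byte[] len = String.format("%04x", cnt).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(len, 0, buf, 0, 4);
        out.write(buf, 0, cnt);
        cnt = HDR_SIZE;
    }
}
```

Because the payload is copied straight into the packet buffer, a caller such as a pack writer needs no extra BufferedOutputStream in front of it, which is the saving the neighbouring commits describe.
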
Nico Sallembien	e54d33b687	Add a RefFilter interface to ReceivePack and UploadPack When a user of ReceivePack or UploadPack wants to control what refs are sent to the client, for instance when some refs should be hidden from some clients, this interface can be extended to provide a fine grained control over what refs are sent to the client. Change-Id: Ie6320b0f8922e1a5e1bad91c016bd476ea094366	14 years ago
Shawn O. Pearce	36f05a9c27	Optimize RefAdvertiser performance by avoiding sorting Don't copy and sort the set of references if they are passed through in a RefMap or a SortedMap using the key's natural sort ordering. Either map is already in the order we want to present the items to the client in, so copying and sorting is a waste of local CPU and memory. Change-Id: I49ada7c1220e0fc2a163b9752c2b77525d9c82c1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
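
The check described in the commit above can be sketched with standard collections only. The class and method names here are hypothetical, and JGit's RefMap case is left out because it needs the library itself. A SortedMap whose comparator() returns null uses the natural ordering of its keys, so its values can be used as-is.

```java
import java.util.Collection;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of skipping the copy-and-sort step when it is redundant. */
final class RefOrderSketch {
    static <V> Collection<V> inAdvertiseOrder(Map<String, V> refs) {
        if (refs instanceof SortedMap
                && ((SortedMap<String, V>) refs).comparator() == null) {
            // Already iterates in natural key order: no copy, no sort.
            return refs.values();
        }
        // Any other map is copied into a naturally ordered TreeMap.
        return new TreeMap<>(refs).values();
    }
}
```

A map sorted with a custom comparator still takes the slow path, since its iteration order may differ from the natural order expected for the advertisement.
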
Shawn O. Pearce	7ed6805425	Expose RefAdvertiser for reuse outside of the transport package By making this class and its methods public, and the actual writing abstract, we can reuse this code for other formats like writing an info/refs file for HTTP transports. Change-Id: Id0e349c30a0f5a8c1527e0e7383b80243819d9c5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	e187618b6b	Teach UploadPack how to use an RPC style interface If biDirectionalPipe is false UploadPack does not start out with the advertisement but instead assumes it should read one block of want/have lines, process that, and write the ACK/NAKs out. This means it only is doing one read through the input followed by one write to the output, which fits with the HTTP request processing model, and any other type of RPC system. Change-Id: Ia9f7c46ee556f996367180f15d2caa8572cdd59f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	a22b8f5fac	Implement multi_ack_detailed protocol extension The multi_ack_detailed extension breaks out the "ACK %s continue" status code into "ACK %s common" and "ACK %s ready" states, making it easier to discover which objects are truly common, and which objects are simply on a chain the server doesn't care about learning. Change-Id: Ie8e907424cfbbba84996ca205d49eacf339f9d04 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
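
The status split described in the commit above can be sketched as a small formatter. The class name, the enum, and the boolean parameter are hypothetical, and this is not UploadPack's negotiation code; only the three status strings come from the commit message.

```java
/** Hypothetical sketch of the ACK status strings under each extension. */
final class AckSketch {
    enum Mode { MULTI_ACK, MULTI_ACK_DETAILED }

    /**
     * Formats one ACK line for the given object id.
     * The ready flag is an assumption of this sketch: it stands for the
     * server's decision that it needs to learn nothing more on this chain.
     */
    static String ack(Mode mode, String objectId, boolean ready) {
        if (mode == Mode.MULTI_ACK) {
            // The older extension has a single status for both cases.
            return "ACK " + objectId + " continue\n";
        }
        return "ACK " + objectId + (ready ? " ready\n" : " common\n");
    }
}
```
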
Git Development Community	1a6964c827	Initial JGit contribution to eclipse.org Per CQ 3448 this is the initial contribution of the JGit project to eclipse.org. It is derived from the historical JGit repository at commit `3a2dd9921c`. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago

22 commits (461b012e9565af8174e5b9d2b2c3a582011ce77e)