mirrors/jgit - jgit - source @ dussan.org

Tree: b77ba04976

Author	SHA1	Message	Date
Colby Ranger	b77ba04976	Do not delta compress objects that have already tried to compress. If an object is in a pack file already, delta compression will not attempt to re-compress it. This assumes that the previous packing already performed the optimal compression attempt, however, the subclasses of StoredObjectRepresentation may use other heuristics to determine if the stored format is optimal. Change-Id: I403de522f4b0dd2667d54f6faed621f392c07786	11 years ago
Shawn O. Pearce	37a10e3006	PackWriter: Don't include edges in progress meter When compressing objects, don't include the edges in the progress meter. These cost almost no CPU time as they are simply pushed into and popped out of the delta search window. Change-Id: I7ea19f0263e463c65da34a7e92718c6db1d4a131 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	13 years ago
Shawn O. Pearce	e6bd689d2c	Improve LargeObjectException reporting Use 3 different types of LargeObjectException for the 3 major ways that we can fail to load an object. For each of these use a unique string translation which describes the root cause better than just the ObjectId.name() does. Change-Id: I810c98d5691b74af9fc6cbd46fc9879e35a7bdca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	f048af3fd1	Implement async/batch lookup of object data An ObjectReader implementation may be very slow for a single object, but yet support bulk queries efficiently by batching multiple small requests into a single larger request. This easily happens when the reader is built on top of a database that is stored on another host, as the network round-trip time starts to dominate the operation cost. RevWalk, ObjectWalk, UploadPack and PackWriter are the first major users of this new bulk interface, with the goal being to support an efficient way to pack a repository for a fetch/clone client when the source repository is stored in a high-latency storage system. Processing the want/have lists is now done in bulk, to remove the high costs associated with common ancestor negotiation. PackWriter already performs object reuse selection in bulk, but it now can also do the object size lookup and object counting phases with higher efficiency. Actual object reuse, deltification, and final output are still doing sequential lookups, making them a bit more expensive to perform. Change-Id: I4c966f84917482598012074c370b9831451404ee Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	1a06179ea7	Move PackWriter configuration to PackConfig This refactoring permits applications to configure global per-process settings for all packing and easily pass it through to per-request PackWriters, ensuring that the process configuration overrides the repository specific settings. For example this might help in a daemon environment where the server wants to cap the resources used to serve a dynamic upload pack request, even though the repository's own pack.* settings might be configured to be more aggressive. This allows fast but less bandwidth efficient serving of clients, while still retaining good compression through a cron managed `git gc`. Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	12fe0f2d1e	Discard the uncompressed delta as soon as its compressed The DeltaCache will most likely need to copy the compressed delta into a new buffer in order to compact away the wasted space at the end caused by over allocation. Since we don't need the uncompressed format anymore, null out our only reference to it so the GC can reclaim this memory if it needs to perform a collection in order to satisfy the cache's allocation attempt. Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	9734194917	Honor pack.windowlimit to cap memory usage during packing The pack.windowlimit configuration parameter places an upper bound on the number of bytes used by the DeltaWindow class as it scans through the object list. If memory usage would exceed the limit the window is temporarily decreased in size to keep memory used within that bound. Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	a960d1429e	Cache small deltas during packing PackWriter now caches small deltas, or deltas that are very tiny compared to their source inputs, so that the writing phase goes faster by reusing those cached deltas. The cached data is stored compressed, which usually translates to a bigger footprint due to deltas being very hard to compress, but saves time during writing by avoiding the deflate step. They are held under SoftReferences so that the JVM GC can clear out deltas if memory gets very tight. We would rather continue working and spend a bit more CPU time during writing than crash due to OOME. To avoid OutOfMemoryErrors during the caching phase we also trap OOME and just abort out of the caching. Because deflateBound() always produces something larger than what we need to actually store the deflated data, we copy it over into a new buffer if the actual length doesn't match the buffer length. When packing jgit.git this saves over 111 KiB in the cache, and is thus a worthwhile hit on CPU time. To further save memory we store the inflated size of the delta (which we need for the object header) in the same field as the pathHash, as the pathHash is no longer necessary by this phase of the packing algorithm. Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	dfad23bf3d	Implement delta generation during packing PackWriter now produces new deltas if there is not a suitable delta available for reuse from an existing pack file. This permits JGit to send less data on the wire by sending a delta relative to an object the other side already has, instead of sending the whole object. The delta searching algorithm is similar in style to what C Git uses, but apparently has some differences (see below for more on). Briefly, objects that should be considered for delta compression are pushed onto a list. This list is then sorted by a rough similarity score, which is derived from the path name the object was discovered at in the repository during object counting. The list is then walked in order. At each position in the list, up to $WINDOW objects prior to it are attempted as delta bases. Each object in the window is tried, and the shortest delta instruction sequence selects the base object. Some rough rules are used to prevent pathological behavior during this matching phase, like skipping pairings of objects that are not similar enough in size. PackWriter intentionally excludes commits and annotated tags from this new delta search phase. In the JGit repository only 28 out of 2600+ commits can be delta compressed by C Git. As the commit count tends to be a fair percentage of the total number of objects in the repository, and they generally do not delta compress well, skipping over them can improve performance with little increase in the output pack size. Because this implementation was rebuilt from scratch based on my own memory of how the packing algorithm has evolved over the years in C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly the same rules everywhere, and that leads JGit to produce different (but logically equivalent) pack files. Repository \| Pack Size (bytes) \| Packing Time \| JGit - CGit = Difference \| JGit / CGit -----------+----------------------------------+----------------- git \| `25094348` - `24322890` = +771458 \| 59.434s / 59.133s jgit \| `5669515` - `5709046` = - 39531 \| 6.654s / 6.806s linux-2.6 \| 389M - 386M = +3M \| 20m02s / 18m01s For the above tests pack.threads was set to 1, window size=10, delta depth=50, and delta and object reuse was disabled for both implementations. Both implementations were reading from an already fully packed repository on local disk. The running time reported is after 1 warm-up run of the tested implementation. PackWriter is writing 771 KiB more data on git.git, 3M more on linux-2.6, but is actually 39.5 KiB smaller on jgit.git. Being larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an extra 2 minutes to pack. On the running time side, JGit is at a major disadvantage because linux-2.6 doesn't fit into the default WindowCache of 20M, while C Git is able to mmap the entire pack and have it available instantly in physical memory (assuming hot cache). CGit also has a feature where it caches deltas that were created during the compression phase, and uses those cached deltas during the writing phase. PackWriter does not implement this (yet), and therefore must create every delta twice. This could easily account for the increased running time we are seeing. Change-Id: I6292edc66c2e95fbe45b519b65fdb3918068889c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago

9 Commits (b77ba049762e4ea3aadb756dad1d06c859bb3fe3)