mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Автор	SHA1	Съобщение	Дата
Shawn O. Pearce	5fce8d81d8	Fix cloning of repositories with big objects When running IndexPack we use a CachedObjectDirectory, which knows what objects are loose and tries to avoid stat(2) calls for objects that do not exist in the repository, as stat(2) on Win32 is very slow. However large delta objects found in a pack file are expanded into a loose object, in order to avoid costly delta chain processing when that object is used as a base for another delta. If this expand occurs while working with the CachedObjectDirectory, we need to update the cached directory data to include this new object, otherwise it won't be available when we try to open it during the object verify phase. Bug: 324868 Change-Id: Idf0c76d4849d69aa415ead32e46a435622395d68 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	41dd9ed1c0	Unpack and cache large deltas as loose objects Instead of spooling large delta bases into temporary files and then immediately deleting them afterwards, spool the large delta out to a normal loose object. Later any requests for that large delta can be answered by reading from the loose object, which is much easier to stream efficiently for readers. Since the object is now duplicated, once in the pack as a delta and again as a loose object, any future prune-packed will automatically delete the loose object variant, releasing the wasted disk space. As prune-packed is run automatically during either repack or gc, and gc --auto triggers automatically based on the number of loose objects, we get automatic cache management for free. Large objects that were unpacked will be periodically cleared out, and will simply be restored later if they are needed again. After a short offline discussion with Junio Hamano today, we may want to propose a change to prune-packed to hold onto larger loose objects which also exist in pack files as deltas, if the loose object was recently accessed or modified in the last 2 days. Change-Id: I3668a3967c807010f48cd69f994dcbaaf582337c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	3f66e65e71	Remember loose objects and fast-track their lookup Recently created objects are usually what branches point to, and are usually written out as loose objects. But due to the high cost of asking the operating system if a file exists, these are the last thing that ObjectDirectory examines when looking for an object by its ObjectId. Caching recently seen loose objects permits the opening code to jump directly to the loose object, accelerating lookup for branch heads that are accessed often. To avoid exploding the cache its limited to approximately 2048 entries. When more ids are added, the table is simply cleared and reset in size. Change-Id: I18f483217412b102f754ffd496c87061d592e535 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	eb64ccad6d	Correctly name DeltaBaseCache This class is used only to cache the unpacked form of an object that was used as a base for another object. The theory goes that if an object is used as a delta base for A, it will probably also be a delta base for B, C, D, E, etc. and therefore having an unpacked copy of it on hand will make delta resolution for the others very fast. However since objects are usually only accessed once, we don't want to cache everything we unpack, just things that we are likely to need again. The only things we need again are the delta bases. Hence, its a delta base cache. This gets us the class name UnpackedObjectCache back, so we can use it to actually create a cache of unpacked object information. Change-Id: I121f356cf4eca7b80126497264eac22bd5825a1d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	ba984ba2e0	Fix checkReferencedIsReachable to use correct base list When checkReferencedIsReachable is set in ReceivePack we are trying to prove that the push client is permitted to access an object that it did not send to us, but that the received objects link to either via a link inside of an object (e.g. commit parent pointer or tree member) or by a delta base reference. To do this check we are making a list of every potential delta base, and then ensuring that every delta base used appears on this list. If a delta base does not appear on this list, we abort with an error, letting the client know we are missing a particular object. Preventing spurious errors about missing delta base objects requires us to use the exact same list of potential delta bases as the remote push client used. This means we must use TOPO ordering, and we need to enable BOUNDARY sorting so that ObjectWalk will correctly include any trees found during the enumeration back to the common merge base between the interesting and uninteresting heads. To ensure JGit's own push client matches this same potential delta base list, we need to undo `60aae90d4d` ("Disable topological sorting in PackWriter") and switch back to using the conventional TOPO ordering for commits in a pack file. This ensures that our own push client will use the same potential base object list as checkReferencedIsReachable uses on the receiving side. Change-Id: I14d0a326deb62a43f987b375cfe519711031e172 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	741659fed4	DeltaStream: Fix data corruption when reading large copies If the copy instruction was larger than the input buffer given to us, we copied the wrong part of the base stream during the next read(). This occurred on really big binary files where a copy instruction of 64k wasn't unreasonable, but the caller's buffer was only 8192 bytes long. We copied the first 8192 bytes correctly, but then reseeked the base stream back to the start of the copy region on the second read of 8192 bytes. Instead of a sequence like ABCD being read into the caller, we read AAAA. Change-Id: I240a3f722a3eda1ce8ef5db93b380e3bceb1e201 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Robin Rosenberg	8145e40233	cleanup: Remove unnecessary @SuppressWarnings Change-Id: I1b239b587e1cc811bbd6e1513b07dc93a891a842 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	преди 13 години
Shawn O. Pearce	e29cd27961	Move ObjectDirectory streaming limit to WindowCacheConfig IDEs like Eclipse offer up the settings in WindowCacheConfig to the user as a global set of options that are configured for the entire JVM process, not per-repository, as the cache is shared across the entire JVM. The limit on how much we are willing to allocate for an object buffer is similar to the limit on how much we can use for data caches, allocating that much space impacts the entire JVM and not just a single repository, so it should be a global limit. Change-Id: I22eafb3e223bf8dea57ece82cd5df8bfe5badebc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	e6bd689d2c	Improve LargeObjectException reporting Use 3 different types of LargeObjectException for the 3 major ways that we can fail to load an object. For each of these use a unique string translation which describes the root cause better than just the ObjectId.name() does. Change-Id: I810c98d5691b74af9fc6cbd46fc9879e35a7bdca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	b24f907e3e	Buffer very large delta streams to reduce explosion of CPU work Large delta streams are unpacked incrementally, but because a delta can seek to a random position in the base to perform a copy we may need to inflate the base repeatedly just to complete one delta. So work around it by copying the base to a temporary file, and then we can read from that temporary file using random seeks instead. Its far more efficient because we now only need to inflate the base once. This is still really ugly because we have to dump to a temporary file, but at least the code can successfully process a large file without throwing OutOfMemoryError. If speed is an issue, the user will need to increase the JVM heap and ensure core.streamFileThreshold is set to a higher value, so we don't use this code path as often. Unfortunately we lose the "optimization" of skipping over portions of a delta base that we don't actually need in the final result. This is going to cause us to inflate and write to disk useless regions that were deleted and do not appear in the final result. We could later improve on our code by trying to flatten delta instruction streams before we touch the bottom base object, and then only store the portions of the base we really need for the final result and that appear out-of-order. Since that is some pretty complex code I'm punting on it for now and just doing this simple whole-object buffering. Because the process umask might be permitting other users to read files we create, we put the temporary buffers into $GIT_DIR/objects. We can reasonably assume that if a reader can read our temporary buffer file in that directory, they can also read the base pack file we are pulling it from and therefore its not a security breach to expose the inflated content in a file. This requires a reader to have write access to the repository, but only if the file is really big. I'd rather err on the side of caution here and refuse to read a very big file into /tmp than to possibly expose a secured content because the Java 5 JVM won't let us create a protected temporary file that only the current user can access. Change-Id: I66fb80b08cbcaf0f65f2db0462c546a495a160dd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	7a9edb3662	Fix reuse from pack file for REF_DELTA types We miscomputed the CRC32 checksum for a REF_DELTA type of object, by not including the full 20 byte ObjectId of the delta base in the CRC code we use when the delta is too large to go through our two faster small reuse code paths. This resulted in a corruption error during packing, where the PackFile erroneously suspected the data was wrong on the local filesystem and aborted writing, because the CRC didn't match what we had read from the index. Change-Id: I7d12cdaeaf2c83ddc11223ce0108d9bd6886e025 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	c11711f98e	Use limited getCachedBytes code to reduce duplication Rather than duplicating this block everywhere, reuse the limited size form of getCachedBytes to acquire the content of an object. Change-Id: I2e26a823e6fd0964d8f8dbfaa0fc2e8834c179c1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	преди 13 години
Shawn O. Pearce	1c3f3fdbd2	Fix ObjectDirectory abbreviation resolution to notice new packs If we can't resolve an abbreviation, it might be because there is a new pack file we haven't picked up yet. Try scanning the packs again and recheck each pack if there were differences from the last scan we did. Because of this, we don't have to open a pack during the test where we generate a pack on the fly. We'll miss on the first loop during which the PackList is the NO_PACKS magic initialization constant, and pick up the newly created index during this retry logic. Change-Id: I7b97efb29a695ee60c90818be380f7ea23ad13a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	a5c18fcfc7	Fully implement SHA-1 abbreviations ObjectReader implementations are now responsible for creating the unique abbreviation of an ObjectId, or for resolving an abbreviation back to its full form. In this latter case the reader can offer up multiple candidates to the caller, who may be able to disambiguate them based on context. Repository.resolve() doesn't take multiple candidates into account right now, but it could in the future by looking for a remaining ^0 or ^{commit} suffix and take an expansion if there is only one commit that matches the input abbreviation. It could also use the distance from an annotated tag to resolve "tag-NNN-gcommit" style strings that are often output by `git describe`. Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Shawn O. Pearce	28ba4747bc	Allow ObjectReuseAsIs to have more control over write ordering The reuse system used by an object database may be able to benefit from knowing what objects are coming next, and even improve data throughput by delaying (or moving up) objects that are stored near each other in the source database. Pushing the iteration down into the reuse code makes it possible for a smarter implementation to aggregate reuse. But for the standard pack file format on disk we don't bother, its quite efficient already. Change-Id: I64f0048ca7071a8b44950d6c2a5dfbca3be6bba6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	fe18e52195	Allow ObjectToPack subclasses to use up to 4 bits of flags Some instances may benefit from having access to memory efficient storage for some small values, like single flag bits. Give up a portion of our delta depth field to make 4 bits available to any subclass that wants it. This still gives us room for delta chains of 1,048,576 objects, and that is just insane. Unpacking 1 million objects to get to something is longer than most users are willing to wait for data from Git. Change-Id: If17ea598dc0ddbde63d69a6fcec0668106569125 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	f048af3fd1	Implement async/batch lookup of object data An ObjectReader implementation may be very slow for a single object, but yet support bulk queries efficiently by batching multiple small requests into a single larger request. This easily happens when the reader is built on top of a database that is stored on another host, as the network round-trip time starts to dominate the operation cost. RevWalk, ObjectWalk, UploadPack and PackWriter are the first major users of this new bulk interface, with the goal being to support an efficient way to pack a repository for a fetch/clone client when the source repository is stored in a high-latency storage system. Processing the want/have lists is now done in bulk, to remove the high costs associated with common ancestor negotiation. PackWriter already performs object reuse selection in bulk, but it now can also do the object size lookup and object counting phases with higher efficiency. Actual object reuse, deltification, and final output are still doing sequential lookups, making them a bit more expensive to perform. Change-Id: I4c966f84917482598012074c370b9831451404ee Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	b85af06324	Allow object reuse selection to occur in parallel ObjectReader implementations may wish to use multiple threads in order to evaluate object reuse faster. Let the reader make that decision by passing the iteration down into the reader. Because the work is pushed into the reader, it may need to locate a given ObjectToPack given its ObjectId. This can easily occur if the reader has sent a list of ObjectIds to the object database and gets back information keyed only by ObjectId, without the ObjectToPack handle. Expose lookup using the PackWriter's own internal map, so the reader doesn't need to build a redundant copy to track the assocation of ObjectId back to ObjectToPack. Change-Id: I0c536405a55034881fb5db92a2d2a99534faed34 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	cc6210619b	Flush the pack header as soon as its ready When the output stream is deeply buffered (e.g. 1 MiB or more in an HTTP servlet on some containers) trying to kick out the header earlier will prevent the client from stalling hard while the first 1 MiB is received and it can process the pack header. Forcing a flush here lets the client see the header and start its progress monitor for "Receiving objects: (1/N)" so the user knows there is still activity occurring, even though the buffering may cause there to be some lag as the buffer fills up on the sending side. Change-Id: I3edf39e8f703fe87a738dc236d426b194db85e3a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	109c695936	Expose getType in ObjectToPack Storage implementations may find this useful when implementing the ObjectReuseAsIs interface on their ObjectReader. Expose it so we don't force them to create a redundant copy of the information. Change-Id: I802ec8113c00884fccde5d0e92b9849716316f62 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Mathias Kinzler	b7388637d8	Fix missing Configuration Change eventing Configuration change events were not being triggered, now they are forwarded from the FileConfig up to the Repository's listeners. Change-Id: Ida94a59f5a2b7fa8ae0126e33c13343275483ee5 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 13 години
Robin Rosenberg	a85c08e1c8	Do not trigger RefsChangedEvent on the first attempt to read a ref Such events make no sense, it has never been visible to this process so no client can have a stale value of the ref. Change-Id: Iea3a5035b0a1410b80b09cf53387b22b78b18018 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	преди 14 години
Ketan Padegaonkar	376acfb6db	Add FileRepository(String) convenience constructor Add a convenience API in FileRepository to pass in a String that points to the GIT_DIR location. This is converted to a File and sent through the usual constructor. Change-Id: I588388f37e89b8c690020f110a1bc59f46170c40	преди 13 години
Shawn O. Pearce	5f5da8b1d4	Enable configuration of non-standard pack settings For daemons we might want to disable delta compression entirely, or in some strange case an administrator might need to turn of delta reuse. Expose these normally internal pack settings through the pack configuration section. Change-Id: I39bfefee8384c864cc04ffac724f197240c8a11a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	1a06179ea7	Move PackWriter configuration to PackConfig This refactoring permits applications to configure global per-process settings for all packing and easily pass it through to per-request PackWriters, ensuring that the process configuration overrides the repository specific settings. For example this might help in a daemon environment where the server wants to cap the resources used to serve a dynamic upload pack request, even though the repository's own pack.* settings might be configured to be more aggressive. This allows fast but less bandwidth efficient serving of clients, while still retaining good compression through a cron managed `git gc`. Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Christian Halstrick	08c0c5d938	Fix unit tests under windows the following tests fail under windows because certain inputstreams are not closed and files cannot be deleted because of that. The main problem I found is UnpackedObject.InflaterInputStream.close(). This method may throw exceptions found by checkValidEndOfStream() but doesn't call super.close() before leaving. It is not clear to me which resources a close() method should release before it throws an exception. But those reseources which are not published to the outside and which therefore cannot be closed by other means have to be closed in all cases. I changed the close() method to call super.close() under all circumstances. failing tests: testStandardFormat_LargeObject_TruncatedZLibStream(org.eclipse.jgit.storage.file.UnpackedObjectTest) testStandardFormat_LargeObject_TrailingGarbage(org.eclipse.jgit.storage.file.UnpackedObjectTest) testPackFormat_SmallObject(org.eclipse.jgit.storage.file.UnpackedObjectTest) Change-Id: Id2e609a29e725aad953ff9bd88af6381df38399d Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	преди 14 години
Shawn O. Pearce	21f76c2a69	Remove static progress task names from PackWriter These need to be dynamic based on the current thread's environment at time of execution in order to be properly localized for the end user that will be seeing these messages. Change-Id: I4976f462cfe606edd2761c0e36b2f6b20f63d53c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	1b783d0370	Allow PackWriter callers to manage the thread pool By permitting the caller of PackWriter to select the Executor it uses for task execution, we give the caller the ability to manage the lifecycle of the thread pool, including reusing it across concurrent pack generators. This is the first step to supporting application thread pools within Daemon or another managed service like Gerrit Code Review. Change-Id: I96bee7b9c30ff9885f2bd261d0b6daaac713b5a4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Jens Baumgart	db82b8d7eb	Fix concurrent read / write issue in LockFile on Windows LockFile.commit fails if another thread concurrently reads the base file. The problem is fixed by retrying the rename operation if it fails. Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb Bug: 308506 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	преди 14 години
Robin Stocker	a00377a7e2	Fix Javadoc warnings There were some broken links, incorrect uses of @value, an invalid tag and an outdated comment. Change-Id: I22886bcc869a4b62bd606ebed40669f7b4723664 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	7ff18f3ec9	Make StoredConfig an abstraction above FileBasedConfig This exposes a load and save method, allowing a Repository to denote that it has a persistent configuration of some kind which can be accessed by the application, without needing to know exact details of how its stored . Change-Id: I7c414bc0f975b80f083084ea875eca25c75a07b2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	12fe0f2d1e	Discard the uncompressed delta as soon as its compressed The DeltaCache will most likely need to copy the compressed delta into a new buffer in order to compact away the wasted space at the end caused by over allocation. Since we don't need the uncompressed format anymore, null out our only reference to it so the GC can reclaim this memory if it needs to perform a collection in order to satisfy the cache's allocation attempt. Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	9734194917	Honor pack.windowlimit to cap memory usage during packing The pack.windowlimit configuration parameter places an upper bound on the number of bytes used by the DeltaWindow class as it scans through the object list. If memory usage would exceed the limit the window is temporarily decreased in size to keep memory used within that bound. Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	74e0835012	Honor pack.threads and perform delta search in parallel If we have multiple CPUs available, packing usually goes faster when each CPU is assigned a slice of the available search space. The number of threads to use is guessed from the runtime if it wasn't set by the caller, or wasn't set in the configuration. Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	a960d1429e	Cache small deltas during packing PackWriter now caches small deltas, or deltas that are very tiny compared to their source inputs, so that the writing phase goes faster by reusing those cached deltas. The cached data is stored compressed, which usually translates to a bigger footprint due to deltas being very hard to compress, but saves time during writing by avoiding the deflate step. They are held under SoftReferences so that the JVM GC can clear out deltas if memory gets very tight. We would rather continue working and spend a bit more CPU time during writing than crash due to OOME. To avoid OutOfMemoryErrors during the caching phase we also trap OOME and just abort out of the caching. Because deflateBound() always produces something larger than what we need to actually store the deflated data, we copy it over into a new buffer if the actual length doesn't match the buffer length. When packing jgit.git this saves over 111 KiB in the cache, and is thus a worthwhile hit on CPU time. To further save memory we store the inflated size of the delta (which we need for the object header) in the same field as the pathHash, as the pathHash is no longer necessary by this phase of the packing algorithm. Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	dfad23bf3d	Implement delta generation during packing PackWriter now produces new deltas if there is not a suitable delta available for reuse from an existing pack file. This permits JGit to send less data on the wire by sending a delta relative to an object the other side already has, instead of sending the whole object. The delta searching algorithm is similar in style to what C Git uses, but apparently has some differences (see below for more on). Briefly, objects that should be considered for delta compression are pushed onto a list. This list is then sorted by a rough similarity score, which is derived from the path name the object was discovered at in the repository during object counting. The list is then walked in order. At each position in the list, up to $WINDOW objects prior to it are attempted as delta bases. Each object in the window is tried, and the shortest delta instruction sequence selects the base object. Some rough rules are used to prevent pathological behavior during this matching phase, like skipping pairings of objects that are not similar enough in size. PackWriter intentionally excludes commits and annotated tags from this new delta search phase. In the JGit repository only 28 out of 2600+ commits can be delta compressed by C Git. As the commit count tends to be a fair percentage of the total number of objects in the repository, and they generally do not delta compress well, skipping over them can improve performance with little increase in the output pack size. Because this implementation was rebuilt from scratch based on my own memory of how the packing algorithm has evolved over the years in C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly the same rules everywhere, and that leads JGit to produce different (but logically equivalent) pack files. Repository \| Pack Size (bytes) \| Packing Time \| JGit - CGit = Difference \| JGit / CGit -----------+----------------------------------+----------------- git \| `25094348` - `24322890` = +771458 \| 59.434s / 59.133s jgit \| `5669515` - `5709046` = - 39531 \| 6.654s / 6.806s linux-2.6 \| 389M - 386M = +3M \| 20m02s / 18m01s For the above tests pack.threads was set to 1, window size=10, delta depth=50, and delta and object reuse was disabled for both implementations. Both implementations were reading from an already fully packed repository on local disk. The running time reported is after 1 warm-up run of the tested implementation. PackWriter is writing 771 KiB more data on git.git, 3M more on linux-2.6, but is actually 39.5 KiB smaller on jgit.git. Being larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an extra 2 minutes to pack. On the running time side, JGit is at a major disadvantage because linux-2.6 doesn't fit into the default WindowCache of 20M, while C Git is able to mmap the entire pack and have it available instantly in physical memory (assuming hot cache). CGit also has a feature where it caches deltas that were created during the compression phase, and uses those cached deltas during the writing phase. PackWriter does not implement this (yet), and therefore must create every delta twice. This could easily account for the increased running time we are seeing. Change-Id: I6292edc66c2e95fbe45b519b65fdb3918068889c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	074055d747	debug-show-packdelta: Dump a pack delta to the console This is a horribly crude application, it doesn't even verify that the object its dumping is delta encoded. Its method of getting the delta is pretty abusive to the public PackWriter API, because right now we don't want to expose the real internal low-level methods actually required to do this. Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	8612c0ace1	Initial pack format delta generator DeltaIndex is a simple pack style delta generator. The function works by creating a compact index of a source buffer's blocks, and then walking a sliding window along a desired result buffer, searching for the window in the index. When a match is found, the window is stretched to the longest possible length that is common with the source buffer, and a copy instruction is created. Rabin's polynomial hash function is used to compute the hash for a block, permitting efficient sliding of the window in single byte increments. The update function to slide one byte originated from David Mazieres' work in LBFS, and our implementation of the update step was certainly inspired by the initial work Geert Bosch proposed for C Git in http://marc.info/?l=git&m=114565424620771&w=2. To ensure the encoder runs in linear time with respect to the size of the two input buffers (source and result), the maximum number of blocks that can share the same position in the index's hashtable is capped at a constant number. This prevents bad inputs from causing the encoder to run in quadratic time, but comes with a penalty of creating a longer delta due to fewer considered copy positions. Strange hackery is used to cap the amount of memory used by the index to be no more than 12 bytes for every 16 bytes of source buffer, no matter what the JVM per-object overhead is. This permits an index to always be no larger than 1.75x the source buffer length, which is an important feature to support large windows of candidates to match against while packing. Here the strange hackery is nothing more than a manually managed chained hashtable, where pointers are array indexes into storage arrays rather than object references. Computation of the hash function for a single fixed sized block is done through an unrolled loop, where the first 4 iterations have been manually reduced down to eliminate unnecessary instructions. The pattern is derived from ObjectId.equals(byte[], int, byte[], int), where we have unrolled the loop required to compare two 20 byte arrays. Hours of testing with the Sun 1.6 JRE concluded that the non-obvious "foo[idx + 1]" style of reference is faster than "foo[idx++]", and so that is what we use here during hashing. Change-Id: If9fb2a1524361bc701405920560d8ae752221768 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	b38426ae8c	Add debugging toString() method to ObjectToPack Its useful to know what the flags are or what the base that was selected is. Dump these out as part of the object's toString. Change-Id: I8810067fb8337b08b4fcafd5f9ea3e1e31ca6726 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	699e4aa7c5	Make ObjectToPack clearReuseAsIs signal available to subclasses A subclass may want to use this method to release handles that are caching reuse information. Make it protected so they can override it and update themselves. Change-Id: I2277a56ad28560d2d2d97961cbc74bc7405a70d4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	4569d77e13	Correctly classify the compressing objects phase Searching for reuse candidates should be fast compared to actually doing delta compression. So pull the progress monitor out of this phase and rename it back to identify the compressing objects state. Change-Id: I5eb80919f21c1251e0e3420ff7774126f1f79b27 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	85b7a53d52	Refactor ObjectToPack's delta depth setting Long ago when PackWriter is first written we thought that the delta depth could be updated automatically. But its never used. Instead make this a simple standard setter so the caller can more directly set the delta depth of this object. This permits us to configure a depth that takes into account more than just the depth of another object in this same pack. Change-Id: I1d71b74f2edd7029b8743a2c13b591098ce8cc8f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	6730f9e3c8	Configure core.bigFileThreshold into PackWriter C Git's fast-import uses this to determine the maximum file size that it tries to delta compress, anything equal to or above this setting is stored with as a whole object with simple deflate. Define the configuration so we can use it later. Change-Id: Iea46e787d019a1b6c51135cc73d7688a02e207f5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	823e9a9721	Add doNotDelta flag to ObjectToPack This flag will later control whether or not PackWriter search for a delta base for this object. Edge objects will never get searched, as the writer won't be outputting them, so they should always have this flag set on. Sometime in the future this flag should also be set for file blobs on file paths that have the "-delta" gitattribute set in the repository's attributes file. Change-Id: I6e518e1a6996c8ce00b523727f1b605e400e82c6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	616bc74cf7	Add more configuration options to PackWriter We now at least import other pack settings like pack.window, which means we can later use these to control how we search for deltas. The compression level was fixed to use pack.compression rather than the loose object core.compression setting. Change-Id: I72ff6d481c936153ceb6a9e485fa731faf075a9a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	2f93a09dd1	Save object path hash codes during packing We need to remember these so we can later cluster objects that have similar file paths near each other as we search for deltas between them. Change-Id: I52cb1e4ca15c9c267a2dbf51dd0d795f885f4cf8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide up more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	711bd3e3d0	Define a constant for 127 in DeltaEncoder The special value 127 here means how many bytes we can put into a single insert command. Rather than use the magical value 127, lets name it to better document the code. Change-Id: I5a326f4380f6ac87987fa833e9477700e984a88e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	cd7dd8591e	Cap delta copy instructions at 64k Although all modern delta decoders can process copy instructions with a count as large as 0xffffff (~15.9 MiB), pack version 2 streams are only supposed to use delta copy instructions up to 64 KiB. Rewrite our copy instruction encode loop to use the lower 64 KiB limit, even though modern decoders would support longer copies. To improve encoding performance we now try to encode up to four full copy commands in our buffer before we flush it to the stream, but we don't try to implement full buffering here. We are just trying to amortize the virtual method call to the destination stream when we have to do a large copy. Change-Id: I9410a16e6912faa83180a9788dc05f11e33fabae Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години
Shawn O. Pearce	a215914a56	Fix DeltaEncoder header for objects 128 bytes long The encode loop had the wrong condition, objects that are 128 bytes in size need to have their length encoded as two bytes, not one. Change-Id: I3bef85f2b774871ba6104042b341749eb8e7595c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	преди 14 години

1 2

78 Ревизии (stable-0.9)