path: root/org.eclipse.jgit/src/org/eclipse/jgit/lib/PackFile.java
Commit message | Author | Age | Files | Lines
* Move FileRepository to storage.file.FileRepository | Shawn O. Pearce | 2010-06-26 | 1 | -695/+0

This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library.

Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports.

Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

* Implement zero-copy for single window objects | Shawn O. Pearce | 2010-06-26 | 1 | -22/+38

Objects that fall completely within a single window can be worked with in a zero-copy fashion, provided that the window is backed by a normal byte[] and not by a ByteBuffer.

This works for a surprising number of objects. The default window size is 8 KiB, but most deltas are quite a bit smaller than that. Objects smaller than 1/2 of the window size have a very good chance of falling completely within a window's array, which means we can work with them without copying their data around.

Larger objects, or objects which are unlucky enough to span over a window boundary, get copied through the temporary buffer. We pay a tiny penalty to realize we can't use the zero-copy code path, but it's easier than trying to keep track of two adjacent windows.

With this change (as well as everything preceding it), packing is actually a bit faster. Some crude benchmarks based on cloning linux-2.6.git (~324 MiB, 1,624,785 objects) over localhost using C git client and JGit daemon show we get better throughput, and slightly better times:

   Total Time    |        Throughput
 (old)    (now)  |    (old)         (now)
-----------------+---------------------------
 2m45s    2m37s  |  12.49 MiB/s   21.17 MiB/s
 2m42s    2m36s  |  16.29 MiB/s   22.63 MiB/s
 2m37s    2m31s  |  16.07 MiB/s   21.92 MiB/s

Change-Id: I48b2c8d37f08d7bf5e76c5a8020cde4a16ae3396
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

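As a rough illustration of the single-window test described in this commit message, the check comes down to whether an object's byte range stays inside one window and whether that window exposes a byte[]. This is only a sketch, not JGit's code: the class name, the method names, and the assumption that windows are aligned on multiples of the window size are all hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: decides whether a byte range can be served zero-copy. */
final class SingleWindowCheck {
    /** Window size assumed for this sketch (8 KiB, the default named above). */
    static final int WINDOW_SIZE = 8 * 1024;

    /**
     * Returns true if [position, position + length) lies inside one window,
     * assuming windows start at multiples of WINDOW_SIZE.
     */
    static boolean fitsInOneWindow(long position, int length) {
        if (length <= 0) {
            return false;
        }
        long firstWindow = position / WINDOW_SIZE;
        long lastWindow = (position + length - 1) / WINDOW_SIZE;
        return firstWindow == lastWindow;
    }

    /** Zero-copy is only possible when the window is backed by a byte[]. */
    static boolean canZeroCopy(ByteBuffer window, long position, int length) {
        return window.hasArray() && fitsInOneWindow(position, length);
    }
}
```

An object that fails either test would fall back to the copying path through a temporary buffer, which is the "tiny penalty" the message mentions.
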
* Redo PackWriter object reuse output | Shawn O. Pearce | 2010-06-26 | 1 | -56/+209

Output of selected reuses is refactored to use a new ObjectReuseAsIs interface that extends the ObjectReader. This interface allows the reader to control how it performs the reuse into the output stream, but also allows it to throw an exception to request the writer to find a different candidate representation.

The PackFile reuse code was overhauled, cleaning up the APIs so they aren't exposed in the object loader, but instead are now a single method on the PackFile itself.

The reuse algorithm was changed to do a data verification pass, followed by the copy pass to the output. This permits us to work around a corrupt object in a pack file by seeking another copy of that object when this one is bad.

The reuse code was also optimized for the common case, where the in-pack representation is under 16 KiB. In these smaller cases data is sent to the pack writer more directly, avoiding some copying.

Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

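The verify-then-copy idea from this commit message can be sketched as follows. Everything here is hypothetical: the class and method names are invented, CRC32 is only a stand-in for whatever check the real code performs, and the sketch skips a bad candidate with `continue` where the message describes throwing an exception back to the writer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical sketch of a verification pass followed by a copy pass. */
final class VerifyThenCopySketch {
    /** One stored copy of an object: raw bytes plus the checksum recorded for them. */
    static final class Candidate {
        final byte[] raw;
        final long expectedCrc;

        Candidate(byte[] raw, long expectedCrc) {
            this.raw = raw;
            this.expectedCrc = expectedCrc;
        }
    }

    /** Computes the CRC32 of a byte array. */
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Verifies each candidate before writing anything, so a corrupt copy
     * never reaches the output. Returns the index of the candidate that was
     * copied, or -1 if every candidate failed verification.
     */
    static int copyFirstValid(List<Candidate> candidates, OutputStream out)
            throws IOException {
        for (int i = 0; i < candidates.size(); i++) {
            Candidate c = candidates.get(i);
            if (crcOf(c.raw) != c.expectedCrc) {
                continue; // corrupt: look for another copy of the object
            }
            out.write(c.raw);
            return i;
        }
        return -1;
    }
}
```

Verifying first matters because bytes already written to the output stream cannot be taken back once corruption is discovered partway through a copy.
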
* Don't use interruptable pread() to access pack files | Shawn O. Pearce | 2010-05-27 | 1 | -36/+49

The J2SE NIO APIs require that FileChannel close the underlying file descriptor if a thread is interrupted while it is inside of a read or write operation on that channel. This is insane, because it means we cannot share the file descriptor between threads. If a thread is in the middle of the FileChannel variant of IO.readFully() and it receives an interrupt, the pack will be automatically closed on us. This causes the other threads trying to use that same FileChannel to receive IOExceptions, which leads to the pack getting marked as invalid. Once the pack is marked invalid, JGit loses access to its entire contents and starts to report MissingObjectExceptions.

Because PackWriter must ensure that the chosen pack file stays available until the current object's data is fully copied to the output, JGit cannot simply reopen the pack when it's automatically closed due to an interrupt being sent at the wrong time. The pack may have been deleted by a concurrent `git gc` process, and that open file descriptor might be the last reference to the inode on disk. Once it's closed, the PackWriter loses access to that object representation, and it cannot complete sending the object to the client.

Fortunately, RandomAccessFile's readFully method does not have this problem. Interrupts during readFully() are ignored. However, it requires us to first seek to the offset we need to read, then issue the read call. This requires locking around the file descriptor to prevent concurrent threads from moving the pointer before the read.

This reduces the concurrency level, as now only one window can be paged in at a time from each pack. However, the WindowCache should already be holding most of the pages required to handle the working set for a process, and its own internal locking was already limiting us on the number of concurrent loads possible. Provided that most concurrent accesses are getting hits in the WindowCache, or are for different repositories on the same server, we shouldn't see a major performance hit due to the more serialized loading.

I would have preferred to use a pool of RandomAccessFiles for each pack, with threads borrowing an instance dedicated to that thread whenever they needed to page in a window. This would permit much higher levels of concurrency by using multiple file descriptors (and file pointers) for each pack. However, the code became too complex to develop in any reasonable period of time, so I've chosen to retrofit the existing code with more serialization instead.

Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

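The seek-then-read pattern described in this commit message might look roughly like the sketch below. The class name and its lock object are hypothetical and this is not JGit's PackFile; only `RandomAccessFile.seek` and `RandomAccessFile.readFully` are real JDK calls.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of serialized positional reads on one shared file. */
final class SerializedPackReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final Object readLock = new Object();

    SerializedPackReader(RandomAccessFile file) {
        this.file = file;
    }

    /** Reads exactly length bytes starting at the given file position. */
    byte[] read(long position, int length) throws IOException {
        byte[] buf = new byte[length];
        // seek() and readFully() share a single file pointer, so the pair
        // must run as one unit; otherwise another thread could move the
        // pointer between the two calls.
        synchronized (readLock) {
            file.seek(position);
            file.readFully(buf, 0, length);
        }
        return buf;
    }

    @Override
    public void close() throws IOException {
        synchronized (readLock) {
            file.close();
        }
    }
}
```

The single lock is what limits each pack to one in-flight read at a time, which is the concurrency cost the message accepts in exchange for simpler code than a pool of per-thread files.
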
* Externalize strings from JGit | Sasa Zivkov | 2010-05-19 | 1 | -17/+18

The strings are externalized into the root resource bundles. The resource bundles are stored under the new "resources" source folder to get a proper Maven build.

Strings from tests are, in general, not externalized. They were externalized only where that was necessary to make a test pass. This was typically the case where e.getMessage() was used in an assert and the exception message changed slightly due to reuse of the externalized strings.

Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>

* Remove unnecessary truncation of in-pack size during copy | Shawn O. Pearce | 2010-05-17 | 1 | -3/+3

The number of bytes to copy was truncated to an int, but the pack's copyToStream() method expected to be passed a long here. Pass through the long so we don't truncate a giant object.

Change-Id: I0786ad60a3a33f84d8746efe51f68d64e127c332
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

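To see why the narrowing matters, the sketch below shows what a cast from long to int does to a large byte count, next to a copy loop that keeps the count as a long throughout. The class and method names are hypothetical; this is not the copyToStream() method named in the commit message.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch contrasting a truncated count with a long-safe copy. */
final class LongCopySketch {
    /** A narrowing cast keeps only the low 32 bits of the value. */
    static int truncated(long byteCount) {
        return (int) byteCount;
    }

    /** Copies exactly count bytes, tracking the remainder as a long. */
    static long copy(InputStream in, OutputStream out, long count)
            throws IOException {
        byte[] buf = new byte[8192];
        long remaining = count;
        while (remaining > 0) {
            // Only the per-read chunk size is narrowed, and it is bounded
            // by buf.length, so this cast cannot lose information.
            int want = (int) Math.min(buf.length, remaining);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                throw new EOFException(remaining + " bytes missing");
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return count;
    }
}
```

For example, `truncated(5_000_000_000L)` yields 705032704, so a copy driven by the truncated value would stop far short of a 5 GB object.
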
* Reduce size of PackedObjectLoader by dropping long to int | Shawn O. Pearce | 2010-05-15 | 1 | -10/+10

Rather than keep track of both the position of the object, and the position of its data, just keep track of the number of bytes used by the object's header in the pack. This shaves 4 bytes out of the size of the PackedObjectLoader instances.

We also can defer the addition instruction to the materialize() operation, avoiding it entirely if the caller never actually uses the loader. This may be relevant for PackWriter invocations, where only 1 loader gets chosen for a given object, even though the object may appear on disk in more than one pack file.

Error reporting is now simplified, as we can rely on the object offset rather than its data offset. This is the value displayed by pack debugging tools like `git verify-pack -v`, so it's better to use that in our own errors.

Because nobody needs getDataOffset() now, we can drop that from the public API.

Change-Id: Ic639c0d5a722315f4f5c8ffda6e26643d90e5f42
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

* Avoid unnecessary second read on OBJ_OFS_DELTA headers | Shawn O. Pearce | 2010-05-15 | 1 | -11/+7

When we read the object header we copy 20 bytes from the pack data, then start parsing out the type and the inflated size. For most objects, this is only going to require 3 bytes, which is sufficient to represent objects with inflated sizes of up to 2^16.

The local buffer however still has 17 bytes remaining in it, and that can be used to satisfy the OBJ_OFS_DELTA header. We shouldn't need to worry about walking off the end of the buffer here, because delta offsets cannot be larger than 64 bits, and that requires only 9 bytes in the OFS_DELTA encoding.

Assuming worst-case scenarios of 9 bytes for the OFS_DELTA encoding, the pack file itself must be approaching 2^64 bytes, an infeasible size to store on any current technology. However, even if this were the case we still have 11 bytes for the type/size header. In that encoding we can represent an object as large as 2^74 bytes, which is also an infeasible size to process in JGit.

So drop the second read here. The data offsets we pass into the ObjectLoaders being constructed need to be computed individually now. This saves a local variable, but pushes the addition operation into each branch of the switch.

Change-Id: I6cf64697a9878db87bbf31c7636c03392b47a062
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

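A sketch of parsing both parts of the header out of one buffer, as this commit message describes, could look like this. It follows the Git pack format (3 type bits and 4 size bits in the first byte, 7 size bits in each continuation byte, and the offset encoding that adds 1 before each shift), but the class, field, and method names are hypothetical and this is not JGit's code.

```java
/** Hypothetical sketch: parse type, size and OFS_DELTA distance in one pass. */
final class PackHeaderSketch {
    /** Object type code for offset deltas in the Git pack format. */
    static final int OBJ_OFS_DELTA = 6;

    /** Parsed fields; headerLength counts the bytes consumed from the buffer. */
    static final class Header {
        final int type;
        final long inflatedSize;
        final long baseDistance; // -1 unless type is OBJ_OFS_DELTA
        final int headerLength;

        Header(int type, long inflatedSize, long baseDistance, int headerLength) {
            this.type = type;
            this.inflatedSize = inflatedSize;
            this.baseDistance = baseDistance;
            this.headerLength = headerLength;
        }
    }

    /** Parses a header from the start of buf without any further reads. */
    static Header parse(byte[] buf) {
        int p = 0;
        int c = buf[p++] & 0xff;
        int type = (c >> 4) & 7;
        long size = c & 15;
        int shift = 4;
        while ((c & 0x80) != 0) { // high bit set: another size byte follows
            c = buf[p++] & 0xff;
            size += ((long) (c & 0x7f)) << shift;
            shift += 7;
        }
        long distance = -1;
        if (type == OBJ_OFS_DELTA) {
            // The distance to the base object follows in the same buffer.
            c = buf[p++] & 0xff;
            distance = c & 0x7f;
            while ((c & 0x80) != 0) {
                distance += 1;
                c = buf[p++] & 0xff;
                distance <<= 7;
                distance += c & 0x7f;
            }
        }
        return new Header(type, size, distance, p);
    }
}
```

With headerLength in hand, the data position for any object type is the object's offset plus headerLength, which is the per-branch addition the message refers to.
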
* Move pure IO utility functions to a utility class of its own. | Robin Rosenberg | 2009-10-31 | 1 | -3/+4

According to the Javadoc, and as implied by the name of the class, NB is about network byte order. The purpose of moving the IO-only functions, which are unrelated to byte order, to another class is to make it easier for new contributors to understand that they can use these functions in general. It also makes it easier to understand where to put new IO-related utility functions.

Change-Id: I4a9f6b39d5564bc8a694b366e7ff3cc758c5181b
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

* Remove trailing whitespace at end of line | Alex Blewitt | 2009-10-31 | 1 | -6/+6

As discussed on the egit-dev mailing list, we prefer not to have trailing whitespace in our source code. Correct all currently offending lines by trimming them.

Change-Id: I002b1d1980071084c0bc53242c8f5900970e6845
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

* Initial JGit contribution to eclipse.org | Git Development Community | 2009-09-29 | 1 | -0/+515

Per CQ 3448 this is the initial contribution of the JGit project to eclipse.org. It is derived from the historical JGit repository at commit 3a2dd9921c8a08740a9e02c421469e5b1a9e47cb.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>