| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.
Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.
Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Objects that fall completely within a single window can be worked
with in a zero-copy fashion, provided that the window is backed by
a normal byte[] and not by a ByteBuffer.
This works for a surprising number of objects. The default window
size is 8 KiB, but most deltas are quite a bit smaller than that.
Objects smaller than 1/2 of the window size have a very good chance
of falling completely within a window's array, which means we can
work with them without copying their data around.
Larger objects, or objects which are unlucky enough to span over a
window boundary, get copied through the temporary buffer. We pay
a tiny penalty to realize we can't use the zero-copy code path,
but its easier than trying to keep track of two adjacent windows.
With this change (as well as everything preceeding it), packing
is actually a bit faster. Some crude benchmarks based on cloning
linux-2.6.git (~324 MiB, 1,624,785 objects) over localhost using
C git client and JGit daemon shows we get better throughput, and
slightly better times:
Total Time | Throughput
(old) (now) | (old) (now)
--------------+---------------------------
2m45s 2m37s | 12.49 MiB/s 21.17 MiB/s
2m42s 2m36s | 16.29 MiB/s 22.63 MiB/s
2m37s 2m31s | 16.07 MiB/s 21.92 MiB/s
Change-Id: I48b2c8d37f08d7bf5e76c5a8020cde4a16ae3396
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Output of selected reuses is refactored to use a new ObjectReuseAsIs
interface that extends the ObjectReader. This interface allows the
reader to control how it performs the reuse into the output stream,
but also allows it to throw an exception to request the writer to
find a different candidate representation.
The PackFile reuse code was overhauled, cleaning up the APIs so they
aren't exposed in the object loader, but instead are now a single
method on the PackFile itself. The reuse algorithm was changed to do
a data verification pass, followed by the copy pass to the output.
This permits us to work around a corrupt object in a pack file by
seeking another copy of that object when this one is bad.
The reuse code was also optimized for the common case, where the
in-pack representation is under 16 KiB. In these smaller cases
data is sent to the pack writer more directly, avoiding some copying.
Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The J2SE NIO APIs require that FileChannel close the underlying file
descriptor if a thread is interrupted while it is inside of a read or
write operation on that channel. This is insane, because it means we
cannot share the file descriptor between threads. If a thread is in
the middle of the FileChannel variant of IO.readFully() and it
receives an interrupt, the pack will be automatically closed on us.
This causes the other threads trying to use that same FileChannel to
receive IOExceptions, which leads to the pack getting marked as
invalid. Once the pack is marked invalid, JGit loses access to its
entire contents and starts to report MissingObjectExceptions.
Because PackWriter must ensure that the chosen pack file stays
available until the current object's data is fully copied to the
output, JGit cannot simply reopen the pack when its automatically
closed due to an interrupt being sent at the wrong time. The pack may
have been deleted by a concurrent `git gc` process, and that open file
descriptor might be the last reference to the inode on disk. Once its
closed, the PackWriter loses access to that object representation, and
it cannot complete sending the object the client.
Fortunately, RandomAccessFile's readFully method does not have this
problem. Interrupts during readFully() are ignored. However, it
requires us to first seek to the offset we need to read, then issue
the read call. This requires locking around the file descriptor to
prevent concurrent threads from moving the pointer before the read.
This reduces the concurrency level, as now only one window can be
paged in at a time from each pack. However, the WindowCache should
already be holding most of the pages required to handle the working
set for a process, and its own internal locking was already limiting
us on the number of concurrent loads possible. Provided that most
concurrent accesses are getting hits in the WindowCache, or are for
different repositories on the same server, we shouldn't see a major
performance hit due to the more serialized loading.
I would have preferred to use a pool of RandomAccessFiles for each
pack, with threads borrowing an instance dedicated to that thread
whenever they needed to page in a window. This would permit much
higher levels of concurrency by using multiple file descriptors (and
file pointers) for each pack. However the code became too complex to
develop in any reasonable period of time, so I've chosen to retrofit
the existing code with more serialization instead.
Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
|
|
|
|
|
|
|
|
|
| |
The number of bytes to copy was truncated to an int, but the
pack's copyToStream() method expected to be passed a long here.
Pass through the long so we don't truncate a giant object.
Change-Id: I0786ad60a3a33f84d8746efe51f68d64e127c332
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than keep track of both the position of the object, and the
position of its data, just keep track of the number of bytes used
by the object's header in the pack. This shaves 4 bytes out of the
size of the PackedObjectLoader instances.
We also can defer the addition instruction to the materialize()
operation, avoiding it entirely if the caller never actually uses
the loader. This may be relevant for PackWriter invocations,
where only 1 loader gets chosen for a given object, even though
the object may appear on disk in more than one pack file.
Error reporting is now simplified, as we can rely on the object
offset rather than its data offset. This is the value displayed
by pack debugging tools like `git verify-pack -v`, so its better
to use that in our own errors.
Because nobody needs getDataOffset() now, we can drop that from
the public API.
Change-Id: Ic639c0d5a722315f4f5c8ffda6e26643d90e5f42
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we read the object header we copy 20 bytes from the pack data,
then start parsing out the type and the inflated size. For most
objects, this is only going to require 3 bytes, which is sufficient
to represent objects with inflated sizes of up to 2^16. The local
buffer however still has 17 bytes remaining in it, and that can be
used to satisfy the OBJ_OFS_DELTA header.
We shouldn't need to worry about walking off the end of the buffer
here, because delta offsets cannot be larger than 64 bits, and that
requires only 9 bytes in the OFS_DELTA encoding.
Assuming worst-case scenarios of 9 bytes for the OFS_DELTA encoding,
the pack file itself must be approaching 2^64 bytes, an infeasible
size to store on any current technology. However, even if this
were the case we still have 11 bytes for the type/size header.
In that encoding we can represent an object as large as 2^74 bytes,
which is also an infeasible size to process in JGit.
So drop the second read here.
The data offsets we pass into the ObjectLoaders being constructed
need to be computed individually now. This saves a local variable,
but pushes the addition operation into each branch of the switch.
Change-Id: I6cf64697a9878db87bbf31c7636c03392b47a062
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According the javadoc, and implied by the name of the class, NB
is about network byte order. The purpose of moving the IO only,
and non-byte order related functions to another class is to
make it easier for new contributors to understand that they
can use these functions in general and it's also makes it easier
to understand where to put new IO related utility functions
Change-Id: I4a9f6b39d5564bc8a694b366e7ff3cc758c5181b
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
|
|
|
|
|
|
|
| |
As discussed on the egit-dev mailing list, we prefer not to have
trailing whitespace in our source code. Correct all currently
offending lines by trimming them.
Change-Id: I002b1d1980071084c0bc53242c8f5900970e6845
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|
|
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c8a08740a9e02c421469e5b1a9e47cb.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
|