Implement AutoClosable interface on classes that used release()
Implement AutoClosable and deprecate the old release() method to give
JGit consumers some time to adapt.
Bug: 428039
Change-Id: Id664a91dc5a8cf2ac401e7d87ce2e3b89e221458
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
Avoid looking at UNREACHABLE_GARBAGE for client have lines
Clients send a bunch of unknown objects to UploadPack on each round
of negotiation. Many of these are not known to the server, which
leads the implementation to be looking at indexes for garbage packs.
Disable examining the index of a garbage pack, allowing servers to
avoid reading them from disk during negotiation.
The effect of this change is the server will only ACK a have line
if the object was reachable during the last garbage collection,
or was recently added to the repository. For most repositories
there is no impact in this behavior change.
If a repository rewinds a branch, runs GC, and then resets the
branch back to where it was before, the now current tip is going to
be skipped by this change. A client that has the commit may wind up
getting a slightly larger data transfer from the server as an older
common ancestor will be chosen during negotiation. This is fixable
on the server side by running GC again to correct the layout of
objects in pack files.
Change-Id: Icd550359ef70fc7b701980f9b13d923fd13c744b
A pack bitmap index is an additional index of compressed
bitmaps of the object graph. Furthermore, a logical API of the index
functionality is included, as it is expected to be used by the
PackWriter.
Compressed bitmaps are created using the javaewah library, which is a
word-aligned compressed variant of the Java bitset class based on
run-length encoding. The library only works with positive integer
values. Thus, the maximum number of ObjectIds in a pack file that
this index can currently support is limited to Integer.MAX_VALUE.
Every ObjectId is given an integer mapping. The integer is the
position of the ObjectId in the complete ObjectId list, sorted
by offset, for the pack file. That integer is what the bitmaps
use to reference the ObjectId. Currently, the new index format can
only be used with pack files that contain a complete closure of the
object graph e.g. the result of a garbage collection.
The index file includes four bitmaps for the Git object types i.e.
commits, trees, blobs, and tags. In addition, a collection of
bitmaps keyed by an ObjectId is also included. The bitmap for each entry
in the collection represents the full closure of ObjectIds reachable
from the keyed ObjectId (including the keyed ObjectId itself). The
bitmaps are further compressed by XORing the current bitmaps against
prior bitmaps in the index, and selecting the smallest representation.
The XOR'd bitmap and offset from the current entry to the position
of the bitmap to XOR against is the actual representation of the entry
in the index file. Each entry contains one byte, which is currently
used to note whether the bitmap should be blindly reused.
Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
StartGenerator now processes .git/shallow to have the
RevWalk stop for shallow commits.
See RevWalkShallowTest for tests.
Bug: 394543
CQ: 6908
Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Revert "Teach PackWriter how to reuse an existing object list"
This reverts commit f5fe2dca3c.
I regret adding this feature to the public API. Caches aren't always
the best idea, as they require work to maintain. Here the cache is
redundant information that must be computed, and when it grows stale
must be removed. The redundant information takes up more disk space,
about the same size as the pack-*.idx files are. For the linux-2.6
repository, that's more than 40 MB for a 400 MB repository. So the
cache is a 10% increase in disk usage.
The entire point of this cache is to improve PackWriter performance,
and only PackWriter performance, and only when sending an initial
clone to a new client. There may be better ways to optimize this, and
until we have a solid solution, we shouldn't be using a separate cache
in JGit.
Teach PackWriter how to reuse an existing object list
Counting the objects needed for packing is the most expensive part of
an UploadPack request that has no uninteresting objects (otherwise
known as an initial clone). During this phase the PackWriter is
enumerating the entire set of objects in this repository, so they can
be sent to the client for their new clone.
Allow the ObjectReader (and therefore the underlying storage system)
to keep a cached list of all reachable objects from a small number of
points in the project's history. If one of those points is reached
during enumeration of the commit graph, most objects are obtained from
the cached list instead of direct traversal.
PackWriter uses the list by discarding the current object lists and
restarting a traversal from all refs but marking the object list name
as uninteresting. This allows PackWriter to enumerate all objects
that are more recent than the list creation, or that were on side
branches that the list does not include.
However, ObjectWalk tags all of the trees and commits within the list
commit as UNINTERESTING, which would normally cause PackWriter to
construct a thin pack that excludes these objects. To avoid that,
addObject() was refactored to allow this list-based enumeration to
always include an object, even if it has been tagged UNINTERESTING by
the ObjectWalk. This implies the list-based enumeration may only be
used for initial clones, where all objects are being sent.
The UNINTERESTING labeling occurs because StartGenerator always
enables the BoundaryGenerator if the walker is an ObjectWalk and a
commit was marked UNINTERESTING, even if RevSort.BOUNDARY was not
enabled. This is the default reasonable behavior for an ObjectWalk,
but isn't desired here in PackWriter with the list-based enumeration.
Rather than trying to change all of this behavior, PackWriter works
around it.
Because the list name commit's immediate files and trees were all
enumerated before the list enumeration itself starts (and are also
within the list itself) PackWriter runs the risk of adding the same
objects to its ObjectIdSubclassMap twice. Since this breaks the
internal map data structure (and also may cause the object to transmit
twice), PackWriter needs to use a new "added" RevFlag to track whether
or not an object has been put into the outgoing list yet.
Change-Id: Ie99ed4d969a6bb20cc2528ac6b8fb91043cee071
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectReader implementations are now responsible for creating the
unique abbreviation of an ObjectId, or for resolving an abbreviation
back to its full form. In this latter case the reader can offer up
multiple candidates to the caller, who may be able to disambiguate
them based on context.
Repository.resolve() doesn't take multiple candidates into account
right now, but it could in the future by looking for a remaining
^0 or ^{commit} suffix and take an expansion if there is only one
commit that matches the input abbreviation. It could also use
the distance from an annotated tag to resolve "tag-NNN-gcommit"
style strings that are often output by `git describe`.
Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
An ObjectReader implementation may be very slow for a single object,
but yet support bulk queries efficiently by batching multiple small
requests into a single larger request. This easily happens when the
reader is built on top of a database that is stored on another host,
as the network round-trip time starts to dominate the operation cost.
RevWalk, ObjectWalk, UploadPack and PackWriter are the first major
users of this new bulk interface, with the goal being to support an
efficient way to pack a repository for a fetch/clone client when the
source repository is stored in a high-latency storage system.
Processing the want/have lists is now done in bulk, to remove
the high costs associated with common ancestor negotiation.
PackWriter already performs object reuse selection in bulk, but it
now can also do the object size lookup and object counting phases
with higher efficiency. Actual object reuse, deltification, and
final output are still doing sequential lookups, making them a bit
more expensive to perform.
Change-Id: I4c966f84917482598012074c370b9831451404ee
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
By giving the reader information about the roots of a revision
traversal, some readers may be able to prefetch information from
their backing store using background threads in order to reduce
data access latency. However this isn't typically necessary so
the default reader implementation doesn't react to the advice.
Change-Id: I72c6cbd05cff7d8506826015f50d9f57d5cda77e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Storage implementations or application code using an ObjectReader
may want to access this constant without being inside of a subclass
of the reader.
Change-Id: I6c871a03d5846b9bb899de4d14a265e8b204d8e0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Honor pack.threads and perform delta search in parallel
If we have multiple CPUs available, packing usually goes faster
when each CPU is assigned a slice of the available search space.
The number of threads to use is guessed from the runtime if it
wasn't set by the caller, or wasn't set in the configuration.
Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is an informational function used by PackWriter to help it
better organize objects for delta compression. Storage systems
can implement it to provide up more detailed size information,
or they can simply rely on the default behavior that uses the
ObjectLoader obtained from open.
For local file storage, we can obtain this information faster
through specialized routines that parse a pack object header.
Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We don't actually need a Repository object here, just an ObjectReader
that can load content for us. So change the API to depend on that.
However, this breaks the asCommit and asTag legacy translation methods
on RevCommit and RevTag, so we still have to keep the Repository
inside of RevWalk for those two types. Hopefully we can drop those in
the future, and then drop the Repository off the RevWalk.
Change-Id: Iba983e48b663790061c43ae9ffbb77dfe6f4818e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Similar to what we did on Repository, the openObject method
already implied we wanted to open an object, given its main
argument was of type AnyObjectId. Simplify the method name
to just the action, has or open.
Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Refactor Repository.openObject to be Repository.open
We drop the "Object" suffix, because its pretty clear here that
we want to open an object, given that we pass in AnyObjectId as
the main parameter. We also fix the calling convention to throw
a MissingObjectException or IncorrectObjectTypeException, so that
callers don't have to do this error checking themselves.
Change-Id: I72c43353cea8372278b032f5086d52082c1eee39
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Throw IncorrectObjectTypeException on bad type hints
If the type hint isn't OBJ_ANY and it doesn't match the actual type
observed from the object store, define the reader to throw back an
IncorrectObjectTypeException. This way the caller doesn't have to
perform this check itself before it evaluates the object data, and
we can simplify quite a few call sites.
Change-Id: I9f0dfa033857f439c94245361fcae515bc0a6533
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Similar to what we did with the file code, move the pack writer
into its own package so the related classes and their package
private methods are hidden from the rest of the library.
Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The new selection implementation uses a public API on the
ObjectReader, allowing the storage library to enumerate its
candidates and select the best one for this packer without
needing to build a temporary list of the candidates first.
Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Extract PackFile specific code to ObjectToPack subclass
The ObjectReader class is dual-purposed into being a factory for the
ObjectToPack, permitting specific ObjectDatabase implementations
to override the method and offer their own custom subclass of the
generic ObjectToPack class. By allowing them to directly extend the
type, each implementation can add custom fields to support tracking
where an object is stored, without incurring any additional penalties
like a parallel Map<ObjectId,Object> would cost.
The reader was chosen to act as a factory rather than the database,
as the reader will eventually be tied more tightly with the
ObjectWalk and TreeWalk. During object enumeration the reader
would have had to load the object for the RevWalk, and may chose
to cache object position data internally so it can later be reused
and fed into the ObjectToPack instance supplied to the PackWriter.
Since a reader is not thread-safe, and is scoped to this PackWriter
and its internal ObjectWalk, its a great place for the database to
perform caching, if any.
Right now this change goes a bit backwards by changing what should
be generic ObjectToPack references inside of PackWriter to the very
PackFile specific LocalObjectToPack subclass. We will correct these
in a later commit as we start to refine what the ObjectToPack API
will eventually look like in order to better support the PackWriter.
Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The WindowCache is an implementation detail of PackFile and how its
used by ObjectDirectory. Lets start to hide it and replace the public
API with a more generic concept, ObjectReader.
Because PackedObjectLoader is also considered a private detail of
PackFile, we have to make PackWriter temporarily dependent upon the
WindowCursor and thus FileRepository and ObjectDirectory in order to
just start the refactoring. In later changes we will clean up the
APIs more, exposing sufficient support to PackWriter without needing
the file specific implementation details.
Change-Id: I676be12b57f3534f1285854ee5de1aa483895398
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As discussed on the egit-dev mailing list, we prefer not to have
trailing whitespace in our source code. Correct all currently
offending lines by trimming them.
Change-Id: I002b1d1980071084c0bc53242c8f5900970e6845
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>