In cases where we need to determine if a given commit is merged
into many refs, using isMergedInto(base, tip) for each ref would
cause multiple unwanted walks.
getMergedInto() marks the unreachable commits as uninteresting
which would then avoid walking that same path again.
Using the same API, also introduce isMergedIntoAny() and
isMergedIntoAll().
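The idea can be sketched on a toy commit graph. This is only an
illustration under stated assumptions, not the JGit implementation:
ToyCommit and MergedIntoSketch are hypothetical names, and the two sets
stand in for the flags a RevWalk would put on commits. Commits found
not to reach base are remembered, so a later tip never walks the same
path again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy commit; JGit uses RevCommit and flag bits instead. */
final class ToyCommit {
    final String name;
    final List<ToyCommit> parents;

    ToyCommit(String name, ToyCommit... parents) {
        this.name = name;
        this.parents = Arrays.asList(parents);
    }
}

final class MergedIntoSketch {
    /** Returns every tip that can reach base, sharing walk state across tips. */
    static List<ToyCommit> getMergedInto(ToyCommit base, List<ToyCommit> tips) {
        Set<ToyCommit> reaches = new HashSet<>(); // known to reach base
        Set<ToyCommit> deadEnd = new HashSet<>(); // known not to reach base
        reaches.add(base);
        List<ToyCommit> result = new ArrayList<>();
        for (ToyCommit tip : tips) {
            if (canReach(tip, reaches, deadEnd)) {
                result.add(tip);
            }
        }
        return result;
    }

    // For brevity these do not stop early; they reuse the full result.
    static boolean isMergedIntoAny(ToyCommit base, List<ToyCommit> tips) {
        return !getMergedInto(base, tips).isEmpty();
    }

    static boolean isMergedIntoAll(ToyCommit base, List<ToyCommit> tips) {
        return getMergedInto(base, tips).size() == tips.size();
    }

    // Recursive for brevity; a real walk would use an explicit queue.
    private static boolean canReach(ToyCommit c, Set<ToyCommit> reaches,
            Set<ToyCommit> deadEnd) {
        if (reaches.contains(c)) {
            return true;
        }
        if (deadEnd.contains(c)) {
            return false; // already explored, do not walk this path again
        }
        for (ToyCommit p : c.parents) {
            if (canReach(p, reaches, deadEnd)) {
                reaches.add(c);
                return true;
            }
        }
        deadEnd.add(c);
        return false;
    }
}
```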
Change-Id: I65de9873dce67af9c415d1d236bf52d31b67e8fe
Signed-off-by: Adithya Chakilam <quic_achakila@quicinc.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Move reachability checker generation into the ObjectReader object
Reachability checkers are retrieved from RevWalk and ObjectWalk objects:
* RevWalk.createReachabilityChecker()
* ObjectWalk.createObjectReachabilityChecker()
Since RevWalks and ObjectWalks are themselves directly instantiated
in hundreds of places (e.g. UploadPack...), overriding them in a
consistent way requires overloading hundreds of methods, which isn't
feasible. Moving reachability checker generation to a more central
place solves that problem.
The ObjectReader object seems a good place from which to get
reachability checkers, because reachability checkers return
information about relationships between objects. ObjectDatabases
delegate many operations to ObjectReaders, and reachability bitmaps
are attached to ObjectReaders.
The Bitmapped and Pedestrian reachability checker objects were
package private in the org.eclipse.jgit.revwalk package. This change
makes them public and moves them to the
org.eclipse.jgit.internal.revwalk package. Corresponding tests are
also moved.
Motivation:
1) Reachability checking algorithms need to scale. One of the
internal Android repositories has ~2.4 million refs/changes/*
references, causing bad long tail performance in reachability
checks.
2) Reachability check performance is impacted by repository
topology: number of refs, number of objects, amounts of
related vs. unrelated history.
3) Reachability check performance is also affected by per-branch
access (Gerrit branch permissions) since different users can
see different branches.
4) Reachability check performance isn't affected by any state in a
RevWalk or ObjectWalk.
I don't yet know if a single algorithm will work for all cases in #2
and #3. We may need to evolve the ReachabilityChecker interfaces
over time to solve the Gerrit branch permissions case, or use
Gerrit-specific identity information to solve that in an efficient
way.
This change takes the existing public API and moves it to the
ObjectReader/whole repository level, which is where we can do
consistent customizations for #2 and #3. We intend to upstream the
best of whatever works, but anticipate the need for multiple rounds
of experimentation.
Change-Id: I9185feff43551fb387957c436112d5250486833d
Signed-off-by: Terry Parker <tparker@google.com>
RevWalk: new topo sort to not mix lines of history
The topological sort algorithm in TopoSortGenerator for RevWalk may mix
multiple lines of history, producing results that differ from C git's
git-log whose man page states: "Show no parents before all of its
children are shown, and avoid showing commits on multiple lines of
history intermixed." Lines of history are mixed because
TopoSortGenerator merely delays producing a commit until all of its
children have been produced; it does not immediately produce a commit
after its last child has been produced.
Therefore, add a new RevSort option called TOPO_KEEP_BRANCH_TOGETHER
with a new topo sort algorithm in TopoNonIntermixGenerator. In the
Generator, when the last child of a commit has been produced, unpop
that commit so that it will be returned upon the subsequent call to
next(). To avoid producing duplicates, mark commits that have not yet
been produced as TOPO_QUEUED so that when a commit is popped, it is
produced if and only if TOPO_QUEUED is set.
To support nesting with other generators that may produce the same
commit multiple times like DepthGenerator (for example, StartGenerator
does this), do not increment parent inDegree for the same child commit
more than once.
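A minimal sketch of the approach described above, on a toy graph. The
names TopoKeepTogetherSketch, Node and sort() are hypothetical; the
boolean "queued" plays the role of TOPO_QUEUED and a Deque stands in
for the pending queue. It is not the generator's actual code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class TopoKeepTogetherSketch {
    /** Hypothetical toy commit carrying the state the message describes. */
    static final class Node {
        final String name;
        final List<Node> parents;
        int inDegree;   // children that have not been produced yet
        boolean queued; // stands in for the TOPO_QUEUED flag

        Node(String name, Node... parents) {
            this.name = name;
            this.parents = Arrays.asList(parents);
        }
    }

    /** Sorts the input (e.g. commits in date order) without mixing branches. */
    static List<String> sort(List<Node> input) {
        Deque<Node> pending = new ArrayDeque<>();
        for (Node c : input) {
            if (!c.queued) { // count each child once, even if the input repeats it
                for (Node p : c.parents) {
                    p.inDegree++;
                }
            }
            c.queued = true;
            pending.addLast(c);
        }
        List<String> out = new ArrayList<>();
        Node c;
        while ((c = pending.pollFirst()) != null) {
            if (c.inDegree > 0 || !c.queued) {
                continue; // delayed until unpopped, or already produced
            }
            for (Node p : c.parents) {
                if (--p.inDegree == 0 && p.queued) {
                    pending.addFirst(p); // "unpop": produce right after last child
                }
            }
            c.queued = false;
            out.add(c.name);
        }
        return out;
    }
}
```

In this sketch a commit may sit in the queue twice (its original slot
and the unpopped one); the queued check is what prevents it from being
produced twice.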
Commit b5e764abd2 modified the existing
TopoSortGenerator to avoid mixing lines of history, but it was reverted
in e40c38ab08 because the new behavior
caused problems for EGit users. This motivated adding a new Generator
for the new behavior.
Signed-off-by: Alex Spradlin <alexaspradlin@google.com>
Change-Id: Icbb24eac98c00e45c175b01e1c8122554f617933
Revert "RevWalk: stop mixing lines of history in topo sort"
This reverts commit b5e764abd2.
PlotWalk uses the TopoSortGenerator, which is causing problems for EGit users
who rely on the emission of commits being somewhat based on date as in the
previous topo-sort algorithm.
Bug: 560529
Change-Id: I3dbd3598a7aeb960de3fc39352699b4f11a8c226
Signed-off-by: Alex Spradlin <alexaspradlin@google.com>
RevWalk: stop mixing lines of history in topo sort
The topological sort algorithm in TopoSortGenerator for RevWalk may mix
multiple lines of history, producing results that differ from C git's
git log whose man page states: "Show no parents before all of its
children are shown, and avoid showing commits on multiple lines of
history intermixed." Lines of history are mixed because
TopoSortGenerator merely delays a commit until all of its children have
been produced; it does not immediately produce a commit after its last
child has been produced.
Therefore, when the last child of a commit has been produced, unpop the
commit so that it will be returned upon the subsequent call to next() in
TopoSortGenerator. To avoid producing duplicates, mark commits that
have not yet been produced as TOPO_QUEUED so that when a commit is
popped, it is produced if and only if TOPO_QUEUED is set.
To support nesting with other generators that may produce the same
commit multiple times like DepthGenerator (for example, StartGenerator
does this), do not increment parent inDegree for the same child commit
more than once.
Modify tests that assert that TopoSortGenerator mixes lines of commit
history.
Change-Id: I4ee03c7a8e5265d61230b2a01ae3858745b2432b
Signed-off-by: Alex Spradlin <alexaspradlin@google.com>
Correct @since in RevWalk for the --first-parent methods
Fixes PDE API checks complaining: the methods were added
in JGit 5.5.0.
Change-Id: I9ff860c3408c6bb3891fa0da7547394d0fe9d0b6
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
RevWalk: Add a setFirstParent that mimics C git's --first-parent
RevWalk does not currently provide a --first-parent equivalent and the
feature has been requested.
Add a field to the RevWalk class to specify whether walks should
traverse first parents only. Modify Generator implementations to support
the feature.
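As an illustration only, a first-parent traversal on a toy graph could
look like the sketch below. FirstParentSketch, Node and walk() are
hypothetical names, and the plain breadth-first queue is an assumption
made for brevity; the real Generators order commits differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FirstParentSketch {
    /** Hypothetical toy commit. */
    static final class Node {
        final String name;
        final List<Node> parents;

        Node(String name, Node... parents) {
            this.name = name;
            this.parents = Arrays.asList(parents);
        }
    }

    /** Walks history from tip; with firstParent set, only parent 0 is followed. */
    static List<String> walk(Node tip, boolean firstParent) {
        List<String> out = new ArrayList<>();
        Set<Node> seen = new HashSet<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(tip);
        seen.add(tip);
        Node c;
        while ((c = queue.poll()) != null) {
            out.add(c.name);
            for (Node p : c.parents) {
                if (seen.add(p)) {
                    queue.add(p);
                }
                if (firstParent) {
                    break; // ignore the second and later parents of a merge
                }
            }
        }
        return out;
    }
}
```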
Change-Id: I4a9a0d5767f82141dcf6d08659d7cb77c585fae4
Signed-off-by: Dave Borowitz <dborowitz@google.com>
Signed-off-by: Alex Spradlin <alexaspradlin@google.com>
Every caller would need to check if bitmaps are available in the repo to
instantiate a reachability checker.
Offer a method to create the reachability checker in the walk: the
caller already has a walk over the repo, and the walk has all the
information required.
This allows us to make the implementation classes package-private.
Change-Id: I355e47486fcd9d55baa7cb5700ec08dcc230eea5
Signed-off-by: Ivan Frade <ifrade@google.com>
Simplify RevWalk#iterator by factoring out common code
Factor out a helper that calls next() and tunnels IOException in a
RuntimeException, similar to TunnelException.tunnel(RevWalk::next) in
Guava terms[1].
This should make the code a little more readable. No functional
change intended.
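The pattern can be sketched as follows. This is an assumption-laden
illustration: IoSource and TunnelSketch are hypothetical names, and
UncheckedIOException is used here as the wrapper only because it is in
the JDK; the message above does not say which RuntimeException
RevWalk uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class TunnelSketch {
    /** Hypothetical stand-in for a walk whose next() may throw IOException. */
    interface IoSource<T> {
        T next() throws IOException;
    }

    /** Calls next() and tunnels any IOException in an unchecked exception. */
    static <T> T nextUnchecked(IoSource<T> src) {
        try {
            return src.next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Iterator view of src; a null from next() marks the end of the walk. */
    static <T> Iterator<T> iterator(IoSource<T> src) {
        return new Iterator<T>() {
            private T next = nextUnchecked(src);

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public T next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                T r = next;
                next = nextUnchecked(src); // same helper, no duplicated try/catch
                return r;
            }
        };
    }
}
```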
[1] https://github.com/google/guava/issues/2828#issuecomment-304187823
Change-Id: I97c062d03a17663d5c40895fd3d2c6a7306d4f39
Signed-off-by: Jonathan Nieder <jrn@google.com>
MissingObjectException and IncorrectObjectTypeException are subclasses
of IOException.
Change-Id: Ib4e1f37ce1b0b08e69ba3375bbdb6ee82ee4f036
Signed-off-by: Jonathan Nieder <jrn@google.com>
Correctly handle initialization of shallow commits
In a new RevWalk, if the first object parsed is one of the
shallow commits, the following happens:
1) RevCommit.parseCanonical() is called on a new "r1" RevCommit.
2) RevCommit.parseCanonical() immediately calls
RevWalk.initializeShallowCommits().
3) RevWalk.initializeShallowCommits() calls lookupCommit(id),
creating and adding a new "r2" version of this same object and
marking its parents empty.
4) RevCommit.parseCanonical() initializes the "r1" RevCommit's
fields, including the parents.
5) RevCommit.parseCanonical()'s caller uses the "r1" commit that
has parents, losing the fact that it is a shallow commit.
This change passes the current RevCommit as an argument to
RevWalk.initializeShallowCommits() so that method can set its
parents empty rather than creating the duplicate "r2" commit.
Change-Id: I67b79aa2927dd71ac7b0d8f8917f423dcaf08c8a
Signed-off-by: Terry Parker <tparker@google.com>
Remove the 'final' keyword from
* package private functions.
* try blocks
* for loops
This was done with the following Python script:
  $ cat f.py
  import sys
  import re
  import os
  def replaceFinal(m):
      # Drop every 'final ' from the parameter list; keep the declaration prefix.
      return m.group(1) + "(" + m.group(2).replace('final ', '') + ")"
  # Matches a method declaration and captures its parameter list.
  methodDecl = re.compile(r"^([\t ]*[a-zA-Z_ ]+)\(([^)]*)\)")
  def subst(fn):
      input = open(fn)
      os.rename(fn, fn + "~")  # keep the original as a backup
      dest = open(fn, 'w')
      for l in input:
          l = methodDecl.sub(replaceFinal, l)
          dest.write(l)
      dest.close()
  for root, dirs, files in os.walk(".", topdown=False):
      for f in files:
          if not f.endswith('.java'):
              continue
          full = os.path.join(root, f)
          print(full)
          subst(full)
Change-Id: If533a75a417594fc893e7c669d2c1f0f6caeb7ca
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Enable and fix warnings about redundant specification of type arguments
Since the introduction of generic type parameter inference in Java 7,
it's not necessary to explicitly specify the type of generic parameters.
Enable the warning in Eclipse, and fix all occurrences.
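For illustration, the kind of change this implies is shown below.
DiamondSketch is a hypothetical example, not code from the change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DiamondSketch {
    static Map<String, List<Integer>> build() {
        // Before: new HashMap<String, List<Integer>>() repeated the type arguments.
        Map<String, List<Integer>> byName = new HashMap<>();
        // The compiler infers <Integer> from the declared type on the left.
        List<Integer> values = new ArrayList<>();
        values.add(1);
        byName.put("a", values);
        return byName;
    }
}
```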
Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Enable and fix 'Should be tagged with @Override' warning
Set missingOverrideAnnotation=warning in Eclipse compiler preferences
which enables the warning:
The method <method> of type <type> should be tagged with @Override
since it actually overrides a superclass method
Justification for this warning is described in:
http://stackoverflow.com/a/94411/381622
Enabling this causes in excess of 1000 warnings across the entire
code-base. They are very easy to fix automatically with Eclipse's
"Quick Fix" tool.
Fix all of them except 2 which cause compilation failure when the
project is built with mvn; add TODO comments on those for further
investigation.
Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
DepthWalk needs to override toObjectWalkWithSameObjects() and thus
needs to be able to directly set the objects and freeFlags fields, so
make them package private.
Change-Id: I24561b82c54ba3d6522582ca25105b204d777074
Signed-off-by: Terry Parker <tparker@google.com>
UploadPack: Verify clients send only commits for shallow lines
If a client mistakenly tries to send a tag object as a shallow line,
JGit blindly assumes this is a commit and tries to parse the tag
buffer using the commit parser. This can cause an obscure error like:
  InvalidObjectIdException: Invalid id: t c0ff331234...
The "t" comes from the "object c0ff331234..." line of the tag trying
to be parsed as though it were the "tree" line of a commit.
Run any client supplied shallow lines through the RevWalk to look up
the object types. Fail fast with a protocol exception if any of them
are non-commit.
Skip objects not known to this repository. This matches behavior
with git-core's upload-pack, which silently skips over any shallow
line object named by the client but not known by the server.
Change-Id: Ic6c57a90a42813164ce65c2244705fc42e84d700
Despite being the primary author of RevWalk and ObjectWalk I still
fail to remember to setRetainBody(false) in application code using
an ObjectWalk to examine the graph.
Document the default for RevWalk is setRetainBody(true), where the
application usually wants the commit bodies to display or inspect.
Change the default for ObjectWalk to setRetainBody(false), as nearly
all callers want only the graph shape and do not need the larger text
inside a commit body. This allows some code in JGit to be simplified.
Change-Id: I367e42209e805bd5e1f41b4072aeb2fa98ec9d99
RevWalk: Do not close reader passed explicitly to constructor
The RevWalk(ObjectReader) constructor is explicitly to handle the case
where the caller is responsible for opening and closing the reader.
The reader should only be closed when it was created in the
RevWalk(Repository) constructor.
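The ownership rule can be sketched as below. Reader and Walk are
hypothetical stand-ins for ObjectReader and RevWalk, and the
closeReader field name is an assumption; only the rule itself comes
from the description above.

```java
final class OwnedReaderSketch {
    /** Hypothetical stand-in for ObjectReader. */
    static final class Reader implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for RevWalk. */
    static final class Walk implements AutoCloseable {
        private final Reader reader;
        private final boolean closeReader;

        /** Like RevWalk(Repository): the walk opens the reader, so it owns it. */
        Walk() {
            this(new Reader(), true);
        }

        /** Like RevWalk(ObjectReader): the caller keeps ownership of the reader. */
        Walk(Reader reader) {
            this(reader, false);
        }

        private Walk(Reader reader, boolean closeReader) {
            this.reader = reader;
            this.closeReader = closeReader;
        }

        Reader reader() {
            return reader;
        }

        @Override
        public void close() {
            if (closeReader) {
                reader.close(); // only close what this walk opened itself
            }
        }
    }
}
```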
Change-Id: Ic0d595dc8d10de79e87549546c6c5ea2dc617e9b
Implement AutoCloseable interface on classes that used release()
Implement AutoCloseable and deprecate the old release() method to give
JGit consumers some time to adapt.
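A sketch of what such a migration might look like on one class.
ReleaseSketch is a hypothetical class, and having release() delegate to
close() is an assumption about how the two methods are wired together.

```java
final class ReleaseSketch implements AutoCloseable {
    private boolean closed;

    /**
     * Old entry point, kept so existing callers still compile.
     *
     * @deprecated use {@link #close()} instead
     */
    @Deprecated
    public void release() {
        close();
    }

    /** New entry point; allows use in try-with-resources. */
    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}
```

Callers can then write try (ReleaseSketch r = new ReleaseSketch()) { ... }
instead of pairing release() with a finally block.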
Bug: 428039
Change-Id: Id664a91dc5a8cf2ac401e7d87ce2e3b89e221458
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add retainOnReset(RevFlag) to RevWalk to simplify reset usage
Applications sometimes use a RevFlag instead of a Set<RevObject>
to track boolean state bits about objects being processed. However
this requires careful use of the resetRetain() methods to avoid an
accidental clearing of the RevFlag bits, effectively clearing the
Set<RevObject> the application wanted to track.
Simplify that use case by offering retainOnReset, a collection of
flags that are never cleared by the RevWalk.
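The effect can be sketched with plain bit masks. RetainOnResetSketch
and its Obj type are hypothetical; the int masks stand in for RevFlag
and are an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

final class RetainOnResetSketch {
    /** Hypothetical object carrying flag bits, like a RevObject. */
    static final class Obj {
        int flags;
    }

    private final List<Obj> objects = new ArrayList<>();
    private int retainOnReset; // bits that reset() must never clear

    Obj add(int flags) {
        Obj o = new Obj();
        o.flags = flags;
        objects.add(o);
        return o;
    }

    /** Registers flag bits that survive every later reset. */
    void retainOnReset(int mask) {
        retainOnReset |= mask;
    }

    /** Clears all flag bits except retainFlags and the registered ones. */
    void reset(int retainFlags) {
        int keep = retainFlags | retainOnReset;
        for (Obj o : objects) {
            o.flags &= keep;
        }
    }
}
```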
Change-Id: I4c05b89b1398e4a4f371eac3a5d1d5edddec838f
Previously, setting any TreeFilter on a RevWalk triggered parent
rewriting, which in the current StartGenerator implementation ends up
buffering the entire commit history in memory. Aside from causing poor
performance on large histories, this does not match the default
behavior of `git rev-list`, which does not rewrite parent SHAs unless
asked to via --parents/--children.
Add a new method setRewriteParents() to RevWalk to disable this
behavior. Continue rewriting parents by default to maintain backwards
compatibility.
Change-Id: I1f38e05526071c75ca58095e312663de5e6f334d
Under certain circumstances isMergedInto() returned
false even though base is reachable from the tip.
This will hinder pushes and receives by falsely
detecting "non fast forward" merges.
           o---o---o---o---o
          /                 \
         /   o---o---A---o---M
        /   /
    ---2---1-
If M (tip) was compared to 1 (base), the method
isMergedInto() could still return false, since
two merge bases will be detected and the return
statement will only look at one of them:
  return next() == base;
In most cases this would pass, but if "A" is
a commit with an old timestamp, the Generator
would walk down to "2" before completing the
walk past "A" and finding the other
merge base "1". In this case, the first call to
next() returns 2, which compared to base evaluates
as false.
This is fixed by iterating merge bases and
returning true if base is found among them.
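The difference between the two checks can be sketched like this.
MergeBaseCheckSketch is a hypothetical class and a plain Iterator
stands in for the sequence of merge bases returned by next().

```java
import java.util.Iterator;

final class MergeBaseCheckSketch {
    /** Old behavior: only the first merge base is compared to base. */
    static <T> boolean firstOnly(Iterator<T> mergeBases, T base) {
        return mergeBases.hasNext() && mergeBases.next() == base;
    }

    /** Fixed behavior: true if base is found anywhere among the merge bases. */
    static <T> boolean anyMatch(Iterator<T> mergeBases, T base) {
        while (mergeBases.hasNext()) {
            if (mergeBases.next() == base) {
                return true;
            }
        }
        return false;
    }
}
```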
Change-Id: If2ee1f4270f5ea4bee73ecb0e9c933f8234818da
Signed-off-by: Gustaf Lundh <gustaf.lundh@sonymobile.com>
Signed-off-by: Sven Selberg <sven.selberg@sonymobile.com>
In certain cases a JGit server updating an existing shallow client
selected a common ancestor that was behind the shallow edge of
the client. This allowed the server to assume the client had some
objects it did not have and allowed creation of pack deltas the
client could never inflate.
Any commit the client has advertised as shallow must be treated
by UploadPack server as though it has no parents. With no parents
the walker cannot visit graph history the client does not have,
and PackWriter cannot consider delta base candidates the client
is lacking.
Change-Id: I4922b9354df9f490966a586fb693762e897345a2
The comment about legacy Tag and Object types no longer applies,
though prior to Idb273d5a92849b42935ac14eed73b796b80aad50 the field
was still being used by RewriteTreeFilter.
Change-Id: I9ee5da8f8a3b61c9cf543817c03117ee0609dd8f
StartGenerator now processes .git/shallow to have the
RevWalk stop for shallow commits.
See RevWalkShallowTest for tests.
Bug: 394543
CQ: 6908
Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Use Integer, Character, and Long valueOf methods when
passing parameters to MessageFormat and other places
that expect objects instead of primitives
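A small illustration of the pattern. ValueOfSketch and its message
text are hypothetical and not taken from JGit.

```java
import java.text.MessageFormat;

final class ValueOfSketch {
    static String describe(int count, long bytes, char type) {
        // MessageFormat.format takes Object arguments, so box explicitly with
        // valueOf, which can return cached instances for small values.
        return MessageFormat.format("{0} objects, {1} bytes, type {2}",
                Integer.valueOf(count), Long.valueOf(bytes),
                Character.valueOf(type));
    }
}
```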
Change-Id: I5942fbdbca6a378136c00d951ce61167f2366ca4
The "Counting objects" phase of packing is the most time consuming
part for any server providing access to Git repositories. Scanning
through the entire project history, including every revision of
every tree that has ever existed is expensive and takes an incredible
amount of CPU time.
Inline the tree parsing logic, unroll a number of loops, and setup
to better handle the common case of seeing another occurrence of
an object that was already marked SEEN.
This change boosts the "Counting objects" phase when JGit is acting
as a server and is packing the linux-2.6 repository for its client.
Compared to CGit on the same hardware, a JGit daemon server is now
21883 objects/sec faster:
CGit:
Counted 2058062 objects in 38981 ms at 52796.54 objects/sec
Counted 2058062 objects in 38920 ms at 52879.29 objects/sec
Counted 2058062 objects in 39059 ms at 52691.11 objects/sec
JGit (before):
Counted 2058062 objects in 31529 ms at 65275.21 objects/sec
Counted 2058062 objects in 30359 ms at 67790.84 objects/sec
Counted 2058062 objects in 30033 ms at 68526.69 objects/sec
JGit (this commit):
Counted 2058062 objects in 28726 ms at 71644.57 objects/sec
Counted 2058062 objects in 27652 ms at 74427.24 objects/sec
Counted 2058062 objects in 27528 ms at 74762.50 objects/sec
Above the first run was a "cold server". For JGit the JVM had just
started up with `jgit daemon`, and for CGit we hadn't touched the
repository "recently" (but it was certainly in kernel buffer cache).
The second and third runs were against the running JGit JVM, allowing
timing tests to better reflect the benefits of JGit's pack and index
caching, as well as any optimizations the JIT may have performed.
The timings are fair. CGit is opening, checking and mmap'ing both
the pack and index during the timer. JGit is opening, checking
and malloc+read'ing the pack and index data into its Java heap
during the timer. Both processes are walking the same graph space,
and are computing the "path hash" necessary to sort objects in the
object table for delta compression. Since this commit only impacts
the "Counting objects" phase, delta compression was obviously not
included in the timings and JGit may still be performing delta
compression slower than CGit, resulting in an overall slower server
experience for clients.
Change-Id: Ieb184bfaed8475d6960a494b1f3c870e0382164a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectIdOwnerMap: More lightweight map for ObjectIds
OwnerMap is about 200 ms faster than SubclassMap, more friendly to the
GC, and uses less storage: testing the "Counting objects" part of
PackWriter on 1886362 objects:
ObjectIdSubclassMap:
load factor 50%
table: 4194304 (wasted 2307942)
ms spent 36998 36009 34795 34703 34941 35070 34284 34511 34638 34256
ms avg 34800 (last 9 runs)
ObjectIdOwnerMap:
load factor 100%
table: 2097152 (wasted 210790)
directory: 1024
ms spent 36842 35112 34922 34703 34580 34782 34165 34662 34314 34140
ms avg 34597 (last 9 runs)
The major difference with OwnerMap is entries must extend from
ObjectIdOwnerMap.Entry, where the OwnerMap has injected its own
private "next" field into each object. This allows the OwnerMap to use
a singly linked list for chaining collisions within a bucket. By
putting collisions in a linked list, we gain the entire table back for
the SHA-1 bits to index their own "private" slot.
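The chaining scheme can be sketched as below. OwnerMapSketch is a
hypothetical, much simplified class: it keys entries by an int instead
of an ObjectId and uses a fixed table of 8 buckets with no growth, both
assumptions made to keep the example short.

```java
final class OwnerMapSketch {
    /** Entries carry their own "next" link, supplied by the map's base type. */
    static class Entry {
        final int hash;
        Entry next; // threads this entry into exactly one map's bucket chain

        Entry(int hash) {
            this.hash = hash;
        }
    }

    private final Entry[] table = new Entry[8]; // power of two, never grown here

    /** Adds e at the head of its bucket's singly linked list. */
    void add(Entry e) {
        int i = e.hash & (table.length - 1);
        e.next = table[i];
        table[i] = e;
    }

    /** Follows the bucket's chain; colliding entries never occupy other slots. */
    Entry get(int hash) {
        for (Entry e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash) {
                return e;
            }
        }
        return null;
    }
}
```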
Unfortunately this means that each object can appear in at most ONE
OwnerMap, as there is only one "next" field within the object instance
to thread into the map. For types that are very object map heavy like
RevWalk (entity RevObject) and PackWriter (entity ObjectToPack) this
is sufficient, these entity types are only put into one map by their
container. By introducing a new map type, we don't break existing
applications that might be trying to use ObjectIdSubclassMap to track
RevCommits they obtained from a RevWalk.
The OwnerMap uses less memory. Each object uses 1 reference more (so
we're up 1,886,362 references), but the table is 1/2 the size (2^21
rather than 2^22). The table itself wastes only 210,790 slots, rather
than 2,307,942. So OwnerMap is wasting 200k fewer references.
OwnerMap is more friendly to the GC, because it hardly ever generates
garbage. As the map reaches its 100% load factor target, it doubles in
size by allocating additional segment arrays of 2048 entries. (So the
first grow allocates 1 segment, second 2 segments, third 4 segments,
etc.) These segments are hooked into the pre-allocated directory of
1024 spaces. This permits the map to grow to 2 million objects before
the directory itself has to grow. By using segments of 2048 entries,
we are asking the GC to acquire 8,204 bytes in a 32 bit JVM. This is
easier to satisfy than 2,307,942 bytes (for the 512k table that is
just an intermediate step in the SubclassMap). By reusing the
previously allocated segments (they are re-hashed in-place) we don't
release any memory during a table grow.
When the directory grows, it does so by discarding the old one and
using one that is 4x larger (so the directory goes to 4096 entries on
its first grow). A directory of size 4096 can handle up to 8 million
objects. The second directory grow (16384) goes to 33 million objects.
At that point we're starting to really push the limits of the JVM
heap, but at least it's many small arrays. Previously SubclassMap would
need a table of 67108864 entries to handle that object count, which
needs a single contiguous allocation of 256 MiB. That's hard to come
by in a 32 bit JVM. Instead OwnerMap uses 8192 arrays of about 8 KiB
each. This is much easier to fit into a fragmented heap.
Change-Id: Ia4acf5cfbf7e9b71bc7faa0db9060f6a969c0c50
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Applications like UploadPack reset() and reuse the same RevWalk
multiple times in very rapid succession. Releasing the ObjectReader's
internal state on each use, only to allocate it again on the next
cycle kills performance if the ObjectReader has internal caches, or
even if the Inflater gets returned and pulled from the InflaterCache
too frequently.
Make releasing the ObjectReader the application's responsibility
when it is done with the RevWalk, which most already do by wrapping
their loop in a try/finally block.
Change-Id: I3ad188a719e8d7f6bf27d1a7ca16d465534713f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When UploadPack has computed the merge base between the client's have
set and the want set, it has already loaded and parsed all of the
interesting commits that PackWriter needs to transmit to the client.
Switching the RevWalk and its object pool over to be an ObjectWalk
saves PackWriter from needing to re-parse these same commits from the
ObjectDatabase, reducing the startup latency for the enumeration
phase of packing.
UploadPack doesn't want to use an ObjectWalk for the okToGiveUp()
tests because it's slower: during each commit popped it needs to cache
the tree into the pendingObjects list, and during each reset() it
discards a bunch of ObjectWalk specific state and reallocates some
internal collections. ObjectWalk was never meant to be rapidly
reset() like UploadPack does, so it's perhaps somewhat cleaner to allow
"upgrading" a RevWalk to an ObjectWalk.
Bug: 301639
Change-Id: I97ef52a0b79d78229c272880aedb7f74d0f7532f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of getting the limit from CoreConfig, use the larger of the
reader's limit or 5 MiB, under the assumption that any annotated tag
or commit of interest should be under 5 MiB. But if a repository
was really insane and had bigger objects, the reader implementation
can set its streaming limit higher in order to allow RevWalk to
still process it.
Change-Id: If2c15235daa3e2d1f7167e781aa83fedb5af9a30
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Parsing is rewritten to use the size limited form of getCachedBytes,
thus freeing the revwalk infrastructure from needing to care about
a large object vs. a small object when it gets an ObjectLoader.
Right now we hardcode our upper bound for a commit or annotated
tag to be 15 MiB. I don't know of any that is more than 1 MiB in
the wild, so going 15x that should give us some reasonable headroom.
Change-Id: If296c211d8b257d76e44908504e71dd9ba70ffa8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
An ObjectReader implementation may be very slow for a single object,
but yet support bulk queries efficiently by batching multiple small
requests into a single larger request. This easily happens when the
reader is built on top of a database that is stored on another host,
as the network round-trip time starts to dominate the operation cost.
RevWalk, ObjectWalk, UploadPack and PackWriter are the first major
users of this new bulk interface, with the goal being to support an
efficient way to pack a repository for a fetch/clone client when the
source repository is stored in a high-latency storage system.
Processing the want/have lists is now done in bulk, to remove
the high costs associated with common ancestor negotiation.
PackWriter already performs object reuse selection in bulk, but it
now can also do the object size lookup and object counting phases
with higher efficiency. Actual object reuse, deltification, and
final output are still doing sequential lookups, making them a bit
more expensive to perform.
Change-Id: I4c966f84917482598012074c370b9831451404ee
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
By giving the reader information about the roots of a revision
traversal, some readers may be able to prefetch information from
their backing store using background threads in order to reduce
data access latency. However this isn't typically necessary so
the default reader implementation doesn't react to the advice.
Change-Id: I72c6cbd05cff7d8506826015f50d9f57d5cda77e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We want to get rid of these APIs, because they don't perform as well
as DirCache/TreeWalk, or don't offer nearly as many features.
Bug: 319145
Change-Id: I2b28f9cddc36482e1ad42d53e86e9d6461ba3bfc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since we don't know the type of object we are parsing, we don't
know if it's a massive blob, or some small commit or annotated tag.
Avoid pulling the cached bytes until we have checked the type and
decided if we actually need them to continue parsing right now.
This way large blobs which won't fit in memory and would throw
a LargeObjectException don't abort parsing.
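The ordering can be sketched as follows. LazyBytesSketch, Loader and
Type are hypothetical stand-ins for the loader and object type
constants; only the "check the type first, fetch bytes second" order
comes from the description above.

```java
final class LazyBytesSketch {
    enum Type { COMMIT, TAG, TREE, BLOB }

    /** Hypothetical stand-in for an object loader. */
    interface Loader {
        Type type();

        byte[] cachedBytes(); // may throw for objects too large for memory
    }

    /** Returns the bytes to parse now, or null when the type needs none. */
    static byte[] bytesIfNeeded(Loader ldr) {
        switch (ldr.type()) {
        case COMMIT:
        case TAG:
            return ldr.cachedBytes(); // small objects whose body is parsed
        default:
            return null; // trees and blobs: note the type, never load the body
        }
    }
}
```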
Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>