Implement async/batch lookup of object data
An ObjectReader implementation may be very slow for a single object,
yet still support bulk queries efficiently by batching multiple small
requests into a single larger request. This commonly happens when the
reader is built on top of a database stored on another host, where
the network round-trip time dominates the cost of each operation.
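A minimal sketch of why batching helps when latency dominates. It assumes a hypothetical `RemoteStore` whose every `fetch` call costs exactly one network round trip, however many ids it carries. `RemoteStore` and `LookupStrategies` are illustrative names and are not JGit classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical remote store: every fetch() call costs one network round trip,
// no matter how many ids it carries. Not a JGit class.
final class RemoteStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    private int roundTrips;

    void put(String id, byte[] data) {
        objects.put(id, data);
    }

    Map<String, byte[]> fetch(List<String> ids) {
        roundTrips++;
        Map<String, byte[]> found = new HashMap<>();
        for (String id : ids) {
            byte[] data = objects.get(id);
            if (data != null) {
                found.put(id, data);
            }
        }
        return found;
    }

    int roundTrips() {
        return roundTrips;
    }
}

final class LookupStrategies {
    // Sequential: one request, and therefore one round trip, per object.
    static Map<String, byte[]> oneAtATime(RemoteStore store, List<String> ids) {
        Map<String, byte[]> out = new HashMap<>();
        for (String id : ids) {
            out.putAll(store.fetch(List.of(id)));
        }
        return out;
    }

    // Batched: up to batchSize ids share each round trip.
    static Map<String, byte[]> batched(RemoteStore store, List<String> ids, int batchSize) {
        Map<String, byte[]> out = new HashMap<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            out.putAll(store.fetch(new ArrayList<>(ids.subList(start, end))));
        }
        return out;
    }
}
```

With 1,000 ids and a batch size of 256, `batched` makes 4 round trips where `oneAtATime` makes 1,000. When round-trip time dominates, that ratio roughly bounds the achievable speedup.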
RevWalk, ObjectWalk, UploadPack and PackWriter are the first major
users of this new bulk interface. The goal is to pack a repository
efficiently for a fetch/clone client even when the source repository
is stored in a high-latency storage system.
The want/have lists are now processed in bulk, removing the high
cost associated with common ancestor negotiation.
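In the bulk pattern, all ids are submitted up front and the results are then drained one at a time. Because results may come back in any order, the caller matches each one by id rather than by position. The sketch below models this with hypothetical `LookupQueue` and `WantParser` classes; it is a simplified stand-in and does not reproduce the signatures of JGit's AsyncObjectLoaderQueue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an asynchronous lookup queue (not JGit's API).
final class LookupQueue {
    static final class Result {
        final String id;
        final String type; // null when the id is missing from the store

        Result(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    private final Deque<Result> ready = new ArrayDeque<>();

    LookupQueue(List<String> ids, Map<String, String> store) {
        List<Result> answers = new ArrayList<>();
        for (String id : ids) {
            answers.add(new Result(id, store.get(id)));
        }
        // Simulate a backend that answers the whole batch out of order.
        Collections.reverse(answers);
        ready.addAll(answers);
    }

    // Returns the next completed lookup, or null once the queue is drained.
    Result next() {
        return ready.poll();
    }
}

final class WantParser {
    // Submits every "want" in one batch, then keeps only those that are commits.
    static Set<String> commitWants(List<String> wants, Map<String, String> store) {
        LookupQueue queue = new LookupQueue(wants, store);
        Set<String> commits = new HashSet<>();
        for (LookupQueue.Result r = queue.next(); r != null; r = queue.next()) {
            // Match by id, never by position: completion order is not guaranteed.
            if ("commit".equals(r.type)) {
                commits.add(r.id);
            }
        }
        return commits;
    }
}
```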
PackWriter already performed object reuse selection in bulk; it can
now also perform the object size lookup and object counting phases
more efficiently. Actual object reuse, deltification, and final
output still use sequential lookups, which makes them somewhat more
expensive to perform.
Change-Id: I4c966f84917482598012074c370b9831451404ee
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago ObjectIdOwnerMap: More lightweight map for ObjectIds
OwnerMap is about 200 ms faster than SubclassMap, more friendly to the
GC, and uses less storage: testing the "Counting objects" part of
PackWriter on 1886362 objects:
ObjectIdSubclassMap:
load factor 50%
table: 4194304 (wasted 2307942)
ms spent 36998 36009 34795 34703 34941 35070 34284 34511 34638 34256
ms avg 34800 (last 9 runs)
ObjectIdOwnerMap:
load factor 100%
table: 2097152 (wasted 210790)
directory: 1024
ms spent 36842 35112 34922 34703 34580 34782 34165 34662 34314 34140
ms avg 34597 (last 9 runs)
The major difference with OwnerMap is that entries must extend
ObjectIdOwnerMap.Entry, through which the OwnerMap injects its own
private "next" field into each object. This allows the OwnerMap to use
a singly linked list for chaining collisions within a bucket. Because
collisions live in a linked list, no spare empty slots are needed for
probing, and every slot in the table can be indexed directly by the
SHA-1 bits.
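A minimal sketch of intrusive chaining, assuming an int key as a stand-in for the leading SHA-1 bits. `OwnedEntry` and `IntrusiveIdMap` are hypothetical names, not JGit's ObjectIdOwnerMap. For brevity, the table grows by plain reallocation rather than the segmented scheme described below.

```java
// Hypothetical base class: every stored object carries its own chain link.
abstract class OwnedEntry {
    final int key;       // stand-in for the leading 32 bits of a SHA-1
    OwnedEntry next;     // written only by the map that owns this entry

    OwnedEntry(int key) {
        this.key = key;
    }
}

// Hypothetical intrusive hash map: collisions are chained through OwnedEntry.next,
// so no wrapper node is allocated per object.
final class IntrusiveIdMap<V extends OwnedEntry> {
    private OwnedEntry[] table = new OwnedEntry[16];
    private int size;

    void add(V value) {
        if (size + 1 > table.length) { // grow at a 100% load factor
            grow();
        }
        int slot = value.key & (table.length - 1);
        value.next = table[slot];      // push onto the front of the bucket's chain
        table[slot] = value;
        size++;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        for (OwnedEntry e = table[key & (table.length - 1)]; e != null; e = e.next) {
            if (e.key == key) {
                return (V) e;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    private void grow() {
        OwnedEntry[] old = table;
        table = new OwnedEntry[old.length * 2];
        for (OwnedEntry head : old) {
            OwnedEntry e = head;
            while (e != null) {
                OwnedEntry following = e.next; // save before relinking
                int slot = e.key & (table.length - 1);
                e.next = table[slot];
                table[slot] = e;
                e = following;
            }
        }
    }
}
```

Because each object has exactly one `next` field, adding the same object to a second map (or twice to one map) would overwrite that link and corrupt a chain. That is the single-map restriction discussed in the next paragraph.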
Unfortunately this means that each object can appear in at most ONE
OwnerMap, as there is only one "next" field within the object instance
to thread into the map. For map-heavy types such as RevWalk (entity
RevObject) and PackWriter (entity ObjectToPack) this is sufficient,
because these entity types are only put into one map by their
container. By introducing a new map type, we don't break existing
applications that might be trying to use ObjectIdSubclassMap to track
RevCommits they obtained from a RevWalk.
The OwnerMap uses less memory. Each object carries 1 extra reference
(so we're up 1,886,362 references), but the table is half the size
(2^21 rather than 2^22 slots). The table itself wastes only 210,790
slots, rather than 2,307,942. Counting both, OwnerMap's overhead is
210,790 + 1,886,362 = 2,097,152 references versus 2,307,942, so it
wastes roughly 200k fewer references.
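The accounting can be checked with a few lines of arithmetic. This sketch assumes that "wasted" means table slots minus stored objects, which reproduces both benchmark figures exactly. `MapFootprint` is a hypothetical helper.

```java
// Hypothetical helper reproducing the reference-count accounting above.
final class MapFootprint {
    static final long OBJECTS = 1_886_362L;
    static final long SUBCLASS_TABLE = 1L << 22; // 4,194,304 slots (50% load target)
    static final long OWNER_TABLE = 1L << 21;    // 2,097,152 slots (100% load target)

    // SubclassMap overhead: only the unused table slots.
    static long subclassMapOverhead() {
        return SUBCLASS_TABLE - OBJECTS;
    }

    // OwnerMap overhead: unused slots plus one injected "next" field per object.
    static long ownerMapOverhead() {
        long wastedSlots = OWNER_TABLE - OBJECTS;
        return wastedSlots + OBJECTS;
    }
}
```

`subclassMapOverhead()` yields 2,307,942 and `ownerMapOverhead()` yields 2,097,152, a difference of 210,790 references.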
OwnerMap is more friendly to the GC, because it hardly ever generates
garbage. As the map reaches its 100% load factor target, it doubles in
size by allocating additional segment arrays of 2048 entries. (So the
first grow allocates 1 segment, the second 2 segments, the third 4
segments, etc.) These segments are hooked into the pre-allocated
directory of 1024 slots. This permits the map to grow to 2 million
objects before the directory itself has to grow. By using segments of
2048 entries, we ask the GC for only 8,204 bytes at a time in a 32 bit
JVM. This is far easier to satisfy than the roughly 2 MiB contiguous
block (524,288 references at 4 bytes each) needed for the 512k-entry
table that is just an intermediate step in the SubclassMap. Because
the previously allocated segments are reused (they are re-hashed in
place), no memory is released during a table grow.
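A minimal sketch of the directory-plus-segments layout, using hypothetical names. Slot i lives in segment i >>> 11 at offset i & 2047. Growing allocates as many new segments as already exist and never copies or frees the old ones. Only addressing and allocation are shown; the in-place rehash of existing entries is omitted.

```java
// Hypothetical segmented table: a fixed directory of 2048-slot segments.
final class SegmentedTable<T> {
    static final int SEGMENT_BITS = 11;                // 2^11 = 2048 slots per segment
    static final int SEGMENT_SIZE = 1 << SEGMENT_BITS;
    static final int SEGMENT_MASK = SEGMENT_SIZE - 1;

    private final Object[][] directory;
    private int segments; // number of segments allocated so far

    SegmentedTable(int directorySize) {
        directory = new Object[directorySize][];
        directory[0] = new Object[SEGMENT_SIZE];
        segments = 1;
    }

    int capacity() {
        return segments * SEGMENT_SIZE;
    }

    // Doubles capacity: allocates 1, then 2, then 4, ... new small segments.
    void grow() {
        int target = segments * 2;
        if (target > directory.length) {
            throw new IllegalStateException("directory full; a larger directory is needed");
        }
        for (int i = segments; i < target; i++) {
            directory[i] = new Object[SEGMENT_SIZE];
        }
        segments = target;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) directory[index >>> SEGMENT_BITS][index & SEGMENT_MASK];
    }

    void set(int index, T value) {
        directory[index >>> SEGMENT_BITS][index & SEGMENT_MASK] = value;
    }
}
```

With a 1024-entry directory, ten grows take the table from 2,048 to 2,097,152 slots. The next grow fails, which is the point at which the text says the directory itself must be replaced.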
When the directory grows, it does so by discarding the old one and
using one that is 4x larger (so the directory goes to 4096 entries on
its first grow). A directory of size 4096 can handle up to 8 million
objects. The second directory grow (16384 entries) reaches 33 million
objects. At that point we're really starting to push the limits of the
JVM heap, but at least it's many small arrays. Previously SubclassMap
would need a table of 67108864 entries to handle that object count,
which requires a single contiguous allocation of 256 MiB. That's hard
to come by in a 32 bit JVM. Instead OwnerMap uses up to 16384 arrays
of about 8 KiB each. This is much easier to fit into a fragmented heap.
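The capacity and allocation figures above can be reproduced with simple arithmetic. Two assumptions, both labeled in the code, are 4-byte references and a 12-byte array header on a 32 bit JVM. The header size is what makes 2048 slots come to the 8,204 bytes quoted earlier. `GrowthMath` is a hypothetical helper.

```java
// Hypothetical helper reproducing the growth and allocation-size figures.
final class GrowthMath {
    static final long SEGMENT_SLOTS = 2048;
    static final long REF_BYTES_32BIT = 4;     // assumption: 4-byte references on a 32 bit JVM
    static final long ARRAY_HEADER_32BIT = 12; // assumption: 8-byte object header + 4-byte length

    // Objects addressable at a 100% load factor with a given directory size.
    static long ownerMapCapacity(long directoryEntries) {
        return directoryEntries * SEGMENT_SLOTS;
    }

    // Bytes requested from the GC for one segment array.
    static long segmentBytes() {
        return ARRAY_HEADER_32BIT + SEGMENT_SLOTS * REF_BYTES_32BIT;
    }

    // Bytes for the single contiguous table an open-addressed map at 50% load needs.
    static long contiguousTableBytes(long objects) {
        long needed = 2 * objects;                        // 50% load factor
        long slots = Long.highestOneBit(needed - 1) << 1; // round up to a power of two
        return slots * REF_BYTES_32BIT;
    }
}
```

Directories of 1024, 4096 and 16384 entries give 2,097,152, 8,388,608 and 33,554,432 objects. A contiguous table for 33,554,432 objects at 50% load needs 2^26 slots, or 256 MiB.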
Change-Id: Ia4acf5cfbf7e9b71bc7faa0db9060f6a969c0c50
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago Speed up ObjectWalk by 6235 objects/sec
The "Counting objects" phase of packing is the most time-consuming
part for any server providing access to Git repositories. Scanning
through the entire project history, including every revision of
every tree that has ever existed, is expensive and takes an incredible
amount of CPU time.
Inline the tree parsing logic, unroll a number of loops, and set up
the code to better handle the common case of seeing another occurrence
of an object that was already marked SEEN.
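The commit message does not show the code. The following is only a minimal, hypothetical sketch of the "already SEEN" fast path: a walker tests a flag bit before doing any other work for an object, so that repeated encounters with shared subtrees cost almost nothing. `WalkNode`, `SEEN` and `FlagWalker` are illustrative names, not JGit's RevObject or ObjectWalk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical graph node carrying a bit-flag word.
final class WalkNode {
    static final int SEEN = 1; // hypothetical flag bit
    int flags;
    final List<WalkNode> children = new ArrayList<>();
}

final class FlagWalker {
    int visited;

    // Iterative walk that checks the SEEN bit first, before any other per-object work.
    void walk(WalkNode root) {
        Deque<WalkNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            WalkNode n = stack.pop();
            if ((n.flags & WalkNode.SEEN) != 0) {
                continue; // common case: reached again through another path
            }
            n.flags |= WalkNode.SEEN;
            visited++;
            for (WalkNode child : n.children) {
                if ((child.flags & WalkNode.SEEN) == 0) { // skip pushing known-SEEN children
                    stack.push(child);
                }
            }
        }
    }
}
```

A node can still be pushed twice before it is first popped, so the check after `pop()` remains necessary. The check before `push()` only avoids stack traffic for objects already known to be SEEN.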
This change boosts the "Counting objects" phase when JGit is acting
as a server and is packing the linux-2.6 repository for its client.
Compared to CGit on the same hardware, a JGit daemon server is now
21883 objects/sec faster (comparing the fastest run of each):
CGit:
Counted 2058062 objects in 38981 ms at 52796.54 objects/sec
Counted 2058062 objects in 38920 ms at 52879.29 objects/sec
Counted 2058062 objects in 39059 ms at 52691.11 objects/sec
JGit (before):
Counted 2058062 objects in 31529 ms at 65275.21 objects/sec
Counted 2058062 objects in 30359 ms at 67790.84 objects/sec
Counted 2058062 objects in 30033 ms at 68526.69 objects/sec
JGit (this commit):
Counted 2058062 objects in 28726 ms at 71644.57 objects/sec
Counted 2058062 objects in 27652 ms at 74427.24 objects/sec
Counted 2058062 objects in 27528 ms at 74762.50 objects/sec
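The headline figures compare the fastest run in each set. A quick check of that arithmetic, using a hypothetical `Throughput` helper:

```java
// Hypothetical helper for checking the objects/sec figures above.
final class Throughput {
    static double objectsPerSec(long objects, long millis) {
        return objects * 1000.0 / millis;
    }

    // Difference in throughput between a faster and a slower run over the same objects.
    static double gain(long objects, long fastMillis, long slowMillis) {
        return objectsPerSec(objects, fastMillis) - objectsPerSec(objects, slowMillis);
    }
}
```

The best JGit run after the change (27528 ms) against the best run before it (30033 ms) gives a gain of about 6235.8 objects/sec. Against the best CGit run (38920 ms), the gain is about 21883.2 objects/sec.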
In each set above, the first run was against a "cold" server. For JGit
the JVM had just started up with `jgit daemon`, and for CGit we hadn't
touched the repository "recently" (but it was certainly in the kernel
buffer cache). The second and third runs were against the running JGit
JVM, allowing the timings to better reflect the benefits of JGit's pack
and index caching, as well as any optimizations the JIT may have
performed.
The timings are fair. CGit opens, checks and mmaps both the pack and
index during the timer. JGit opens, checks and malloc+reads the pack
and index data into its Java heap during the timer. Both processes
walk the same graph space and compute the "path hash" necessary to
sort objects in the object table for delta compression. Since this
commit only affects the "Counting objects" phase, delta compression
was not included in the timings, and JGit may still perform delta
compression more slowly than CGit, resulting in an overall slower
server experience for clients.
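For illustration, a path hash of this kind can follow the name hash C Git uses when ordering delta candidates: for each character, hash = (hash >> 2) + (c << 24). Later characters land in the high bits, so paths with a common suffix tend to sort near each other. This is a hypothetical re-implementation, not JGit's code. For simplicity it skips only ASCII spaces, whereas Git's own whitespace handling may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical name hash modeled on C Git's pack name hash; not JGit's implementation.
final class PathHash {
    static int of(String path) {
        int hash = 0;
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            if (c == ' ') {
                continue; // simplified: only ASCII space is skipped here
            }
            // Unsigned shift mirrors C's uint32_t arithmetic; the newest byte fills the top 8 bits.
            hash = (hash >>> 2) + (c << 24);
        }
        return hash;
    }
}
```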
Change-Id: Ieb184bfaed8475d6960a494b1f3c870e0382164a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
/*
 * Copyright (C) 2007, Robin Rosenberg <robin.rosenberg@dewire.com>
 * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 * copyright notice, this list of conditions and the following
 * disclaimer in the documentation and/or other materials provided
 * with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 * names of its contributors may be used to endorse or promote
 * products derived from this software without specific prior
 * written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

package org.eclipse.jgit.revwalk;

import java.io.IOException;
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;

import org.eclipse.jgit.errors.CorruptObjectException;
import org.eclipse.jgit.errors.IncorrectObjectTypeException;
import org.eclipse.jgit.errors.LargeObjectException;
import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.errors.RevWalkException;
import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.AsyncObjectLoaderQueue;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.MutableObjectId;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectIdOwnerMap;
import org.eclipse.jgit.lib.ObjectLoader;
- import org.eclipse.jgit.lib.ObjectReader;
- import org.eclipse.jgit.lib.Repository;
- import org.eclipse.jgit.revwalk.filter.RevFilter;
- import org.eclipse.jgit.treewalk.filter.TreeFilter;
-
- /**
- * Walks a commit graph and produces the matching commits in order.
- * <p>
- * A RevWalk instance can only be used once to generate results. Running a
- * second time requires creating a new RevWalk instance, or invoking
- * {@link #reset()} before starting again. Resetting an existing instance may be
- * faster for some applications as commit body parsing can be avoided on the
- * later invocations.
- * <p>
- * RevWalk instances are not thread-safe. Applications must either restrict
- * usage of a RevWalk instance to a single thread, or implement their own
- * synchronization at a higher level.
- * <p>
- * Multiple simultaneous RevWalk instances per {@link Repository} are permitted,
- * even from concurrent threads. Equality of {@link RevCommit}s from two
- * different RevWalk instances is never true, even if their {@link ObjectId}s
- * are equal (and thus they describe the same commit).
- * <p>
- * The offered iterator is over the list of RevCommits described by the
- * configuration of this instance. Applications should restrict themselves to
- * using either the provided Iterator or {@link #next()}, but never use both on
- * the same RevWalk at the same time. The Iterator may buffer RevCommits, while
- * {@link #next()} does not.
- */
- public class RevWalk implements Iterable<RevCommit> {
- private static final int MB = 1 << 20;
-
- /**
- * Set on objects whose important header data has been loaded.
- * <p>
- * For a RevCommit this indicates we have pulled apart the tree and parent
- * references from the raw bytes available in the repository and translated
- * those to our own local RevTree and RevCommit instances. The raw buffer is
- * also available for message and other header filtering.
- * <p>
- * For a RevTag this indicates we have pulled apart the tag references to
- * find out what the tag refers to, and what that object's type is.
- */
- static final int PARSED = 1 << 0;
-
- /**
- * Set on RevCommit instances added to our {@link #pending} queue.
- * <p>
- * We use this flag to avoid adding the same commit instance twice to our
- * queue, especially if we reached it by more than one path.
- */
- static final int SEEN = 1 << 1;
-
- /**
- * Set on RevCommit instances the caller does not want output.
- * <p>
- * We flag commits as uninteresting if the caller does not want commits
- * reachable from a commit given to {@link #markUninteresting(RevCommit)}.
- * This flag is always carried into the commit's parents and is a key part
- * of the "rev-list B --not A" feature; A is marked UNINTERESTING.
- */
- static final int UNINTERESTING = 1 << 2;
-
- /**
- * Set on a RevCommit that can collapse out of the history.
- * <p>
- * If the {@link #treeFilter} concluded that this commit matches its
- * parents' for all of the paths that the filter is interested in then we
- * mark the commit REWRITE. Later we can rewrite the parents of a REWRITE
- * child to remove chains of REWRITE commits before we produce the child to
- * the application.
- *
- * @see RewriteGenerator
- */
- static final int REWRITE = 1 << 3;
-
- /**
- * Temporary mark for use within generators or filters.
- * <p>
- * This mark is only for local use within a single scope. If someone sets
- * the mark they must unset it before any other code can see the mark.
- */
- static final int TEMP_MARK = 1 << 4;
-
- /**
- * Temporary mark for use within {@link TopoSortGenerator}.
- * <p>
- * This mark indicates the commit could not be produced when it wanted to, as at
- * least one child was behind it. Commits with this flag are delayed until
- * all children have been output first.
- */
- static final int TOPO_DELAY = 1 << 5;
-
- /** Number of flag bits we keep internal for our own use. See above flags. */
- static final int RESERVED_FLAGS = 6;
-
- private static final int APP_FLAGS = -1 & ~((1 << RESERVED_FLAGS) - 1);
-
- final ObjectReader reader;
-
- final MutableObjectId idBuffer;
-
- ObjectIdOwnerMap<RevObject> objects;
-
- private int freeFlags = APP_FLAGS;
-
- private int delayFreeFlags;
-
- int carryFlags = UNINTERESTING;
-
- final ArrayList<RevCommit> roots;
-
- AbstractRevQueue queue;
-
- Generator pending;
-
- private final EnumSet<RevSort> sorting;
-
- private RevFilter filter;
-
- private TreeFilter treeFilter;
-
- private boolean retainBody;
-
- boolean shallowCommitsInitialized;
-
- /**
- * Create a new revision walker for a given repository.
- *
- * @param repo
- * the repository the walker will obtain data from. An
- * ObjectReader will be created by the walker, and must be
- * released by the caller.
- */
- public RevWalk(final Repository repo) {
- this(repo.newObjectReader());
- }
-
- /**
- * Create a new revision walker for a given repository.
- *
- * @param or
- * the reader the walker will obtain data from. The reader should
- * be released by the caller when the walker is no longer
- * required.
- */
- public RevWalk(ObjectReader or) {
- reader = or;
- idBuffer = new MutableObjectId();
- objects = new ObjectIdOwnerMap<RevObject>();
- roots = new ArrayList<RevCommit>();
- queue = new DateRevQueue();
- pending = new StartGenerator(this);
- sorting = EnumSet.of(RevSort.NONE);
- filter = RevFilter.ALL;
- treeFilter = TreeFilter.ALL;
- retainBody = true;
- }
-
- /** @return the reader this walker is using to load objects. */
- public ObjectReader getObjectReader() {
- return reader;
- }
-
- /**
- * Release any resources used by this walker's reader.
- * <p>
- * A walker that has been released can be used again, but may need to be
- * released after the subsequent usage.
- */
- public void release() {
- reader.release();
- }
-
- /**
- * Mark a commit to start graph traversal from.
- * <p>
- * Callers are encouraged to use {@link #parseCommit(AnyObjectId)} to obtain
- * the commit reference, rather than {@link #lookupCommit(AnyObjectId)}, as
- * this method requires the commit to be parsed before it can be added as a
- * root for the traversal.
- * <p>
- * The method will automatically parse an unparsed commit, but error
- * handling may be more difficult for the application to explain why a
- * RevCommit is not actually a commit. The object pool of this walker would
- * also be 'poisoned' by the non-commit RevCommit.
- *
- * @param c
- * the commit to start traversing from. The commit passed must be
- * from this same revision walker.
- * @throws MissingObjectException
- * the commit supplied is not available from the object
- * database. This usually indicates the supplied commit is
- * invalid, but the reference was constructed during an earlier
- * invocation to {@link #lookupCommit(AnyObjectId)}.
- * @throws IncorrectObjectTypeException
- * the object was not parsed yet and it was discovered during
- * parsing that it is not actually a commit. This usually
- * indicates the caller supplied a non-commit SHA-1 to
- * {@link #lookupCommit(AnyObjectId)}.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public void markStart(final RevCommit c) throws MissingObjectException,
- IncorrectObjectTypeException, IOException {
- if ((c.flags & SEEN) != 0)
- return;
- if ((c.flags & PARSED) == 0)
- c.parseHeaders(this);
- c.flags |= SEEN;
- roots.add(c);
- queue.add(c);
- }
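A minimal sketch of the flag bookkeeping in `markStart` above, assuming a hypothetical `Node` class in place of `RevCommit`: `PARSED` decides whether headers still need to be loaded, and `SEEN` makes a repeated call a no-op. Header parsing is only simulated by a counter; this is an illustration, not JGit code.

```java
import java.util.ArrayList;
import java.util.List;

final class StartSketch {
    // Hypothetical stand-in for RevCommit: only the flag word matters here.
    static final class Node {
        int flags;
        int parseCount; // how many times headers were (pretend) parsed
    }

    // Same bit positions as the reserved RevWalk flags.
    static final int PARSED = 1 << 0;
    static final int SEEN = 1 << 1;

    final List<Node> roots = new ArrayList<>();

    void markStart(Node c) {
        if ((c.flags & SEEN) != 0)
            return; // already a start point; ignore the duplicate
        if ((c.flags & PARSED) == 0) {
            c.parseCount++; // simulated parseHeaders()
            c.flags |= PARSED;
        }
        c.flags |= SEEN;
        roots.add(c);
    }

    public static void main(String[] args) {
        StartSketch w = new StartSketch();
        Node a = new Node();
        w.markStart(a);
        w.markStart(a); // second call returns early
        System.out.println(w.roots.size() + " " + a.parseCount); // 1 1
    }
}
```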
-
- /**
- * Mark commits to start graph traversal from.
- *
- * @param list
- * commits to start traversing from. The commits passed must be
- * from this same revision walker.
- * @throws MissingObjectException
- * one of the commits supplied is not available from the object
- * database. This usually indicates the supplied commit is
- * invalid, but the reference was constructed during an earlier
- * invocation to {@link #lookupCommit(AnyObjectId)}.
- * @throws IncorrectObjectTypeException
- * the object was not parsed yet and it was discovered during
- * parsing that it is not actually a commit. This usually
- * indicates the caller supplied a non-commit SHA-1 to
- * {@link #lookupCommit(AnyObjectId)}.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public void markStart(final Collection<RevCommit> list)
- throws MissingObjectException, IncorrectObjectTypeException,
- IOException {
- for (final RevCommit c : list)
- markStart(c);
- }
-
- /**
- * Mark a commit to not produce in the output.
- * <p>
- * Uninteresting commits denote not just themselves but also their entire
- * ancestry chain, back until the merge base of an uninteresting commit and
- * an otherwise interesting commit.
- * <p>
- * Callers are encouraged to use {@link #parseCommit(AnyObjectId)} to obtain
- * the commit reference, rather than {@link #lookupCommit(AnyObjectId)}, as
- * this method requires the commit to be parsed before it can be added as a
- * root for the traversal.
- * <p>
- * The method will automatically parse an unparsed commit, but error
- * handling may be more difficult for the application to explain why a
- * RevCommit is not actually a commit. The object pool of this walker would
- * also be 'poisoned' by the non-commit RevCommit.
- *
- * @param c
- * the commit to mark as uninteresting. The commit passed must be
- * from this same revision walker.
- * @throws MissingObjectException
- * the commit supplied is not available from the object
- * database. This usually indicates the supplied commit is
- * invalid, but the reference was constructed during an earlier
- * invocation to {@link #lookupCommit(AnyObjectId)}.
- * @throws IncorrectObjectTypeException
- * the object was not parsed yet and it was discovered during
- * parsing that it is not actually a commit. This usually
- * indicates the caller supplied a non-commit SHA-1 to
- * {@link #lookupCommit(AnyObjectId)}.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public void markUninteresting(final RevCommit c)
- throws MissingObjectException, IncorrectObjectTypeException,
- IOException {
- c.flags |= UNINTERESTING;
- carryFlagsImpl(c);
- markStart(c);
- }
-
- /**
- * Determine if a commit is reachable from another commit.
- * <p>
- * A commit <code>base</code> is an ancestor of <code>tip</code> if we
- * can find a path of commits that leads from <code>tip</code> and ends at
- * <code>base</code>.
- * <p>
- * This utility function resets the walker, inserts the two supplied
- * commits, and then executes a walk until an answer can be obtained.
- * Currently allocated RevFlags that have been added to RevCommit instances
- * will be retained through the reset.
- *
- * @param base
- * commit the caller thinks is reachable from <code>tip</code>.
- * @param tip
- * commit to start iteration from, and which is most likely a
- * descendant (child) of <code>base</code>.
- * @return true if there is a path directly from <code>tip</code> to
- * <code>base</code> (and thus <code>base</code> is fully merged
- * into <code>tip</code>); false otherwise.
- * @throws MissingObjectException
- * one or more of the next commit's parents are not available
- * from the object database, but were thought to be candidates
- * for traversal. This usually indicates a broken link.
- * @throws IncorrectObjectTypeException
- * one or more of the next commit's parents are not actually
- * commit objects.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public boolean isMergedInto(final RevCommit base, final RevCommit tip)
- throws MissingObjectException, IncorrectObjectTypeException,
- IOException {
- final RevFilter oldRF = filter;
- final TreeFilter oldTF = treeFilter;
- try {
- finishDelayedFreeFlags();
- reset(~freeFlags & APP_FLAGS);
- filter = RevFilter.MERGE_BASE;
- treeFilter = TreeFilter.ALL;
- markStart(tip);
- markStart(base);
- return next() == base;
- } finally {
- filter = oldRF;
- treeFilter = oldTF;
- }
- }
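`isMergedInto` answers a reachability question. RevWalk does so by reusing its own traversal with `RevFilter.MERGE_BASE`; the sketch below swaps in a plain breadth-first search over a hypothetical in-memory parent map keyed by string ids, only to illustrate the question being asked.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Reachability {
    /** True if base is reached from tip by following parent links (tip reaches itself). */
    static boolean isMergedInto(String base, String tip, Map<String, List<String>> parents) {
        Deque<String> todo = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        todo.add(tip);
        seen.add(tip);
        while (!todo.isEmpty()) {
            String c = todo.poll();
            if (c.equals(base))
                return true;
            for (String p : parents.getOrDefault(c, List.of())) {
                if (seen.add(p)) // visit each commit at most once
                    todo.add(p);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // d -> b -> a, and c -> a (arrows point from child to parent)
        Map<String, List<String>> g = Map.of(
                "b", List.of("a"),
                "c", List.of("a"),
                "d", List.of("b"));
        System.out.println(isMergedInto("a", "d", g)); // true
        System.out.println(isMergedInto("c", "d", g)); // false
    }
}
```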
-
- /**
- * Pop the next most recent commit.
- *
- * @return next most recent commit; null if traversal is over.
- * @throws MissingObjectException
- * one or more of the next commit's parents are not available
- * from the object database, but were thought to be candidates
- * for traversal. This usually indicates a broken link.
- * @throws IncorrectObjectTypeException
- * one or more of the next commit's parents are not actually
- * commit objects.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public RevCommit next() throws MissingObjectException,
- IncorrectObjectTypeException, IOException {
- return pending.next();
- }
-
- /**
- * Obtain the sort types applied to the commits returned.
- *
- * @return the sorting strategies employed. At least one strategy is always
- * used, but that strategy may be {@link RevSort#NONE}.
- */
- public EnumSet<RevSort> getRevSort() {
- return sorting.clone();
- }
-
- /**
- * Check whether the provided sorting strategy is enabled.
- *
- * @param sort
- * a sorting strategy to look for.
- * @return true if this strategy is enabled, false otherwise
- */
- public boolean hasRevSort(RevSort sort) {
- return sorting.contains(sort);
- }
-
- /**
- * Select a single sorting strategy for the returned commits.
- * <p>
- * Disables all sorting strategies, then enables only the single strategy
- * supplied by the caller.
- *
- * @param s
- * a sorting strategy to enable.
- */
- public void sort(final RevSort s) {
- assertNotStarted();
- sorting.clear();
- sorting.add(s);
- }
-
- /**
- * Add or remove a sorting strategy for the returned commits.
- * <p>
- * Multiple strategies can be applied at once, in which case some strategies
- * may take precedence over others. As an example, {@link RevSort#TOPO} must
- * take precedence over {@link RevSort#COMMIT_TIME_DESC}, otherwise it
- * cannot enforce its ordering.
- *
- * @param s
- * a sorting strategy to enable or disable.
- * @param use
- * true if this strategy should be used, false if it should be
- * removed.
- */
- public void sort(final RevSort s, final boolean use) {
- assertNotStarted();
- if (use)
- sorting.add(s);
- else
- sorting.remove(s);
-
- if (sorting.size() > 1)
- sorting.remove(RevSort.NONE);
- else if (sorting.size() == 0)
- sorting.add(RevSort.NONE);
- }
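The two-argument `sort` maintains one invariant: `RevSort.NONE` is present exactly when no other strategy is selected. A sketch of that bookkeeping, using a hypothetical `Sort` enum as a stand-in for `RevSort`:

```java
import java.util.EnumSet;

final class SortSketch {
    // Hypothetical stand-in for RevSort; only NONE has special meaning here.
    enum Sort { NONE, TOPO, COMMIT_TIME_DESC, REVERSE }

    final EnumSet<Sort> sorting = EnumSet.of(Sort.NONE);

    void sort(Sort s, boolean use) {
        if (use)
            sorting.add(s);
        else
            sorting.remove(s);
        // Keep NONE only when no other strategy is selected.
        if (sorting.size() > 1)
            sorting.remove(Sort.NONE);
        else if (sorting.isEmpty())
            sorting.add(Sort.NONE);
    }

    public static void main(String[] args) {
        SortSketch w = new SortSketch();
        w.sort(Sort.TOPO, true);
        System.out.println(w.sorting); // [TOPO]
        w.sort(Sort.TOPO, false);
        System.out.println(w.sorting); // [NONE]
    }
}
```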
-
- /**
- * Get the currently configured commit filter.
- *
- * @return the current filter. Never null as a filter is always needed.
- */
- public RevFilter getRevFilter() {
- return filter;
- }
-
- /**
- * Set the commit filter for this walker.
- * <p>
- * Multiple filters may be combined by constructing an arbitrary tree of
- * <code>AndRevFilter</code> or <code>OrRevFilter</code> instances to
- * describe the boolean expression required by the application. Custom
- * filter implementations may also be constructed by applications.
- * <p>
- * Note that filters are not thread-safe and may not be shared by concurrent
- * RevWalk instances. Every RevWalk must be supplied its own unique filter,
- * unless the filter implementation specifically states it is (and always
- * will be) thread-safe. Callers may use {@link RevFilter#clone()} to create
- * a unique filter tree for this RevWalk instance.
- *
- * @param newFilter
- * the new filter. If null the special {@link RevFilter#ALL}
- * filter will be used instead, as it matches every commit.
- * @see org.eclipse.jgit.revwalk.filter.AndRevFilter
- * @see org.eclipse.jgit.revwalk.filter.OrRevFilter
- */
- public void setRevFilter(final RevFilter newFilter) {
- assertNotStarted();
- filter = newFilter != null ? newFilter : RevFilter.ALL;
- }
-
- /**
- * Get the tree filter used to simplify commits by modified paths.
- *
- * @return the current filter. Never null as a filter is always needed. If
- * no filter is being applied {@link TreeFilter#ALL} is returned.
- */
- public TreeFilter getTreeFilter() {
- return treeFilter;
- }
-
- /**
- * Set the tree filter used to simplify commits by modified paths.
- * <p>
- * If null or {@link TreeFilter#ALL} the path limiter is removed. Commits
- * will not be simplified.
- * <p>
- * If non-null and not {@link TreeFilter#ALL} then the tree filter will be
- * installed and commits will have their ancestry simplified to hide commits
- * that do not contain tree entries matched by the filter.
- * <p>
- * Usually callers should be inserting a filter graph including
- * {@link TreeFilter#ANY_DIFF} along with one or more
- * {@link org.eclipse.jgit.treewalk.filter.PathFilter} instances.
- *
- * @param newFilter
- * new filter. If null the special {@link TreeFilter#ALL} filter
- * will be used instead, as it matches everything.
- * @see org.eclipse.jgit.treewalk.filter.PathFilter
- */
- public void setTreeFilter(final TreeFilter newFilter) {
- assertNotStarted();
- treeFilter = newFilter != null ? newFilter : TreeFilter.ALL;
- }
-
- /**
- * Should the body of a commit or tag be retained after parsing its headers?
- * <p>
- * Usually the body is always retained, but some application code might not
- * care and would prefer to discard the body of a commit as early as
- * possible, to reduce memory usage.
- *
- * @return true if the body should be retained; false if it is discarded.
- */
- public boolean isRetainBody() {
- return retainBody;
- }
-
- /**
- * Set whether or not the body of a commit or tag is retained.
- * <p>
- * If a body of a commit or tag is not retained, the application must
- * call {@link #parseBody(RevObject)} before the body can be safely
- * accessed through the type specific access methods.
- *
- * @param retain true to retain bodies; false to discard them early.
- */
- public void setRetainBody(final boolean retain) {
- retainBody = retain;
- }
-
- /**
- * Locate a reference to a blob without loading it.
- * <p>
- * The blob may or may not exist in the repository. It is impossible to tell
- * from this method's return value.
- *
- * @param id
- * name of the blob object.
- * @return reference to the blob object. Never null.
- */
- public RevBlob lookupBlob(final AnyObjectId id) {
- RevBlob c = (RevBlob) objects.get(id);
- if (c == null) {
- c = new RevBlob(id);
- objects.add(c);
- }
- return c;
- }
-
- /**
- * Locate a reference to a tree without loading it.
- * <p>
- * The tree may or may not exist in the repository. It is impossible to tell
- * from this method's return value.
- *
- * @param id
- * name of the tree object.
- * @return reference to the tree object. Never null.
- */
- public RevTree lookupTree(final AnyObjectId id) {
- RevTree c = (RevTree) objects.get(id);
- if (c == null) {
- c = new RevTree(id);
- objects.add(c);
- }
- return c;
- }
-
- /**
- * Locate a reference to a commit without loading it.
- * <p>
- * The commit may or may not exist in the repository. It is impossible to
- * tell from this method's return value.
- * <p>
- * See {@link #parseHeaders(RevObject)} and {@link #parseBody(RevObject)}
- * for loading contents.
- *
- * @param id
- * name of the commit object.
- * @return reference to the commit object. Never null.
- */
- public RevCommit lookupCommit(final AnyObjectId id) {
- RevCommit c = (RevCommit) objects.get(id);
- if (c == null) {
- c = createCommit(id);
- objects.add(c);
- }
- return c;
- }
-
- /**
- * Locate a reference to a tag without loading it.
- * <p>
- * The tag may or may not exist in the repository. It is impossible to tell
- * from this method's return value.
- *
- * @param id
- * name of the tag object.
- * @return reference to the tag object. Never null.
- */
- public RevTag lookupTag(final AnyObjectId id) {
- RevTag c = (RevTag) objects.get(id);
- if (c == null) {
- c = new RevTag(id);
- objects.add(c);
- }
- return c;
- }
-
- /**
- * Locate a reference to any object without loading it.
- * <p>
- * The object may or may not exist in the repository. It is impossible to
- * tell from this method's return value.
- *
- * @param id
- * name of the object.
- * @param type
- * type of the object. Must be a valid Git object type.
- * @return reference to the object. Never null.
- */
- public RevObject lookupAny(final AnyObjectId id, final int type) {
- RevObject r = objects.get(id);
- if (r == null) {
- switch (type) {
- case Constants.OBJ_COMMIT:
- r = createCommit(id);
- break;
- case Constants.OBJ_TREE:
- r = new RevTree(id);
- break;
- case Constants.OBJ_BLOB:
- r = new RevBlob(id);
- break;
- case Constants.OBJ_TAG:
- r = new RevTag(id);
- break;
- default:
- throw new IllegalArgumentException(MessageFormat.format(
- JGitText.get().invalidGitType, Integer.valueOf(type)));
- }
- objects.add(r);
- }
- return r;
- }
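The lookup methods above intern objects: each id maps to at most one instance per walker, created on first lookup. A sketch of that get-or-create pattern, with a `HashMap` standing in for `ObjectIdOwnerMap` and hypothetical `Commit`/`Tree` classes. Note the cast: if an id was first looked up as one type and later as another, the cast in a method shaped like `lookupCommit` fails with a `ClassCastException`.

```java
import java.util.HashMap;
import java.util.Map;

final class InternSketch {
    // Hypothetical object model standing in for RevObject and its subclasses.
    static class Obj { final String id; Obj(String id) { this.id = id; } }
    static final class Commit extends Obj { Commit(String id) { super(id); } }
    static final class Tree extends Obj { Tree(String id) { super(id); } }

    private final Map<String, Obj> objects = new HashMap<>();

    // Get-or-create: at most one instance per id for the lifetime of the walker.
    Commit lookupCommit(String id) {
        Commit c = (Commit) objects.get(id); // throws if id was interned as another type
        if (c == null) {
            c = new Commit(id);
            objects.put(id, c);
        }
        return c;
    }

    Tree lookupTree(String id) {
        Tree t = (Tree) objects.get(id);
        if (t == null) {
            t = new Tree(id);
            objects.put(id, t);
        }
        return t;
    }

    public static void main(String[] args) {
        InternSketch w = new InternSketch();
        System.out.println(w.lookupCommit("c1") == w.lookupCommit("c1")); // true
    }
}
```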
-
- /**
- * Locate an object that was previously allocated in this walk.
- *
- * @param id
- * name of the object.
- * @return reference to the object if it has been previously located;
- * otherwise null.
- */
- public RevObject lookupOrNull(AnyObjectId id) {
- return objects.get(id);
- }
-
- /**
- * Locate a reference to a commit and immediately parse its content.
- * <p>
- * Unlike {@link #lookupCommit(AnyObjectId)} this method only returns
- * successfully if the commit object exists, is verified to be a commit, and
- * was parsed without error.
- *
- * @param id
- * name of the commit object.
- * @return reference to the commit object. Never null.
- * @throws MissingObjectException
- * the supplied commit does not exist.
- * @throws IncorrectObjectTypeException
- * the supplied id is not a commit or an annotated tag.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public RevCommit parseCommit(final AnyObjectId id)
- throws MissingObjectException, IncorrectObjectTypeException,
- IOException {
- RevObject c = peel(parseAny(id));
- if (!(c instanceof RevCommit))
- throw new IncorrectObjectTypeException(id.toObjectId(),
- Constants.TYPE_COMMIT);
- return (RevCommit) c;
- }
-
- /**
- * Locate a reference to a tree.
- * <p>
- * This method only returns successfully if the tree object exists and is
- * verified to be a tree.
- *
- * @param id
- * name of the tree object, or a commit or annotated tag that may
- * reference a tree.
- * @return reference to the tree object. Never null.
- * @throws MissingObjectException
- * the supplied tree does not exist.
- * @throws IncorrectObjectTypeException
- * the supplied id is not a tree, a commit or an annotated tag.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public RevTree parseTree(final AnyObjectId id)
- throws MissingObjectException, IncorrectObjectTypeException,
- IOException {
- RevObject c = peel(parseAny(id));
-
- final RevTree t;
- if (c instanceof RevCommit)
- t = ((RevCommit) c).getTree();
- else if (!(c instanceof RevTree))
- throw new IncorrectObjectTypeException(id.toObjectId(),
- Constants.TYPE_TREE);
- else
- t = (RevTree) c;
- parseHeaders(t);
- return t;
- }
-
- /**
- * Locate a reference to an annotated tag and immediately parse its content.
- * <p>
- * Unlike {@link #lookupTag(AnyObjectId)} this method only returns
- * successfully if the tag object exists, is verified to be a tag, and was
- * parsed without error.
- *
- * @param id
- * name of the tag object.
- * @return reference to the tag object. Never null.
- * @throws MissingObjectException
- * the supplied tag does not exist.
- * @throws IncorrectObjectTypeException
- * the supplied id is not an annotated tag.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public RevTag parseTag(final AnyObjectId id) throws MissingObjectException,
- IncorrectObjectTypeException, IOException {
- RevObject c = parseAny(id);
- if (!(c instanceof RevTag))
- throw new IncorrectObjectTypeException(id.toObjectId(),
- Constants.TYPE_TAG);
- return (RevTag) c;
- }
-
- /**
- * Locate a reference to any object and immediately parse its headers.
- * <p>
- * This method only returns successfully if the object exists and was parsed
- * without error. Parsing an object can be expensive as the type must be
- * determined. For blobs this may mean the blob content was unpacked
- * unnecessarily, and thrown away.
- *
- * @param id
- * name of the object.
- * @return reference to the object. Never null.
- * @throws MissingObjectException
- * the supplied object does not exist.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public RevObject parseAny(final AnyObjectId id)
- throws MissingObjectException, IOException {
- RevObject r = objects.get(id);
- if (r == null)
- r = parseNew(id, reader.open(id));
- else
- parseHeaders(r);
- return r;
- }
-
- private RevObject parseNew(AnyObjectId id, ObjectLoader ldr)
- throws LargeObjectException, CorruptObjectException,
- MissingObjectException, IOException {
- RevObject r;
- int type = ldr.getType();
- switch (type) {
- case Constants.OBJ_COMMIT: {
- final RevCommit c = createCommit(id);
- c.parseCanonical(this, getCachedBytes(c, ldr));
- r = c;
- break;
- }
- case Constants.OBJ_TREE: {
- r = new RevTree(id);
- r.flags |= PARSED;
- break;
- }
- case Constants.OBJ_BLOB: {
- r = new RevBlob(id);
- r.flags |= PARSED;
- break;
- }
- case Constants.OBJ_TAG: {
- final RevTag t = new RevTag(id);
- t.parseCanonical(this, getCachedBytes(t, ldr));
- r = t;
- break;
- }
- default:
- throw new IllegalArgumentException(MessageFormat.format(
- JGitText.get().badObjectType, Integer.valueOf(type)));
- }
- objects.add(r);
- return r;
- }
-
- byte[] getCachedBytes(RevObject obj) throws LargeObjectException,
- MissingObjectException, IncorrectObjectTypeException, IOException {
- return getCachedBytes(obj, reader.open(obj, obj.getType()));
- }
-
- byte[] getCachedBytes(RevObject obj, ObjectLoader ldr)
- throws LargeObjectException, MissingObjectException, IOException {
- try {
- return ldr.getCachedBytes(5 * MB);
- } catch (LargeObjectException tooBig) {
- tooBig.setObjectId(obj);
- throw tooBig;
- }
- }
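`getCachedBytes` caps the in-memory copy at 5 MiB (`5 * MB`, where `MB = 1 << 20`) and, when the loader refuses, records the object id on the exception before rethrowing it. A sketch of that wrap-and-rethrow pattern, assuming a hypothetical `TooLargeException` in place of `LargeObjectException` and a fake loader that only checks a size:

```java
final class CachedBytesSketch {
    static final int MB = 1 << 20;

    // Hypothetical stand-in for LargeObjectException.
    static final class TooLargeException extends RuntimeException {
        String objectId;
        void setObjectId(String id) { objectId = id; }
    }

    // Hypothetical loader: returns the bytes only if they fit under the limit.
    static byte[] load(int size, int limit) {
        if (size > limit)
            throw new TooLargeException();
        return new byte[size];
    }

    static byte[] getCachedBytes(String id, int size) {
        try {
            return load(size, 5 * MB);
        } catch (TooLargeException tooBig) {
            tooBig.setObjectId(id); // record which object was too big, then rethrow
            throw tooBig;
        }
    }

    public static void main(String[] args) {
        System.out.println(getCachedBytes("small", 1024).length); // 1024
    }
}
```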
-
- /**
- * Asynchronous object parsing.
- *
- * @param <T>
- * any ObjectId type.
- * @param objectIds
- * objects to open from the object store. The supplied collection
- * must not be modified until the queue has finished.
- * @param reportMissing
- * if true, missing objects are reported with a
- * MissingObjectException. This may be more expensive for the
- * implementation to guarantee. If false the implementation may
- * choose to report MissingObjectException, or silently skip over
- * the object with no warning.
- * @return queue to read the objects from.
- */
- public <T extends ObjectId> AsyncRevObjectQueue parseAny(
- Iterable<T> objectIds, boolean reportMissing) {
- List<T> need = new ArrayList<T>();
- List<RevObject> have = new ArrayList<RevObject>();
- for (T id : objectIds) {
- RevObject r = objects.get(id);
- if (r != null && (r.flags & PARSED) != 0)
- have.add(r);
- else
- need.add(id);
- }
-
- final Iterator<RevObject> objItr = have.iterator();
- if (need.isEmpty()) {
- return new AsyncRevObjectQueue() {
- public RevObject next() {
- return objItr.hasNext() ? objItr.next() : null;
- }
-
- public boolean cancel(boolean mayInterruptIfRunning) {
- return true;
- }
-
- public void release() {
- // In-memory only, no action required.
- }
- };
- }
-
- final AsyncObjectLoaderQueue<T> lItr = reader.open(need, reportMissing);
- return new AsyncRevObjectQueue() {
- public RevObject next() throws MissingObjectException,
- IncorrectObjectTypeException, IOException {
- if (objItr.hasNext())
- return objItr.next();
- if (!lItr.next())
- return null;
-
- ObjectId id = lItr.getObjectId();
- ObjectLoader ldr = lItr.open();
- RevObject r = objects.get(id);
- if (r == null)
- r = parseNew(id, ldr);
- else if (r instanceof RevCommit) {
- byte[] raw = ldr.getCachedBytes();
- ((RevCommit) r).parseCanonical(RevWalk.this, raw);
- } else if (r instanceof RevTag) {
- byte[] raw = ldr.getCachedBytes();
- ((RevTag) r).parseCanonical(RevWalk.this, raw);
- } else
- r.flags |= PARSED;
- return r;
- }
-
- public boolean cancel(boolean mayInterruptIfRunning) {
- return lItr.cancel(mayInterruptIfRunning);
- }
-
- public void release() {
- lItr.release();
- }
- };
- }
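The asynchronous `parseAny` splits the request into objects already parsed by this walker, which are returned from memory first, and the rest, which are handed to the reader in one bulk `open` call. The sketch below shows that split with a hypothetical `BatchStore` whose `fetchAll` counts round trips. Unlike RevWalk it loads eagerly instead of through a lazy queue, and it silently skips missing ids (the `reportMissing == false` behavior).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class BatchParseSketch {
    /** Hypothetical remote store: one fetchAll call is one network round trip. */
    static final class BatchStore {
        final Map<String, String> data = new HashMap<>();
        int roundTrips;

        Map<String, String> fetchAll(List<String> ids) {
            roundTrips++;
            Map<String, String> out = new HashMap<>();
            for (String id : ids)
                if (data.containsKey(id))
                    out.put(id, data.get(id)); // missing ids are silently skipped
            return out;
        }
    }

    final Map<String, String> parsed = new HashMap<>(); // stand-in for PARSED objects
    final BatchStore store;

    BatchParseSketch(BatchStore store) { this.store = store; }

    /** Serve already-parsed objects first, then load everything else in a single batch. */
    Iterator<String> parseAll(Iterable<String> ids) {
        List<String> have = new ArrayList<>();
        List<String> need = new ArrayList<>();
        for (String id : ids) {
            if (parsed.containsKey(id))
                have.add(parsed.get(id));
            else
                need.add(id);
        }
        if (!need.isEmpty()) {
            Map<String, String> loaded = store.fetchAll(need);
            for (String id : need) {
                String body = loaded.get(id);
                if (body != null) {
                    parsed.put(id, body);
                    have.add(body);
                }
            }
        }
        return have.iterator();
    }

    public static void main(String[] args) {
        BatchStore s = new BatchStore();
        s.data.put("a", "A");
        s.data.put("b", "B");
        BatchParseSketch w = new BatchParseSketch(s);
        w.parseAll(List.of("a", "b", "zz")).forEachRemaining(System.out::println); // A, B
        System.out.println(s.roundTrips); // 1
    }
}
```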
-
- /**
- * Ensure the object's critical headers have been parsed.
- * <p>
- * This method only returns successfully if the object exists and was parsed
- * without error.
- *
- * @param obj
- * the object the caller needs to be parsed.
- * @throws MissingObjectException
- * the supplied object does not exist.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public void parseHeaders(final RevObject obj)
- throws MissingObjectException, IOException {
- if ((obj.flags & PARSED) == 0)
- obj.parseHeaders(this);
- }
-
- /**
- * Ensure the object's full body content is available.
- * <p>
- * This method only returns successfully if the object exists and was parsed
- * without error.
- *
- * @param obj
- * the object the caller needs to be parsed.
- * @throws MissingObjectException
- * the supplied object does not exist.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public void parseBody(final RevObject obj)
- throws MissingObjectException, IOException {
- obj.parseBody(this);
- }
-
- /**
- * Peel back annotated tags until a non-tag object is found.
- *
- * @param obj
- * the starting object.
- * @return If {@code obj} is not an annotated tag, {@code obj}. Otherwise
- * the first non-tag object that {@code obj} references. The
- * returned object's headers have been parsed.
- * @throws MissingObjectException
- * a referenced object cannot be found.
- * @throws IOException
- * a pack file or loose object could not be read.
- */
- public RevObject peel(RevObject obj) throws MissingObjectException,
- IOException {
- while (obj instanceof RevTag) {
- parseHeaders(obj);
- obj = ((RevTag) obj).getObject();
- }
- parseHeaders(obj);
- return obj;
- }
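`peel` follows annotated tags until it reaches a non-tag object. A sketch with hypothetical `Obj` and `Tag` classes; unlike `RevWalk.peel` it does not parse headers along the way, because the sketch objects are already fully populated.

```java
final class PeelSketch {
    // Hypothetical object model: a Tag points at another object, possibly another Tag.
    static class Obj { final String name; Obj(String name) { this.name = name; } }
    static final class Tag extends Obj {
        final Obj target;
        Tag(String name, Obj target) { super(name); this.target = target; }
    }

    /** Follow tag targets until a non-tag object is reached. */
    static Obj peel(Obj obj) {
        while (obj instanceof Tag)
            obj = ((Tag) obj).target;
        return obj;
    }

    public static void main(String[] args) {
        Obj commit = new Obj("commit");
        Obj outer = new Tag("v1-signed", new Tag("v1", commit));
        System.out.println(peel(outer).name); // commit
    }
}
```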
-
- /**
- * Create a new flag for application use during walking.
- * <p>
- * Applications are only assured to be able to create 24 unique flags on any
- * given revision walker instance. Any flags beyond 24 are offered only if
- * the implementation has extra free space within its internal storage.
- *
- * @param name
- * description of the flag, primarily useful for debugging.
- * @return newly constructed flag instance.
- * @throws IllegalArgumentException
- * too many flags have been reserved on this revision walker.
- */
- public RevFlag newFlag(final String name) {
- final int m = allocFlag();
- return new RevFlag(this, name, m);
- }
-
- int allocFlag() {
- if (freeFlags == 0)
- throw new IllegalArgumentException(MessageFormat.format(
- JGitText.get().flagsAlreadyCreated,
- Integer.valueOf(32 - RESERVED_FLAGS)));
- final int m = Integer.lowestOneBit(freeFlags);
- freeFlags &= ~m;
- return m;
- }
-
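`allocFlag()` treats `freeFlags` as a 32-bit pool: `Integer.lowestOneBit` picks the smallest free bit and clearing it marks the bit as used. A minimal sketch of that pool follows. The class name `FlagPool` is hypothetical, and `RESERVED_FLAGS = 6` is an assumption chosen for illustration (it leaves 26 application bits); the error message is a plain string rather than JGit's localized `JGitText` text.

```java
// Hypothetical model of the flag pool behind allocFlag()/freeFlag().
final class FlagPool {
    static final int RESERVED_FLAGS = 6; // assumption: low bits kept for internal use

    // All bits above the reserved ones start out free.
    private int freeFlags = ~((1 << RESERVED_FLAGS) - 1);

    int alloc() {
        if (freeFlags == 0)
            throw new IllegalArgumentException(
                    "only " + (32 - RESERVED_FLAGS) + " flags can be created");
        int m = Integer.lowestOneBit(freeFlags); // smallest free bit
        freeFlags &= ~m;                         // mark it as used
        return m;
    }

    void free(int mask) {
        freeFlags |= mask; // bit becomes available again
    }
}
```

Because allocation always takes the lowest free bit, a freed flag is handed out again on the next allocation, while `newFlag` still wraps it in a fresh `RevFlag` object.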
- /**
- * Automatically carry a flag from a child commit to its parents.
- * <p>
- * A carried flag is copied from the child commit onto its parents when the
- * child commit is popped from the lowest level of the walk's internal graph.
- *
- * @param flag
- * the flag to carry onto parents, if set on a descendant.
- */
- public void carry(final RevFlag flag) {
- if ((freeFlags & flag.mask) != 0)
- throw new IllegalArgumentException(MessageFormat.format(JGitText.get().flagIsDisposed, flag.name));
- if (flag.walker != this)
- throw new IllegalArgumentException(MessageFormat.format(JGitText.get().flagNotFromThis, flag.name));
- carryFlags |= flag.mask;
- }
-
- /**
- * Automatically carry flags from a child commit to its parents.
- * <p>
- * A carried flag is copied from the child commit onto its parents when the
- * child commit is popped from the lowest level of the walk's internal graph.
- *
- * @param set
- * the flags to carry onto parents, if set on a descendant.
- */
- public void carry(final Collection<RevFlag> set) {
- for (final RevFlag flag : set)
- carry(flag);
- }
-
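Carrying a flag means OR-ing the carried bits that are set on a popped child into its parents, as `carryFlagsImpl` does with `c.flags & carryFlags`. The sketch below is a simplified model with hypothetical `CarryCommit` and `CarryWalk` classes: it only copies onto direct parents at pop time, matching the Javadoc's description, and does not attempt to reproduce whatever further propagation JGit's own `RevCommit.carryFlags` performs.

```java
// Hypothetical commit node: flag bits plus direct parents.
final class CarryCommit {
    int flags;
    CarryCommit[] parents;

    CarryCommit(CarryCommit... parents) {
        this.parents = parents;
    }
}

final class CarryWalk {
    private int carryFlags; // bits copied from a popped child to its parents

    void carry(int mask) {
        carryFlags |= mask;
    }

    // Called when a commit is popped from the walk: copy the carried bits
    // that are set on the child onto each direct parent.
    void onPop(CarryCommit c) {
        int carry = c.flags & carryFlags;
        if (carry == 0)
            return;
        for (CarryCommit p : c.parents)
            p.flags |= carry;
    }
}
```

Non-carried bits on the child (bit 0 in a child flagged `0b101` when only `0b100` is carried) stay on the child; a grandparent receives the bit only once its own child is popped.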
- /**
- * Allow a flag to be recycled for a different use.
- * <p>
- * Recycled flags always come back as a different Java object instance when
- * assigned again by {@link #newFlag(String)}.
- * <p>
- * If the flag was previously being carried, the carrying request is
- * removed. Disposing of a carried flag while a traversal is in progress has
- * undefined behavior.
- *
- * @param flag
- * the flag to recycle.
- */
- public void disposeFlag(final RevFlag flag) {
- freeFlag(flag.mask);
- }
-
- void freeFlag(final int mask) {
- if (isNotStarted()) {
- freeFlags |= mask;
- carryFlags &= ~mask;
- } else {
- delayFreeFlags |= mask;
- }
- }
-
- private void finishDelayedFreeFlags() {
- if (delayFreeFlags != 0) {
- freeFlags |= delayFreeFlags;
- carryFlags &= ~delayFreeFlags;
- delayFreeFlags = 0;
- }
- }
-
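`freeFlag` and `finishDelayedFreeFlags` implement deferred release: while a traversal is running, objects already in the walk may still have the bit set, so the bit is parked in `delayFreeFlags` and only returned to the pool (and removed from the carry mask) on the next reset. A minimal model of that bookkeeping follows; the `FlagBook` class and its `started` field (standing in for `!isNotStarted()`) are hypothetical.

```java
// Hypothetical model of the free/carry/delayed-free bookkeeping.
final class FlagBook {
    private int freeFlags;
    private int carryFlags;
    private int delayFreeFlags;
    private boolean started; // stands in for !isNotStarted()

    FlagBook(int freeFlags, int carryFlags) {
        this.freeFlags = freeFlags;
        this.carryFlags = carryFlags;
    }

    void start() {
        started = true;
    }

    void freeFlag(int mask) {
        if (!started) {
            // Idle: the bit can be recycled immediately.
            freeFlags |= mask;
            carryFlags &= ~mask;
        } else {
            // Walking: objects may still carry this bit, so defer the release.
            delayFreeFlags |= mask;
        }
    }

    // Mirrors finishDelayedFreeFlags() followed by returning to the idle state.
    void reset() {
        if (delayFreeFlags != 0) {
            freeFlags |= delayFreeFlags;
            carryFlags &= ~delayFreeFlags;
            delayFreeFlags = 0;
        }
        started = false;
    }

    int freeFlags() {
        return freeFlags;
    }

    int carryFlags() {
        return carryFlags;
    }
}
```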
- /**
- * Resets internal state and allows this instance to be used again.
- * <p>
- * Unlike {@link #dispose()}, previously acquired RevObject (and RevCommit)
- * instances are not invalidated. RevFlag instances are not invalidated, but
- * are removed from all RevObjects.
- */
- public final void reset() {
- reset(0);
- }
-
- /**
- * Resets internal state and allows this instance to be used again.
- * <p>
- * Unlike {@link #dispose()}, previously acquired RevObject (and RevCommit)
- * instances are not invalidated. RevFlag instances are not invalidated, but
- * are removed from all RevObjects.
- *
- * @param retainFlags
- * application flags that should <b>not</b> be cleared from
- * existing commit objects.
- */
- public final void resetRetain(final RevFlagSet retainFlags) {
- reset(retainFlags.mask);
- }
-
- /**
- * Resets internal state and allows this instance to be used again.
- * <p>
- * Unlike {@link #dispose()}, previously acquired RevObject (and RevCommit)
- * instances are not invalidated. RevFlag instances are not invalidated, but
- * are removed from all RevObjects.
- *
- * @param retainFlags
- * application flags that should <b>not</b> be cleared from
- * existing commit objects.
- */
- public final void resetRetain(final RevFlag... retainFlags) {
- int mask = 0;
- for (final RevFlag flag : retainFlags)
- mask |= flag.mask;
- reset(mask);
- }
-
- /**
- * Resets internal state and allows this instance to be used again.
- * <p>
- * Unlike {@link #dispose()}, previously acquired RevObject (and RevCommit)
- * instances are not invalidated. RevFlag instances are not invalidated, but
- * are removed from all RevObjects.
- *
- * @param retainFlags
- * application flags that should <b>not</b> be cleared from
- * existing commit objects.
- */
- protected void reset(int retainFlags) {
- finishDelayedFreeFlags();
- retainFlags |= PARSED;
- final int clearFlags = ~retainFlags;
-
- final FIFORevQueue q = new FIFORevQueue();
- for (final RevCommit c : roots) {
- if ((c.flags & clearFlags) == 0)
- continue;
- c.flags &= retainFlags;
- c.reset();
- q.add(c);
- }
-
- for (;;) {
- final RevCommit c = q.next();
- if (c == null)
- break;
- if (c.parents == null)
- continue;
- for (final RevCommit p : c.parents) {
- if ((p.flags & clearFlags) == 0)
- continue;
- p.flags &= retainFlags;
- p.reset();
- q.add(p);
- }
- }
-
- roots.clear();
- queue = new DateRevQueue();
- pending = new StartGenerator(this);
- }
-
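`reset(int)` clears flags with a breadth-first walk from the roots through parent links, always keeping `PARSED` so parse state survives. A commit whose flags have nothing left to clear is skipped, which also keeps the loop from revisiting commits reachable along several paths. The sketch below reproduces that traversal with hypothetical `WalkCommit` and `FlagReset` classes; treating bit 0 as `PARSED` is an assumption, and the per-commit `c.reset()` call from the original is omitted.

```java
import java.util.ArrayDeque;

// Hypothetical commit node; parents == null would mean "not parsed yet".
final class WalkCommit {
    int flags;
    WalkCommit[] parents;

    WalkCommit(WalkCommit... parents) {
        this.parents = parents;
    }
}

final class FlagReset {
    static final int PARSED = 1; // assumption: bit 0 marks parsed objects

    static void reset(int retainFlags, WalkCommit... roots) {
        retainFlags |= PARSED;             // parse state always survives
        final int clearFlags = ~retainFlags;
        ArrayDeque<WalkCommit> q = new ArrayDeque<>();
        for (WalkCommit c : roots) {
            if ((c.flags & clearFlags) == 0)
                continue;                  // nothing to clear on this root
            c.flags &= retainFlags;
            q.add(c);
        }
        WalkCommit cur;
        while ((cur = q.poll()) != null) {
            if (cur.parents == null)
                continue;                  // unparsed: no known parents
            for (WalkCommit p : cur.parents) {
                if ((p.flags & clearFlags) == 0)
                    continue;              // already clean, also avoids revisits
                p.flags &= retainFlags;
                q.add(p);
            }
        }
    }
}
```

For a chain `c -> b -> a` with `a.flags = 0b1101`, `b.flags = 0b101`, `c.flags = 0b10100`, resetting while retaining `0b1000` leaves `c` at 0, `b` at `PARSED` only, and `a` at `PARSED | 0b1000`.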
- /**
- * Dispose all internal state and invalidate all RevObject instances.
- * <p>
- * All RevObject (and thus RevCommit, etc.) instances previously acquired
- * from this RevWalk are invalidated by a dispose call. Applications must
- * not retain or use RevObject instances obtained prior to the dispose call.
- * All RevFlag instances are also invalidated, and must not be reused.
- */
- public void dispose() {
- reader.release();
- freeFlags = APP_FLAGS;
- delayFreeFlags = 0;
- carryFlags = UNINTERESTING;
- objects.clear();
- roots.clear();
- queue = new DateRevQueue();
- pending = new StartGenerator(this);
- shallowCommitsInitialized = false;
- }
-
- /**
- * Returns an Iterator over the commits of this walker.
- * <p>
- * The returned iterator is only useful for one walk. If this RevWalk gets
- * reset, a new iterator must be obtained to walk over the new results.
- * <p>
- * Applications must not use both the Iterator and the {@link #next()} API
- * at the same time. Pick one API and use that for the entire walk.
- * <p>
- * If a checked exception is thrown during the walk (see {@link #next()})
- * it is rethrown from the Iterator as a {@link RevWalkException}.
- *
- * @return an iterator over this walker's commits.
- * @see RevWalkException
- */
- public Iterator<RevCommit> iterator() {
- final RevCommit first;
- try {
- first = RevWalk.this.next();
- } catch (IOException e) {
- // Also covers MissingObjectException and IncorrectObjectTypeException.
- throw new RevWalkException(e);
- }
-
- return new Iterator<RevCommit>() {
- RevCommit next = first;
-
- public boolean hasNext() {
- return next != null;
- }
-
- public RevCommit next() {
- try {
- final RevCommit r = next;
- next = RevWalk.this.next();
- return r;
- } catch (IOException e) {
- // Also covers MissingObjectException and IncorrectObjectTypeException.
- throw new RevWalkException(e);
- }
- }
-
- public void remove() {
- throw new UnsupportedOperationException();
- }
- };
- }
-
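`iterator()` adapts a source whose `next()` throws checked exceptions to `java.util.Iterator`, which cannot: it prefetches one element so `hasNext()` never does I/O, and wraps failures in an unchecked exception. The generic sketch below shows the same pattern. `CheckedSource` and `CheckedIterators` are hypothetical names, and `java.io.UncheckedIOException` is swapped in for JGit's `RevWalkException`. Unlike the original, the sketch throws `NoSuchElementException` once exhausted, as the `Iterator` contract requires.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical source standing in for RevWalk.next(); returns null when done.
interface CheckedSource<T> {
    T next() throws IOException;
}

final class CheckedIterators {
    static <T> Iterator<T> of(CheckedSource<T> src) {
        final T first;
        try {
            first = src.next(); // prefetch so hasNext() needs no I/O
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<T>() {
            T next = first;

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public T next() {
                if (next == null)
                    throw new NoSuchElementException();
                T r = next;
                try {
                    next = src.next(); // fetch the element after r
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return r;
            }
        };
    }
}
```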
- /** Throws an exception if we have started producing output. */
- protected void assertNotStarted() {
- if (isNotStarted())
- return;
- throw new IllegalStateException(JGitText.get().outputHasAlreadyBeenStarted);
- }
-
- private boolean isNotStarted() {
- return pending instanceof StartGenerator;
- }
-
- /**
- * Create and return an {@link ObjectWalk} using the same objects.
- * <p>
- * Prior to using this method, the caller must reset this RevWalk to clean
- * any flags that were used during the last traversal.
- * <p>
- * The returned ObjectWalk uses the same ObjectReader, internal object pool,
- * and free RevFlags. Once the ObjectWalk is created, this RevWalk should
- * not be used anymore.
- *
- * @return a new walk, using the exact same object pool.
- */
- public ObjectWalk toObjectWalkWithSameObjects() {
- ObjectWalk ow = new ObjectWalk(reader);
- RevWalk rw = ow;
- rw.objects = objects;
- rw.freeFlags = freeFlags;
- return ow;
- }
-
- /**
- * Construct a new unparsed commit for the given object.
- *
- * @param id
- * the object this walker requires a commit reference for.
- * @return a new unparsed reference for the object.
- */
- protected RevCommit createCommit(final AnyObjectId id) {
- return new RevCommit(id);
- }
-
- void carryFlagsImpl(final RevCommit c) {
- final int carry = c.flags & carryFlags;
- if (carry != 0)
- RevCommit.carryFlags(c, carry);
- }
-
- void initializeShallowCommits() throws IOException {
- if (shallowCommitsInitialized)
- throw new IllegalStateException(
- JGitText.get().shallowCommitsAlreadyInitialized);
-
- shallowCommitsInitialized = true;
-
- if (reader == null)
- return;
-
- for (ObjectId id : reader.getShallowCommits())
- lookupCommit(id).parents = RevCommit.NO_PARENTS;
- }
- }
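`initializeShallowCommits()` marks each commit reported as shallow by giving it an empty parent array, so a traversal that follows parents stops there; a second call is rejected. The sketch below models this with a hypothetical `ShallowPool` keyed by string ids, where the `Set<String>` argument stands in for what `reader.getShallowCommits()` supplies. It does not model how a later parse of the commit treats the preset parent array.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical commit with a lazily filled parent array.
final class ShallowCommit {
    static final ShallowCommit[] NO_PARENTS = {};
    ShallowCommit[] parents; // null until known
}

final class ShallowPool {
    private final Map<String, ShallowCommit> objects = new HashMap<>();
    private boolean shallowCommitsInitialized;

    // One shared instance per id, like RevWalk's object pool.
    ShallowCommit lookupCommit(String id) {
        return objects.computeIfAbsent(id, k -> new ShallowCommit());
    }

    void initializeShallowCommits(Set<String> shallowIds) {
        if (shallowCommitsInitialized)
            throw new IllegalStateException("shallow commits already initialized");
        shallowCommitsInitialized = true;
        for (String id : shallowIds)
            lookupCommit(id).parents = ShallowCommit.NO_PARENTS; // history ends here
    }
}
```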