Implement async/batch lookup of object data
An ObjectReader implementation may be very slow for a single object,
yet still support bulk queries efficiently by batching multiple small
requests into a single larger request. This happens easily when the
reader is built on top of a database stored on another host, where
the network round-trip time starts to dominate the operation cost.
RevWalk, ObjectWalk, UploadPack, and PackWriter are the first major
users of this new bulk interface; the goal is to pack a repository
efficiently for a fetch/clone client when the source repository is
stored in a high-latency storage system.
Processing the want/have lists is now done in bulk, removing the
high costs associated with common ancestor negotiation.
PackWriter already performs object reuse selection in bulk; it can
now also run the object size lookup and object counting phases more
efficiently. Actual object reuse, deltification, and final output
still perform sequential lookups, which keeps them somewhat more
expensive.
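As a rough illustration of the bulk interface, a caller that used to
parse objects one at a time can queue many ids and drain the results
in one pass. This is only a hedged sketch: it leans on the
AsyncRevObjectQueue type imported by UploadPack below, and the exact
method names and signatures may differ between JGit versions.

  import org.eclipse.jgit.lib.ObjectId;
  import org.eclipse.jgit.revwalk.AsyncRevObjectQueue;
  import org.eclipse.jgit.revwalk.RevObject;
  import org.eclipse.jgit.revwalk.RevWalk;

  // Sketch: resolve many ids with one batched request instead of one
  // round trip per object. Verify the API against your JGit version.
  static void parseInBulk(RevWalk walk, Iterable<ObjectId> ids)
      throws java.io.IOException {
    AsyncRevObjectQueue q = walk.parseAny(ids, true /* report missing */);
    try {
      RevObject o;
      while ((o = q.next()) != null) {
        // o is now parsed; apply flags or record its type here.
      }
    } finally {
      q.release();
    }
  }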
Change-Id: I4c966f84917482598012074c370b9831451404ee
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
14 years ago

RefAdvertiser: Avoid object parsing
It isn't strictly necessary to validate that every reference's target
object is present in the repository before advertising it to a
client. This is an expensive operation when there are thousands of
references, and it's very unlikely that a reference uses a missing
object, because garbage collection proceeds from the references and
walks down through the graph. So trying to hide a dangling reference
from clients is relatively pointless.
Even if we are trying to avoid giving a client a corrupt repository,
this simple check isn't sufficient. It is possible for a reference to
point to a valid commit whose root tree is missing a blob. This can
happen by staging a file into the index, waiting several weeks, then
committing that file while racing against a prune. The prune may
delete the blob, since its modification time is more than two weeks
old, but retain the commit, since it was written just now.
Such graph corruption is already caught by PackWriter as it
enumerates the graph from the client's want list and digs back to the
roots or common base. Leave reference validation to that same phase,
where we already have to parse the object to support the enumeration.
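Concretely, the check being dropped amounted to roughly one object
lookup per advertised reference. A hedged sketch of that cost (not
the exact removed code; the types mirror those used elsewhere in this
change):

  // Old behaviour, in effect: touch the object database once per ref.
  for (Ref r : refs.values()) {
    walk.parseAny(r.getObjectId()); // disk or network hit for every ref
  }
  // New behaviour: advertise the raw ObjectId without parsing it; a
  // genuinely missing object is reported later when PackWriter walks
  // the graph from the client's want list.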
Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago

PackWriter: Support reuse of entire packs
The most expensive part of packing a repository for transport to
another system is enumerating all of the objects in the repository.
Once this gets to the size of the linux-2.6 repository (1.8 million
objects), enumeration can take several CPU minutes and consume a lot
of temporary working-set memory.
Teach PackWriter to efficiently reuse an existing "cached pack"
by answering a clone request with a thin pack followed by a larger
cached pack appended to the end. This requires the repository
owner to first construct the cached pack by hand, and record the
tip commits in $GIT_DIR/objects/info/cached-packs:
  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done
  (echo "+ $root";
   for n in $names; do echo "P $n"; done;
   echo) >>objects/info/cached-packs
  git repack -a -d
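Derived from the echo commands above, each record appended to
objects/info/cached-packs is a '+' line naming the tip commit, one
'P' line per pack name printed by pack-objects, and a terminating
blank line; the values here are placeholders, not real hashes:

  + <SHA-1 of $root>
  P <pack name>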
When a clone request needs to include $root, the corresponding
cached pack is copied as-is, sparing the server from enumerating all
of the objects reachable from $root.
For a linux-2.6 kernel repository that should be about 376 MiB,
the above process creates two packs of 368 MiB and 38 MiB[1].
This is a local disk usage increase of ~26 MiB, due to reduced
delta compression between the large cached pack and the smaller
recent activity pack. The overhead is similar to 1 full copy of
the compressed project sources.
With this cached pack in hand, the JGit daemon completes a clone
request in 1m17s less time, at the cost of a slightly larger data
transfer (+2.39 MiB):
Before:
  remote: Counting objects: 1861830, done
  remote: Finding sources: 100% (1861830/1861830)
  remote: Getting sizes: 100% (88243/88243)
  remote: Compressing objects: 100% (88184/88184)
  Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
  remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
  Resolving deltas: 100% (1564621/1564621), done.
  real 3m19.005s

After:
  remote: Counting objects: 1601, done
  remote: Counting objects: 1828460, done
  remote: Finding sources: 100% (50475/50475)
  remote: Getting sizes: 100% (18843/18843)
  remote: Compressing objects: 100% (7585/7585)
  remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
  Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
  Resolving deltas: 100% (1559477/1559477), done.
  real 2m2.938s
Repository owners can periodically refresh their cached packs by
repacking their repository, folding all newer objects into a larger
cached pack. Since repacking is already considered to be a normal
Git maintenance activity, this isn't a very big burden.
[1] In this test $root was set back about two weeks.
Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
13 years ago
- /*
- * Copyright (C) 2008-2010, Google Inc.
- * and other copyright owners as documented in the project's IP log.
- *
- * This program and the accompanying materials are made available
- * under the terms of the Eclipse Distribution License v1.0 which
- * accompanies this distribution, is reproduced below, and is
- * available at http://www.eclipse.org/org/documents/edl-v10.php
- *
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * - Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- *
- * - Redistributions in binary form must reproduce the above
- * copyright notice, this list of conditions and the following
- * disclaimer in the documentation and/or other materials provided
- * with the distribution.
- *
- * - Neither the name of the Eclipse Foundation, Inc. nor the
- * names of its contributors may be used to endorse or promote
- * products derived from this software without specific prior
- * written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
- * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
- * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
- * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
- * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
- * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
- * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
- * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
- * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
- package org.eclipse.jgit.transport;
-
- import java.io.EOFException;
- import java.io.IOException;
- import java.io.InputStream;
- import java.io.OutputStream;
- import java.text.MessageFormat;
- import java.util.ArrayList;
- import java.util.Collections;
- import java.util.HashSet;
- import java.util.List;
- import java.util.Map;
- import java.util.Set;
-
- import org.eclipse.jgit.JGitText;
- import org.eclipse.jgit.errors.CorruptObjectException;
- import org.eclipse.jgit.errors.IncorrectObjectTypeException;
- import org.eclipse.jgit.errors.MissingObjectException;
- import org.eclipse.jgit.errors.PackProtocolException;
- import org.eclipse.jgit.lib.Constants;
- import org.eclipse.jgit.lib.NullProgressMonitor;
- import org.eclipse.jgit.lib.ObjectId;
- import org.eclipse.jgit.lib.ProgressMonitor;
- import org.eclipse.jgit.lib.Ref;
- import org.eclipse.jgit.lib.Repository;
- import org.eclipse.jgit.revwalk.AsyncRevObjectQueue;
- import org.eclipse.jgit.revwalk.DepthWalk;
- import org.eclipse.jgit.revwalk.ObjectWalk;
- import org.eclipse.jgit.revwalk.RevCommit;
- import org.eclipse.jgit.revwalk.RevFlag;
- import org.eclipse.jgit.revwalk.RevFlagSet;
- import org.eclipse.jgit.revwalk.RevObject;
- import org.eclipse.jgit.revwalk.RevTag;
- import org.eclipse.jgit.revwalk.RevWalk;
- import org.eclipse.jgit.revwalk.filter.CommitTimeRevFilter;
- import org.eclipse.jgit.storage.pack.PackConfig;
- import org.eclipse.jgit.storage.pack.PackWriter;
- import org.eclipse.jgit.transport.BasePackFetchConnection.MultiAck;
- import org.eclipse.jgit.transport.RefAdvertiser.PacketLineOutRefAdvertiser;
- import org.eclipse.jgit.util.io.InterruptTimer;
- import org.eclipse.jgit.util.io.TimeoutInputStream;
- import org.eclipse.jgit.util.io.TimeoutOutputStream;
-
- /**
- * Implements the server side of a fetch connection, transmitting objects.
- */
- public class UploadPack {
- static final String OPTION_INCLUDE_TAG = BasePackFetchConnection.OPTION_INCLUDE_TAG;
-
- static final String OPTION_MULTI_ACK = BasePackFetchConnection.OPTION_MULTI_ACK;
-
- static final String OPTION_MULTI_ACK_DETAILED = BasePackFetchConnection.OPTION_MULTI_ACK_DETAILED;
-
- static final String OPTION_THIN_PACK = BasePackFetchConnection.OPTION_THIN_PACK;
-
- static final String OPTION_SIDE_BAND = BasePackFetchConnection.OPTION_SIDE_BAND;
-
- static final String OPTION_SIDE_BAND_64K = BasePackFetchConnection.OPTION_SIDE_BAND_64K;
-
- static final String OPTION_OFS_DELTA = BasePackFetchConnection.OPTION_OFS_DELTA;
-
- static final String OPTION_NO_PROGRESS = BasePackFetchConnection.OPTION_NO_PROGRESS;
-
- static final String OPTION_NO_DONE = BasePackFetchConnection.OPTION_NO_DONE;
-
- static final String OPTION_SHALLOW = BasePackFetchConnection.OPTION_SHALLOW;
-
- /** Policy the server uses to validate client requests */
- public static enum RequestPolicy {
- /** Client may only ask for objects the server advertised a reference for. */
- ADVERTISED,
- /** Client may ask for any commit reachable from a reference. */
- REACHABLE_COMMIT,
- /** Client may ask for any SHA-1 in the repository. */
- ANY;
- }
-
- /** Database we read the objects from. */
- private final Repository db;
-
- /** Revision traversal support over {@link #db}. */
- private final RevWalk walk;
-
- /** Configuration to pass into the PackWriter. */
- private PackConfig packConfig;
-
- /** Timeout in seconds to wait for client interaction. */
- private int timeout;
-
- /**
- * Is the client connection a bi-directional socket or pipe?
- * <p>
- * If true, this class assumes it can perform multiple read and write cycles
- * with the client over the input and output streams. This matches the
- * functionality available with a standard TCP/IP connection, or a local
- * operating system or in-memory pipe.
- * <p>
- * If false, this class runs in a read-everything-then-output-results mode,
- * making it suitable for single round-trip RPC systems such as HTTP.
- */
- private boolean biDirectionalPipe = true;
-
- /** Timer to manage {@link #timeout}. */
- private InterruptTimer timer;
-
- private InputStream rawIn;
-
- private OutputStream rawOut;
-
- private PacketLineIn pckIn;
-
- private PacketLineOut pckOut;
-
- /** The refs we advertised as existing at the start of the connection. */
- private Map<String, Ref> refs;
-
- /** Filter used while advertising the refs to the client. */
- private RefFilter refFilter;
-
- /** Hook handling the various upload phases. */
- private PreUploadHook preUploadHook = PreUploadHook.NULL;
-
- /** Capabilities requested by the client. */
- private final Set<String> options = new HashSet<String>();
-
- /** Raw ObjectIds the client has asked for, before validating them. */
- private final Set<ObjectId> wantIds = new HashSet<ObjectId>();
-
- /** Objects the client wants to obtain. */
- private final Set<RevObject> wantAll = new HashSet<RevObject>();
-
- /** Objects on both sides, these don't have to be sent. */
- private final Set<RevObject> commonBase = new HashSet<RevObject>();
-
- /** Shallow commits the client already has. */
- private final Set<ObjectId> clientShallowCommits = new HashSet<ObjectId>();
-
- /** Shallow commits on the client which are now becoming unshallow */
- private final List<ObjectId> unshallowCommits = new ArrayList<ObjectId>();
-
- /** Desired depth from the client on a shallow request. */
- private int depth;
-
- /** Commit time of the oldest common commit, in seconds. */
- private int oldestTime;
-
- /** null if {@link #commonBase} should be examined again. */
- private Boolean okToGiveUp;
-
- private boolean sentReady;
-
- /** Objects we sent in our advertisement list, clients can ask for these. */
- private Set<ObjectId> advertised;
-
- /** Marked on objects the client has asked us to give them. */
- private final RevFlag WANT;
-
- /** Marked on objects both we and the client have. */
- private final RevFlag PEER_HAS;
-
- /** Marked on objects in {@link #commonBase}. */
- private final RevFlag COMMON;
-
- /** Objects where we found a path from the want list to a common base. */
- private final RevFlag SATISFIED;
-
- private final RevFlagSet SAVE;
-
- private RequestPolicy requestPolicy = RequestPolicy.ADVERTISED;
-
- private MultiAck multiAck = MultiAck.OFF;
-
- private boolean noDone;
-
- private PackWriter.Statistics statistics;
-
- private UploadPackLogger logger;
-
- /**
- * Create a new pack upload for an open repository.
- *
- * @param copyFrom
- * the source repository.
- */
- public UploadPack(final Repository copyFrom) {
- db = copyFrom;
- walk = new RevWalk(db);
- walk.setRetainBody(false);
-
- WANT = walk.newFlag("WANT");
- PEER_HAS = walk.newFlag("PEER_HAS");
- COMMON = walk.newFlag("COMMON");
- SATISFIED = walk.newFlag("SATISFIED");
- walk.carry(PEER_HAS);
-
- SAVE = new RevFlagSet();
- SAVE.add(WANT);
- SAVE.add(PEER_HAS);
- SAVE.add(COMMON);
- SAVE.add(SATISFIED);
- refFilter = RefFilter.DEFAULT;
- }
-
- /** @return the repository this upload is reading from. */
- public final Repository getRepository() {
- return db;
- }
-
- /** @return the RevWalk instance used by this connection. */
- public final RevWalk getRevWalk() {
- return walk;
- }
-
- /** @return all refs which were advertised to the client. */
- public final Map<String, Ref> getAdvertisedRefs() {
- if (refs == null)
- setAdvertisedRefs(db.getAllRefs());
- return refs;
- }
-
- /**
- * @param allRefs
- * explicit set of references to claim as advertised by this
- * UploadPack instance. This overrides any references that
- * may exist in the source repository. The map is passed
- * to the configured {@link #getRefFilter()}.
- */
- public void setAdvertisedRefs(Map<String, Ref> allRefs) {
- refs = refFilter.filter(allRefs);
- }
-
- /** @return timeout (in seconds) before aborting an IO operation. */
- public int getTimeout() {
- return timeout;
- }
-
- /**
- * Set the timeout before willing to abort an IO call.
- *
- * @param seconds
- * number of seconds to wait (with no data transfer occurring)
- * before aborting an IO read or write operation with the
- * connected client.
- */
- public void setTimeout(final int seconds) {
- timeout = seconds;
- }
-
- /**
- * @return true if this class expects a bi-directional pipe opened between
- * the client and itself. The default is true.
- */
- public boolean isBiDirectionalPipe() {
- return biDirectionalPipe;
- }
-
- /**
- * @param twoWay
- * if true, this class will assume the socket is a fully
- * bidirectional pipe between the two peers and takes advantage
- * of that by first transmitting the known refs, then waiting to
- * read commands. If false, this class assumes it must read the
- * commands before writing output and does not perform the
- * initial advertising.
- */
- public void setBiDirectionalPipe(final boolean twoWay) {
- biDirectionalPipe = twoWay;
- if (!biDirectionalPipe && requestPolicy == RequestPolicy.ADVERTISED)
- requestPolicy = RequestPolicy.REACHABLE_COMMIT;
- }
-
- /** @return policy used by the service to validate client requests. */
- public RequestPolicy getRequestPolicy() {
- return requestPolicy;
- }
-
- /**
- * @param policy
- * the policy used to enforce validation of a client's want list.
- * By default the policy is {@link RequestPolicy#ADVERTISED},
- * which is the Git default requiring clients to only ask for an
- * object that a reference directly points to. This may be relaxed
- * to {@link RequestPolicy#REACHABLE_COMMIT} when callers
- * have {@link #setBiDirectionalPipe(boolean)} set to false.
- */
- public void setRequestPolicy(RequestPolicy policy) {
- requestPolicy = policy != null ? policy : RequestPolicy.ADVERTISED;
- }
-
- /** @return the filter used while advertising the refs to the client */
- public RefFilter getRefFilter() {
- return refFilter;
- }
-
- /**
- * Set the filter used while advertising the refs to the client.
- * <p>
- * Only refs allowed by this filter will be sent to the client. This can
- * be used by a server to restrict the list of references the client can
- * obtain through clone or fetch, effectively limiting the access to only
- * certain refs.
- *
- * @param refFilter
- * the filter; may be null to show all refs.
- */
- public void setRefFilter(final RefFilter refFilter) {
- this.refFilter = refFilter != null ? refFilter : RefFilter.DEFAULT;
- }
-
- /** @return the configured upload hook. */
- public PreUploadHook getPreUploadHook() {
- return preUploadHook;
- }
-
- /**
- * Set the hook that controls how this instance will behave.
- *
- * @param hook
- * the hook; if null no special actions are taken.
- */
- public void setPreUploadHook(PreUploadHook hook) {
- preUploadHook = hook != null ? hook : PreUploadHook.NULL;
- }
-
- /**
- * Set the configuration used by the pack generator.
- *
- * @param pc
- * configuration controlling packing parameters. If null the
- * source repository's settings will be used.
- */
- public void setPackConfig(PackConfig pc) {
- this.packConfig = pc;
- }
-
- /**
- * Set the logger.
- *
- * @param logger
- * the logger instance. If null, no logging occurs.
- */
- public void setLogger(UploadPackLogger logger) {
- this.logger = logger;
- }
-
- /**
- * Execute the upload task on the socket.
- *
- * @param input
- * raw input to read client commands from. Caller must ensure the
- * input is buffered, otherwise read performance may suffer.
- * @param output
- * response back to the Git network client, to write the pack
- * data onto. Caller must ensure the output is buffered,
- * otherwise write performance may suffer.
- * @param messages
- * secondary "notice" channel to send additional messages out
- * through. When run over SSH this should be tied back to the
- * standard error channel of the command execution. For most
- * other network connections this should be null.
- * @throws IOException
- */
- public void upload(final InputStream input, final OutputStream output,
- final OutputStream messages) throws IOException {
- try {
- rawIn = input;
- rawOut = output;
-
- if (timeout > 0) {
- final Thread caller = Thread.currentThread();
- timer = new InterruptTimer(caller.getName() + "-Timer");
- TimeoutInputStream i = new TimeoutInputStream(rawIn, timer);
- TimeoutOutputStream o = new TimeoutOutputStream(rawOut, timer);
- i.setTimeout(timeout * 1000);
- o.setTimeout(timeout * 1000);
- rawIn = i;
- rawOut = o;
- }
-
- pckIn = new PacketLineIn(rawIn);
- pckOut = new PacketLineOut(rawOut);
- service();
- } finally {
- walk.release();
- if (timer != null) {
- try {
- timer.terminate();
- } finally {
- timer = null;
- }
- }
- }
- }
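A minimal sketch of driving upload() from a server; the class name, the ServerSocket, and the Repository argument are assumptions, and the streams are buffered as the Javadoc above recommends.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.transport.UploadPack;

// Hypothetical wiring: serve one fetch/clone request over a raw TCP socket.
class SingleFetchServer {
	static void serveOnce(ServerSocket serverSocket, Repository repository)
			throws IOException {
		Socket s = serverSocket.accept();
		try {
			UploadPack up = new UploadPack(repository);
			up.setTimeout(30);
			up.upload(new BufferedInputStream(s.getInputStream()),
					new BufferedOutputStream(s.getOutputStream()),
					null); // no secondary "messages" channel outside SSH
		} finally {
			s.close();
		}
	}
}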
-
- /**
- * Get the PackWriter's statistics if a pack was sent to the client.
- *
- * @return statistics about pack output, if a pack was sent. Null if no pack
- * was sent, such as during the negotiation phase of a smart HTTP
- * connection, or if the client was already up-to-date.
- */
- public PackWriter.Statistics getPackStatistics() {
- return statistics;
- }
-
- private void service() throws IOException {
- if (biDirectionalPipe)
- sendAdvertisedRefs(new PacketLineOutRefAdvertiser(pckOut));
- else if (requestPolicy == RequestPolicy.ANY)
- advertised = Collections.emptySet();
- else {
- advertised = new HashSet<ObjectId>();
- for (Ref ref : getAdvertisedRefs().values()) {
- if (ref.getObjectId() != null)
- advertised.add(ref.getObjectId());
- }
- }
-
- boolean sendPack;
- try {
- recvWants();
- if (wantIds.isEmpty()) {
- preUploadHook.onBeginNegotiateRound(this, wantIds, 0);
- preUploadHook.onEndNegotiateRound(this, wantIds, 0, 0, false);
- return;
- }
-
- if (options.contains(OPTION_MULTI_ACK_DETAILED)) {
- multiAck = MultiAck.DETAILED;
- noDone = options.contains(OPTION_NO_DONE);
- } else if (options.contains(OPTION_MULTI_ACK))
- multiAck = MultiAck.CONTINUE;
- else
- multiAck = MultiAck.OFF;
-
- if (depth != 0)
- processShallow();
- sendPack = negotiate();
- } catch (PackProtocolException err) {
- reportErrorDuringNegotiate(err.getMessage());
- throw err;
-
- } catch (UploadPackMayNotContinueException err) {
- if (!err.isOutput() && err.getMessage() != null) {
- try {
- pckOut.writeString("ERR " + err.getMessage() + "\n");
- err.setOutput();
- } catch (Throwable err2) {
- // Ignore this secondary failure (and do not mark the error as output).
- }
- }
- throw err;
-
- } catch (IOException err) {
- reportErrorDuringNegotiate(JGitText.get().internalServerError);
- throw err;
- } catch (RuntimeException err) {
- reportErrorDuringNegotiate(JGitText.get().internalServerError);
- throw err;
- } catch (Error err) {
- reportErrorDuringNegotiate(JGitText.get().internalServerError);
- throw err;
- }
-
- if (sendPack)
- sendPack();
- }
-
- private void reportErrorDuringNegotiate(String msg) {
- try {
- pckOut.writeString("ERR " + msg + "\n");
- } catch (Throwable err) {
- // Ignore this secondary failure.
- }
- }
-
- private void processShallow() throws IOException {
- DepthWalk.RevWalk depthWalk =
- new DepthWalk.RevWalk(walk.getObjectReader(), depth);
-
- // Find all the commits which will be shallow
- for (ObjectId o : wantIds) {
- try {
- depthWalk.markRoot(depthWalk.parseCommit(o));
- } catch (IncorrectObjectTypeException notCommit) {
- // Ignore non-commits in this loop.
- }
- }
-
- RevCommit o;
- while ((o = depthWalk.next()) != null) {
- DepthWalk.Commit c = (DepthWalk.Commit) o;
-
- // Commits at the boundary which aren't already shallow in
- // the client need to be marked as such
- if (c.getDepth() == depth && !clientShallowCommits.contains(c))
- pckOut.writeString("shallow " + o.name());
-
- // Commits not on the boundary which are shallow in the client
- // need to become unshallowed
- if (c.getDepth() < depth && clientShallowCommits.contains(c)) {
- unshallowCommits.add(c.copy());
- pckOut.writeString("unshallow " + c.name());
- }
- }
-
- pckOut.end();
- }
-
- /**
- * Generate an advertisement of available refs and capabilities.
- *
- * @param adv
- * the advertisement formatter.
- * @throws IOException
- * the formatter failed to write an advertisement.
- * @throws UploadPackMayNotContinueException
- * the hook denied advertisement.
- */
- public void sendAdvertisedRefs(final RefAdvertiser adv) throws IOException,
- UploadPackMayNotContinueException {
- try {
- preUploadHook.onPreAdvertiseRefs(this);
- } catch (UploadPackMayNotContinueException fail) {
- if (fail.getMessage() != null) {
- adv.writeOne("ERR " + fail.getMessage());
- fail.setOutput();
- }
- throw fail;
- }
-
- adv.init(db);
- adv.advertiseCapability(OPTION_INCLUDE_TAG);
- adv.advertiseCapability(OPTION_MULTI_ACK_DETAILED);
- adv.advertiseCapability(OPTION_MULTI_ACK);
- adv.advertiseCapability(OPTION_OFS_DELTA);
- adv.advertiseCapability(OPTION_SIDE_BAND);
- adv.advertiseCapability(OPTION_SIDE_BAND_64K);
- adv.advertiseCapability(OPTION_THIN_PACK);
- adv.advertiseCapability(OPTION_NO_PROGRESS);
- adv.advertiseCapability(OPTION_SHALLOW);
- if (!biDirectionalPipe)
- adv.advertiseCapability(OPTION_NO_DONE);
- adv.setDerefTags(true);
- advertised = adv.send(getAdvertisedRefs());
- adv.end();
- }
-
- private void recvWants() throws IOException {
- boolean isFirst = true;
- for (;;) {
- String line;
- try {
- line = pckIn.readString();
- } catch (EOFException eof) {
- if (isFirst)
- break;
- throw eof;
- }
-
- if (line == PacketLineIn.END)
- break;
-
- if (line.startsWith("deepen ")) {
- depth = Integer.parseInt(line.substring(7));
- continue;
- }
-
- if (line.startsWith("shallow ")) {
- clientShallowCommits.add(ObjectId.fromString(line.substring(8)));
- continue;
- }
-
- if (!line.startsWith("want ") || line.length() < 45)
- throw new PackProtocolException(MessageFormat.format(JGitText.get().expectedGot, "want", line));
-
- if (isFirst && line.length() > 45) {
- String opt = line.substring(45);
- if (opt.startsWith(" "))
- opt = opt.substring(1);
- for (String c : opt.split(" "))
- options.add(c);
- line = line.substring(0, 45);
- }
-
- wantIds.add(ObjectId.fromString(line.substring(5)));
- isFirst = false;
- }
- }
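For orientation, this is the general shape of a first request that recvWants() accepts, one command per pkt-line (the framing is handled by PacketLineIn, the object ids are placeholders, and the capability list rides only on the first want line):

want <40-hex-id> multi_ack_detailed side-band-64k thin-pack ofs-delta
want <40-hex-id>
shallow <40-hex-id>
deepen 1
<flush-pkt ends the command list>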
-
- private boolean negotiate() throws IOException {
- okToGiveUp = Boolean.FALSE;
-
- ObjectId last = ObjectId.zeroId();
- List<ObjectId> peerHas = new ArrayList<ObjectId>(64);
- for (;;) {
- String line;
- try {
- line = pckIn.readString();
- } catch (EOFException eof) {
- // EOF on a stateless RPC (aka smart HTTP) shallow request means the client
- // asked only for the updated shallow/unshallow data, disconnected, and will
- // try another request with the actual want/have lines.
- // Don't report the EOF here; it's a bug in the protocol that the client
- // just disconnects without sending an END.
- if (!biDirectionalPipe && depth > 0)
- return false;
- throw eof;
- }
-
- if (line == PacketLineIn.END) {
- last = processHaveLines(peerHas, last);
- if (commonBase.isEmpty() || multiAck != MultiAck.OFF)
- pckOut.writeString("NAK\n");
- if (noDone && sentReady) {
- pckOut.writeString("ACK " + last.name() + "\n");
- return true;
- }
- if (!biDirectionalPipe)
- return false;
- pckOut.flush();
-
- } else if (line.startsWith("have ") && line.length() == 45) {
- peerHas.add(ObjectId.fromString(line.substring(5)));
-
- } else if (line.equals("done")) {
- last = processHaveLines(peerHas, last);
-
- if (commonBase.isEmpty())
- pckOut.writeString("NAK\n");
-
- else if (multiAck != MultiAck.OFF)
- pckOut.writeString("ACK " + last.name() + "\n");
-
- return true;
-
- } else {
- throw new PackProtocolException(MessageFormat.format(JGitText.get().expectedGot, "have", line));
- }
- }
- }
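For reference, these are the acknowledgement lines negotiate() and processHaveLines() can emit, depending on the multi_ack mode the client selected (ids are placeholders):

NAK                  ends a round; with multi_ack off it also means no common object yet
ACK <id>             multi_ack off: the first common object discovered
ACK <id> continue    multi_ack: <id> is common, keep sending "have" lines
ACK <id> common      multi_ack_detailed: <id> is common
ACK <id> ready       multi_ack_detailed: the server could already create the pack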
-
- private ObjectId processHaveLines(List<ObjectId> peerHas, ObjectId last)
- throws IOException {
- preUploadHook.onBeginNegotiateRound(this, wantIds, peerHas.size());
- if (peerHas.isEmpty())
- return last;
-
- List<ObjectId> toParse = peerHas;
- HashSet<ObjectId> peerHasSet = null;
- boolean needMissing = false;
- sentReady = false;
-
- if (wantAll.isEmpty() && !wantIds.isEmpty()) {
- // We have not yet parsed the want list. Parse it now.
- peerHasSet = new HashSet<ObjectId>(peerHas);
- int cnt = wantIds.size() + peerHasSet.size();
- toParse = new ArrayList<ObjectId>(cnt);
- toParse.addAll(wantIds);
- toParse.addAll(peerHasSet);
- needMissing = true;
- }
-
- Set<RevObject> notAdvertisedWants = null;
- int haveCnt = 0;
- AsyncRevObjectQueue q = walk.parseAny(toParse, needMissing);
- try {
- for (;;) {
- RevObject obj;
- try {
- obj = q.next();
- } catch (MissingObjectException notFound) {
- ObjectId id = notFound.getObjectId();
- if (wantIds.contains(id)) {
- String msg = MessageFormat.format(
- JGitText.get().wantNotValid, id.name());
- throw new PackProtocolException(msg, notFound);
- }
- continue;
- }
- if (obj == null)
- break;
-
- // If the object is still in wantIds, the want list was not parsed
- // earlier and is being parsed as part of this batch.
- //
- if (wantIds.remove(obj)) {
- if (!advertised.contains(obj) && requestPolicy != RequestPolicy.ANY) {
- if (notAdvertisedWants == null)
- notAdvertisedWants = new HashSet<RevObject>();
- notAdvertisedWants.add(obj);
- }
-
- if (!obj.has(WANT)) {
- obj.add(WANT);
- wantAll.add(obj);
- }
-
- if (!(obj instanceof RevCommit))
- obj.add(SATISFIED);
-
- if (obj instanceof RevTag) {
- RevObject target = walk.peel(obj);
- if (target instanceof RevCommit) {
- if (!target.has(WANT)) {
- target.add(WANT);
- wantAll.add(target);
- }
- }
- }
-
- if (!peerHasSet.contains(obj))
- continue;
- }
-
- last = obj;
- haveCnt++;
-
- if (obj instanceof RevCommit) {
- RevCommit c = (RevCommit) obj;
- if (oldestTime == 0 || c.getCommitTime() < oldestTime)
- oldestTime = c.getCommitTime();
- }
-
- if (obj.has(PEER_HAS))
- continue;
-
- obj.add(PEER_HAS);
- if (obj instanceof RevCommit)
- ((RevCommit) obj).carry(PEER_HAS);
- addCommonBase(obj);
-
- // If both sides have the same object, let the client know.
- //
- switch (multiAck) {
- case OFF:
- if (commonBase.size() == 1)
- pckOut.writeString("ACK " + obj.name() + "\n");
- break;
- case CONTINUE:
- pckOut.writeString("ACK " + obj.name() + " continue\n");
- break;
- case DETAILED:
- pckOut.writeString("ACK " + obj.name() + " common\n");
- break;
- }
- }
- } finally {
- q.release();
- }
-
- // If the client asked for a non-advertised object, check our policy.
- if (notAdvertisedWants != null && !notAdvertisedWants.isEmpty()) {
- switch (requestPolicy) {
- case ADVERTISED:
- default:
- throw new PackProtocolException(MessageFormat.format(
- JGitText.get().wantNotValid,
- notAdvertisedWants.iterator().next().name()));
-
- case REACHABLE_COMMIT:
- checkNotAdvertisedWants(notAdvertisedWants);
- break;
-
- case ANY:
- // Allow whatever was asked for.
- break;
- }
- }
-
- int missCnt = peerHas.size() - haveCnt;
-
- // If we don't have one of the objects but we're also willing to
- // create a pack at this point, let the client know so it stops
- // telling us about its history.
- //
- boolean didOkToGiveUp = false;
- if (0 < missCnt) {
- for (int i = peerHas.size() - 1; i >= 0; i--) {
- ObjectId id = peerHas.get(i);
- if (walk.lookupOrNull(id) == null) {
- didOkToGiveUp = true;
- if (okToGiveUp()) {
- switch (multiAck) {
- case OFF:
- break;
- case CONTINUE:
- pckOut.writeString("ACK " + id.name() + " continue\n");
- break;
- case DETAILED:
- pckOut.writeString("ACK " + id.name() + " ready\n");
- sentReady = true;
- break;
- }
- }
- break;
- }
- }
- }
-
- if (multiAck == MultiAck.DETAILED && !didOkToGiveUp && okToGiveUp()) {
- ObjectId id = peerHas.get(peerHas.size() - 1);
- pckOut.writeString("ACK " + id.name() + " ready\n");
- sentReady = true;
- }
-
- preUploadHook.onEndNegotiateRound(this, wantAll, haveCnt, missCnt, sentReady);
- peerHas.clear();
- return last;
- }
-
- private void checkNotAdvertisedWants(Set<RevObject> notAdvertisedWants)
- throws MissingObjectException, IncorrectObjectTypeException, IOException {
- // Walk the requested commits back toward the advertised commits.
- // If the walk returns any commit, a branch was deleted or rewound and
- // the repository owner no longer exports that requested item.
- // If the requested commit is merged into an advertised branch it will
- // be marked UNINTERESTING and no commits will be returned.
-
- for (RevObject o : notAdvertisedWants) {
- if (!(o instanceof RevCommit)) {
- throw new PackProtocolException(MessageFormat.format(
- JGitText.get().wantNotValid,
- notAdvertisedWants.iterator().next().name()));
- }
- walk.markStart((RevCommit) o);
- }
-
- for (ObjectId id : advertised) {
- try {
- walk.markUninteresting(walk.parseCommit(id));
- } catch (IncorrectObjectTypeException notCommit) {
- continue;
- }
- }
-
- RevCommit bad = walk.next();
- if (bad != null) {
- throw new PackProtocolException(MessageFormat.format(
- JGitText.get().wantNotValid,
- bad.name()));
- }
- walk.reset();
- }
-
- private void addCommonBase(final RevObject o) {
- if (!o.has(COMMON)) {
- o.add(COMMON);
- commonBase.add(o);
- okToGiveUp = null;
- }
- }
-
- private boolean okToGiveUp() throws PackProtocolException {
- if (okToGiveUp == null)
- okToGiveUp = Boolean.valueOf(okToGiveUpImp());
- return okToGiveUp.booleanValue();
- }
-
- private boolean okToGiveUpImp() throws PackProtocolException {
- if (commonBase.isEmpty())
- return false;
-
- try {
- for (RevObject obj : wantAll) {
- if (!wantSatisfied(obj))
- return false;
- }
- return true;
- } catch (IOException e) {
- throw new PackProtocolException(JGitText.get().internalRevisionError, e);
- }
- }
-
- private boolean wantSatisfied(final RevObject want) throws IOException {
- if (want.has(SATISFIED))
- return true;
-
- walk.resetRetain(SAVE);
- walk.markStart((RevCommit) want);
- if (oldestTime != 0)
- walk.setRevFilter(CommitTimeRevFilter.after(oldestTime * 1000L));
- for (;;) {
- final RevCommit c = walk.next();
- if (c == null)
- break;
- if (c.has(PEER_HAS)) {
- addCommonBase(c);
- want.add(SATISFIED);
- return true;
- }
- }
- return false;
- }
-
- private void sendPack() throws IOException {
- final boolean sideband = options.contains(OPTION_SIDE_BAND)
- || options.contains(OPTION_SIDE_BAND_64K);
-
- if (!biDirectionalPipe) {
- // Ensure the request was fully consumed. Any remaining input must
- // be a protocol error. If we aren't at EOF the implementation is broken.
- int eof = rawIn.read();
- if (0 <= eof)
- throw new CorruptObjectException(MessageFormat.format(
- JGitText.get().expectedEOFReceived,
- "\\x" + Integer.toHexString(eof)));
- }
-
- if (sideband) {
- try {
- sendPack(true);
- } catch (UploadPackMayNotContinueException noPack) {
- // Already reported to the client inside sendPack(boolean); rethrow as-is.
- throw noPack;
- } catch (IOException err) {
- if (reportInternalServerErrorOverSideband())
- throw new UploadPackInternalServerErrorException(err);
- else
- throw err;
- } catch (RuntimeException err) {
- if (reportInternalServerErrorOverSideband())
- throw new UploadPackInternalServerErrorException(err);
- else
- throw err;
- } catch (Error err) {
- if (reportInternalServerErrorOverSideband())
- throw new UploadPackInternalServerErrorException(err);
- else
- throw err;
- }
- } else {
- sendPack(false);
- }
- }
-
- private boolean reportInternalServerErrorOverSideband() {
- try {
- SideBandOutputStream err = new SideBandOutputStream(
- SideBandOutputStream.CH_ERROR,
- SideBandOutputStream.SMALL_BUF,
- rawOut);
- err.write(Constants.encode(JGitText.get().internalServerError));
- err.flush();
- return true;
- } catch (Throwable cannotReport) {
- // Ignore the reason. This is a secondary failure.
- return false;
- }
- }
-
- private void sendPack(final boolean sideband) throws IOException {
- ProgressMonitor pm = NullProgressMonitor.INSTANCE;
- OutputStream packOut = rawOut;
- SideBandOutputStream msgOut = null;
-
- if (sideband) {
- int bufsz = SideBandOutputStream.SMALL_BUF;
- if (options.contains(OPTION_SIDE_BAND_64K))
- bufsz = SideBandOutputStream.MAX_BUF;
-
- packOut = new SideBandOutputStream(SideBandOutputStream.CH_DATA,
- bufsz, rawOut);
- if (!options.contains(OPTION_NO_PROGRESS)) {
- msgOut = new SideBandOutputStream(
- SideBandOutputStream.CH_PROGRESS, bufsz, rawOut);
- pm = new SideBandProgressMonitor(msgOut);
- }
- }
-
- try {
- if (wantAll.isEmpty()) {
- preUploadHook.onSendPack(this, wantIds, commonBase);
- } else {
- preUploadHook.onSendPack(this, wantAll, commonBase);
- }
- } catch (UploadPackMayNotContinueException noPack) {
- if (sideband && noPack.getMessage() != null) {
- noPack.setOutput();
- SideBandOutputStream err = new SideBandOutputStream(
- SideBandOutputStream.CH_ERROR,
- SideBandOutputStream.SMALL_BUF, rawOut);
- err.write(Constants.encode(noPack.getMessage()));
- err.flush();
- }
- throw noPack;
- }
-
- PackConfig cfg = packConfig;
- if (cfg == null)
- cfg = new PackConfig(db);
- final PackWriter pw = new PackWriter(cfg, walk.getObjectReader());
- try {
- pw.setUseCachedPacks(true);
- pw.setReuseDeltaCommits(true);
- pw.setDeltaBaseAsOffset(options.contains(OPTION_OFS_DELTA));
- pw.setThin(options.contains(OPTION_THIN_PACK));
- pw.setReuseValidatingObjects(false);
-
- if (commonBase.isEmpty() && refs != null) {
- Set<ObjectId> tagTargets = new HashSet<ObjectId>();
- for (Ref ref : refs.values()) {
- if (ref.getPeeledObjectId() != null)
- tagTargets.add(ref.getPeeledObjectId());
- else if (ref.getObjectId() == null)
- continue;
- else if (ref.getName().startsWith(Constants.R_HEADS))
- tagTargets.add(ref.getObjectId());
- }
- pw.setTagTargets(tagTargets);
- }
-
- if (depth > 0)
- pw.setShallowPack(depth, unshallowCommits);
-
- RevWalk rw = walk;
- if (wantAll.isEmpty()) {
- pw.preparePack(pm, wantIds, commonBase);
- } else {
- walk.reset();
-
- ObjectWalk ow = walk.toObjectWalkWithSameObjects();
- pw.preparePack(pm, ow, wantAll, commonBase);
- rw = ow;
- }
-
- if (options.contains(OPTION_INCLUDE_TAG) && refs != null) {
- for (Ref ref : refs.values()) {
- ObjectId objectId = ref.getObjectId();
-
- // If the object was already requested, skip it.
- if (wantAll.isEmpty()) {
- if (wantIds.contains(objectId))
- continue;
- } else {
- RevObject obj = rw.lookupOrNull(objectId);
- if (obj != null && obj.has(WANT))
- continue;
- }
-
- if (!ref.isPeeled())
- ref = db.peel(ref);
-
- ObjectId peeledId = ref.getPeeledObjectId();
- if (peeledId == null)
- continue;
-
- objectId = ref.getObjectId();
- if (pw.willInclude(peeledId) && !pw.willInclude(objectId))
- pw.addObject(rw.parseAny(objectId));
- }
- }
-
- pw.writePack(pm, NullProgressMonitor.INSTANCE, packOut);
- statistics = pw.getStatistics();
-
- if (msgOut != null) {
- String msg = pw.getStatistics().getMessage() + '\n';
- msgOut.write(Constants.encode(msg));
- msgOut.flush();
- }
-
- } finally {
- pw.release();
- }
-
- if (sideband)
- pckOut.end();
-
- if (logger != null && statistics != null)
- logger.onPackStatistics(statistics);
- }
- }