
PackWriterTest.java 30KB

Shallow fetch: Respect "shallow" lines

When fetching from a shallow clone, the client sends "have" lines to tell the server about objects it already has and "shallow" lines to tell where its local history terminates. In some circumstances, the server fails to honor the shallow lines and fails to return objects that the client needs.

UploadPack passes the "have" lines to PackWriter so PackWriter can omit them from the generated pack. UploadPack processes "shallow" lines by calling RevWalk.assumeShallow() with the set of shallow commits. RevWalk creates and caches RevCommits for these shallow commits, clearing out their parents. That way, walks correctly terminate at the shallow commits instead of assuming the client has history going back behind them. UploadPack converts its RevWalk to an ObjectWalk, maintaining the cached RevCommits, and passes it to PackWriter.

Unfortunately, to support shallow fetches PackWriter does the following:

  if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk))
    walk = new DepthWalk.ObjectWalk(reader, depth);

That is, when the client sends a "deepen" line (fetch --depth=<n>) and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter throws away the RevWalk that was passed in and makes a new one. The cleared parent lists prepared by RevWalk.assumeShallow() are lost.

Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk. It tries to create one by calling toObjectWalkWithSameObjects() on a DepthWalk.RevWalk. But that doesn't work: because DepthWalk.RevWalk does not override the standard RevWalk#toObjectWalkWithSameObjects implementation, the result is a plain ObjectWalk instead of an instance of DepthWalk.ObjectWalk.

The result is that the "shallow" information is thrown away and objects reachable from the shallow commits can be omitted from the pack sent when fetching with --depth from a shallow clone.

Multiple factors collude to limit the circumstances under which this bug can be observed:

1. Commits with depth != 0 don't enter DepthGenerator's pending queue. That means a "have" cannot have any effect on DepthGenerator unless it is also a "want".

2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the uninteresting flag is not propagated to ancestors there even if a "have" is also a "want".

3. JGit treats a depth of 1 as "1 past the wants".

Because of (2), the only place the UNINTERESTING flag can leak to a shallow commit's parents is in the carryFlags() call from markUninteresting(). carryFlags() only traverses commits that have already been parsed: commits yet to be parsed are supposed to inherit correct flags from their parent in PendingGenerator#next (which doesn't happen here; that is (2)). So the list of commits that have already been parsed becomes relevant.

When we hit the markUninteresting() call, all "want"s, "have"s, and commits to be unshallowed have been parsed. carryFlags() only affects those parsed commits. If the "want" is a direct parent of a "have", then carryFlags() marks it as uninteresting. If the "have" was also a "shallow", its parent pointer should have been null and the "want" shouldn't have been marked, so we see the bug. If the "want" is a more distant ancestor, then (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug. If the "shallow" is not also a "have", then the shallow commit isn't parsed, so (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug.

Here is a reproduction case (time flowing left to right, arrows pointing to parents). "C" must be a commit that the client reports as a "have" during negotiation. That can only happen if the server reports it as an existing branch or tag in the first round of negotiation:

  A <-- B <-- C <-- D

First do

  git clone --depth 1 <repo>

which yields D as a "have" and C as a "shallow" commit. Then try

  git fetch --depth 1 <repo> B:refs/heads/B

Negotiation sets up: have D, shallow C, have C, want B. But due to this bug B is marked as uninteresting and is not sent.

Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440
Signed-off-by: Terry Parker <tparker@google.com>
7 years ago
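The override pitfall described in the commit above can be illustrated with a self-contained sketch. The classes below are simplified stand-ins, not JGit's real RevWalk/DepthWalk types: without an override, the conversion method returns the base walk type, so PackWriter's `instanceof` check fails and the depth-aware walk is discarded.

```java
// Simplified model of the bug; these are illustrative stand-ins,
// not JGit's actual classes.
class ObjectWalk {}
class DepthObjectWalk extends ObjectWalk {}

class RevWalk {
    // Conversion helper that depth-aware subclasses need to override.
    ObjectWalk toObjectWalkWithSameObjects() {
        return new ObjectWalk(); // always produces the plain type
    }
}

// Buggy shape: no override, so conversion loses the depth-aware type.
class BuggyDepthRevWalk extends RevWalk {}

// Fixed shape: the override preserves the depth-aware type.
class FixedDepthRevWalk extends RevWalk {
    @Override
    ObjectWalk toObjectWalkWithSameObjects() {
        return new DepthObjectWalk();
    }
}

public class Main {
    public static void main(String[] args) {
        ObjectWalk buggy = new BuggyDepthRevWalk().toObjectWalkWithSameObjects();
        ObjectWalk fixed = new FixedDepthRevWalk().toObjectWalkWithSameObjects();
        // PackWriter's check: a plain ObjectWalk gets replaced (losing the
        // assumeShallow() state); a depth-aware walk would be kept.
        System.out.println(buggy instanceof DepthObjectWalk); // prints false
        System.out.println(fixed instanceof DepthObjectWalk); // prints true
    }
}
```

The fix in the real code is the same shape as `FixedDepthRevWalk`: make DepthWalk.RevWalk override toObjectWalkWithSameObjects() so the converted walk keeps its depth-aware type.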
Fix missing deltas near type boundaries

Delta search was discarding discovered deltas if an object appeared near a type boundary in the delta search window. This has caused JGit to produce larger pack files than other implementations of the packing algorithm.

Delta search works by pushing prior objects into a search window, an ordered list of objects to attempt to delta compress the next object against. (The window size is bounded, avoiding O(N^2) behavior.)

For implementation reasons multiple object types can appear in the input list, and in the window. PackWriter commonly passes both trees and blobs in the input list handed to the DeltaWindow algorithm. The pack file format requires an object to only delta compress against the same type, so the DeltaWindow algorithm must stop doing comparisons if a blob would be compared to a tree. Because the input list is sorted by object type and the window holds recently considered prior objects, once a wrong type is discovered in the window the search algorithm stops and uses the current result.

Unfortunately the termination condition was discarding any found delta by setting deltaBase and deltaBuf to null when it was trying to break the window search. When this bug occurs, the state of the DeltaWindow looks like this:

                                current
                                   |
                                  \ /
  input list:  tree0 tree1 blob1 blob2
  window:      blob1 tree1 tree0
                            / \
                             |
                         res.prev

As the loop iterates to the right across the window, it first finds that blob1 is a suitable delta base for blob2, and temporarily holds this in the bestDelta/deltaBuf fields. It then considers tree1, but tree1 has the wrong type (blob != tree), so the window loop must give up and fall through the remaining code. Moving the condition up and discarding the window contents allows the bestDelta/deltaBuf to be kept, letting the final file delta compress blob2 against blob1.

The impact of this bug (and its fix) on real world repositories is likely minimal. The boundary from blob to tree happens approximately once in the search, as the input list is sorted by type. Only the first window size worth of blobs (e.g. 10 or 250) were failing to produce a delta in the final file.

This bug fix does produce significantly different results for small test repositories created in the unit test suite, such as when a pack may contain 6 objects (2 commits, 2 trees, 2 blobs). Packing test cases can now better sample different output pack file sizes depending on delta compression and object reuse flags in PackConfig.

Change-Id: Ibec09398d0305d4dbc0c66fce1daaf38eb71148f
7 years ago
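The window-scan bug above can be modeled with a short self-contained sketch. The names and types here are hypothetical stand-ins for JGit's DeltaWindow internals, not the real implementation: the buggy loop clears the best candidate when it meets a wrong-typed object, while the fixed loop breaks out first and keeps it.

```java
import java.util.List;

// Simplified model of the DeltaWindow type-boundary bug; names are
// illustrative stand-ins, not JGit's actual DeltaWindow code.
public class DeltaScan {
    record Obj(String name, int type) {} // e.g. type 2 = tree, 3 = blob

    // Buggy: on a type mismatch, the already-found candidate is discarded.
    static Obj buggyScan(Obj target, List<Obj> window) {
        Obj best = null;
        for (Obj o : window) {
            if (o.type() != target.type()) {
                best = null; // wrong: throws away the delta already found
                break;
            }
            best = o; // model: treat any same-typed object as a viable base
        }
        return best;
    }

    // Fixed: stop scanning the window, but keep whatever was found.
    static Obj fixedScan(Obj target, List<Obj> window) {
        Obj best = null;
        for (Obj o : window) {
            if (o.type() != target.type())
                break; // give up on the window, keep `best`
            best = o;
        }
        return best;
    }

    public static void main(String[] args) {
        Obj blob2 = new Obj("blob2", 3);
        // Window as in the commit message: blob1 first, then trees.
        List<Obj> window = List.of(
            new Obj("blob1", 3), new Obj("tree1", 2), new Obj("tree0", 2));
        System.out.println(buggyScan(blob2, window));        // prints null
        System.out.println(fixedScan(blob2, window).name()); // prints blob1
    }
}
```

Both scans stop at tree1, but only the fixed version leaves with blob1 as the delta base, which is the behavior the commit restores.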
Fix missing deltas near type boundaries Delta search was discarding discovered deltas if an object appeared near a type boundary in the delta search window. This has caused JGit to produce larger pack files than other implementations of the packing algorithm. Delta search works by pushing prior objects into a search window, an ordered list of objects to attempt to delta compress the next object against. (The window size is bounded, avoiding O(N^2) behavior.) For implementation reasons multiple object types can appear in the input list, and the window. PackWriter commonly passes both trees and blobs in the input list handed to the DeltaWindow algorithm. The pack file format requires an object to only delta compress against the same type, so the DeltaWindow algorithm must stop doing comparisions if a blob would be compared to a tree. Because the input list is sorted by object type and the window is recently considered prior objects, once a wrong type is discovered in the window the search algorithm stops and uses the current result. Unfortunately the termination condition was discarding any found delta by setting deltaBase and deltaBuf to null when it was trying to break the window search. When this bug occurs, the state of the DeltaWindow looks like this: current | \ / input list: tree0 tree1 blob1 blob2 window: blob1 tree1 tree0 / \ | res.prev As the loop iterates to the right across the window, it first finds that blob1 is a suitable delta base for blob2, and temporarily holds this in the bestDelta/deltaBuf fields. It then considers tree1, but tree1 has the wrong type (blob != tree), so the window loop must give up and fall through the remaining code. Moving the condition up and discarding the window contents allows the bestDelta/deltaBuf to be kept, letting the final file delta compress blob1 against blob0. The impact of this bug (and its fix) on real world repositories is likely minimal. 
The boundary from blob to tree happens approximately once in the search, as the input list is sorted by type. Only the first window size worth of blobs (e.g. 10 or 250) were failing to produce a delta in the final file. This bug fix does produce significantly different results for small test repositories created in the unit test suite, such as when a pack may contains 6 objects (2 commits, 2 trees, 2 blobs). Packing test cases can now better sample different output pack file sizes depending on delta compression and object reuse flags in PackConfig. Change-Id: Ibec09398d0305d4dbc0c66fce1daaf38eb71148f
7 years ago
Shallow fetch: Respect "shallow" lines When fetching from a shallow clone, the client sends "have" lines to tell the server about objects it already has and "shallow" lines to tell where its local history terminates. In some circumstances, the server fails to honor the shallow lines and fails to return objects that the client needs. UploadPack passes the "have" lines to PackWriter so PackWriter can omit them from the generated pack. UploadPack processes "shallow" lines by calling RevWalk.assumeShallow() with the set of shallow commits. RevWalk creates and caches RevCommits for these shallow commits, clearing out their parents. That way, walks correctly terminate at the shallow commits instead of assuming the client has history going back behind them. UploadPack converts its RevWalk to an ObjectWalk, maintaining the cached RevCommits, and passes it to PackWriter. Unfortunately, to support shallow fetches the PackWriter does the following: if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk)) walk = new DepthWalk.ObjectWalk(reader, depth); That is, when the client sends a "deepen" line (fetch --depth=<n>) and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter throws away the RevWalk that was passed in and makes a new one. The cleared parent lists prepared by RevWalk.assumeShallow() are lost. Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk. It tries to create it by calling toObjectWalkWithSameObjects() on a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk does not override the standard RevWalk#toObjectWalkWithSameObjects implementation, the result is a plain ObjectWalk instead of an instance of DepthWalk.ObjectWalk. The result is that the "shallow" information is thrown away and objects reachable from the shallow commits can be omitted from the pack sent when fetching with --depth from a shallow clone. Multiple factors collude to limit the circumstances under which this bug can be observed: 1. 
Commits with depth != 0 don't enter DepthGenerator's pending queue. That means a "have" cannot have any effect on DepthGenerator unless it is also a "want". 2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the uninteresting flag is not propagated to ancestors there even if a "have" is also a "want". 3. JGit treats a depth of 1 as "1 past the wants". Because of (2), the only place the UNINTERESTING flag can leak to a shallow commit's parents is in the carryFlags() call from markUninteresting(). carryFlags() only traverses commits that have already been parsed: commits yet to be parsed are supposed to inherit correct flags from their parent in PendingGenerator#next (which doesn't happen here --- that is (2)). So the list of commits that have already been parsed becomes relevant. When we hit the markUninteresting() call, all "want"s, "have"s, and commits to be unshallowed have been parsed. carryFlags() only affects the parsed commits. If the "want" is a direct parent of a "have", then it carryFlags() marks it as uninteresting. If the "have" was also a "shallow", then its parent pointer should have been null and the "want" shouldn't have been marked, so we see the bug. If the "want" is a more distant ancestor then (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug. If the "shallow" is not also a "have" then the shallow commit isn't parsed so (2) keeps the uninteresting state from propagating to the "want so we don't see the bug. Here is a reproduction case (time flowing left to right, arrows pointing to parents). "C" must be a commit that the client reports as a "have" during negotiation. That can only happen if the server reports it as an existing branch or tag in the first round of negotiation: A <-- B <-- C <-- D First do git clone --depth 1 <repo> which yields D as a "have" and C as a "shallow" commit. Then try git fetch --depth 1 <repo> B:refs/heads/B Negotiation sets up: have D, shallow C, have C, want B. 
But due to this bug B is marked as uninteresting and is not sent. Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440 Signed-off-by: Terry Parker <tparker@google.com>
7 years ago
Shallow fetch: Respect "shallow" lines When fetching from a shallow clone, the client sends "have" lines to tell the server about objects it already has and "shallow" lines to tell where its local history terminates. In some circumstances, the server fails to honor the shallow lines and fails to return objects that the client needs. UploadPack passes the "have" lines to PackWriter so PackWriter can omit them from the generated pack. UploadPack processes "shallow" lines by calling RevWalk.assumeShallow() with the set of shallow commits. RevWalk creates and caches RevCommits for these shallow commits, clearing out their parents. That way, walks correctly terminate at the shallow commits instead of assuming the client has history going back behind them. UploadPack converts its RevWalk to an ObjectWalk, maintaining the cached RevCommits, and passes it to PackWriter. Unfortunately, to support shallow fetches the PackWriter does the following: if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk)) walk = new DepthWalk.ObjectWalk(reader, depth); That is, when the client sends a "deepen" line (fetch --depth=<n>) and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter throws away the RevWalk that was passed in and makes a new one. The cleared parent lists prepared by RevWalk.assumeShallow() are lost. Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk. It tries to create it by calling toObjectWalkWithSameObjects() on a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk does not override the standard RevWalk#toObjectWalkWithSameObjects implementation, the result is a plain ObjectWalk instead of an instance of DepthWalk.ObjectWalk. The result is that the "shallow" information is thrown away and objects reachable from the shallow commits can be omitted from the pack sent when fetching with --depth from a shallow clone. Multiple factors collude to limit the circumstances under which this bug can be observed: 1. 
Commits with depth != 0 don't enter DepthGenerator's pending queue. That means a "have" cannot have any effect on DepthGenerator unless it is also a "want". 2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the uninteresting flag is not propagated to ancestors there even if a "have" is also a "want". 3. JGit treats a depth of 1 as "1 past the wants". Because of (2), the only place the UNINTERESTING flag can leak to a shallow commit's parents is in the carryFlags() call from markUninteresting(). carryFlags() only traverses commits that have already been parsed: commits yet to be parsed are supposed to inherit correct flags from their parent in PendingGenerator#next (which doesn't happen here --- that is (2)). So the list of commits that have already been parsed becomes relevant. When we hit the markUninteresting() call, all "want"s, "have"s, and commits to be unshallowed have been parsed. carryFlags() only affects the parsed commits. If the "want" is a direct parent of a "have", then it carryFlags() marks it as uninteresting. If the "have" was also a "shallow", then its parent pointer should have been null and the "want" shouldn't have been marked, so we see the bug. If the "want" is a more distant ancestor then (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug. If the "shallow" is not also a "have" then the shallow commit isn't parsed so (2) keeps the uninteresting state from propagating to the "want so we don't see the bug. Here is a reproduction case (time flowing left to right, arrows pointing to parents). "C" must be a commit that the client reports as a "have" during negotiation. That can only happen if the server reports it as an existing branch or tag in the first round of negotiation: A <-- B <-- C <-- D First do git clone --depth 1 <repo> which yields D as a "have" and C as a "shallow" commit. Then try git fetch --depth 1 <repo> B:refs/heads/B Negotiation sets up: have D, shallow C, have C, want B. 
But due to this bug B is marked as uninteresting and is not sent. Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440 Signed-off-by: Terry Parker <tparker@google.com>
7 years ago
Shallow fetch: Respect "shallow" lines When fetching from a shallow clone, the client sends "have" lines to tell the server about objects it already has and "shallow" lines to tell where its local history terminates. In some circumstances, the server fails to honor the shallow lines and fails to return objects that the client needs. UploadPack passes the "have" lines to PackWriter so PackWriter can omit them from the generated pack. UploadPack processes "shallow" lines by calling RevWalk.assumeShallow() with the set of shallow commits. RevWalk creates and caches RevCommits for these shallow commits, clearing out their parents. That way, walks correctly terminate at the shallow commits instead of assuming the client has history going back behind them. UploadPack converts its RevWalk to an ObjectWalk, maintaining the cached RevCommits, and passes it to PackWriter. Unfortunately, to support shallow fetches the PackWriter does the following: if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk)) walk = new DepthWalk.ObjectWalk(reader, depth); That is, when the client sends a "deepen" line (fetch --depth=<n>) and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter throws away the RevWalk that was passed in and makes a new one. The cleared parent lists prepared by RevWalk.assumeShallow() are lost. Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk. It tries to create it by calling toObjectWalkWithSameObjects() on a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk does not override the standard RevWalk#toObjectWalkWithSameObjects implementation, the result is a plain ObjectWalk instead of an instance of DepthWalk.ObjectWalk. The result is that the "shallow" information is thrown away and objects reachable from the shallow commits can be omitted from the pack sent when fetching with --depth from a shallow clone. Multiple factors collude to limit the circumstances under which this bug can be observed: 1. 
Commits with depth != 0 don't enter DepthGenerator's pending queue. That means a "have" cannot have any effect on DepthGenerator unless it is also a "want". 2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the uninteresting flag is not propagated to ancestors there even if a "have" is also a "want". 3. JGit treats a depth of 1 as "1 past the wants". Because of (2), the only place the UNINTERESTING flag can leak to a shallow commit's parents is in the carryFlags() call from markUninteresting(). carryFlags() only traverses commits that have already been parsed: commits yet to be parsed are supposed to inherit correct flags from their parent in PendingGenerator#next (which doesn't happen here --- that is (2)). So the list of commits that have already been parsed becomes relevant. When we hit the markUninteresting() call, all "want"s, "have"s, and commits to be unshallowed have been parsed. carryFlags() only affects the parsed commits. If the "want" is a direct parent of a "have", then it carryFlags() marks it as uninteresting. If the "have" was also a "shallow", then its parent pointer should have been null and the "want" shouldn't have been marked, so we see the bug. If the "want" is a more distant ancestor then (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug. If the "shallow" is not also a "have" then the shallow commit isn't parsed so (2) keeps the uninteresting state from propagating to the "want so we don't see the bug. Here is a reproduction case (time flowing left to right, arrows pointing to parents). "C" must be a commit that the client reports as a "have" during negotiation. That can only happen if the server reports it as an existing branch or tag in the first round of negotiation: A <-- B <-- C <-- D First do git clone --depth 1 <repo> which yields D as a "have" and C as a "shallow" commit. Then try git fetch --depth 1 <repo> B:refs/heads/B Negotiation sets up: have D, shallow C, have C, want B. 
But due to this bug B is marked as uninteresting and is not sent. Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440 Signed-off-by: Terry Parker <tparker@google.com>
7 years ago
Shallow fetch: Respect "shallow" lines When fetching from a shallow clone, the client sends "have" lines to tell the server about objects it already has and "shallow" lines to tell where its local history terminates. In some circumstances, the server fails to honor the shallow lines and fails to return objects that the client needs. UploadPack passes the "have" lines to PackWriter so PackWriter can omit them from the generated pack. UploadPack processes "shallow" lines by calling RevWalk.assumeShallow() with the set of shallow commits. RevWalk creates and caches RevCommits for these shallow commits, clearing out their parents. That way, walks correctly terminate at the shallow commits instead of assuming the client has history going back behind them. UploadPack converts its RevWalk to an ObjectWalk, maintaining the cached RevCommits, and passes it to PackWriter. Unfortunately, to support shallow fetches the PackWriter does the following: if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk)) walk = new DepthWalk.ObjectWalk(reader, depth); That is, when the client sends a "deepen" line (fetch --depth=<n>) and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter throws away the RevWalk that was passed in and makes a new one. The cleared parent lists prepared by RevWalk.assumeShallow() are lost. Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk. It tries to create it by calling toObjectWalkWithSameObjects() on a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk does not override the standard RevWalk#toObjectWalkWithSameObjects implementation, the result is a plain ObjectWalk instead of an instance of DepthWalk.ObjectWalk. The result is that the "shallow" information is thrown away and objects reachable from the shallow commits can be omitted from the pack sent when fetching with --depth from a shallow clone. Multiple factors collude to limit the circumstances under which this bug can be observed: 1. 
Commits with depth != 0 don't enter DepthGenerator's pending queue. That means a "have" cannot have any effect on DepthGenerator unless it is also a "want". 2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the uninteresting flag is not propagated to ancestors there even if a "have" is also a "want". 3. JGit treats a depth of 1 as "1 past the wants". Because of (2), the only place the UNINTERESTING flag can leak to a shallow commit's parents is in the carryFlags() call from markUninteresting(). carryFlags() only traverses commits that have already been parsed: commits yet to be parsed are supposed to inherit correct flags from their parent in PendingGenerator#next (which doesn't happen here --- that is (2)). So the list of commits that have already been parsed becomes relevant. When we hit the markUninteresting() call, all "want"s, "have"s, and commits to be unshallowed have been parsed. carryFlags() only affects the parsed commits. If the "want" is a direct parent of a "have", then it carryFlags() marks it as uninteresting. If the "have" was also a "shallow", then its parent pointer should have been null and the "want" shouldn't have been marked, so we see the bug. If the "want" is a more distant ancestor then (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug. If the "shallow" is not also a "have" then the shallow commit isn't parsed so (2) keeps the uninteresting state from propagating to the "want so we don't see the bug. Here is a reproduction case (time flowing left to right, arrows pointing to parents). "C" must be a commit that the client reports as a "have" during negotiation. That can only happen if the server reports it as an existing branch or tag in the first round of negotiation: A <-- B <-- C <-- D First do git clone --depth 1 <repo> which yields D as a "have" and C as a "shallow" commit. Then try git fetch --depth 1 <repo> B:refs/heads/B Negotiation sets up: have D, shallow C, have C, want B. 
But due to this bug B is marked as uninteresting and is not sent. Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440 Signed-off-by: Terry Parker <tparker@google.com>
7 years ago
Shallow fetch: Respect "shallow" lines When fetching from a shallow clone, the client sends "have" lines to tell the server about objects it already has and "shallow" lines to tell where its local history terminates. In some circumstances, the server fails to honor the shallow lines and fails to return objects that the client needs. UploadPack passes the "have" lines to PackWriter so PackWriter can omit them from the generated pack. UploadPack processes "shallow" lines by calling RevWalk.assumeShallow() with the set of shallow commits. RevWalk creates and caches RevCommits for these shallow commits, clearing out their parents. That way, walks correctly terminate at the shallow commits instead of assuming the client has history going back behind them. UploadPack converts its RevWalk to an ObjectWalk, maintaining the cached RevCommits, and passes it to PackWriter. Unfortunately, to support shallow fetches the PackWriter does the following: if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk)) walk = new DepthWalk.ObjectWalk(reader, depth); That is, when the client sends a "deepen" line (fetch --depth=<n>) and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter throws away the RevWalk that was passed in and makes a new one. The cleared parent lists prepared by RevWalk.assumeShallow() are lost. Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk. It tries to create it by calling toObjectWalkWithSameObjects() on a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk does not override the standard RevWalk#toObjectWalkWithSameObjects implementation, the result is a plain ObjectWalk instead of an instance of DepthWalk.ObjectWalk. The result is that the "shallow" information is thrown away and objects reachable from the shallow commits can be omitted from the pack sent when fetching with --depth from a shallow clone. Multiple factors collude to limit the circumstances under which this bug can be observed: 1. 
Commits with depth != 0 don't enter DepthGenerator's pending queue. That means a "have" cannot have any effect on DepthGenerator unless it is also a "want". 2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the uninteresting flag is not propagated to ancestors there even if a "have" is also a "want". 3. JGit treats a depth of 1 as "1 past the wants". Because of (2), the only place the UNINTERESTING flag can leak to a shallow commit's parents is in the carryFlags() call from markUninteresting(). carryFlags() only traverses commits that have already been parsed: commits yet to be parsed are supposed to inherit correct flags from their parent in PendingGenerator#next (which doesn't happen here --- that is (2)). So the list of commits that have already been parsed becomes relevant. When we hit the markUninteresting() call, all "want"s, "have"s, and commits to be unshallowed have been parsed. carryFlags() only affects the parsed commits. If the "want" is a direct parent of a "have", then it carryFlags() marks it as uninteresting. If the "have" was also a "shallow", then its parent pointer should have been null and the "want" shouldn't have been marked, so we see the bug. If the "want" is a more distant ancestor then (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug. If the "shallow" is not also a "have" then the shallow commit isn't parsed so (2) keeps the uninteresting state from propagating to the "want so we don't see the bug. Here is a reproduction case (time flowing left to right, arrows pointing to parents). "C" must be a commit that the client reports as a "have" during negotiation. That can only happen if the server reports it as an existing branch or tag in the first round of negotiation: A <-- B <-- C <-- D First do git clone --depth 1 <repo> which yields D as a "have" and C as a "shallow" commit. Then try git fetch --depth 1 <repo> B:refs/heads/B Negotiation sets up: have D, shallow C, have C, want B. 
But due to this bug B is marked as uninteresting and is not sent. Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440 Signed-off-by: Terry Parker <tparker@google.com>
7 years ago
Fix missing deltas near type boundaries

Delta search was discarding discovered deltas if an object appeared near a type boundary in the delta search window. This has caused JGit to produce larger pack files than other implementations of the packing algorithm.

Delta search works by pushing prior objects into a search window, an ordered list of objects to attempt to delta compress the next object against. (The window size is bounded, avoiding O(N^2) behavior.)

For implementation reasons multiple object types can appear in the input list, and in the window. PackWriter commonly passes both trees and blobs in the input list handed to the DeltaWindow algorithm. The pack file format requires an object to only delta compress against the same type, so the DeltaWindow algorithm must stop doing comparisons if a blob would be compared to a tree. Because the input list is sorted by object type and the window holds recently considered prior objects, once a wrong type is discovered in the window the search algorithm stops and uses the current result.

Unfortunately the termination condition was discarding any found delta by setting deltaBase and deltaBuf to null when it was trying to break out of the window search.

When this bug occurs, the state of the DeltaWindow looks like this:

                              current
                                 |
                                \ /
    input list:  tree0  tree1  blob1  blob2
    window:      blob1  tree1  tree0
                  / \
                   |
                res.prev

As the loop iterates to the right across the window, it first finds that blob1 is a suitable delta base for blob2, and temporarily holds this in the bestDelta/deltaBuf fields. It then considers tree1, but tree1 has the wrong type (blob != tree), so the window loop must give up and fall through the remaining code.

Moving the condition up and discarding the window contents allows the bestDelta/deltaBuf to be kept, letting the final file delta compress blob2 against blob1.

The impact of this bug (and its fix) on real world repositories is likely minimal. The boundary from blob to tree happens approximately once in the search, as the input list is sorted by type. Only the first window size worth of blobs (e.g. 10 or 250) were failing to produce a delta in the final file.

This bug fix does produce significantly different results for small test repositories created in the unit test suite, such as when a pack may contain 6 objects (2 commits, 2 trees, 2 blobs). Packing test cases can now better sample different output pack file sizes depending on delta compression and object reuse flags in PackConfig.

Change-Id: Ibec09398d0305d4dbc0c66fce1daaf38eb71148f
7 years ago
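The termination fix described in the commit message above can be illustrated with a simplified window scan (a standalone sketch, not JGit's actual DeltaWindow code; `Obj`, `findBase`, and `bestBase` are hypothetical names): on hitting a wrong-typed candidate, the loop breaks out while keeping the best base found so far, instead of clearing it.

```java
import java.util.List;

// Simplified model of the DeltaWindow scan described above (not JGit code).
class DeltaWindowSketch {
    static final class Obj {
        final String name;
        final String type; // "blob" or "tree"
        Obj(String name, String type) { this.name = name; this.type = type; }
    }

    // Scan the window for a delta base of the same type as 'target'.
    static Obj findBase(Obj target, List<Obj> window) {
        Obj bestBase = null; // stands in for bestDelta/deltaBuf
        for (Obj candidate : window) {
            if (!candidate.type.equals(target.type)) {
                // Type boundary reached: stop searching, but KEEP bestBase.
                // The bug was nulling out deltaBase/deltaBuf at this point.
                break;
            }
            if (bestBase == null)
                bestBase = candidate; // remember the first usable base
        }
        return bestBase;
    }

    public static void main(String[] args) {
        Obj blob2 = new Obj("blob2", "blob");
        // Window as in the commit message: blob1, then wrong-typed trees.
        List<Obj> window = List.of(
                new Obj("blob1", "blob"),
                new Obj("tree1", "tree"),
                new Obj("tree0", "tree"));
        System.out.println(findBase(blob2, window).name); // prints "blob1"
    }
}
```

With the buggy ordering, the clearing step would run before the break and blob2 would be stored whole rather than as a delta against blob1.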
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268368468568668768868969069169269369469569669769869970070170270
3704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907
/*
 * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com>
 * and other copyright owners as documented in the project's IP log.
 *
 * This program and the accompanying materials are made available
 * under the terms of the Eclipse Distribution License v1.0 which
 * accompanies this distribution, is reproduced below, and is
 * available at http://www.eclipse.org/org/documents/edl-v10.php
 *
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or
 * without modification, are permitted provided that the following
 * conditions are met:
 *
 * - Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 *
 * - Redistributions in binary form must reproduce the above
 * copyright notice, this list of conditions and the following
 * disclaimer in the documentation and/or other materials provided
 * with the distribution.
 *
 * - Neither the name of the Eclipse Foundation, Inc. nor the
 * names of its contributors may be used to endorse or promote
 * products derived from this software without specific prior
 * written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
 * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
package org.eclipse.jgit.internal.storage.file;

import static org.eclipse.jgit.internal.storage.pack.PackWriter.NONE;
import static org.eclipse.jgit.lib.Constants.OBJ_BLOB;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.internal.storage.file.PackIndex.MutableEntry;
import org.eclipse.jgit.internal.storage.pack.PackWriter;
import org.eclipse.jgit.junit.JGitTestUtil;
import org.eclipse.jgit.junit.TestRepository;
import org.eclipse.jgit.junit.TestRepository.BranchBuilder;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectIdSet;
import org.eclipse.jgit.lib.ObjectInserter;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.lib.Sets;
import org.eclipse.jgit.revwalk.DepthWalk;
import org.eclipse.jgit.revwalk.ObjectWalk;
import org.eclipse.jgit.revwalk.RevBlob;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevObject;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.storage.pack.PackConfig;
import org.eclipse.jgit.storage.pack.PackStatistics;
import org.eclipse.jgit.test.resources.SampleDataRepositoryTestCase;
import org.eclipse.jgit.transport.PackParser;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
public class PackWriterTest extends SampleDataRepositoryTestCase {
	private static final List<RevObject> EMPTY_LIST_REVS = Collections
			.<RevObject> emptyList();

	private static final Set<ObjectIdSet> EMPTY_ID_SET = Collections
			.<ObjectIdSet> emptySet();

	private PackConfig config;

	private PackWriter writer;

	private ByteArrayOutputStream os;

	private PackFile pack;

	private ObjectInserter inserter;

	private FileRepository dst;

	private RevBlob contentA;
	private RevBlob contentB;
	private RevBlob contentC;
	private RevBlob contentD;
	private RevBlob contentE;

	private RevCommit c1;
	private RevCommit c2;
	private RevCommit c3;
	private RevCommit c4;
	private RevCommit c5;
	@Override
	@Before
	public void setUp() throws Exception {
		super.setUp();
		os = new ByteArrayOutputStream();
		config = new PackConfig(db);
		dst = createBareRepository();
		File alt = new File(dst.getObjectDatabase().getDirectory(),
				"info/alternates");
		alt.getParentFile().mkdirs();
		write(alt, db.getObjectDatabase().getDirectory().getAbsolutePath() + "\n");
	}

	@Override
	@After
	public void tearDown() throws Exception {
		if (writer != null) {
			writer.close();
			writer = null;
		}
		if (inserter != null) {
			inserter.close();
			inserter = null;
		}
		super.tearDown();
	}
	/**
	 * Test constructor for exceptions, default settings, initialization.
	 *
	 * @throws IOException
	 */
	@Test
	public void testContructor() throws IOException {
		writer = new PackWriter(config, db.newObjectReader());
		assertFalse(writer.isDeltaBaseAsOffset());
		assertTrue(config.isReuseDeltas());
		assertTrue(config.isReuseObjects());
		assertEquals(0, writer.getObjectCount());
	}

	/**
	 * Change default settings and verify them.
	 */
	@Test
	public void testModifySettings() {
		config.setReuseDeltas(false);
		config.setReuseObjects(false);
		config.setDeltaBaseAsOffset(false);
		assertFalse(config.isReuseDeltas());
		assertFalse(config.isReuseObjects());
		assertFalse(config.isDeltaBaseAsOffset());

		writer = new PackWriter(config, db.newObjectReader());
		writer.setDeltaBaseAsOffset(true);
		assertTrue(writer.isDeltaBaseAsOffset());
		assertFalse(config.isDeltaBaseAsOffset());
	}
	/**
	 * Write an empty pack by providing empty sets of interesting/uninteresting
	 * objects and check for correct format.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWriteEmptyPack1() throws IOException {
		createVerifyOpenPack(NONE, NONE, false, false);

		assertEquals(0, writer.getObjectCount());
		assertEquals(0, pack.getObjectCount());
		assertEquals("da39a3ee5e6b4b0d3255bfef95601890afd80709", writer
				.computeName().name());
	}

	/**
	 * Write an empty pack by providing an empty iterator of objects to write
	 * and check for correct format.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWriteEmptyPack2() throws IOException {
		createVerifyOpenPack(EMPTY_LIST_REVS);

		assertEquals(0, writer.getObjectCount());
		assertEquals(0, pack.getObjectCount());
	}

	/**
	 * Try to pass a non-existing object as uninteresting, with the
	 * non-ignoring setting.
	 *
	 * @throws IOException
	 */
	@Test
	public void testNotIgnoreNonExistingObjects() throws IOException {
		final ObjectId nonExisting = ObjectId
				.fromString("0000000000000000000000000000000000000001");
		try {
			createVerifyOpenPack(NONE, haves(nonExisting), false, false);
			fail("Should have thrown MissingObjectException");
		} catch (MissingObjectException x) {
			// expected
		}
	}

	/**
	 * Try to pass a non-existing object as uninteresting, with the ignoring
	 * setting.
	 *
	 * @throws IOException
	 */
	@Test
	public void testIgnoreNonExistingObjects() throws IOException {
		final ObjectId nonExisting = ObjectId
				.fromString("0000000000000000000000000000000000000001");
		createVerifyOpenPack(NONE, haves(nonExisting), false, true);
		// shouldn't throw anything
	}

	/**
	 * Try to pass a non-existing object as uninteresting, with the ignoring
	 * setting. Use a repo with bitmap indexes because then PackWriter will use
	 * PackWriterBitmapWalker, which had problems with this situation.
	 *
	 * @throws IOException
	 * @throws ParseException
	 */
	@Test
	public void testIgnoreNonExistingObjectsWithBitmaps() throws IOException,
			ParseException {
		final ObjectId nonExisting = ObjectId
				.fromString("0000000000000000000000000000000000000001");
		new GC(db).gc();
		createVerifyOpenPack(NONE, haves(nonExisting), false, true, true);
		// shouldn't throw anything
	}
	/**
	 * Create a pack based only on interesting objects, then precisely verify
	 * its content. No delta reuse here.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWritePack1() throws IOException {
		config.setReuseDeltas(false);
		writeVerifyPack1();
	}

	/**
	 * Test writing a pack without object reuse. Pack content/preparation as in
	 * {@link #testWritePack1()}.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWritePack1NoObjectReuse() throws IOException {
		config.setReuseDeltas(false);
		config.setReuseObjects(false);
		writeVerifyPack1();
	}

	/**
	 * Create a pack based on both interesting and uninteresting objects, then
	 * precisely verify its content. No delta reuse here.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWritePack2() throws IOException {
		writeVerifyPack2(false);
	}

	/**
	 * Test pack writing with delta reuse, delta-base-first rule. Pack
	 * content/preparation as in {@link #testWritePack2()}.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWritePack2DeltasReuseRefs() throws IOException {
		writeVerifyPack2(true);
	}

	/**
	 * Test pack writing with delta reuse. Delta bases referred to as offsets.
	 * Pack configuration as in {@link #testWritePack2DeltasReuseRefs()}.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWritePack2DeltasReuseOffsets() throws IOException {
		config.setDeltaBaseAsOffset(true);
		writeVerifyPack2(true);
	}

	/**
	 * Test pack writing with delta reuse. A raw-data copy (reuse) is made from
	 * a pack with a CRC32 index. Pack configuration as in
	 * {@link #testWritePack2DeltasReuseRefs()}.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWritePack2DeltasCRC32Copy() throws IOException {
		final File packDir = db.getObjectDatabase().getPackDirectory();
		final File crc32Pack = new File(packDir,
				"pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.pack");
		final File crc32Idx = new File(packDir,
				"pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.idx");
		copyFile(JGitTestUtil.getTestResourceFile(
				"pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.idxV2"),
				crc32Idx);
		db.openPack(crc32Pack);

		writeVerifyPack2(true);
	}
	/**
	 * Create a pack based on a fixed object list, then precisely verify its
	 * content. No delta reuse here.
	 *
	 * @throws IOException
	 * @throws MissingObjectException
	 */
	@Test
	public void testWritePack3() throws MissingObjectException, IOException {
		config.setReuseDeltas(false);
		final ObjectId forcedOrder[] = new ObjectId[] {
				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
				ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
				ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
				ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3"),
				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };
		try (RevWalk parser = new RevWalk(db)) {
			final RevObject forcedOrderRevs[] = new RevObject[forcedOrder.length];
			for (int i = 0; i < forcedOrder.length; i++)
				forcedOrderRevs[i] = parser.parseAny(forcedOrder[i]);

			createVerifyOpenPack(Arrays.asList(forcedOrderRevs));
		}

		assertEquals(forcedOrder.length, writer.getObjectCount());
		verifyObjectsOrder(forcedOrder);
		assertEquals("ed3f96b8327c7c66b0f8f70056129f0769323d86", writer
				.computeName().name());
	}

	/**
	 * Another pack creation: based on both interesting and uninteresting
	 * objects. No delta reuse is possible here, as this is the specific case of
	 * writing only one commit, associated with one tree and one blob.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWritePack4() throws IOException {
		writeVerifyPack4(false);
	}

	/**
	 * Test thin pack writing: one blob delta base is on the objects edge. Pack
	 * configuration as in {@link #testWritePack4()}.
	 *
	 * @throws IOException
	 */
	@Test
	public void testWritePack4ThinPack() throws IOException {
		writeVerifyPack4(true);
	}
	/**
	 * Compare the sizes of the packs created by {@link #testWritePack2()} and
	 * {@link #testWritePack2DeltasReuseRefs()}. The pack using deltas should
	 * be smaller.
	 *
	 * @throws Exception
	 */
	@Test
	public void testWritePack2SizeDeltasVsNoDeltas() throws Exception {
		config.setReuseDeltas(false);
		config.setDeltaCompress(false);
		testWritePack2();
		final long sizePack2NoDeltas = os.size();
		tearDown();
		setUp();
		testWritePack2DeltasReuseRefs();
		final long sizePack2DeltasRefs = os.size();

		assertTrue(sizePack2NoDeltas > sizePack2DeltasRefs);
	}

	/**
	 * Compare the sizes of the packs created by
	 * {@link #testWritePack2DeltasReuseRefs()} and
	 * {@link #testWritePack2DeltasReuseOffsets()}. The pack with delta bases
	 * written as offsets should be smaller.
	 *
	 * @throws Exception
	 */
	@Test
	public void testWritePack2SizeOffsetsVsRefs() throws Exception {
		testWritePack2DeltasReuseRefs();
		final long sizePack2DeltasRefs = os.size();
		tearDown();
		setUp();
		testWritePack2DeltasReuseOffsets();
		final long sizePack2DeltasOffsets = os.size();

		assertTrue(sizePack2DeltasRefs > sizePack2DeltasOffsets);
	}

	/**
	 * Compare the sizes of the packs created by {@link #testWritePack4()} and
	 * {@link #testWritePack4ThinPack()}. The thin pack should be smaller.
	 *
	 * @throws Exception
	 */
	@Test
	public void testWritePack4SizeThinVsNoThin() throws Exception {
		testWritePack4();
		final long sizePack4 = os.size();
		tearDown();
		setUp();
		testWritePack4ThinPack();
		final long sizePack4Thin = os.size();

		assertTrue(sizePack4 > sizePack4Thin);
	}
	@Test
	public void testDeltaStatistics() throws Exception {
		config.setDeltaCompress(true);
		FileRepository repo = createBareRepository();
		TestRepository<FileRepository> testRepo = new TestRepository<>(repo);
		ArrayList<RevObject> blobs = new ArrayList<>();
		blobs.add(testRepo.blob(genDeltableData(1000)));
		blobs.add(testRepo.blob(genDeltableData(1005)));

		try (PackWriter pw = new PackWriter(repo)) {
			NullProgressMonitor m = NullProgressMonitor.INSTANCE;
			pw.preparePack(blobs.iterator());
			pw.writePack(m, m, os);
			PackStatistics stats = pw.getStatistics();
			assertEquals(1, stats.getTotalDeltas());
			assertTrue("Delta bytes not set.",
					stats.byObjectType(OBJ_BLOB).getDeltaBytes() > 0);
		}
	}

	// Generate consistent junk data for building files that delta well
	private String genDeltableData(int length) {
		assertTrue("Generated data must have a length > 0", length > 0);
		char[] data = { 'a', 'b', 'c', '\n' };
		StringBuilder builder = new StringBuilder(length);
		for (int i = 0; i < length; i++) {
			builder.append(data[i % 4]);
		}
		return builder.toString();
	}
	@Test
	public void testWriteIndex() throws Exception {
		config.setIndexVersion(2);
		writeVerifyPack4(false);

		File packFile = pack.getPackFile();
		String name = packFile.getName();
		String base = name.substring(0, name.lastIndexOf('.'));
		File indexFile = new File(packFile.getParentFile(), base + ".idx");

		// Validate that IndexPack came up with the right CRC32 value.
		final PackIndex idx1 = PackIndex.open(indexFile);
		assertTrue(idx1 instanceof PackIndexV2);
		assertEquals(0x4743F1E4L, idx1.findCRC32(ObjectId
				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7")));

		// Validate that an index written by PackWriter is the same.
		final File idx2File = new File(indexFile.getAbsolutePath() + ".2");
		try (FileOutputStream is = new FileOutputStream(idx2File)) {
			writer.writeIndex(is);
		}
		final PackIndex idx2 = PackIndex.open(idx2File);
		assertTrue(idx2 instanceof PackIndexV2);
		assertEquals(idx1.getObjectCount(), idx2.getObjectCount());
		assertEquals(idx1.getOffset64Count(), idx2.getOffset64Count());
		for (int i = 0; i < idx1.getObjectCount(); i++) {
			final ObjectId id = idx1.getObjectId(i);
			assertEquals(id, idx2.getObjectId(i));
			assertEquals(idx1.findOffset(id), idx2.findOffset(id));
			assertEquals(idx1.findCRC32(id), idx2.findCRC32(id));
		}
	}
	@Test
	public void testExclude() throws Exception {
		FileRepository repo = createBareRepository();
		TestRepository<FileRepository> testRepo = new TestRepository<>(repo);
		BranchBuilder bb = testRepo.branch("refs/heads/master");
		contentA = testRepo.blob("A");
		c1 = bb.commit().add("f", contentA).create();
		testRepo.getRevWalk().parseHeaders(c1);
		PackIndex pf1 = writePack(repo, wants(c1), EMPTY_ID_SET);
		assertContent(
				pf1,
				Arrays.asList(c1.getId(), c1.getTree().getId(),
						contentA.getId()));
		contentB = testRepo.blob("B");
		c2 = bb.commit().add("f", contentB).create();
		testRepo.getRevWalk().parseHeaders(c2);
		PackIndex pf2 = writePack(repo, wants(c2), Sets.of((ObjectIdSet) pf1));
		assertContent(
				pf2,
				Arrays.asList(c2.getId(), c2.getTree().getId(),
						contentB.getId()));
	}

	private static void assertContent(PackIndex pi, List<ObjectId> expected) {
		assertEquals("Pack index has wrong size.", expected.size(),
				pi.getObjectCount());
		for (int i = 0; i < pi.getObjectCount(); i++)
			assertTrue(
					"Pack index didn't contain the expected id "
							+ pi.getObjectId(i),
					expected.contains(pi.getObjectId(i)));
	}
	@Test
	public void testShallowIsMinimalDepth1() throws Exception {
		FileRepository repo = setupRepoForShallowFetch();
		PackIndex idx = writeShallowPack(repo, 1, wants(c2), NONE, NONE);
		assertContent(idx, Arrays.asList(c2.getId(), c2.getTree().getId(),
				contentA.getId(), contentB.getId()));

		// Client already has blobs A and B, verify those are not packed.
		idx = writeShallowPack(repo, 1, wants(c5), haves(c2), shallows(c2));
		assertContent(idx, Arrays.asList(c5.getId(), c5.getTree().getId(),
				contentC.getId(), contentD.getId(), contentE.getId()));
	}

	@Test
	public void testShallowIsMinimalDepth2() throws Exception {
		FileRepository repo = setupRepoForShallowFetch();
		PackIndex idx = writeShallowPack(repo, 2, wants(c2), NONE, NONE);
		assertContent(idx,
				Arrays.asList(c1.getId(), c2.getId(), c1.getTree().getId(),
						c2.getTree().getId(), contentA.getId(),
						contentB.getId()));

		// Client already has blobs A and B, verify those are not packed.
		idx = writeShallowPack(repo, 2, wants(c5), haves(c1, c2), shallows(c1));
		assertContent(idx,
				Arrays.asList(c4.getId(), c5.getId(), c4.getTree().getId(),
						c5.getTree().getId(), contentC.getId(),
						contentD.getId(), contentE.getId()));
	}

	@Test
	public void testShallowFetchShallowParentDepth1() throws Exception {
		FileRepository repo = setupRepoForShallowFetch();
		PackIndex idx = writeShallowPack(repo, 1, wants(c5), NONE, NONE);
		assertContent(idx,
				Arrays.asList(c5.getId(), c5.getTree().getId(),
						contentA.getId(), contentB.getId(), contentC.getId(),
						contentD.getId(), contentE.getId()));

		idx = writeShallowPack(repo, 1, wants(c4), haves(c5), shallows(c5));
		assertContent(idx, Arrays.asList(c4.getId(), c4.getTree().getId()));
	}

	@Test
	public void testShallowFetchShallowParentDepth2() throws Exception {
		FileRepository repo = setupRepoForShallowFetch();
		PackIndex idx = writeShallowPack(repo, 2, wants(c5), NONE, NONE);
		assertContent(idx,
				Arrays.asList(c4.getId(), c5.getId(), c4.getTree().getId(),
						c5.getTree().getId(), contentA.getId(),
						contentB.getId(), contentC.getId(), contentD.getId(),
						contentE.getId()));

		idx = writeShallowPack(repo, 2, wants(c3), haves(c4, c5), shallows(c4));
		assertContent(idx, Arrays.asList(c2.getId(), c3.getId(),
				c2.getTree().getId(), c3.getTree().getId()));
	}

	@Test
	public void testShallowFetchShallowAncestorDepth1() throws Exception {
		FileRepository repo = setupRepoForShallowFetch();
		PackIndex idx = writeShallowPack(repo, 1, wants(c5), NONE, NONE);
		assertContent(idx,
				Arrays.asList(c5.getId(), c5.getTree().getId(),
						contentA.getId(), contentB.getId(), contentC.getId(),
						contentD.getId(), contentE.getId()));

		idx = writeShallowPack(repo, 1, wants(c3), haves(c5), shallows(c5));
		assertContent(idx, Arrays.asList(c3.getId(), c3.getTree().getId()));
	}

	@Test
	public void testShallowFetchShallowAncestorDepth2() throws Exception {
		FileRepository repo = setupRepoForShallowFetch();
		PackIndex idx = writeShallowPack(repo, 2, wants(c5), NONE, NONE);
		assertContent(idx,
				Arrays.asList(c4.getId(), c5.getId(), c4.getTree().getId(),
						c5.getTree().getId(), contentA.getId(),
						contentB.getId(), contentC.getId(), contentD.getId(),
						contentE.getId()));

		idx = writeShallowPack(repo, 2, wants(c2), haves(c4, c5), shallows(c4));
		assertContent(idx, Arrays.asList(c1.getId(), c2.getId(),
				c1.getTree().getId(), c2.getTree().getId()));
	}
	private FileRepository setupRepoForShallowFetch() throws Exception {
		FileRepository repo = createBareRepository();
		TestRepository<Repository> r = new TestRepository<>(repo);
		BranchBuilder bb = r.branch("refs/heads/master");
		contentA = r.blob("A");
		contentB = r.blob("B");
		contentC = r.blob("C");
		contentD = r.blob("D");
		contentE = r.blob("E");
		c1 = bb.commit().add("a", contentA).create();
		c2 = bb.commit().add("b", contentB).create();
		c3 = bb.commit().add("c", contentC).create();
		c4 = bb.commit().add("d", contentD).create();
		c5 = bb.commit().add("e", contentE).create();
		r.getRevWalk().parseHeaders(c5); // fully initialize the tip RevCommit
		return repo;
	}

	private static PackIndex writePack(FileRepository repo,
			Set<? extends ObjectId> want, Set<ObjectIdSet> excludeObjects)
			throws IOException {
		RevWalk walk = new RevWalk(repo);
		return writePack(repo, walk, 0, want, NONE, excludeObjects);
	}

	private static PackIndex writeShallowPack(FileRepository repo, int depth,
			Set<? extends ObjectId> want, Set<? extends ObjectId> have,
			Set<? extends ObjectId> shallow) throws IOException {
		// During negotiation, UploadPack would have set up a DepthWalk and
		// marked the client's "shallow" commits. Emulate that here.
		DepthWalk.RevWalk walk = new DepthWalk.RevWalk(repo, depth - 1);
		walk.assumeShallow(shallow);
		return writePack(repo, walk, depth, want, have, EMPTY_ID_SET);
	}
	private static PackIndex writePack(FileRepository repo, RevWalk walk,
			int depth, Set<? extends ObjectId> want,
			Set<? extends ObjectId> have, Set<ObjectIdSet> excludeObjects)
			throws IOException {
		try (PackWriter pw = new PackWriter(repo)) {
			pw.setDeltaBaseAsOffset(true);
			pw.setReuseDeltaCommits(false);
			for (ObjectIdSet idx : excludeObjects) {
				pw.excludeObjects(idx);
			}
			if (depth > 0) {
				pw.setShallowPack(depth, null);
			}
			ObjectWalk ow = walk.toObjectWalkWithSameObjects();

			pw.preparePack(NullProgressMonitor.INSTANCE, ow, want, have, NONE);
			String id = pw.computeName().getName();
			File packdir = repo.getObjectDatabase().getPackDirectory();
			File packFile = new File(packdir, "pack-" + id + ".pack");
			try (FileOutputStream packOS = new FileOutputStream(packFile)) {
				pw.writePack(NullProgressMonitor.INSTANCE,
						NullProgressMonitor.INSTANCE, packOS);
			}
			File idxFile = new File(packdir, "pack-" + id + ".idx");
			try (FileOutputStream idxOS = new FileOutputStream(idxFile)) {
				pw.writeIndex(idxOS);
			}
			return PackIndex.open(idxFile);
		}
	}
	// TODO: testWritePackDeltasCycle()
	// TODO: testWritePackDeltasDepth()

	private void writeVerifyPack1() throws IOException {
		final HashSet<ObjectId> interestings = new HashSet<>();
		interestings.add(ObjectId
				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
		createVerifyOpenPack(interestings, NONE, false, false);

		final ObjectId expectedOrder[] = new ObjectId[] {
				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
				ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
				ObjectId.fromString("540a36d136cf413e4b064c2b0e0a4db60f77feab"),
				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
				ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
				ObjectId.fromString("4b825dc642cb6eb9a060e54bf8d69288fbee4904"),
				ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3"),
				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };

		assertEquals(expectedOrder.length, writer.getObjectCount());
		verifyObjectsOrder(expectedOrder);
		assertEquals("34be9032ac282b11fa9babdc2b2a93ca996c9c2f", writer
				.computeName().name());
	}

	private void writeVerifyPack2(boolean deltaReuse) throws IOException {
		config.setReuseDeltas(deltaReuse);
		final HashSet<ObjectId> interestings = new HashSet<>();
		interestings.add(ObjectId
				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
		final HashSet<ObjectId> uninterestings = new HashSet<>();
		uninterestings.add(ObjectId
				.fromString("540a36d136cf413e4b064c2b0e0a4db60f77feab"));
		createVerifyOpenPack(interestings, uninterestings, false, false);

		final ObjectId expectedOrder[] = new ObjectId[] {
				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
				ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
				ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
				ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3"),
				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };
		if (!config.isReuseDeltas() && !config.isDeltaCompress()) {
			// If no deltas are in the file the final two entries swap places.
			swap(expectedOrder, 4, 5);
		}

		assertEquals(expectedOrder.length, writer.getObjectCount());
		verifyObjectsOrder(expectedOrder);
		assertEquals("ed3f96b8327c7c66b0f8f70056129f0769323d86", writer
				.computeName().name());
	}

	private static void swap(ObjectId[] arr, int a, int b) {
		ObjectId tmp = arr[a];
		arr[a] = arr[b];
		arr[b] = tmp;
	}
  688. private void writeVerifyPack4(final boolean thin) throws IOException {
  689. final HashSet<ObjectId> interestings = new HashSet<>();
  690. interestings.add(ObjectId
  691. .fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
  692. final HashSet<ObjectId> uninterestings = new HashSet<>();
  693. uninterestings.add(ObjectId
  694. .fromString("c59759f143fb1fe21c197981df75a7ee00290799"));
  695. createVerifyOpenPack(interestings, uninterestings, thin, false);
  696. final ObjectId writtenObjects[] = new ObjectId[] {
  697. ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
  698. ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
  699. ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };
  700. assertEquals(writtenObjects.length, writer.getObjectCount());
  701. ObjectId expectedObjects[];
  702. if (thin) {
  703. expectedObjects = new ObjectId[4];
  704. System.arraycopy(writtenObjects, 0, expectedObjects, 0,
  705. writtenObjects.length);
  706. expectedObjects[3] = ObjectId
  707. .fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3");
  708. } else {
  709. expectedObjects = writtenObjects;
  710. }
  711. verifyObjectsOrder(expectedObjects);
  712. assertEquals("cded4b74176b4456afa456768b2b5aafb41c44fc", writer
  713. .computeName().name());
  714. }
  715. private void createVerifyOpenPack(final Set<ObjectId> interestings,
  716. final Set<ObjectId> uninterestings, final boolean thin,
  717. final boolean ignoreMissingUninteresting)
  718. throws MissingObjectException, IOException {
  719. createVerifyOpenPack(interestings, uninterestings, thin,
  720. ignoreMissingUninteresting, false);
  721. }
	private void createVerifyOpenPack(final Set<ObjectId> interestings,
			final Set<ObjectId> uninterestings, final boolean thin,
			final boolean ignoreMissingUninteresting, boolean useBitmaps)
			throws MissingObjectException, IOException {
		NullProgressMonitor m = NullProgressMonitor.INSTANCE;
		writer = new PackWriter(config, db.newObjectReader());
		writer.setUseBitmaps(useBitmaps);
		writer.setThin(thin);
		writer.setIgnoreMissingUninteresting(ignoreMissingUninteresting);
		writer.preparePack(m, interestings, uninterestings);
		writer.writePack(m, m, os);
		writer.close();
		verifyOpenPack(thin);
	}
	private void createVerifyOpenPack(List<RevObject> objectSource)
			throws MissingObjectException, IOException {
		NullProgressMonitor m = NullProgressMonitor.INSTANCE;
		writer = new PackWriter(config, db.newObjectReader());
		writer.preparePack(objectSource.iterator());
		assertEquals(objectSource.size(), writer.getObjectCount());
		writer.writePack(m, m, os);
		writer.close();
		verifyOpenPack(false);
	}
	private void verifyOpenPack(boolean thin) throws IOException {
		final byte[] packData = os.toByteArray();
		if (thin) {
			// A thin pack references delta bases it does not contain, so a
			// parser that has not been told to allow thin packs must fail.
			PackParser p = index(packData);
			try {
				p.parse(NullProgressMonitor.INSTANCE);
				fail("indexer should grumble about missing object");
			} catch (IOException x) {
				// expected
			}
		}
		ObjectDirectoryPackParser p = (ObjectDirectoryPackParser) index(packData);
		p.setKeepEmpty(true);
		p.setAllowThin(thin);
		p.setIndexVersion(2);
		p.parse(NullProgressMonitor.INSTANCE);
		pack = p.getPackFile();
		assertNotNull("have PackFile after parsing", pack);
	}
	private PackParser index(byte[] packData) throws IOException {
		if (inserter == null)
			inserter = dst.newObjectInserter();
		return inserter.newPackParser(new ByteArrayInputStream(packData));
	}
	private void verifyObjectsOrder(ObjectId objectsOrder[]) {
		// Sort the index entries by pack offset so they reflect the order
		// in which the objects were actually written.
		final List<PackIndex.MutableEntry> entries = new ArrayList<>();
		for (MutableEntry me : pack) {
			entries.add(me.cloneEntry());
		}
		Collections.sort(entries, new Comparator<PackIndex.MutableEntry>() {
			@Override
			public int compare(MutableEntry o1, MutableEntry o2) {
				return Long.signum(o1.getOffset() - o2.getOffset());
			}
		});
		int i = 0;
		for (MutableEntry me : entries) {
			assertEquals(objectsOrder[i++].toObjectId(), me.toObjectId());
		}
	}
	// Aliases for Sets.of(...) so call sites read like the "have", "want"
	// and "shallow" lines of fetch negotiation.
	private static Set<ObjectId> haves(ObjectId... objects) {
		return Sets.of(objects);
	}

	private static Set<ObjectId> wants(ObjectId... objects) {
		return Sets.of(objects);
	}

	private static Set<ObjectId> shallows(ObjectId... objects) {
		return Sets.of(objects);
	}
}