
PackWriterTest.java 28KB

Shallow fetch: Respect "shallow" lines

When fetching from a shallow clone, the client sends "have" lines to tell the server about objects it already has and "shallow" lines to tell where its local history terminates. In some circumstances, the server fails to honor the shallow lines and fails to return objects that the client needs.

UploadPack passes the "have" lines to PackWriter so PackWriter can omit them from the generated pack. UploadPack processes "shallow" lines by calling RevWalk.assumeShallow() with the set of shallow commits. RevWalk creates and caches RevCommits for these shallow commits, clearing out their parents. That way, walks correctly terminate at the shallow commits instead of assuming the client has history going back behind them. UploadPack converts its RevWalk to an ObjectWalk, maintaining the cached RevCommits, and passes it to PackWriter.

Unfortunately, to support shallow fetches the PackWriter does the following:

    if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk))
        walk = new DepthWalk.ObjectWalk(reader, depth);

That is, when the client sends a "deepen" line (fetch --depth=<n>) and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter throws away the RevWalk that was passed in and makes a new one. The cleared parent lists prepared by RevWalk.assumeShallow() are lost.

Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk. It tries to create it by calling toObjectWalkWithSameObjects() on a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk does not override the standard RevWalk#toObjectWalkWithSameObjects implementation, the result is a plain ObjectWalk instead of an instance of DepthWalk.ObjectWalk.

The result is that the "shallow" information is thrown away and objects reachable from the shallow commits can be omitted from the pack sent when fetching with --depth from a shallow clone.

Multiple factors collude to limit the circumstances under which this bug can be observed:

1. Commits with depth != 0 don't enter DepthGenerator's pending queue. That means a "have" cannot have any effect on DepthGenerator unless it is also a "want".
2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the uninteresting flag is not propagated to ancestors there even if a "have" is also a "want".
3. JGit treats a depth of 1 as "1 past the wants".

Because of (2), the only place the UNINTERESTING flag can leak to a shallow commit's parents is in the carryFlags() call from markUninteresting(). carryFlags() only traverses commits that have already been parsed: commits yet to be parsed are supposed to inherit correct flags from their parent in PendingGenerator#next (which doesn't happen here --- that is (2)). So the list of commits that have already been parsed becomes relevant.

When we hit the markUninteresting() call, all "want"s, "have"s, and commits to be unshallowed have been parsed. carryFlags() only affects the parsed commits. If the "want" is a direct parent of a "have", then carryFlags() marks it as uninteresting. If the "have" was also a "shallow", then its parent pointer should have been null and the "want" shouldn't have been marked, so we see the bug. If the "want" is a more distant ancestor then (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug. If the "shallow" is not also a "have" then the shallow commit isn't parsed, so (2) keeps the uninteresting state from propagating to the "want" and we don't see the bug.

Here is a reproduction case (time flowing left to right, arrows pointing to parents). "C" must be a commit that the client reports as a "have" during negotiation. That can only happen if the server reports it as an existing branch or tag in the first round of negotiation:

    A <-- B <-- C <-- D

First do

    git clone --depth 1 <repo>

which yields D as a "have" and C as a "shallow" commit. Then try

    git fetch --depth 1 <repo> B:refs/heads/B

Negotiation sets up: have D, shallow C, have C, want B. But due to this bug B is marked as uninteresting and is not sent.

Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440
Signed-off-by: Terry Parker <tparker@google.com>
7 years ago
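The root cause called out above, DepthWalk.RevWalk not overriding RevWalk#toObjectWalkWithSameObjects(), is a general Java pitfall: a conversion method that is not overridden returns the base type, so a later instanceof check fails and the caller rebuilds the object from scratch, dropping the subclass state. The sketch below is a minimal, self-contained illustration using hypothetical class names (Walk, DepthAwareWalk, MissingOverrideDemo), not JGit code; the fix implied by the message is the analogous override in DepthWalk.RevWalk so that PackWriter receives a DepthWalk.ObjectWalk and keeps the cleared parent lists.

    // Hypothetical, simplified classes (not JGit's API) showing why a missing
    // override loses the subclass type across a conversion call.
    class Walk {
        Walk toObjectWalk() {
            return new Walk(); // base conversion returns the base type
        }
    }

    class DepthAwareWalk extends Walk {
        final int depth; // stands in for the cached shallow/depth state

        DepthAwareWalk(int depth) {
            this.depth = depth;
        }

        // Remove this override and the demo prints "false": the conversion
        // silently yields a plain Walk, and the depth state is lost, the
        // same shape as the missing DepthWalk.RevWalk override above.
        @Override
        DepthAwareWalk toObjectWalk() {
            return new DepthAwareWalk(depth);
        }
    }

    public class MissingOverrideDemo {
        public static void main(String[] args) {
            Walk converted = new DepthAwareWalk(1).toObjectWalk();
            // PackWriter's guard is essentially an instanceof check; without
            // the override it fails and a fresh walk is built from scratch.
            System.out.println(converted instanceof DepthAwareWalk);
        }
    }

With the override in place the instanceof check passes, mirroring how an overridden conversion on DepthWalk.RevWalk would let PackWriter reuse the walk that already knows about the shallow commits.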
Fix missing deltas near type boundaries

Delta search was discarding discovered deltas if an object appeared near a type boundary in the delta search window. This has caused JGit to produce larger pack files than other implementations of the packing algorithm.

Delta search works by pushing prior objects into a search window, an ordered list of objects to attempt to delta compress the next object against. (The window size is bounded, avoiding O(N^2) behavior.)

For implementation reasons multiple object types can appear in the input list, and in the window. PackWriter commonly passes both trees and blobs in the input list handed to the DeltaWindow algorithm. The pack file format requires an object to only delta compress against the same type, so the DeltaWindow algorithm must stop doing comparisons if a blob would be compared to a tree.

Because the input list is sorted by object type and the window holds recently considered prior objects, once a wrong type is discovered in the window the search algorithm stops and uses the current result.

Unfortunately the termination condition was discarding any found delta by setting deltaBase and deltaBuf to null when it was trying to break the window search.

When this bug occurs, the state of the DeltaWindow looks like this:

                                         current
                                            |
                                           \ /
    input list:  tree0  tree1  blob1  blob2
    window:      blob1  tree1  tree0
                  / \
                   |
               res.prev

As the loop iterates to the right across the window, it first finds that blob1 is a suitable delta base for blob2, and temporarily holds this in the bestDelta/deltaBuf fields. It then considers tree1, but tree1 has the wrong type (blob != tree), so the window loop must give up and fall through the remaining code.

Moving the condition up and discarding the window contents allows the bestDelta/deltaBuf to be kept, letting the final file delta compress blob2 against blob1.

The impact of this bug (and its fix) on real world repositories is likely minimal. The boundary from blob to tree happens approximately once in the search, as the input list is sorted by type. Only the first window size worth of blobs (e.g. 10 or 250) were failing to produce a delta in the final file.

This bug fix does produce significantly different results for small test repositories created in the unit test suite, such as when a pack may contain 6 objects (2 commits, 2 trees, 2 blobs). Packing test cases can now better sample different output pack file sizes depending on delta compression and object reuse flags in PackConfig.

Change-Id: Ibec09398d0305d4dbc0c66fce1daaf38eb71148f
7 years ago
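To make the termination rule described above concrete, here is a small, self-contained sketch using hypothetical types (PackObj, WindowSearchDemo), not JGit's DeltaWindow: the scan over the window must stop at the first candidate of a different type, but it has to keep the best delta base already found instead of clearing it, which is what the discarded-delta bug amounted to.

    import java.util.List;

    // Hypothetical stand-ins for pack objects and the search window; this is
    // an illustration of the termination rule, not JGit's DeltaWindow code.
    public class WindowSearchDemo {
        record PackObj(String name, String type, int cost) {}

        // Scan the window (most recently seen candidate first) for a base.
        static PackObj findBase(PackObj target, List<PackObj> window) {
            PackObj best = null;
            for (PackObj candidate : window) {
                if (!candidate.type().equals(target.type())) {
                    // Type boundary: stop searching, but keep `best`.
                    // The bug was effectively clearing `best` right here.
                    break;
                }
                if (best == null || candidate.cost() < best.cost()) {
                    best = candidate; // stand-in for "smallest delta so far"
                }
            }
            return best;
        }

        public static void main(String[] args) {
            // Window state from the commit message: blob1, tree1, tree0,
            // with blob2 as the current object being compressed.
            List<PackObj> window = List.of(
                    new PackObj("blob1", "blob", 100),
                    new PackObj("tree1", "tree", 40),
                    new PackObj("tree0", "tree", 40));
            PackObj base = findBase(new PackObj("blob2", "blob", 105), window);
            System.out.println(base.name()); // "blob1": the found delta survives
        }
    }

Run against the blob1/tree1/tree0 window from the message, the search keeps blob1 as the base for blob2; clearing best at the type boundary would return null and force blob2 to be stored whole.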
/*
 * Copyright (C) 2008, Marek Zawirski <marek.zawirski@gmail.com> and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */
package org.eclipse.jgit.internal.storage.file;

import static org.eclipse.jgit.internal.storage.pack.PackWriter.NONE;
import static org.eclipse.jgit.lib.Constants.INFO_ALTERNATES;
import static org.eclipse.jgit.lib.Constants.OBJ_BLOB;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.internal.storage.file.PackIndex.MutableEntry;
import org.eclipse.jgit.internal.storage.pack.PackExt;
import org.eclipse.jgit.internal.storage.pack.PackWriter;
import org.eclipse.jgit.junit.JGitTestUtil;
import org.eclipse.jgit.junit.TestRepository;
import org.eclipse.jgit.junit.TestRepository.BranchBuilder;
import org.eclipse.jgit.lib.NullProgressMonitor;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectIdSet;
import org.eclipse.jgit.lib.ObjectInserter;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.lib.Sets;
import org.eclipse.jgit.revwalk.DepthWalk;
import org.eclipse.jgit.revwalk.ObjectWalk;
import org.eclipse.jgit.revwalk.RevBlob;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevObject;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.storage.pack.PackConfig;
import org.eclipse.jgit.storage.pack.PackStatistics;
import org.eclipse.jgit.test.resources.SampleDataRepositoryTestCase;
import org.eclipse.jgit.transport.PackParser;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class PackWriterTest extends SampleDataRepositoryTestCase {
    private static final List<RevObject> EMPTY_LIST_REVS = Collections
            .<RevObject> emptyList();

    private static final Set<ObjectIdSet> EMPTY_ID_SET = Collections
            .<ObjectIdSet> emptySet();

    private PackConfig config;
    private PackWriter writer;
    private ByteArrayOutputStream os;
    private Pack pack;
    private ObjectInserter inserter;
    private FileRepository dst;

    private RevBlob contentA;
    private RevBlob contentB;
    private RevBlob contentC;
    private RevBlob contentD;
    private RevBlob contentE;
    private RevCommit c1;
    private RevCommit c2;
    private RevCommit c3;
    private RevCommit c4;
    private RevCommit c5;

    @Override
    @Before
    public void setUp() throws Exception {
        super.setUp();
        os = new ByteArrayOutputStream();
        config = new PackConfig(db);
        dst = createBareRepository();
        File alt = new File(dst.getObjectDatabase().getDirectory(), INFO_ALTERNATES);
        alt.getParentFile().mkdirs();
        write(alt, db.getObjectDatabase().getDirectory().getAbsolutePath() + "\n");
    }

    @Override
    @After
    public void tearDown() throws Exception {
        if (writer != null) {
            writer.close();
            writer = null;
        }
        if (inserter != null) {
            inserter.close();
            inserter = null;
        }
        super.tearDown();
    }

    /**
     * Test constructor for exceptions, default settings, initialization.
     *
     * @throws IOException
     */
    @Test
    public void testContructor() throws IOException {
        writer = new PackWriter(config, db.newObjectReader());
        assertFalse(writer.isDeltaBaseAsOffset());
        assertTrue(config.isReuseDeltas());
        assertTrue(config.isReuseObjects());
        assertEquals(0, writer.getObjectCount());
    }

    /**
     * Change default settings and verify them.
     */
    @Test
    public void testModifySettings() {
        config.setReuseDeltas(false);
        config.setReuseObjects(false);
        config.setDeltaBaseAsOffset(false);
        assertFalse(config.isReuseDeltas());
        assertFalse(config.isReuseObjects());
        assertFalse(config.isDeltaBaseAsOffset());
        writer = new PackWriter(config, db.newObjectReader());
        writer.setDeltaBaseAsOffset(true);
        assertTrue(writer.isDeltaBaseAsOffset());
        assertFalse(config.isDeltaBaseAsOffset());
    }

    /**
     * Write empty pack by providing empty sets of interesting/uninteresting
     * objects and check for correct format.
     *
     * @throws IOException
     */
    @Test
    public void testWriteEmptyPack1() throws IOException {
        createVerifyOpenPack(NONE, NONE, false, false);
        assertEquals(0, writer.getObjectCount());
        assertEquals(0, pack.getObjectCount());
        assertEquals("da39a3ee5e6b4b0d3255bfef95601890afd80709", writer
                .computeName().name());
    }

    /**
     * Write empty pack by providing empty iterator of objects to write and
     * check for correct format.
     *
     * @throws IOException
     */
    @Test
    public void testWriteEmptyPack2() throws IOException {
        createVerifyOpenPack(EMPTY_LIST_REVS);
        assertEquals(0, writer.getObjectCount());
        assertEquals(0, pack.getObjectCount());
    }

    /**
     * Try to pass non-existing object as uninteresting, with non-ignoring
     * setting.
     *
     * @throws IOException
     */
    @Test
    public void testNotIgnoreNonExistingObjects() throws IOException {
        final ObjectId nonExisting = ObjectId
                .fromString("0000000000000000000000000000000000000001");
        try {
            createVerifyOpenPack(NONE, haves(nonExisting), false, false);
            fail("Should have thrown MissingObjectException");
        } catch (MissingObjectException x) {
            // expected
        }
    }

    /**
     * Try to pass non-existing object as uninteresting, with ignoring setting.
     *
     * @throws IOException
     */
    @Test
    public void testIgnoreNonExistingObjects() throws IOException {
        final ObjectId nonExisting = ObjectId
                .fromString("0000000000000000000000000000000000000001");
        createVerifyOpenPack(NONE, haves(nonExisting), false, true);
        // shouldn't throw anything
    }

    /**
     * Try to pass non-existing object as uninteresting, with ignoring setting.
     * Use a repo with bitmap indexes because then PackWriter will use
     * PackWriterBitmapWalker which had problems with this situation.
     *
     * @throws IOException
     * @throws ParseException
     */
    @Test
    public void testIgnoreNonExistingObjectsWithBitmaps() throws IOException,
            ParseException {
        final ObjectId nonExisting = ObjectId
                .fromString("0000000000000000000000000000000000000001");
        new GC(db).gc();
        createVerifyOpenPack(NONE, haves(nonExisting), false, true, true);
        // shouldn't throw anything
    }

    /**
     * Create pack basing on only interesting objects, then precisely verify
     * content. No delta reuse here.
     *
     * @throws IOException
     */
    @Test
    public void testWritePack1() throws IOException {
        config.setReuseDeltas(false);
        writeVerifyPack1();
    }

    /**
     * Test writing pack without object reuse. Pack content/preparation as in
     * {@link #testWritePack1()}.
     *
     * @throws IOException
     */
    @Test
    public void testWritePack1NoObjectReuse() throws IOException {
        config.setReuseDeltas(false);
        config.setReuseObjects(false);
        writeVerifyPack1();
    }

    /**
     * Create pack basing on both interesting and uninteresting objects, then
     * precisely verify content. No delta reuse here.
     *
     * @throws IOException
     */
    @Test
    public void testWritePack2() throws IOException {
        writeVerifyPack2(false);
    }

    /**
     * Test pack writing with deltas reuse, delta-base first rule. Pack
     * content/preparation as in {@link #testWritePack2()}.
     *
     * @throws IOException
     */
    @Test
    public void testWritePack2DeltasReuseRefs() throws IOException {
        writeVerifyPack2(true);
    }

    /**
     * Test pack writing with delta reuse. Delta bases referred as offsets. Pack
     * configuration as in {@link #testWritePack2DeltasReuseRefs()}.
     *
     * @throws IOException
     */
    @Test
    public void testWritePack2DeltasReuseOffsets() throws IOException {
        config.setDeltaBaseAsOffset(true);
        writeVerifyPack2(true);
    }

    /**
     * Test pack writing with delta reuse. Raw-data copy (reuse) is made on a
     * pack with CRC32 index. Pack configuration as in
     * {@link #testWritePack2DeltasReuseRefs()}.
     *
     * @throws IOException
     */
    @Test
    public void testWritePack2DeltasCRC32Copy() throws IOException {
        final File packDir = db.getObjectDatabase().getPackDirectory();
        final PackFile crc32Pack = new PackFile(packDir,
                "pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.pack");
        final PackFile crc32Idx = new PackFile(packDir,
                "pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.idx");
        copyFile(JGitTestUtil.getTestResourceFile(
                "pack-34be9032ac282b11fa9babdc2b2a93ca996c9c2f.idxV2"),
                crc32Idx);
        db.openPack(crc32Pack);
        writeVerifyPack2(true);
    }

    /**
     * Create pack basing on fixed objects list, then precisely verify content.
     * No delta reuse here.
     *
     * @throws IOException
     * @throws MissingObjectException
     *
     */
    @Test
    public void testWritePack3() throws MissingObjectException, IOException {
        config.setReuseDeltas(false);
        final ObjectId forcedOrder[] = new ObjectId[] {
                ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
                ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
                ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
                ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
                ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3"),
                ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };
        try (RevWalk parser = new RevWalk(db)) {
            final RevObject forcedOrderRevs[] = new RevObject[forcedOrder.length];
            for (int i = 0; i < forcedOrder.length; i++)
                forcedOrderRevs[i] = parser.parseAny(forcedOrder[i]);
            createVerifyOpenPack(Arrays.asList(forcedOrderRevs));
        }
        assertEquals(forcedOrder.length, writer.getObjectCount());
        verifyObjectsOrder(forcedOrder);
        assertEquals("ed3f96b8327c7c66b0f8f70056129f0769323d86", writer
                .computeName().name());
    }

    /**
     * Another pack creation: basing on both interesting and uninteresting
     * objects. No delta reuse possible here, as this is a specific case when we
     * write only 1 commit, associated with 1 tree, 1 blob.
     *
     * @throws IOException
     */
    @Test
    public void testWritePack4() throws IOException {
        writeVerifyPack4(false);
    }

    /**
     * Test thin pack writing: 1 blob delta base is on objects edge. Pack
     * configuration as in {@link #testWritePack4()}.
     *
     * @throws IOException
     */
    @Test
    public void testWritePack4ThinPack() throws IOException {
        writeVerifyPack4(true);
    }

    /**
     * Compare sizes of packs created using {@link #testWritePack2()} and
     * {@link #testWritePack2DeltasReuseRefs()}. The pack using deltas should
     * be smaller.
     *
     * @throws Exception
     */
    @Test
    public void testWritePack2SizeDeltasVsNoDeltas() throws Exception {
        config.setReuseDeltas(false);
        config.setDeltaCompress(false);
        testWritePack2();
        final long sizePack2NoDeltas = os.size();
        tearDown();
        setUp();
        testWritePack2DeltasReuseRefs();
        final long sizePack2DeltasRefs = os.size();
        assertTrue(sizePack2NoDeltas > sizePack2DeltasRefs);
    }

    /**
     * Compare sizes of packs created using
     * {@link #testWritePack2DeltasReuseRefs()} and
     * {@link #testWritePack2DeltasReuseOffsets()}. The pack with delta bases
     * written as offsets should be smaller.
     *
     * @throws Exception
     */
    @Test
    public void testWritePack2SizeOffsetsVsRefs() throws Exception {
        testWritePack2DeltasReuseRefs();
        final long sizePack2DeltasRefs = os.size();
        tearDown();
        setUp();
        testWritePack2DeltasReuseOffsets();
        final long sizePack2DeltasOffsets = os.size();
        assertTrue(sizePack2DeltasRefs > sizePack2DeltasOffsets);
    }

    /**
     * Compare sizes of packs created using {@link #testWritePack4()} and
     * {@link #testWritePack4ThinPack()}. Obviously, the thin pack should be
     * smaller.
     *
     * @throws Exception
     */
    @Test
    public void testWritePack4SizeThinVsNoThin() throws Exception {
        testWritePack4();
        final long sizePack4 = os.size();
        tearDown();
        setUp();
        testWritePack4ThinPack();
        final long sizePack4Thin = os.size();
        assertTrue(sizePack4 > sizePack4Thin);
    }

    @Test
    public void testDeltaStatistics() throws Exception {
        config.setDeltaCompress(true);
        // TestRepository will close repo
        FileRepository repo = createBareRepository();
        ArrayList<RevObject> blobs = new ArrayList<>();
        try (TestRepository<FileRepository> testRepo = new TestRepository<>(
                repo)) {
            blobs.add(testRepo.blob(genDeltableData(1000)));
            blobs.add(testRepo.blob(genDeltableData(1005)));
            try (PackWriter pw = new PackWriter(repo)) {
                NullProgressMonitor m = NullProgressMonitor.INSTANCE;
                pw.preparePack(blobs.iterator());
                pw.writePack(m, m, os);
                PackStatistics stats = pw.getStatistics();
                assertEquals(1, stats.getTotalDeltas());
                assertTrue("Delta bytes not set.",
                        stats.byObjectType(OBJ_BLOB).getDeltaBytes() > 0);
            }
        }
    }

    // Generate consistent junk data for building files that delta well
    private String genDeltableData(int length) {
        assertTrue("Generated data must have a length > 0", length > 0);
        char[] data = {'a', 'b', 'c', '\n'};
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append(data[i % 4]);
        }
        return builder.toString();
    }

    @Test
    public void testWriteIndex() throws Exception {
        config.setIndexVersion(2);
        writeVerifyPack4(false);
        PackFile packFile = pack.getPackFile();
        PackFile indexFile = packFile.create(PackExt.INDEX);
        // Validate that IndexPack came up with the right CRC32 value.
        final PackIndex idx1 = PackIndex.open(indexFile);
        assertTrue(idx1 instanceof PackIndexV2);
        assertEquals(0x4743F1E4L, idx1.findCRC32(ObjectId
                .fromString("82c6b885ff600be425b4ea96dee75dca255b69e7")));
        // Validate that an index written by PackWriter is the same.
        final File idx2File = new File(indexFile.getAbsolutePath() + ".2");
        try (FileOutputStream is = new FileOutputStream(idx2File)) {
            writer.writeIndex(is);
        }
        final PackIndex idx2 = PackIndex.open(idx2File);
        assertTrue(idx2 instanceof PackIndexV2);
        assertEquals(idx1.getObjectCount(), idx2.getObjectCount());
        assertEquals(idx1.getOffset64Count(), idx2.getOffset64Count());
        for (int i = 0; i < idx1.getObjectCount(); i++) {
            final ObjectId id = idx1.getObjectId(i);
            assertEquals(id, idx2.getObjectId(i));
            assertEquals(idx1.findOffset(id), idx2.findOffset(id));
            assertEquals(idx1.findCRC32(id), idx2.findCRC32(id));
        }
    }

    @Test
    public void testExclude() throws Exception {
        // TestRepository closes repo
        FileRepository repo = createBareRepository();
        try (TestRepository<FileRepository> testRepo = new TestRepository<>(
                repo)) {
            BranchBuilder bb = testRepo.branch("refs/heads/master");
            contentA = testRepo.blob("A");
            c1 = bb.commit().add("f", contentA).create();
            testRepo.getRevWalk().parseHeaders(c1);
            PackIndex pf1 = writePack(repo, wants(c1), EMPTY_ID_SET);
            assertContent(pf1, Arrays.asList(c1.getId(), c1.getTree().getId(),
                    contentA.getId()));
            contentB = testRepo.blob("B");
            c2 = bb.commit().add("f", contentB).create();
            testRepo.getRevWalk().parseHeaders(c2);
            PackIndex pf2 = writePack(repo, wants(c2),
                    Sets.of((ObjectIdSet) pf1));
            assertContent(pf2, Arrays.asList(c2.getId(), c2.getTree().getId(),
                    contentB.getId()));
        }
    }

    private static void assertContent(PackIndex pi, List<ObjectId> expected) {
        assertEquals("Pack index has wrong size.", expected.size(),
                pi.getObjectCount());
        for (int i = 0; i < pi.getObjectCount(); i++)
            assertTrue(
                    "Pack index didn't contain the expected id "
                            + pi.getObjectId(i),
                    expected.contains(pi.getObjectId(i)));
    }
  470. @Test
  471. public void testShallowIsMinimalDepth1() throws Exception {
  472. try (FileRepository repo = setupRepoForShallowFetch()) {
  473. PackIndex idx = writeShallowPack(repo, 1, wants(c2), NONE, NONE);
  474. assertContent(idx, Arrays.asList(c2.getId(), c2.getTree().getId(),
  475. contentA.getId(), contentB.getId()));
  476. // Client already has blobs A and B, verify those are not packed.
  477. idx = writeShallowPack(repo, 1, wants(c5), haves(c2), shallows(c2));
  478. assertContent(idx, Arrays.asList(c5.getId(), c5.getTree().getId(),
  479. contentC.getId(), contentD.getId(), contentE.getId()));
  480. }
  481. }
  482. @Test
  483. public void testShallowIsMinimalDepth2() throws Exception {
  484. try (FileRepository repo = setupRepoForShallowFetch()) {
  485. PackIndex idx = writeShallowPack(repo, 2, wants(c2), NONE, NONE);
  486. assertContent(idx,
  487. Arrays.asList(c1.getId(), c2.getId(), c1.getTree().getId(),
  488. c2.getTree().getId(), contentA.getId(),
  489. contentB.getId()));
  490. // Client already has blobs A and B, verify those are not packed.
  491. idx = writeShallowPack(repo, 2, wants(c5), haves(c1, c2),
  492. shallows(c1));
  493. assertContent(idx,
  494. Arrays.asList(c4.getId(), c5.getId(), c4.getTree().getId(),
  495. c5.getTree().getId(), contentC.getId(),
  496. contentD.getId(), contentE.getId()));
  497. }
  498. }
  499. @Test
  500. public void testShallowFetchShallowParentDepth1() throws Exception {
  501. try (FileRepository repo = setupRepoForShallowFetch()) {
  502. PackIndex idx = writeShallowPack(repo, 1, wants(c5), NONE, NONE);
  503. assertContent(idx, Arrays.asList(c5.getId(), c5.getTree().getId(),
  504. contentA.getId(), contentB.getId(), contentC.getId(),
  505. contentD.getId(), contentE.getId()));
  506. idx = writeShallowPack(repo, 1, wants(c4), haves(c5), shallows(c5));
  507. assertContent(idx, Arrays.asList(c4.getId(), c4.getTree().getId()));
  508. }
  509. }
	@Test
	public void testShallowFetchShallowParentDepth2() throws Exception {
		try (FileRepository repo = setupRepoForShallowFetch()) {
			PackIndex idx = writeShallowPack(repo, 2, wants(c5), NONE, NONE);
			assertContent(idx,
					Arrays.asList(c4.getId(), c5.getId(), c4.getTree().getId(),
							c5.getTree().getId(), contentA.getId(),
							contentB.getId(), contentC.getId(),
							contentD.getId(), contentE.getId()));
			idx = writeShallowPack(repo, 2, wants(c3), haves(c4, c5),
					shallows(c4));
			assertContent(idx, Arrays.asList(c2.getId(), c3.getId(),
					c2.getTree().getId(), c3.getTree().getId()));
		}
	}
	@Test
	public void testShallowFetchShallowAncestorDepth1() throws Exception {
		try (FileRepository repo = setupRepoForShallowFetch()) {
			PackIndex idx = writeShallowPack(repo, 1, wants(c5), NONE, NONE);
			assertContent(idx, Arrays.asList(c5.getId(), c5.getTree().getId(),
					contentA.getId(), contentB.getId(), contentC.getId(),
					contentD.getId(), contentE.getId()));
			idx = writeShallowPack(repo, 1, wants(c3), haves(c5), shallows(c5));
			assertContent(idx, Arrays.asList(c3.getId(), c3.getTree().getId()));
		}
	}
	@Test
	public void testShallowFetchShallowAncestorDepth2() throws Exception {
		try (FileRepository repo = setupRepoForShallowFetch()) {
			PackIndex idx = writeShallowPack(repo, 2, wants(c5), NONE, NONE);
			assertContent(idx,
					Arrays.asList(c4.getId(), c5.getId(), c4.getTree().getId(),
							c5.getTree().getId(), contentA.getId(),
							contentB.getId(), contentC.getId(),
							contentD.getId(), contentE.getId()));
			idx = writeShallowPack(repo, 2, wants(c2), haves(c4, c5),
					shallows(c4));
			assertContent(idx, Arrays.asList(c1.getId(), c2.getId(),
					c1.getTree().getId(), c2.getTree().getId()));
		}
	}
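
	/**
	 * Creates a bare repository whose "master" branch is the linear history
	 * c1..c5, where each commit adds exactly one new blob
	 * (contentA..contentE).
	 */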
	private FileRepository setupRepoForShallowFetch() throws Exception {
		FileRepository repo = createBareRepository();
		// TestRepository will close the repo, but we need to return an open
		// one!
		repo.incrementOpen();
		try (TestRepository<Repository> r = new TestRepository<>(repo)) {
			BranchBuilder bb = r.branch("refs/heads/master");
			contentA = r.blob("A");
			contentB = r.blob("B");
			contentC = r.blob("C");
			contentD = r.blob("D");
			contentE = r.blob("E");
			c1 = bb.commit().add("a", contentA).create();
			c2 = bb.commit().add("b", contentB).create();
			c3 = bb.commit().add("c", contentC).create();
			c4 = bb.commit().add("d", contentD).create();
			c5 = bb.commit().add("e", contentE).create();
			r.getRevWalk().parseHeaders(c5); // fully initialize the tip RevCommit
			return repo;
		}
	}
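
	/**
	 * Writes a non-shallow pack for {@code want} using a plain RevWalk,
	 * excluding objects contained in any of {@code excludeObjects}.
	 */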
	private static PackIndex writePack(FileRepository repo,
			Set<? extends ObjectId> want, Set<ObjectIdSet> excludeObjects)
			throws IOException {
		try (RevWalk walk = new RevWalk(repo)) {
			return writePack(repo, walk, 0, want, NONE, excludeObjects);
		}
	}
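
	/**
	 * Writes a pack for a shallow fetch of {@code want} at the given
	 * {@code depth}, applying the client's "have" and "shallow" commits to the
	 * walk the way UploadPack would have during negotiation.
	 */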
	private static PackIndex writeShallowPack(FileRepository repo, int depth,
			Set<? extends ObjectId> want, Set<? extends ObjectId> have,
			Set<? extends ObjectId> shallow) throws IOException {
		// During negotiation, UploadPack would have set up a DepthWalk and
		// marked the client's "shallow" commits. Emulate that here.
		try (DepthWalk.RevWalk walk = new DepthWalk.RevWalk(repo, depth - 1)) {
			walk.assumeShallow(shallow);
			return writePack(repo, walk, depth, want, have, EMPTY_ID_SET);
		}
	}
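
	/**
	 * Prepares a PackWriter from the given walk, writes the pack and its index
	 * into the repository's pack directory, and returns the opened index.
	 */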
	private static PackIndex writePack(FileRepository repo, RevWalk walk,
			int depth, Set<? extends ObjectId> want,
			Set<? extends ObjectId> have, Set<ObjectIdSet> excludeObjects)
			throws IOException {
		try (PackWriter pw = new PackWriter(repo)) {
			pw.setDeltaBaseAsOffset(true);
			pw.setReuseDeltaCommits(false);
			for (ObjectIdSet idx : excludeObjects) {
				pw.excludeObjects(idx);
			}
			if (depth > 0) {
				pw.setShallowPack(depth, null);
			}
			// ow doesn't need to be closed; caller closes walk.
			ObjectWalk ow = walk.toObjectWalkWithSameObjects();
			pw.preparePack(NullProgressMonitor.INSTANCE, ow, want, have, NONE);
			File packdir = repo.getObjectDatabase().getPackDirectory();
			PackFile packFile = new PackFile(packdir, pw.computeName(),
					PackExt.PACK);
			try (FileOutputStream packOS = new FileOutputStream(packFile)) {
				pw.writePack(NullProgressMonitor.INSTANCE,
						NullProgressMonitor.INSTANCE, packOS);
			}
			PackFile idxFile = packFile.create(PackExt.INDEX);
			try (FileOutputStream idxOS = new FileOutputStream(idxFile)) {
				pw.writeIndex(idxOS);
			}
			return PackIndex.open(idxFile);
		}
	}
	// TODO: testWritePackDeltasCycle()
	// TODO: testWritePackDeltasDepth()
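
	// The writeVerifyPackN() helpers below write a pack via
	// createVerifyOpenPack() and then check the object count, the on-disk
	// object order, and the computed pack name.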
	private void writeVerifyPack1() throws IOException {
		final HashSet<ObjectId> interestings = new HashSet<>();
		interestings.add(ObjectId
				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
		createVerifyOpenPack(interestings, NONE, false, false);
		final ObjectId expectedOrder[] = new ObjectId[] {
				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
				ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
				ObjectId.fromString("540a36d136cf413e4b064c2b0e0a4db60f77feab"),
				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
				ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
				ObjectId.fromString("4b825dc642cb6eb9a060e54bf8d69288fbee4904"),
				ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3"),
				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };
		assertEquals(expectedOrder.length, writer.getObjectCount());
		verifyObjectsOrder(expectedOrder);
		assertEquals("34be9032ac282b11fa9babdc2b2a93ca996c9c2f", writer
				.computeName().name());
	}
	private void writeVerifyPack2(boolean deltaReuse) throws IOException {
		config.setReuseDeltas(deltaReuse);
		final HashSet<ObjectId> interestings = new HashSet<>();
		interestings.add(ObjectId
				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
		final HashSet<ObjectId> uninterestings = new HashSet<>();
		uninterestings.add(ObjectId
				.fromString("540a36d136cf413e4b064c2b0e0a4db60f77feab"));
		createVerifyOpenPack(interestings, uninterestings, false, false);
		final ObjectId expectedOrder[] = new ObjectId[] {
				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
				ObjectId.fromString("c59759f143fb1fe21c197981df75a7ee00290799"),
				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
				ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327"),
				ObjectId.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3"),
				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };
		if (!config.isReuseDeltas() && !config.isDeltaCompress()) {
			// If no deltas are in the file the final two entries swap places.
			swap(expectedOrder, 4, 5);
		}
		assertEquals(expectedOrder.length, writer.getObjectCount());
		verifyObjectsOrder(expectedOrder);
		assertEquals("ed3f96b8327c7c66b0f8f70056129f0769323d86", writer
				.computeName().name());
	}
	private static void swap(ObjectId[] arr, int a, int b) {
		ObjectId tmp = arr[a];
		arr[a] = arr[b];
		arr[b] = tmp;
	}
	private void writeVerifyPack4(final boolean thin) throws IOException {
		final HashSet<ObjectId> interestings = new HashSet<>();
		interestings.add(ObjectId
				.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"));
		final HashSet<ObjectId> uninterestings = new HashSet<>();
		uninterestings.add(ObjectId
				.fromString("c59759f143fb1fe21c197981df75a7ee00290799"));
		createVerifyOpenPack(interestings, uninterestings, thin, false);
		final ObjectId writtenObjects[] = new ObjectId[] {
				ObjectId.fromString("82c6b885ff600be425b4ea96dee75dca255b69e7"),
				ObjectId.fromString("aabf2ffaec9b497f0950352b3e582d73035c2035"),
				ObjectId.fromString("5b6e7c66c276e7610d4a73c70ec1a1f7c1003259") };
		assertEquals(writtenObjects.length, writer.getObjectCount());
		ObjectId expectedObjects[];
		if (thin) {
			expectedObjects = new ObjectId[4];
			System.arraycopy(writtenObjects, 0, expectedObjects, 0,
					writtenObjects.length);
			expectedObjects[3] = ObjectId
					.fromString("6ff87c4664981e4397625791c8ea3bbb5f2279a3");
		} else {
			expectedObjects = writtenObjects;
		}
		verifyObjectsOrder(expectedObjects);
		assertEquals("cded4b74176b4456afa456768b2b5aafb41c44fc", writer
				.computeName().name());
	}
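
	// Overload delegating to createVerifyOpenPack() with bitmaps disabled.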
	private void createVerifyOpenPack(final Set<ObjectId> interestings,
			final Set<ObjectId> uninterestings, final boolean thin,
			final boolean ignoreMissingUninteresting)
			throws MissingObjectException, IOException {
		createVerifyOpenPack(interestings, uninterestings, thin,
				ignoreMissingUninteresting, false);
	}
	private void createVerifyOpenPack(final Set<ObjectId> interestings,
			final Set<ObjectId> uninterestings, final boolean thin,
			final boolean ignoreMissingUninteresting, boolean useBitmaps)
			throws MissingObjectException, IOException {
		NullProgressMonitor m = NullProgressMonitor.INSTANCE;
		writer = new PackWriter(config, db.newObjectReader());
		writer.setUseBitmaps(useBitmaps);
		writer.setThin(thin);
		writer.setIgnoreMissingUninteresting(ignoreMissingUninteresting);
		writer.preparePack(m, interestings, uninterestings);
		writer.writePack(m, m, os);
		writer.close();
		verifyOpenPack(thin);
	}
	private void createVerifyOpenPack(List<RevObject> objectSource)
			throws MissingObjectException, IOException {
		NullProgressMonitor m = NullProgressMonitor.INSTANCE;
		writer = new PackWriter(config, db.newObjectReader());
		writer.preparePack(objectSource.iterator());
		assertEquals(objectSource.size(), writer.getObjectCount());
		writer.writePack(m, m, os);
		writer.close();
		verifyOpenPack(false);
	}
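
	/**
	 * Parses the pack bytes written to {@code os}. For a thin pack, a first
	 * parse with thin packs disallowed is expected to fail with an
	 * IOException; the pack is then parsed again with thin packs allowed and
	 * kept in {@code pack} for later inspection.
	 */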
	private void verifyOpenPack(boolean thin) throws IOException {
		final byte[] packData = os.toByteArray();
		if (thin) {
			PackParser p = index(packData);
			try {
				p.parse(NullProgressMonitor.INSTANCE);
				fail("indexer should grumble about missing object");
			} catch (IOException x) {
				// expected
			}
		}
		ObjectDirectoryPackParser p = (ObjectDirectoryPackParser) index(packData);
		p.setKeepEmpty(true);
		p.setAllowThin(thin);
		p.setIndexVersion(2);
		p.parse(NullProgressMonitor.INSTANCE);
		pack = p.getPack();
		assertNotNull("have PackFile after parsing", pack);
	}
	private PackParser index(byte[] packData) throws IOException {
		if (inserter == null)
			inserter = dst.newObjectInserter();
		return inserter.newPackParser(new ByteArrayInputStream(packData));
	}
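
	/**
	 * Sorts the parsed pack's index entries by offset within the pack file and
	 * asserts that the resulting order matches {@code objectsOrder}.
	 */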
	private void verifyObjectsOrder(ObjectId objectsOrder[]) {
		final List<PackIndex.MutableEntry> entries = new ArrayList<>();
		for (MutableEntry me : pack) {
			entries.add(me.cloneEntry());
		}
		Collections.sort(entries, (MutableEntry o1, MutableEntry o2) -> Long
				.signum(o1.getOffset() - o2.getOffset()));
		int i = 0;
		for (MutableEntry me : entries) {
			assertEquals(objectsOrder[i++].toObjectId(), me.toObjectId());
		}
	}
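
	// Naming helpers so call sites read like fetch negotiation:
	// wants(...), haves(...), shallows(...).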
	private static Set<ObjectId> haves(ObjectId... objects) {
		return Sets.of(objects);
	}

	private static Set<ObjectId> wants(ObjectId... objects) {
		return Sets.of(objects);
	}

	private static Set<ObjectId> shallows(ObjectId... objects) {
		return Sets.of(objects);
	}
}