Shallow fetch/clone: Make --depth mean the total history depth
cgit changed the --depth parameter to mean the total depth of history
rather than the depth of ancestors to be returned [1]. JGit still uses
the latter meaning, so update it to match cgit.
depth=0 still means a non-shallow clone. depth=1 now means only the
wants rather than the wants and their direct parents.
This is accomplished by changing the semantic meaning of "depth" in
UploadPack and PackWriter to mean the total depth of history desired,
while keeping "depth" in DepthWalk.{RevWalk,ObjectWalk} to mean
the depth of traversal. Thus UploadPack and PackWriter always
initialize their DepthWalks with "depth-1".
[1] upload-pack: fix off-by-one depth calculation in shallow clone
https://code.googlesource.com/git/+/682c7d2f1a2d1a5443777237450505738af2ff1a
Change-Id: I87ed3c0f56c37e3491e367a41f5e555c4207ff44
Signed-off-by: Terry Parker <tparker@google.com>
When fetching from a shallow clone, the client sends "have" lines
to tell the server about objects it already has and "shallow" lines
to tell where its local history terminates. In some circumstances,
the server fails to honor the shallow lines and fails to return
objects that the client needs.
UploadPack passes the "have" lines to PackWriter so PackWriter can
omit them from the generated pack. UploadPack processes "shallow"
lines by calling RevWalk.assumeShallow() with the set of shallow
commits. RevWalk creates and caches RevCommits for these shallow
commits, clearing out their parents. That way, walks correctly
terminate at the shallow commits instead of assuming the client has
history going back behind them. UploadPack converts its RevWalk to an
ObjectWalk, maintaining the cached RevCommits, and passes it to
PackWriter.
Unfortunately, to support shallow fetches the PackWriter does the
following:
if (shallowPack && !(walk instanceof DepthWalk.ObjectWalk))
walk = new DepthWalk.ObjectWalk(reader, depth);
That is, when the client sends a "deepen" line (fetch --depth=<n>)
and the caller has not passed in a DepthWalk.ObjectWalk, PackWriter
throws away the RevWalk that was passed in and makes a new one. The
cleared parent lists prepared by RevWalk.assumeShallow() are lost.
Fortunately UploadPack intends to pass in a DepthWalk.ObjectWalk.
It tries to create it by calling toObjectWalkWithSameObjects() on
a DepthWalk.RevWalk. But it doesn't work: because DepthWalk.RevWalk
does not override the standard RevWalk#toObjectWalkWithSameObjects
implementation, the result is a plain ObjectWalk instead of an
instance of DepthWalk.ObjectWalk.
The result is that the "shallow" information is thrown away and
objects reachable from the shallow commits can be omitted from the
pack sent when fetching with --depth from a shallow clone.
Multiple factors collude to limit the circumstances under which this
bug can be observed:
1. Commits with depth != 0 don't enter DepthGenerator's pending queue.
That means a "have" cannot have any effect on DepthGenerator unless
it is also a "want".
2. DepthGenerator#next() doesn't call carryFlagsImpl(), so the
uninteresting flag is not propagated to ancestors there even if a
"have" is also a "want".
3. JGit treats a depth of 1 as "1 past the wants".
Because of (2), the only place the UNINTERESTING flag can leak to a
shallow commit's parents is in the carryFlags() call from
markUninteresting(). carryFlags() only traverses commits that have
already been parsed: commits yet to be parsed are supposed to inherit
correct flags from their parent in PendingGenerator#next (which
doesn't happen here --- that is (2)). So the list of commits that have
already been parsed becomes relevant.
When we hit the markUninteresting() call, all "want"s, "have"s, and
commits to be unshallowed have been parsed. carryFlags() only
affects the parsed commits. If the "want" is a direct parent of a
"have", then it carryFlags() marks it as uninteresting. If the "have"
was also a "shallow", then its parent pointer should have been null
and the "want" shouldn't have been marked, so we see the bug. If the
"want" is a more distant ancestor then (2) keeps the uninteresting
state from propagating to the "want" and we don't see the bug. If the
"shallow" is not also a "have" then the shallow commit isn't parsed
so (2) keeps the uninteresting state from propagating to the "want
so we don't see the bug.
Here is a reproduction case (time flowing left to right, arrows
pointing to parents). "C" must be a commit that the client
reports as a "have" during negotiation. That can only happen if the
server reports it as an existing branch or tag in the first round of
negotiation:
A <-- B <-- C <-- D
First do
git clone --depth 1 <repo>
which yields D as a "have" and C as a "shallow" commit. Then try
git fetch --depth 1 <repo> B:refs/heads/B
Negotiation sets up: have D, shallow C, have C, want B.
But due to this bug B is marked as uninteresting and is not sent.
Change-Id: I6e14b57b2f85e52d28cdcf356df647870f475440
Signed-off-by: Terry Parker <tparker@google.com>
When doing an incremental fetch from JGit, "have" commits are marked
as "uninteresting". In a non-shallow fetch, when the RevWalk hits an
"uninteresting" commit it marks the commit's corresponding tree as
uninteresting. That has the effect of dropping those trees and all the
trees and blobs they reference out of the thin pack returned to the
client.
However, shallow fetches use a DepthWalk to limit the RevWalk, which
nearly always causes the RevWalk to terminate before encountering the
"have" commits. As a result the pack created for the incremental fetch
never encounters "uninteresting" tree objects and thus includes
duplicate objects that it knows the client already has.
Change-Id: I7b1f7c3b0d83e04d34cd2fa676f1ad4fec904c05
Signed-off-by: Terry Parker <tparker@google.com>
PackWriter: Declare preparePack object sets as @NonNull
Require callers to pass in valid sets for both want and have
collections. Offer PackWriter.NONE as a handy constant for an
empty collection for the have part of preparePack instead of null.
Change-Id: Ifda4450f5e488cbfefd728382b7d30797e229217
Hoist ObjectIdSet up to lib as part of the public API and add
the interface to some common types like PackIndex and JGit custom
ObjectId map types. This cleans up wrapper code in a number of
places by allowing direct use of the types as an ObjectIdSet.
Future commits can now rely on ObjectIdSet as a simple read-only
type to check a set of objects from a number of storage options.
Change-Id: Ib62b062421d475bd52abd6c84a73916ef36e084b
Previously, non-reuse deltas were only included in packStatistics if they
were not cached by the deltaWindow.
Change-Id: I7684d8214875f0a7569b34614f8a3ba341dbde9c
Signed-off-by: James Kolb <jkolb@google.com>
When writing new packs it should be allowed to specify objects as "have"
(objects which should not be included in the pack) which do not exist in
the local repository.
This works with the traditional PackWriter, but when PackWriter was
working on a repository with bitmap indexes and used
PackWriterBitmapWalker then this feature was broken. Non-existing "have"
objects lead to MissingObjectExceptions. That broke push and Gerrit
replication. When the replication target had branches unknown to the
replication source then the source repository wanted to build pack files
where "have" included branch-tips which were unknown in the source
repository.
Bug: 427107
Change-Id: I6b6598a1ec49af68aa77ea6f1f06e827982ea4ac
Also-by: Matthias Sohn <matthias.sohn@sap.com>
Move SampleDataRepositoryTestCase to org.eclipse.jgit.test
This class requires resources which are private to another
bundle. Move the class next to its resources, removing an
odd cross bundle dependency.
Change-Id: I30d5568b09ea5fb3bd3bb60b602f149c0867f49a
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
Update DfsGarbageCollector to not read back a pack index.
Previously, the Dfs GC excluded objects from packs by passing a
previously written index to the PackWriter. Reading back a file on
Dfs is slow. Instead, allow the PackWriter to expose the objects
included in a pack and forward that to invocations of excludeObjects() .
Change-Id: I377cb4ab07f62cf790505e1eeb0b2efe81897c79
Remove packIndex field from FileObjDatabase openPack method.
Previously, the FileObjDatabase required both the pack file path and
index file path to be passed to openPack(). A future change to add
a bitmap index will add a .bitmap file parallel to the pack file
(similar to the .idx file). Update the PackFile to support
automatically loading pack index extensions based on the pack file
path.
Change-Id: Ifc8fc3e57f4afa177ba5a88df87334dbfa799f01
PackWriter supports excluding objects from being written to the pack.
You may specify a PackIndex which lists all those objects which should
not go into the new pack. This feature was broken because not all
commits have been checked whether they should be excluded or not. For
other object types the exclude algorithm worked. This commit adds the
missing check.
Change-Id: Id0047098393641ccba784c58b8325175c22fcece
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
During parsing these are used with contains(). If they are a List
type, the contains operation is not efficient. Some callers such
as UploadPack often pass a List here, so convert to Set when the
type isn't efficient for contains().
Change-Id: If948ae3bf1f46e756bd2d5db14795e12ba7a6207
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
PackWriter: Rename getObjectsNumber to getObjectCount
This better matches with PackFile and CachedPack's methods
that return the same value.
Change-Id: Idb9b7c71d2048dd2344a62c2cde20b4e34529ab7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
PackWriter incorrectly returned 0 from getObjectsNumber() when the
pack has not been written yet. This caused dumb transports like
amazon-s3:// and sftp:// to abort early and never write out a pack,
under the assumption that the pack had no objects.
Until the pack header is written to the output stream, compute the
current object count each time it is requested. Once the header is
started, use the object count from the stats object.
Change-Id: I041a2368ae0cfe6f649ec28658d41a6355933900
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Refactor IndexPack to not require local filesystem
By moving the logic that parses a pack stream from the network (or
a bundle) into a type that can be constructed by an ObjectInserter,
repository implementations have a chance to inject their own logic
for storing object data received into the destination repository.
The API isn't completely generic yet, there are still quite a few
assumptions that the PackParser subclass is storing the data onto
the local filesystem as a single file. But its about the simplest
split of IndexPack I can come up with without completely ripping
the code apart.
Change-Id: I5b167c9cc6d7a7c56d0197c62c0fd0036a83ec6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Eclipse has some problem re-running single JUnit tests if
the tests are in Junit 3 format, but the JUnit 4 launcher
is used. This was quite unnecessary and the move was not
completed. We still have no JUnit4 test.
This completes the extermination of JUnit3. Most of the
work was global searce/replace using regular expression,
followed by numerous invocarions of quick-fix and organize
imports and verification that we had the same number of
tests before and after.
- Annotations were introduced.
- All references to JUnit3 classes removed
- Half-good replacement for getting the test name. This was
needed to make the TestRngs work. The initialization of
TestRngs was also made lazily since we can not longer find
out the test name in runtime in the @Before methods.
- Renamed test classes to end with Test, with the exception
of TestTranslateBundle, which fails from Maven
- Moved JGitTestUtil to the junit support bundle
Change-Id: Iddcd3da6ca927a7be773a9c63ebf8bb2147e2d13
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This refactoring permits applications to configure global per-process
settings for all packing and easily pass it through to per-request
PackWriters, ensuring that the process configuration overrides the
repository specific settings.
For example this might help in a daemon environment where the server
wants to cap the resources used to serve a dynamic upload pack
request, even though the repository's own pack.* settings might be
configured to be more aggressive. This allows fast but less bandwidth
efficient serving of clients, while still retaining good compression
through a cron managed `git gc`.
Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Move PackWriter progress monitors onto the operations
Rather than taking the ProgressMonitor objects in our constructor and
carrying them around as instance fields, take them as arguments to the
actual time consuming operations we need to run.
Change-Id: I2b230d07e277de029b1061c807e67de5428cc1c4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Ensure ObjectReader used by PackWriter is released
The ObjectReader API demands that we release the reader when we are
done with it. PackWriter contains a reader, which it uses for the
entire packing session. Expose the release of the reader through
a release method on the writer.
This still doesn't address the RevWalk and TreeWalk users, who
don't correctly release their reader. But its a small step in the
right direction.
Change-Id: I5cb0b5c1b432434a799fceb21b86479e09b84a0a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Similar to what we did with the file code, move the pack writer
into its own package so the related classes and their package
private methods are hidden from the rest of the library.
Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Move FileRepository to storage.file.FileRepository
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.
Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.
Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Use getObjectsDatabase().getDirectory() to find objects
Only the ObjectDirectory type of database knows where to find the
objects directory on the local filesystem, so defer to it whenever
we need to know where the objects reside. Since this is the type
returned by FileRepository's getObjectDatabase() method, we mostly
don't have to do much other than use a slightly longer invocation.
Change-Id: Ie5f58132a6411b56c3acad73646ad169d78a0654
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This stream was used only to determine how many bytes had been
written thus far. Except we're always dumping it into a simple
ByteArrayOutputStream, which also knows that. Drop the dependency
on the pack stream and use ByteArrayOutputStream directly.
This lets us later move this test into the new storage.file
package without dragging along the pack stream that is an internal
implementation detail of PackWriter, which is more general than
just the file storage layer.
Change-Id: I291689c0b1ed799270c213ee73b710b2637fb238
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Only import the sample data packs on tests that need them
Not all of our test cases really require the sample data packs,
and we are better off not using them because its hard to see exactly
what condition a test is testing when looking only at the Java code.
Clarify the dependency by only making the packs available when
there is a real need for it.
Change-Id: Id8a76ee7ee1f7efba585be4bed19a8fb5b3b3585
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>