Shawn O. Pearce [Tue, 7 Dec 2010 22:05:40 +0000 (14:05 -0800)]
IndexPack: Use streaming for large whole blobs
When indexing large blobs that are stored whole (non-delta form),
avoid allocating the entire blob in memory and instead stream it
through the SHA-1 checksum computation. This reduces the size
of memory required by IndexPack when processing very big blobs,
such as a 500 MiB uncompressable binary.
If the large blob already exists in the local repository, its
contents needs to be compared byte-for-byte after the entire pack
has been indexed, to ensure there isn't an unexpected SHA-1 collision
which may result in later data corruption. This compare is performed
as a streaming compare, again avoiding the large object allocation.
This change doesn't improve on memory utilization for large objects
stored as deltas. The change also doesn't improve handling for
any large commits, trees or annotated tags. There isn't much to
be done here for those objects, because they need to be passed down
to the ObjectChecker as a byte[]. Fortunately it isn't common for
these object types to be that large,
Bug: 312868
Change-Id: I862afd4cb78013ee033d4ec68c067b1774a05be8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> CC: Roberto Tyley <roberto.tyley@guardian.co.uk>
Some method parameters in WorkingTreeIterator are never used. Remove
them. Especially the removal of the FS parameter in isModified()
simplifies upcoming performance optimizations.
Change-Id: I7c449589283a4a6b6e23f2586cd784febdca8bcd Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Jens Baumgart [Wed, 8 Dec 2010 15:20:28 +0000 (16:20 +0100)]
Introduce http test bundle
Introduce a http test bundle to make this functionality available for
EGit tests. A simple http server class is provided. The jetty version
was updated to a version that is also available via p2 (needed in EGit
UI tests).
Change-Id: I13bfc4c6c47e27d8f97d3e9752347d6d23e553d4 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn O. Pearce [Wed, 8 Dec 2010 00:49:51 +0000 (16:49 -0800)]
Remove empty iterator from TreeWalk
Its confusing that a new TreeWalk() needs to have reset() invoked
on it before addTree(). This is a historical accident caused by
how TreeWalk was abused within ObjectWalk.
Drop the initial empty tree from the TreeWalk and thus remove a
number of pointless reset() operations from unit tests and some of
the internal JGit code.
Existing application code which is still calling reset() will simply
be incurring a few unnecessary field assignments, but they should
consider cleaning up their code in the future.
Change-Id: I434e94ffa43491019e7dff52ca420a4d2245f48b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Tue, 7 Dec 2010 20:45:52 +0000 (12:45 -0800)]
Refactor IndexPack to use InputStream for inflation
By inflating with an InputStream like API, it is possible to stream
through large objects rather than allocating the entire thing as
a byte[]. This change only refactors the inflation code within
IndexPack to use a streaming interface.
Matthias Sohn [Mon, 6 Dec 2010 23:58:19 +0000 (00:58 +0100)]
[findbugs] Do not ignore exceptional return value
java.io.File.delete() reports failure as an exceptional
return value false. Fix the code which silently ignored
this exceptional return value. Also remove some duplicate
deletion helper methods.
Change-Id: I80ed20ca1f07a2bc6e779957a4ad0c713789c5be Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Mon, 6 Dec 2010 23:45:58 +0000 (00:45 +0100)]
Provide file utilities for file deletion
Provide file helper methods in a reusable utility class to
replace many local implementations. java.io.File has some
methods reporting failure by returning false. We prefer to
throw IOException on failure so that callers can't forget
checking the return value.
Change-Id: I430c77b5d2cffcf8b47584326ad4817a7291845e Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Jens Baumgart [Mon, 6 Dec 2010 12:40:07 +0000 (13:40 +0100)]
LockFile.commit: retry renaming
Currently the following can happen in LockFile.commit: deletion of the
original file succeeds but renaming fails afterwards. In this case the
original file (e.g. branch file in refs/heads) is lost.
To workaround the issue the same retry logic as for file deletion is
applied to file renaming.
Shawn O. Pearce [Sat, 4 Dec 2010 00:14:46 +0000 (16:14 -0800)]
Abstract SSH setup to support GIT_SSH
In order to honor GIT_SSH the TransportGitSsh class needs to run the
process named by the GIT_SSH environment variable and use that as the
pipes for connectivity to the remote peer. Refactor the current
transport code to support a different type of pipe connectivity, so we
can later add GIT_SSH.
Bug: 321062
Change-Id: I9d8ee1a95f1bac5013b33a4a42dcf1f98f92172f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Fri, 3 Dec 2010 20:38:31 +0000 (12:38 -0800)]
Remove result id from CommitBuilder, TagBuilder
These objects don't need to be updated with the resulting ObjectId of
the formatted content, callers can get that from the ObjectInserter on
their own.
Change-Id: Idc5f097de9f7beafc5e54e597383d82daf9d7db4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Reviewed-by: Chris Aniszczyk <caniszczyk@gmail.com>
When in OURS and THEIRS a new file is created we want a conflict
when the two contents differ. If on two branches the same file
with the same content is created this should not be a conflict.
But: the current merge algorithm is throwing NPEs in this case.
Fix this by choosing an empty RawText as common base if the
base is empty.
Change-Id: I21cb23f852965b82fb82ccd66ec961c7edb3ac3d Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Shawn O. Pearce [Wed, 1 Dec 2010 17:59:55 +0000 (09:59 -0800)]
Avoid unnecessary decoding of length in PackFile
If the object type is a whole object and all we want is the type,
there is no need to skip the length header. The type is already known
and can be returned as-is. Instead skip the length header only for
the two delta formats, where the delta base must itself be scanned.
Change-Id: I87029258e88924b3e5850bdd6c9006a366191d10 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Wed, 1 Dec 2010 17:57:16 +0000 (09:57 -0800)]
Remove unused 'shift' variable from PackFile
This variable was not used for anything, but Eclipse's JDT failed to
notice because of the "shift += " operation within the body of the
while loop. Here we don't need the shift because we do not decode the
length, but we do have to skip over the bytes that store the length to
locate the delta base.
Mathias Kinzler [Wed, 1 Dec 2010 14:10:13 +0000 (15:10 +0100)]
Rebase Interoperability second part: fix "pop steps"
If the CLI stops a rebase upon conflict, the current
step is already popped from the git-rebase-todo and appended to the
"done" file. The current implementation wrongly pops the step only
after successful cherry-pick.
Stefan Lay [Tue, 30 Nov 2010 10:33:54 +0000 (11:33 +0100)]
Include list of assume unchanged files in IndexDiff
The IndexDiff had not collected the info if the flag
"assume-unchanged" is set. This information is useful for clients
which may want to decide if specific actions are allowed on a file.
Bug: 326213
Change-Id: I14bb7b03247d6c0b429a9d8d3f6b10f21d8ddeb1 Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Marc Strapetz [Fri, 26 Nov 2010 10:07:04 +0000 (11:07 +0100)]
Fix DiffConfig to understand "copy" resp. "copies" for diff.renames property.
Rename detection should be considered enabled if
diff.renames config property is set to "copy" or "copies", instead of
throwing IllegalArgumentException.
Change default diff algorithm to histogram and add tests
The referenced bug showed that JGit produced different merge results
compared to C Git. Unit test was added to reproduce the issue. The
problem can be solved by switching to histogram diff algorithm.
Bug: 331078
Change-Id: I54f30afb3a9fef1dbca365ca5f98f4cc846092e3 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Philipp Thun <philipp.thun@sap.com>
The diff algorithm which is used by Merge, Cherry-Pick, Rebase
should be configurable. A new configuration parameter "diff.algorithm"
is introduced which currently accepts the values "myers" or
"histogram". Based on this parameter for example the ResolveMerger
will choose a diff algorithm. The reason for this is bug 331078.
This bug shows that JGit is more compatible with C Git when
histogram diff is in place. But since histogram diff is quite new we
need an easy way to fall back to Myers diff.
Bug: 331078
Change-Id: I2549c992e478d991c61c9508ad826d1a9e539ae3 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Philipp Thun <philipp.thun@sap.com>
Fix bug regarding handling of non-versioned files during merge
There was a bug introduced by commit 0e815fe. For non-versioned files
the merge algorithm detected an incoming deletion from THEIRS.
Consequently such files were deleted. That's a severe bug which was
fixed by more precisely detecting incoming deletions.
Change-Id: I4385d3c990db11d62e371a385dc8ee89841db84a Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Philipp Thun <philipp.thun@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Mathias Kinzler [Mon, 22 Nov 2010 15:26:00 +0000 (16:26 +0100)]
Initial implementation of a Rebase command
This is a first iteration to implement Rebase. At the moment, this
does not implement --continue and --skip, so if the first
conflict is found, the only option is to --abort the command.
Shawn O. Pearce [Fri, 19 Nov 2010 01:04:10 +0000 (17:04 -0800)]
Move WorkingTreeIterator inherited state into an object
Instead of copying up to 4 fields from the parent iterator each time a
child iterator is initialized and used, construct a single state
object that contains the 4 fields, and pass that one state object
through to the child. This makes it easier to add additional state
fields that must be inherited, at the slight expense of an extra
object allocation per TreeWalk, and an extra level of field
indirection whenever the options, nameEncoder, or read buffer is
required by the iterator.
Change-Id: Ic4603c33b772d7a45f9c81140537d51945688fcb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Fri, 19 Nov 2010 00:50:14 +0000 (16:50 -0800)]
Name TreeFilter and MergeFilter implementations
Naming these inner classes ensures that stack traces which contain
them will give us useful information about which filter is involved in
the trace, rather than the generated names $1, $2, etc. This makes it
much easier to understand a stack trace at a glance.
Change-Id: Ia6a75fdb382ff6461e02054d94baf011bdeee5aa Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In the case where DirCacheCheckout was used to checkout a tree
without taking HEAD into account (e.g. during a clone or hard reset)
we didn't handle conflicts correctly. E.g. if there are conflicts
(entries with stage != 0) in the index and we tried to hard reset
we have been processing the conflicting pathes multiple times (once
for every stage). With this fix we will update the index with the
entry from the "merge" state (the state we want checkout) when we
detect existing conflicts.
Change-Id: Iffbddccaa588cf0d1460a5e44dabaf540d996e26 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Shawn O. Pearce [Sat, 13 Nov 2010 00:15:43 +0000 (16:15 -0800)]
Merge branch 'rename-detection'
* rename-detection:
RenameDetector: Only scan deletes if adds exist
SimilarityRenameDetector: Initialize sizes to 0
SimilarityRenameDetector: Avoid allocating source index
SimilarityRenameDetector: Only attempt to index large files once
SimilarityIndex: Don't overflow internal counter fields
SimilarityIndex: Accept files larger than 8 MB
SimilarityIndex: Correct comment explaining the logic
Shawn O. Pearce [Sat, 13 Nov 2010 00:12:27 +0000 (16:12 -0800)]
Merge branch 'fs-fsync'
* fs-fsync:
Remove unnecessary flush calls from LockFile
Remove unnecessary region locking from LockFile
Support core.fsyncRefFiles option
Support core.fsyncObjectFiles option
Simplify LockFile write(ObjectId) case
Shawn O. Pearce [Fri, 12 Nov 2010 23:11:30 +0000 (15:11 -0800)]
Base64: Reformat to match JGit style
Rewrite the initialization of the encoding tables to be more clear,
but slightly slower to setup. We generally perfer a clear definition
of the data over a slightly slower class load time.
Change-Id: I0c7f89b6ab82dcf71525ffb69a388c312c195913 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Fri, 12 Nov 2010 22:51:06 +0000 (14:51 -0800)]
Base64: Strip out code JGit doesn't use
Since we have already modified this class to localize an error
message, we might as well strip it down to contain only the
functionality we need, or might ever use.
To keep this simple to review we don't adjust formatting right
away, so code that was buried inside of an if or else block whose
condition was removed might not have the correct indentation anymore.
We can fix this with a later reformatting change.
Change-Id: I2996aaa704e9d6182e5500c7a63240d5e9d722cc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When DirCacheCheckout should be used to checkout only one
tree (reset --hard, clone) then we had to use the standard
constructor and specify null as value for head. This change
adds explicit constructors not taking HEAD and documents
that.
Bug: 330021 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Shawn O. Pearce [Fri, 5 Nov 2010 02:01:51 +0000 (19:01 -0700)]
Remove unnecessary note fanout when removing notes
Fanout level notes trees are combined back together into a flat leaf
level tree if during a removal of a subtree there are less than 3/4 of
the fanout subtrees still existing, and the size of the combined leaf
is under the 256 split limit noted above.
This rule is used because deletes are less common than insertions, and
SHA-1's relatively uniform distribution suggests that with only 192
subtrees existing in the fanout, there should be approximately 192
names in the combined replacement leaf tree.
Change-Id: Ia9d145ffd5454982509fc40906bc4dbbf2b13952 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Fri, 5 Nov 2010 01:56:40 +0000 (18:56 -0700)]
Split note leaf buckets at 256 elements
Leaf level notes trees are split into a new fan-out tree if an
insertion occurs and the tree already contains >= 256 notes in it.
The splitting may occur multiple times if all of the notes have the
same prefix; in the worst case this produces a tree path such as
"00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/be" if all
of the notes begin with zeros.
Change-Id: I2d7d98f35108def9ec49936ddbdc34b13822a3c7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Fri, 5 Nov 2010 01:46:19 +0000 (18:46 -0700)]
Add internal API for note iteration
Some algorithms need to be able to iterate through all notes within a
particular bucket, such as when splitting or combining a bucket.
Exposing an Iterator<Note> makes this traversal possible.
For a LeafBucket the iteration is simple, its over the sorted array of
elements. For FanoutBucket its a bit more complex as the iteration
needs to union the iterators of each fanout bucket, lazily loading any
buckets that aren't already in-memory.
Change-Id: I3d5279b11984f44dcf0ddb14a82a4b4e51d4632d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Fri, 5 Nov 2010 00:50:48 +0000 (17:50 -0700)]
Add in-memory updating support to NoteMap
NoteMap now supports editing in-memory, allowing applications to
modify the NoteMap once it has been loaded from the branch. The
ability to write the branch back to tree objects is not yet done,
so the edits are strictly transient.
Change-Id: I63448954abfca2a8e3e95369cd84c0d1176cdb79 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Thu, 11 Nov 2010 01:28:14 +0000 (17:28 -0800)]
Remove unnecessary region locking from LockFile
The lock file protocol relies on the atomic creation of a standardized
name in the parent directory of the file being updated. Since the
creation is atomic, at most one thread in any process can succeed on
this creation, and all others will fail. While the lock file exists,
that file is private to the thread that is writing it, and no others
will attempt to read or modify the file.
Consequently the use of the region level locks around the file are
unnecessary, and may actually reduce performance when using NFS, SMB,
or some other sort of remote filesystem that supports locking.
Change-Id: Ice312b6fb4fdf9d36c734c3624c6d0537903913b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Thu, 11 Nov 2010 01:24:16 +0000 (17:24 -0800)]
Support core.fsyncRefFiles option
If core.fsyncRefFiles is set to true, fsync is used whenever a
reference file is updated, ensuring the file contents are also
written to disk. This can help to prevent empty ref files after
a system crash when using a filesystem such as HFS+ where data
writes may be delayed.
Change-Id: Ie508a974da50f63b0409c38afe68772322dc19f1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Wed, 10 Nov 2010 02:58:36 +0000 (18:58 -0800)]
Support core.fsyncObjectFiles option
Some repositories may be on really unstable filesystems, but still
want to have good reliability when objects are written to disk. If
core.fsyncObjectFiles is set to true, request the JVM to ensure the
data is written before returning success to the caller of insert.
The option defaults to false because it should be useless on any
filesystem that orders writes and metadata, such as ext3 mounted with
data=ordered (or data=journal). But it may be useful on some systems
(especially HFS+) where file content may flush to the disk
independently of filesystem structure changes.
Because FileChannel.force(boolean) only claims to ensure data is
written if it was written using the write(ByteBuffer) method of
FileChannel, redirect all writes when using fsyncObjectFiles to go
through the FileChannel interface instead of through the older style
OutputStream interface. This may not be necessary on all JVMs, but
its more portable to follow the definition than the common behavior.
Change-Id: I57f6b6bb7e403c07fbae989dbf3758eaf5edbc78 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Thu, 11 Nov 2010 22:43:22 +0000 (14:43 -0800)]
SimilarityRenameDetector: Initialize sizes to 0
Setting the array elements to -1 is more expensive than relying on
the allocator to zero the array for us first. Shifting the code to
always add 1 to the size (so an empty file is actually 1 byte long)
allows us to detect an unloaded size by comparing to 0, thus saving
the array fill calls.
Change-Id: Iad859e910655675b53ba70de8e6fceaef7cfcdd1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>