Shawn Pearce [Fri, 8 Mar 2013 19:02:04 +0000 (11:02 -0800)]
Avoid repacking unreachable garbage in DfsGarbageCollector
If a repository has significant amounts of unreachable garbage the
final phase to coalesce it can take longer than any other part of the
garbage collection phase. Provide a setting for applications to tweak
the threshold where coalescing ends and files just remain on disk.
Robin Rosenberg [Wed, 20 Feb 2013 23:07:33 +0000 (00:07 +0100)]
Do not cherry-pick merge commits during rebase
Rebase computes the list of commits that are included in
the merges, just like Git does, so do not try to include
the merge commits. Re-recreating merges during rebase is
a bit more complicated and might be a useful future extension,
but for now just linearize during rebase.
Change-Id: I61239d265f395e5ead580df2528e46393dc6bdbd Signed-off-by: Robin Stocker <robin@nibor.org>
Robin Rosenberg [Wed, 20 Feb 2013 21:54:59 +0000 (22:54 +0100)]
Extend FileUtils.delete with option to delete empty directories only
The new option EMPTY_DIRECTORIES_ONLY will make delete() only delete
empty directories. Any attempt to delete files will fail. Can be
combined with RECURSIVE to wipe out entire tree structures and
IGNORE_ERRORS to silently ignore any files or non-empty directories.
Change-Id: Icaa9a30e5302ee5c0ba23daad11c7b93e26b7445 Signed-off-by: Robin Stocker <robin@nibor.org>
Shawn Pearce [Wed, 6 Mar 2013 20:48:25 +0000 (12:48 -0800)]
Do not attempt to read bitmap from invalid pack
If a pack file has been marked invalid due to a prior IOException
accessing its contents, do not offer its bitmap index to callers.
The pack cannot be used so its bitmap should be off limits from
any reader trying to work from a bitmap.
Enable writing pack indexes with bitmaps in the GC.
Update the dfs and file GC implementations to prepare and write
bitmaps on the packs that contain the full closure of the object
graph. Update the DfsPackDescription to include the index version.
Colby Ranger [Mon, 6 Aug 2012 18:09:53 +0000 (11:09 -0700)]
Support creating pack bitmap indexes in PackWriter.
Update the PackWriter to support writing out pack bitmap indexes,
a parallel ".bitmap" file to the ".pack" file.
Bitmaps are selected at commits every 1 to 5,000 commits for
each unique path from the start. The most recent 100 commits are
all bitmapped. The next 19,000 commits have a bitmaps every 100
commits. The remaining commits have a bitmap every 5,000 commits.
Commits with more than 1 parent are prefered over ones
with 1 or less. Furthermore, previously computed bitmaps are reused,
if the previous entry had the reuse flag set, which is set when the
bitmap was placed at the max allowed distance.
Bitmaps are used to speed up the counting phase when packing, for
requests that are not shallow. The PackWriterBitmapWalker uses
a RevFilter to proactively mark commits with RevFlag.SEEN, when
they appear in a bitmap. The walker produces the full closure
of reachable ObjectIds, given the collection of starting ObjectIds.
For fetch request, two ObjectWalks are executed to compute the
ObjectIds reachable from the haves and from the wants. The
ObjectIds needed to be written are determined by taking all the
resulting wants AND NOT the haves.
For clone requests, we get cached pack support for "free" since
it is possible to determine if all of the ObjectIds in a pack file
are included in the resulting list of ObjectIds to write.
On my machine, the best times for clones and fetches of the linux
kernel repository (with about 2.6M objects and 300K commits) are
tabulated below:
Colby Ranger [Tue, 28 Aug 2012 16:08:20 +0000 (09:08 -0700)]
Added read/write support for pack bitmap index.
A pack bitmap index is an additional index of compressed
bitmaps of the object graph. Furthermore, a logical API of the index
functionality is included, as it is expected to be used by the
PackWriter.
Compressed bitmaps are created using the javaewah library, which is a
word-aligned compressed variant of the Java bitset class based on
run-length encoding. The library only works with positive integer
values. Thus, the maximum number of ObjectIds in a pack file that
this index can currently support is limited to Integer.MAX_VALUE.
Every ObjectId is given an integer mapping. The integer is the
position of the ObjectId in the complete ObjectId list, sorted
by offset, for the pack file. That integer is what the bitmaps
use to reference the ObjectId. Currently, the new index format can
only be used with pack files that contain a complete closure of the
object graph e.g. the result of a garbage collection.
The index file includes four bitmaps for the Git object types i.e.
commits, trees, blobs, and tags. In addition, a collection of
bitmaps keyed by an ObjectId is also included. The bitmap for each entry
in the collection represents the full closure of ObjectIds reachable
from the keyed ObjectId (including the keyed ObjectId itself). The
bitmaps are further compressed by XORing the current bitmaps against
prior bitmaps in the index, and selecting the smallest representation.
The XOR'd bitmap and offset from the current entry to the position
of the bitmap to XOR against is the actual representation of the entry
in the index file. Each entry contains one byte, which is currently
used to note whether the bitmap should be blindly reused.
Colby Ranger [Mon, 27 Aug 2012 23:08:42 +0000 (16:08 -0700)]
Break the dependency on RevObject when creating a newObjectToPack().
Update the ObjectReuseAsIs API to support creating new
ObjectToPack with only the AnyObjectId and Git object type. This is
needed to support the future pack index bitmaps, which only contain
this information and do not want the overhead of creating a temporary
object for every ObjectId.
As you can see, the destination ref pattern has a superfluous slash.
It looks like this behaviour has always been the case for CloneCommand,
at least since cc2197ed when code catering to bare-clone fetch refspecs
was added. That was released with JGit v1.0 almost 2 years ago, so
there will probably be some bare repos in the wild which will have been
cloned with JGit and have these corrupted refspecs.
The effect of the corrupted fetch refspec is quite interesting. Up to
and including JGit 2.0, the corrupt refspec was tolerated and fetches
would work as intended with no indication to the user that anything was
amiss. With JGit 2.1, a change was introduced which made JGit less
tolerant, and fetches now attempt to update the non-existing ref
"refs/heads//master". No exception is raised, but the real ref -
"refs/heads/master" - is not updated.
This behaviour was noticed by a user of Agit (which does bare clones by
default and recently updated from JGit v2.0 to v2.2), reported here:
https://github.com/rtyley/agit/issues/92
If you run C-Git fetch on a bare-repo cloned by JGit, it flat-out
rejects the refspec (checked against v1.7.10.4):
Incidentally, C-Git does not create an explicit fetch refspec at all
when performing a bare clone - the full remote config generated by C-Git
looks like this:
Roberto Tyley [Fri, 1 Mar 2013 21:49:58 +0000 (21:49 +0000)]
Fix RefUpdate performance for existing Refs
No longer invoke the expensive RefDatabase.isNameConflicting() check on
updating existing refs, reducing batch ref update time by ~97%.
The RefDirectory implementation of isNameConflicting() is quite
slow (it has to do an expensive loose-ref scan) but it's only necessary
to perform this check on ref update if the ref is being *created* - if
the ref already exists, we can already guarantee that it does not
conflict with any other refs.
C-Git seems to use a similar condition before making the
is_refname_available() check:
The execution time for the run is completely dominated by the batch ref
update at the end. Repeating the experiment with BFG v1.0.2 (using JGit
patched with this change), the refs update is dramatically reduced:
Colby Ranger [Mon, 28 Jan 2013 19:49:01 +0000 (11:49 -0800)]
Include supported extensions in PackFile constructor.
Previously a PackFile class was assumed to only support a .pack and .idx
file. Update the constructor to enumerate the supported extensions for
the pack file. This will allow the bitmap code to only be executed if
the bitmap extension file is known to exist.
Gustaf Lundh [Fri, 1 Feb 2013 12:20:31 +0000 (13:20 +0100)]
Performance fixes in DateRevQueue
When a lot of commits are added to DateRevQueue, the
sort-on-insertion approach is very heavy on CPU cycles.
One approach to fix this was made by Dave Borowitz:
https://git.eclipse.org/r/#/c/5491/
But using Java's PriorityQueue seems to have brought some
extra overhead, and the desired performance could not be
reached.
This fix takes another approach to the insertion problem,
without changing the expected behaviour or bringing extra
memory overhead:
If we detect over 1000 commits in the DateRevQueue, a
"seek-index" is rebuilt every 1000th added commit.
The index keeps track of every 100th commit in the
DateRevQueue. During insertions, it will be used for a
preliminary scanning (binary search) of the queue, with
the intention of helping add() find a good starting point
to start walking from. After finding this starting point,
add() will step commit-by-commit until the correct
insertion place in the queue is found (today, the queue
is expected to be sorted at all times).
When applied to repositories with many refs, this approach
has proven to bring huge performance gains and scales quite
well.
For instance, in a repository with close to 80000 refs,
we could cut down the time a typical Gerrit replication
of 1 commit would take (just a push from JGit's point of
view) from 32sec down to 3.5sec.
Below you see some typical times to add a specific amount
of commits (with random commit times) to the DateRevQueue
and the difference the preliminary seek-index makes:
Commits | Index | No Index
1024 8ms 8ms
2048 13ms 9ms
4096 5ms 59ms
8192 11ms 595ms
16384 22ms 3058ms
32768 64ms 13811ms
65536 201ms 62677ms
131072 783ms 331585ms
Only one extra reference is needed for every 100 inserted
commits (and only when we see more than 1000 commits in
the queue), so the memory overhead should be negligible.
Various index-stepping values were tested, and 100 seemed to
scale very well and be effective from start.
In the future, it should probably be dynamic and based on
the number of refs in the queue, but this should serve well
as a starting point.
Note: While other fundamentally different data structures may
be more suitable, the DateRevQueue is extremely central to
many of the Git core operations. This approach was chosen,
since the effect of the patch is easy to predict in conjuction
with the current implementation. A totally new data structure
will make it harder to predict behaviour in many common and
uncommon cases (in terms of breaking ties, memory usage, cost
when using few elements, object creation/disposing overhead,
etc).
Matthias Sohn [Thu, 21 Feb 2013 01:34:17 +0000 (02:34 +0100)]
Merge branch 'stable-2.3'
* stable-2.3:
Prepare 2.3.2-SNAPSHOT builds
JGit v2.3.1.201302201838-r
Accept Change-Id even if footer contains not well-formed entries
Fix false positives in hashing used by PathFilterGroup
Change-Id: I5882aa3b482d6bcd40a45bed51e5ab03f018a5bc Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Stefan Lay [Fri, 15 Feb 2013 11:34:44 +0000 (12:34 +0100)]
Accept Change-Id even if footer contains not well-formed entries
Instead of only looking for a Change-Id in the last section if it
consists only of well-formed "key: value" lines replace the last
occurrence of a valid Change-Id line in the last section. Some tools
require footer lines e.g. without a colon.
Gerrit doesn't accept Change-Id lines in the footer if the Change-Id
line doesn't start at the beginning of the line.
Bug: 400818
Change-Id: Icce54872adc8c566994beea848448a2f7ca87085 Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Matthias Sohn [Wed, 13 Feb 2013 22:12:03 +0000 (23:12 +0100)]
Merge branch 'stable-2.3'
* stable-2.3:
Prepare post 2.3.0.201302130906 builds
JGit v2.3.0.201302130906
Replace explicit version by property where possible
Add better documentation to DirCacheCheckout
Prepare post 2.3rc1 builds
JGit v2.3.0.201302060400-rc1
Change-Id: I5a7626014dc38e7623937a4241dbf8a6db05c1f9 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Jonathan Nieder [Wed, 13 Feb 2013 00:29:24 +0000 (16:29 -0800)]
Remove unused availableRefs local from Clone.guessHEAD
This variable has been populated and never used ever since it was
introduced in v0.4.9~336 (Add "jgit clone", 2008-12-23). Remove it
to make the function easier to understand.
Robin Rosenberg [Wed, 30 Jan 2013 20:50:22 +0000 (21:50 +0100)]
Implement core.checkstat = minimal
There is a huge performance issue when using both JGit (EGit) and Git
because JGit does not fill all dircache stat fields with the values Git
would expect. As a result thereof Git would typically revalidate a large
number of tracked files. This can take several minutes for large
repositories with many large files.
Since 1.8.2 Git will restrict stat checking to the size and whole second
part of the modification time stamp, if core.statinfo is set to
"minimal".
As JGit checks only size and modification time this is close to what
JGit already does. To make the match perfect ignore the sub-second part
of the modification time stamp if core.statinfo = minimal.
Robin Rosenberg [Thu, 24 Jan 2013 03:23:26 +0000 (04:23 +0100)]
Fix stash apply using merge logic
Instead of the complicated strange stuff, implement staah
apply as cherry-pick.
Provided there are no conflicts and it is requested that
the index should be applied, perform yet another cherry-pick,
but discard tha results thereof it that would result in conflicts.
Tobias Pfeifer [Thu, 17 Jan 2013 13:45:00 +0000 (14:45 +0100)]
Extract method to output the first header line of a git diff
In order to be able to determine the range of the first header line
(e.g. "diff --git a/file1 b/file2") in subclasses, the code that formats
the first header line is extracted.
Required by egit's change: Ia61398146c0336ab332234f24d341561292554db
Colby Ranger [Tue, 22 Jan 2013 22:54:05 +0000 (14:54 -0800)]
Reduce memory held and speed up DfsGarbageCollector.
getObjectList() returns a list of ObjectToPack. These can hold on to a
lot of memory. Furthermore, binary searching for objects in a sorted
array can be slow. Improve the speed and reduce the memory by creating a
copy of the ObjectId and inserting it into an ObjectIdOwnerMap.
Robin Stocker [Mon, 25 Jun 2012 22:55:39 +0000 (00:55 +0200)]
Enable marking entries using TreeFilters in DiffEntry
This adds a new optional TreeFilter[] argument to DiffEntry.scan. All
filters will be checked during the scan to determine if an entry should
be "marked" with regard to that filter.
After having called scan, the user can then call isMarked(int) on the
entries to find out whether they matched the TreeFilter with the passed
index.
An example use case for this is in the file diff viewer of EGit's
History view, where we'd like to highlight entries that are matching the
current filter.
See EGit change I03da4b38d1591495cb290909f0e4c6e52270e97f.
Bug: 393610
Change-Id: Icf911fe6fca131b2567514f54d66636a44561af1 Signed-off-by: Robin Stocker <robin@nibor.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Stefan Lay [Wed, 9 Jan 2013 13:02:51 +0000 (14:02 +0100)]
Add conflicts message before footer
In case of a conflict during cherry-pick or revert the commit message
was amended after the footer. This made the footer invalid. Many users
do not understand that they have to edit the commit message in order to
make it valid again.
Stefan Lay [Wed, 9 Jan 2013 13:02:13 +0000 (14:02 +0100)]
Only replace the ChangeId in the footer, not in the body
Additionally expose methods to find the first footer line and to find
the position of the ChangeId footer in the commit message in order to
enable reuse of these methods introduced for the fix.
Change-Id: I87ecca009ca3bff1ca0de3c436ebd95736bf5a10 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com> Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Markus Duft [Tue, 12 Jun 2012 05:02:58 +0000 (07:02 +0200)]
Fix patch application WRT windows line endings.
Previously the result of an application would have been \r\r\n in the
case of windows line endings, as RawText does not touch the \r, and
ApplyCommand adds "\r\n" if this is the ending of the first line in the
target file. Only always adding \n should be ok, since \r\n would be the
result if the file and the patch include windows line endings.
Colby Ranger [Sat, 19 Jan 2013 00:22:10 +0000 (16:22 -0800)]
Update DfsGarbageCollector to not read back a pack index.
Previously, the Dfs GC excluded objects from packs by passing a
previously written index to the PackWriter. Reading back a file on
Dfs is slow. Instead, allow the PackWriter to expose the objects
included in a pack and forward that to invocations of excludeObjects() .
Tomasz Zarna [Thu, 17 Jan 2013 20:38:43 +0000 (21:38 +0100)]
Add additional FastForwardMode enums for different config contexts
FastForwardMode is represented by different strings depending on context
it is set or get from. E.g. FastForwardMode.FF_ONLY for
branch.<name>.mergeoptions is "--ff-only" but for merge.ff it is "only".
Colby Ranger [Thu, 17 Jan 2013 23:00:38 +0000 (15:00 -0800)]
Rename PackConstants to PackExt, a typed pack file extension.
PackConstants previously contained string values for the pack and pack
index extension. Change PackConstant to be PackExt, a typed wrapper
around the string pack file extension.
Robin Rosenberg [Thu, 3 Jan 2013 23:24:48 +0000 (00:24 +0100)]
Define a tree filter for user-visible changes between two indexes
The primary purpose of the filter is to detect an index change that
could possibly lead to a change in what files are visible in the staging
view and decorations. Besides what TreeFilter.ANY_DIFF does for trees in
general, this filter also looks at the assume-valid (CE_VALID) flag to
see whether changes should be ignored or not.
Colby Ranger [Tue, 15 Jan 2013 22:56:48 +0000 (14:56 -0800)]
Remove getReverseIndexSize() from DfsPackDescription.
The method is used in only one location (DfsPackFile). Furthermore,
PackIndex already does an explicit computation of the size in
DfsPackFile. Simplify the DfsPackDescription by removing the method
and do the calculation similar to PackIndex.
Colby Ranger [Tue, 15 Jan 2013 22:24:30 +0000 (14:24 -0800)]
Use file extension with DfsPackDescription get/set file size.
Previously the size getters and setters had explicit methods for index
and pack. Update the api to be based on the file extension. This will
make it possible to support other extensions in the future, such as
the forthcoming bitmap extensions.
Roberto Tyley [Tue, 25 Dec 2012 09:10:51 +0000 (09:10 +0000)]
Fix concurrent creation of fan-out object directories
If multiple threads attempted to insert loose objects into the same new
fan-out directory, the creation of that directory was subject to a race
condition that could lead to an unnecessary IOException being thrown -
because an inserter could not 'create' a directory that had just been
generated by a different thread. All we require is that the directory
does indeed *exist*, so not being able to _create_ it is not actually a
fatal problem. Setting 'skipExisting' to 'true' on the call to mkdir()
fixes the issue.
I found this issue as a real world occurrence while working on The BFG
Repo Cleaner (https://github.com/rtyley/bfg-repo-cleaner), a tool which
concurrently performs a lot of object creation.
In order to demonstrate the problem here I've added a small test case
which reliably reproduces the issue on the few different hardware
systems I've tried. The error thrown when the race-condition arises is
this:
java.io.IOException: Creating directory /home/roberto/repo.git/objects/e6 failed
at org.eclipse.jgit.util.FileUtils.mkdir(FileUtils.java:182)
at org.eclipse.jgit.storage.file.ObjectDirectory.insertUnpackedObject(ObjectDirectory.java:590)
at org.eclipse.jgit.storage.file.ObjectDirectoryInserter.insertOneObject(ObjectDirectoryInserter.java:113)
at org.eclipse.jgit.storage.file.ObjectDirectoryInserter.insert(ObjectDirectoryInserter.java:91)
at org.eclipse.jgit.lib.ObjectInserter.insert(ObjectInserter.java:329)
Markus Duft [Fri, 13 Jul 2012 06:25:25 +0000 (08:25 +0200)]
Improve handling of checkout conflicts
This converts a checkout conflict exception into a RebaseResult /
MergeResult containing the conflicting paths, which enables EGit (or
others) to handle the situation in a user-friendly way