Change-Id: I22fc46dff6cc5dfd975f6e82161d265781778cde
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add private methods which are used for reading and writing MERGE_HEAD
and CHERRY_PICK_HEAD files, as suggested in the comments on change
I947967fdc2f1d55016c95106b104c2afcc9797a1.
Change-Id: If4617a05ee57054b8b1fcba36a06a641340ecc0e
Signed-off-by: Robin Stocker <robin@nibor.org>
Add handling of CHERRY_PICK_HEAD file in .git (similar to MERGE_HEAD),
which is written in case of a conflicting cherry-pick merge.
It is used so that Repository.getRepositoryState can return the new
states CHERRY_PICKING and CHERRY_PICKING_RESOLVED. These states, as well
as CHERRY_PICK_HEAD can be used in EGit to properly show the merge tool.
Also, in case of a conflict, MERGE_MSG is written with the original
commit message and a "Conflicts" section appended. This way, the
cherry-picked message is not lost and can later be re-used in the commit
dialog.
Bug: 339092
Change-Id: I947967fdc2f1d55016c95106b104c2afcc9797a1
Signed-off-by: Robin Stocker <robin@nibor.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There should be a way to explictly refresh the refs cached in the
RefDirectory. Since commit c261b28 (use of FileSnapshot) this is
not needed anymore for storage in the filesystem. But for DHT based
storage an explicit refresh may be needed.
Change-Id: I7d30c3496c05e1fb6e9519f3af9f23c6adb93bf9
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Detaching HEAD didn't work in some corner checkout cases. If, for example,
HEAD is symbolic ref to refs/heads/master, refs/heads/master is ref to commit
c0ffee... then:
checkout c0ffee...
would leave the HEAD unchanged.
The same symptom occurs when checking out a remote tracking branch or a tag
that references the same commit as refs/heads/master.
In the above case, the RefUpdate class didn't have enough information to decide
if the update needed to detach symbolic ref because it dealt only with new/old
objectIDs. Therefore, this fix introduced the RefUpdate.detachingSymbolicRef
flag.
Bug: 315166
Change-Id: I085c98b77ea8f9104a213978ea0d4ac6fd58f49b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
This enables applications to differentiate between explicitly set
configuration parameters and best effort attempts to guess these
parameters from the operating system.
Change-Id: I67cc4099238a40c6dca795e64f0155ced6008ef1
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
ObjectIdOwnerMap: More lightweight map for ObjectIds
OwnerMap is about 200 ms faster than SubclassMap, more friendly to the
GC, and uses less storage: testing the "Counting objects" part of
PackWriter on 1886362 objects:
ObjectIdSubclassMap:
load factor 50%
table: 4194304 (wasted 2307942)
ms spent 36998 36009 34795 34703 34941 35070 34284 34511 34638 34256
ms avg 34800 (last 9 runs)
ObjectIdOwnerMap:
load factor 100%
table: 2097152 (wasted 210790)
directory: 1024
ms spent 36842 35112 34922 34703 34580 34782 34165 34662 34314 34140
ms avg 34597 (last 9 runs)
The major difference with OwnerMap is entries must extend from
ObjectIdOwnerMap.Entry, where the OwnerMap has injected its own
private "next" field into each object. This allows the OwnerMap to use
a singly linked list for chaining collisions within a bucket. By
putting collisions in a linked list, we gain the entire table back for
the SHA-1 bits to index their own "private" slot.
Unfortunately this means that each object can appear in at most ONE
OwnerMap, as there is only one "next" field within the object instance
to thread into the map. For types that are very object map heavy like
RevWalk (entity RevObject) and PackWriter (entity ObjectToPack) this
is sufficient, these entity types are only put into one map by their
container. By introducing a new map type, we don't break existing
applications that might be trying to use ObjectIdSubclassMap to track
RevCommits they obtained from a RevWalk.
The OwnerMap uses less memory. Each object uses 1 reference more (so
we're up 1,886,362 references), but the table is 1/2 the size (2^20
rather than 2^21). The table itself wastes only 210,790 slots, rather
than 2,307,942. So OwnerMap is wasting 200k fewer references.
OwnerMap is more friendly to the GC, because it hardly ever generates
garbage. As the map reaches its 100% load factor target, it doubles in
size by allocating additional segment arrays of 2048 entries. (So the
first grow allocates 1 segment, second 2 segments, third 4 segments,
etc.) These segments are hooked into the pre-allocated directory of
1024 spaces. This permits the map to grow to 2 million objects before
the directory itself has to grow. By using segments of 2048 entries,
we are asking the GC to acquire 8,204 bytes in a 32 bit JVM. This is
easier to satisfy then 2,307,942 bytes (for the 512k table that is
just an intermediate step in the SubclassMap). By reusing the
previously allocated segments (they are re-hashed in-place) we don't
release any memory during a table grow.
When the directory grows, it does so by discarding the old one and
using one that is 4x larger (so the directory goes to 4096 entries on
its first grow). A directory of size 4096 can handle up to 8 millon
objects. The second directory grow (16384) goes to 33 million objects.
At that point we're starting to really push the limits of the JVM
heap, but at least its many small arrays. Previously SubclassMap would
need a table of 67108864 entries to handle that object count, which
needs a single contiguous allocation of 256 MiB. That's hard to come
by in a 32 bit JVM. Instead OwnerMap uses 8192 arrays of about 8 KiB
each. This is much easier to fit into a fragmented heap.
Change-Id: Ia4acf5cfbf7e9b71bc7faa0db9060f6a969c0c50
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectIdSubclassMap: Micro-optimize wrapping at end of table
During a review of the class, Josh Bloch pointed out we can use
"i = (i + 1) & mask" to wrap around at the end of the table, instead
of a conditional with a branch. This is generally faster due to one
less branch that will be mis-predicted by the CPU.
Change-Id: Ic88c00455ebc6adde9708563a6ad4d0377442bba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectIdSubclassMap: Avoid field loads in inner loops
Ensure the JIT knows the table cannot be changed during the critical
inner loop of get() or insert() by loading the field into a final
local variable. This shouldn't be necessary, but the instance member
is declared non-final (to resizing) and it is not very obvious to the
JIT that the table cannot be modified by AnyObjectId.equals().
Simplify the JIT's decision making by making it obvious, these
values cannot change during the critical inner loop, allowing
for better register allocation.
Change-Id: I0d797533fc5327366f1207b0937c406f02cdaab3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This method is trivial in definition, and is called in only 3
places. Inline the method manually to ensure its really going
to be inlined by the JIT at runtime.
Change-Id: I128522af8167c07d2de6cc210573599038871dda
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
32 is way to small for the map. Most applications using the map
will need to load more than 16 objects just from the root refs
being read from the Repository.
Default the initial size to 2048. This cuts out 6 expansions in
the early life of the table, reducing garbage and rehashing time.
Change-Id: I6dd076ebc0b284f1755855d383b79535604ac547
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the table needs to be grown, do it before the current insertion
rather than after. This is a tiny micro-optimization that allows
the compiler to reuse the result of "++size" to compare against
previously pre-computed size at which the table should rehash itself.
Change-Id: Ief6f81b91c10ed433d67e0182f558ca70d58a2b0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectIdSubclassMap: Use & rather than % for hashing
Bitwise and is faster than integer modulus operations, and since
the table size is always a power of 2, this is simple to use for
index operation.
Change-Id: I83d01e5c74fd9e910c633a98ea6f90b59092ba29
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
obj_hash doesn't match our naming conventions, camelCaseNames
are the preferred format.
Change-Id: I72da199daccb60a98d17b6af1e498189bf149515
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
A standard HashSet was being used to store the list of subsections as
they were being parsed. This was changed to use a LinkedHashSet so
that iterating over the set would return values in the same order as
they are listed in the config file.
Change-Id: I4251f95b8fe0ad59b07ff563c9ebb468f996c37d
[findbugs] Avoid futile attempt to change max pool size
Javadoc for ScheduledThreadPoolExecutor says [1]:
While ScheduledThreadPoolExecutor inherits from ThreadPoolExecutor, a
few of the inherited tuning methods are not useful for it. In
particular, because it acts as a fixed-sized pool using corePoolSize
threads and an unbounded queue, adjustments to maximumPoolSize have no
useful effect.
[1]
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ScheduledThreadPoolExecutor.html
Change-Id: I8eccb7d6544aa6e27f5fa064c19dddb2a706523f
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The mapTree() routines have been deprecated for a long time, and their
sibilings for mapCommit() and mapTag() were already removed from the
main Repository API.
Remove mapTree(). Application callers who only need the tree's name
can use resolve("^{tree}") syntax to resolve to the tree ObjectId, or
fail if the input is not a tree.
Applications that want to read a tree should use DirCache or TreeWalk.
Change-Id: I85726413790fc87721271c482f6636f81baf8b82
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This type and its associated methods has been deprecated for a while
now. Time to remove it. Applications can use a TreeWalk instead to
access the elements of any tree-like object.
Change-Id: I047e552ac77b77e2de086f63cb4fb318da57c208
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This interface has been deprecated for a while now.
Applications can use a TreeWalk instead.
Change-Id: I751d6e919e4b501c36fc36e5f816b8a8c5379cb9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This has been deprecated for some time now. Applications should
instead use DirCache within a TreeWalk.
Change-Id: I8099d93f07139c33fe09bdeef8d739782397da17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This class has been deprecated for a long time now.
Time to remove it. Applications can use the newer
DirCache.writeTree() as a replacement.
Change-Id: I91dc9507668d8a3ecadd6acd4f1c8b7bd7760cc3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This class has been deprecated for a long time now.
Time to remove it. Applications can use the newer
DirCacheCheckout class as a replacement.
Change-Id: Id66d29fcca5a7286b8f8838303d83f40898918d2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This interface has been deprecated for a long time now.
Time to remove it.
Change-Id: I29a938657e4637b2a9d0561940b38d70866613f7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
resolve(): Fix wrong parsing of branch "foo-gbed2-dev"
When parsing a string such as "foo-gbed2" resolve() was assuming the
suffix was from git describe output. This lead to JGit trying to find
the completion for the object abbreviation "bed2", rather than using
the current value of the reference. If there was only one such object
in the repository, JGit might actually use the wrong value here, as
resolve() would return the completion of the abbreviation "bed2"
rather than the current value of the reference "refs/heads/foo-gbed2".
Move the parsing of git describe abbreviations out of the operator
portion of the resolve() method and into the simple portion that is
supposed to handle only object ids or reference names, and only do the
describe parsing after all other approaches have already failed to
provide a resolution.
Add new unit tests to verify the behavior is as expected by users.
Bug: 338839
Change-Id: I52054d7b89628700c730f9a4bd7743b16b9042a9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The alarm queue threads were started with an empty task body, which
meant the thread started and terminated immediately, leaving the
queue itself with no worker.
Change-Id: I2a9b5fe9c2bdff4a5e0f7ec7ad41a54b41a4ddd6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ProgressMonitor: Refactor to use background alarms
Instead of polling the system clock on every update(1) method call,
use a scheduled executor to toggle a volatile once per second until
the task is done. Check the volatile on each update(int), looking
to see if output should occur.
This limits progress output to either once per 1% complete, or once
per second. To save time during update calls the timer isn't reset
during each 1% of output, which means we may see one unnecessary
output trigger if at least 1% completed during the one second of the
alarm time.
Change-Id: I8fdd7e31c37bef39a5d1b3da7105da0ef879eb84
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Respect core.excludesfile to enable global ignore rules
Also use FS.resolve() to properly resolve files from path strings.
Bug: 328428 (partial fix)
Change-Id: I41d94694f220dcb85605c9acadfffb1fa23beaeb
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Support for --no-standard-notes and --show-notes=REF options is added
to the Log command. The --show-notes option can be specified more than
once if more than one notes branch should be used for showing notes.
The notes are displayed from note branches in the order how the note
branches are specified in the command line. However, the standard note,
from the refs/notes/commits, is always displayed as first unless
the --no-standard-notes options is given.
Change-Id: I4e7940804ed9d388b625b8e8a8e25bfcf5ee15a6
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
RepositoryBuilder: Allow callers to require repository exists
The setMustExist() method allows callers to require the repository
exists in order for build() to succeed. This is useful within a
RepositoryResolver where existence is required.
Change-Id: I6a1154551435cf0da6c2b4a7f4dce266abea5dff
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
The new addIfAbsent() method combines get() with add(), but does
it in a single step so that the common case of get() returning null
for a new object can immediately insert the object into the map.
Change-Id: Ib599ab4de13ad67665ccfccf3ece52ba3222bcba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Revert "Teach PackWriter how to reuse an existing object list"
This reverts commit f5fe2dca3c.
I regret adding this feature to the public API. Caches aren't always
the best idea, as they require work to maintain. Here the cache is
redundant information that must be computed, and when it grows stale
must be removed. The redundant information takes up more disk space,
about the same size as the pack-*.idx files are. For the linux-2.6
repository, that's more than 40 MB for a 400 MB repository. So the
cache is a 10% increase in disk usage.
The entire point of this cache is to improve PackWriter performance,
and only PackWriter performance, and only when sending an initial
clone to a new client. There may be better ways to optimize this, and
until we have a solid solution, we shouldn't be using a separate cache
in JGit.
[findbugs] Do not ignore exceptional return value of mkdir
java.io.File.mkdir() and mkdirs() report failure as an exceptional
return value false. Fix the code which silently ignored this
exceptional return value.
Change-Id: I41244f4b9d66176e68e2c07e2329cf08492f8619
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Teach PackWriter how to reuse an existing object list
Counting the objects needed for packing is the most expensive part of
an UploadPack request that has no uninteresting objects (otherwise
known as an initial clone). During this phase the PackWriter is
enumerating the entire set of objects in this repository, so they can
be sent to the client for their new clone.
Allow the ObjectReader (and therefore the underlying storage system)
to keep a cached list of all reachable objects from a small number of
points in the project's history. If one of those points is reached
during enumeration of the commit graph, most objects are obtained from
the cached list instead of direct traversal.
PackWriter uses the list by discarding the current object lists and
restarting a traversal from all refs but marking the object list name
as uninteresting. This allows PackWriter to enumerate all objects
that are more recent than the list creation, or that were on side
branches that the list does not include.
However, ObjectWalk tags all of the trees and commits within the list
commit as UNINTERESTING, which would normally cause PackWriter to
construct a thin pack that excludes these objects. To avoid that,
addObject() was refactored to allow this list-based enumeration to
always include an object, even if it has been tagged UNINTERESTING by
the ObjectWalk. This implies the list-based enumeration may only be
used for initial clones, where all objects are being sent.
The UNINTERESTING labeling occurs because StartGenerator always
enables the BoundaryGenerator if the walker is an ObjectWalk and a
commit was marked UNINTERESTING, even if RevSort.BOUNDARY was not
enabled. This is the default reasonable behavior for an ObjectWalk,
but isn't desired here in PackWriter with the list-based enumeration.
Rather than trying to change all of this behavior, PackWriter works
around it.
Because the list name commit's immediate files and trees were all
enumerated before the list enumeration itself starts (and are also
within the list itself) PackWriter runs the risk of adding the same
objects to its ObjectIdSubclassMap twice. Since this breaks the
internal map data structure (and also may cause the object to transmit
twice), PackWriter needs to use a new "added" RevFlag to track whether
or not an object has been put into the outgoing list yet.
Change-Id: Ie99ed4d969a6bb20cc2528ac6b8fb91043cee071
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Remove getObjectsDirectory, openPack from base API
These two methods are specific to the FileRepository implementation
and should not be exposed as part of the base Repository API. Now
that PackParser is generic and does not require these two methods
to import a pack stream into a repostiory, it is safe to remove
these and get them out of the public view.
Change-Id: I8990004d08074657f467849dabfdaa7e6674e69a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Refactor IndexPack to not require local filesystem
By moving the logic that parses a pack stream from the network (or
a bundle) into a type that can be constructed by an ObjectInserter,
repository implementations have a chance to inject their own logic
for storing object data received into the destination repository.
The API isn't completely generic yet, there are still quite a few
assumptions that the PackParser subclass is storing the data onto
the local filesystem as a single file. But its about the simplest
split of IndexPack I can come up with without completely ripping
the code apart.
Change-Id: I5b167c9cc6d7a7c56d0197c62c0fd0036a83ec6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Change-Id: I4f05bdb0c58b039bd379341a6093f06a2cdfec6e
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
ConfigConstants: expose some constants for user name and email.
This is needed by a EGit change
http://egit.eclipse.org/r/#change,2232
Change-Id: I3d62f904b769fc2f1b7b8f0f24f7dd757fc9c379
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Merging Git notes branches has several differences from merging "normal"
branches. Although Git notes are initially stored as one flat tree the
tree may fanout when the number of notes becomes too large for efficient
access. In this case the first two hex digits of the note name will be
used as a subdirectory name and the rest 38 hex digits as the file name
under that directory. Similarly, when number of notes decreases a fanout
tree may collapse back into a flat tree. The Git notes merge algorithm
must take into account possibly different tree structures in different
note branches and must properly match them against each other.
Any conflict on a Git note is, by default, resolved by concatenating
the two conflicting versions of the note. A delete-edit conflict is, by
default, resolved by keeping the edit version.
The note merge logic is pluggable and the caller may provide custom
note merger that will perform different merging strategy.
Additionally, it is possible to have non-note entries inside a notes
tree. The merge algorithm must also take this fact into account and
will try to merge such non-note entries. However, in case of any merge
conflicts the merge operation will fail. Git notes merge algorithm is
currently not trying to do content merge of non-note entries.
Thanks to Shawn Pearce for patiently answering my questions related to
this topic, giving hints and providing code snippets.
Change-Id: I3b2335c76c766fd7ea25752e54087f9b19d69c88
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Config: Preserve existing case of names in sections
When an application asks for the names in a section, it may want to
see the existing case that was stored by the user. For example,
Gerrit Code Review wants to store a configuration block like:
[access "refs/heads/master"]
label-Code-Review = group Developers
and although the name label-Code-Review is case-insensitive, it wants
to display the case as it appeared in the configuration file.
When enumerating section names or variable names (both of which are
case-insensitive), Config now keeps track of the string that first
appeared, and presents them in file order, permitting applications to
use this information. To maintain case-insensitive behavior, the
contains() method of the returned Set<String> still performs a
case-insensitive compare.
This is a behavior change if the caller enumerates the returned
Set<String> and copies it to his own Set<String>, and then performs
contains() tests against that, as the strings are now the original
case from the configuration block. But I don't think anyone actually
does this, as the returned sets are immutable and are cached.
Change-Id: Ie4e060ef7772958b2062679e462c34c506371740
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Add a performance optimized variant of the ANY_DIFF filter
If a treewalk walks also over index and the workingtree then the
IndexDiffFilter filter can be used which works much faster then
the semantically equivalent ANY_DIFF filter. This is because this
filter can better avoid computing SHA-1 ids over the content of
working-tree files which is very costly.
This fix will significantly improve the performance of e.g.
EGit's commit dialog.
Change-Id: I2a51816f4ed9df2900c6307a54cd09f50004266f
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
Instead of setting a boolean when a difference record is found, return
false from diff() only if all of the collections are empty. When all
of them are empty, no difference was found.
Change-Id: I555fef37adb764ce253481751071c53ad12cf416
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
IndexDiff: Use isModified() when comparing index-worktree
The isModified() is more efficient because it can skip over files that
are stat clean, without needing to scan them.
This is useful to efficently work on paths that were already staged
and thus differ between HEAD and the index, but not between the index
and the working tree.
Change-Id: I4418202e612f0571974e0898050d987c6c280966
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
IndexDiff: Clean up tree-index compare for staged files
When comparing the ObjectIds for two tree entries its faster
to use the raw buffer compares over allocating ObjectIds and
then performing equals on their contents.
However, this also needs to consider the raw modes. It is possible
for a path to change modes but not ObjectId (e.g. making a file
executable), and in this case its still a staged change to report back
to the caller.
Change-Id: I1a267254c04b3273a97f63c71d1e6718cd9d2fa8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the caller really needs the list of files that are flagged as
assume-unchanged (aka assume-valid in the DirCache), we should give
them the complete list and not just those that we wrongly identified
as being modified during diff().
This change is necessary because diff() is slightly broken and is
discovering differences on files that it shouldn't have considered.
Change-Id: Ibe464c1a0e51c19dc287a4bc5348b7b07f4d840b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>