CRLF handling only works for small files, where small means no larger
than the buffer, i.e. about 8K. This quick and dirty (QD) fix
reallocates the buffer to be large enough.
Bug: 369780
Change-Id: Ifc34ad204fbf5986b257a5c616e4a8c601e8261a
Support gitdir references in working tree .git file
A '.git' file in a repository's working tree root is now parsed
as a ref to a folder located elsewhere. This supports submodules
having their repository location outside of the parent repository's
working directory such as in the parent repository's '.git/modules'
directory.
This adds support to BaseRepositoryBuilder for repositories created
with the '--separate-git-dir' option specified to 'git init'.
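As a rough illustration of the '.git' file handling described above:
such a file holds a single line of the form 'gitdir: <path>'. The sketch
below is not the BaseRepositoryBuilder code. The class and method names
are hypothetical, and resolving a relative path against the working tree
root is an assumption made for the example.

```java
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of JGit. */
final class GitDirFileSketch {
    private static final String PREFIX = "gitdir:";

    /**
     * Resolves the repository directory named by the content of a '.git' file.
     *
     * @param workTree root of the working tree that contains the '.git' file
     * @param content  text read from the '.git' file
     */
    static Path resolve(Path workTree, String content) {
        String line = content.trim();
        if (!line.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a gitdir reference: " + line);
        }
        String target = line.substring(PREFIX.length()).trim();
        if (target.isEmpty()) {
            throw new IllegalArgumentException("gitdir reference has no path");
        }
        // Assumption: a relative target is relative to the working tree root.
        // An absolute target is returned unchanged by Path.resolve().
        return workTree.resolve(target).normalize();
    }
}
```

For a submodule checked out at 'parent/sub', a file content of
'gitdir: ../.git/modules/sub' resolves to 'parent/.git/modules/sub'.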
Change-Id: I73c538f6d845bdbc0c4e2bce5a77f900cf36e1a9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add BranchTrackingStatus for getting remote tracking status
This is used by EGit change I1e1caca561d1b0a0c194bfc42e64b698f42c6e6a to
show branch status in decoration.
It can also be used for providing the same output as C Git in "git
status".
Change-Id: I8d2b108c89905c3f0496f3d517879596740787c0
Signed-off-by: Robin Stocker <robin@nibor.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add BranchConfig helper for access to branch config section
Determining the name of the remote-tracking branch for a given branch is
not easy to get right. This class provides a way to do that and could be
used for more branch config related things (e.g. in PullCommand).
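To show why this is easy to get wrong, here is a minimal sketch of the
usual mapping: the branch's configured remote and merge ref are mapped
through a fetch refspec of that remote. This is not the BranchConfig
implementation. The class and method names are hypothetical, and the
sketch assumes a single refspec and handles only a trailing '/*'
wildcard.

```java
/** Hypothetical helper for illustration; not the JGit BranchConfig class. */
final class TrackingBranchSketch {
    /**
     * @param remote    value of branch.NAME.remote, e.g. "origin" or "."
     * @param merge     value of branch.NAME.merge, e.g. "refs/heads/master"
     * @param fetchSpec one fetch refspec of the remote, e.g.
     *                  "+refs/heads/*:refs/remotes/origin/*"
     * @return the tracking ref name, or null if it cannot be determined
     */
    static String trackingBranch(String remote, String merge, String fetchSpec) {
        if (remote == null || merge == null) {
            return null;
        }
        if (".".equals(remote)) {
            return merge; // the branch tracks another local branch
        }
        if (fetchSpec == null) {
            return null;
        }
        String spec = fetchSpec.startsWith("+") ? fetchSpec.substring(1) : fetchSpec;
        int colon = spec.indexOf(':');
        if (colon < 0) {
            return null;
        }
        String src = spec.substring(0, colon);
        String dst = spec.substring(colon + 1);
        if (!src.endsWith("/*") || !dst.endsWith("/*")) {
            // No wildcard: only an exact source match maps to the destination.
            return src.equals(merge) ? dst : null;
        }
        String srcPrefix = src.substring(0, src.length() - 1); // keeps the '/'
        if (!merge.startsWith(srcPrefix)) {
            return null;
        }
        String dstPrefix = dst.substring(0, dst.length() - 1);
        return dstPrefix + merge.substring(srcPrefix.length());
    }
}
```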
Change-Id: I896a2384217936c8b672df8b81c9599f5c350458
Signed-off-by: Robin Stocker <robin@nibor.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Revision strings that end with a ':' and have no trailing path
should return the tree associated with the current ref parsed.
Bug: 368370
Change-Id: I7c7617a77bd418bad4e570be2d1e9002ad280762
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Make sure all bytes are written to files on close, or get an error.
Java's BufferedOutputStream swallows any errors that occur when flushing
the buffer in close().
This class overrides close to make sure an error during the final
flush is reported back to the caller.
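A minimal sketch of the idea, assuming nothing beyond java.io: flush
explicitly inside close() so that a failure surfaces as an IOException,
and still close the underlying stream afterwards. The class name here is
made up for the example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Illustrative only; the class name is hypothetical. */
final class FlushCheckingOutputStream extends BufferedOutputStream {
    FlushCheckingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void close() throws IOException {
        try {
            // Flush explicitly so a write error reaches the caller.
            flush();
        } finally {
            // Always release the underlying stream, even if flush() failed.
            super.close();
        }
    }
}
```

A caller that writes a few buffered bytes to a failing destination and
then calls close() gets an IOException instead of a silent data loss.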
Change-Id: I74a82b31505fadf8378069c5f6554f1033c28f9b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Adds the following commands:
- Add
- Init
- Status
- Sync
- Update
This also updates AddCommand so that added file patterns that
match submodules can be staged in the index.
Change-Id: Ie5112aa26430e5a2a3acd65a7b0e1d76067dc545
Signed-off-by: Kevin Sawicki <kevin@github.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Revision strings such as 'master@{0}' can now be resolved
by Repository.resolve by reading the reflog for the ref and
returning the commit for the entry number specified.
This still throws an exception for cases not supported
such as 'master@{yesterday}'.
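The parsing step can be sketched as follows. This is not the code in
Repository.resolve. The class name is hypothetical, and the sketch only
splits the string into a ref name and an entry number, rejecting
anything that is not a plain number. Reading the reflog itself is left
out.

```java
/** Hypothetical parser for illustration; not part of JGit. */
final class ReflogRevisionSketch {
    final String refName;
    final int entry;

    private ReflogRevisionSketch(String refName, int entry) {
        this.refName = refName;
        this.entry = entry;
    }

    /** Splits e.g. "master@{0}" into ref name "master" and entry number 0. */
    static ReflogRevisionSketch parse(String revstr) {
        int at = revstr.indexOf("@{");
        if (at <= 0 || !revstr.endsWith("}")) {
            throw new IllegalArgumentException("no reflog suffix: " + revstr);
        }
        String selector = revstr.substring(at + 2, revstr.length() - 1);
        // Only entry numbers are handled; dates such as "yesterday" are not.
        if (!selector.matches("[0-9]{1,9}")) {
            throw new IllegalArgumentException("unsupported selector: " + selector);
        }
        return new ReflogRevisionSketch(revstr.substring(0, at), Integer.parseInt(selector));
    }
}
```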
Change-Id: I6162777d6510e083565a77cac4545cda5a9aefb3
The method canAmend was added to RepositoryState. It returns true if
amending the HEAD commit is allowed in the current repository state.
Change-Id: Idd0c4eea83a23c41340789b7b877959b457d951e
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Add detection of untracked folders to IndexDiffFilter
Decorators need to know whether folders in the working tree contain only
untracked files. This change enhances IndexDiffFilter to report such
folders. This works only together with treewalks which operate in
default traversal mode. For treewalks which process entries in
postorder mode (files are walked before their parent folder is walked)
this detection doesn't work.
Bug: 359264
Change-Id: I9298d1e3ccac0aec8bbd4e8ac867bc06a5c89c9f
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
A few places were still using GitIndex. Replacing it was fairly
simple, but there is a difference in test outcome in
ReadTreeTest.testUntrackedConflicts. I believe the new behavior
is good, since we update neither the index nor the worktree.
Change-Id: I4be5357b7b3139dded17f77e07a140addb213ea7
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Deprecate GitIndex more by using only DirCache internally.
This includes merging ReadTreeTest into DirCacheCheckoutTest and
converting IndexDiffTest to use DirCache only. The GitIndex specific
T0007GitIndex test remains.
GitIndex is deprecated. Let us speed up its demise by focusing the
DirCacheCheckout tests on using DirCache instead.
This also adds explicit deprecation comments to methods that depend
on GitIndex in Repository and TreeEntry. The latter is deprecated in
itself.
Change-Id: Id89262f7fbfee07871f444378f196ded444f2783
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
In practice the DHT storage layer has not been performing as well as
large scale server environments want to see from a Git server.
The performance of the DHT schema degrades rapidly as small changes
are pushed into the repository due to the chunk size being less than
1/3 of the pushed pack size. Small chunks cause poor prefetch
performance during reading, and require significantly longer prefetch
lists inside of the chunk meta field to work around the small size.
The DHT code is very complex (>17,000 lines of code) and is very
sensitive to the underlying database round-trip time, as well as the
way objects were written into the pack stream that was chunked and
stored on the database. A poor pack layout (from any version of C Git
prior to Junio reworking it) can cause the DHT code to be unable to
enumerate the objects of the linux-2.6 repository in a completable
time scale.
Performing a clone from a DHT stored repository of 2 million objects
takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row
for each object being cloned. This is very difficult for some DHTs to
scale; even at 5000 rows/second the lookup stage alone takes 6 minutes
(on local filesystem, this is almost too fast to bother measuring).
Some servers like Apache Cassandra just fall over and cannot complete
the 2 million lookups in rapid fire.
On a ~400 MiB repository, the DHT schema has an extra 25 MiB of
redundant data that gets downloaded to the JGit process, and that is
before you consider the cost of the OBJECT_INDEX table also being
fully loaded, which is at least 223 MiB of data for the linux kernel
repository. In the DHT schema answering a `git clone` of the ~400 MiB
linux kernel needs to load 248 MiB of "index" data from the DHT, in
addition to the ~400 MiB of pack data that gets sent to the client.
This is 193 MiB more data to be accessed than the native filesystem
format, but it needs to come over a much smaller pipe (local Ethernet
typically) than the local SATA disk drive.
I also never got around to writing the "repack" support for the DHT
schema, as it turns out to be fairly complex to safely repack data in
the repository while also trying to minimize the amount of changes
made to the database, due to very common limitations on database
mutation rates.
This new DFS storage layer fixes a lot of those issues by taking the
simple approach for storing relatively standard Git pack and index
files on an abstract filesystem. Packs are accessed by an in-process
buffer cache, similar to the WindowCache used by the local filesystem
storage layer. Unlike the local file IO, there are some assumptions
that the storage system has relatively high latency and no concept of
"file handles". Instead it looks at the file more like HTTP byte range
requests, where a read channel is simply a thunk to trigger a read
request over the network.
The DFS code in this change is still abstract; it does not store on
any particular filesystem, but is fairly well suited to Amazon S3
or Apache Hadoop HDFS. Storing packs directly on HDFS rather than
HBase removes a layer of abstraction, as most HBase row reads turn
into an HDFS read.
Most of the DFS code in this change was blatantly copied from the
local filesystem code. Most parts should be refactored to be shared
between the two storage systems, but right now I am hesitant to do
this due to how well tuned the local filesystem code currently is.
Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb
Extend IndexDiff to calculate ignored files and folders
IndexDiff was extended to calculate ignored files and folders.
The calculation only considers files that are NOT in the index.
This functionality is required by the new EGit decorator implementation.
Bug: 359264
Change-Id: I8f09d6a4d61b64aeea80fd22bf3a2963c2bca347
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
This constant determines the default start point if the user
does not want to create a branch from the current HEAD.
Change-Id: Iea944e11e80134fbafc4c47383457d5ed11a4164
Signed-off-by: Manuel Doninger <manuel.doninger@googlemail.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Since we replaced GitIndex with DirCache, JGit no longer fired
IndexChangedEvents. For EGit this still worked, with high
latency, since its RepositoryChangeScanner, which is scheduled to
run every 10 seconds, fires the event in case the index changes.
This scanner is meant to detect index changes induced by a different
process e.g. by calling "git add" from native git.
When the index is changed from within the same process we should fire
the event synchronously. Compare the index checksum on write to index
checksum when index was read earlier to determine if index really
changed. Use IndexChangedListener interface to keep DirCache decoupled
from Repository.
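The checksum comparison can be sketched like this. It is not the
DirCache code. The class and the listener interface below are stand-ins
invented for the example, and hashing the whole content with SHA-1 is an
assumption about how the checksum is obtained.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in for an index that notifies listeners on change. */
final class IndexChangeSketch {
    /** Hypothetical listener type; keeps the index decoupled from its owner. */
    interface ChangeListener {
        void onIndexChanged();
    }

    private final List<ChangeListener> listeners = new ArrayList<>();
    private byte[] checksumAtRead;

    void addListener(ChangeListener listener) {
        listeners.add(listener);
    }

    /** Remembers the checksum of the content that was read. */
    void read(byte[] indexContent) {
        checksumAtRead = sha1(indexContent);
    }

    /** Fires the event synchronously, but only if the content really changed. */
    void write(byte[] indexContent) {
        byte[] checksumAtWrite = sha1(indexContent);
        boolean changed = !Arrays.equals(checksumAtRead, checksumAtWrite);
        checksumAtRead = checksumAtWrite;
        if (changed) {
            for (ChangeListener listener : listeners) {
                listener.onIndexChanged();
            }
        }
    }

    private static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```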
Change-Id: Id4311f7a7859ffe8738863b3d86c83c8b5f513af
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
IndexOutOfBoundsException is thrown from Repository.resolveSimple() when
the '-g' string is located less than 4 characters from the end of the
revision string.
Change-Id: I1128c2cdfec9db3023d4d0f1f40d863e84b75950
Signed-off-by: Dariusz Luksza <dariusz@luksza.org>
Repository.writeMergeCommitMsg(null) no longer fails if the MERGE_MSG
file is missing. This was done to keep CommitCommand from failing in
case of a missing MERGE_MSG file.
Bug: 352243
Change-Id: Iddf43533d133f8f22199ed6e2393a552670e7d1f
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Before this change any files in the conflicting set would
also be listed in the other IndexDiff sets, which is
confusing. With this change a conflicting file will not
be included in any of the other sets.
Change-Id: Ife9f2652685220bcfddc1f9820423acdcd5acfdc
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
jgit.storage.dht is a storage provider implementation for JGit that
permits storing the Git repository in a distributed hashtable, NoSQL
system, or other database. The actual underlying storage system is
undefined, and can be plugged in by implementing 7 small interfaces:
* Database
* RepositoryIndexTable
* RepositoryTable
* RefTable
* ChunkTable
* ObjectIndexTable
* WriteBuffer
The storage provider interface tries to assume very little about the
underlying storage system, and requires only three key features:
* key -> value lookup (a hashtable is suitable)
* atomic updates on single rows
* asynchronous operations (Java's ExecutorService is easy to use)
Most NoSQL database products offer all 3 of these features in their
clients, and so does any decent network based cache system like the
open source memcache product. Relying only on key equality for data
retrieval makes it simple for the storage engine to distribute across
multiple machines. Traditional SQL systems could also be used with a
JDBC based spi implementation.
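To make the three required features concrete, here is a small in-memory
sketch. It does not implement any of the 7 spi interfaces. The class and
method names are invented for the example, and rows are simplified to
String keys and values.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;

/** Hypothetical in-memory table; not one of the jgit.storage.dht interfaces. */
final class KeyValueTableSketch {
    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();
    private final ExecutorService executor;

    KeyValueTableSketch(ExecutorService executor) {
        this.executor = executor;
    }

    /** key -> value lookup, performed asynchronously on the executor. */
    CompletableFuture<String> get(String key) {
        return CompletableFuture.supplyAsync(() -> rows.get(key), executor);
    }

    /**
     * Atomic update of a single row. Succeeds only if the row still holds
     * the expected value; a null expected value means the row must be absent.
     */
    boolean compareAndPut(String key, String expected, String update) {
        if (expected == null) {
            return rows.putIfAbsent(key, update) == null;
        }
        return rows.replace(key, expected, update);
    }
}
```

Only key equality is used to find a row, which is the property that
lets a real backend spread rows across machines.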
Before submitting this change I have implemented six storage systems
for the spi layer:
* Apache HBase[1]
* Apache Cassandra[2]
* Google Bigtable[3]
* an in-memory implementation for unit testing
* a JDBC implementation for SQL
* a generic cache provider that can ride on top of memcache
All six systems came in with an spi layer around 1000 lines of code to
implement the above 7 interfaces. This is a huge reduction in size
compared to prior attempts to implement a new JGit storage layer. As
this package shows, a complete JGit storage implementation is more
than 17,000 lines of fairly complex code.
A simple cache is provided in storage.dht.spi.cache. Implementers can
use CacheDatabase to wrap any other type of Database and perform fast
reads against a network based cache service, such as the open source
memcached[4]. An implementation of CacheService must be provided to
glue this spi onto the network cache.
[1] https://github.com/spearce/jgit_hbase
[2] https://github.com/spearce/jgit_cassandra
[3] http://labs.google.com/papers/bigtable.html
[4] http://memcached.org/
Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Change-Id: I22fc46dff6cc5dfd975f6e82161d265781778cde
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add private methods which are used for reading and writing MERGE_HEAD
and CHERRY_PICK_HEAD files, as suggested in the comments on change
I947967fdc2f1d55016c95106b104c2afcc9797a1.
Change-Id: If4617a05ee57054b8b1fcba36a06a641340ecc0e
Signed-off-by: Robin Stocker <robin@nibor.org>
Add handling of CHERRY_PICK_HEAD file in .git (similar to MERGE_HEAD),
which is written in case of a conflicting cherry-pick merge.
It is used so that Repository.getRepositoryState can return the new
states CHERRY_PICKING and CHERRY_PICKING_RESOLVED. These states, as well
as CHERRY_PICK_HEAD can be used in EGit to properly show the merge tool.
Also, in case of a conflict, MERGE_MSG is written with the original
commit message and a "Conflicts" section appended. This way, the
cherry-picked message is not lost and can later be re-used in the commit
dialog.
Bug: 339092
Change-Id: I947967fdc2f1d55016c95106b104c2afcc9797a1
Signed-off-by: Robin Stocker <robin@nibor.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There should be a way to explicitly refresh the refs cached in the
RefDirectory. Since commit c261b28 (use of FileSnapshot) this is
not needed anymore for storage in the filesystem. But for DHT based
storage an explicit refresh may be needed.
Change-Id: I7d30c3496c05e1fb6e9519f3af9f23c6adb93bf9
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Detaching HEAD didn't work in some corner cases of checkout. If, for example,
HEAD is a symbolic ref to refs/heads/master and refs/heads/master is a ref to
commit c0ffee..., then:
checkout c0ffee...
would leave the HEAD unchanged.
The same symptom occurs when checking out a remote tracking branch or a tag
that references the same commit as refs/heads/master.
In the above case, the RefUpdate class didn't have enough information to decide
if the update needed to detach symbolic ref because it dealt only with new/old
objectIDs. Therefore, this fix introduced the RefUpdate.detachingSymbolicRef
flag.
Bug: 315166
Change-Id: I085c98b77ea8f9104a213978ea0d4ac6fd58f49b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
This enables applications to differentiate between explicitly set
configuration parameters and best effort attempts to guess these
parameters from the operating system.
Change-Id: I67cc4099238a40c6dca795e64f0155ced6008ef1
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
ObjectIdOwnerMap: More lightweight map for ObjectIds
OwnerMap is about 200 ms faster than SubclassMap, more friendly to the
GC, and uses less storage: testing the "Counting objects" part of
PackWriter on 1886362 objects:
ObjectIdSubclassMap:
load factor 50%
table: 4194304 (wasted 2307942)
ms spent 36998 36009 34795 34703 34941 35070 34284 34511 34638 34256
ms avg 34800 (last 9 runs)
ObjectIdOwnerMap:
load factor 100%
table: 2097152 (wasted 210790)
directory: 1024
ms spent 36842 35112 34922 34703 34580 34782 34165 34662 34314 34140
ms avg 34597 (last 9 runs)
The major difference with OwnerMap is entries must extend from
ObjectIdOwnerMap.Entry, where the OwnerMap has injected its own
private "next" field into each object. This allows the OwnerMap to use
a singly linked list for chaining collisions within a bucket. By
putting collisions in a linked list, we gain the entire table back for
the SHA-1 bits to index their own "private" slot.
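A stripped-down sketch of the chaining scheme, for illustration only. It
is not ObjectIdOwnerMap: the class name is made up, an int id stands in
for the ObjectId bits, the table has a fixed size, and duplicate
detection and growth are omitted.

```java
/** Simplified intrusive hash map; not the JGit ObjectIdOwnerMap. */
final class OwnerMapSketch<V extends OwnerMapSketch.Entry> {
    /** Entries must extend this type so the map can thread its chain link. */
    abstract static class Entry {
        final int id;   // stand-in for the hash bits of an ObjectId
        Entry next;     // collision chain link, managed only by the map

        Entry(int id) {
            this.id = id;
        }
    }

    // Fixed power-of-2 size for brevity; a real map grows as it fills.
    private final Entry[] table = new Entry[16];
    private int size;

    /** Adds the entry at the head of its bucket's singly linked list. */
    void add(V entry) {
        int slot = entry.id & (table.length - 1);
        entry.next = table[slot];
        table[slot] = entry;
        size++;
    }

    /** Walks the bucket's chain; returns null if the id is not present. */
    @SuppressWarnings("unchecked")
    V get(int id) {
        for (Entry e = table[id & (table.length - 1)]; e != null; e = e.next) {
            if (e.id == id) {
                return (V) e;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

Because the link lives inside the entry, no wrapper node is allocated
per insertion, and an entry cannot be in two such maps at once.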
Unfortunately this means that each object can appear in at most ONE
OwnerMap, as there is only one "next" field within the object instance
to thread into the map. For types that are very object map heavy like
RevWalk (entity RevObject) and PackWriter (entity ObjectToPack) this
is sufficient; these entity types are only put into one map by their
container. By introducing a new map type, we don't break existing
applications that might be trying to use ObjectIdSubclassMap to track
RevCommits they obtained from a RevWalk.
The OwnerMap uses less memory. Each object uses 1 reference more (so
we're up 1,886,362 references), but the table is 1/2 the size (2^21
rather than 2^22). The table itself wastes only 210,790 slots, rather
than 2,307,942. So OwnerMap is wasting 200k fewer references.
OwnerMap is more friendly to the GC, because it hardly ever generates
garbage. As the map reaches its 100% load factor target, it doubles in
size by allocating additional segment arrays of 2048 entries. (So the
first grow allocates 1 segment, second 2 segments, third 4 segments,
etc.) These segments are hooked into the pre-allocated directory of
1024 spaces. This permits the map to grow to 2 million objects before
the directory itself has to grow. By using segments of 2048 entries,
we are asking the GC to acquire 8,204 bytes in a 32 bit JVM. This is
easier to satisfy than 2,307,942 bytes (for the 512k table that is
just an intermediate step in the SubclassMap). By reusing the
previously allocated segments (they are re-hashed in-place) we don't
release any memory during a table grow.
When the directory grows, it does so by discarding the old one and
using one that is 4x larger (so the directory goes to 4096 entries on
its first grow). A directory of size 4096 can handle up to 8 million
objects. The second directory grow (16384) goes to 33 million objects.
At that point we're starting to really push the limits of the JVM
heap, but at least it's many small arrays. Previously SubclassMap would
need a table of 67108864 entries to handle that object count, which
needs a single contiguous allocation of 256 MiB. That's hard to come
by in a 32 bit JVM. Instead OwnerMap uses 8192 arrays of about 8 KiB
each. This is much easier to fit into a fragmented heap.
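The capacity figures above follow from directory size times segment
size. The helper below just redoes that arithmetic. The class is
hypothetical, and the 4 bytes per reference is an assumption matching a
32 bit JVM.

```java
/** Arithmetic check only; not part of JGit. */
final class OwnerMapSizingSketch {
    /** Entries per segment array, as stated in the text above. */
    static final int SEGMENT_ENTRIES = 2048;

    /** Objects a directory can address at the 100% load factor target. */
    static long capacity(int directorySize) {
        return (long) directorySize * SEGMENT_ENTRIES;
    }

    /** Bytes needed for one contiguous table of references. */
    static long tableBytes(long entries, int bytesPerReference) {
        return entries * bytesPerReference;
    }
}
```

With these definitions, capacity(1024) is 2,097,152, capacity(4096) is
8,388,608, capacity(16384) is 33,554,432, and a single table of
67,108,864 four-byte references needs 268,435,456 bytes, i.e. 256 MiB.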
Change-Id: Ia4acf5cfbf7e9b71bc7faa0db9060f6a969c0c50
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectIdSubclassMap: Micro-optimize wrapping at end of table
During a review of the class, Josh Bloch pointed out we can use
"i = (i + 1) & mask" to wrap around at the end of the table, instead
of a conditional with a branch. This is generally faster due to one
less branch that will be mis-predicted by the CPU.
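For a table whose size is a power of 2, the two ways of stepping to the
next slot are equivalent. The class and method names below are only for
illustration.

```java
/** Illustration of wrapping a probe index; names are hypothetical. */
final class ProbeWrapSketch {
    /** Wraps with a conditional branch at the end of the table. */
    static int nextWithBranch(int i, int tableSize) {
        i++;
        if (i == tableSize) {
            i = 0;
        }
        return i;
    }

    /** Wraps with a mask; requires mask == tableSize - 1, size a power of 2. */
    static int nextWithMask(int i, int mask) {
        return (i + 1) & mask;
    }
}
```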
Change-Id: Ic88c00455ebc6adde9708563a6ad4d0377442bba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectIdSubclassMap: Avoid field loads in inner loops
Ensure the JIT knows the table cannot be changed during the critical
inner loop of get() or insert() by loading the field into a final
local variable. This shouldn't be necessary, but the instance member
is declared non-final (to allow resizing) and it is not very obvious to the
JIT that the table cannot be modified by AnyObjectId.equals().
Simplify the JIT's decision making by making it obvious that these
values cannot change during the critical inner loop, allowing
for better register allocation.
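The pattern looks roughly like this. The sketch uses a small String
table instead of ObjectIds, has a made-up class name, and leaves out
growth, so it only shows where the field is copied into final locals
before the loop.

```java
/** Illustrative open-addressing set; not the JGit ObjectIdSubclassMap. */
final class FieldLoadSketch {
    // Non-final: a real map replaces this array when it grows (omitted here).
    private String[] table = new String[8];
    private int size;

    void insert(String key) {
        // Load the field once; the loop below only touches the locals.
        final String[] tbl = table;
        final int msk = tbl.length - 1;
        int i = key.hashCode() & msk;
        while (tbl[i] != null) {
            if (tbl[i].equals(key)) {
                return; // already present
            }
            i = (i + 1) & msk;
        }
        if (size == msk) {
            // Keep one slot free so every probe loop terminates.
            throw new IllegalStateException("table full; growth is omitted");
        }
        tbl[i] = key;
        size++;
    }

    boolean contains(String key) {
        final String[] tbl = table;
        final int msk = tbl.length - 1;
        int i = key.hashCode() & msk;
        while (tbl[i] != null) {
            if (tbl[i].equals(key)) {
                return true;
            }
            i = (i + 1) & msk;
        }
        return false;
    }
}
```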
Change-Id: I0d797533fc5327366f1207b0937c406f02cdaab3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This method is trivial in definition, and is called in only 3
places. Inline the method manually to ensure it's really going
to be inlined by the JIT at runtime.
Change-Id: I128522af8167c07d2de6cc210573599038871dda
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
32 is way too small for the map. Most applications using the map
will need to load more than 16 objects just from the root refs
being read from the Repository.
Default the initial size to 2048. This cuts out 6 expansions in
the early life of the table, reducing garbage and rehashing time.
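The count of 6 can be checked by counting doublings from 32 up to 2048.
The helper below is hypothetical and exists only to show the
arithmetic.

```java
/** Arithmetic check only; not part of JGit. */
final class InitialSizeSketch {
    /** Number of times a table must double to grow from 'from' to at least 'to'. */
    static int doublings(int from, int to) {
        if (from <= 0) {
            throw new IllegalArgumentException("from must be positive");
        }
        int count = 0;
        // long avoids overflow when 'to' is close to Integer.MAX_VALUE.
        for (long size = from; size < to; size <<= 1) {
            count++;
        }
        return count;
    }
}
```

Here doublings(32, 2048) is 6: 64, 128, 256, 512, 1024, 2048.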
Change-Id: I6dd076ebc0b284f1755855d383b79535604ac547
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the table needs to be grown, do it before the current insertion
rather than after. This is a tiny micro-optimization that allows
the compiler to reuse the result of "++size" to compare against
previously pre-computed size at which the table should rehash itself.
Change-Id: Ief6f81b91c10ed433d67e0182f558ca70d58a2b0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectIdSubclassMap: Use & rather than % for hashing
Bitwise AND is faster than integer modulus operations, and since
the table size is always a power of 2, it is simple to use for
computing the index.
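For non-negative hash values and a power-of-2 table size, both
operations give the same slot. The names below are made up for the
example. Note that for a negative hash the Java % operator yields a
non-positive result, while the mask always yields a valid index.

```java
/** Illustration of computing a slot index; names are hypothetical. */
final class MaskIndexSketch {
    /** Slot via integer modulus; valid only for non-negative hashes. */
    static int indexWithModulus(int hash, int tableSize) {
        return hash % tableSize;
    }

    /** Slot via bitwise AND; requires tableSize to be a power of 2. */
    static int indexWithMask(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }
}
```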
Change-Id: I83d01e5c74fd9e910c633a98ea6f90b59092ba29
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
obj_hash doesn't match our naming conventions, camelCaseNames
are the preferred format.
Change-Id: I72da199daccb60a98d17b6af1e498189bf149515
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
A standard HashSet was being used to store the list of subsections as
they were being parsed. This was changed to use a LinkedHashSet so
that iterating over the set would return values in the same order as
they are listed in the config file.
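The difference is only in iteration order. A short sketch, with a
hypothetical helper name, of collecting subsection names in the order
they were parsed:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustration only; not the JGit Config code. */
final class SubsectionOrderSketch {
    /** Returns the distinct names in first-seen order. */
    static List<String> subsections(List<String> parsedNames) {
        // LinkedHashSet drops duplicates but keeps insertion order;
        // a plain HashSet gives no ordering guarantee.
        Set<String> seen = new LinkedHashSet<>(parsedNames);
        return new ArrayList<>(seen);
    }
}
```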
Change-Id: I4251f95b8fe0ad59b07ff563c9ebb468f996c37d