When running on NFS there was a chance that JGits LockFile
semantic is broken because File#createNewFile() may allow
multiple clients to create the same file in parallel. This
change provides a fix which is only used when the new config
option core.supportsAtomicCreateNewFile is set to false. The
default for this option is true. This option can only be set in the
global or the system config file. The repository config file is not
taken into account in this case.
If the config option core.supportsAtomicCreateNewFile is true
then File#createNewFile() is trusted and the behaviour doesn't
change.
But if core.supportsAtomicCreateNewFile is set to false then after
successful creation of the lock file a hardlink to that lock file is
created and the attribute nlink of the lock file is checked to be 2. If
multiple clients manage to create the same lock file nlink would be
greater than 2 showing the error.
This expensive workaround is described in
https://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
section III.d) "Exclusive File Creation"
Change-Id: I3d2cc48d8eb280d5f7039eb94da37804f903be6a
With the auto option, gc checks whether any housekeeping is required; if
not, it exits without performing any work. Some JGit commands run gc
--auto after performing operations that could create many loose objects.
Housekeeping is required if there are too many loose objects or too many
packs in the repository.
If the number of loose objects exceeds the value of the gc.auto option
jgit's GC consolidates all existing packs into a single pack (equivalent
to -A option), whereas git-core would combine all loose objects into a
single pack using repack -d -l. Setting the value of gc.auto to 0
disables automatic packing of loose objects.
If the number of packs exceeds the value of gc.autoPackLimit, then
existing packs (except those marked with a .keep file) are consolidated
into a single pack by using the -A option of repack. Setting
gc.autoPackLimit to 0 disables automatic consolidation of packs.
Like git the following jgit commands run auto gc:
- fetch
- merge
- rebase
- receive-pack
The auto gc for receive-pack can be suppressed by setting the config
option receive.autogc = false
Change-Id: I68a2a051b39ec2c53cb7c4b8f6c596ba65eeba5d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Allow for higher concurrency on DfsBlockCache by adding a configuration
for number of estimated concurrent requests.
Change-Id: Ia65e58ecb2c459b6d9c9697a2f715d933270f7e6
Signed-off-by: Philipp Marx <smigfu@googlemail.com>
Implement the DIR_NO_GITLINKS setting with the same functionality
it provides in cGit.
Bug: 436200
Change-Id: I8304e42df2d7e8d7925f515805e075a92ff6ce28
Signed-off-by: Preben Ingvaldsen <preben@puppetlabs.com>
Add config parameter gc.prunePackExpire for packfile expiration
JGit's Garbage Collector is repacking relevant objects into new
packfiles and is afterwards deleting the now obsolete packfiles. But to
prevent problems caused by race conditions JGit was not deleting
packfiles when they are too young. The same mechanism as for loose
objects and the config parameter gc.pruneExpire was used.
But JGit was reusing the parameter gc.pruneExpire also for packfiles
which may cause a lot of filesystem consumption if gc.pruneExpire was
set to the default of 2 weeks. Only two weeks after packfile creation gc
was allowed to delete this packfile.
This change introduces a new config paramter gc.prunePackExpire with a
default of "1.hour". This parameter is used when packfiles are deleted.
Only packfiles older than the specified time can be deleted.
For loose objects the behaviour is not changed and only the old
parameter gc.pruneExpire is relevant.
Change-Id: I6209efb05678b15153bd22479dc13486907a44f8
TreeWalk provides the new method getEolStreamType. This new method can
be used with EolStreamTypeUtil in order to create a wrapped InputStream
or OutputStream when reading / writing files. The implementation
implements support for the git configuration options core.crlf, core.eol
and the .gitattributes "text", "eol" and "binary"
CQ: 10896
Bug: 486563
Change-Id: Ie4f6367afc2a6aec1de56faf95120fff0339a358
Signed-off-by: Ivan Motsch <ivan.motsch@bsiag.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Avoid storing large packs in block cache during reuse
When a large pack (> 30% of the block cache) is being reused by
copying it pollutes the block cache with noise by storing blocks
that are never referenced again.
Avoid this by streaming the file directly from its channel onto
the output stream.
Change-Id: I2e53de27f3dcfb93de68b1fad45f75ab23e79fe7
Core classes to parse and process .gitattributes files including
support for reading attributes in WorkingTreeIterator and the
dirCacheIterator.
The implementation follows the git ignore implementation. It supports
lazy reading attributes while walking the working tree.
Bug: 342372
CQ: 9078
Change-Id: I05f3ce1861fbf9896b1bcb7816ba78af35f3ad3d
Also-by: Marc Strapetz <marc.strapetz@syntevo.com>
Also-by: Gunnar Wagenknecht <gunnar@wagenknecht.org>
Also-by: Arthur Daussy <arthur.daussy@obeo.fr>
Signed-off-by: Gunnar Wagenknecht <gunnar@wagenknecht.org>
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Signed-off-by: Arthur Daussy <arthur.daussy@obeo.fr>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
JGit's ObjectDirectory implements the optimization that it remembers the
pack folders (.git/objects/pack) lastModified timestamp and doesn't
check for new packfiles in this folder if the lastModified attribute has
not changed.
In environments using NFS this can cause trouble. If multiple JGit
instances from multiple machines work on the same repository and one
instance creates a new ref and a new packfile (e.g. by doing a fetch)
then the other machines may detect the new ref but can't resolve the
referenced object because it doesn't detect that pack folder has a new
packfile. That's because NFS may cache file/folder metadata for quite a
long time and the pack folders modification time is not updated although
a new packfile is there and could be read.
The new config parameter core.trustfolderstat controls this behaviour.
The default is true and jgits behaviours is unchanged. But if this
parameter is set to false then jgit doesn't trust the pack directories
lastmodified anymore. Instead it will always iterate through the content
of that folder to detect new packfiles.
Change-Id: Ie3b4e92933286aa9916070a22422e629b3147f54
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Support for Submodule configuration submodule.<name>.ignore
For each submodule native git allows to configure which modifications to
submodules should be ignored by the status command. It is possible to
ignore "none", "all", "dirty", "untracked" [1]. This configuration is
now supported by IndexDiff. The StatusCommand offers the possibility to
specify this mode.
[1] http://git-scm.com/docs/gitmodules
Change-Id: Ifd81d574a680f9b4152945ba70f8ec4af4f452c9
JGit should offer the possibility to do a garbage collection in
"aggressive" mode. In this mode garbage collection more aggressively
optimize the repository at the expense of taking much more time.
Technically a aggressive mode garbage collection differs from a
non-aggressive one by:
- not reusing packed objects found in old packs. Recompress every object
- the configuration pack.window is set to 250 (the default is 10)
- the configuration pack.depths is set to 250 (the default is 50)
The associated classes in org.eclipse.jgit.api and the command line
command in org.eclipse.jgit.pgm expose this new option.
The configuration parameters gc.aggressiveDepth and gc.aggressiveWindow
have been introduced to configure this feature.
Bug: 444332
Change-Id: I024101f2810acf6be13ce144c9893d98f5c4ae76
Windows: Hide the .git directory if hidedotfiles is set to non-false
Other .git files are not hidden with this patch
Change-Id: Idf63ca08d08f3a77c33f5848d02074f8d6a75758
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Preserve merges during pull if configured to do so
Setting branch.<name>.rebase or pull.rebase to 'preserve' will preserve
merges during rebase. Also, pull.rebase is now consulted if there is no
branch-specific configuration.
Bug: 429664
Change-Id: I345fa295c7e774e0d0a8e6aba30fbfc3552e0084
Signed-off-by: Konrad Kügler <swamblumat-eclipsebugs@yahoo.de>
Use fetch.prune and remote.<name>.prune to set prune mode when fetching
When no explicit value is set via FetchCommand.setRemoveDeletedRefs()
checks if pruning is enabled in the configuration.
The following commit introduced the prune config to C Git:
737c5a9cde
Change-Id: Ida79d335218e1c9f5c6e2ce03386ac8a1c0b212e
Signed-off-by: Konrad Kügler <swamblumat-eclipsebugs@yahoo.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The change includes comparing symbolic links between disk and index,
adding symbolic links to the index, creating/modifying links on
checkout. The behavior is controlled by the core.symlinks setting, just
as C Git does. When a new repository is created core.symlinks will be
set depending on the capabilities of the operating system and Java
runtime.
If core.symlinks is set to true, the assumption is that symlinks are
supported, which may result in runtime errors if this turns out not to
be the case.
Measuring the cost of jgit status on a repository with ~70000 files,
of which ~30000 are tracked reveals a penalty of about 10% for using
the Java7 (really NIO2) support module.
Bug: 354367
Change-Id: I12f0fdd9d26212324a586896ef7eb1f6ff89c39c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This feature was introduced in native git with version 1.8.4.
Bug: 422951
Change-Id: I42f194174d64d7ada6631e2156c2a7bf93b5e91c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This implementation has been proven to deadlock in production server
loads. Google has been running with it disabled for a quite a while,
as the bugs have been difficult to identify and fix.
Instead of suggesting it works and is useful, drop the code. JGit
should not advertise support for functionality that is known to
be broken.
In a few of the places where read-ahead was enabled by DfsReader
there is more information about what blocks should be loaded when.
During object representation selection, or size lookup, or sending
object as-is to a PackWriter, or sending an entire pack as-is the
reader knows exactly which blocks are required in the cache, and it
also can compute when those will be needed. The broken read-ahead
code was stupid and just read a fixed amount ahead of the current
offset, which can waste IOs if more precise data was available.
DFS systems are usually slow to respond so read-ahead is still
a desired feature, but it needs to be rebuilt from scratch and
make better use of the offset information.
Change-Id: Ibaed8288ec3340cf93eb269dc0f1f23ab5ab1aea
There is a huge performance issue when using both JGit (EGit) and Git
because JGit does not fill all dircache stat fields with the values Git
would expect. As a result thereof Git would typically revalidate a large
number of tracked files. This can take several minutes for large
repositories with many large files.
Since 1.8.2 Git will restrict stat checking to the size and whole second
part of the modification time stamp, if core.statinfo is set to
"minimal".
As JGit checks only size and modification time this is close to what
JGit already does. To make the match perfect ignore the sub-second part
of the modification time stamp if core.statinfo = minimal.
Change-Id: I8eaff1858a891571075a86db043f9d80da3d7503
Add additional FastForwardMode enums for different config contexts
FastForwardMode is represented by different strings depending on context
it is set or get from. E.g. FastForwardMode.FF_ONLY for
branch.<name>.mergeoptions is "--ff-only" but for merge.ff it is "only".
Change-Id: I39ae93578e4783de80ebf4af29ae23b3936eec47
Add additional FastForwardMode enums for different config contexts
FastForwardMode should be represented by different enums depending on
context it is set or get from. E.g. FastForwardMode.FF_ONLY for
branch.<name>.mergeoptions is "--ff-only" but for merge.ff it is "only".
Change-Id: I3ecc16d48e715b81320b73ffae4caf3558f965f2
A few classes such as Constanrs are marked with @SuppressWarnings, as are
toString() methods with many liternal, but otherwise $NLS-n$ is used for
string containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
Make GC honor the config parameter gc.pruneexpire. If the parameter is
not set then the default is "2.weeks.ago"
Change-Id: I0ae0ca85993cafb4bc75ba80504da18544894ec3
Set core.precomposeunicode to true when creating repository on Mac
Java has no option but to use precomposed Unicode, so we should
state that when creating a new repository. Not that Java will use
precomposed unicode regardless of this setting, but this reduces
the risk of incompatibility with C Git.
Change-Id: I3779b75f76d2e2061c836cbc9b4b7c2ae0cf18f4
Adds the following commands:
- Add
- Init
- Status
- Sync
- Update
This also updates AddCommand so that file patterns added that
are submodules can be staged in the index.
Change-Id: Ie5112aa26430e5a2a3acd65a7b0e1d76067dc545
Signed-off-by: Kevin Sawicki <kevin@github.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
In practice the DHT storage layer has not been performing as well as
large scale server environments want to see from a Git server.
The performance of the DHT schema degrades rapidly as small changes
are pushed into the repository due to the chunk size being less than
1/3 of the pushed pack size. Small chunks cause poor prefetch
performance during reading, and require significantly longer prefetch
lists inside of the chunk meta field to work around the small size.
The DHT code is very complex (>17,000 lines of code) and is very
sensitive to the underlying database round-trip time, as well as the
way objects were written into the pack stream that was chunked and
stored on the database. A poor pack layout (from any version of C Git
prior to Junio reworking it) can cause the DHT code to be unable to
enumerate the objects of the linux-2.6 repository in a completable
time scale.
Performing a clone from a DHT stored repository of 2 million objects
takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row
for each object being cloned. This is very difficult for some DHTs to
scale, even at 5000 rows/second the lookup stage alone takes 6 minutes
(on local filesystem, this is almost too fast to bother measuring).
Some servers like Apache Cassandra just fall over and cannot complete
the 2 million lookups in rapid fire.
On a ~400 MiB repository, the DHT schema has an extra 25 MiB of
redundant data that gets downloaded to the JGit process, and that is
before you consider the cost of the OBJECT_INDEX table also being
fully loaded, which is at least 223 MiB of data for the linux kernel
repository. In the DHT schema answering a `git clone` of the ~400 MiB
linux kernel needs to load 248 MiB of "index" data from the DHT, in
addition to the ~400 MiB of pack data that gets sent to the client.
This is 193 MiB more data to be accessed than the native filesystem
format, but it needs to come over a much smaller pipe (local Ethernet
typically) than the local SATA disk drive.
I also never got around to writing the "repack" support for the DHT
schema, as it turns out to be fairly complex to safely repack data in
the repository while also trying to minimize the amount of changes
made to the database, due to very common limitations on database
mutation rates..
This new DFS storage layer fixes a lot of those issues by taking the
simple approach for storing relatively standard Git pack and index
files on an abstract filesystem. Packs are accessed by an in-process
buffer cache, similar to the WindowCache used by the local filesystem
storage layer. Unlike the local file IO, there are some assumptions
that the storage system has relatively high latency and no concept of
"file handles". Instead it looks at the file more like HTTP byte range
requests, where a read channel is a simply a thunk to trigger a read
request over the network.
The DFS code in this change is still abstract, it does not store on
any particular filesystem, but is fairly well suited to the Amazon S3
or Apache Hadoop HDFS. Storing packs directly on HDFS rather than
HBase removes a layer of abstraction, as most HBase row reads turn
into an HDFS read.
Most of the DFS code in this change was blatently copied from the
local filesystem code. Most parts should be refactored to be shared
between the two storage systems, but right now I am hesistent to do
this due to how well tuned the local filesystem code currently is.
Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb
This constant determine the default start-point, if the user
don't want to create a branch from the current HEAD.
Change-Id: Iea944e11e80134fbafc4c47383457d5ed11a4164
Signed-off-by: Manuel Doninger <manuel.doninger@googlemail.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Change-Id: I22fc46dff6cc5dfd975f6e82161d265781778cde
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Respect core.excludesfile to enable global ignore rules
Also use FS.resolve() to properly resolve files from path strings.
Bug: 328428 (partial fix)
Change-Id: I41d94694f220dcb85605c9acadfffb1fa23beaeb
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
ConfigConstants: expose some constants for user name and email.
This is needed by a EGit change
http://egit.eclipse.org/r/#change,2232
Change-Id: I3d62f904b769fc2f1b7b8f0f24f7dd757fc9c379
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
The diff algorithm which is used by Merge, Cherry-Pick, Rebase
should be configurable. A new configuration parameter "diff.algorithm"
is introduced which currently accepts the values "myers" or
"histogram". Based on this parameter for example the ResolveMerger
will choose a diff algorithm. The reason for this is bug 331078.
This bug shows that JGit is more compatible with C Git when
histogram diff is in place. But since histogram diff is quite new we
need an easy way to fall back to Myers diff.
Bug: 331078
Change-Id: I2549c992e478d991c61c9508ad826d1a9e539ae3
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
The need for branching becomes more pressing with pull
support: we need to make sure the upstream configuration entries
are written correctly when creating and renaming branches
(and of course are cleaned up when deleting them).
This adds support for listing, adding, deleting and renaming
branches including the more common options.
Bug: 326938
Change-Id: I00bcc19476e835d6fd78fd188acde64946c1505c
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
This is the minimal implementation of a "Pull" command. It does not
have any parameters besides the generic progress monitor and timeout.
It works on the currently checked-out branch and assumes that the
configuration contains the keys "branch.<branch name>.remote" and
"branch.<branch name>.merge" to determine the remote configuration
for the fetch and the remote branch name for the merge.
Bug: 303404
Change-Id: I7fe09029996d0cfc09a7d8f097b5d6af1488fa93
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
"Bare" Repository should not return working directory.
If a repository is "bare", it currently still returns a working directory.
This conflicts with the specification of "bare"-ness.
Bug: 311902
Change-Id: Ib54b31ddc80b9032e6e7bf013948bb83e12cfd88
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>