Jens Baumgart [Wed, 26 Oct 2011 12:00:48 +0000 (14:00 +0200)]
Add detection of untracked folders to IndexDiffFilter
Decorators need to know whether folders in the working tree contain only
untracked files. This change enhances IndexDiffFilter to report such
folders. This works only together with treewalks which operate in
default traversal mode. For treewalks which process entries in
postorder mode (files are walked before their parent folder is walked)
this detection doesn't work.
Bug: 359264
Change-Id: I9298d1e3ccac0aec8bbd4e8ac867bc06a5c89c9f
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Kevin Sawicki [Tue, 8 Nov 2011 20:49:16 +0000 (12:49 -0800)]
Support a configured credentials provider in LsRemoteCommand
Refactored the three common transport configuration options:
credentials provider, timeout, and transport config callback
into a new TransportCommand base class which is now extended
by all commands that use a Transport object during execution.
Bug: 349188
Change-Id: I90c2c14fb4e3cc4712905158f9047153a0c235c2
Signed-off-by: Kevin Sawicki <kevin@github.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
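The shape of such a shared base class can be sketched as follows. This is only an illustration: BaseCommand, LsRemote and the String credentials field are hypothetical stand-ins, not the actual TransportCommand API, and the transport config callback is left out for brevity.

```java
final class TransportOptionsSketch {
    /** Hypothetical base class; C is the concrete command type returned by the setters. */
    abstract static class BaseCommand<C extends BaseCommand<C>> {
        protected String credentials; // stand-in for a credentials provider
        protected int timeoutSeconds;

        /** Lets the shared setters return the concrete subclass for chaining. */
        protected abstract C self();

        C setCredentials(String credentials) {
            this.credentials = credentials;
            return self();
        }

        C setTimeout(int seconds) {
            this.timeoutSeconds = seconds;
            return self();
        }
    }

    /** A command that inherits the common options and adds its own. */
    static final class LsRemote extends BaseCommand<LsRemote> {
        private String remote = "origin";

        @Override
        protected LsRemote self() {
            return this;
        }

        LsRemote setRemote(String remote) {
            this.remote = remote;
            return this;
        }

        String describe() {
            return remote + " timeout=" + timeoutSeconds
                    + " credentials=" + (credentials != null);
        }
    }
}
```

Because the setters return the concrete type, a call chain can mix shared and command-specific options in any order.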
Robin Rosenberg [Sun, 16 Oct 2011 05:01:21 +0000 (07:01 +0200)]
Kill GitIndex
A few places were still using GitIndex. Replacing it was fairly
simple, but there is a difference in test outcome in
ReadTreeTest.testUntrackedConflicts. I believe the new behavior
is good, since we update neither the index nor the worktree.
Change-Id: I4be5357b7b3139dded17f77e07a140addb213ea7
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Robin Rosenberg [Wed, 19 Oct 2011 21:14:41 +0000 (23:14 +0200)]
Deprecate GitIndex more by using only DirCache internally.
This includes merging ReadTreeTest into DirCacheCheckoutTest and
converting IndexDiffTest to use DirCache only. The GitIndex specific
T0007GitIndex test remains.
GitIndex is deprecated. Let us speed up its demise by focusing the
DirCacheCheckout tests on using DirCache instead.
This also adds explicit deprecation comments to methods that depend
on GitIndex in Repository and TreeEntry. The latter is deprecated in
itself.
Change-Id: Id89262f7fbfee07871f444378f196ded444f2783
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Shawn O. Pearce [Mon, 7 Nov 2011 20:06:56 +0000 (12:06 -0800)]
Always use try/finally around DfsBlockCache.clockLock
Any RuntimeException or Error in this block will leave the lock
held by the caller thread, which can later result in deadlock or
just cache requests hanging forever because they cannot get to
the lock object.
Wrap everything in try/finally to prevent the lock from hanging,
even though a RuntimeException or Error should never happen in
any of these code paths.
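The pattern looks roughly like this. It is a minimal sketch using java.util.concurrent.locks.ReentrantLock; the class and field names are hypothetical stand-ins for the cache internals, not the real DfsBlockCache code.

```java
import java.util.concurrent.locks.ReentrantLock;

final class ClockLockSketch {
    // Hypothetical stand-in for DfsBlockCache.clockLock.
    private final ReentrantLock clockLock = new ReentrantLock();
    private int clockHand;

    /** Advances the hand; the lock is released even if the body throws. */
    int advance(boolean fail) {
        clockLock.lock();
        try {
            if (fail) {
                // Simulates an unexpected RuntimeException inside the critical section.
                throw new IllegalStateException("simulated failure");
            }
            return ++clockHand;
        } finally {
            clockLock.unlock();
        }
    }

    /** True while any thread holds the lock. */
    boolean isLocked() {
        return clockLock.isLocked();
    }
}
```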
Shawn O. Pearce [Tue, 8 Nov 2011 20:22:37 +0000 (12:22 -0800)]
DfsBlockCache: Fix NPE when evicting empty cell
The cache starts with a single empty Ref that has no data, as the
clock list does not support being empty. When this Ref is removed,
the size has to be decremented from the associated DfsPackKey,
which was previously null. Make it always be non-null.
Robin Rosenberg [Sat, 5 Nov 2011 15:52:24 +0000 (16:52 +0100)]
Don't throw away the stack trace when tests fail
Most unexpected exceptions are completely useless, yielding messages
like "null" or "3" or, in the best cases, something reasonable but
still out of context.
Just declare the test as throwing an exception. That will retain
the full stack trace leading to the point of failure without using
a debugger or changing the code.
* changes:
DfsBlockCache: Update hits to not include contains()
Add a listener for changes to a DfsObjDatabase's pack files
Expose the reverse index size in the DfsPackDescription
Add a DfsPackFile method to get the number of cached bytes
Expose the list of pack files in the DfsBlockCache
Add a DFS repository description and reference it in each pack
Clarify the docstring of DfsBlockCache.reconfigure()
DFS: A storage layer for JGit
Dave Borowitz [Fri, 7 Oct 2011 22:31:19 +0000 (15:31 -0700)]
Add a DFS repository description and reference it in each pack
Just as DfsPackDescription describes a pack but does not imply it is
open in memory, a DfsRepositoryDescription describes a repository at a
basic level without it necessarily being open.
Dave Borowitz [Thu, 3 Nov 2011 19:43:03 +0000 (12:43 -0700)]
Clarify the docstring of DfsBlockCache.reconfigure()
The docstring was copied from the local filesystem cache code, which
actually attempted to reconfigure the cache on the fly. The DFS cache is
designed to be "reconfigured" exactly once.
In practice the DHT storage layer has not been performing as well as
large scale server environments want to see from a Git server.
The performance of the DHT schema degrades rapidly as small changes
are pushed into the repository due to the chunk size being less than
1/3 of the pushed pack size. Small chunks cause poor prefetch
performance during reading, and require significantly longer prefetch
lists inside of the chunk meta field to work around the small size.
The DHT code is very complex (>17,000 lines of code) and is very
sensitive to the underlying database round-trip time, as well as the
way objects were written into the pack stream that was chunked and
stored on the database. A poor pack layout (from any version of C Git
prior to Junio reworking it) can cause the DHT code to be unable to
enumerate the objects of the linux-2.6 repository in a completable
time scale.
Performing a clone from a DHT stored repository of 2 million objects
takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row
for each object being cloned. This is very difficult for some DHTs to
scale: even at 5000 rows/second the lookup stage alone takes over 6 minutes
(on local filesystem, this is almost too fast to bother measuring).
Some servers like Apache Cassandra just fall over and cannot complete
the 2 million lookups in rapid fire.
On a ~400 MiB repository, the DHT schema has an extra 25 MiB of
redundant data that gets downloaded to the JGit process, and that is
before you consider the cost of the OBJECT_INDEX table also being
fully loaded, which is at least 223 MiB of data for the linux kernel
repository. In the DHT schema answering a `git clone` of the ~400 MiB
linux kernel needs to load 248 MiB of "index" data from the DHT, in
addition to the ~400 MiB of pack data that gets sent to the client.
This is 193 MiB more data to be accessed than the native filesystem
format, but it needs to come over a much smaller pipe (local Ethernet
typically) than the local SATA disk drive.
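The figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch of the calculation using the numbers stated in this message; the class and method names are made up for illustration.

```java
final class DhtCloneCostSketch {
    /** Whole seconds needed for a number of row lookups at a fixed rate. */
    static long lookupSeconds(long rows, long rowsPerSecond) {
        return rows / rowsPerSecond;
    }

    /** Total "index" data in MiB: redundant schema data plus OBJECT_INDEX rows. */
    static int indexMiB(int redundantMiB, int objectIndexMiB) {
        return redundantMiB + objectIndexMiB;
    }

    public static void main(String[] args) {
        // 2 million lookups at 5000 rows/second.
        long s = lookupSeconds(2_000_000L, 5_000L);
        System.out.println(s + " s = " + (s / 60) + " min " + (s % 60) + " s");
        // 25 MiB redundant data plus 223 MiB OBJECT_INDEX.
        System.out.println(indexMiB(25, 223) + " MiB of index data");
    }
}
```

The lookup stage comes out at 400 seconds (6 min 40 s), and the two index figures add up to the 248 MiB mentioned above.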
I also never got around to writing the "repack" support for the DHT
schema, as it turns out to be fairly complex to safely repack data in
the repository while also trying to minimize the amount of changes
made to the database, due to very common limitations on database
mutation rates.
This new DFS storage layer fixes a lot of those issues by taking the
simple approach of storing relatively standard Git pack and index
files on an abstract filesystem. Packs are accessed by an in-process
buffer cache, similar to the WindowCache used by the local filesystem
storage layer. Unlike the local file IO, there are some assumptions
that the storage system has relatively high latency and no concept of
"file handles". Instead it looks at the file more like HTTP byte range
requests, where a read channel is simply a thunk to trigger a read
request over the network.
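To illustrate the "byte range request" view of a file, here is a minimal sketch. The ReadableRange interface and the in-memory implementation are hypothetical and only stand in for a remote object; they are not the channel types introduced by this change.

```java
import java.nio.ByteBuffer;

final class RangeReadSketch {
    /** Hypothetical channel: every read names an absolute offset, like an HTTP Range request. */
    interface ReadableRange {
        /** Reads up to dst.remaining() bytes at position; returns -1 past the end. */
        int read(long position, ByteBuffer dst);
    }

    /** In-memory stand-in for a remote object; a real one would issue a network request. */
    static ReadableRange inMemory(final byte[] data) {
        return new ReadableRange() {
            @Override
            public int read(long position, ByteBuffer dst) {
                if (position < 0) {
                    throw new IllegalArgumentException("negative position");
                }
                if (position >= data.length) {
                    return -1; // nothing stored at or after this offset
                }
                int n = Math.min(dst.remaining(), data.length - (int) position);
                dst.put(data, (int) position, n);
                return n;
            }
        };
    }

    /** Reads the cache block containing position, starting at a block-aligned offset. */
    static byte[] readBlock(ReadableRange ch, long position, int blockSize) {
        long aligned = (position / blockSize) * blockSize;
        ByteBuffer buf = ByteBuffer.allocate(blockSize);
        int n = ch.read(aligned, buf);
        byte[] out = new byte[Math.max(n, 0)];
        buf.flip();
        buf.get(out);
        return out;
    }
}
```

No handle or cursor survives between calls, so a buffer cache can fetch any block independently of the others.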
The DFS code in this change is still abstract; it does not store on
any particular filesystem, but is fairly well suited to Amazon S3
or Apache Hadoop HDFS. Storing packs directly on HDFS rather than
HBase removes a layer of abstraction, as most HBase row reads turn
into an HDFS read.
Most of the DFS code in this change was blatantly copied from the
local filesystem code. Most parts should be refactored to be shared
between the two storage systems, but right now I am hesitant to do
this due to how well tuned the local filesystem code currently is.
Robin Rosenberg [Fri, 4 Nov 2011 16:53:44 +0000 (17:53 +0100)]
Allow '\' in user names in URI-ish
Actually this is not ok according to the RFC, but this implementation is
meant to be Git compatible. A '\' is needed when the authentication
requires or allows authentication to a Windows domain where the
user name can be specified as DOMAIN\user.
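As a rough illustration of what accepting '\' in the user part means, consider the scp-like pattern below. The regular expression is a simplified, hypothetical one written for this note; it is not the pattern used by the URI-ish parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackslashUserSketch {
    // Hypothetical scp-like pattern: user@host:path.
    // The user character class explicitly includes a literal backslash.
    private static final Pattern SCP_LIKE =
            Pattern.compile("^([\\w.\\\\-]+)@([\\w.-]+):(.+)$");

    /** Returns the user part of an scp-like URI, or null if it does not match. */
    static String userOf(String uri) {
        Matcher m = SCP_LIKE.matcher(uri);
        return m.matches() ? m.group(1) : null;
    }
}
```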
Robin Rosenberg [Fri, 28 Oct 2011 12:58:32 +0000 (14:58 +0200)]
Do not resolve path using cygwin unless told to
The system property jgit.cygpath must be set to true in order
for cygwin's cygpath to be used to translate paths from the cygwin
namespace to the Windows namespace.
The cygwin path translation should be considered deprecated.
Bug: 353389
Change-Id: I2b5234c0ab936dac67d1e232f4cd28331bf3226d
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
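One way to read such an opt-in switch is sketched below. Only the property name jgit.cygpath comes from this message; the class and method are hypothetical and the actual check in the code may differ.

```java
final class CygpathSwitchSketch {
    /** True only when jgit.cygpath is set to "true"; unset or any other value disables it. */
    static boolean useCygpath() {
        return Boolean.parseBoolean(System.getProperty("jgit.cygpath", "false"));
    }
}
```

With this kind of check the translation would be enabled by starting the JVM with -Djgit.cygpath=true.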
Matthias Sohn [Wed, 26 Oct 2011 21:29:23 +0000 (17:29 -0400)]
Merge changes I488e9c97,I30f1049f,I1c088dce
* changes:
Cosmetic adjustment of relative date format, do not display "0 months"
Make use of the many date formatting options in the log command
Define a utility class for handling Git date formats
Carsten Pfeiffer [Tue, 25 Oct 2011 07:22:11 +0000 (09:22 +0200)]
Allow detecting which files were renamed during a revwalk
The egit history view shows the files associated with a commit by using
a PathFilter. When following renames with a FollowFilter, the PathFilter
cannot be configured anymore because the affected files are simply not
known.
Thus, it should be possible to find out which files were renamed.
Robin Rosenberg [Sun, 23 Oct 2011 20:53:17 +0000 (22:53 +0200)]
Fix compatibility breakage for SystemReader
Introducing a new abstract method is not nice when one
expects others to subclass the class. Create default implementations
so old code that implements SystemReader does not break.
The default methods just delegate to the JVM.
Change-Id: I42cdfdcb6b29f7203697a23833dca85185b0b9b3
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
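The compatibility argument can be sketched like this. Reader and LegacyReader are hypothetical stand-ins, and the method names are chosen for illustration only; they are not claimed to be the methods touched by this change.

```java
import java.util.Locale;
import java.util.TimeZone;

final class DefaultMethodSketch {
    /** Hypothetical stand-in for an extensible class such as SystemReader. */
    abstract static class Reader {
        /** Pre-existing abstract method that old subclasses already implement. */
        abstract String getHostname();

        /** Newly added method: concrete, so old subclasses still compile. */
        Locale getLocale() {
            return Locale.getDefault(); // simply delegates to the JVM
        }

        /** Another newly added method with a JVM-backed default. */
        TimeZone getTimeZone() {
            return TimeZone.getDefault();
        }
    }

    /** "Old" subclass written before the new methods existed. */
    static final class LegacyReader extends Reader {
        @Override
        String getHostname() {
            return "localhost";
        }
    }
}
```

Had the new methods been abstract, LegacyReader would no longer compile.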
Robin Rosenberg [Sat, 22 Oct 2011 23:51:30 +0000 (01:51 +0200)]
Define a utility class for handling Git date formats
Besides the formats known by git-log(1) we also add "locale"
and "localelocal" that format dates according to the user's locale.
"locale" does not convert to the local timezone, while
"localelocal" does.
Robin Rosenberg [Mon, 17 Oct 2011 06:28:19 +0000 (08:28 +0200)]
Fix bad checkout behaviour when a file is removed
We deleted the entry if there was a file and an index
entry, but not when there was just an index entry. Now
delete the file in both cases since the missing file
just means our worktree is dirty. This affected the
implementation of reset --hard.
Shawn O. Pearce [Sat, 8 Oct 2011 00:23:58 +0000 (17:23 -0700)]
Refactor HTTP server stack to use Filter as base
All Git URLs operate off a suffix approach, for example the default
binding is for paths such as:
*/info/refs
*/git-upload-pack
*/git-receive-pack
These names are not common on project hosting servers, especially
one like Gerrit Code Review.
In addition to offering Git-over-HTTP as a servlet, offer it as a
filter that triggers when a matching suffix appears, but otherwise
delegates the request through the chain. This filter would permit
Gerrit Code Review to place projects at the root of the server,
rather than within the "/p/" subdirectory, making the HTTP and SSH
URL structure exactly match each other.
To prevent breakage with existing users, the MetaServlet and
GitServlet are kept as wrappers delegating to their filters,
returning 404 Not Found when the filter has no match.
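The suffix matching and pass-through behaviour can be sketched without the servlet API. The class below is a hypothetical illustration that works on plain path strings; it is not the filter added by this change.

```java
import java.util.Arrays;
import java.util.List;

final class SuffixDispatchSketch {
    // The default suffix bindings listed above.
    private static final List<String> GIT_SUFFIXES = Arrays.asList(
            "/info/refs", "/git-upload-pack", "/git-receive-pack");

    /** Returns the repository part of the path, or null when no Git suffix matches. */
    static String repositoryFor(String path) {
        for (String suffix : GIT_SUFFIXES) {
            if (path.endsWith(suffix) && path.length() > suffix.length()) {
                return path.substring(0, path.length() - suffix.length());
            }
        }
        return null;
    }

    /** Filter-style dispatch: handle on match, otherwise pass down the chain. */
    static String dispatch(String path) {
        String repo = repositoryFor(path);
        return repo != null ? "git:" + repo : "chain:" + path;
    }
}
```

A servlet wrapper around the same logic would answer 404 Not Found in the "chain:" case instead of delegating.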
* changes:
UploadPack: Fix races in smart HTTP negotiation
PackWriter: Export more statistics
Do not requeue state vector in stateless RPC fetch
Wrap excessively long line in BasePackFetchConnection
Fix smart HTTP client stream alignment errors
Jens Baumgart [Wed, 5 Oct 2011 11:56:23 +0000 (13:56 +0200)]
Extend IndexDiff to calculate ignored files and folders
IndexDiff was extended to calculate ignored files and folders.
The calculation only considers files that are NOT in the index.
This functionality is required by the new EGit decorator implementation.
Change-Id: I589e758cc55873ce75614602e017ac793435e24d
Signed-off-by: Kevin Sawicki <kevin@github.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Manuel Doninger [Thu, 8 Sep 2011 17:37:11 +0000 (19:37 +0200)]
New config constant for default start-point
This constant determines the default start-point if the user
does not want to create a branch from the current HEAD.
Change-Id: Iea944e11e80134fbafc4c47383457d5ed11a4164
Signed-off-by: Manuel Doninger <manuel.doninger@googlemail.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Matthias Sohn [Thu, 29 Sep 2011 22:00:22 +0000 (00:00 +0200)]
Fire IndexChangedEvent on DirCache.commit()
Since we replaced GitIndex with DirCache, JGit no longer fired
IndexChangedEvents. For EGit this still worked, with high
latency, since its RepositoryChangeScanner, which is scheduled to
run every 10 seconds, fires the event in case the index changes.
This scanner is meant to detect index changes induced by a different
process e.g. by calling "git add" from native git.
When the index is changed from within the same process we should fire
the event synchronously. Compare the index checksum on write with the
checksum from when the index was read earlier to determine whether the
index really changed. Use the IndexChangedListener interface to keep DirCache decoupled
from Repository.
Change-Id: Id4311f7a7859ffe8738863b3d86c83c8b5f513af
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
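The checksum comparison can be sketched as follows. This is a simplified illustration: the class, the ChangeListener interface and hashing the whole content with SHA-1 are assumptions made for the example, not the DirCache implementation.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IndexChangeSketch {
    /** Hypothetical listener; keeps this class independent of any repository type. */
    interface ChangeListener {
        void onIndexChanged();
    }

    private final List<ChangeListener> listeners = new ArrayList<>();
    private byte[] readChecksum;

    void addListener(ChangeListener l) {
        listeners.add(l);
    }

    /** Remembers the checksum of the content as it was read. */
    void read(byte[] content) {
        readChecksum = sha1(content);
    }

    /** Writes new content and notifies listeners only if the checksum differs. */
    void commit(byte[] content) {
        byte[] writeChecksum = sha1(content);
        if (!Arrays.equals(readChecksum, writeChecksum)) {
            readChecksum = writeChecksum;
            for (ChangeListener l : listeners) {
                l.onIndexChanged();
            }
        }
    }

    private static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```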
Fix status in index entries after checkout of paths
The checkout command was producing an inconsistent state of the index
which even confuses native git. The content sha1 of the touched index
entries was updated, but the length and the filemode were not updated.
Later in the code the index entries got automatically corrected (through
Dircache.checkoutEntry()) but the correction came after persisting the
index to disk. So, the correction was lost and we ended up with an index
where length and sha1 don't fit together.
A similar problem is fixed with "lastModified" of DircacheEntry. When
checking out a path without specifying an explicit commit (you want to
checkout what's in the index) the index was not updated regarding
lastModified. Readers of the index will think the checked-out
file is dirty because the file has a newer lastModified than what's
in the index.
Robin Rosenberg [Thu, 8 Sep 2011 17:42:19 +0000 (19:42 +0200)]
Test the reflog message for commit, cherry-pick, revert and merge
Change-Id: I319f09577b3e04f6c31399fe8e57e9a9ad2c8a6c
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Robin Rosenberg [Thu, 8 Sep 2011 16:35:17 +0000 (18:35 +0200)]
Append merge strategy to reflog message
Change-Id: Ia0e73208b86c45a3d96698e973f6e70ec5cb7303
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Robin Rosenberg [Thu, 8 Sep 2011 16:05:01 +0000 (18:05 +0200)]
Fix the reflog prefix for cherry-pick, revert and merge commands
We should see whether the commit was a regular commit or something
else.
Change-Id: I82d8300cf3c53cb2bdcb6495386aadb803e0c6f7
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Enable full Transport configuration for JGit API commands
Add a TransportConfigCallback parameter to JGit API commands, to allow
consumers of the JGit command API to perform custom Transport configuration
that would be otherwise difficult to anticipate & expose on the API command
builders.
My specific use-case is configuring additional properties on SshTransport
- I need to take over the SshSessionFactory used by the transport. Using
TransportConfigCallback I can simply do this (rather than reimplement the
API command classes):
    public void configure(Transport tn) {
        if (tn instanceof SshTransport) {
            ((SshTransport) tn).setSshSessionFactory(factoryProvider.get());
        }
    }
Adding an explicit setSshSessionFactory() method to the JGit command
classes would bloat the API. Also, creating the replacement
SshSessionFactory is unnecessary if the transport is not SSH, but the type
of the Transport is only known once the remote has been resolved and the
URI parsed - consequently it makes sense to perform this step in a
callback, where the transport instance can be inspected to determine if
it's of a relevant type.
A note about where this leaves the API - there are now 4 commands:
I think there's potential for introducing an interface or val-object to
identify/encapsulate this repetition, which I'd be happy to do in a
subsequent commit.
Shawn O. Pearce [Sat, 27 Aug 2011 00:28:18 +0000 (17:28 -0700)]
UploadPack: Fix races in smart HTTP negotiation
Clients cache the set of advertised references at the start of a
negotiation, and keep replaying the same "want SHA1" list to the
server on each negotiation step. If another client pushes into
a branch and moves it by fast-forward, any request to obtain that
branch's prior SHA-1 is still valid, as the commit is reachable from
the new position of the reference. Unfortunately the fast-forward
causes smart HTTP negotiations to fail, as the server no longer is
advertising that prior SHA-1.
Instead of causing clients to fail out with a "want invalid" error
and forcing the end-user to retry, possibly getting into a never-ending
try-fail-retry race while other clients are pushing into the same
busy repository, allow the slightly stale want request so long as
it is still reachable.
C Git implemented this same change recently to fix races on the
smart HTTP protocol when the C Git git-http-backend is used.
The new RequestPolicy feature also allows server authors to make
an even more lenient configuration that exports any SHA-1 to the
client. This might be useful in certain settings where a server
has authenticated the client as the "repository owner" and wants
to allow them to grab any content from the server as a complete
unbroken history chain.
The new setAdvertisedRefs() method allows server authors to manually
fix the references that are advertised, possibly bypassing the
getAllRefs() call on the Repository object.
Change-Id: I7cdb563bf9c55c83653f217f6e53c3add55a0541
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
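The "still reachable" test for a stale want amounts to a walk from the currently advertised tips. The sketch below uses plain strings for object ids and a hypothetical parent map; it illustrates the idea and is not the UploadPack code.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReachableWantSketch {
    /**
     * Returns true if want can be reached from any advertised tip.
     * parents maps a commit id to the ids of its parents.
     */
    static boolean isReachable(String want, Collection<String> tips,
            Map<String, List<String>> parents) {
        Deque<String> todo = new ArrayDeque<>(tips);
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String commit = todo.pop();
            if (!seen.add(commit)) {
                continue; // already visited through another path
            }
            if (commit.equals(want)) {
                return true;
            }
            List<String> ps = parents.get(commit);
            if (ps != null) {
                todo.addAll(ps);
            }
        }
        return false;
    }
}
```

If a branch was fast-forwarded from B to C, a client that still asks for B passes this check because B is a parent of C, while an unknown id is rejected.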
Shawn O. Pearce [Mon, 15 Aug 2011 23:38:28 +0000 (16:38 -0700)]
PackWriter: Export more statistics
Export the shallow pack information, and also a handy function to
sum up the total times. Include the time writing out the index file,
if it was created.
Change-Id: I7f60ae6848455a357b25feedb23743bbf6c153cf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Do not requeue state vector in stateless RPC fetch
If the no-done capability was enabled on the connection, don't
queue up the state vector again once the ACK %s ready message
is observed from the remote. The pack will be following in this
response stream, so the state vector is no longer required.
Change-Id: I7bd1e76957cb58c7ff1cdaeef227f1b02a7e5d24
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The client's use of UnionInputStream was broken when combined with an
8192 byte buffer used by PackParser. A smart HTTP client connection
always pushes in the execute stateless RPC input stream after the
data stream has ended from the remote peer. At the end of the pack,
PackParser asked to fill an 8192 byte buffer, but if only e.g. 1000
bytes remained, UnionInputStream went to the next stream and asked
it for input, which triggered a new RPC, and failed because there
was nothing pending in the request buffer.
Change UnionInputStream to only return what it consumed from a
single InputStream without invoking the next InputStream, just in
case that second InputStream happens to be one of these magical
ones that generates an RPC invocation.
Change-Id: I0e51a8e6fea1647e4d2e08ac9cfc69c2945ce4cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
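The changed read behaviour can be sketched with a small union stream. The class below is a hypothetical simplification written for this note, not the UnionInputStream source: a read returns whatever the current stream delivered and only moves on once that stream reports end of input.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class SingleSourceUnionSketch extends InputStream {
    private final Deque<InputStream> streams = new ArrayDeque<>();

    SingleSourceUnionSketch(InputStream... in) {
        streams.addAll(Arrays.asList(in));
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n;
        do {
            n = read(one, 0, 1);
        } while (n == 0);
        return n < 0 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (!streams.isEmpty()) {
            int n = streams.peek().read(b, off, len);
            if (n >= 0) {
                // Short read: return it as-is and do NOT touch the next stream.
                return n;
            }
            // Current stream is exhausted; only now advance to the next one.
            streams.pop().close();
        }
        return -1;
    }
}
```

With 1000 bytes left in the first stream and an 8192 byte buffer, the first call returns 1000 and the second stream is not asked for input until the next call.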
Kevin Sawicki [Tue, 13 Sep 2011 22:29:55 +0000 (15:29 -0700)]
Remove duplicate calls to DirCache.unlock on checkout
Calls to unlock the DirCache before throwing an exception
were not needed since checkout calls doCheckout wrapped
in a try block that calls DirCache.unlock in a finally
block.
Change-Id: I2b249a784f9e363430e288aad67fcefb7fac0a6e
Signed-off-by: Kevin Sawicki <kevin@github.com>
Matthias Sohn [Sun, 11 Sep 2011 20:43:41 +0000 (22:43 +0200)]
Merge branch 'stable-1.1'
* stable-1.1:
Allow commit when submodule changes are present
Ignore submodule on checkout instead of deleting it
cleanup: Reuse local variable for current DirCacheEntry
Prepare post v1.1.0.201109071825-rc3 builds
JGit v1.1.0.201109071825-rc3
Use commit message best practices for Mylyn Commit template
Change-Id: I6ab9e5cb48c036d2ee2e548f5ec040d93672d8ad
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Robin Rosenberg [Sat, 3 Sep 2011 20:54:37 +0000 (22:54 +0200)]
Ignore submodule on checkout instead of deleting it
The purpose of this commit is to prevent destruction of
submodules on checkout from a tree with a submodule to
another. For consistency we handle the reverse case too,
when we checkout a branch that has a submodule and the
submodule directory exists. And finally we ignore the
case where the submodule changes.
We do not update the submodules, we just try to ignore
them harder.
Bug: 356664
Change-Id: I202c695a57af99b13d0d7220803fd08def3d9b5e
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Robin Rosenberg [Sun, 4 Sep 2011 09:12:49 +0000 (11:12 +0200)]
Fix the names in the reflog for checkouts
We were diverging from the reference implementation. Always use the
ref we checkout to as the to-branch in the reflog and avoid the
refs/heads prefix both in the from-name and to-name.
Change-Id: Id973d9102593872e4df41d0788f0eb7c7fd130c4
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Robin Rosenberg [Sun, 4 Sep 2011 09:10:47 +0000 (11:10 +0200)]
Add a helper for parsing branch switch info out of a reflog entry
Change-Id: I91c7e08c4afd2562df2226887a933d93c78a0371
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
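The two reflog changes above, writing short branch names on checkout and parsing them back, can be sketched together. The "checkout: moving from X to Y" wording is the format native git writes; the class and method names below are hypothetical and not the helper added here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CheckoutReflogSketch {
    private static final String HEADS = "refs/heads/";
    private static final Pattern MOVING =
            Pattern.compile("^checkout: moving from (\\S+) to (\\S+)$");

    /** Drops a leading refs/heads/ so only the short branch name is logged. */
    static String shorten(String ref) {
        return ref.startsWith(HEADS) ? ref.substring(HEADS.length()) : ref;
    }

    /** Builds the reflog comment for a branch switch. */
    static String message(String fromRef, String toRef) {
        return "checkout: moving from " + shorten(fromRef) + " to " + shorten(toRef);
    }

    /** Returns {from, to}, or null if the comment is not a branch switch. */
    static String[] parse(String comment) {
        Matcher m = MOVING.matcher(comment);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }
}
```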