Shawn O. Pearce [Fri, 24 Jun 2011 19:55:19 +0000 (12:55 -0700)]
Improve performance when writing trees and small blobs
ObjectDirectoryInserter was always creating a temporary file,
writing the complete compressed contents of a tree, fsync()'ing
that to stable storage, and only then checking to see if there
was already an object with the same SHA-1 in the repository.
For commits this strategy makes some sense: the commit is very
unlikely to exist in the repository, as there are embedded times
and these change with each commit.
However for trees coming out of DirCache, it is more common for the
tree to already exist in the repository. Most subdirectories are
not modified in any given commit. Doing all of this local file IO
for things that already exist is very slow.
Try to detect cases where the object is "small enough" that it can
be processed entirely in memory, and avoid doing disk IO entirely
if the object already exists.
Also increase the size of the output buffer for the deflation.
This should boost the average write(2) syscall size from 512 bytes
to 8192 bytes, making streaming of large compressed contents to
disk slightly more efficient.
Change-Id: I1d40364e8725468522435814631916d73174c92b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
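The "small enough" check described above can be sketched roughly as follows. This is a simplified illustration, not the actual ObjectDirectoryInserter code: the class name, the IN_MEMORY_LIMIT value, the in-memory set standing in for the object database, and the diskWrites counter are all assumptions made for the example. Only the object id computation (SHA-1 over the "type length NUL" header plus content) follows the standard Git object format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; names and threshold are illustrative only. */
class SmallObjectInserter {
    // Assumed cut-off for "small enough"; the commit message gives no value.
    static final int IN_MEMORY_LIMIT = 8192;

    // Stands in for the repository's object database.
    private final Set<String> existing = new HashSet<>();

    // Counts simulated temp-file writes (deflate + fsync + rename in real code).
    int diskWrites = 0;

    /** Git object id: SHA-1 over "<type> <length>\0" followed by the content. */
    static String objectId(String type, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update((type + " " + data.length + "\0")
                    .getBytes(StandardCharsets.US_ASCII));
            md.update(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    String insert(String type, byte[] data) {
        if (data.length <= IN_MEMORY_LIMIT) {
            // Small object: hash in memory first and skip all disk IO
            // when the object is already present.
            String id = objectId(type, data);
            if (existing.contains(id)) {
                return id;
            }
            diskWrites++;
            existing.add(id);
            return id;
        }
        // Large object: simulate the older path, where the temp file is
        // written before the existence check can happen.
        diskWrites++;
        String id = objectId(type, data);
        existing.add(id);
        return id;
    }
}
```

Inserting the same small tree twice should bump diskWrites only once in this sketch, which is the saving the commit is after for unmodified subdirectories.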
Shawn O. Pearce [Fri, 27 May 2011 20:35:18 +0000 (13:35 -0700)]
Push errors back over sideband when possible
If an internal exception occurs while packing and the request
needs to abort, the HTTP response might already be committed due
to progress messages having already been delivered to the client.
This prevents UploadPackServlet from resetting the response and
sending back an HTTP 500 response.
Try to catch all exceptions and report internal errors over the
sideband stream or as an ERR command during the initial ACK/NAK
negotiation phase. This allows JGit to transmit an error message
that the user will receive on their console without needing to
worry about resetting the (already gone) HTTP response.
Change-Id: Ie393fb8bb55d2b79ab1276adf71c781c1807f9fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
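For readers unfamiliar with the wire format, the two error paths mentioned above might look roughly like this. It is a minimal sketch of pkt-line framing (four hex digits of length, counting the header itself) with an ERR packet and a sideband channel 3 packet; the class and method names are hypothetical and this is not the UploadPack code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; illustrates the framing only. */
class PktLineSketch {
    // Sideband channel 3 carries fatal error messages.
    static final int CH_ERROR = 3;

    /** Prefix the payload with its 4-hex-digit length (header included). */
    static byte[] frame(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] header = String.format("%04x", payload.length + 4)
                .getBytes(StandardCharsets.US_ASCII);
        out.write(header, 0, header.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    /** Error packet usable before sideband is active (negotiation phase). */
    static byte[] errPacket(String message) {
        return frame(("ERR " + message).getBytes(StandardCharsets.UTF_8));
    }

    /** Error packet for the sideband stream: first payload byte is the channel. */
    static byte[] sidebandError(String message) {
        byte[] text = message.getBytes(StandardCharsets.UTF_8);
        byte[] payload = new byte[text.length + 1];
        payload[0] = (byte) CH_ERROR;
        System.arraycopy(text, 0, payload, 1, text.length);
        return frame(payload);
    }
}
```

Either packet can be written to a response body that is already committed, which is why it works where an HTTP 500 cannot.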
Shawn O. Pearce [Fri, 27 May 2011 17:46:02 +0000 (10:46 -0700)]
Report progress while updating references
If a fetch or push needs to apply more than a few references
to the local repository it may take more than 0.25 seconds to
process all of the updates. This is especially true in the DHT
storage system during an initial push of a project with many tags.
The backend database may need to use a transaction to ensure each
tag reference creation is unique, and there may be large delays
caused by these transactions.
Change-Id: Ib11a077adfbd525253e425d327f2e2c2380804c7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Matthias Sohn [Thu, 9 Jun 2011 15:41:16 +0000 (17:41 +0200)]
Merge branch 'stable-1.0'
* stable-1.0:
Prepare post JGit v1.0.0.201106090707-r builds
JGit v1.0.0.201106090707-r
Include about.html files in maven build
Prepare post v1.0.0.201106081625-r builds
JGit v1.0.0.201106081625-r
Add missing about.html files to all shipped bundles
Prepare post v1.0.0.201106071701-r builds
JGit v1.0.0.201106071701-r
Matthias Sohn [Wed, 1 Jun 2011 23:45:50 +0000 (01:45 +0200)]
Merge branch 'stable-1.0'
* stable-1.0:
Prepare post v1.0.0.201106011211-rc3 builds
JGit v1.0.0.201106011211-rc3
Remove incubation marker
blame: Compute the origin of lines in a result file
Shawn O. Pearce [Fri, 27 May 2011 22:27:17 +0000 (15:27 -0700)]
blame: Compute the origin of lines in a result file
BlameGenerator digs through history and discovers the origin of each
line of some result file. BlameResult consumes the stream of regions
created by the generator and lays them out in a table for applications
to display alongside of source lines.
Applications may optionally push in the working tree copy of a file
using the push(String, byte[]) method, allowing the application to
receive accurate line annotations for the working tree version. Lines
that are uncommitted (difference between HEAD and working tree) will
show up with the description given by the application as the author,
or "Not Committed Yet" as a default string.
Applications may also run the BlameGenerator in reverse mode using the
reverse(AnyObjectId, AnyObjectId) method instead of push(). When
running in the reverse mode the generator annotates lines by the
commit they are removed in, rather than the commit they were added in.
This allows a user to discover where a line disappeared from when they
are looking at an older revision in the repository. For example:
  blame --reverse 16e810b2..master -L 1080, org.eclipse.jgit.test/tst/org/eclipse/jgit/storage/file/RefDirectoryTest.java

           (                                              1080)   }
  2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1081)
  2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1082)   /**
  2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1083)    * Kick the timestamp of a local file.

Above we learn that line 1080 (a closing curly brace of the prior
method) still exists in branch master, but the Javadoc comment below
it has been removed by Christian Halstrick on May 20th as part of
commit 2302a6d3. This result differs considerably from that of C
Git's blame --reverse feature. JGit tells the reader which commit
performed the delete, while C Git tells the reader the last commit
that still contained the line, leaving it an exercise to the reader
to discover the descendant that performed the removal.
This is still only a basic implementation. Quite notably it is
missing support for the smart block copy/move detection that the C
implementation of `git blame` is well known for. Despite being
incremental, the BlameGenerator can only be run once. After the
generator runs it cannot be reused. A better implementation would
support applications browsing through history efficiently.
With regard to CQ 5110, only a little of the original code survives.
CQ: 5110
Bug: 306161
Change-Id: I84b8ea4838bb7d25f4fcdd540547884704661b8f Signed-off-by: Kevin Sawicki <kevin@github.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Shawn O. Pearce [Tue, 31 May 2011 16:14:50 +0000 (09:14 -0700)]
Merge branch 'stable-1.0'
* stable-1.0:
DHT: Support removing a repository name
DHT: Fix thread-safety issue in AbstractWriteBuffer
jgit.sh: Implement pager support
Change EditList to extend ArrayList
Ensure the HTTP request is fully consumed
Make sure test repositories are closed
Fix CloneCommand not to fetch into remote tracking branches when bare
Update Eclipse IP log for 1.0
Shawn O. Pearce [Fri, 27 May 2011 00:25:59 +0000 (17:25 -0700)]
DHT: Support removing a repository name
The first step to deleting a repository from the DHT storage is to
remove the name binding in the RepositoryIndexTable, making the
repository unavailable for lookup.
Change-Id: I469bf92f4bf2f555a15949569b21937c14cb142b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Fri, 27 May 2011 00:19:30 +0000 (17:19 -0700)]
DHT: Fix thread-safety issue in AbstractWriteBuffer
There is a data corruption issue with the 'running' list if a
background thread schedules something onto the buffer while the
application thread is also using it.
Change-Id: I5ba78b98b6632965d677a9c8f209f0cf8320cc3d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
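One common way to avoid this kind of corruption is to guard the shared list with a lock, as in the sketch below. This is only an illustration of the general technique; the class name and methods are hypothetical, and the commit message does not say how AbstractWriteBuffer was actually fixed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a 'running' list shared between threads. */
class RunningList<T> {
    private final Object lock = new Object();
    private List<T> running = new ArrayList<>();

    /** May be called from a background thread or the application thread. */
    void add(T item) {
        synchronized (lock) {
            running.add(item);
        }
    }

    /** Hand back everything queued so far and start a fresh list. */
    List<T> drain() {
        synchronized (lock) {
            List<T> out = running;
            running = new ArrayList<>();
            return out;
        }
    }
}
```

Swapping in a new list inside drain() means the caller can iterate the returned list without holding the lock.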
Shawn O. Pearce [Sun, 29 May 2011 20:05:03 +0000 (13:05 -0700)]
jgit.sh: Implement pager support
If the command is either `diff` or `log`, there are often many
lines of output. Run these commands through $GIT_PAGER, $PAGER, or
`less` in order to make it easier to browse the output on a terminal.
Change-Id: I18b87ea4acf404b94788f2ac2101812bd13e6a0f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
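The fallback order above can be expressed as a small lookup. The sketch below is written in Java for illustration only (jgit.sh itself is a shell script), the class name is hypothetical, and treating an empty variable as unset is an assumption.

```java
import java.util.Map;

/** Hypothetical helper mirroring the $GIT_PAGER, $PAGER, less order. */
class PagerChooser {
    static String choose(Map<String, String> env) {
        for (String name : new String[] {"GIT_PAGER", "PAGER"}) {
            String value = env.get(name);
            // Assumption: an empty value counts as "not set".
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "less";
    }
}
```

Calling PagerChooser.choose(System.getenv()) would pick the pager for the current process environment.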
Shawn O. Pearce [Sun, 29 May 2011 19:24:27 +0000 (12:24 -0700)]
Change EditList to extend ArrayList
There is no reason for this type to contain an ArrayList and try to
hide the implementation. It only slows down execution by adding an
extra layer of method dispatch to each invocation.
Instead subclass from ArrayList.
Change-Id: Ifbb9c7060c2fe3d5a7397c1aa85fbade14088637 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Fri, 27 May 2011 19:04:19 +0000 (12:04 -0700)]
Ensure the HTTP request is fully consumed
Some servlet containers require the servlet to read the EOF marker
from the input stream before a response can be output if the stream
is using "Transfer-Encoding: chunked"... which is typical for any
sort of large push to a repository over smart HTTP.
Ensure the EOF is always read by the PackParser when it is handling
the stream, and fail fast if there is more data present than expected
since this does indicate a protocol error.
Also ensure the EOF is read by UploadPack before it starts to output
a partial response using packing progress meters.
Change-Id: I131db9dea20b2324cb7c3272a814f21296bc64bd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
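The "read the EOF, fail fast on extra data" rule can be illustrated with a few lines. This is a sketch of the idea only; the class and method names are hypothetical and it is not the PackParser or UploadPack code.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; call once the expected data has been consumed. */
class RequestDrain {
    /**
     * Reads one more byte. Seeing EOF lets the container finish the
     * chunked request; seeing anything else is treated as a protocol error.
     */
    static void expectEof(InputStream in) throws IOException {
        if (in.read() != -1) {
            throw new IOException("unexpected data after end of stream");
        }
    }
}
```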
Make sure test repositories are closed
Some repositories created during tests are not added to the 'toClose'
list in LocalDiskRepositoryTestCase. Therefore when the tests end
we may have open FileHandles and on Windows this may cause the
tests to fail because we can't delete those files.
This is fixed by adding the possibility to explicitly add
repositories to the list of repos which are closed automatically.
Change-Id: I1261baeef4c7d9aaedd7c34b546393bfa005bbcc Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Fix CloneCommand not to fetch into remote tracking branches when bare
When cloning into a bare repository we should not create remote
tracking branches (e.g. refs/remotes/origin/testX). Branches of the
remote repository should be fetched into branches of the same
name (e.g. refs/heads/testX). Also add the noCheckout option which
would prevent checkout after fetch.
Change-Id: I5d4cc0389f3f30c53aa0065f38119af2a1430909 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
RefDirectory was not using FileSnapshot correctly in all places. This
is fixed with this commit. Additionally the constructors for the
different types of refs have been changed to take a FileSnapshot
instead of a modification time.
Change-Id: Ifb6a59e87e8b058a398c38cdfb9d648f0bad4bf8 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Shawn O. Pearce [Fri, 13 May 2011 16:16:58 +0000 (09:16 -0700)]
DHT: Add sequence RefData
RefData now uses a sequence number as part of the field, ensuring
that updates always increase the sequence number by one whenever
a reference is modified.
Attaching a sequence number to RefData will help with storing
reference log entries during updates. As the sequence number should
be unique within the reference name space, log entries can be keyed
by the sequence number and remain unique. Making this work over
reference delete-create cycles will require an additional RefTable
API to return the oldest sequence number previously used in the
reference log to seed the recreated reference.
Change-Id: I11cfff2a96ef962e57f29925a3eef41bdbf9f9bb Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Shawn O. Pearce [Fri, 13 May 2011 14:44:42 +0000 (07:44 -0700)]
DHT: Replace TinyProtobuf with Google Protocol Buffers
The standard Google distribution of Protocol Buffers in Java is better
maintained than TinyProtobuf, and should be faster for most uses. It
does use slightly more memory due to many of our key types being
stored as strings in protobuf messages, but this is probably worth the
small hit to memory in exchange for better maintained code that is
easier to reuse in other applications.
Exposing all of our data members to the underlying implementation
makes it easier to develop reporting and data mining tools, or to
expand out a nested structure like RefData into a flat format in a SQL
database table.
Since the C++ `protoc` tool is necessary to convert the protobuf
script into Java code, the generated files are committed as part of
the source repository to make it easier for developers who do not have
this tool installed to still build the overall JGit package and make
use of it. Reviewers will need to be careful to ensure that any edits
made to a *.proto file come in a commit that also updates the
generated code to match.
CQ: 5135
Change-Id: I53e11e82c186b9cf0d7b368e0276519e6a0b2893 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Shawn O. Pearce [Thu, 5 May 2011 18:18:54 +0000 (11:18 -0700)]
DHT: Remove per-process ChunkCache
Performance testing has indicated the per-process ChunkCache isn't
very effective for the DHT storage implementation. If a server is
using the DHT storage backend, it is most likely part of a larger
cluster where requests are distributed in a round-robin fashion
between the member servers.
In such a scenario there is insufficient data locality between
requests to get a good hit ratio on the per-process ChunkCache. A low
hit ratio means the cache is actually hurting performance by eating up
memory that could otherwise be used for transient request data, and
increasing pressure on the GC when it needs to find free space.
Remove all of the ChunkCache code. Installations that want to cache
(to reduce database usage) should wrap their Database with a
CacheDatabase and use a network based CacheServer.
I left the ChunkCache in the original DHT storage commit because I
wanted to document in the history of the project that it's probably
worth *not* having, but leave open a door for someone to revert this
change if they find otherwise at a later date.
Change-Id: I364d0725c46c5a19f7443642a40c89ba4d3fdd29 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Stefan Lay [Tue, 24 May 2011 08:38:59 +0000 (01:38 -0700)]
Add a DiffFormatter which calculates a patch-id
Adds a class which can be used to calculate a SHA-1 of the diff
associated with a patch, similar to git patch-id.
In this version whitespace is not ignored.
Change-Id: I421d15ea905e23df543082786786841cbe3ef10d Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Let RefDirectory use FileSnapshot to handle fast updates
Since this change may affect performance and memory consumption on every
access to a loose ref I explicitly made it an RFC to collect opinions.
Previously RefDirectory.scanRef() was not detecting an update of a
loose ref when the update didn't change the modification time of
the backing file. RefDirectory cached loose refs and the way to detect
outdated cache entries was to compare the last-modification timestamp of
the file representing the ref. If two updates to the same ref happen faster
than the filesystem timer granularity (for Linux this is 2 seconds)
there is the possibility that we don't detect the update.
Because of this bug EGit's PushOperationTest only works with 2 second
sleeps inside.
This change lets RefDirectory use FileSnapshot to detect such situations.
FileSnapshot remembers when a file was last read from disk and
therefore makes it possible to decide when to reload a file from disk even
though its modification time has not changed.
Change-Id: I03b9a137af097ec69c4c5e2eaa512d2bdd7fe080 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
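The idea of remembering both the modification time and the read time can be sketched as below. This is a simplified illustration rather than the real FileSnapshot class: the class name, the fields, and the GRANULARITY_MS constant (taken from the 2 second figure quoted above) are assumptions for the example.

```java
/** Hypothetical, simplified snapshot of a file's timestamps. */
class SnapshotSketch {
    // Assumed timer granularity in milliseconds, per the figure quoted above.
    static final long GRANULARITY_MS = 2000;

    final long lastModified; // file mtime observed when the file was read
    final long lastRead;     // wall-clock time of that read

    SnapshotSketch(long lastModified, long lastRead) {
        this.lastModified = lastModified;
        this.lastRead = lastRead;
    }

    /** True when the cached content can no longer be trusted. */
    boolean isModified(long currentLastModified) {
        if (currentLastModified != lastModified) {
            return true; // the timestamp moved, so the file clearly changed
        }
        // Same timestamp: if the read happened within one timer tick of the
        // write, a second write could hide behind the same mtime, so the
        // file is conservatively treated as modified.
        return lastRead - lastModified < GRANULARITY_MS;
    }
}
```

A snapshot taken well after the last write can trust an unchanged timestamp, so the extra reloads are limited to files that were read right after being written.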
Bernard Leach [Sun, 22 May 2011 14:20:32 +0000 (00:20 +1000)]
Remove rebase temporary files on checkout failure
A checkout conflict during rebase setup should leave the repository
in SAFE state, which means the rebase temporary files must be
removed.
Bug: 346813
Change-Id: If8b758fde73ed5a452a99a195a844825a03bae1a Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Bernard Leach [Fri, 13 May 2011 05:59:57 +0000 (15:59 +1000)]
Create a MergeResult for deleted/modified files
Change Ia2ab4f8dc95020f2914ff01c2bf3b1bc62a9d45d added merge
support for when OURS or THEIRS was simultaneously deleted
and modified. That changeset, however, did not create an
entry in the conflicts table so clients would see a CONFLICTING
result but getConflicts() would return null.
This change creates a MergeResult for the conflicting file.
Bug: 345684
Change-Id: I52acb81c1729b49c9fb3e7a477c6448d8e55c317 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Chris Aniszczyk [Wed, 18 May 2011 16:39:33 +0000 (11:39 -0500)]
Implement rebase ff for upstream branches with merge commits
Change Ib9898fe0f982fa08e41f1dca9452c43de715fdb6 added support for
the 'cherry-pick' fast forward case where the upstream commit history
does not include any merge commits. This change adds support for the
case where merge commits exist and the local branch has no changes.
Bug: 344779
Change-Id: If203ce5aa1b4e5d4d7982deb621b710e71f4ee10 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Optimize MergeAlgorithm if ours or theirs is empty
Previously when merging two contents with a non-empty base, where one of
the contents was empty (size == 0) and the other was modified, there
was a potentially expensive calculation that always arrived at the
same result: the complete non-deleted content collides with the
empty content.
This proposal adds an optimization to detect empty input content and
to produce the appropriate result immediately.
Change-Id: Ie6a837260c19d808f0e99173f570ff96dd22acd3 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Shawn O. Pearce [Mon, 16 May 2011 18:28:23 +0000 (11:28 -0700)]
Fix diff bug on inserted line
For the following patch on the linux 2.6.32 tag:
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -685,6 +685,7 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sc
JGit produced an incorrect diff, attempting to add a new "}" instead
of the new "#endif" at the end of the hunk. This was caused by a prior
fix for bug 328895 where we wanted to "slide" a diff down in the file
when adding a new method/function and want to show the closing curly
brace as being added after the new method, rather than added onto the
end of the prior function or method just before the insertion point.
Bug: 345956
Change-Id: I32b9e24f1e2980258b1b39dd1807919ab1c5f9b2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Change-Id: I9e54f3e7e96892b64546270cbdf0308046e1d40c Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Stefan Lay [Fri, 6 May 2011 08:42:51 +0000 (10:42 +0200)]
Fix getHumanishName broken for windows paths
Since d1718a the method getHumanishName has been broken on Windows because
the URIish is no longer normalized. For a path like
"C:\gitRepositories\egit" the whole path was returned instead of
"egit".
Bug: 343519
Change-Id: I95056009072b99d32f288966302d0f8188b47836 Signed-off-by: Stefan Lay <stefan.lay@sap.com>
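The expected behaviour for such paths can be sketched as follows. This is an illustration of the idea (accept both '/' and '\' as separators), not the actual URIish.getHumanishName() code; the class name is hypothetical and the handling of a trailing ".git" suffix is an assumption.

```java
/** Hypothetical helper; treats '/' and '\' as path separators. */
class HumanishName {
    static String of(String path) {
        String s = path;
        // Drop any trailing separators.
        while (!s.isEmpty() && (s.endsWith("/") || s.endsWith("\\"))) {
            s = s.substring(0, s.length() - 1);
        }
        // Keep only the last path segment.
        int cut = Math.max(s.lastIndexOf('/'), s.lastIndexOf('\\'));
        String name = s.substring(cut + 1);
        // Assumption: a ".git" suffix on the segment is not part of the name.
        if (name.length() > 4 && name.endsWith(".git")) {
            name = name.substring(0, name.length() - 4);
        }
        return name;
    }
}
```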
Before this change any files in the conflicting set would
also be listed in the other IndexDiff sets, which is
confusing. With this change a conflicting file will not
be included in any of the other sets.
Change-Id: Ife9f2652685220bcfddc1f9820423acdcd5acfdc Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Shawn O. Pearce [Wed, 2 Mar 2011 23:23:30 +0000 (15:23 -0800)]
Store Git on any DHT
jgit.storage.dht is a storage provider implementation for JGit that
permits storing the Git repository in a distributed hashtable, NoSQL
system, or other database. The actual underlying storage system is
undefined, and can be plugged in by implementing 7 small interfaces:
The storage provider interface tries to assume very little about the
underlying storage system, and requires only three key features:
* key -> value lookup (a hashtable is suitable)
* atomic updates on single rows
* asynchronous operations (Java's ExecutorService is easy to use)
Most NoSQL database products offer all 3 of these features in their
clients, and so does any decent network based cache system like the
open source memcache product. Relying only on key equality for data
retrieval makes it simple for the storage engine to distribute across
multiple machines. Traditional SQL systems could also be used with a
JDBC based spi implementation.
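To make the three required features concrete, here is a minimal in-memory sketch. It is not one of the spi interfaces from this change; the class TinyTable and its methods are hypothetical, with a ConcurrentHashMap standing in for the key to value store, compareAndPut standing in for an atomic single-row update, and an ExecutorService providing the asynchronous execution.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

/** Hypothetical table showing the three features; not the real spi. */
class TinyTable {
    private final ConcurrentHashMap<String, String> rows = new ConcurrentHashMap<>();
    private final ExecutorService executor;

    TinyTable(ExecutorService executor) {
        this.executor = executor;
    }

    /** key -> value lookup, completed asynchronously (null when absent). */
    CompletableFuture<String> get(String key) {
        return CompletableFuture.supplyAsync(() -> rows.get(key), executor);
    }

    /**
     * Atomic update of a single row: succeeds only if the current value
     * equals 'expected' (null meaning the row must not exist yet).
     */
    CompletableFuture<Boolean> compareAndPut(String key, String expected, String update) {
        return CompletableFuture.supplyAsync(() -> {
            if (expected == null) {
                return rows.putIfAbsent(key, update) == null;
            }
            return rows.replace(key, expected, update);
        }, executor);
    }
}
```

Because lookups depend only on key equality, the same shape could be backed by a remote store partitioned by key.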
Before submitting this change I have implemented six storage systems
for the spi layer:
* Apache HBase[1]
* Apache Cassandra[2]
* Google Bigtable[3]
* an in-memory implementation for unit testing
* a JDBC implementation for SQL
* a generic cache provider that can ride on top of memcache
All six systems came in with an spi layer around 1000 lines of code to
implement the above 7 interfaces. This is a huge reduction in size
compared to prior attempts to implement a new JGit storage layer. As
this package shows, a complete JGit storage implementation is more
than 17,000 lines of fairly complex code.
A simple cache is provided in storage.dht.spi.cache. Implementers can
use CacheDatabase to wrap any other type of Database and perform fast
reads against a network based cache service, such as the open source
memcached[4]. An implementation of CacheService must be provided to
glue this spi onto the network cache.