InMemoryRepository: Remove unused RevWalk from batch method signature
The RevWalk given in the arguments is not used. According to the
comment at the top of the method, a new RevWalk is intentionally
used in the implementation.
Remove the unused argument.
Change-Id: Iec81a1341d5bf377801475845b96a465753096ef
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
ReceiveCommand.abort(): Utility to mark batch of commands as failed
If one or more commands fails, the entire group usually has to
fail as well with "transaction aborted". Pull this loop into a helper
so the idiom can be easily reused in several places throughout JGit.
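The helper's body is essentially the loop that today is repeated at
each call site; a minimal sketch using the existing ReceiveCommand
API (the "transaction aborted" message text is taken from above):

    import java.util.Collection;
    import org.eclipse.jgit.transport.ReceiveCommand;
    import org.eclipse.jgit.transport.ReceiveCommand.Result;

    static void abort(Collection<ReceiveCommand> commands) {
      for (ReceiveCommand c : commands) {
        // Leave commands that already have a result alone; only
        // the not-yet-attempted remainder is marked as aborted.
        if (c.getResult() == Result.NOT_ATTEMPTED)
          c.setResult(Result.REJECTED_OTHER_REASON,
              "transaction aborted");
      }
    }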
Change-Id: I3b9399b7e26ce2b0dc5f7baa85d585a433b4eaed
RefTreeDatabase: Ref database using refs/txn/committed
Instead of storing references in the local filesystem, rely on the
RefTree rooted at refs/txn/committed. This avoids needing to store
references in the packed-refs file by keeping all data rooted under
a single refs/txn/committed ref.
Scanning all references from a well-packed RefTree performs very
close to reading the packed-refs file from local disk.
Storing a packed RefTree is also smaller thanks to pack file
compression: about 49.39 bytes/ref on average, compared to ~65.49
bytes/ref for packed-refs.
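RefTreeDatabase still presents the standard RefDatabase interface;
only the storage behind it changes. A usage sketch under that
assumption:

    import java.io.IOException;
    import java.util.Map;
    import org.eclipse.jgit.lib.Ref;
    import org.eclipse.jgit.lib.RefDatabase;
    import org.eclipse.jgit.lib.Repository;

    static Map<String, Ref> scanAll(Repository repo)
        throws IOException {
      // Behind this call the refs are read from the RefTree rooted
      // at refs/txn/committed instead of from packed-refs.
      return repo.getRefDatabase().getRefs(RefDatabase.ALL);
    }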
Change-Id: I75caa631162dc127a780095066195cbacc746d49
Null-annotated Ref class and fixed related compiler errors
This change fixes all compiler errors in JGit by replacing possible
NPEs with appropriate exceptions, avoiding repeated calls to methods
with nullable returns, or returning early from the method.
Change-Id: I24c8a600ec962d61d5f40abf73eac4203e115240
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
This should mirror the behavior of `git push --atomic` where the
client asks the server to apply all-or-nothing. Some JGit servers
already support this based on a custom DFS backend. InMemoryRepository
is extended to support atomic push for unit testing purposes.
Server-side support for local disk repositories inside JGit is a more
complex animal due to the excessive amount of file locking required to
protect every reference stored as a loose ref.
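On the server side an atomic push maps naturally onto an
all-or-nothing BatchRefUpdate; a hedged sketch of the call shape
(repo, commands, and walk are assumed to exist):

    import java.io.IOException;
    import java.util.List;
    import org.eclipse.jgit.lib.BatchRefUpdate;
    import org.eclipse.jgit.lib.NullProgressMonitor;
    import org.eclipse.jgit.lib.Repository;
    import org.eclipse.jgit.revwalk.RevWalk;
    import org.eclipse.jgit.transport.ReceiveCommand;

    static void applyAtomic(Repository repo,
        List<ReceiveCommand> commands, RevWalk walk)
        throws IOException {
      BatchRefUpdate batch = repo.getRefDatabase().newBatchUpdate();
      batch.setAtomic(true); // all commands succeed or none apply
      batch.addCommand(commands);
      batch.execute(walk, NullProgressMonitor.INSTANCE);
    }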
Change-Id: I15083fbe48447678e034afeffb4639572a32f50c
[performance] Remove synthetic access$ methods in dfs, diff and merge
The Java compiler must generate synthetic access methods for private
methods and fields of the enclosing class if they are accessed from
inner classes, and vice versa.
While invisible in the code, these synthetic access methods exist in
the bytecode and seem to produce some extra execution overhead at
runtime (compared with direct access to these fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help the compiler avoid generating synthetic access methods
and hope to improve execution performance.
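A hypothetical illustration (not JGit code) of what triggers the
synthetic methods and what the fix looks like:

    class Outer {
      private int hits;  // access from Inner needs an access$ method
      int visits;        // package-private: accessed directly

      class Inner {
        void touch() {
          hits++;   // routed through a synthetic access$ method
          visits++; // plain field access in the bytecode
        }
      }
    }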
To validate the changes, one can either use javap or the Bytecode
Outline plugin in Eclipse. In both cases one should look for
"synthetic access$<number>" methods at the end of the class and inner
class files in question; there should be none.
NB: don't mix these "synthetic access$" methods up with the "public
synthetic bridge" methods generated to allow covariant return types
when overriding generic methods.
Change-Id: I94fb481b68c84841c1db1a5ebe678b13e13c962b
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
This hint allows an underlying implementation to read more bytes when
possible and buffer them locally for future read calls to consume.
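A conceptual sketch of honoring such a hint (hypothetical class; only
the hint semantics are taken from this change):

    // Hypothetical reader: one larger backend read serves several
    // subsequent read() calls from a local buffer.
    class ReadAheadReader {
      private byte[] buf = new byte[0];
      private int pos;
      private int readAhead;

      void setReadAheadBytes(int size) {
        readAhead = size; // permission to over-read by this much
      }

      int read(byte[] dst, int off, int len) {
        if (pos == buf.length) {
          buf = fetchFromStorage(len + readAhead);
          pos = 0;
        }
        int n = Math.min(len, buf.length - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
      }

      private byte[] fetchFromStorage(int n) {
        return new byte[n]; // stand-in for the real storage read
      }
    }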
Change-Id: Ia986a1bb8640eecb91cfbd515c61fa1ff1574a6f
TestRepository: Add a reset method to move HEAD around
This flushed out a number of bugs in the way DfsRefUpdate, or at least
the InMemoryRepository implementation, processes symrefs. These have
been fixed, to an extent, in InMemoryRepository, but other
implementations may still suffer from these bugs.
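A usage sketch; the exact reset signature is assumed here for
illustration:

    import org.eclipse.jgit.internal.storage.dfs.InMemoryRepository;
    import org.eclipse.jgit.junit.TestRepository;
    import org.eclipse.jgit.revwalk.RevCommit;

    static void example(TestRepository<InMemoryRepository> tr)
        throws Exception {
      RevCommit a = tr.commit().create();
      RevCommit b = tr.commit().parent(a).create();
      tr.update("refs/heads/master", b);
      // reset() is the new method; the String overload is assumed.
      tr.reset("refs/heads/master");
    }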
Change-Id: Ifd12115a0060b9ff45a88d305b72f91ca0472f9a
InMemoryRepository: Ensure new ref targets exist in the repo
ObjectInserter recently learned to read back inserted objects before
they have been flushed. It is in general unsafe to create refs to such
objects, but it is now much easier to do so, by passing "new
RevWalk(inserter.newReader())" into RefUpdate#execute(RevWalk).
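For example, a sketch of that pattern using update(RevWalk), the
public entry point that funnels into execute:

    import java.nio.charset.StandardCharsets;
    import org.eclipse.jgit.lib.Constants;
    import org.eclipse.jgit.lib.ObjectId;
    import org.eclipse.jgit.lib.ObjectInserter;
    import org.eclipse.jgit.lib.RefUpdate;
    import org.eclipse.jgit.lib.Repository;
    import org.eclipse.jgit.revwalk.RevWalk;

    static void tagUnflushedBlob(Repository repo) throws Exception {
      try (ObjectInserter ins = repo.newObjectInserter()) {
        ObjectId blob = ins.insert(Constants.OBJ_BLOB,
            "hello".getBytes(StandardCharsets.UTF_8));
        RefUpdate u = repo.updateRef("refs/tags/hello");
        u.setNewObjectId(blob);
        // The walk reads through the inserter, so the new blob is
        // visible even though it has not been flushed yet.
        try (RevWalk rw = new RevWalk(ins.newReader())) {
          u.update(rw);
        }
        ins.flush();
      }
    }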
We can't change the RefUpdate interface to remove execute(RevWalk);
nor would we necessarily want to, for performance reasons. And in any
case, RefUpdate#safeParse explicitly ignores MissingObjectExceptions.
But we can enforce object existence in InMemoryRepository, which will
allow callers using this class in their tests to ensure they are using
the RefDatabase correctly.
Change-Id: I5c696ba23bcd2a536a0512fa7f5b6130961905c5
Revert "Add a method to DfsOutputStream to read as an InputStream"
This reverts commit b646578d89.
openInputStream() is never used in JGit, nor is it used by any
known working DFS implementation. The method was added as a
utility for reading back from a DfsInserter, but the final
implementation of that feature does not require this method.
Change-Id: I075ad95e40af49c92b554480f8993ef5658f7684
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
Rename PackConstants to PackExt, a typed pack file extension.
PackConstants previously contained string values for the pack and pack
index extensions. Change PackConstants to be PackExt, a typed wrapper
around the string pack file extension.
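The shape of the wrapper, abridged (see PackExt itself for the full
definition):

    public class PackExt {
      public static final PackExt PACK = new PackExt("pack");
      public static final PackExt INDEX = new PackExt("idx");

      private final String ext;

      private PackExt(String ext) {
        this.ext = ext;
      }

      public String getExtension() {
        return ext;
      }
    }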
Change-Id: I86ac4db6da8f33aa42d6f37cfcc119e819444318
Update DfsObjDatabase API to open/write by pack extension.
Previously, the DfsObjDatabase had hardcoded getPackFile() and
getPackIndex() methods which open a .pack and .idx file, respectively.
A future change to add a bitmap index will need to be stored in a
parallel .bitmap file. Update the DfsObjDatabase to support opening and
writing of files for any pack extension.
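Callers then name the extension explicitly; a sketch of the resulting
call shape (objdb and desc assumed to exist, package names as in
current JGit):

    import java.io.IOException;
    import org.eclipse.jgit.internal.storage.dfs.DfsObjDatabase;
    import org.eclipse.jgit.internal.storage.dfs.DfsOutputStream;
    import org.eclipse.jgit.internal.storage.dfs.DfsPackDescription;
    import org.eclipse.jgit.internal.storage.dfs.ReadableChannel;
    import org.eclipse.jgit.internal.storage.pack.PackExt;

    static void openAndWrite(DfsObjDatabase objdb,
        DfsPackDescription desc) throws IOException {
      // One pair of methods keyed by extension replaces the
      // hardcoded getPackFile()/getPackIndex().
      try (ReadableChannel idx = objdb.openFile(desc, PackExt.INDEX);
          DfsOutputStream out = objdb.writeFile(desc, PackExt.INDEX)) {
        // read the old index, write the new one, etc.
      }
    }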
Change-Id: I7c403b501e242096a2d435f6865d6025a9f86108
A few classes such as Constants are marked with @SuppressWarnings, as
are toString() methods with many literals, but otherwise $NON-NLS-n$
is used for strings containing text that should not be translated. A
few literals may fall into the gray zone, but mostly I've tried to
only tag the obvious ones.
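For example, the two patterns look like this (name is assumed to be a
field of the enclosing class):

    // A literal that must never be externalized gets the tag:
    String prefix = "refs/heads/"; //$NON-NLS-1$

    // Classes or methods dominated by such literals are instead
    // annotated once:
    @SuppressWarnings("nls")
    @Override
    public String toString() {
      return "Ref[" + name + "]";
    }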
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
Implement wasDeltaAttempted() in DfsObjectRepresentation.
In DFS, everything is stored in a pack but only objects in a pack with
source GC or UNREACHABLE_GARBAGE have had delta compression attempted.
Expose the PackSource setter and getter on DfsPackDescription in order
to implement wasDeltaAttempted.
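A sketch of the resulting implementation (the pack field and the
enclosing representation class are assumed):

    import org.eclipse.jgit.internal.storage.dfs.DfsObjDatabase.PackSource;

    @Override
    public boolean wasDeltaAttempted() {
      // Per the rule above, only GC and UNREACHABLE_GARBAGE packs
      // have had delta compression attempted.
      PackSource src = pack.getPackDescription().getPackSource();
      return src == PackSource.GC
          || src == PackSource.UNREACHABLE_GARBAGE;
    }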
Change-Id: Ie949f321147ad870f1c3f23b552343bbbda32152
Make InMemoryRepository pack names globally unique
It was easy to create multiple packs with exactly the same name and
same DfsRepositoryDescription in a test, which can poison the
DfsBlockCache. The javadoc for DfsObjDatabase.newPack() explicitly
says pack names should be unique within an entire DFS, so do this by
making the packId AtomicInteger static.
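In sketch form (repoDesc and the name format are illustrative only):

    import java.util.concurrent.atomic.AtomicInteger;

    // Static: one counter for the whole JVM, so packs created in
    // different InMemoryRepository instances can never share a name.
    private static final AtomicInteger packId = new AtomicInteger();

    @Override
    protected DfsPackDescription newPack(PackSource source) {
      int id = packId.incrementAndGet();
      return new DfsPackDescription(repoDesc, "pack-" + id); // assumed
    }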
Arguably, test writers shouldn't be doing things like putting
'new DfsRepositoryDescription("test")' in a setUp() method, but that's
a natural thing to do, and we don't document this restriction
anywhere.
Change-Id: I9477413ab3950d83b7d17e173fbc0a3e064896e3
Change-Id: I0a86ce0e393dfde9bb27f0b29e036e76c856396e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Pass a DfsRepositoryDescription to InMemoryRepository
This was likely intended originally, but this class had never been
used, so the mistake went unnoticed.
Change-Id: I5e0e9f22ebf707c11d0581511c7a56b182188f77
Once a pack has been committed with commitPack(), we know that the
pack list has changed, but we don't re-scan the underlying storage.
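A conceptual sketch of that bookkeeping (hypothetical names, not the
actual DFS API):

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.List;
    import org.eclipse.jgit.internal.storage.dfs.DfsPackDescription;

    // Hypothetical cache: commitPack() appends to the in-memory pack
    // list instead of forcing a re-scan of the backend on next read.
    private volatile List<DfsPackDescription> packList =
        Collections.emptyList();

    void commitPack(Collection<DfsPackDescription> add) {
      List<DfsPackDescription> next = new ArrayList<>(packList);
      next.addAll(add);
      packList = Collections.unmodifiableList(next);
    }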
Change-Id: Ia7b35df4442a5f5dfe7e817edcc77b44b5410d08
Add a DFS repository description and reference it in each pack
Just as DfsPackDescription describes a pack but does not imply it is
open in memory, a DfsRepositoryDescription describes a repository at a
basic level without it necessarily being open.
Change-Id: I890b5fccdda12c1090cfabf4083b5c0e98d717f6
In practice the DHT storage layer has not been performing as well as
large scale server environments want to see from a Git server.
The performance of the DHT schema degrades rapidly as small changes
are pushed into the repository due to the chunk size being less than
1/3 of the pushed pack size. Small chunks cause poor prefetch
performance during reading, and require significantly longer prefetch
lists inside the chunk meta field to work around the small size.
The DHT code is very complex (>17,000 lines of code) and is very
sensitive to the underlying database round-trip time, as well as the
way objects were written into the pack stream that was chunked and
stored on the database. A poor pack layout (from any version of C Git
prior to Junio reworking it) can leave the DHT code unable to
enumerate the objects of the linux-2.6 repository within any
reasonable time.
Performing a clone from a DHT stored repository of 2 million objects
takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row
for each object being cloned. This is very difficult for some DHTs to
scale to; even at 5000 rows/second the lookup stage alone takes 6
minutes (on a local filesystem, this is almost too fast to bother
measuring).
Some servers like Apache Cassandra just fall over and cannot complete
the 2 million lookups in rapid fire.
On a ~400 MiB repository, the DHT schema has an extra 25 MiB of
redundant data that gets downloaded to the JGit process, and that is
before you consider the cost of the OBJECT_INDEX table also being
fully loaded, which is at least 223 MiB of data for the linux kernel
repository. In the DHT schema answering a `git clone` of the ~400 MiB
linux kernel needs to load 248 MiB of "index" data from the DHT, in
addition to the ~400 MiB of pack data that gets sent to the client.
This is 193 MiB more data to be accessed than the native filesystem
format, but it needs to come over a much smaller pipe (local Ethernet
typically) than the local SATA disk drive.
I also never got around to writing the "repack" support for the DHT
schema, as it turns out to be fairly complex to safely repack data in
the repository while also trying to minimize the number of changes
made to the database, due to very common limitations on database
mutation rates.
This new DFS storage layer fixes a lot of those issues by taking the
simple approach for storing relatively standard Git pack and index
files on an abstract filesystem. Packs are accessed by an in-process
buffer cache, similar to the WindowCache used by the local filesystem
storage layer. Unlike local file IO, there are some assumptions
that the storage system has relatively high latency and no concept of
"file handles". Instead it treats the file more like HTTP byte range
requests, where a read channel is simply a thunk that triggers a read
request over the network.
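A conceptual sketch of such a thunk (hypothetical HTTP-backed reader,
not JGit's actual ReadableChannel contract):

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Hypothetical: every read is one ranged request; there is no
    // long-lived file handle to the underlying storage.
    class RangeReader {
      private final URL url;
      private long position;

      RangeReader(URL url) {
        this.url = url;
      }

      byte[] read(int len) throws IOException {
        HttpURLConnection c = (HttpURLConnection) url.openConnection();
        c.setRequestProperty("Range",
            "bytes=" + position + "-" + (position + len - 1));
        try (InputStream in = c.getInputStream()) {
          byte[] buf = in.readNBytes(len);
          position += buf.length;
          return buf;
        }
      }
    }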
The DFS code in this change is still abstract; it does not store on
any particular filesystem, but is fairly well suited to Amazon S3
or Apache Hadoop HDFS. Storing packs directly on HDFS rather than
HBase removes a layer of abstraction, as most HBase row reads turn
into an HDFS read.
Most of the DFS code in this change was blatantly copied from the
local filesystem code. Most parts should be refactored to be shared
between the two storage systems, but right now I am hesitant to do
this due to how well tuned the local filesystem code currently is.
Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb