mirrors/jgit - jgit - source @ dussan.org

Commit Graph

Author	SHA1	Message	Date
David Pursehouse	064834d350	Reorder modifiers to follow Java Language Specification The Java Language Specification recommends listing modifiers in the following order: 1. Annotations 2. public 3. protected 4. private 5. abstract 6. static 7. final 8. transient 9. volatile 10. synchronized 11. native 12. strictfp Not following this convention has no technical impact, but will reduce the code's readability because most developers are used to the standard order. This was detected using SonarLint. Change-Id: I9cddecb4f4234dae1021b677e915be23d349a380 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	4 years ago
David Pursehouse	9be93b7991	Remove redundant "static" qualifier from enum declarations Nested enum types are implicitly static. Change-Id: Id3d7886087494fb67bc0d080b4a3491fb4baac19 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	4 years ago
Matthias Sohn	5c5f7c6b14	Update EDL 1.0 license headers to new short SPDX compliant format This is the format given by the Eclipse legal doc generator [1]. [1] https://www.eclipse.org/projects/tools/documentation.php?id=technology.jgit Bug: 548298 Change-Id: I8d8cabc998ba1b083e3f0906a8d558d391ffb6c4 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	4 years ago
Dave Borowitz	5c02ce52d6	Allow overriding DfsPackDescription comparator for scanning packs Provide a factory for comparators that use the default heuristics except with a different ordering of PackSources. Change-Id: I0809b64deb3d0486040076946fdbdad650d69240	6 years ago
Dave Borowitz	96512f5d3b	Move DfsPackDescription comparators to common location There are several ways of comparing DfsPackDescriptions for different purposes, such as object lookup search order and reftable ordering. Some of these are later compounded into comparators on other objects, so they appear in the code as Comparator<DfsReftable>, for example. Put all the DfsPackDescription comparators in static methods on DfsPackDescription itself. Stop implementing Comparable, to avoid giving the impression that there is always one true and correct way of sorting packs. Change-Id: Ia5ca65249c13373f7ef5b8a5d1ad50a26577706c	6 years ago
Dave Borowitz	e7bacf0a7f	Use Comparators for PackSource Rather than requiring callers to do their own computations based on the package-private "category" number, provide an actual Comparator<PackSource> instance, and explicitly discourage usage of default Enum comparison. Construct the default comparator using a builder pattern based on defining equivalence classes. This gives us the same behavior as the old category field in PackSource, with an abstraction that does not leak the implementation detail of comparing rank numbers. Change-Id: I6757211397ab1bc181d61298e073f88b69dbefc3	6 years ago
Dave Borowitz	43ec590d0e	DfsPackDescription: Disallow null PackSource In normal operation, the source of a pack should never be null; the DFS implementation should always know where a pack came from. Existing implementations in InMemoryRepository and at Google always have the source available at construction time. The problem with null PackSources in the previous implementation was that it made the DfsPackDescription#compareTo method intransitive. Specifically, it skips comparing the sources at all if either operand is null. Suppose we have three descriptions A, B, and C, where all fields are equal except the PackSource, and: * A's source is INSERT * B's source is null * C's source is RECEIVE In this case, A.compareTo(B) == 0, and B.compareTo(C) == 0, since all fields are equal except the source, which is skipped. But A.compareTo(C) != 0, since A and C have different sources. Avoid this problem in compareTo by enforcing that the source is never null. We could of course assign an arbitrary category number to a null source in order to make comparison transitive[1], but it's simpler to implement and reason about if the field is non-nullable, and there is no real-world use case to make it null. Although a non-null source is required at construction time, the field is currently still mutable: DfsPackDescription#setPackSource is used by DfsInserterTest to mark packs as garbage. This could probably be avoided as well, allowing us to convert packSource to a final field, but doing so is beyond the scope of this change. [1] The astute reader will notice this is already done by DfsObjDatabase#reftableComparator(). In fact, the reason that different comparator implementations non-obviously have different semantics for this nullable field is another reason why it's clearer to avoid null entirely. Change-Id: I85a2aaf3fd6d4868f241f7972a0349f087830ffa	6 years ago
Han-Wen Nienhuys	f3ec7cf3f0	Remove further unnecessary 'final' keywords Remove it from * package private functions. * try blocks * for loops this was done with the following python script: $ cat f.py import sys import re import os def replaceFinal(m): return m.group(1) + "(" + m.group(2).replace('final ', '') + ")" methodDecl = re.compile(r"^([\t ][a-zA-Z_ ]+)$([^)])$") def subst(fn): input = open(fn) os.rename(fn, fn + "~") dest = open(fn, 'w') for l in input: l = methodDecl.sub(replaceFinal, l) dest.write(l) dest.close() for root, dirs, files in os.walk(".", topdown=False): for f in files: if not f.endswith('.java'): continue full = os.path.join(root, f) print full subst(full) Change-Id: If533a75a417594fc893e7c669d2c1f0f6caeb7ca Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>	6 years ago
Minh Thai	bf8057058e	scanPacks to return reftables even if no packs An empty repository may have a dangling symref HEAD pointing to refs/heads/master. In this case, there will be a reftable even though there are no packs yet. Change-Id: Ib759ffbbfc490953481853e74263dd46d2592888 Signed-off-by: Minh Thai <mthai@google.com>	6 years ago
Matthias Sohn	a224b78675	Fix javadoc in org.eclipse.jgit dfs package Change-Id: I1f5e3dc3ba34b323ee7244dbefee207ce19e6021 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	6 years ago
Shawn Pearce	7cd5d77ae3	dfs: Switch InMemoryRepository to DfsReftableDatabase This ensures DfsReftableDatabase is tested by the same test suites that use/test InMemoryRepository. It also simplifies the logic of InMemoryRepository and brings its compatibility story closer to any other DFS repository that uses reftables for its reference storage. Change-Id: I881469fd77ed11a9239b477633510b8c482a19ca Signed-off-by: Minh Thai <mthai@google.com> Signed-off-by: Terry Parker <tparker@google.com>	6 years ago
Shawn Pearce	1d31257a5d	dfs: reftable backed DfsRefDatabase DfsReftableDatabase is a new alternative for DfsRefDatabase that handles more operations for the implementor by delegating through reftables. All reftable files are stored in sibling DfsObjDatabase using PackExt.REFTABLE and PackSource.INSERT. It's assumed the DfsObjDatabase periodically runs compactions and GCs using DfsPackCompactor and DfsGarbageCollector. Those passes are essential to collapsing the stack of reftables. Change-Id: Ia03196ff6fd9ae2d0623c3747cfa84357c6d0c79 Signed-off-by: Minh Thai <mthai@google.com> Signed-off-by: Terry Parker <tparker@google.com>	6 years ago
Shawn Pearce	1a7b8a11df	dfs: expose DfsReftable from DfsObjDatabase Reftable storage in DFS is related to pack storage. Reftables are stored in the same namespace, but with PackExt.REFTABLE. Include the set of DfsReftable instances in the PackList and export some helpers to access the tables. Change-Id: I6a4f5f953ed6b0ff80a7780f4c6cbcc5eda0da3e	6 years ago
Shawn Pearce	e6d9ae058b	dfs: only create DfsPackFile if description has PACK In the future with reftable a DFS implementation may choose to create a PackDescription that contains only a REFTABLE extension. Filter these out by only creating a DfsPackFile if the PackDescription has the expected PackExt.PACK. Change-Id: I4c831622378156ae6b68f82c1ee1db5e150893be	6 years ago
Shawn Pearce	07f98a8b71	Derive DfsStreamKey from DfsPackDescription By making this a deterministic function, DfsBlockCache can stop retaining a map of every DfsPackDescription it has ever seen. This fixes a long standing memory leak in DfsBlockCache. This refactoring also simplifies the idea of setting up more lightweight objects around streams. Change-Id: I051e7b96f5454c6b0a0e652d8f4a69c0bed7f6f4	6 years ago
Shawn Pearce	1513a5632d	Allow DfsReader to be subclassed Necessary if a DFS implementation wants to override close() to record DfsReaderIoStats. Change-Id: I144575f9bf1abf2c1fd72030550c4f0795fcf44d	7 years ago
David Pursehouse	3b4448637f	Enable and fix warnings about redundant specification of type arguments Since the introduction of generic type parameter inference in Java 7, it's not necessary to explicitly specify the type of generic parameters. Enable the warning in Eclipse, and fix all occurrences. Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	7 years ago
Thirumala Reddy Mutchukota	c9f55032a2	Record the estimated size of the pack files. The Compacter and Garbage Collector will record the estimated size of the compact, gc or garbage packs they are about to create. Clients can use this information to decide how to store a pack based on its approximate expected size. Added a new protected method DfsObjDatabase.newPack(PackSource packSource, long estimatedPackSize), so that the clients can override this method to make use of the estimatedPackSize while creating a new PackDescription object. The default implementation of this method is equivalent to newPack(packSource).setEstimatedPackSize(estimatedPackSize). I didn't make it abstract because that would force all the existing subclasses of DfsObjDatabase to implement this method. Due to this default implementation, the estimatedPackSize is added to DfsPackDescription using a setter instead of a constructor parameter (even though a constructor parameter would be a better choice as this value is set only during the object creation). Change-Id: Iade1122633ea774c2e842178a6a6cbb4a57b598b Signed-off-by: Thirumala Reddy Mutchukota <thirumala@google.com>	7 years ago
Shawn Pearce	f15e9c088a	DfsObjDatabase: clear PackList dirty bit if no new packs If a reference was updated more recently than a pack was written (typical) the PackList was perpetually dirty until the next GC was completed for the repository. Detect this condition by observing no changes to the PackList membership and resetting the dirty bit. Change-Id: Ie2133aca1f8083307c73b6a26358175864f100ef	7 years ago
Dave Borowitz	ecb2aa0503	DfsObjDatabase: Add lazy last modified method to PackList Change-Id: Id045f162fa584ea14da29a9df58a42c53a78dc15	7 years ago
Dave Borowitz	0f1c361e62	DfsObjectDatabase: Expose PackList and move markDirty there What's invalidated when an object database is "dirty" is not the whole database, but rather a specific list of packs. If there is a race between getting the pack list and setting the volatile dirty flag where the packs are rescanned, we don't need to mark the new pack list as dirty. This is a fine point that only really applies if the decision of whether or not to mark dirty actually requires introspecting the pack list (say, its timestamps). The general operation of "take whatever is the current pack list and mark it dirty" may still be inherently racy, but the cost is not so high. Change-Id: I159e9154bd8b2d348b4e383627a503e85462dcc6	7 years ago
Dave Borowitz	18e9db306b	Invalidate DfsObjDatabase pack list when refs are updated Currently, there is a race where a user of a DfsRepository in a single thread may get unexpected MissingObjectExceptions trying to look up an object that appears as the current value of a ref: 1. Thread A scans packs before scanning refs, for example by reading an object by SHA-1. 2. Thread B flushes an object and updates a ref to point to that object. 3. Thread A looks up the ref updated in (2). Since it is scanning refs for the first time, it sees the new object SHA-1. 4. Thread A tries to read the object it found in (3), using the cached pack list it got from (1). The object appears missing. Allow implementations to work around this by marking the object database's current pack list as "dirty." A dirty pack list means that DfsReader will rescan packs and try again if a requested object is missing. Implementations should mark objects as dirty any time the ref database reads or scans refs that might be newer than a previously cached pack list. Change-Id: I06c722b20c859ed1475628ec6a2f6d3d6d580700	7 years ago
Shawn Pearce	30eb6423a2	Add GC_REST PackSource to better order DFS packs Force reads to use a search ordering of: INSERT / RECEIVE COMPACT GC (heads) GC_REST (non-heads) GC_TXN (refs/txn) UNREACHABLE_GARBAGE This has provided decent performance for object lookups. Starting from an arbitrary reference may find the content in a newer pack created by DfsObjectInserter or a ReceivePack server. Compaction of recent packs also contains newer content, and then most interesting data is in the "main" GC pack. As the GC pack is self-contained (has no edges leading outside) readers typically do not need to go further. Adding a new GC_REST PackSource allows the DfsGarbageCollector to identify to the pack ordering code which pack is which, so the non-heads are scanned second during reads. This removes a hack that was unique to Google's implementation that enforced this behavior by fixing up the lastModified timestamp. Renumber the PackSource's categories to reflect this search ordering. Change-Id: I19fdaab8a8d6687cbe8c88488e6daa0630bf189a	8 years ago
Shawn Pearce	40051505d7	GC: Pack RefTrees in their own pack The RefTree graph needs to be quickly accessed to read references. It is also a distinct graph disconnected from the rest of the repository. Store the commit and tree objects in their own pack. Change-Id: Icbb735be8fa91ccbf0708ca3a219b364e11a6b83	8 years ago
Mike Williams	c4d73fb7cc	Insert duplicate objects to prevent race during garbage collection. Prior to this change, DfsInserter would not insert an object into a pack if it already existed in another pack in the repository, even if that pack was unreachable. Consider this sequence of events: - Object FOO is pushed to a repository. - Subsequent ref changes make FOO UNREACHABLE_GARBAGE. - FOO is subsequently re-inserted using a DfsInserter, but skipped due to existing in UNREACHABLE_GARBAGE. - The repository is repacked; FOO will not be written into a new pack because it is not yet reachable from a reference. If the UNREACHABLE_GARBAGE packs are deleted, FOO disappears. - A reference is updated to reference FOO. This reference is now broken as FOO was removed when the repacking process deleted the UNREACHABLE_GARBAGE pack that stored the only copy of FOO. The garbage collector can't safely delete the UNREACHABLE_GARBAGE pack because FOO might be in the middle of being re-inserted/re-packed. This change writes a duplicate copy of an object if it only exists in UNREACHABLE_GARBAGE. This "freshens" the object to give it a chance to survive long enough to be made reachable through a reference. Change-Id: I20f2062230f3af3bccd6f21d3b7342f1152a5532 Signed-off-by: Mike Williams <miwilliams@google.com>	8 years ago
Shawn Pearce	f32b861243	JGit 3.0: move internal classes into an internal subpackage This breaks all existing callers once. Applications are not supposed to build against the internal storage API unless they can accept API churn and make necessary updates as versions change. Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9	11 years ago
Shawn Pearce	ea5eef912a	Cluster UNREACHABLE_GARBAGE packs at the end of the search list Garbage is unlikely to be used by a reader. Ensure they always cluster at the end of the search list, no matter what timestamp was used on the pack files. Change-Id: I3bed89e9569ee3363c36bb3f73fcd34057a3883f	11 years ago
Colby Ranger	698705c754	Rename PackConstants to PackExt, a typed pack file extension. PackConstants previously contained string values for the pack and pack index extension. Change PackConstants to be PackExt, a typed wrapper around the string pack file extension. Change-Id: I86ac4db6da8f33aa42d6f37cfcc119e819444318	11 years ago
Colby Ranger	5d3c2b3def	Update DfsObjDatabase API to open/write by pack extension. Previously, the DfsObjDatabase had hardcoded getPackFile() and getPackIndex() methods which open a .pack and .idx file, respectively. A future change to add a bitmap index will need to be stored in a parallel .bitmap file. Update the DfsObjDatabase to support opening and writing of files for any pack extension. Change-Id: I7c403b501e242096a2d435f6865d6025a9f86108	11 years ago
Shawn O. Pearce	3534fa9c61	Expose some DFS APIs as public or protected Expose class DfsReader and method DfsPackFile.hasObject() as public. Applications may want to be able to inquire about some details of the storage of a repository. Make this possible by exposing some simple accessor methods. Expose method DfsObjDatabase.clearCache() as protected, allowing implementing subclasses to dump the cache if necessary, and force it to reload on a future request. Change-Id: Ic592c82d45ace9f2fa5f8d7e4bacfdce96dea969	12 years ago
Dave Borowitz	84c80be1dc	Fire DfsPacksChangedEvents when committing packs. Once a pack has been committed with commitPack(), we know that the pack list has changed but we don't re-scan the underlying storage. Change-Id: Ia7b35df4442a5f5dfe7e817edcc77b44b5410d08	12 years ago
Dave Borowitz	0f8e486a4d	Add a listener for changes to a DfsObjDatabase's pack files Intended for cross-request use, so only refers to DfsRepositoryDescriptions rather than DfsRepositorys. Change-Id: I2633e472c9264d91d632069f608d53d4bdd0fc09	12 years ago
Dave Borowitz	35d72ac806	Add a DFS repository description and reference it in each pack Just as DfsPackDescription describes a pack but does not imply it is open in memory, a DfsRepositoryDescription describes a repository at a basic level without it necessarily being open. Change-Id: I890b5fccdda12c1090cfabf4083b5c0e98d717f6	12 years ago
Shawn O. Pearce	fa4cc2475f	DFS: A storage layer for JGit In practice the DHT storage layer has not been performing as well as large scale server environments want to see from a Git server. The performance of the DHT schema degrades rapidly as small changes are pushed into the repository due to the chunk size being less than 1/3 of the pushed pack size. Small chunks cause poor prefetch performance during reading, and require significantly longer prefetch lists inside of the chunk meta field to work around the small size. The DHT code is very complex (>17,000 lines of code) and is very sensitive to the underlying database round-trip time, as well as the way objects were written into the pack stream that was chunked and stored on the database. A poor pack layout (from any version of C Git prior to Junio reworking it) can cause the DHT code to be unable to enumerate the objects of the linux-2.6 repository in a completable time scale. Performing a clone from a DHT stored repository of 2 million objects takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row for each object being cloned. This is very difficult for some DHTs to scale, even at 5000 rows/second the lookup stage alone takes 6 minutes (on local filesystem, this is almost too fast to bother measuring). Some servers like Apache Cassandra just fall over and cannot complete the 2 million lookups in rapid fire. On a ~400 MiB repository, the DHT schema has an extra 25 MiB of redundant data that gets downloaded to the JGit process, and that is before you consider the cost of the OBJECT_INDEX table also being fully loaded, which is at least 223 MiB of data for the linux kernel repository. In the DHT schema answering a `git clone` of the ~400 MiB linux kernel needs to load 248 MiB of "index" data from the DHT, in addition to the ~400 MiB of pack data that gets sent to the client. 
This is 193 MiB more data to be accessed than the native filesystem format, but it needs to come over a much smaller pipe (local Ethernet typically) than the local SATA disk drive. I also never got around to writing the "repack" support for the DHT schema, as it turns out to be fairly complex to safely repack data in the repository while also trying to minimize the amount of changes made to the database, due to very common limitations on database mutation rates. This new DFS storage layer fixes a lot of those issues by taking the simple approach for storing relatively standard Git pack and index files on an abstract filesystem. Packs are accessed by an in-process buffer cache, similar to the WindowCache used by the local filesystem storage layer. Unlike the local file IO, there are some assumptions that the storage system has relatively high latency and no concept of "file handles". Instead it looks at the file more like HTTP byte range requests, where a read channel is simply a thunk to trigger a read request over the network. The DFS code in this change is still abstract; it does not store on any particular filesystem, but is fairly well suited to the Amazon S3 or Apache Hadoop HDFS. Storing packs directly on HDFS rather than HBase removes a layer of abstraction, as most HBase row reads turn into an HDFS read. Most of the DFS code in this change was blatantly copied from the local filesystem code. Most parts should be refactored to be shared between the two storage systems, but right now I am hesitant to do this due to how well tuned the local filesystem code currently is. Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb	13 years ago

27 Commits (25a6bd4d614589c968090fb506fc9b26d5c82fe2)
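
The intransitive comparison described in commit 43ec590d0e can be reproduced with a small sketch. The class and method names below (NullSourceDemo, Source, Desc, compareSkippingNull) are hypothetical stand-ins, not JGit code; the sketch only assumes, as the commit message states, that the old logic skipped the source comparison whenever either operand had a null source.

```java
public class NullSourceDemo {
    // Hypothetical two-value stand-in for the real PackSource enum.
    enum Source { INSERT, RECEIVE }

    // Hypothetical, stripped-down pack description:
    // one ordinary field plus a nullable source.
    static final class Desc {
        final long lastModified;
        final Source source; // may be null in this sketch

        Desc(long lastModified, Source source) {
            this.lastModified = lastModified;
            this.source = source;
        }
    }

    // Assumed shape of the old logic: the source is compared
    // only when both sides have one; otherwise it is skipped.
    static int compareSkippingNull(Desc a, Desc b) {
        if (a.source != null && b.source != null) {
            int c = a.source.compareTo(b.source);
            if (c != 0) {
                return c;
            }
        }
        return Long.compare(a.lastModified, b.lastModified);
    }

    public static void main(String[] args) {
        Desc a = new Desc(1L, Source.INSERT);
        Desc b = new Desc(1L, null);
        Desc c = new Desc(1L, Source.RECEIVE);

        // A and B compare equal, B and C compare equal, yet A and C do not.
        System.out.println("A vs B equal: " + (compareSkippingNull(a, b) == 0));
        System.out.println("B vs C equal: " + (compareSkippingNull(b, c) == 0));
        System.out.println("A vs C equal: " + (compareSkippingNull(a, c) == 0));
    }
}
```

Sorting with such a comparison is unreliable because equality is not transitive, which is why the commit makes the source non-nullable instead of special-casing null.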