Keep track of the original cause for a packfile invalidation.
It is needed for the sysadmin to understand if there is a real
underlying filesystem problem and repository corruption or if it is
simply a consequence of a concurrency of Git operations (e.g. repack
or GC).
Change-Id: I06ddda9ec847844ec31616ab6d17f153a5a34e33
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
DfsPackFile: Refactor getBitmapIndex to open ReadableChannel in try-with-resource
Refactor getBitmapIndex to open ReadableChannel in try-with-resource
instead of closing the channel in the finally block.
The same cannot be done in copyPackThroughCache, so just suppress the
warning with an explanatory comment.
Change-Id: I9b95373d350728e85a159423d5ca80e8b215914d
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Open auto-closeable resources in try-with-resource
When an auto-closeable resources is not opened in try-with-resource,
the warning "should be managed by try-with-resource" is emitted by
Eclipse.
Fix the ones that can be silenced simply by moving the declaration of
the variable into a try-with-resource.
In cases where we explicitly call the close() method, for example in
tests where we are testing specific behavior caused by the close(),
suppress the warning.
Leave the ones that will require more significant refcactoring to fix.
They can be done in separate commits that can be reviewed and tested
in isolation.
Change-Id: I9682cd20fb15167d3c7f9027cecdc82bc50b83c4
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
CorruptObjectException has a constructor that takes Throwable and
calls initCause with it. Use that instead of instantiating the
exception and explicitly calling initCause.
Change-Id: I1f2747d6c4cc5249e93401b9787eb4ceb50cb995
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Use new StoredObjectRepresentationNotAvailableException constructor
In 5e7eed4 a new StoredObjectRepresentationNotAvailableException
constructor was added, that takes a Throwable to initialize the
exception cause.
Update more call sites to use this constructor instead of first
instantiating it and explicitly calling initCause().
All callers now use the new constructor, so annotate the other one as
deprecated.
Change-Id: I6d2a7e289a95f0360ddebf904cfd8b6c18fef10c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
StoredObjectRepresentationNotAvailableException: Add constructor that takes cause
If the cause can be passed into the constructor, callers don't need to
instantiate it and then explicitly call initCause.
Note that the constructors in this class cause "non-API parameter type"
warnings because ObjectToPack is internal, however it's probably OK
since the only non-internal reference to it is in the pgm.debug package.
Change-Id: Ia4eab24e79f9afe6214ea8160137d941d4048319
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Reftable storage in DFS is related to pack storage. Reftables are
stored in the same namespace, but with PackExt.REFTABLE. Include
the set of DfsReftable instances in the PackList and export some
helpers to access the tables.
Change-Id: I6a4f5f953ed6b0ff80a7780f4c6cbcc5eda0da3e
dfs: support reading reftables through DfsBlockCache
DfsBlockCache directly shares its internal byte[] with ReftableReader,
avoding copying between the DfsBlockCache and the BlockReader
instances used by ReftableReader.
Change-Id: Icaa4f40052b26f952681414653a8b5314b7c2c23
dfs: optionally store blockSize in DfsPackDescription
Allow a DFS implementation to report blockSize to DfsPackFile,
bypassing alignment errors and corrections in the DfsBlockCache when
the blockSize of a specific file differs from the cache's configured
blockSize.
Change-Id: Ic376314d4a86a0bd528c033e169d93eef035b233
dfs: Fix caching of index, bitmap index, reverse index
When 07f98a8b71 ("Derive DfsStreamKey from DfsPackDescription")
stopped caching DfsPackFile in the DfsBlockCache, the DfsPackFile began
to always load the idx, bitmap, or compute reverse index, as the cache
handles were no longer populated by prior requests.
Rework caching to lookup the objects from the DfsBlockCache if the
local DfsPackFile handle is invalid. This allows the DfsPackFile to
be more of a flyweight instance across requests.
Change-Id: Ic7b42ce2d90692cccea36deb30c2c76ccc81638b
dfs: Use special ForReverseIndex DfsStreamKey wrapper instead of derive
While implementing a custom subclass of DfsStreamKey it became obvious
the required derive(String) was making it impossible to construct an
efficient key in all cases.
Instead, use a special wrapper type ForReverseIndex around the INDEX's
own DfsStreamKey to denote the reverse index stream in the
DfsBlockCache. This adds a smaller layer of boxing, but eliminates
weird issues for DFS implementors using specialized DfsStreamKey
implementations for space efficiency reasons.
Now that DfsStreamKey is reasonably light-weight, avoid allocating the
index and reverse index keys until necessary. DfsPackFile mostly
holds the DfsBlockCache.Ref handle to the object, and only needs the
DfsStreamKey when its looking up the handle.
Change-Id: Icea78e8f7f1514087b94ef5f525d9573ea2913f2
By making this a deterministic function, DfsBlockCache can stop
retaining a map of every DfsPackDescription it has ever seen. This
fixes a long standing memory leak in DfsBlockCache.
This refactoring also simplifies the idea of setting up more
lightweight objects around streams.
Change-Id: I051e7b96f5454c6b0a0e652d8f4a69c0bed7f6f4
This new base class has the minimum set of properties and methods
necessary for DfsBlockCache to manage blocks of a file in the cache.
Subclasses can use DfsBlockCache for any content.
This refactoring opens the door for additional PackExt types other
than PACK to be stored on a block-by-block basis by the DfsBlockCache.
Change-Id: I307228fc805c3ff0c596783beb24fd52bec35ba8
Instead of overloading the pack's DfsStreamKey with negative positions
for the idx, reverse idx and bitmap, assign a unique DfsStreamKey for
each of these related streams.
Change-Id: Ie048036c74a1d1bbf5ea7e888452dc0c1adf992f
This renaming supports reusing DfsStreamKey in a future commit
to index other PackExt type streams inside of the DfsBlockCache.
Change-Id: Ib52d374e47724ccb837f4fbab1fc85c486c5b408
If a block is missing from the block cache, open the pack stream,
retain the ReadableChannel, and turn on read-ahead. This should help
to load a medium sized pack into a cold cache more quickly from a
slower IO stream, as the pack is scanned sequentially and missing
blocks are more likely to be available through the read-ahead.
Change-Id: I3300d936b9299be6d9eb642992df7c04bb439cde
Compute how much disk IO a DfsReader is performing, and how long the
sum of those operations took on this reader instance. Implementations
of DFS and interested applications can get the stats by calling the
new DfsReader.getIoStats() method at or after close().
Change-Id: If585741301f29182617933d6406d4a70497f2ca7
These packages don't use @since tags because they are not part of the
stable public API. Some @since tags snuck in, though. Remove them to
make the convention easier to find for new contributors and the
expectations clearer for users.
Change-Id: I6c17d3cfc93657f1b33cf5c5708f2b1c712b0d31
The 12 bytes `PACK...` header is written in PackWriter before reading
CachedPack files. In DfsPackFile#copyPackBypassCache, the header was not
skipped when the first block is not in cache.
Change-Id: Ibbe2e564d36b79922a936657f286addb1044d237
Signed-off-by: Zhen Chen <czhen@google.com>
Filter corrupt objects from DfsReader.selectObjectRepresentation()
PackWriter.writeObject() can get into an infinite loop when corrupt
packs are present. When it finds a pack file with an object that can be
reused it calls DfsPackFile.copyAsIs(). If that method sees an invalid
CRC, it adds the object to the DfsPackFile's corrupt object list and
throws a CorruptObjectException, which it later catches as an
IOException and wraps in a
StoredObjectRepresentationNotAvailableException.
PackWriter.writeObjectImpl() catches that SORNAE and retries the
operation by calling DfsReader.selectObjectRepresentation(). But
currently that method returns the same object which was just seen to
be corrupt.
Change DfsPackFile.isCorrupt() from private to package private, and use
that method in DfsReader.selectObjectRepresentation() to filter out
corrupt objects.
The stack traces that show the problem are:
org.eclipse.jgit.errors.CorruptObjectException.<init>(CorruptObjectException.java:113)
org.eclipse.jgit.internal.storage.dfs.DfsPackFile.copyAsIs(DfsPackFile.java:624)
org.eclipse.jgit.internal.storage.dfs.DfsReader.copyObjectAsIs(DfsReader.java:491)
org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1478)
org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1455)
org.eclipse.jgit.internal.storage.dfs.DfsPackFile.getPackIndex(DfsPackFile.java:228)
org.eclipse.jgit.internal.storage.dfs.DfsReader.findAllFromPack(DfsReader.java:476)
org.eclipse.jgit.internal.storage.dfs.DfsReader.selectObjectRepresentation(DfsReader.java:455)
org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1492)
org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1455)
Change-Id: Iad7bbcaed1f11a6aa3b4f5af911a73a34c0fabfd
Signed-off-by: Terry Parker <tparker@google.com>
Set "potentialNullReference" to "error" level and fixed all issues
There should be no functional change, the logic updated only to make
code simple so that compiler can understand what is going for. Removed
all @SuppressWarnings("null") annotations since they cannot be used if
"org.eclipse.jdt.core.compiler.problem.potentialNullReference" option is
set to the "error" level.
Bug: 470647
Change-Id: Ie93c249fa46e792198d362e531d5cbabaf41fdc4
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Cached packs are only used when writing over the network or to
a bundle file and reuse validation is always disabled in these
two contexts. The client/consumer of the stream will be SHA-1
checksumming every object.
Reuse validation is most critical during local GC to avoid silently
ignoring corruption by stopping as soon as a problem is found and
leaving everything alone for the end-user to debug and salvage.
Cached packs are not supported during local GC as the bitmap rebuild
logic does not support including a cached pack in the result.
Strip out the validation and force PackWriter to always disable the
cached pack feature if reuseValidation is enabled.
Change-Id: If0d7baf2ae1bf1f7e71bf773151302c9f7887039
This hint allows an underlying implementation to read more bytes when
possible and buffer them locally for future read calls to consume.
Change-Id: Ia986a1bb8640eecb91cfbd515c61fa1ff1574a6f
Avoid storing large packs in block cache during reuse
When a large pack (> 30% of the block cache) is being reused by
copying it pollutes the block cache with noise by storing blocks
that are never referenced again.
Avoid this by streaming the file directly from its channel onto
the output stream.
Change-Id: I2e53de27f3dcfb93de68b1fad45f75ab23e79fe7
Remove AutoCloseable from internal PackFile and friends
PackFile is held by the block cache and cannot be auto closed in a
try-with-resources statement. Remove the interface as JGit does
explicit management of the instances.
ObjectDatabase and RefDatabase are internal details of Repository
and are managed with the Repository. Marking them AutoCloseable
provides no value to the library or an application using the API.
Change-Id: Ibee19eadd66233e6666b601583daa1834a7778f1
The index provides access to a list of objects in a pack.
This will be helpful for repository integrity checking.
Change-Id: I435eeeb3fe1b1f5632d40528936416e97491d412
Signed-off-by: David Pletcher <dpletcher@google.com>
Provide more details in exceptions thrown when packfile is invalid
Mention packfile path in exceptions thrown when we detect that a
packfile is invalid and make excplicit that corrupt packs are removed
from the pack list.
Change-Id: I454ada5f8e69307d3f34d1c1b8f3cb87607ddf35
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Do not update the ref hot bit when checking isIndexLoaded
DfsPackFile.isIndexLoaded() uses the DfsBlockCache.Ref.get() method
to check if the index loaded. However, using the get() method marks
a hot bit in the cache, which can cause the index to never be unloaded
and seem hotter than it really is. Add a has() method which only
checks if the value is not null and does not update the hot bit.
Change-Id: I7e9ed216f6e273e8f5d79ae573973197654419b4
This implementation has been proven to deadlock in production server
loads. Google has been running with it disabled for a quite a while,
as the bugs have been difficult to identify and fix.
Instead of suggesting it works and is useful, drop the code. JGit
should not advertise support for functionality that is known to
be broken.
In a few of the places where read-ahead was enabled by DfsReader
there is more information about what blocks should be loaded when.
During object representation selection, or size lookup, or sending
object as-is to a PackWriter, or sending an entire pack as-is the
reader knows exactly which blocks are required in the cache, and it
also can compute when those will be needed. The broken read-ahead
code was stupid and just read a fixed amount ahead of the current
offset, which can waste IOs if more precise data was available.
DFS systems are usually slow to respond so read-ahead is still
a desired feature, but it needs to be rebuilt from scratch and
make better use of the offset information.
Change-Id: Ibaed8288ec3340cf93eb269dc0f1f23ab5ab1aea
Rewrite this complicated logic to examine each pack file exactly
once. This reduces thrashing when there are many large pack files
present and the reader needs to locate each object's header.
The intermediate temporary list is now smaller, it is bounded to
the same length as the input object list. In the prior version of
this code the list contained one entry for every representation of
every object being packed.
Only one representation object is allocated, reducing the overall
memory footprint to be approximately one reference per object found
in the current pack file (the pointer in the BlockList). This saves
considerable working set memory compared to the prior version that
made and held onto a new representation for every ObjectToPack.
Change-Id: I2c1f18cd6755643ac4c2cf1f23b5464ca9d91b22
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
Avoid looking at UNREACHABLE_GARBAGE for client have lines
Clients send a bunch of unknown objects to UploadPack on each round
of negotiation. Many of these are not known to the server, which
leads the implementation to be looking at indexes for garbage packs.
Disable examining the index of a garbage pack, allowing servers to
avoid reading them from disk during negotiation.
The effect of this change is the server will only ACK a have line
if the object was reachable during the last garbage collection,
or was recently added to the repository. For most repositories
there is no impact in this behavior change.
If a repository rewinds a branch, runs GC, and then resets the
branch back to where it was before, the now current tip is going to
be skipped by this change. A client that has the commit may wind up
getting a slightly larger data transfer from the server as an older
common ancestor will be chosen during negotiation. This is fixable
on the server side by running GC again to correct the layout of
objects in pack files.
Change-Id: Icd550359ef70fc7b701980f9b13d923fd13c744b
If a pack file has been marked invalid due to a prior IOException
accessing its contents, do not offer its bitmap index to callers.
The pack cannot be used so its bitmap should be off limits from
any reader trying to work from a bitmap.
Change-Id: Ia44e46558abdddee560bb184158b1e0af9437eee
A pack bitmap index is an additional index of compressed
bitmaps of the object graph. Furthermore, a logical API of the index
functionality is included, as it is expected to be used by the
PackWriter.
Compressed bitmaps are created using the javaewah library, which is a
word-aligned compressed variant of the Java bitset class based on
run-length encoding. The library only works with positive integer
values. Thus, the maximum number of ObjectIds in a pack file that
this index can currently support is limited to Integer.MAX_VALUE.
Every ObjectId is given an integer mapping. The integer is the
position of the ObjectId in the complete ObjectId list, sorted
by offset, for the pack file. That integer is what the bitmaps
use to reference the ObjectId. Currently, the new index format can
only be used with pack files that contain a complete closure of the
object graph e.g. the result of a garbage collection.
The index file includes four bitmaps for the Git object types i.e.
commits, trees, blobs, and tags. In addition, a collection of
bitmaps keyed by an ObjectId is also included. The bitmap for each entry
in the collection represents the full closure of ObjectIds reachable
from the keyed ObjectId (including the keyed ObjectId itself). The
bitmaps are further compressed by XORing the current bitmaps against
prior bitmaps in the index, and selecting the smallest representation.
The XOR'd bitmap and offset from the current entry to the position
of the bitmap to XOR against is the actual representation of the entry
in the index file. Each entry contains one byte, which is currently
used to note whether the bitmap should be blindly reused.
Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
Rename PackConstants to PackExt, a typed pack file extension.
PackConstants previously contained string values for the pack and pack
index extension. Change PackConstant to be PackExt, a typed wrapper
around the string pack file extension.
Change-Id: I86ac4db6da8f33aa42d6f37cfcc119e819444318
Remove getReverseIndexSize() from DfsPackDescription.
The method is used in only one location (DfsPackFile). Furthermore,
PackIndex already does an explicit computation of the size in
DfsPackFile. Simplify the DfsPackDescription by removing the method
and do the calculation similar to PackIndex.
Change-Id: I1391fdaaf7c2c3226d96ada1ae8647bcdff4794e
Use file extension with DfsPackDescription get/set file size.
Previously the size getters and setters had explicit methods for index
and pack. Update the api to be based on the file extension. This will
make it possible to support other extensions in the future, such as
the forthcoming bitmap extensions.
Change-Id: Iab9d4abe0af65b2fc71ad71ef1db0feb6b3b5c58
Update DfsObjDatabase API to open/write by pack extension.
Previously, the DfsObjDatabase had a hardcoded getPackFile() and
getPackIndex() methods which opens a .pack and .idx file, respectively.
A future change to add a bitmap index will need to be stored in a
parallel .bitmap file. Update the DfsObjDatabase to support opening and
writing of files for any pack extension.
Change-Id: I7c403b501e242096a2d435f6865d6025a9f86108
Change-Id: I458167739210214fa54c4b3d62fac5abc82f96f7
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Add the an event and listener for a dfs PackIndex being loaded.
The DfsPackFile will fire any static repository listeners on the event
just before the PackIndex is loaded.
Change-Id: Ie51098106bd5a1a32feae7d2dd068abf02b030ee
Expose class DfsReader and method DfsPackFile.hasObject() as public.
Applications may want to be able to inquire about some details of
the storage of a repository. Make this possible by exposing some
simple accessor methods.
Expose method DfsObjDatabase.clearCache() as protected, allowing
implementing subclasses to dump the cache if necessary, and force
it to reload on a future request.
Change-Id: Ic592c82d45ace9f2fa5f8d7e4bacfdce96dea969
Parsing the size from a packed object header was incorrectly computing
the total inflated length when the length exceeded the range of a Java
int. The next 7 bits of size information was shifted left as an int
using a shift of 25 bits, placing the higher bits of the size into the
sign position. When this size was extended to a long to be added to
the current size accumulator the size went negative, resulting in
NegativeArraySizeException being thrown.
Fix all places where this particular pattern of code is used to read a
pack size field, or a binary delta header, as they both use the same
variable length encoding scheme.
Change-Id: I04008728ed828f18202652c3d5401cf95a441d0a