Don't remove pack from pack list for problems which could be transient
If we hit a corrupt object or an invalid pack, remove the pack from the
pack list. Other IOExceptions could be transient, so we should not
remove the pack from the list in those cases; this avoids the problem
reported on the Gerrit list [1]. In the reported case it looks like the
pack was removed from the pack list, causing MissingObjectExceptions
that disappeared only when the server was restarted.
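As a rough illustration of this policy, the sketch below separates failures that show the pack itself is unusable from failures that may be transient. The exception classes and the PackListPolicy name are hypothetical stand-ins invented for this example; they are not JGit's actual types.

```java
import java.io.IOException;

/** Hypothetical stand-in for a "corrupt object" failure. */
class CorruptObjectError extends IOException {
    CorruptObjectError(String msg) {
        super(msg);
    }
}

/** Hypothetical stand-in for an "invalid pack" failure. */
class InvalidPackError extends IOException {
    InvalidPackError(String msg) {
        super(msg);
    }
}

class PackListPolicy {
    /**
     * Returns true only when the failure shows that the pack is unusable.
     * Any other IOException is treated as possibly transient, so the pack
     * stays in the list and can be tried again later.
     */
    static boolean shouldRemovePack(IOException e) {
        return e instanceof CorruptObjectError || e instanceof InvalidPackError;
    }
}
```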
[1] https://groups.google.com/forum/#!topic/repo-discuss/Qdmbl-YZ4NU
Change-Id: I331626110d54b190e46cddc2c40f29ddeb9613cd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Log reason for ignoring pack when IOException occurred
This should help to identify the root cause of the problem discussed on
the Gerrit list [1].
[1] https://groups.google.com/forum/#!topic/repo-discuss/Qdmbl-YZ4NU
Change-Id: I871f70e4bb1227952e1544b789013583b14e2b96
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Make sure modifications to config-param trustFolderStat are detected
ObjectDirectory.searchPacksAgain() should always read trustFolderStat
from the config and not rely on a cached value.
Change-Id: I90edbaae3c64eea0c9894d05acde4267991575ee
JGit's ObjectDirectory implements the optimization that it remembers the
pack folder's (.git/objects/pack) lastModified timestamp and doesn't
check for new packfiles in this folder if the lastModified attribute has
not changed.
In environments using NFS this can cause trouble. If multiple JGit
instances from multiple machines work on the same repository and one
instance creates a new ref and a new packfile (e.g. by doing a fetch)
then the other machines may detect the new ref but can't resolve the
referenced object because they don't detect that the pack folder has a
new packfile. That's because NFS may cache file/folder metadata for quite
a long time, so the pack folder's modification time is not updated
although a new packfile is there and could be read.
The new config parameter core.trustfolderstat controls this behaviour.
The default is true and JGit's behaviour is unchanged. But if this
parameter is set to false then JGit no longer trusts the pack directory's
lastModified timestamp. Instead it will always iterate through the
content of that folder to detect new packfiles.
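The decision described above can be sketched as follows. This is a minimal model under stated assumptions, not JGit's ObjectDirectory: the PackFolderScanner class is hypothetical, and the folder timestamp and directory listing are injected as functions so that a stale NFS timestamp can be simulated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of the scan decision; not JGit's ObjectDirectory. */
class PackFolderScanner {
    private final LongSupplier folderMtime;      // stands in for File.lastModified()
    private final Supplier<List<String>> lister; // stands in for listing objects/pack
    private final boolean trustFolderStat;
    private long lastSeenMtime = Long.MIN_VALUE;
    private List<String> known = new ArrayList<>();

    PackFolderScanner(LongSupplier folderMtime, Supplier<List<String>> lister,
            boolean trustFolderStat) {
        this.folderMtime = folderMtime;
        this.lister = lister;
        this.trustFolderStat = trustFolderStat;
    }

    List<String> packNames() {
        long mtime = folderMtime.getAsLong();
        if (trustFolderStat && mtime == lastSeenMtime) {
            return known; // timestamp unchanged: skip the directory listing
        }
        lastSeenMtime = mtime;
        known = new ArrayList<>(lister.get()); // always taken when stat is not trusted
        return known;
    }
}
```

With a timestamp that never changes, a scanner that trusts the folder stat keeps returning its first listing, while one that does not trust it picks up a packfile added later.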
Change-Id: Ie3b4e92933286aa9916070a22422e629b3147f54
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Some of our Windows users have reported sporadic file system access
problems related to the ObjectDirectory(Inserter) file deletion code in
combination with antivirus/firewall tools. For one of these users the
problem was fairly reproducible, and changing deletion to RETRY solved
it.
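A retrying delete can be sketched as below. The RetryingDelete class, the attempt count and the pause are assumptions made for illustration; they are not the values or the helper JGit uses.

```java
import java.io.File;

/** Hypothetical helper; attempt count and pause are illustrative only. */
class RetryingDelete {
    /** Returns true once the file no longer exists, trying up to maxAttempts times. */
    static boolean delete(File f, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (f.delete() || !f.exists()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    // Give a scanner that briefly holds the file time to release it.
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```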
Change-Id: I1e4001d5557fca693b7bac401268599467cb0c9e
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Fix MissingObjectException race in ObjectDirectory
Johannes Carlsson identified a race condition[1] that can lead to
spurious MissingObjectExceptions at read time. If two threads are
active inside of ObjectDirectory looking for a packed object and the
packList is currently the empty NO_PACKS list, thread A will find
no object and eventually consider tryAgain1(). If thread A is put
to sleep at this point and thread B also does not find the object
and loads the packs, then when thread A wakes up its tryAgain1()
returns false and the thread never considers the packs.
Rework the internal API of ObjectDirectory to keep a handle on the
exact PackList that was iterated by thread A, allowing it to always
retry walking through the packs if the new PackList is different.
This had some ripple effect into the CachedObjectDirectory and
the shared FileObjectDatabase interface. The new code should be
slightly easier to follow, especially from the perspective of the
CachedObjectDirectory trying to minimize the number of open system
calls it makes to files matching "$GIT_DIR/objects/??/?x{38}".
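The idea of keeping a handle on the list that was actually walked can be sketched as follows. This is a simplified model, not the real ObjectDirectory code: the PackSearch class is hypothetical, each pack is modelled as a set of object names, and the disk scan is an injected function.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical sketch; each "pack" is modelled as a set of object names. */
class PackSearch {
    static final List<Set<String>> NO_PACKS = Collections.emptyList();

    private final AtomicReference<List<Set<String>>> packList =
            new AtomicReference<>(NO_PACKS);
    private final Supplier<List<Set<String>>> diskScanner; // stands in for scanning objects/pack

    PackSearch(Supplier<List<Set<String>>> diskScanner) {
        this.diskScanner = diskScanner;
    }

    boolean hasPackedObject(String name) {
        List<Set<String>> searched;
        do {
            searched = packList.get(); // remember exactly which list was walked
            for (Set<String> pack : searched) {
                if (pack.contains(name)) {
                    return true;
                }
            }
        } while (searchPacksAgain(searched));
        return false;
    }

    private boolean searchPacksAgain(List<Set<String>> searched) {
        if (packList.get() != searched) {
            return true; // another thread replaced the list: walk the new one
        }
        List<Set<String>> fresh = diskScanner.get();
        if (fresh.equals(searched)) {
            return false; // nothing new on disk
        }
        packList.compareAndSet(searched, fresh);
        return true;
    }
}
```

Because the retry decision compares against the list this thread iterated, a thread that walked NO_PACKS retries even when some other thread has already loaded the packs in the meantime.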
[1] http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg02401.html
Change-Id: I9a1c9d6ad6cb38404b7b9178167b714077561353
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
The bitmap code in PackWriter knows exactly when to use a pack as
a "cached pack". It enables cached pack usage only when the pack
has a bitmap and its entire closure of objects needs to be sent.
This is a much simpler code path to maintain, and JGit actually
has a way to write the necessary index.
Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881
A pack bitmap index is an additional index of compressed
bitmaps of the object graph. Furthermore, a logical API of the index
functionality is included, as it is expected to be used by the
PackWriter.
Compressed bitmaps are created using the javaewah library, which is a
word-aligned compressed variant of the Java bitset class based on
run-length encoding. The library only works with positive integer
values. Thus, the maximum number of ObjectIds in a pack file that
this index can currently support is limited to Integer.MAX_VALUE.
Every ObjectId is given an integer mapping. The integer is the
position of the ObjectId in the complete ObjectId list, sorted
by offset, for the pack file. That integer is what the bitmaps
use to reference the ObjectId. Currently, the new index format can
only be used with pack files that contain a complete closure of the
object graph e.g. the result of a garbage collection.
The index file includes four bitmaps for the Git object types i.e.
commits, trees, blobs, and tags. In addition, a collection of
bitmaps keyed by an ObjectId is also included. The bitmap for each entry
in the collection represents the full closure of ObjectIds reachable
from the keyed ObjectId (including the keyed ObjectId itself). The
bitmaps are further compressed by XORing the current bitmaps against
prior bitmaps in the index, and selecting the smallest representation.
The XOR'd bitmap and offset from the current entry to the position
of the bitmap to XOR against is the actual representation of the entry
in the index file. Each entry contains one byte, which is currently
used to note whether the bitmap should be blindly reused.
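The XOR step can be illustrated with a small sketch. It rests on several assumptions: java.util.BitSet stands in for the javaewah compressed bitmap, the number of set bits is used as a crude proxy for compressed size, and only the immediately preceding bitmap is considered as an XOR base. The XorBitmapEntry class is hypothetical.

```java
import java.util.BitSet;

/** Hypothetical entry: a bitmap stored either verbatim or XORed against a prior one. */
class XorBitmapEntry {
    final int xorOffset; // 0 = stored verbatim, 1 = XORed against the previous bitmap
    final BitSet stored;

    XorBitmapEntry(int xorOffset, BitSet stored) {
        this.xorOffset = xorOffset;
        this.stored = stored;
    }

    /** Picks the smaller of the raw bitmap and its XOR against the previous bitmap. */
    static XorBitmapEntry encode(BitSet current, BitSet previous) {
        if (previous != null) {
            BitSet xor = (BitSet) current.clone();
            xor.xor(previous);
            // Fewer set bits is used here as a stand-in for "smaller representation".
            if (xor.cardinality() < current.cardinality()) {
                return new XorBitmapEntry(1, xor);
            }
        }
        return new XorBitmapEntry(0, (BitSet) current.clone());
    }

    /** Recovers the original bitmap; previous is ignored for verbatim entries. */
    BitSet decode(BitSet previous) {
        BitSet out = (BitSet) stored.clone();
        if (xorOffset != 0) {
            out.xor(previous);
        }
        return out;
    }
}
```

Two commits that are close in history reach almost the same objects, so their XOR has very few set bits, which is why the XORed form tends to be the smaller one.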
Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
Include supported extensions in PackFile constructor.
Previously a PackFile class was assumed to only support a .pack and .idx
file. Update the constructor to enumerate the supported extensions for
the pack file. This will allow the bitmap code to only be executed if
the bitmap extension file is known to exist.
Change-Id: Ie59041dffec5f60d7ea2771026ffd945106bd4bf
Fix concurrent creation of fan-out object directories
If multiple threads attempted to insert loose objects into the same new
fan-out directory, the creation of that directory was subject to a race
condition that could lead to an unnecessary IOException being thrown -
because an inserter could not 'create' a directory that had just been
generated by a different thread. All we require is that the directory
does indeed *exist*, so not being able to _create_ it is not actually a
fatal problem. Setting 'skipExisting' to 'true' on the call to mkdir()
fixes the issue.
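The effect of 'skipExisting' can be sketched with plain java.io.File. The FanOutDirs class below is a hypothetical stand-in written for this example, and it throws an unchecked wrapper only to keep the sketch short.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a mkdir helper with a skipExisting flag. */
class FanOutDirs {
    static void mkdir(File dir, boolean skipExisting) {
        if (dir.mkdir()) {
            return; // we created it
        }
        if (skipExisting && dir.isDirectory()) {
            return; // another thread created it first, which is just as good
        }
        throw new UncheckedIOException(
                new IOException("Creating directory " + dir + " failed"));
    }
}
```

Calling mkdir(dir, true) from several threads is safe because losing the race still leaves the directory in place; only mkdir(dir, false) reports the already-existing directory as a failure.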
I found this issue as a real world occurrence while working on The BFG
Repo Cleaner (https://github.com/rtyley/bfg-repo-cleaner), a tool which
concurrently performs a lot of object creation.
In order to demonstrate the problem here I've added a small test case
which reliably reproduces the issue on the few different hardware
systems I've tried. The error thrown when the race-condition arises is
this:
  java.io.IOException: Creating directory /home/roberto/repo.git/objects/e6 failed
    at org.eclipse.jgit.util.FileUtils.mkdir(FileUtils.java:182)
    at org.eclipse.jgit.storage.file.ObjectDirectory.insertUnpackedObject(ObjectDirectory.java:590)
    at org.eclipse.jgit.storage.file.ObjectDirectoryInserter.insertOneObject(ObjectDirectoryInserter.java:113)
    at org.eclipse.jgit.storage.file.ObjectDirectoryInserter.insert(ObjectDirectoryInserter.java:91)
    at org.eclipse.jgit.lib.ObjectInserter.insert(ObjectInserter.java:329)
Change-Id: I88eac49bc600c56ba9ad290e6133d8a7113125ab
Remove packIndex field from FileObjDatabase openPack method.
Previously, the FileObjDatabase required both the pack file path and
index file path to be passed to openPack(). A future change to add
a bitmap index will add a .bitmap file parallel to the pack file
(similar to the .idx file). Update the PackFile to support
automatically loading pack index extensions based on the pack file
path.
Change-Id: Ifc8fc3e57f4afa177ba5a88df87334dbfa799f01
A few classes such as Constants are marked with @SuppressWarnings, as are
toString() methods with many literals, but otherwise $NLS-n$ is used for
strings containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
StartGenerator now processes .git/shallow to have the
RevWalk stop for shallow commits.
See RevWalkShallowTest for tests.
Bug: 394543
CQ: 6908
Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
The most expensive part of packing a repository for transport to
another system is enumerating all of the objects in the repository.
Once this gets to the size of the linux-2.6 repository (1.8 million
objects), enumeration can take several CPU minutes and costs a lot
of temporary working set memory.
Teach PackWriter to efficiently reuse an existing "cached pack"
by answering a clone request with a thin pack followed by a larger
cached pack appended to the end. This requires the repository
owner to first construct the cached pack by hand, and record the
tip commits inside of $GIT_DIR/objects/info/cached-packs:
  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done
  (echo "+ $root";
   for n in $names; do echo "P $n"; done;
   echo) >>objects/info/cached-packs
  git repack -a -d
When a clone request needs to include $root, the corresponding
cached pack will be copied as-is, rather than enumerating all of
the objects that are reachable from $root.
For a linux-2.6 kernel repository that should be about 376 MiB,
the above process creates two packs of 368 MiB and 38 MiB[1].
This is a local disk usage increase of ~26 MiB, due to reduced
delta compression between the large cached pack and the smaller
recent activity pack. The overhead is similar to 1 full copy of
the compressed project sources.
With this cached pack in hand, JGit daemon completes a clone request
in 1m17s less time, but a slightly larger data transfer (+2.39 MiB):
Before:
remote: Counting objects: 1861830, done
remote: Finding sources: 100% (1861830/1861830)
remote: Getting sizes: 100% (88243/88243)
remote: Compressing objects: 100% (88184/88184)
Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
Resolving deltas: 100% (1564621/1564621), done.
real 3m19.005s
After:
remote: Counting objects: 1601, done
remote: Counting objects: 1828460, done
remote: Finding sources: 100% (50475/50475)
remote: Getting sizes: 100% (18843/18843)
remote: Compressing objects: 100% (7585/7585)
remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
Resolving deltas: 100% (1559477/1559477), done.
real 2m2.938s
Repository owners can periodically refresh their cached packs by
repacking their repository, folding all newer objects into a larger
cached pack. Since repacking is already considered to be a normal
Git maintenance activity, this isn't a very big burden.
[1] In this test $root was set back about two weeks.
Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
[findbugs] Do not ignore exceptional return value of mkdir
java.io.File.mkdir() and mkdirs() report failure as an exceptional
return value false. Fix the code which silently ignored this
exceptional return value.
Change-Id: I41244f4b9d66176e68e2c07e2329cf08492f8619
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix tests broken by fix for adding files in a network share
The change Ie0350e032a97e0d09626d6143c5c692873a5f6a2 was not
done properly. The renamed file was not write protected, and
this broke a test.
Bug: 335388
Change-Id: I41b2235b7677bc5fddc70dda2a56cdd2cb53ce5d
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
We cannot always rename read-only files on network shares,
so rename the temp file for a new loose object first, and
then set it as read-only.
Bug: 335388
Change-Id: Ie0350e032a97e0d09626d6143c5c692873a5f6a2
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Refactor IndexPack to not require local filesystem
By moving the logic that parses a pack stream from the network (or
a bundle) into a type that can be constructed by an ObjectInserter,
repository implementations have a chance to inject their own logic
for storing object data received into the destination repository.
The API isn't completely generic yet; there are still quite a few
assumptions that the PackParser subclass is storing the data onto
the local filesystem as a single file. But it's about the simplest
split of IndexPack I can come up with without completely ripping
the code apart.
Change-Id: I5b167c9cc6d7a7c56d0197c62c0fd0036a83ec6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Pulling the last modified checking logic out of ObjectDirectory
makes it possible to reuse this code for other files, such as
the $GIT_DIR/config or $GIT_DIR/packed-refs files.
Change-Id: If2f27a89fc3b7adde7e65ff40bbca5d55b98b772
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
java.io.File.delete() reports failure as an exceptional
return value false. Fix the code which silently ignored
this exceptional return value. Also remove some duplicate
deletion helper methods.
Change-Id: I80ed20ca1f07a2bc6e779957a4ad0c713789c5be
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Make ObjectDirectory getPacks() work the first time
If an object hasn't been accessed yet the pack list for a repository
may not have been scanned from disk. If an application (e.g. the dumb
transport servlet support code) asks for the pack list for an
ObjectDirectory, we should load it immediately.
Change-Id: I93d7b1bca422d905948e8e83b2afa83c8894a68b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Update CachedObjectDirectory when inserting objects
If an ObjectInserter is created from a CachedObjectDirectory, we need
to ensure the cache is updated whenever a new loose object is actually
added to the loose objects directory, otherwise a future read from an
ObjectReader on the CachedObjectDirectory might not be able to open
the newly created object.
We mostly had the infrastructure in place to implement this due to the
injection of unpacked large deltas, but we didn't have a way to pass
the ObjectId from ObjectDirectoryInserter to CachedObjectDirectory,
because the inserter was using the underlying ObjectDirectory and not
the CachedObjectDirectory. Redirecting to CachedObjectDirectory
ensures the cache is updated.
Change-Id: I1f7bdfacc7ad77ebdb885f655e549cc570652225
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When running IndexPack we use a CachedObjectDirectory, which
knows what objects are loose and tries to avoid stat(2) calls for
objects that do not exist in the repository, as stat(2) on Win32
is very slow.
However large delta objects found in a pack file are expanded into
a loose object, in order to avoid costly delta chain processing
when that object is used as a base for another delta.
If this expand occurs while working with the CachedObjectDirectory,
we need to update the cached directory data to include this new
object, otherwise it won't be available when we try to open it
during the object verify phase.
Bug: 324868
Change-Id: Idf0c76d4849d69aa415ead32e46a435622395d68
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of spooling large delta bases into temporary files and then
immediately deleting them afterwards, spool the large delta out to
a normal loose object. Later any requests for that large delta can
be answered by reading from the loose object, which is much easier
to stream efficiently for readers.
Since the object is now duplicated, once in the pack as a delta and
again as a loose object, any future prune-packed will automatically
delete the loose object variant, releasing the wasted disk space.
As prune-packed is run automatically during either repack or gc, and
gc --auto triggers automatically based on the number of loose objects,
we get automatic cache management for free. Large objects that were
unpacked will be periodically cleared out, and will simply be restored
later if they are needed again.
After a short offline discussion with Junio Hamano today, we may want
to propose a change to prune-packed to hold onto larger loose objects
which also exist in pack files as deltas, if the loose object was
recently accessed or modified in the last 2 days.
Change-Id: I3668a3967c807010f48cd69f994dcbaaf582337c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Remember loose objects and fast-track their lookup
Recently created objects are usually what branches point to, and
are usually written out as loose objects. But due to the high cost
of asking the operating system if a file exists, these are the last
thing that ObjectDirectory examines when looking for an object by
its ObjectId.
Caching recently seen loose objects permits the opening code to
jump directly to the loose object, accelerating lookup for branch
heads that are accessed often.
To avoid exploding the cache, it is limited to approximately 2048
entries. When more ids are added, the table is simply cleared
and reset in size.
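The clear-when-full policy can be sketched as below. This is a deliberately simple model using a HashSet of names; the RecentLooseIds class is hypothetical and the real cache is keyed by ObjectId rather than String.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical bounded cache of recently seen loose object names. */
class RecentLooseIds {
    private final int limit;
    private final Set<String> ids = new HashSet<>();

    RecentLooseIds(int limit) {
        this.limit = limit;
    }

    void add(String id) {
        if (!ids.contains(id) && ids.size() >= limit) {
            ids.clear(); // no eviction order to maintain: just start over
        }
        ids.add(id);
    }

    boolean contains(String id) {
        return ids.contains(id);
    }

    int size() {
        return ids.size();
    }
}
```

Clearing the whole table is cruder than LRU eviction, but it needs no bookkeeping per lookup, and frequently used branch heads are re-added the next time they are read.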
Change-Id: I18f483217412b102f754ffd496c87061d592e535
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Move ObjectDirectory streaming limit to WindowCacheConfig
IDEs like Eclipse offer up the settings in WindowCacheConfig to the
user as a global set of options that are configured for the entire
JVM process, not per-repository, as the cache is shared across the
entire JVM. The limit on how much we are willing to allocate for
an object buffer is similar to the limit on how much we can use for
data caches: allocating that much space impacts the entire JVM and
not just a single repository, so it should be a global limit.
Change-Id: I22eafb3e223bf8dea57ece82cd5df8bfe5badebc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix ObjectDirectory abbreviation resolution to notice new packs
If we can't resolve an abbreviation, it might be because there is
a new pack file we haven't picked up yet. Try scanning the packs
again and recheck each pack if there were differences from the last
scan we did.
Because of this, we don't have to open a pack during the test where
we generate a pack on the fly. We'll miss on the first loop during
which the PackList is the NO_PACKS magic initialization constant,
and pick up the newly created index during this retry logic.
Change-Id: I7b97efb29a695ee60c90818be380f7ea23ad13a3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectReader implementations are now responsible for creating the
unique abbreviation of an ObjectId, or for resolving an abbreviation
back to its full form. In this latter case the reader can offer up
multiple candidates to the caller, who may be able to disambiguate
them based on context.
Repository.resolve() doesn't take multiple candidates into account
right now, but it could in the future by looking for a remaining
^0 or ^{commit} suffix and take an expansion if there is only one
commit that matches the input abbreviation. It could also use
the distance from an annotated tag to resolve "tag-NNN-gcommit"
style strings that are often output by `git describe`.
Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is an informational function used by PackWriter to help it
better organize objects for delta compression. Storage systems
can implement it to provide more detailed size information,
or they can simply rely on the default behavior that uses the
ObjectLoader obtained from open.
For local file storage, we can obtain this information faster
through specialized routines that parse a pack object header.
Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Use core.streamFileThreshold to set our streaming limit
We default this to 1 MiB for now, but we allow users to modify
it through the Repository's configuration file to be a different
value. A new repository listener is used to identify when the
setting has been updated and trigger a reconfiguration of any
active ObjectReaders.
To prevent a horrible explosion we cap core.streamFileThreshold
at no more than 1/4 of the maximum JVM heap size. We do this
because we need at least 2 byte arrays equal in size to the
stream threshold for the worst case delta inflation scenario,
and our host application probably also needs some amount of the
heap for their working set size.
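The cap amounts to a single comparison, sketched here. The StreamThreshold class and its method names are hypothetical; only the 1 MiB default and the one-quarter-of-heap cap come from the text above.

```java
/** Hypothetical helper showing the cap on the streaming threshold. */
class StreamThreshold {
    static final long DEFAULT_BYTES = 1L << 20; // 1 MiB default

    /** Limits the configured threshold to a quarter of the given maximum heap. */
    static long effective(long configuredBytes, long maxHeapBytes) {
        return Math.min(configuredBytes, maxHeapBytes / 4);
    }

    /** Same as above, using the running JVM's maximum heap. */
    static long effective(long configuredBytes) {
        return effective(configuredBytes, Runtime.getRuntime().maxMemory());
    }
}
```

With a 64 MiB heap, for example, a configured value of 1 GiB is reduced to 16 MiB, so the two worst-case buffers together take at most half of the heap.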
Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Replace PackedObjectLoader with ObjectLoader.SmallObject
The class is identical, but ObjectLoader.SmallObject is part of our
public API for storage implementations to build on top of.
Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Big loose objects can now be streamed if they are over the large
object size threshold. This prevents the JVM heap from exploding
with a very large byte array to hold the slurped file, and then
again with its uncompressed copy.
We may have slightly slowed down the simple case for small
loose objects, as the loader no longer slurps the entire thing
and decompresses in memory. To try and keep good performance
for the very common small objects that are below 8 KiB in size,
buffers are set to 8 KiB, causing the reader to slurp most of the
file anyway. However the data has to be copied at least once,
from the BufferedInputStream into the InflaterInputStream.
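The stream stacking described above can be sketched with the JDK's zlib classes. The LooseObjectStreaming class is hypothetical and only handles the standard loose object layout (a "type size" header, a NUL byte, then the content, all zlib-deflated); it counts the content bytes instead of returning them, to show that only one buffer is held at a time.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch of streaming a standard-format loose object. */
class LooseObjectStreaming {
    static final int BUFFER_SIZE = 8192; // 8 KiB

    /** Builds a zlib-deflated "blob <size>\0<content>" byte sequence for the demo. */
    static byte[] deflateBlob(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(out)) {
            z.write(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            z.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Inflates incrementally and returns the content length. */
    static long countContentBytes(InputStream compressed) {
        try (InputStream in = new InflaterInputStream(
                new BufferedInputStream(compressed, BUFFER_SIZE))) {
            int b;
            do {
                b = in.read(); // skip the header through its NUL terminator
            } while (b > 0);
            byte[] buf = new byte[BUFFER_SIZE];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```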
New unit tests are supplied to get nearly 100% code coverage on the
unpacked code paths, for both standard and pack style loose objects.
We tested a fair chunk of the code elsewhere, but these new tests
are better isolated to the specific branches in the code path.
Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Similar to what we did with the file code, move the pack writer
into its own package so the related classes and their package
private methods are hidden from the rest of the library.
Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We no longer need an ObjectLoader to be lazy and try to delay
the materialization of the object content. That was done only
to support PackWriter searching for a good reuse candidate.
Instead, simplify the code base by doing the materialization
immediately when the loader asks for it, because any caller
asking for the loader is going to need the content.
Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Tighten up local packed object representation during packing
Rather than making a loader, and then using that to fill the object
representation, parse the header and set up our data directly.
This saves some time, as we don't waste cycles on information we
won't use right now.
The weight computed for a representation is now its actual stored
size in the pack file, rather than its inflated size. This accounts
for changes made when the compression level is modified on the
repository. It is however more costly to determine the weight of
the object, since we have to find its length in the pack. To try and
recover that cost we now cache the length as part of our ObjectToPack
record, so it doesn't have to be found during the output phase.
A LocalObjectToPack now costs us:

                      (32 bit)     (64 bit)
  vm header:           8 bytes      8 bytes
  ObjectId:           20 bytes     20 bytes
  PackedObjectInfo:   12 bytes     12 bytes
  ObjectToPack:        8 bytes     12 bytes
  LocalOTP:           20 bytes     24 bytes
                     -----------   ---------
                      68 bytes     74 bytes
Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Move FileRepository to storage.file.FileRepository
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.
Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.
Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The new selection implementation uses a public API on the
ObjectReader, allowing the storage library to enumerate its
candidates and select the best one for this packer without
needing to build a temporary list of the candidates first.
Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The WindowCache is an implementation detail of PackFile and how it's
used by ObjectDirectory. Let's start to hide it and replace the public
API with a more generic concept, ObjectReader.
Because PackedObjectLoader is also considered a private detail of
PackFile, we have to make PackWriter temporarily dependent upon the
WindowCursor and thus FileRepository and ObjectDirectory in order to
just start the refactoring. In later changes we will clean up the
APIs more, exposing sufficient support to PackWriter without needing
the file specific implementation details.
Change-Id: I676be12b57f3534f1285854ee5de1aa483895398
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Not every object storage system will have the concept of alternate
object databases to search, and even if they do, they may not have
the notion of fast-access / slow-access split like we do within
the ObjectDirectory code for pack files and loose objects.
Push all of that down below the generic API so that it is a hidden
detail of the ObjectDirectory and its related supporting classes.
Change-Id: I54bc1ca5ff2ac94dfffad1f9a9dad7af202b9523
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Refactor object writing responsibilities to ObjectDatabase
The ObjectInserter API permits ObjectDatabase implementations to
control their own object insertion behavior, rather than forcing
it to always be a new loose file created in the local filesystem.
Inserted objects can also be queued and written asynchronously to
the main application, such as by appending into a pack file that
is later closed and added to the repository.
This change also starts to open the door to non-file based object
storage, such as an in-memory HashMap for unit testing, or a more
complex system built on top of a distributed hash table.
To help existing application code port to the newer interface we
are keeping ObjectWriter as a delegation wrapper to the new API.
Each ObjectWriter instance holds a reference to an ObjectInserter
for the Repository's top-level ObjectDatabase, and it flushes and
releases that instance on each object processed.
Change-Id: I413224fb95563e7330c82748deb0aada4e0d6ace
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
On Windows, FS_Win32_Cygwin has been used if a Cygwin Git installation
is present in the PATH. Assuming that the user works with the Cygwin
Git installation may result in unnecessary overhead if he actually
does not.
Applications built on top of jgit may have more knowledge on the
actually used Git client (Cygwin or not) and hence should be able to
configure which FS to use accordingly.
Change-Id: Ifc4278078b298781d55cf5421e9647a21fa5db24
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
If a concurrent thread picks up a newly created PackFile and adds
it to the pack list before the IndexPack thread itself can insert
the item onto the front of the list, do nothing and use the item
that was picked up by that other concurrent scanning thread.
This avoids a potential condition where the same pack exists in
memory twice, which causes confusion later during a rescan of the
directory because we don't know exactly which PackFile instance
should be retained into the new list, and which should be discarded.
We can stop searching through the old pack list as soon as the
sort function declares that the item to insert should be before
the item already in the list. Because the list is always sorted
by modification time (in seconds), we should never encounter a
case where the pack is positioned at the wrong spot in the list.
This early break out still permits an efficient implementation of
the common case, inserting a new pack at the head of the list.
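The insertion rule can be sketched as follows. This is a simplified model, not JGit's insertPack: the PackEntry and PackListInsert classes are hypothetical, a pack is reduced to a name and a modification time, and the list is assumed to be sorted newest first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal pack record: a name and a modification time in seconds. */
class PackEntry {
    final String name;
    final long mtimeSeconds;

    PackEntry(String name, long mtimeSeconds) {
        this.name = name;
        this.mtimeSeconds = mtimeSeconds;
    }
}

class PackListInsert {
    /** Returns the old list if the pack is already present, else a new list containing it. */
    static List<PackEntry> insert(List<PackEntry> old, PackEntry added) {
        int pos = 0;
        for (PackEntry e : old) {
            if (e.name.equals(added.name)) {
                return old; // a concurrent scan already picked this pack up
            }
            if (added.mtimeSeconds > e.mtimeSeconds) {
                break; // the rest of the list is older, so no duplicate can follow
            }
            pos++;
        }
        List<PackEntry> updated = new ArrayList<>(old);
        updated.add(pos, added); // position 0 in the common case of a brand new pack
        return updated;
    }
}
```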
Change-Id: Ice4459bbd4ee9487078aff5257893883d04f05fb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Favor earlier PackFile instances over later duplicates
There is a potential race condition during insertPack that can lead
to us having the same pack file open twice in the same directory.
A different thread can miss an object on disk, and trigger a scan
of the directory, and notice the pack that was put in by IndexPack.
So the pack winds up in the newly created PackList.
The IndexPack thread then wakes up and finishes its insertPack by
creating a new PackFile and inserting it into position 0 of the list.
We now have the same pack listed twice.
Readers will favor the earlier PackFile instance, because it's the
first one they come across as they iterate through the list.
Keep that earlier one when we scan the pack directory again, as
this will avoid needing to purge out all of the windows that may
have been cached.
Of course we should also fix that race condition, but this block
was taking the wrong resolution if this error ever shows up, so
let's first fix the block to use a more sane resolution.
Change-Id: I0d339b9fd1dd8012e8fe5a564b893c0f69109e28
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Added caching for loose object lookup during pack indexing
On Windows systems, file system lookup is a slow operation, so
checking whether each object exists during indexing (after receiving
the pack) could take a significant time. This patch introduces
CachedObjectDirectory that pre-caches lookup results.
Bug: 300397
Change-Id: I471b93f9bb3ee173eb37cae1d75e9e4eb49985e7
Signed-off-by: Constantine Plotnikov <constantine.plotnikov@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This exposes the list of known packs, allowing callers to list them
into a context like the objects/info/packs file.
Change-Id: I0b889564bd176836ff5c77ba310c6d229409dcd5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This makes the jgit command line behave like the C Git implementation
in this respect.
These variables are not recognized in the core, though we add support
to do the overrides there. Hence other users of the JGit library, like
the Eclipse plugin and others, will not be affected.
GIT_DIR
  The location of the ".git" directory.
GIT_WORK_TREE
  The location of the work tree.
GIT_INDEX_FILE
  The location of the index file.
GIT_CEILING_DIRECTORIES
  A colon (semicolon on Windows) separated list of paths that
  JGit will not cross when looking for the .git directory.
GIT_OBJECT_DIRECTORY
  The location of the objects directory under which objects are
  stored.
GIT_ALTERNATE_OBJECT_DIRECTORIES
  A colon (semicolon on Windows) separated list of object directories
  to search for objects.
In addition to these we support the core.worktree config setting when
the git directory is set deliberately instead of being found.
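Splitting the list-valued variables can be sketched with the platform path separator. The EnvPaths class below is a hypothetical helper for illustration and is not how the jgit command line necessarily parses them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for colon (semicolon on Windows) separated path lists. */
class EnvPaths {
    static List<File> splitPathList(String value) {
        List<File> result = new ArrayList<>();
        if (value == null) {
            return result; // variable not set
        }
        // File.pathSeparator is ":" on POSIX systems and ";" on Windows.
        for (String part : value.split(Pattern.quote(File.pathSeparator))) {
            if (!part.isEmpty()) {
                result.add(new File(part));
            }
        }
        return result;
    }
}
```

A caller could pass it System.getenv("GIT_ALTERNATE_OBJECT_DIRECTORIES") or System.getenv("GIT_CEILING_DIRECTORIES"); an unset variable simply yields an empty list.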
Change-Id: I2b9bceb13c0f66b25e9e3cefd2e01534a286e04c
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>