If a pack isn't found on disk, remove it from the pack list
If accessing a pack throws FileNotFoundException, the pack was deleted
and we need to remove it from the pack list. This can be caused, e.g.,
by git gc.
Change-Id: I5d10f87f364dadbbdbfb61b6b2cbdee9c7457f3d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Don't remove pack from pack list for problems which could be transient
If we hit a corrupt object or an invalid pack, remove the pack from the
pack list. Other IOExceptions could be transient, hence we should not
remove the pack from the list, to avoid the problem reported on the
Gerrit list [1]. It looks like in the reported case the pack was removed
from the pack list, causing MissingObjectExceptions which disappeared
when the server was restarted.
[1] https://groups.google.com/forum/#!topic/repo-discuss/Qdmbl-YZ4NU
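The resulting policy (drop a pack on FileNotFoundException or corruption, keep it on other IOExceptions) can be sketched in plain Java. This is a hypothetical illustration, not ObjectDirectory's code: CorruptPackException, PackListSketch and handle() are made-up names.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for "corrupt object or invalid pack" errors.
class CorruptPackException extends IOException {
    CorruptPackException(String message) {
        super(message);
    }
}

class PackListSketch {
    final List<String> packs = new ArrayList<>();

    /**
     * Called after reading {@code pack} failed with {@code e}.
     * Returns true if the pack was removed from the pack list.
     */
    boolean handle(String pack, IOException e) {
        if (e instanceof FileNotFoundException          // deleted, e.g. by git gc
                || e instanceof CorruptPackException) { // unusable for good
            return packs.remove(pack);
        }
        // Other IOExceptions may be transient: keep the pack so that its
        // objects can still be found on the next attempt (a real
        // implementation would log why the pack was ignored this time).
        return false;
    }
}
```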
Change-Id: I331626110d54b190e46cddc2c40f29ddeb9613cd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Log reason for ignoring pack when IOException occurred
This should help to identify the root cause of the problem discussed on
the Gerrit list [1].
[1] https://groups.google.com/forum/#!topic/repo-discuss/Qdmbl-YZ4NU
Change-Id: I871f70e4bb1227952e1544b789013583b14e2b96
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Allow explicit configuration of git directory in InitCommand
Native git's "init" command allows specifying the location of the .git
folder with the option "--separate-git-dir". This allows, for example,
setting up repositories with a non-standard layout, e.g. the .git folder
under /repos/a.git and the worktree under /home/git/a. Both directories
contain pointers to the other side: /repos/a.git/config contains
core.worktree=/home/git/a, and /home/git/a/.git is a file containing
"gitdir: /repos/a.git". This commit adds that option to InitCommand.
This feature is needed to support the new submodule layout where the
.git folder of the submodules is under .git/modules/<submodule>.
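As a rough illustration of the pointer file described above, the sketch below writes and resolves a "gitdir:" file with java.nio. It is not InitCommand's implementation; GitFileSketch, writeGitFile() and readGitDir() are hypothetical names, and only the "gitdir: <path>" line format comes from the text above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class GitFileSketch {
    private static final String PREFIX = "gitdir: ";

    /** Writes worktree/.git as a file pointing at the separate git directory. */
    static void writeGitFile(Path worktree, Path gitDir) throws IOException {
        String line = PREFIX + gitDir.toAbsolutePath() + "\n";
        Files.write(worktree.resolve(".git"), line.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads worktree/.git and returns the directory it points to. */
    static Path readGitDir(Path worktree) throws IOException {
        String content = new String(
                Files.readAllBytes(worktree.resolve(".git")),
                StandardCharsets.UTF_8).trim();
        if (!content.startsWith(PREFIX)) {
            throw new IOException("not a gitfile: " + content);
        }
        // Relative targets are resolved against the worktree; absolute ones
        // are returned as-is by Path.resolve().
        return worktree.resolve(content.substring(PREFIX.length())).normalize();
    }
}
```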
Change-Id: I0208f643808bf8f28e2c979d6e33662607775f1f
Make sure modifications to config-param trustFolderStat are detected
ObjectDirectory.searchPacksAgain() should always read trustFolderStat
from the config and not rely on a cached value.
Change-Id: I90edbaae3c64eea0c9894d05acde4267991575ee
JGit's ObjectDirectory implements the optimization that it remembers the
pack folder's (.git/objects/pack) lastModified timestamp and doesn't
check for new packfiles in this folder if the lastModified attribute has
not changed.
In environments using NFS this can cause trouble. If multiple JGit
instances from multiple machines work on the same repository and one
instance creates a new ref and a new packfile (e.g. by doing a fetch)
then the other machines may detect the new ref but can't resolve the
referenced object because they don't detect that the pack folder has a
new packfile. That's because NFS may cache file/folder metadata for
quite a long time, and the pack folder's modification time is not
updated although a new packfile is there and could be read.
The new config parameter core.trustfolderstat controls this behaviour.
The default is true, and JGit's behaviour is unchanged. But if this
parameter is set to false, JGit no longer trusts the pack directory's
lastModified timestamp. Instead it always iterates through the content
of that folder to detect new packfiles.
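A simplified model of this optimization and of the new switch is sketched below. It is not ObjectDirectory's code; PackDirScanner and its methods are hypothetical, and it relies only on java.io.File.lastModified() and listFiles().

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class PackDirScanner {
    private final File packDir;
    private final boolean trustFolderStat; // models core.trustfolderstat
    private long lastModified = -1;        // cached timestamp of packDir
    private List<String> packs = Collections.emptyList();

    PackDirScanner(File packDir, boolean trustFolderStat) {
        this.packDir = packDir;
        this.trustFolderStat = trustFolderStat;
    }

    /** Returns the known *.pack file names, rescanning only when needed. */
    List<String> packs() {
        long now = packDir.lastModified();
        if (trustFolderStat && now == lastModified) {
            return packs; // trust the folder's mtime: assume nothing changed
        }
        lastModified = now;
        File[] files = packDir.listFiles((dir, name) -> name.endsWith(".pack"));
        packs = files == null
                ? Collections.<String>emptyList()
                : Arrays.stream(files).map(File::getName).sorted()
                        .collect(Collectors.toList());
        return packs;
    }
}
```

With trustFolderStat set to true, a stale directory timestamp (as NFS may report) hides a newly added pack until the timestamp changes; with false, every call pays for a directory listing.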
Change-Id: Ie3b4e92933286aa9916070a22422e629b3147f54
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
JGit should offer the possibility to do a garbage collection in
"aggressive" mode. In this mode garbage collection optimizes the
repository more aggressively at the expense of taking much more time.
Technically an aggressive-mode garbage collection differs from a
non-aggressive one by:
- not reusing packed objects found in old packs; every object is recompressed
- the configuration pack.window is set to 250 (the default is 10)
- the configuration pack.depth is set to 250 (the default is 50)
The associated classes in org.eclipse.jgit.api and the command line
command in org.eclipse.jgit.pgm expose this new option.
The configuration parameters gc.aggressiveDepth and gc.aggressiveWindow
have been introduced to configure this feature.
Bug: 444332
Change-Id: I024101f2810acf6be13ce144c9893d98f5c4ae76
Windows: Hide the .git directory if hidedotfiles is set to non-false
Other .git files are not hidden with this patch
Change-Id: Idf63ca08d08f3a77c33f5848d02074f8d6a75758
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
According to http://stackoverflow.com/a/8381338, the maximum array
size is not Integer.MAX_VALUE, but Integer.MAX_VALUE - 8
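A common use of such a limit is capping the growth of a buffer. The sketch below is illustrative only; MAX_ARRAY_SIZE and newCapacity() are hypothetical names, not JGit API.

```java
class ArrayGrowth {
    // Some JVMs reserve header words in arrays, so requesting
    // Integer.MAX_VALUE elements can fail even with enough heap.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /** Doubles the current capacity (at least to required), capped at MAX_ARRAY_SIZE. */
    static int newCapacity(int current, int required) {
        if (required < 0 || required > MAX_ARRAY_SIZE) {
            throw new OutOfMemoryError("Required array size too large: " + required);
        }
        long doubled = 2L * current; // compute in long to avoid int overflow
        long candidate = Math.max(doubled, required);
        return (int) Math.min(candidate, MAX_ARRAY_SIZE);
    }
}
```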
Change-Id: I6ddc7470368acd20abf0885c53c89a982bb0f176
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Clean up use of java.util.zip.Inflater, fixing rare infinite loops
The native implementation of inflate() can set finished to return
true at the same time as it copies the last bytes into the buffer.
Check for finished on each iteration, terminating as soon as libz
knows the stream was completely inflated.
If not finished, it is likely input is required before the next
native call could do any useful work. Most invocations are passing
in a buffer large enough to store the entire result. A partial return
from inflate() will need more input before it can continue. Checking
right away that needsInput() is true saves a native call to determine
no bytes can be inflated without more input.
This should fix a rare infinite loop condition inside of inflation
when an object ends exactly at the end of a block boundary, and
the next block contains only the 20-byte trailing SHA-1.
When the stream is finished, each new attempt to inflate() returns
n == 0, as no additional bytes were output. The needsInput() test
tries to add the length of the footer block to itself, but then loops
back around and reloads the same block, as the block is smaller than
a full block size. A zero-length input is set on the inflater,
which triggers the needsInput() condition again.
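The loop structure described above can be sketched with the standard java.util.zip.Inflater API (inflate(), finished(), needsInput(), setInput()). This is not JGit's actual inflation code: the chunked input simulating block windows and the method name inflateAll() are assumptions for illustration.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

class InflateLoop {
    /**
     * Inflates the zlib stream in {@code in} into {@code out}, handing the
     * inflater at most {@code chunk} input bytes at a time (like reading a
     * pack through fixed-size windows). Returns the number of bytes inflated.
     */
    static int inflateAll(byte[] in, byte[] out, int chunk) throws DataFormatException {
        if (chunk <= 0) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        Inflater inf = new Inflater();
        try {
            int inPos = 0;
            int outPos = 0;
            for (;;) {
                int n = inf.inflate(out, outPos, out.length - outPos);
                outPos += n;
                // Check finished() on every iteration: zlib may report the end
                // of the stream in the same call that copied the last bytes.
                if (inf.finished()) {
                    return outPos;
                }
                if (inf.needsDictionary()) {
                    throw new DataFormatException("preset dictionary not supported");
                }
                if (inf.needsInput()) {
                    // A partial result almost always means more input is required.
                    if (inPos >= in.length) {
                        throw new DataFormatException("truncated zlib stream");
                    }
                    int len = Math.min(chunk, in.length - inPos);
                    inf.setInput(in, inPos, len); // len > 0: never feed an empty block
                    inPos += len;
                } else if (n == 0 && outPos == out.length) {
                    throw new DataFormatException("output buffer too small");
                }
            }
        } finally {
            inf.end();
        }
    }
}
```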
Change-Id: I95d02bfeab4bf995a254d49166b4ae62d1f21346
Add a method to ObjectInserter to read back inserted objects
In the DFS implementation, flushing an inserter writes a new pack to
the storage system and is potentially very slow, but was the only way
to ensure previously-inserted objects were available. For some tasks,
like performing a series of three-way merges, the total size of all
inserted objects may be small enough to avoid flushing the in-memory
buffered data.
DfsOutputStream already provides a read method to read back from the
not-yet-flushed data, so use this to provide an ObjectReader in the
DFS case.
In the file-backed case, objects are written out loosely on the fly,
so the implementation can just return the existing WindowCursor.
Change-Id: I454fdfb88f4d215e31b7da2b2a069853b197b3dd
Use bitcheck to check for presence of OPT_FULL option
Previously an equality check was performed, so an exception would
be thrown if any other options were set.
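The difference between the two checks fits in a few lines. The flag values for OPT_FULL and the second flag OPT_OTHER are hypothetical; only the comparison pattern matters.

```java
class OptionFlags {
    static final int OPT_FULL = 1 << 0;  // hypothetical flag values
    static final int OPT_OTHER = 1 << 1;

    /** Old behaviour: true only if OPT_FULL is the *only* option set. */
    static boolean isFullByEquality(int options) {
        return options == OPT_FULL;
    }

    /** New behaviour: true whenever the OPT_FULL bit is set. */
    static boolean isFullByBitcheck(int options) {
        return (options & OPT_FULL) != 0;
    }
}
```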
Change-Id: I36b60e2c0a8aef9fcfe663055dba520192996872
Fix for reflog corruption caused by multiline message
If a client passes a multiline message as argument to ReflogWriter.log()
the reflog gets corrupted and cannot be parsed. ReflogWriter.log() is
invoked implicitly from various commands such as StashCreate, Rebase and
many more. However, the message is not always checked for line feeds.
One such example is the StashCreateOperation of EGit, which passes
unchecked user input as the commit message. If a multiline comment is
pasted into the stash create dialog, the reflog gets corrupted.
ReflogWriter now replaces line endings in the log message with spaces.
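A minimal sketch of that replacement, assuming CRLF, CR and LF should each become a single space. ReflogMessage and sanitize() are hypothetical names; ReflogWriter's exact rule may differ.

```java
import java.util.regex.Pattern;

class ReflogMessage {
    // CRLF is listed first so that it is replaced by one space, not two.
    private static final Pattern LINE_BREAK = Pattern.compile("\r\n|\r|\n");

    /** Makes a message safe for the one-line-per-entry reflog format. */
    static String sanitize(String message) {
        return LINE_BREAK.matcher(message).replaceAll(" ");
    }
}
```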
Bug: 435509
Change-Id: I3010cc902e13bee4d7b6696dfd11ab51062739d3
Signed-off-by: Andreas Hermann <a.v.hermann@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Streaming packed deltas is so slow that it never feasibly completes
(it will take hours for it to stream a few hundred megabytes on
relatively fast systems with a large amount of storage). This
was indicated as a "failed experiment" by Shawn in the following
mailing list post:
http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg01674.html
Change-Id: Idc12f59e37b122f13856d7b533a5af9d8867a8a5
Signed-off-by: Doug Kelly <dougk.ff7@gmail.com>
When working on a non-bare repository with a detached HEAD, JGit's GC was
packing the ref named "HEAD" into the packed-refs file and deleting the
loose ref (the file .git/HEAD!). This made the repo unusable for native
git. This is fixed by telling JGit to only pack refs starting with
"refs/".
Change-Id: I50018aa006f18b244d2cae2ff78b5ffe1b821d63
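The fix amounts to filtering the candidate refs by name before packing. In this sketch refsToPack() is a hypothetical helper over plain ref names, not RefDirectory's API.

```java
import java.util.List;
import java.util.stream.Collectors;

class PackRefsFilter {
    /** Keeps only refs under "refs/"; HEAD and other top-level refs stay loose. */
    static List<String> refsToPack(List<String> refNames) {
        return refNames.stream()
                .filter(name -> name.startsWith("refs/"))
                .collect(Collectors.toList());
    }
}
```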
PostReceiveHooks can make use of this information to, for example,
update a cached size of the Git repository.
Change-Id: I2bf1200959a50531e2155a7609c96035ba45b10d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Revert "Add getPackFile to ReceivePack to make PostReceiveHook more
usable"
This reverts commit 2670fd427c.
By returning an instance of File from the ReceivePack.getPackFile the
abstraction of the persistence implementation was broken.
Change-Id: I28e3ebf3a659a7cbc94be51bba9e1ad338f2b786
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Some of our Windows users have reported sporadic file system access
problems related to ObjectDirectory(Inserter) file deletion code in
combination with antivirus/firewall tools. For one of these users the
problem was fairly reproducible and changing deletion to RETRY solved
his problem.
Change-Id: I1e4001d5557fca693b7bac401268599467cb0c9e
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Add getPackFile to ReceivePack to make PostReceiveHook more usable
Having access to the pack file that was created by the ReceivePack
may be useful for post receive hooks. For example, a hook may want
to check the size of the received pack and the created index.
Change-Id: I4d51758e4565d32c9f8892242947eb72644b847d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The change includes comparing symbolic links between disk and index,
adding symbolic links to the index, creating/modifying links on
checkout. The behavior is controlled by the core.symlinks setting, just
as C Git does. When a new repository is created core.symlinks will be
set depending on the capabilities of the operating system and Java
runtime.
If core.symlinks is set to true, the assumption is that symlinks are
supported, which may result in runtime errors if this turns out not to
be the case.
Measuring the cost of jgit status on a repository with ~70000 files,
of which ~30000 are tracked, reveals a penalty of about 10% for using
the Java7 (really NIO2) support module.
Bug: 354367
Change-Id: I12f0fdd9d26212324a586896ef7eb1f6ff89c39c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix MissingObjectException race in ObjectDirectory
Johannes Carlsson identified a race condition [1] that can lead to
spurious MissingObjectExceptions at read time. If two threads are
active inside of ObjectDirectory looking for a packed object and the
packList is currently the empty NO_PACKS list, thread A will find
no object and eventually consider tryAgain1(). If thread A is put
to sleep at this point and thread B, which also does not find the
object, loads the packs, then when thread A wakes up its tryAgain1()
returns false and the thread never considers the packs.
Rework the internal API of ObjectDirectory to keep a handle on the
exact PackList that was iterated by thread A, allowing it to always
retry walking through the packs if the new PackList is different.
This had some ripple effect into the CachedObjectDirectory and
the shared FileObjectDatabase interface. The new code should be
slightly easier to follow, especially from the perspective of the
CachedObjectDirectory trying to minimize the number of open system
calls it makes to files matching "$GIT_DIR/objects/??/?x{38}".
[1] http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg02401.html
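The idea of keeping a handle on the exact PackList that was searched can be sketched with an AtomicReference. Everything here (PackListRetry, the map-based packs, the scanPacks supplier, refresh()) is a hypothetical simplification of ObjectDirectory. In the race above, thread A holds NO_PACKS as its snapshot; because B has replaced the shared list, refresh() returns B's list and A walks it instead of giving up.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class PackListRetry {
    /** Immutable snapshot of the known packs; each pack maps object id to data. */
    static final class PackList {
        final List<Map<String, String>> packs;

        PackList(List<Map<String, String>> packs) {
            this.packs = packs;
        }
    }

    static final PackList NO_PACKS = new PackList(Collections.emptyList());

    private final AtomicReference<PackList> packList = new AtomicReference<>(NO_PACKS);
    private final Supplier<PackList> scanPacks; // hypothetical: rereads the pack directory

    PackListRetry(Supplier<PackList> scanPacks) {
        this.scanPacks = scanPacks;
    }

    String find(String id) {
        PackList searched = packList.get(); // the exact list this thread walks
        for (;;) {
            for (Map<String, String> pack : searched.packs) {
                String data = pack.get(id);
                if (data != null) {
                    return data;
                }
            }
            PackList next = refresh(searched);
            if (next == searched) {
                return null; // nothing newer anywhere: the object is really missing
            }
            searched = next; // a different list exists: walk it too
        }
    }

    /** Returns a list newer than {@code searched}, or {@code searched} if none exists. */
    private PackList refresh(PackList searched) {
        PackList current = packList.get();
        if (current != searched) {
            return current; // another thread already loaded the packs
        }
        PackList scanned = scanPacks.get();
        if (scanned.packs.equals(searched.packs)) {
            return searched; // directory content unchanged
        }
        return packList.compareAndSet(searched, scanned) ? scanned : packList.get();
    }
}
```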
Change-Id: I9a1c9d6ad6cb38404b7b9178167b714077561353
Package was renamed, so I had to update the imports. Also, I verified
bitmap serialization was still compatible.
Change-Id: I161ad3875b963b56001beab477ef8d072accee4f
Cache SimpleDateFormat in GitDateParser per locale
Otherwise, switching to another locale yields wrong results when parsing
date strings in GitDateParser. Since the MockSystemReader explicitly
uses the English locale, the tests need to specify the locale to be used
when parsing date strings.
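A minimal sketch of caching formatters per locale. It assumes a ThreadLocal cache because SimpleDateFormat is not thread-safe; LocaleFormatCache and its methods are hypothetical, and GitDateParser's real cache may be structured differently.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LocaleFormatCache {
    // One cache per thread, keyed first by locale and then by pattern, so a
    // formatter created for one locale is never reused for another.
    private static final ThreadLocal<Map<Locale, Map<String, SimpleDateFormat>>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static SimpleDateFormat get(String pattern, Locale locale) {
        return CACHE.get()
                .computeIfAbsent(locale, l -> new HashMap<>())
                .computeIfAbsent(pattern, p -> new SimpleDateFormat(p, locale));
    }

    static Date parse(String text, String pattern, Locale locale) throws ParseException {
        return get(pattern, locale).parse(text);
    }
}
```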
Bug: 420772
Change-Id: I313ef6b1e9ef3bfb43d929ce34712ebd21f2cd9c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Don't delete .idx file if .pack file can't be deleted
If old packfiles are deleted during garbage collection, it can happen
on certain platforms that the index file can be deleted but the
packfile can't be deleted (because someone locked the file). This led
to repositories with packfiles without corresponding index files. Those
zombie packfiles potentially consume a lot of space on disk, and no
attempt is ever made to delete them again. Try to avoid this situation
by deleting packfiles first and not trying to delete the other files if
we can't delete the packfile. This gives us the chance to delete the
packfile during the next GC.
This commit only improves the situation: there is still the chance of
orphan files during packfile deletion. We don't have an atomic delete
of multiple files.
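The ordering can be sketched with plain java.io.File deletion. deleteOldPack() and the list of companion extensions are illustrative assumptions, not JGit's GC code.

```java
import java.io.File;

class OldPackDeleter {
    // Files belonging to one pack besides the .pack itself (illustrative list).
    private static final String[] COMPANIONS = { ".idx", ".bitmap" };

    /**
     * Deletes base + ".pack" first. Only if that succeeds are the companion
     * files removed, so a locked pack keeps its .idx and can be retried by
     * the next GC. Returns true if the pack itself is gone.
     */
    static boolean deleteOldPack(File dir, String base) {
        File pack = new File(dir, base + ".pack");
        if (pack.exists() && !pack.delete()) {
            return false; // e.g. locked: leave everything for the next GC
        }
        for (String ext : COMPANIONS) {
            // Best effort: without an atomic multi-file delete, an orphaned
            // companion file can still remain if this delete fails.
            new File(dir, base + ext).delete();
        }
        return true;
    }
}
```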
Change-Id: I0a19ae630186f07d0cc7fe9df246fa1cedeca8f6
Propagate IOException where possible when getting refs.
Currently, Repository.getAllRefs() and Repository.getTags() silently
ignore an IOException and instead return an empty map. Repository
is a public API and as such cannot be changed until the next major
revision change. Where possible, update the internal jgit APIs to
use the RefDatabase directly, since it propagates the error.
Change-Id: I4e4537d8bd0fa772f388262684c5c4ca1929dc4c
Ignore bitmap indexes that do not match the pack checksum
If `git gc` creates a new pack with the same file name, the
pack checksum may not match that in the .bitmap. Fix the PackFile
implementation to silently ignore invalid bitmap indexes.
Fixes Issue https://code.google.com/p/gerrit/issues/detail?id=2131
Change-Id: I378673c00de32385ba90f4b639cb812f9574a216
Previously it took 1200ms to create a reverse index (sorted by offset).
Using a simple bucket sort algorithm, that time is reduced to 450ms.
The bucket index into the offset array is kept, in order to decrease
the binary search window.
Don't keep a copy of the offsets. Instead, use the nth position
to look up the offset in the PackIndex.
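A rough sketch of a bucket-sorted reverse index under stated assumptions: offsetsBySha1 stands in for looking up the nth offset in the PackIndex, maxOffset is the largest possible offset, and ReverseIndexSketch is a hypothetical class, not JGit's PackReverseIndex. Only SHA-1 positions are stored, and the bucket of a target offset narrows the binary search window.

```java
class ReverseIndexSketch {
    private final long[] offsetsBySha1; // offset of the nth object in SHA-1 order
    private final int[] byOffset;       // SHA-1 positions, sorted by offset
    private final int[] bucketStart;    // bucket b spans byOffset[bucketStart[b]..bucketStart[b+1])
    private final long bucketSize;

    ReverseIndexSketch(long[] offsetsBySha1, long maxOffset) {
        this.offsetsBySha1 = offsetsBySha1;
        int n = offsetsBySha1.length;
        int buckets = Math.max(1, n);
        this.bucketSize = maxOffset / buckets + 1;
        this.bucketStart = new int[buckets + 1];
        for (long off : offsetsBySha1) { // 1. count entries per bucket
            bucketStart[bucket(off) + 1]++;
        }
        for (int b = 0; b < buckets; b++) { // 2. prefix sums give start indexes
            bucketStart[b + 1] += bucketStart[b];
        }
        this.byOffset = new int[n];
        int[] fill = bucketStart.clone();
        for (int nth = 0; nth < n; nth++) { // 3. scatter positions into buckets
            byOffset[fill[bucket(offsetsBySha1[nth])]++] = nth;
        }
        for (int b = 0; b < buckets; b++) { // 4. insertion-sort each small bucket
            for (int i = bucketStart[b] + 1; i < bucketStart[b + 1]; i++) {
                int pos = byOffset[i];
                int j = i - 1;
                while (j >= bucketStart[b] && offsetsBySha1[byOffset[j]] > offsetsBySha1[pos]) {
                    byOffset[j + 1] = byOffset[j];
                    j--;
                }
                byOffset[j + 1] = pos;
            }
        }
    }

    private int bucket(long offset) {
        return (int) (offset / bucketSize);
    }

    /** Returns the SHA-1 position of the object at {@code offset}, or -1. */
    int findPosition(long offset) {
        int b = bucket(offset);
        if (b < 0 || b >= bucketStart.length - 1) {
            return -1;
        }
        int lo = bucketStart[b];
        int hi = bucketStart[b + 1] - 1; // binary search only inside the bucket
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long midOff = offsetsBySha1[byOffset[mid]];
            if (midOff < offset) {
                lo = mid + 1;
            } else if (midOff > offset) {
                hi = mid - 1;
            } else {
                return byOffset[mid];
            }
        }
        return -1;
    }

    /** SHA-1 position of the i-th object in pack (offset) order. */
    int positionAt(int i) {
        return byOffset[i];
    }
}
```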
Change-Id: If51ab76752622e04a4430d9a14db95ad02f5329d
Currently, the offset can only be retrieved by ObjectId or iterating all
of the entries. Add a method to look up the offset by position in the
index sorted by SHA1.
Change-Id: I45e9ac8b752d1dab47b202753a1dcca7122b958e
A parenthesis was in the wrong place passing arguments to the wrong
format call. Also fix formatting of enclosing switch statement.
Change-Id: I4cb9642f08b58c39033c3a81dab4bd56bebf4fd2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Update PackBitmapIndexRemapper to handle mappings not in the new pack.
Previously, the code assumed all commits in the old pack would also
be present in the new pack. This assumption caused an
ArrayIndexOutOfBoundsException during remapping of ids. Fix the
iterator to only return entries that may be remapped. Furthermore,
update getBitmap() to return null if the commit does not exist in the
new pack.
Change-Id: I065babe8cd39a7654c916bd01c7012135733dddf
When renaming the lock file succeeds the lock isn't held anymore
This wrong book-keeping caused IOExceptions to be thrown because
LockFile.unlock() erroneously tried to delete the non-existing lock
file. These IOExceptions were hidden since they were silently caught.
Change-Id: If42b6192d92c5a2d8f2bf904b16567ef08c32e89
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
On Windows only, the rename operation which renames temporary packfiles
(and index files and bitmap files) sometimes fails. This happens only
when renaming a temporary packfile to a packfile which already exists.
Such situations occur if you run GC twice on a repo without modifying
the repo in between.
In such situations there was a bug in GC which led to a corrupted repo
without any packfiles. This commit fixes the problem by
introducing a utility method which renames a file and throws an
IOException if it fails. This method also takes care to repeat a
failing rename if our FS class has found out we are running on a
platform with an unreliable File.renameTo() method.
I am searching for a better solution because even with this utility
method in hand, a GC on an already GC'ed repo will fail on Windows. But
at least with this fix we will not produce corrupted repos anymore.
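A sketch of such a helper using only java.io.File.renameTo(). The retry count, the sleep interval, and the retryRename flag (standing in for what the FS class knows about an unreliable renameTo()) are assumptions, not JGit's actual utility method.

```java
import java.io.File;
import java.io.IOException;

class RenameUtil {
    /**
     * Renames src to dst or throws IOException. If retryRename is set
     * (platforms with an unreliable File.renameTo()), a failing rename is
     * repeated a few times before giving up.
     */
    static void rename(File src, File dst, boolean retryRename) throws IOException {
        int attempts = retryRename ? 10 : 1;
        for (int i = 0; i < attempts; i++) {
            if (src.renameTo(dst)) {
                return;
            }
            if (i + 1 < attempts) {
                try {
                    Thread.sleep(100); // give other processes time to release the file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        // Failing loudly lets GC abort instead of deleting the old packs.
        throw new IOException("Could not rename " + src + " to " + dst);
    }
}
```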
Bug: 389305
Change-Id: Iac1ab3e0b8c419c90404f2e2f3559672eb8f6d28
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
With JGit it is possible to write reflog entries where the new objectid
and the old objectid are null. Such reflogs cause FileRepository GC to
crash because it doesn't expect the new objectid to be null. One case
where this happened is in Gerrit's allProjects repo. In the same way as
we expect the old objectid to be potentially null, we should also ignore
null values in the new objectid column.
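A sketch of the defensive collection step, using a hypothetical ReflogEntry holder with string ids (null meaning "no object"). It is not JGit's GC code, only the null handling it describes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ReflogRoots {
    /** Hypothetical reflog entry: either id may be null. */
    static final class ReflogEntry {
        final String oldId;
        final String newId;

        ReflogEntry(String oldId, String newId) {
            this.oldId = oldId;
            this.newId = newId;
        }
    }

    /** Collects ids referenced by reflogs, skipping nulls on both sides. */
    static Set<String> collect(List<ReflogEntry> entries) {
        Set<String> ids = new LinkedHashSet<>();
        for (ReflogEntry e : entries) {
            if (e.oldId != null) {
                ids.add(e.oldId);
            }
            if (e.newId != null) { // the new id can be null too, not just the old one
                ids.add(e.newId);
            }
        }
        return ids;
    }
}
```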
Change-Id: Icf666c7ef803179b84306ca8deb602369b8df16e
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9