PostReceiveHooks can make use of this information to, for example,
update a cached size of the Git repository.
Change-Id: I2bf1200959a50531e2155a7609c96035ba45b10d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Revert "Add getPackFile to ReceivePack to make PostReceiveHook more
usable"
This reverts commit 2670fd427c.
By returning an instance of File from the ReceivePack.getPackFile the
abstraction of the persistence implementation was broken.
Change-Id: I28e3ebf3a659a7cbc94be51bba9e1ad338f2b786
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add getPackFile to ReceivePack to make PostReceiveHook more usable
Having access to the pack file that was created by the ReceivePack
may be useful for post receive hooks. For example, a hook may want
to check the size of the received pack and the created index.
Change-Id: I4d51758e4565d32c9f8892242947eb72644b847d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Possibility to limit the max pack size on receive-pack
The maxPackSizeLimit, when set, will reject a pack if it exceeds
that limit.
This feature is intended to provide a mechanism to control disk space
quota on Git repositories.
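The idea can be sketched outside of JGit as a stream wrapper that counts
bytes and fails once the limit is exceeded. This is only an illustration:
PackSizeLimitSketch and rejects() are hypothetical names, and treating a
limit of 0 or less as "unset" is an assumption of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: fails once more than `limit` bytes were read. */
final class PackSizeLimitSketch extends FilterInputStream {
    private final long limit; // assumption: <= 0 means "no limit set"
    private long count;

    PackSizeLimitSketch(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0)
            account(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0)
            account(n);
        return n;
    }

    private void account(int n) throws IOException {
        count += n;
        if (limit > 0 && count > limit)
            throw new IOException("pack exceeds maximum allowed size " + limit);
    }

    /** Returns true if a pack of `size` bytes is rejected under `limit`. */
    static boolean rejects(int size, long limit) {
        try (InputStream in = new PackSizeLimitSketch(
                new ByteArrayInputStream(new byte[size]), limit)) {
            byte[] buf = new byte[64];
            while (in.read(buf, 0, buf.length) >= 0) {
                // drain the stream
            }
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```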
Change-Id: I83d8db670875c395f8171461b402083323e623a5
CQ: 7896
Move Apache httpclient based HTTP support to a separate bundle
This move avoids forcing all consumers of org.eclipse.jgit to depend on
Apache httpclient. Also add another feature to make this optional for OSGi
consumers as well.
Change-Id: I5ef5e00c53678b9e1d7cfd54bbca3ff6f1c1c967
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add an implementation for HttpConnection using Apache HttpClient
This change implements the http connection abstraction with the help of
org.apache.http.client.HttpClient. The default implementation used by
JGit is still the JDK HttpURLConnection. But now JGit users have the
possibility to switch completely to org.apache.httpclient. The reason
for this is that in certain (e.g. cloud) environments you are forced to
use the org.apache classes.
Change-Id: I0b357f23243ed13a014c79ba179fa327dfe318b2
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The change includes comparing symbolic links between disk and index,
adding symbolic links to the index, creating/modifying links on
checkout. The behavior is controlled by the core.symlinks setting, just
as C Git does. When a new repository is created core.symlinks will be
set depending on the capabilities of the operating system and Java
runtime.
If core.symlinks is set to true, the assumption is that symlinks are
supported, which may result in runtime errors if this turns out not to
be the case.
Measuring the cost of jgit status on a repository with ~70000 files,
of which ~30000 are tracked reveals a penalty of about 10% for using
the Java7 (really NIO2) support module.
Bug: 354367
Change-Id: I12f0fdd9d26212324a586896ef7eb1f6ff89c39c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix MissingObjectException race in ObjectDirectory
Johannes Carlsson identified a race condition[1] that can lead to
spurious MissingObjectExceptions at read time. If two threads are
active inside of ObjectDirectory looking for a packed object and the
packList is currently the empty NO_PACKS list, thread A will find
no object and eventually consider tryAgain1(). If thread A is put
to sleep at this point and thread B also does not find the object and
loads the packs, then when thread A wakes up its tryAgain1() would
return false and the thread never considers the packs.
Rework the internal API of ObjectDirectory to keep a handle on the
exact PackList that was iterated by thread A, allowing it to always
retry walking through the packs if the new PackList is different.
This had some ripple effect into the CachedObjectDirectory and
the shared FileObjectDatabase interface. The new code should be
slightly easier to follow, especially from the perspective of the
CachedObjectDirectory trying to minimize the number of open system
calls it makes to files matching "$GIT_DIR/objects/??/?x{38}".
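A simplified sketch of the retry idea follows. It is not the
ObjectDirectory code: PackListRetrySketch, scanPacks() and the
afterSearch hook are hypothetical, packs are reduced to a list of object
names, and the hook only exists to simulate another thread running
between the search and the retry check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class PackListRetrySketch {
    static final List<String> NO_PACKS = Collections.emptyList();

    private final AtomicReference<List<String>> packList =
            new AtomicReference<>(NO_PACKS);

    /** Stands in for rescanning the pack directory. */
    void scanPacks(String... objects) {
        packList.set(Arrays.asList(objects));
    }

    /**
     * Searches the current list and keeps a handle on exactly the list
     * that was iterated. If a different list has been published in the
     * meantime, the search is repeated instead of reporting a miss.
     */
    boolean hasPackedObject(String id, Runnable afterSearch) {
        List<String> searched;
        do {
            searched = packList.get();
            if (searched.contains(id))
                return true;
            afterSearch.run(); // simulated interleaving with thread B
        } while (searched != packList.get());
        return false;
    }
}
```

With the hook publishing a new list, the first pass over NO_PACKS misses
but the second pass finds the object; with a no-op hook the lookup
returns false after a single pass.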
[1] http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg02401.html
Change-Id: I9a1c9d6ad6cb38404b7b9178167b714077561353
Package was renamed, so I had to update the imports. Also, I verified
bitmap serialization was still compatible.
Change-Id: I161ad3875b963b56001beab477ef8d072accee4f
More helpful InvalidPathException messages (include reason)
Instead of just a generic "Invalid path: $path", add a reason for the
cases where it's not obvious what the problem is (e.g. "aux" being
reserved on Windows).
Bug: 413915
Change-Id: Ia6436bd2560e4f049c92d9aac907cb87348605e0
Signed-off-by: Robin Stocker <robin@nibor.org>
Cache SimpleDateFormat in GitDateParser per locale
Otherwise switching to another locale yields wrong results when parsing
date strings in GitDateParser. Since the MockSystemReader explicitly
uses the English locale, the tests need to specify the locale to be used when
parsing date strings.
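A minimal sketch of a per-locale cache, assuming a plain nested map is
good enough for illustration. FormatCacheSketch is a hypothetical name,
the sketch is not thread-safe, and it is not the GitDateParser code.

```java
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class FormatCacheSketch {
    // locale -> (pattern -> formatter); a formatter built for one locale
    // is never handed out for another one
    private static final Map<Locale, Map<String, SimpleDateFormat>> CACHE =
            new HashMap<>();

    static SimpleDateFormat get(Locale locale, String pattern) {
        return CACHE
                .computeIfAbsent(locale, l -> new HashMap<>())
                .computeIfAbsent(pattern, p -> new SimpleDateFormat(p, locale));
    }
}
```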
Bug: 420772
Change-Id: I313ef6b1e9ef3bfb43d929ce34712ebd21f2cd9c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Do not update the ref hot bit when checking isIndexLoaded
DfsPackFile.isIndexLoaded() uses the DfsBlockCache.Ref.get() method
to check if the index is loaded. However, using the get() method marks
a hot bit in the cache, which can cause the index to never be unloaded
and seem hotter than it really is. Add a has() method which only
checks if the value is not null and does not update the hot bit.
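The difference between the two accessors can be sketched as below.
CacheRefSketch and isHot() are hypothetical and much simpler than the
real DfsBlockCache.Ref.

```java
final class CacheRefSketch<V> {
    private volatile V value;
    private volatile boolean hot;

    CacheRefSketch(V value) {
        this.value = value;
    }

    /** Returns the value and marks the entry as recently used. */
    V get() {
        V v = value;
        if (v != null)
            hot = true;
        return v;
    }

    /** Only checks for presence; the hot bit is left untouched. */
    boolean has() {
        return value != null;
    }

    boolean isHot() {
        return hot;
    }
}
```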
Change-Id: I7e9ed216f6e273e8f5d79ae573973197654419b4
Don't delete .idx file if .pack file can't be deleted
If old packfiles are deleted during a garbage collection, it could
happen that on certain platforms the index file can be deleted but the
packfile can't be deleted (because someone locked the file). This led
to repositories with packfiles without corresponding index files. Those
zombie packfiles potentially consume a lot of space on disk and no
attempt is ever made to delete them again. Try to avoid this situation
by deleting packfiles first and not trying to delete the other files if
we can't delete the packfile. This gives us the chance to delete the
packfile during the next GC.
This commit only improves the situation - there is still the chance for
orphan files during packfile deletion. We don't have an atomic delete
of multiple files.
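The deletion order can be sketched as follows. PackDeleteSketch is a
hypothetical helper, the file deletion is abstracted behind a predicate
so the ordering is easy to see, and the set of companion extensions
(.idx, .bitmap) is an assumption of the sketch.

```java
import java.util.function.Predicate;

final class PackDeleteSketch {
    /**
     * Deletes base + ".pack" first. The companion files are only touched
     * once the pack itself is gone, so a locked pack keeps its index and
     * can be retried by a later GC.
     *
     * @param delete returns true if the named file was deleted
     */
    static boolean deletePackFiles(String base, Predicate<String> delete) {
        if (!delete.test(base + ".pack"))
            return false;
        delete.test(base + ".idx");
        delete.test(base + ".bitmap");
        return true;
    }
}
```

Against a real directory the predicate could be something like
name -> new File(dir, name).delete().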
Change-Id: I0a19ae630186f07d0cc7fe9df246fa1cedeca8f6
Add Squash/Fixup support for rebase interactive in RebaseCommand
The rebase command now supports squash and fixup. Both actions are not
allowed as the first step of the rebase.
In JGit, before any rebase step is performed, the next commit is
already cherry-picked. This commit keeps that behaviour. In case of
squash or fixup a soft reset to the parent is performed afterwards.
CQ: 7684
Bug: 396510
Change-Id: I3c4190940b4d7f19860e223d647fc78705e57203
Signed-off-by: Tobias Pfeifer <to.pfeifer@web.de>
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Enhance reading of git-rebase-todo formatted files
Reading and writing files formatted like the git-rebase-todo files was
hidden in the RebaseCommand. Certain constructs (like leading tabs and
spaces) have not been handled as in native git. Also the upcoming
rebase interactive feature in EGit needs reading/writing these files
independently from a RebaseCommand.
Therefore reading and writing those files has been moved to the
Repository class. RebaseCommand gets smaller because of that and doesn't
have to deal with reading/writing files.
Additional tests for empty todo-list files, or files containing comments
have been added.
Change-Id: I323f3619952fecdf28ddf50139a88e0bea34f5ba
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Also-by: Tobias Pfeifer <to.pfeifer@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Propagate IOException where possible when getting refs.
Currently, Repository.getAllRefs() and Repository.getTags() silently
ignore an IOException and instead return an empty map. Repository
is a public API and as such cannot be changed until the next major
revision change. Where possible, update the internal jgit APIs to
use the RefDatabase directly, since it propagates the error.
Change-Id: I4e4537d8bd0fa772f388262684c5c4ca1929dc4c
Ignore bitmap indexes that do not match the pack checksum
If `git gc` creates a new pack with the same file name, the
pack checksum may not match that in the .bitmap. Fix the PackFile
implementation to silently ignore invalid bitmap indexes.
Fixes Issue https://code.google.com/p/gerrit/issues/detail?id=2131
Change-Id: I378673c00de32385ba90f4b639cb812f9574a216
Remove unneeded packs when compacting with no new objects
Previously, the DfsPackCompactor exited without pruning the existing
packs, when no new packs were created.
Change-Id: I5e3b6f8c789706c7a982e6ae93cf7c3d4346797c
OpenJDK 7 does not benefit from using an inflate stride on the input
array. The implementation of java.util.zip.Inflater supplies the
entire input byte[] to libz, with no regard for the bounds supplied.
Slicing at 512 byte increments in DfsBlock no longer has any benefit.
In OpenJDK 6 the native portion of Inflater used GetByteArrayRegion
to obtain a copy of the input buffer for libz. In this use case
supplying a small stride made sense, it avoided allocating space
for and copying data past the end of the object's compressed stream.
In OpenJDK 7 the native code uses GetPrimitiveArrayCritical,
which tries to avoid copying by freezing Java garbage collection
and accessing the byte[] contents in place. On OpenJDK 7 derived
JVMs it is likely more efficient to supply the entire DfsBlock.
Since OpenJDK 5 and 6 are deprecated and replaced by OpenJDK 7
it is reasonable to suggest any consumers running JGit with DFS
support use an OpenJDK 7 derived JVM. However, JGit still targets
local filesystem support on Java 5, so it is still not reasonable to
apply this same simplification to the internal.storage.file package.
See: JDK-6751338 (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6751338)
Change-Id: Ib248b6d383da5c8aa887d9c355a0df6f3e2247a5
Previously it took 1200ms to create a reverse index (sorted by offset).
Using a simple bucket sort algorithm, that time is reduced to 450ms.
The bucket index into the offset array is kept, in order to decrease
the binary search window.
Don't keep a copy of the offsets. Instead, use nth position
to lookup the offset in the PackIndex.
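A rough sketch of a bucket sort that orders nth positions by pack offset
is shown below. ReverseIndexSketch is hypothetical: it takes the offsets
as a plain array instead of asking a PackIndex, and it does not keep the
bucket index for narrowing later binary searches.

```java
import java.util.Arrays;

final class ReverseIndexSketch {
    /**
     * @param offsets   offsets[nth] is the pack offset of the nth object
     *                  in SHA-1 order; values must be in [0, maxOffset]
     * @param maxOffset largest offset that can occur
     * @return nth positions ordered by ascending offset
     */
    static int[] sortByOffset(long[] offsets, long maxOffset) {
        int n = offsets.length;
        int bucketCount = Math.max(1, n);
        long bucketSize = maxOffset / bucketCount + 1;

        // Chain the entries of each bucket through head[] and next[].
        int[] head = new int[bucketCount];
        int[] next = new int[n];
        Arrays.fill(head, -1);
        for (int i = 0; i < n; i++) {
            int b = (int) (offsets[i] / bucketSize);
            next[i] = head[b];
            head[b] = i;
        }

        // Emit buckets in order; insertion sort within each small bucket.
        int[] nth = new int[n];
        int o = 0;
        for (int b = 0; b < bucketCount; b++) {
            int start = o;
            for (int i = head[b]; i >= 0; i = next[i]) {
                int j = o++;
                while (j > start && offsets[nth[j - 1]] > offsets[i]) {
                    nth[j] = nth[j - 1];
                    j--;
                }
                nth[j] = i;
            }
        }
        return nth;
    }
}
```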
Change-Id: If51ab76752622e04a4430d9a14db95ad02f5329d
Currently, the offset can only be retrieved by ObjectId or iterating all
of the entries. Add a method to lookup the offset by position in the
index sorted by SHA1.
Change-Id: I45e9ac8b752d1dab47b202753a1dcca7122b958e
Support refspecs with wildcard in middle (not only at end)
The following refspec, which can be used to fetch GitHub pull requests,
is supported by C Git but was not yet by JGit:
+refs/pull/*/head:refs/remotes/origin/pr/*
The reason is that the wildcard in the source is in the middle.
This change also includes more validation (e.g. "refs//heads" is not
valid) and test cases.
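How such a refspec maps a source ref to a destination can be sketched
like this. WildcardRefSpecSketch is a hypothetical helper, not the
RefSpec class; the leading "+" (force flag) is left out, and requiring
the wildcard to match at least one character is an assumption of the
sketch.

```java
final class WildcardRefSpecSketch {
    /**
     * Expands ref through the src:dst pair, each containing one '*'.
     * Returns the destination name, or null if ref does not match src.
     */
    static String expand(String src, String dst, String ref) {
        int s = src.indexOf('*');
        int d = dst.indexOf('*');
        if (s < 0 || d < 0)
            throw new IllegalArgumentException("wildcard needed on both sides");

        String prefix = src.substring(0, s);
        String suffix = src.substring(s + 1);
        if (ref.length() <= prefix.length() + suffix.length()
                || !ref.startsWith(prefix) || !ref.endsWith(suffix))
            return null;

        // The part of ref covered by '*' replaces '*' in the destination.
        String match = ref.substring(prefix.length(),
                ref.length() - suffix.length());
        return dst.substring(0, d) + match + dst.substring(d + 1);
    }
}
```

For the refspec above, refs/pull/42/head would map to
refs/remotes/origin/pr/42, while refs/pull/42/merge would not match.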
Bug: 405099
Change-Id: I9bcef7785a0762ed0a98ca95a0bdf8879d5702aa
Allow use of ArchiveCommand without depending on the jgit command-line
tools.
To avoid complicating the process of installing and upgrading JGit,
this does not add a dependency by the org.eclipse.jgit bundle on
commons-compress. Instead, the caller is responsible for registering
any formats they want to use by calling ArchiveCommand.registerFormat.
This patch puts functionality that requires an archiver into a
separate org.eclipse.jgit.archive bundle for people who want it. One
can use it by calling ArchiveCommand.registerFormat directly to
register its formats or by relying on OSGi class loading to load
org.eclipse.jgit.archive.FormatActivator, which takes care of
registration automatically.
Once the appropriate formats are registered, you can make a tar or zip
from a git tree object as follows:
    ArchiveCommand cmd = git.archive();
    try {
        cmd.setTree(tree).setFormat(fmt).setOutputStream(out).call();
    } finally {
        cmd.release();
    }
Change-Id: I418e7e7d76422dc6f010d0b3b624d7bec3b20c6e
A parenthesis was in the wrong place passing arguments to the wrong
format call. Also fix formatting of enclosing switch statement.
Change-Id: I4cb9642f08b58c39033c3a81dab4bd56bebf4fd2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The most important difference is that in Java7 we have symbolic links
and for most operations in the work tree we want to operate on the link
itself rather than the link target, which the old File methods generally
do.
We also add support for the hidden attribute, which only makes sense
on Windows, and for exists, just since there are claims that
Files.exists is faster than File.exists.
A new bundle is only activated when run with a Java7 execution
environment. It is implemented as a fragment.
Tycho currently has no way to conditionally include optional features
based on the Java version used to run the build. This means that with
this change the jgit packaging build always needs to be run using Java 7.
Change-Id: I3d6580d6fa7b22f60d7e54ab236898ed44954ffd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Rescale "Compressing objects" progress meter by size
Instead of counting objects processed, count number of bytes added
into the window. This should rescale the progress meter so that 30%
complete means 30% of the total uncompressed content size has been
inflated and fed into the window.
In theory the progress meter should be more accurate about its
percentage complete/remaining fraction than with objects. When
counting objects small objects move the progress meter more rapidly
than large objects, but demand a smaller amount of work than large
objects being compressed.
Change-Id: Id2848c16a2148b5ca51e0ca1e29c5be97eefeb48
Instead of assuming all objects cost the same amount of time to
delta compress, aggregate the byte size of objects in the list
and partition threads with roughly equal total bytes.
Before splitting the list select the N largest paths and assign
each one to its own thread. This allows threads to get through the
worst cases in parallel before attempting smaller paths that are
more likely to be splittable.
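The byte-based split can be sketched as below. BytePartitionSketch is
hypothetical and only covers the "roughly equal total bytes" part using
contiguous slices; assigning the N largest paths to their own threads is
not modeled.

```java
final class BytePartitionSketch {
    /**
     * Returns, for each thread, the exclusive end index of its slice of
     * sizes. Slices are contiguous and hold roughly equal total bytes.
     */
    static int[] sliceEnds(long[] sizes, int threads) {
        long total = 0;
        for (long s : sizes)
            total += s;

        int[] ends = new int[threads];
        long assigned = 0;
        int i = 0;
        for (int t = 0; t < threads; t++) {
            long goal = total * (t + 1) / threads; // cumulative byte target
            while (i < sizes.length && assigned < goal)
                assigned += sizes[i++];
            ends[t] = i;
        }
        ends[threads - 1] = sizes.length; // pick up trailing empty objects
        return ends;
    }
}
```

With sizes 900, 50, 30 and 20 and two threads, the first thread gets
only the 900 byte object and the second thread gets the other three,
where a split by object count would have given each thread two objects.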
By running the largest path buckets first on each thread the likely
slowest part of compression is done early, while progress is still
reporting a low percentage. This gives users a better impression of
how fast the phase will run. On very complex inputs the slow part
is more likely to happen first, making a user realize it's time to
go grab lunch, or even run it overnight.
If the worst sections are earlier, memory overruns may show up
earlier, giving the user a chance to correct the configuration and
try again before wasting large amounts of time. It also makes it
less likely the delta compression phase reaches 92% in 30 minutes
and then crawls for 10 hours through the remaining 8%.
Change-Id: I7621c4349b99e40098825c4966b8411079992e5f
By excluding objects the compactor can avoid storing objects that
are already well packed in the base GC packs, or any other pack
not being replaced by the current compaction operation.
For deltas the base object is still included even if the base exists
in another exclusion set. This favors keeping deltas for recent
history, to support faster fetch operations for clients.
Change-Id: Ie822fe075fe5072fe3171450fda2f0ca507796a1
Update PackBitmapIndexRemapper to handle mappings not in the new pack.
Previously, the code assumed all commits in the old pack would also
be present in the new pack. This assumption caused an
ArrayIndexOutOfBoundsException during remapping of ids. Fix the
iterator to only return entries that may be remapped. Furthermore,
update getBitmap() to return null if commit does not exist in the
new pack.
Change-Id: I065babe8cd39a7654c916bd01c7012135733dddf
Always attempt delta compression when reuseDeltas is false
If reuseObjects=true but reuseDeltas=false the caller wants to attempt
a delta for every object in the input list. Test for reuseDeltas
to ensure every object passes through the searchInWindow() method.
If no delta is possible for an object and it will be stored whole
(non-delta format), PackWriter may still reuse its content from any
source pack. This avoids an inflate()-deflate() cycle to recompress
the object contents.
Change-Id: I845caeded419ef4551ef1c85787dd5ffd73235d9
TemporaryBuffer is great when the output size is not known, but must
be bound by a relatively large upper limit that fits in memory, e.g.
64 KiB or 20 MiB. The buffer gracefully supports growing storage by
allocating 8 KiB blocks and storing them in an ArrayList.
In a Git repository many deltas are less than 8 KiB. Typical tree
objects are well below this threshold, and their deltas must be
encoded even smaller.
For these much smaller cases avoid the 8 KiB minimum allocation used
by TemporaryBuffer. Instead allocate a very small OutputStream
writing to an array that is sized at the limit.
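Such a stream could look roughly like the following sketch.
BoundedArrayStreamSketch and fits() are hypothetical names and the
overflow behavior (throwing IOException) is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;

final class BoundedArrayStreamSketch extends OutputStream {
    private final byte[] buf; // allocated once, exactly at the limit
    private int cnt;

    BoundedArrayStreamSketch(int limit) {
        buf = new byte[limit];
    }

    @Override
    public void write(int b) throws IOException {
        if (cnt == buf.length)
            throw new IOException("limit reached");
        buf[cnt++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (len > buf.length - cnt)
            throw new IOException("limit reached");
        System.arraycopy(b, off, buf, cnt, len);
        cnt += len;
    }

    int length() {
        return cnt;
    }

    /** Returns true if len bytes can be written under the given limit. */
    static boolean fits(int limit, int len) {
        BoundedArrayStreamSketch out = new BoundedArrayStreamSketch(limit);
        try {
            out.write(new byte[len], 0, len);
            return out.length() == len;
        } catch (IOException e) {
            return false;
        }
    }
}
```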
Change-Id: Ie25c6d3a8cf4604e0f8cd9a3b5b701a592d6ffca
Correct distribution of allowed delta size along chain length
Nicolas Pitre discovered a very simple rule for selecting between two
different delta base candidates:
- if based on a whole object, must be <= 50% of target
- if at end of a chain, must be <= 1/depth * 50% of target
The rule penalizes deltas near the end of the chain, requiring them to
be very small in order to be kept by the packer. This favors deltas
that are based on a shorter chain, where the read-time unpack cost is
much lower. Fewer bytes need to be consulted from the source pack
file, and less copying is required in memory to rebuild the object.
Junio Hamano explained Nico's rule to me today, and this commit fixes
DeltaWindow to implement it as described.
When no base has been chosen the computation is simply the statements
denoted above. However once a base with depth of 9 has been chosen
(e.g. when pack.depth is limited to 10), a non-delta source may
create a new delta that is up to 10x larger than the already selected
base. This reflects the intent of Nico's size distribution rule no
matter what order objects are visited in the DeltaWindow.
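One way to read the rule is as a linear scale between the two end points
listed above. The sketch below is an interpretation, not the DeltaWindow
code; DeltaLimitSketch and limit() are hypothetical names.

```java
final class DeltaLimitSketch {
    /**
     * Largest delta, in bytes, accepted for a target of targetSize when
     * the candidate base already sits at chain depth srcDepth.
     * Depth 0 is a whole (non-delta) object.
     */
    static long limit(long targetSize, int srcDepth, int maxDepth) {
        long half = targetSize / 2; // 50% of target
        return half * (maxDepth - srcDepth) / maxDepth;
    }
}
```

For a 1000 byte target and a maximum depth of 10, a whole object base
allows deltas up to 500 bytes while a base at depth 9 allows only 50
bytes, which is the 10x difference mentioned above.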
With this patch and my other patches applied, repacking JGit with:
[pack]
reuseObjects = false
reuseDeltas = false
depth = 50
window = 250
threads = 4
compression = 9
CGit (all) 5,711,735 bytes; real 0m13.942s user 0m47.722s [1]
JGit heads 5,718,295 bytes; real 0m11.880s user 0m38.177s [2]
rest 9,809 bytes
The improved JGit result for the head pack is only 6.4 KiB larger than
CGit's resulting pack. This patch allowed JGit to find an additional
39.7 KiB worth of space savings. JGit now also often runs 2s faster
than CGit, despite also creating bitmaps and pruning objects after the
head pack creation.
[1] time git repack -a -d -F --window=250 --depth=50
[2] time java -Xmx128m -jar jgit debug-gc
Change-Id: I5caec31359bf7248cabdd2a3254c84d4ee3cd96b
When an idle thread tries to steal work from a sibling's remaining
toSearch queue, always try to split along a path boundary. This
avoids missing delta opportunities in the current window of the
thread whose work is being taken.
The search order is reversed to walk further down the chain from
current position, avoiding the risk of splitting the list within
the path the thread is currently processing.
When selecting which thread to split from use an accurate estimate
of the size to be taken. This avoids selecting a thread that has
only one path remaining but may contain more pending entries than
another thread with several paths remaining.
As there is now a race condition where the straggling thread can
start the next path before the split can finish, the stealWork()
loop spins until it is able to acquire a split or there is only
one path remaining in the siblings.
Change-Id: Ib11ff99f90a4d9efab24bf4a85342cc63203dba5
PackWriter generally chooses the order for objects when it builds the
object lists. This ordering already depends on history information to
guide placing more recent objects first and historical objects last.
Allow PackWriter to make the basic ordering decisions, instead of
trying to override them. The old approach of sorting the list caused
DfsReader to override any ordering change PackWriter might have tried
to make when repacking a repository.
This now better matches with WindowCursor's implementation, where
PackWriter solely determines the object ordering.
Change-Id: Ic17ab5631ec539f0758b962966c3a1823735b814
Replace DeltaWindow array with circularly linked list
Typical window sizes are 10 and 250 (although others are accepted).
In either case the pointer overhead of 1 pointer in an array or
2 pointers for a doubly linked list is trivial. A doubly linked
list as used here for window=250 is only another 1024 bytes on a
32 bit machine, or 2048 bytes on a 64 bit machine.
The critical search loops scan through the array in either the
previous direction or the next direction until the cycle is finished,
or some other scan abort condition is reached. Loading the next
object's pointer from a field in the current object avoids the
branch required to test for wrapping around the edge of the array.
It also saves the array bounds check on each access.
When a delta is chosen the window is shuffled to hoist the currently
selected base as an earlier candidate for the next object. Moving
the window entry is easier in a doubly linked list than sliding a
group of array entries.
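A stripped-down sketch of such a ring is shown below. RingSketch and its
Entry are hypothetical and carry only a name, not the state a real
window entry would hold.

```java
final class RingSketch {
    static final class Entry {
        final String name;
        Entry prev, next;

        Entry(String name) {
            this.name = name;
            prev = next = this; // a single entry is a ring of one
        }
    }

    /** Builds a ring from the given names and returns the first entry. */
    static Entry ring(String... names) {
        Entry head = new Entry(names[0]);
        for (int i = 1; i < names.length; i++)
            insertBefore(new Entry(names[i]), head); // append at the tail
        return head;
    }

    static void insertBefore(Entry e, Entry pos) {
        e.prev = pos.prev;
        e.next = pos;
        pos.prev.next = e;
        pos.prev = e;
    }

    static void unlink(Entry e) {
        e.prev.next = e.next;
        e.next.prev = e.prev;
        e.prev = e.next = e;
    }

    /** Hoists e so that it directly follows pos; no entries are shifted. */
    static void moveAfter(Entry e, Entry pos) {
        if (e == pos || pos.next == e)
            return;
        unlink(e);
        insertBefore(e, pos.next);
    }

    /** Walks one full cycle; no index arithmetic or wrap test is needed. */
    static String walk(Entry start) {
        StringBuilder sb = new StringBuilder();
        Entry e = start;
        do {
            sb.append(e.name);
            e = e.next;
        } while (e != start);
        return sb.toString();
    }
}
```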
Change-Id: I9ccf20c3362a78678aede0f0f2cda165e509adff
The copy instruction formatter should not compute the shifts and
masks twice. Instead compute them once and assume there is a register
available to store the temporary "b" for compare with 0.
Change-Id: Ic7826f29dca67b16903d8f790bdf785eb478c10d
javac and the JIT are more likely to understand a boolean being
used as a branch conditional than comparing int against 0 and 1.
Rewrite NEXT_RES and NEXT_SRC constants to be booleans so the
code is clarified for the JIT.
Change-Id: I1bdd8b587a69572975a84609c779b9ebf877b85d
Micro-optimize DeltaWindow maxMemory test to be != 0
Instead of using a compare-with-0 use a does not equal 0.
javac bytecode has a special instruction for this, as it
is very common in software. We can assume the JIT knows
how to efficiently translate the opcode to machine code,
and processors can do != 0 very quickly.
Change-Id: Idb84c1d744d2874517fd4bfa1db390e2dbf64eac
This class and all of its methods are only package visible.
Clarify the methods as final for the benefit of the JIT to
inline trivial code.
Change-Id: I078841f9900dbf299fbe6abf2599f0208ae96856
Colby just pointed out to me the buffer was 16 KiB. This may
be very small for common objects. Increase to 64 KiB.
Change-Id: Ideecc4720655a57673252f7adb8eebdf2fda230d
Most objects are written as OFS_DELTA with the base in the pack,
that is why this case comes first in writeHeader(). Rewrite the
condition to always examine this first and cache the PackWriter's
formatting flag for use of OFS_DELTA headers. In modern Git networks
this is true more often than it is false.
Assume the cost of write() is high, especially due to entering the
MessageDigest to update the pack footer SHA-1 computation. Combine
the OFS_DELTA information as part of the header buffer so that the
entire burst is a single write call, rather than two relatively
small ones. Most OFS_DELTA headers are <= 6 bytes, so this rewrite
transforms 2 writes of 3 bytes each into 1 write of ~6 bytes.
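Assembling both parts in one buffer can be sketched as follows. The byte
layout follows the Git pack format (type and size header, then the
OFS_DELTA base distance); OfsHeaderSketch and encode() are hypothetical
names and this is not the PackOutputStream code.

```java
final class OfsHeaderSketch {
    static final int OBJ_OFS_DELTA = 6;

    /**
     * Writes the object header and the distance to the delta base into
     * buf, so the caller can emit both with one write(buf, 0, n) call.
     *
     * @return number of bytes used in buf
     */
    static int encode(byte[] buf, long size, long ofs) {
        int n = 0;

        // Type in bits 4-6, low 4 bits of size, then 7 bits per byte.
        int b = (OBJ_OFS_DELTA << 4) | (int) (size & 0x0f);
        size >>>= 4;
        while (size != 0) {
            buf[n++] = (byte) (b | 0x80); // more size bytes follow
            b = (int) (size & 0x7f);
            size >>>= 7;
        }
        buf[n++] = (byte) b;

        // Base distance: most significant group first, built back to front.
        byte[] tmp = new byte[10];
        int p = tmp.length;
        tmp[--p] = (byte) (ofs & 0x7f);
        while ((ofs >>>= 7) != 0)
            tmp[--p] = (byte) (0x80 | (--ofs & 0x7f));
        System.arraycopy(tmp, p, buf, n, tmp.length - p);
        return n + tmp.length - p;
    }
}
```

A caller would then pass buf and the returned length to a single
write(buf, 0, n) on the pack stream.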
Try to simplify the objectHeader code to reduce branches and use
more local registers. This shouldn't really be necessary if the
compiler is well optimized, but it isn't very hard to clarify data
usage to either javac or the JIT, which may make it easier for the
JIT to produce better machine code for this method.
Change-Id: I2b12788ad6866076fabbf7fa11f8cce44e963f35