searchForReuse might impact performance in large repositories
The search for reuse phase for *all* the objects scans *all*
the packfiles, looking for the best candidate to serve back to the
client.
This can lead to an expensive operation when the number of
packfiles and objects is high.
Add parameter "pack.searchForReuseTimeout" to limit the time spent
on this search.
Change-Id: I54f5cddb6796fdc93ad9585c2ab4b44854fa6c48
Change-Id: Ifb8227cb62370029d6774f2a22b15d6478c713ca
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Wait opening new packfile until it can't be racy anymore
If
- pack.waitPreventRacyPack = true (default is false)
- packfile size > pack.minSizePreventRacyPack (default is 100 MB)
wait after a new packfile was written and before it is opened until it
cannot be racy anymore.
If a new packfile is accessed while it's still racy at least the pack's
index will be reread by ObjectDirectory.scanPacksImpl(). Hence it may
save resources to wait one tick of the file system timer to avoid this
reloading. On filesystems with a coarse timestamp resolution it may be
beneficial to skip this wait for small packfiles.
Bug: 546891
Change-Id: I0e8bf3d7677a025edd2e397dd2c9134ba59b1a18
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If set, "singlePack" will create a single GC pack file for all
objects reachable from refs/*. If not set, the GC pack will contain
object reachable from refs/heads/* and refs/tags/*, and the GC_REST
pack will contain all other reachable objects.
Change-Id: I56bcb6a9da2c10a0909c2f940c025db6f3acebcb
Signed-off-by: Terry Parker <tparker@google.com>
Enable and fix 'Should be tagged with @Override' warning
Set missingOverrideAnnotation=warning in Eclipse compiler preferences
which enables the warning:
The method <method> of type <type> should be tagged with @Override
since it actually overrides a superclass method
Justification for this warning is described in:
http://stackoverflow.com/a/94411/381622
Enabling this causes in excess of 1000 warnings across the entire
code-base. They are very easy to fix automatically with Eclipse's
"Quick Fix" tool.
Fix all of them except 2 which cause compilation failure when the
project is built with mvn; add TODO comments on those for further
investigation.
Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
gc: Add options to preserve and prune old pack files
The new --preserve-oldpacks option moves old pack files into the
preserved subdirectory instead of deleting them after repacking.
The new --prune-preserved option prunes old pack files from the
preserved subdirectory after repacking, but before potentially
moving the latest old packfiles to this subdirectory.
These options are designed to prevent stale file handle exceptions
during git operations which can happen on users of NFS repos when
repacking is done on them. The strategy is to preserve old pack files
around until the next repack with the hopes that they will become
unreferenced by then and not cause any exceptions to running processes
when they are finally deleted (pruned).
Change-Id: If3f729f0d9ce920ee2c3e6acdde46f2068be61d2
Signed-off-by: James Melvin <jmelvin@codeaurora.org>
Expose the following bitmap selection parameters via PackConfig:
"bitmapContiguousCommitCount", "bitmapRecentCommitCount",
"bitmapRecentCommitSpan", "bitmapDistantCommitSpan",
"bitmapExcessiveBranchCount", and "bitmapInactiveBranchAge".
The value of bitmapContiguousCommitCount, whereby bitmaps are
created for the most recent N commits in a branch, has never
been verified. If experiments show that they are not valuable,
then we can simplify the implementation so that there is only
a concept of recent and distant commit history (defined by
"bitmapRecentCommitCount"), and the only controls we need are
"bitmapRecentCommitSpan" and "bitmapDistantCommitSpan".
Change-Id: I288bf3f97d6fbfdfcd5dde2699eff433a7307fb9
Signed-off-by: Terry Parker <tparker@google.com>
Update bitmap selection throttling to fully span active branches.
Replace the “bitmapCommitRange” parameter that was recently introduced
with two new parameters: “bitmapExcessiveBranchCount” and
“bitmapInactiveBranchAgeInDays”. If the count of branches does not
exceed “bitmapExcessiveBranchCount”, then the current algorithm is kept
for all branches.
If the branch count is excessive, then the commit time for the tip
commit for each branch is used to determine if a branch is “inactive”.
"Active" branches get full commit selection using the existing
algorithm. "Inactive" branches get fewer bitmaps near the branch tips.
Introduce a "contiguousCommitCount" parameter that always enforces that
the N most recent commits in a branch are selected for bitmaps. The
previous nextSelectionDistance() algorithm created anywhere from 1-100
contiguous bitmaps at branch tips.
For example, consider a branch with commits numbering 0-300, with 0
being the most recent commit. If the most recent 200 commits are not
merge commits and the 200th commit was the last one selected,
nextSelectionDistance() returned 100, causing commits 200-101 to be
ignored. Then a window of size 100 was evaluated, searching for merge
commits. Since no merge commits are found, the next commit (commit 0)
was selected, for a total of 1 commit in the topmost 100 commits.
If instead the 250th commit was selected, then by the same logic
commit 50 is selected. At that point nextSelectionDistance() switches to
selecting consecutive commits, so commits 0-50 in the topmost 100
commits are selected. The "contiguousCommitCount" parameter provides
more determinism by always selecting a constant number or topmost
commits.
Add an optimization to break out of the inner loop of selectCommits() if
all of the commits for the current branch have already been found.
When reusing bitmaps from an existing pack, remove unnecessary
populating and clearing of the writeBitmaps/PackBitmapIndexBuilder.
Add comments to PackWriterBitmapPreparer, rename methods and variables
for readability.
Add tests for bitmap selection with and without merge commits and with
excessive branch pruning triggered.
Note: I will follow up with an additional change that exposes the new
parameters through PackConfig.
Change-Id: I5ccbb96c8849f331c302d9f7840e05f9650c4608
Signed-off-by: Terry Parker <tparker@google.com>
Limit the range of commits for which bitmaps are created.
A bitmap index contains bitmaps for a set of commits in a pack file.
Creating a bitmap for every commit is too expensive, so heuristics
select the most "important" commits. The most recent commits are the
most valuable. To clone a repository only those for the branch tips are
needed. When fetching, only commits since the last fetch are needed.
The commit selection heuristics generally work, but for some
repositories the number of selected commits is prohibitively high. One
example is the MSM 3.10 Linux kernel. With over 1 million commits on
2820 branches, the current heuristics resulted in +36k selected commits.
Each uncompressed bitmap for that repository is ~413k, making it
difficult to complete a GC operation in available memory.
The benefit of creating bitmaps over the entire history of a repository
like the MSM 3.10 Linux kernel isn't clear. For that repository, most
history for the last year appears to be in the last 100k commits.
Limiting bitmap commit selection to just those commits reduces the count
of selected commits from ~36k to ~10.5k. Dropping bitmaps for older
commits does not affect object counting times for clones or for fetches
on clients that are reasonably up-to-date.
This patch defines a new "bitmapCommitRange" PackConfig parameter to
limit the commit selection process when building bitmaps. The range
starts with the most recent commit and walks backwards. A range of 10k
considers only the 10000 most recent commits. A range of zero creates
bitmaps only for branch tips. A range of -1 (the default) does not limit
the range--all commits in the pack are used in the commit selection
process.
Change-Id: Ied92c70cfa0778facc670e0f14a0980bed5e3bfb
Signed-off-by: Terry Parker <tparker@google.com>
Support cutting existing delta chains longer than the max depth
Some packs built by JGit have incredibly long delta chains due to a
long standing bug in PackWriter. Google has packs created by JGit's
DfsGarbageCollector with chains of 6000 objects long, or more.
Inflating objects at the end of this 6000 long chain is impossible
to complete within a reasonable time bound. It could take a beefy
system hours to perform even using the heavily optimized native C
implementation of Git, let alone with JGit.
Enable pack.cutDeltaChains to be set in a configuration file to
permit the PackWriter to determine the length of each delta chain
and clip the chain at arbitrary points to fit within pack.depth.
Delta chain cycles are still possible, but no attempt is made to
detect them. A trivial chain of A->B->A will iterate for the full
pack.depth configured limit (e.g. 50) and then pick an object to
store as non-delta.
When cutting chains the object list is walked in reverse to try
and take advantage of existing chain computations. The assumption
here is most deltas are near the end of the list, and their bases
are near the front of the list. Going up from the tail attempts to
reuse chainLength computations by relying on the memoized value in
the delta base.
The chainLength field in ObjectToPack is overloaded into the depth
field normally used by DeltaWindow. This is acceptable because the
chain cut happens before delta search, and the chainLength is reset
to 0 if delta search will follow.
Change-Id: Ida4fde9558f3abbbb77ade398d2af3941de9c812
JGit 3.0: move internal classes into an internal subpackage
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
This is helpful for writing the pack configuration into a log file.
Change-Id: I5e7f5ff7e01c9538ca12a1860844ba9b467bdf05
Signed-off-by: Edwin Kempin <edwin.kempin@sap.com>
Bitmaps provide a huge performance boost for counting objects and they
play nice with the cgit implementation.
Change-Id: I33b05a6c8f1ee2df7770f0b9fdc50d0b4bbf1029
Support creating pack bitmap indexes in PackWriter.
Update the PackWriter to support writing out pack bitmap indexes,
a parallel ".bitmap" file to the ".pack" file.
Bitmaps are selected at commits every 1 to 5,000 commits for
each unique path from the start. The most recent 100 commits are
all bitmapped. The next 19,000 commits have a bitmaps every 100
commits. The remaining commits have a bitmap every 5,000 commits.
Commits with more than 1 parent are prefered over ones
with 1 or less. Furthermore, previously computed bitmaps are reused,
if the previous entry had the reuse flag set, which is set when the
bitmap was placed at the max allowed distance.
Bitmaps are used to speed up the counting phase when packing, for
requests that are not shallow. The PackWriterBitmapWalker uses
a RevFilter to proactively mark commits with RevFlag.SEEN, when
they appear in a bitmap. The walker produces the full closure
of reachable ObjectIds, given the collection of starting ObjectIds.
For fetch request, two ObjectWalks are executed to compute the
ObjectIds reachable from the haves and from the wants. The
ObjectIds needed to be written are determined by taking all the
resulting wants AND NOT the haves.
For clone requests, we get cached pack support for "free" since
it is possible to determine if all of the ObjectIds in a pack file
are included in the resulting list of ObjectIds to write.
On my machine, the best times for clones and fetches of the linux
kernel repository (with about 2.6M objects and 300K commits) are
tabulated below:
Operation Index V2 Index VE003
Clone 37530ms (524.06 MiB) 82ms (524.06 MiB)
Fetch (1 commit back) 75ms 107ms
Fetch (10 commits back) 456ms (269.51 KiB) 341ms (265.19 KiB)
Fetch (100 commits back) 449ms (269.91 KiB) 337ms (267.28 KiB)
Fetch (1000 commits back) 2229ms ( 14.75 MiB) 189ms ( 14.42 MiB)
Fetch (10000 commits back) 2177ms ( 16.30 MiB) 254ms ( 15.88 MiB)
Fetch (100000 commits back) 14340ms (185.83 MiB) 1655ms (189.39 MiB)
Change-Id: Icdb0cdd66ff168917fb9ef17b96093990cc6a98d
A few classes such as Constanrs are marked with @SuppressWarnings, as are
toString() methods with many liternal, but otherwise $NLS-n$ is used for
string containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
Some embeddings of UploadPack (e.g. Gerrit Code Review) set their own
PackConfig from a server-wide configuration, overriding any JGit
defaults or settings that may exist at the local repository level.
Make a copy constructor form of PackConfig so this server-wide
configuration object can be copied and then merged with repository
specific configuration data.
Change-Id: I4463c95aeaf7d6536c3ab132dec9c50ee528d9e0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Use limited getCachedBytes code to reduce duplication
Rather than duplicating this block everywhere, reuse the limited size
form of getCachedBytes to acquire the content of an object.
Change-Id: I2e26a823e6fd0964d8f8dbfaa0fc2e8834c179c1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Enable configuration of non-standard pack settings
For daemons we might want to disable delta compression entirely, or
in some strange case an administrator might need to turn of delta
reuse. Expose these normally internal pack settings through the pack
configuration section.
Change-Id: I39bfefee8384c864cc04ffac724f197240c8a11a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This refactoring permits applications to configure global per-process
settings for all packing and easily pass it through to per-request
PackWriters, ensuring that the process configuration overrides the
repository specific settings.
For example this might help in a daemon environment where the server
wants to cap the resources used to serve a dynamic upload pack
request, even though the repository's own pack.* settings might be
configured to be more aggressive. This allows fast but less bandwidth
efficient serving of clients, while still retaining good compression
through a cron managed `git gc`.
Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Honor pack.threads and perform delta search in parallel
If we have multiple CPUs available, packing usually goes faster
when each CPU is assigned a slice of the available search space.
The number of threads to use is guessed from the runtime if it
wasn't set by the caller, or wasn't set in the configuration.
Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
PackWriter now caches small deltas, or deltas that are very tiny
compared to their source inputs, so that the writing phase goes
faster by reusing those cached deltas.
The cached data is stored compressed, which usually translates to
a bigger footprint due to deltas being very hard to compress, but
saves time during writing by avoiding the deflate step. They are
held under SoftReferences so that the JVM GC can clear out deltas
if memory gets very tight. We would rather continue working and
spend a bit more CPU time during writing than crash due to OOME.
To avoid OutOfMemoryErrors during the caching phase we also trap
OOME and just abort out of the caching.
Because deflateBound() always produces something larger than what
we need to actually store the deflated data, we copy it over into
a new buffer if the actual length doesn't match the buffer length.
When packing jgit.git this saves over 111 KiB in the cache, and is
thus a worthwhile hit on CPU time.
To further save memory we store the inflated size of the delta
(which we need for the object header) in the same field as the
pathHash, as the pathHash is no longer necessary by this phase
of the packing algorithm.
Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
C Git's fast-import uses this to determine the maximum file size
that it tries to delta compress, anything equal to or above this
setting is stored with as a whole object with simple deflate.
Define the configuration so we can use it later.
Change-Id: Iea46e787d019a1b6c51135cc73d7688a02e207f5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We now at least import other pack settings like pack.window, which
means we can later use these to control how we search for deltas.
The compression level was fixed to use pack.compression rather than
the loose object core.compression setting.
Change-Id: I72ff6d481c936153ceb6a9e485fa731faf075a9a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>