path: root/Documentation
Commit message (Author, Age, Files, Lines)
* Document commit-graph options supported by JGit (Matthias Sohn, 2023-09-01, 1 file, -0/+8)
    Change-Id: I0ab1b826232bbfcf28518d7a01ae5c5d82a08e04
* Introduce core.packedIndexGitUseStrongRefs config key (Martin Fick, 2023-08-26, 1 file, -0/+1)
    Introduce a core.packedIndexGitUseStrongRefs configuration key, which defaults to true so that the current behavior does not change. However, setting it to false allows soft references to be used for Pack indices instead of strong references so that they can be garbage collected when there is memory pressure.

    Pack objects can be large when associated with pack files with large object counts, and this memory is not really accounted for or tracked by the WindowCache and it can be very substantial at times, especially with many large object count projects. A particularly problematic use case is Gerrit's ls-projects command which loads very little data in the WindowCache via ByteWindows, but ends up loading and holding many entire indices in memory, sometimes even after the ByteWindows for their Pack objects have already been garbage collected since they won't get cleared until after a new ByteWindow is loaded.

    By using SoftReferences, single-use indices can get cleared when there is memory pressure and OOMs can be easily avoided, drastically reducing the amount of memory required to perform an ls-projects on large sites with many projects and large object counts. On one of our test sites, an ls-projects command with strong index references requires more than 66GB of heap to complete successfully; with soft index references it requires less than 23GB.

    Change-Id: I3cb3df52f4ce1b8c554d378807218f199077d80b
    Signed-off-by: Martin Fick <quic_mfick@quicinc.com>
    Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* BasePackFetchConnection: support negotiationTip feature (Ronald Bhuleskar, 2023-03-28, 1 file, -0/+6)
    By default, Git will report, to the server, commits reachable from all local refs to find common commits in an attempt to reduce the size of the to-be-received packfile. If specified with negotiation tip, Git will only report commits reachable from the given tips. This is useful to speed up fetches when the user knows which local ref is likely to have commits in common with the upstream ref being fetched.

    When negotiation-tip is on, use the wanted refs instead of all refs as the source of the "have" list to send.

    This is controlled by the `fetch.usenegotiationtip` flag, false by default. This works only for programmatic fetches; there is no support for it yet in the CLI.

    Change-Id: I19f8fe48889bfe0ece7cdf78019b678ede5c6a32
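The effect of the flag on the "have" list can be sketched as follows. This is only an illustration of the behavior described in the commit message, not JGit's code: the function `negotiation_tips` and the dictionary model of local refs are hypothetical, and matching wanted refs to local refs by name is a simplifying assumption.

```python
def negotiation_tips(local_refs, wanted_ref_names, use_negotiation_tip=False):
    """Return the object ids used as starting points for the "have" list.

    local_refs: dict mapping ref name -> object id (hypothetical model).
    wanted_ref_names: set of ref names being fetched.
    """
    if not use_negotiation_tip:
        # Default behavior: report commits reachable from every local ref.
        return sorted(set(local_refs.values()))
    # Flag enabled: only refs that are being fetched contribute tips.
    return sorted({oid for name, oid in local_refs.items()
                   if name in wanted_ref_names})
```

With many local refs and a single wanted ref, the second branch shrinks the set of tips the client walks before sending "have" lines.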
* Merge branch 'stable-6.4' (Matthias Sohn, 2023-02-20, 1 file, -2/+3)
    * stable-6.4:
      Fix getPackedRefs to not throw NoSuchFileException
      Add pack options to preserve and prune old pack files
      Allow to perform PackedBatchRefUpdate without locking loose refs
      Document option "core.sha1Implementation" introduced in 59029aec

    Change-Id: I36051c623fcd480aa80ed32b4e89f9bdd1b798e0
| * Merge branch 'stable-6.0' into stable-6.1 (Matthias Sohn, 2023-02-16, 1 file, -2/+3)
      * stable-6.0:
        Add pack options to preserve and prune old pack files
        Allow to perform PackedBatchRefUpdate without locking loose refs
        Document option "core.sha1Implementation" introduced in 59029aec

      Change-Id: I876a38c2de8b7d5eaacd00e36b85599f88173221
| | * Add pack options to preserve and prune old pack files (Matthias Sohn, 2023-02-11, 1 file, -2/+2)
        Add the options
        - pack.preserveOldPacks
        - pack.prunePreserved

        This allows configuring in git config whether old packs should be preserved during gc and pruned during the next gc. The original implementation in 91132bb0 only allows setting these options through the API.

        Change-Id: I5b23ab4f317d12f5ccd234401419913e8263cc9a
| | * Document option "core.sha1Implementation" introduced in 59029aec (Matthias Sohn, 2023-02-02, 1 file, -0/+1)
        Bug: 580310
        Change-Id: I10f3d6f6b5af7ab96683994c9cbd85e6c18a5084
* | | PackConfig: add entry for minimum size to index (Ivan Frade, 2023-02-16, 1 file, -0/+1)
    The object size index can have up to #(blobs-in-repo) entries, taking a significant amount of memory. Let operators configure the threshold size to include objects in the size index. The index will include objects with size *at or above* this value (with -1 for none). This is more effective for the filter-by-size case.

    Lowering the threshold adds more objects to the index. This improves performance at the cost of memory/storage space. For the object-size case, more calls will use the index instead of reading IO. For the filter-by-size case, a lower threshold means better granularity (if ObjectReader#isSmallerThan is implemented based only on the index).

    Change-Id: I6ccd9334adbbc2abf95fde51dbbfc85b8230ade0
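The threshold rule (objects at or above the value are indexed, -1 indexes none) can be sketched like this. The function `size_index_entries` and the dictionary of blob sizes are hypothetical names for illustration, not part of PackConfig.

```python
def size_index_entries(blob_sizes, min_size):
    """Select the blobs that would get an entry in the object size index.

    blob_sizes: dict mapping object id -> size in bytes (hypothetical model).
    min_size: threshold in bytes; -1 means no object is indexed.
    """
    if min_size < 0:
        # -1 disables the index entirely.
        return {}
    # Objects with size at or above the threshold are included.
    return {oid: size for oid, size in blob_sizes.items() if size >= min_size}
```

Lowering `min_size` grows the returned mapping, which mirrors the memory versus granularity trade-off described above.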
* | Merge branch 'stable-6.0' into stable-6.1 (Matthias Sohn, 2023-02-01, 1 file, -0/+1)
    * stable-6.0:
      Shortcut during git fetch for avoiding looping through all local refs
      FetchCommand: fix fetchSubmodules to work on a Ref to a blob
      Silence API warnings introduced by I466dcde6
      Allow the exclusions of refs prefixes from bitmap
      PackWriterBitmapPreparer: do not include annotated tags in bitmap
      BatchingProgressMonitor: avoid int overflow when computing percentage
      Speedup GC listing objects referenced from reflogs
      FileSnapshotTest: Add more MISSING_FILE coverage

    Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726
| * Allow the exclusions of refs prefixes from bitmap (Luca Milanesio, 2023-01-31, 1 file, -0/+1)
      When running a GC.repack() against a repository with over one thousand refs/heads and tens of millions of ObjectIds, the calculation of all bitmaps associated with all the refs would result in an unreasonably big file that would take up to several hours to compute.

      Test scenario:
        repo with 2500 heads / 10M obj
        Intel Xeon E5-2680 2.5GHz
        Before this change: 20 mins
        After this change and 2300 heads excluded: 10 mins (90s for bitmap)

      Having such a large bitmap file is also slow in the runtime processing and has negligible or even negative benefits, because the time lost in reading and decompressing the bitmap in memory would not be compensated by the time saved by using it.

      It is key to preserve the bitmaps for those refs that are mostly used in clone/fetch and give the ability to exclude some refs prefixes that are known to be less frequently accessed, even though they may actually be actively written.

      Example: Gerrit sandbox branches may even be actively used and selected automatically because their commits are very recent; however, they may bloat the bitmap, making it ineffective.

      A mono-repo with tens of thousands of developers may have a relatively small number of active branches where the CI/CD jobs are continuously fetching/cloning the code. However, because Gerrit allows the use of sandbox branches, the total number of refs/heads may reach tens to hundreds of thousands.

      Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
      Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
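A minimal sketch of prefix-based exclusion, assuming the excluded prefixes are plain string prefixes of ref names. The function `refs_for_bitmap` and the `refs/heads/sandbox/` prefix in the usage below are illustrative assumptions, not the option's actual name or JGit's implementation.

```python
def refs_for_bitmap(ref_names, excluded_prefixes):
    """Keep only the refs whose names do not start with an excluded prefix.

    ref_names: iterable of full ref names, e.g. "refs/heads/main".
    excluded_prefixes: iterable of prefixes to leave out of the bitmap.
    """
    prefixes = tuple(excluded_prefixes)
    # str.startswith accepts a tuple and returns False for an empty tuple,
    # so an empty exclusion list keeps every ref.
    return [name for name in ref_names if not name.startswith(prefixes)]
```

For instance, `refs_for_bitmap(all_heads, ["refs/heads/sandbox/"])` would leave sandbox branches out of bitmap selection while keeping the branches that clones and fetches actually use.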
* | Refresh 'objects' dir and retry if a loose object is not found (Kaushik Lingarkar, 2023-01-13, 1 file, -1/+1)
    A new loose object may not be immediately visible on an NFS client if it was created on another client. Refreshing the 'objects' dir and trying again can help work around the NFS behavior.

    Here's an E2E problem that this change can help fix. Consider a Gerrit multi-primary setup with repositories based on NFS. Add a new patch-set to an existing change and then immediately fetch the new patch-set of that change. If the fetch is handled by a Gerrit primary different than the one which created the patch-set, then we sometimes run into a MissingObjectException that causes the fetch to fail.

    Bug: 581317
    Change-Id: Iccc6676c68ef13a1e8b2ff52b3eeca790a89a13d
    Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
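The refresh-and-retry idea can be sketched as below. The callables `read_object` and `refresh_objects_dir`, the exception class `MissingObjectError`, and the single retry are assumptions made for illustration; the actual change lives in JGit's object directory code and may differ in detail.

```python
class MissingObjectError(Exception):
    """Raised when an object is still missing after the refresh (hypothetical)."""


def open_loose_object(object_id, read_object, refresh_objects_dir):
    """Read a loose object, refreshing the 'objects' dir once if it is missing.

    read_object: callable that returns the object data or raises
        FileNotFoundError (stand-in for the real file access).
    refresh_objects_dir: callable that forces the directory to be re-read.
    """
    try:
        return read_object(object_id)
    except FileNotFoundError:
        # The object may exist on the server but not yet be visible to
        # this NFS client; refresh the directory and try exactly once more.
        refresh_objects_dir()
        try:
            return read_object(object_id)
        except FileNotFoundError:
            raise MissingObjectError(object_id) from None
```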
* | Introduce core.trustPackedRefsStat config (Kaushik Lingarkar, 2023-01-05, 1 file, -0/+1)
    Currently, we always read packed-refs file when 'trustFolderStat' is false. Introduce a new config 'trustPackedRefsStat' which takes precedence over 'trustFolderStat' when reading packed refs. Possible values for this new config are:

    * always: Trust packed-refs file attributes
    * after_open: Same as 'always', but refresh the file attributes of packed-refs before trusting it
    * never: Always read the packed-refs file
    * unset: Fallback to 'trustFolderStat' to determine if the file attributes of packed-refs can be trusted

    Folks whose repositories are on NFS and have traditionally been setting 'trustFolderStat=false' can now get some performance improvement with 'trustPackedRefsStat=after_open' as it refreshes the file attributes of packed-refs (at least on some NFS clients) before considering it.

    For example, consider a repository on NFS with ~500k packed-refs. Here are some stats which illustrate the improvement with this new config when reading packed refs on NFS:

    trustFolderStat=true trustPackedRefsStat=unset: 0.2ms
    trustFolderStat=false trustPackedRefsStat=unset: 155ms
    trustFolderStat=false trustPackedRefsStat=after_open: 1.5ms

    Change-Id: I00da88e4cceebbcf3475be0fc0011ff65767c111
    Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
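The precedence between the two settings can be sketched as a small decision function. The function name `packed_refs_strategy` and the returned strategy labels are made up for this illustration; only the four config values and the fallback rule come from the commit message.

```python
def packed_refs_strategy(trust_packed_refs_stat, trust_folder_stat):
    """Decide how to treat the packed-refs file attributes.

    trust_packed_refs_stat: "always", "after_open", "never" or "unset".
    trust_folder_stat: bool, the value of core.trustFolderStat.
    Returns one of "trust_stat", "refresh_then_trust_stat", "always_read".
    """
    mode = trust_packed_refs_stat
    if mode == "unset":
        # Fall back to trustFolderStat: trusted stat when true,
        # unconditional re-read when false.
        mode = "always" if trust_folder_stat else "never"
    if mode == "always":
        return "trust_stat"
    if mode == "after_open":
        return "refresh_then_trust_stat"
    if mode == "never":
        return "always_read"
    raise ValueError(f"unknown trustPackedRefsStat value: {trust_packed_refs_stat}")
```

The NFS setup from the measurements corresponds to `packed_refs_strategy("after_open", False)`, which avoids the unconditional re-read that `trustFolderStat=false` alone would cause.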
* | Fix documentation for core.trustFolderStat (Kaushik Lingarkar, 2022-12-14, 1 file, -1/+1)
    Update documentation for core.trustFolderStat to highlight that it is also used when reading the packed-refs file.

    Change-Id: I3eac377c3a7f48493abc8ae6d0889ee70a05d24d
    Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
* searchForReuse might impact performance in large repositories (Fabio Ponciroli, 2021-06-25, 1 file, -0/+1)
    The search for reuse phase for *all* the objects scans *all* the packfiles, looking for the best candidate to serve back to the client.

    This can lead to an expensive operation when the number of packfiles and objects is high.

    Add parameter "pack.searchForReuseTimeout" to limit the time spent on this search.

    Change-Id: I54f5cddb6796fdc93ad9585c2ab4b44854fa6c48
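A time-bounded search of this kind can be sketched as follows. Everything here is an assumption for illustration: packs are modeled as dictionaries from object id to stored size, "best candidate" means the smallest stored representation, and on timeout the sketch simply returns what it has found so far, which may not match what JGit does when the limit is hit.

```python
import time


def search_for_reuse(objects, packs, timeout_seconds, clock=time.monotonic):
    """Scan every pack for every object, stopping once the time budget is spent.

    objects: list of object ids to look up.
    packs: list of dicts mapping object id -> stored size (hypothetical model).
    Returns (best, timed_out), where best maps object id -> smallest size found.
    """
    deadline = clock() + timeout_seconds
    best = {}
    for obj in objects:
        for pack in packs:
            if clock() > deadline:
                # Budget exhausted: stop scanning and report partial results.
                return best, True
            candidate = pack.get(obj)
            if candidate is not None and (obj not in best or candidate < best[obj]):
                best[obj] = candidate
    return best, False
```

The `clock` parameter exists only so the loop can be exercised with a fake clock; the cost being bounded is the objects-times-packs scan.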
* Document http options supported by JGit (Thomas Wolf, 2021-03-13, 1 file, -0/+20)
    Change-Id: I0af4f9991fdb4f09de25f743d1e0dca67ceaa18b
    Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
* Fix formatting of config option values (Matthias Sohn, 2020-10-26, 1 file, -4/+4)
    Change-Id: If9a4bb44c4b348cbb94127207566471105267a53
* Document options in core section supported by JGit (Matthias Sohn, 2020-10-26, 1 file, -2/+34)
    Change-Id: I25af04112cf219405718b5c3e8e103156fb30fa5
* Document gc and pack relevant options (Matthias Sohn, 2020-04-03, 1 file, -0/+57)
    Change-Id: Iab7262b25942fa8c062b979d394674635b70a284
    Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* Documentation/technical/reftable: improve repo layout (Han-Wen Nienhuys, 2020-02-11, 1 file, -23/+30)
    Previously, the list of tables was in .git/refs. This makes repo detection fail in older clients, which is undesirable.

    This proposal was discussed and approved on the git@vger list at

    https://lore.kernel.org/git/CAFQ2z_PvKiz==GyS6J1H1uG0FRPL86JvDj+LjX1We4-yCSVQ+g@mail.gmail.com/

    For backward compatibility, JGit could detect a file under .git/refs and use it as a reftable list.

    Change-Id: Ic0b974fa250cfa905463b811957e2a4fdd7bbc6b
    Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
* reftable: enforce ascending order in sortAndWriteRefs (Han-Wen Nienhuys, 2019-10-30, 1 file, -0/+4)
    MergedReftableTest#scanDuplicates tests whether we can write duplicate keys in a merged reftable. Apparently, the first key appearing should get precedence, and this works because the sort() algorithm on ordered collections is stable.

    This is potentially confusing behavior, because you can write data into the table that cannot be retrieved (Merged table can only have one entry per key), and the APIs such as exactRef() only return a single value.

    Make this consistent with behavior introduced in I04f55c481 "reftable: enforce ordering for ref and log writes" by considering a duplicate key in sortAndWriteRefs as a fatal runtime error.

    Change-Id: I1eedd18f028180069f78c5c467169dcfe1521157
    Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
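The stricter rule can be sketched as "sort, then require strictly ascending keys". The function `sort_and_check_refs`, the tuple model of a ref, and the choice of `ValueError` are assumptions for illustration; the commit only states that a duplicate key becomes a fatal runtime error.

```python
def sort_and_check_refs(refs):
    """Sort refs by name and reject duplicate names.

    refs: list of (name, value) tuples (hypothetical model of a ref).
    Returns the refs sorted by name; raises ValueError on a duplicate name.
    """
    # sorted() is stable, so before this rule the first duplicate would
    # silently win; here duplicates are treated as an error instead.
    ordered = sorted(refs, key=lambda ref: ref[0])
    for previous, current in zip(ordered, ordered[1:]):
        if previous[0] == current[0]:
            raise ValueError(f"duplicate ref name: {current[0]}")
    return ordered
```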
* Documentation/technical/reftable: change suggested file names (Han-Wen Nienhuys, 2019-10-30, 1 file, -11/+15)
    By using ${min_update}-${max_update} as file name template, we guarantee that each file has a unique name. This allows data from open files to be cached across reloads of the stack.

    This is in anticipation of Change I1837f268e ("file: implement FileReftableDatabase"), which is the first implementation of reftable on a filesystem.

    Change-Id: I7ef0610eb60c494165382d0c372afcf41f074393
    Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
* Documentation/technical/reftable: document rename in reflog. (Han-Wen Nienhuys, 2019-08-21, 1 file, -0/+4)
    Change-Id: I0fe7d28a772b1ee9eefd9a38bff5e08a8559988f
    Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
* reftable: explicitly store update_index per ref (Shawn Pearce, 2017-08-21, 1 file, -0/+5)
    Add an update_index to every reference in a reftable, storing the exact transaction that last modified the reference. This is necessary to fix some merge race conditions.

    Consider updates at T1 and T3 that are present in two reftables. Compacting these will create a table with range [T1,T3]. If T2 arrives during or after the compaction, it's impossible for readers to know how to merge the [T1,T3] table with the T2 table.

    With an explicit update_index per reference, MergedReftable is able to individually sort each reference, merging individual entries at T3 from [T1,T3] ahead of identically named entries appearing in T2.

    Change-Id: Ie4065d4176a5a0207dcab9696ae05d086e042140
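The T1/T2/T3 scenario can be sketched with a per-reference merge that prefers the entry with the highest update_index. The function `merge_reftables` and the dictionary model of a table are hypothetical; MergedReftable itself works on sorted on-disk blocks, not dictionaries.

```python
def merge_reftables(tables):
    """Merge tables, keeping for each ref the entry with the highest update_index.

    tables: list of dicts mapping ref name -> (update_index, value).
    Returns a dict with one (update_index, value) entry per ref name.
    """
    merged = {}
    for table in tables:
        for name, (update_index, value) in table.items():
            # The per-reference update_index decides the winner, regardless
            # of the order in which the tables are stacked.
            if name not in merged or update_index > merged[name][0]:
                merged[name] = (update_index, value)
    return merged
```

With a compacted table covering [T1,T3] and a late table at T2, a ref last written at T3 keeps its T3 value, while a ref last written at T1 is correctly overridden by the T2 entry.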
* reftable: file format documentation (Shawn Pearce, 2017-08-17, 1 file, -0/+950)
    Some repositories contain a lot of references (e.g. android at 866k, rails at 31k). The reftable format provides:

    - Near constant time lookup for any single reference, even when the repository is cold and not in process or kernel cache.
    - Near constant time verification that a SHA-1 is referred to by at least one reference (for allow-tip-sha1-in-want).
    - Efficient lookup of an entire namespace, such as `refs/tags/`.
    - Support for atomic push `O(size_of_update)` operations.
    - Combined reflog storage with ref storage.

    Change-Id: I29d0ff1eee475845660ac9173413e1407adcfbf2