
GcBasicPackingTest.java 11KB

Limit the range of commits for which bitmaps are created.

A bitmap index contains bitmaps for a set of commits in a pack file. Creating a bitmap for every commit is too expensive, so heuristics select the most "important" commits. The most recent commits are the most valuable. To clone a repository only those for the branch tips are needed. When fetching, only commits since the last fetch are needed.

The commit selection heuristics generally work, but for some repositories the number of selected commits is prohibitively high. One example is the MSM 3.10 Linux kernel. With over 1 million commits on 2820 branches, the current heuristics resulted in +36k selected commits. Each uncompressed bitmap for that repository is ~413k, making it difficult to complete a GC operation in available memory.

The benefit of creating bitmaps over the entire history of a repository like the MSM 3.10 Linux kernel isn't clear. For that repository, most history for the last year appears to be in the last 100k commits. Limiting bitmap commit selection to just those commits reduces the count of selected commits from ~36k to ~10.5k. Dropping bitmaps for older commits does not affect object counting times for clones or for fetches on clients that are reasonably up-to-date.

This patch defines a new "bitmapCommitRange" PackConfig parameter to limit the commit selection process when building bitmaps. The range starts with the most recent commit and walks backwards. A range of 10k considers only the 10000 most recent commits. A range of zero creates bitmaps only for branch tips. A range of -1 (the default) does not limit the range--all commits in the pack are used in the commit selection process.

Change-Id: Ied92c70cfa0778facc670e0f14a0980bed5e3bfb
Signed-off-by: Terry Parker <tparker@google.com>
8 years ago
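The range semantics described in the message above (-1 for unlimited, 0 for branch tips only, N for the N most recent commits) can be illustrated with a minimal sketch. This is not the JGit implementation: the class BitmapCommitRangeSketch, the method candidates, and the use of plain strings as commit names are all hypothetical, and the sketch assumes that branch tips remain candidates for any range value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "bitmapCommitRange" semantics; not JGit code.
class BitmapCommitRangeSketch {

    // newestFirst: commit names ordered from most recent to oldest.
    // tips: names of branch tip commits (assumed to always stay candidates).
    // range: -1 = no limit, 0 = tips only, N = the N most recent commits.
    static List<String> candidates(List<String> newestFirst, Set<String> tips,
            int range) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < newestFirst.size(); i++) {
            String commit = newestFirst.get(i);
            boolean unlimited = range < 0;
            boolean withinRange = i < range;
            if (unlimited || withinRange || tips.contains(commit)) {
                out.add(commit);
            }
        }
        return out;
    }
}
```

The sketch only narrows the set of commits that the selection heuristics would look at; which of those candidates actually receive a bitmap is decided elsewhere.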
Update bitmap selection throttling to fully span active branches.

Replace the “bitmapCommitRange” parameter that was recently introduced with two new parameters: “bitmapExcessiveBranchCount” and “bitmapInactiveBranchAgeInDays”. If the count of branches does not exceed “bitmapExcessiveBranchCount”, then the current algorithm is kept for all branches.

If the branch count is excessive, then the commit time for the tip commit for each branch is used to determine if a branch is “inactive”. "Active" branches get full commit selection using the existing algorithm. "Inactive" branches get fewer bitmaps near the branch tips.

Introduce a "contiguousCommitCount" parameter that always enforces that the N most recent commits in a branch are selected for bitmaps. The previous nextSelectionDistance() algorithm created anywhere from 1-100 contiguous bitmaps at branch tips. For example, consider a branch with commits numbering 0-300, with 0 being the most recent commit. If the most recent 200 commits are not merge commits and the 200th commit was the last one selected, nextSelectionDistance() returned 100, causing commits 200-101 to be ignored. Then a window of size 100 was evaluated, searching for merge commits. Since no merge commits are found, the next commit (commit 0) was selected, for a total of 1 commit in the topmost 100 commits. If instead the 250th commit was selected, then by the same logic commit 50 is selected. At that point nextSelectionDistance() switches to selecting consecutive commits, so commits 0-50 in the topmost 100 commits are selected. The "contiguousCommitCount" parameter provides more determinism by always selecting a constant number of topmost commits.

Add an optimization to break out of the inner loop of selectCommits() if all of the commits for the current branch have already been found.

When reusing bitmaps from an existing pack, remove unnecessary populating and clearing of the writeBitmaps/PackBitmapIndexBuilder.

Add comments to PackWriterBitmapPreparer, rename methods and variables for readability.

Add tests for bitmap selection with and without merge commits and with excessive branch pruning triggered.

Note: I will follow up with an additional change that exposes the new parameters through PackConfig.

Change-Id: I5ccbb96c8849f331c302d9f7840e05f9650c4608
Signed-off-by: Terry Parker <tparker@google.com>
8 years ago
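The active/inactive branch rule from the message above can be sketched as follows. This is an illustration under stated assumptions, not the PackWriterBitmapPreparer code: the class BranchActivitySketch and the method isInactive are hypothetical names, and the exact boundary handling (strictly older than the configured age, strictly more branches than the configured count) is assumed rather than taken from the source.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of the branch activity rule; not JGit code.
class BranchActivitySketch {

    // A branch is treated as inactive only when the repository has more
    // branches than excessiveBranchCount AND the tip commit is older than
    // inactiveAgeDays. Both comparisons are assumptions about the boundary.
    static boolean isInactive(int branchCount, int excessiveBranchCount,
            Instant tipCommitTime, Instant now, int inactiveAgeDays) {
        if (branchCount <= excessiveBranchCount) {
            // Branch count is not excessive: every branch keeps full selection.
            return false;
        }
        long tipAgeDays = Duration.between(tipCommitTime, now).toDays();
        return tipAgeDays > inactiveAgeDays;
    }
}
```

With a rule of this shape, a repository below the branch threshold behaves exactly as before, and only stale branches in very branchy repositories get fewer bitmaps near their tips.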
/*
 * Copyright (C) 2012, Christian Halstrick <christian.halstrick@sap.com> and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */
package org.eclipse.jgit.internal.storage.file;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.List;

import org.eclipse.jgit.junit.TestRepository.BranchBuilder;
import org.eclipse.jgit.lib.ConfigConstants;
import org.eclipse.jgit.lib.RefUpdate;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.storage.file.FileBasedConfig;
import org.eclipse.jgit.storage.pack.PackConfig;
import org.junit.Test;
import org.junit.experimental.theories.DataPoints;
import org.junit.experimental.theories.Theories;
import org.junit.experimental.theories.Theory;
import org.junit.runner.RunWith;

@RunWith(Theories.class)
public class GcBasicPackingTest extends GcTestCase {
    @DataPoints
    public static boolean[] aggressiveValues = { true, false };

    @Theory
    public void repackEmptyRepo_noPackCreated(boolean aggressive)
            throws IOException {
        configureGc(gc, aggressive);
        gc.repack();
        assertEquals(0, repo.getObjectDatabase().getPacks().size());
    }

    @Theory
    public void testPackRepoWithNoRefs(boolean aggressive) throws Exception {
        tr.commit().add("A", "A").add("B", "B").create();
        stats = gc.getStatistics();
        assertEquals(4, stats.numberOfLooseObjects);
        assertEquals(0, stats.numberOfPackedObjects);
        configureGc(gc, aggressive);
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(4, stats.numberOfLooseObjects);
        assertEquals(0, stats.numberOfPackedObjects);
        assertEquals(0, stats.numberOfPackFiles);
        assertEquals(0, stats.numberOfBitmaps);
    }

    @Theory
    public void testPack2Commits(boolean aggressive) throws Exception {
        BranchBuilder bb = tr.branch("refs/heads/master");
        bb.commit().add("A", "A").add("B", "B").create();
        bb.commit().add("A", "A2").add("B", "B2").create();
        stats = gc.getStatistics();
        assertEquals(8, stats.numberOfLooseObjects);
        assertEquals(0, stats.numberOfPackedObjects);
        configureGc(gc, aggressive);
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        assertEquals(8, stats.numberOfPackedObjects);
        assertEquals(1, stats.numberOfPackFiles);
        assertEquals(2, stats.numberOfBitmaps);
    }

    @Theory
    public void testPack2Commits_noPackFolder(boolean aggressive) throws Exception {
        File packDir = repo.getObjectDatabase().getPackDirectory();
        assertTrue(packDir.delete());
        BranchBuilder bb = tr.branch("refs/heads/master");
        bb.commit().add("A", "A").add("B", "B").create();
        bb.commit().add("A", "A2").add("B", "B2").create();
        stats = gc.getStatistics();
        assertEquals(8, stats.numberOfLooseObjects);
        assertEquals(0, stats.numberOfPackedObjects);
        configureGc(gc, aggressive);
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        assertEquals(8, stats.numberOfPackedObjects);
        assertEquals(1, stats.numberOfPackFiles);
        assertEquals(2, stats.numberOfBitmaps);
        assertTrue(packDir.exists());
    }

    @Theory
    public void testPackAllObjectsInOnePack(boolean aggressive)
            throws Exception {
        tr.branch("refs/heads/master").commit().add("A", "A").add("B", "B")
                .create();
        stats = gc.getStatistics();
        assertEquals(4, stats.numberOfLooseObjects);
        assertEquals(0, stats.numberOfPackedObjects);
        configureGc(gc, aggressive);
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        assertEquals(4, stats.numberOfPackedObjects);
        assertEquals(1, stats.numberOfPackFiles);
        assertEquals(1, stats.numberOfBitmaps);

        // Do the gc again and check that it hasn't changed anything
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        assertEquals(4, stats.numberOfPackedObjects);
        assertEquals(1, stats.numberOfPackFiles);
        assertEquals(1, stats.numberOfBitmaps);
    }

    @Theory
    public void testPackCommitsAndLooseOne(boolean aggressive)
            throws Exception {
        BranchBuilder bb = tr.branch("refs/heads/master");
        RevCommit first = bb.commit().add("A", "A").add("B", "B").create();
        bb.commit().add("A", "A2").add("B", "B2").create();
        tr.update("refs/heads/master", first);
        stats = gc.getStatistics();
        assertEquals(8, stats.numberOfLooseObjects);
        assertEquals(0, stats.numberOfPackedObjects);
        configureGc(gc, aggressive);
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        assertEquals(8, stats.numberOfPackedObjects);
        assertEquals(2, stats.numberOfPackFiles);
        assertEquals(1, stats.numberOfBitmaps);
    }

    @Theory
    public void testNotPackTwice(boolean aggressive) throws Exception {
        BranchBuilder bb = tr.branch("refs/heads/master");
        RevCommit first = bb.commit().message("M").add("M", "M").create();
        bb.commit().message("B").add("B", "Q").create();
        bb.commit().message("A").add("A", "A").create();
        RevCommit second = tr.commit().parent(first).message("R").add("R", "Q")
                .create();
        tr.update("refs/tags/t1", second);

        Collection<PackFile> oldPacks = tr.getRepository().getObjectDatabase()
                .getPacks();
        assertEquals(0, oldPacks.size());
        stats = gc.getStatistics();
        assertEquals(11, stats.numberOfLooseObjects);
        assertEquals(0, stats.numberOfPackedObjects);

        gc.setExpireAgeMillis(0);
        fsTick();
        configureGc(gc, aggressive);
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        List<PackFile> packs = new ArrayList<>(
                repo.getObjectDatabase().getPacks());
        assertEquals(11, packs.get(0).getObjectCount());
    }

    @Test
    public void testDonePruneTooYoungPacks() throws Exception {
        BranchBuilder bb = tr.branch("refs/heads/master");
        bb.commit().message("M").add("M", "M").create();

        String tempRef = "refs/heads/soon-to-be-unreferenced";
        BranchBuilder bb2 = tr.branch(tempRef);
        bb2.commit().message("M").add("M", "M").create();

        gc.setExpireAgeMillis(0);
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        assertEquals(4, stats.numberOfPackedObjects);
        assertEquals(1, stats.numberOfPackFiles);
        File oldPackfile = tr.getRepository().getObjectDatabase().getPacks()
                .iterator().next().getPackFile();

        fsTick();

        // delete the temp ref, orphaning its commit
        RefUpdate update = tr.getRepository().getRefDatabase().newUpdate(tempRef, false);
        update.setForceUpdate(true);
        update.delete();

        bb.commit().message("B").add("B", "Q").create();

        // The old packfile is too young to be deleted. We should end up with
        // two pack files
        gc.setExpire(new Date(oldPackfile.lastModified() - 1));
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        // if objects exist in multiple packFiles then they are counted multiple
        // times
        assertEquals(10, stats.numberOfPackedObjects);
        assertEquals(2, stats.numberOfPackFiles);

        // repack again but now without a grace period for loose objects. Since
        // we don't have loose objects anymore this shouldn't change anything
        gc.setExpireAgeMillis(0);
        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        // if objects exist in multiple packFiles then they are counted multiple
        // times
        assertEquals(10, stats.numberOfPackedObjects);
        assertEquals(2, stats.numberOfPackFiles);

        // repack again but now without a grace period for packfiles. We should
        // end up with one packfile
        gc.setPackExpireAgeMillis(0);

        // we want to keep newly-loosened objects though
        gc.setExpireAgeMillis(-1);

        gc.gc();
        stats = gc.getStatistics();
        assertEquals(1, stats.numberOfLooseObjects);
        // if objects exist in multiple packFiles then they are counted multiple
        // times
        assertEquals(6, stats.numberOfPackedObjects);
        assertEquals(1, stats.numberOfPackFiles);
    }

    @Test
    public void testImmediatePruning() throws Exception {
        BranchBuilder bb = tr.branch("refs/heads/master");
        bb.commit().message("M").add("M", "M").create();

        String tempRef = "refs/heads/soon-to-be-unreferenced";
        BranchBuilder bb2 = tr.branch(tempRef);
        bb2.commit().message("M").add("M", "M").create();

        gc.setExpireAgeMillis(0);
        gc.gc();
        stats = gc.getStatistics();

        fsTick();

        // delete the temp ref, orphaning its commit
        RefUpdate update = tr.getRepository().getRefDatabase().newUpdate(tempRef, false);
        update.setForceUpdate(true);
        update.delete();

        bb.commit().message("B").add("B", "Q").create();

        // We want to immediately prune deleted objects
        FileBasedConfig config = repo.getConfig();
        config.setString(ConfigConstants.CONFIG_GC_SECTION, null,
                ConfigConstants.CONFIG_KEY_PRUNEEXPIRE, "now");
        config.save();

        // And we don't want to keep packs full of dead objects
        gc.setPackExpireAgeMillis(0);

        gc.gc();
        stats = gc.getStatistics();
        assertEquals(0, stats.numberOfLooseObjects);
        assertEquals(6, stats.numberOfPackedObjects);
        assertEquals(1, stats.numberOfPackFiles);
    }

    @Test
    public void testPreserveAndPruneOldPacks() throws Exception {
        testPreserveOldPacks();
        configureGc(gc, false).setPrunePreserved(true);
        gc.gc();

        assertFalse(repo.getObjectDatabase().getPreservedDirectory().exists());
    }

    private void testPreserveOldPacks() throws Exception {
        BranchBuilder bb = tr.branch("refs/heads/master");
        bb.commit().message("P").add("P", "P").create();

        // pack loose object into packfile
        gc.setExpireAgeMillis(0);
        gc.gc();
        File oldPackfile = tr.getRepository().getObjectDatabase().getPacks()
                .iterator().next().getPackFile();
        assertTrue(oldPackfile.exists());

        fsTick();
        bb.commit().message("B").add("B", "Q").create();

        // repack again but now without a grace period for packfiles. We should
        // end up with a new packfile and the old one should be placed in the
        // preserved directory
        gc.setPackExpireAgeMillis(0);
        configureGc(gc, false).setPreserveOldPacks(true);
        gc.gc();

        File oldPackDir = repo.getObjectDatabase().getPreservedDirectory();
        String oldPackFileName = oldPackfile.getName();
        String oldPackName = oldPackFileName.substring(0,
                oldPackFileName.lastIndexOf('.')) + ".old-pack"; //$NON-NLS-1$
        File preservePackFile = new File(oldPackDir, oldPackName);
        assertTrue(preservePackFile.exists());
    }

    private PackConfig configureGc(GC myGc, boolean aggressive) {
        PackConfig pconfig = new PackConfig(repo);
        if (aggressive) {
            pconfig.setDeltaSearchWindowSize(250);
            pconfig.setMaxDeltaDepth(250);
            pconfig.setReuseObjects(false);
        } else {
            pconfig = new PackConfig(repo);
        }
        myGc.setPackConfig(pconfig);
        return pconfig;
    }
}