FileObjectDatabase.java 2.2KB

PackWriter: Support reuse of entire packs

The most expensive part of packing a repository for transport to
another system is enumerating all of the objects in the repository.
Once this gets to the size of the linux-2.6 repository (1.8 million
objects), enumeration can take several CPU minutes and costs a lot
of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack"
by answering a clone request with a thin pack followed by a larger
cached pack appended to the end. This requires the repository
owner to first construct the cached pack by hand, and record the
tip commits inside of $GIT_DIR/objects/info/cached-packs:

  cd $GIT_DIR
  root=$(git rev-parse master)
  tmp=objects/.tmp-$$
  names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
  for n in $names; do
    chmod a-w $tmp-$n.pack $tmp-$n.idx
    touch objects/pack/pack-$n.keep
    mv $tmp-$n.pack objects/pack/pack-$n.pack
    mv $tmp-$n.idx objects/pack/pack-$n.idx
  done

  (echo "+ $root";
   for n in $names; do echo "P $n"; done;
   echo) >>objects/info/cached-packs

  git repack -a -d

When a clone request needs to include $root, the corresponding
cached pack will be copied as-is, rather than enumerating all of
the objects that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB,
the above process creates two packs of 368 MiB and 38 MiB[1].
This is a local disk usage increase of ~26 MiB, due to reduced
delta compression between the large cached pack and the smaller
recent activity pack. The overhead is similar to 1 full copy of
the compressed project sources.

With this cached pack in hand, JGit daemon completes a clone request
in 1m17s less time, but a slightly larger data transfer (+2.39 MiB):

  Before:
    remote: Counting objects: 1861830, done
    remote: Finding sources: 100% (1861830/1861830)
    remote: Getting sizes: 100% (88243/88243)
    remote: Compressing objects: 100% (88184/88184)
    Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
    remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
    Resolving deltas: 100% (1564621/1564621), done.

    real  3m19.005s

  After:
    remote: Counting objects: 1601, done
    remote: Counting objects: 1828460, done
    remote: Finding sources: 100% (50475/50475)
    remote: Getting sizes: 100% (18843/18843)
    remote: Compressing objects: 100% (7585/7585)
    remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
    Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
    Resolving deltas: 100% (1559477/1559477), done.

    real  2m2.938s

Repository owners can periodically refresh their cached packs by
repacking their repository, folding all newer objects into a larger
cached pack. Since repacking is already considered to be a normal
Git maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
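The shell script in the commit message appends records to objects/info/cached-packs: a "+ " line for each tip commit, a "P " line for each pack name, and a blank line to end the record. Below is a minimal sketch of how such a file could be read back, assuming exactly that layout. The class CachedPackListSketch and its Entry type are hypothetical names made up for illustration. They are not part of JGit, and this is not the parser JGit itself uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of JGit. */
public final class CachedPackListSketch {

    /** One record: the tip commits and the names of the packs covering them. */
    public static final class Entry {
        public final List<String> tips = new ArrayList<>();
        public final List<String> packNames = new ArrayList<>();
    }

    private CachedPackListSketch() {
    }

    /**
     * Parse the text of a cached-packs file, assuming the layout written by
     * the script in the commit message: "+ <tip>" lines, "P <name>" lines,
     * and a blank line terminating each record.
     */
    public static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        Entry current = null;
        for (String line : text.split("\n", -1)) {
            if (line.isEmpty()) {
                // A blank line closes the record in progress, if any.
                if (current != null) {
                    entries.add(current);
                    current = null;
                }
            } else if (line.startsWith("+ ")) {
                if (current == null) {
                    current = new Entry();
                }
                current.tips.add(line.substring(2));
            } else if (line.startsWith("P ")) {
                if (current == null) {
                    current = new Entry();
                }
                current.packNames.add(line.substring(2));
            }
            // Lines with any other prefix are ignored in this sketch.
        }
        if (current != null) {
            // Tolerate a final record that lacks its terminating blank line.
            entries.add(current);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Placeholder values; a real file holds full object and pack names.
        String sample = "+ tipCommitId\nP packNameA\nP packNameB\n\n";
        for (Entry e : parse(sample)) {
            System.out.println(e.tips + " -> " + e.packNames);
        }
    }
}
```

Keeping the tips and pack names together in one record mirrors the idea described above: when a request includes every tip of a record, the listed packs can be sent as-is instead of enumerating the objects reachable from those tips.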
13 years ago
/*
 * Copyright (C) 2010, Google Inc. and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

package org.eclipse.jgit.internal.storage.file;

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.Set;

import org.eclipse.jgit.internal.storage.pack.ObjectToPack;
import org.eclipse.jgit.internal.storage.pack.PackWriter;
import org.eclipse.jgit.lib.AbbreviatedObjectId;
import org.eclipse.jgit.lib.AnyObjectId;
import org.eclipse.jgit.lib.Config;
import org.eclipse.jgit.lib.ObjectDatabase;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.util.FS;

abstract class FileObjectDatabase extends ObjectDatabase {
	enum InsertLooseObjectResult {
		INSERTED, EXISTS_PACKED, EXISTS_LOOSE, FAILURE;
	}

	/** {@inheritDoc} */
	@Override
	public ObjectReader newReader() {
		return new WindowCursor(this);
	}

	/** {@inheritDoc} */
	@Override
	public ObjectDirectoryInserter newInserter() {
		return new ObjectDirectoryInserter(this, getConfig());
	}

	abstract void resolve(Set<ObjectId> matches, AbbreviatedObjectId id)
			throws IOException;

	abstract Config getConfig();

	abstract FS getFS();

	abstract Set<ObjectId> getShallowCommits() throws IOException;

	abstract void selectObjectRepresentation(PackWriter packer,
			ObjectToPack otp, WindowCursor curs) throws IOException;

	abstract File getDirectory();

	abstract File fileFor(AnyObjectId id);

	abstract ObjectLoader openObject(WindowCursor curs, AnyObjectId objectId)
			throws IOException;

	abstract long getObjectSize(WindowCursor curs, AnyObjectId objectId)
			throws IOException;

	abstract ObjectLoader openLooseObject(WindowCursor curs, AnyObjectId id)
			throws IOException;

	abstract InsertLooseObjectResult insertUnpackedObject(File tmp,
			ObjectId id, boolean createDuplicate) throws IOException;

	abstract Pack openPack(File pack) throws IOException;

	abstract Collection<Pack> getPacks();
}