path: root/org.eclipse.jgit
Commit message | Author | Age | Files | Lines
...
| * | | Add "all" parameter to the commit Command  (Stefan Lay, 2010-08-04, 1 file, -0/+29)
| | | |   When the "all" parameter is set, all modified and deleted files are
| | | |   staged prior to the commit.
| | | |
| | | |   Change-Id: Id23bc25730fcdd151386cd495a7cdc0935cbc00b
| | | |   Signed-off-by: Stefan Lay <stefan.lay@sap.com>
* | | | Merge "Add the parameter "update" to the Add command"  (Chris Aniszczyk, 2010-08-05, 1 file, -12/+41)
|\| | |
| * | | Add the parameter "update" to the Add command  (Stefan Lay, 2010-08-04, 1 file, -12/+41)
| | | |   This change is mainly done for a subsequent commit which will
| | | |   introduce the "all" parameter to the Commit command.
| | | |
| | | |   Bug: 318439
| | | |   Change-Id: I85a8a76097d0197ef689a289288ba82addb92fc9
| | | |   Signed-off-by: Stefan Lay <stefan.lay@sap.com>
* | | | Allow to replace existing Change-Id  (Stefan Lay, 2010-08-05, 1 file, -2/+36)
| | | |   It is useful to be able to replace an existing Change-Id in the
| | | |   message, for example if the user decides not to amend the previous
| | | |   commit.
| | | |
| | | |   Bug: 321188
| | | |   Change-Id: I594e7f9efd0c57d794d2bd26d55ec45f4e6a47fd
| | | |   Signed-off-by: Stefan Lay <stefan.lay@sap.com>
| | | |   Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
* | | | Rename getOldName,getNewName to getOldPath,getNewPath  (Shawn O. Pearce, 2010-08-04, 6 files, -62/+62)
| | | |   TreeWalk calls this value "path", while "name" is the stuff after
| | | |   the last slash. FileHeader should do the same thing to be
| | | |   consistent. Rename getOldName to getOldPath and getNewName to
| | | |   getNewPath.
| | | |
| | | |   Bug: 318526
| | | |   Change-Id: Ib2e372ad4426402d37939b48d8f233154cc637da
| | | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* | | | Merge branch 'js/diff'  (Shawn O. Pearce, 2010-08-04, 1 file, -1/+2)
|\ \ \ \
| | |_|/
| |/| |
| | | |   * js/diff:
| | | |     Fixed bug in scoring mechanism for rename detection
| * | | Fixed bug in scoring mechanism for rename detection  (Jeff Schumacher, 2010-08-04, 1 file, -1/+2)
| | | |   A bug in rename detection would cause file scores to be wrong. The
| | | |   bug was due to the way rename detection would judge the similarity
| | | |   between files. If file A has three lines containing 'foo', and
| | | |   file B has 5 lines containing 'foo', the rename detection phase
| | | |   should record that A and B have three lines in common (the minimum
| | | |   of the number of times that line appears in both files). Instead,
| | | |   it would choose the number of times the line appeared in the
| | | |   destination file, in this case file B. I fixed the bug by having
| | | |   the SimilarityIndex instead choose the minimum number, as it
| | | |   should. I also added a test case to verify that the bug had been
| | | |   fixed.
| | | |
| | | |   Change-Id: Ic75272a2d6e512a361f88eec91e1b8a7c2298d6b
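
Note: a sketch of the corrected scoring rule, counting whole lines for
clarity; JGit's real SimilarityIndex works on hashed blocks, so the
class and method names here are illustrative only.

    import java.util.HashMap;
    import java.util.Map;

    class CommonLines {
        /** Lines common to A and B: sum of min(countA, countB) per line. */
        static int common(String[] fileA, String[] fileB) {
            Map<String, Integer> a = count(fileA);
            Map<String, Integer> b = count(fileB);
            int common = 0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                Integer inB = b.get(e.getKey());
                if (inB != null)
                    common += Math.min(e.getValue(), inB); // bug: used inB alone
            }
            return common;
        }

        private static Map<String, Integer> count(String[] lines) {
            Map<String, Integer> m = new HashMap<String, Integer>();
            for (String s : lines)
                m.put(s, m.containsKey(s) ? m.get(s) + 1 : 1);
            return m;
        }
    }

With three 'foo' lines in A and five in B, common() reports 3, matching
the minimum rule described above.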
* | | | Add gitignore support to IndexDiff and use TreeWalk  (Jens Baumgart, 2010-08-04, 1 file, -56/+130)
| |/ /
|/| |
| | |   IndexDiff was re-implemented and now uses TreeWalk instead of
| | |   GitIndex. Additionally, gitignore support and retrieval of
| | |   untracked files were added.
| | |
| | |   Change-Id: Ie6a8e04833c61d44c668c906b161202b200bb509
| | |   Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
| | |   Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
* | | Make use of Repository.writeMerge...()  (Christian Halstrick, 2010-07-29, 1 file, -6/+3)
| | |   The CommitCommand should not use java.io to delete MERGE_HEAD and
| | |   MERGE_MSG files since Repository already has utility methods for
| | |   that.
| | |
| | |   Change-Id: If66a419349b95510e5b5c2237a91f06c1d5ba0d4
| | |   Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
* | | Merge "Fix tag sorting in PlotWalk"  (Christian Halstrick, 2010-07-28, 2 files, -14/+17)
|\ \ \
| * | | Fix tag sorting in PlotWalk  (Shawn O. Pearce, 2010-07-28, 2 files, -14/+17)
| | | |   By deferring tag sorting until the commit is produced by the
| | | |   walker we can avoid an infinite loop that was triggered by trying
| | | |   to sort tags while allocating a commit. This also avoids needing
| | | |   to look at commits which aren't going to be produced in the
| | | |   result.
| | | |
| | | |   Bug: 321103
| | | |   Change-Id: I25acc739db2ec0221a50b72c2d2aa618a9a75f37
| | | |   Reviewed-by: Mathias Kinzler <mathias.kinzler@sap.com>
| | | |   Reviewed-by: Christian Halstrick <christian.halstrick@sap.com>
| | | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* | | | Meaningful error message when trying to check-out submodules  (Mathias Kinzler, 2010-07-28, 3 files, -0/+6)
| | | |   Currently, a NullPointerException occurs in this case. We should
| | | |   instead throw a more meaningful Exception with a proper message.
| | | |   This is a very "stupid" implementation which simply checks for the
| | | |   existence of a ".gitmodules" file.
| | | |
| | | |   Bug: 300731
| | | |   Bug: 306765
| | | |   Bug: 308452
| | | |   Bug: 314853
| | | |   Change-Id: I155aa340a85cbc5d7d60da31dba199fc30689b67
| | | |   Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
* | | | Fix unit tests under windows  (Christian Halstrick, 2010-07-28, 1 file, -1/+1)
|/ / /
| | |   The following tests fail under Windows because certain input
| | |   streams are not closed and files cannot be deleted because of that.
| | |   The main problem I found is UnpackedObject.InflaterInputStream.close().
| | |   This method may throw exceptions found by checkValidEndOfStream()
| | |   but doesn't call super.close() before leaving. It is not clear to
| | |   me which resources a close() method should release before it throws
| | |   an exception. But those resources which are not published to the
| | |   outside and which therefore cannot be closed by other means have to
| | |   be closed in all cases. I changed the close() method to call
| | |   super.close() under all circumstances.
| | |
| | |   Failing tests:
| | |   testStandardFormat_LargeObject_TruncatedZLibStream(org.eclipse.jgit.storage.file.UnpackedObjectTest)
| | |   testStandardFormat_LargeObject_TrailingGarbage(org.eclipse.jgit.storage.file.UnpackedObjectTest)
| | |   testPackFormat_SmallObject(org.eclipse.jgit.storage.file.UnpackedObjectTest)
| | |
| | |   Change-Id: Id2e609a29e725aad953ff9bd88af6381df38399d
| | |   Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
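
Note: the shape of that fix, sketched; checkValidEndOfStream() is the
method named above, and the surrounding class is the private stream
inside UnpackedObject.

    @Override
    public void close() throws IOException {
        try {
            // May throw for a truncated or garbage-trailing zlib stream.
            checkValidEndOfStream();
        } finally {
            // Always release the underlying file handle; otherwise the
            // file stays open and cannot be deleted on Windows.
            super.close();
        }
    }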
* | | Teach NameConflictTreeWalk to report DF conflicts  (Christian Halstrick, 2010-07-28, 1 file, -0/+35)
| | |   Add a method isDirectoryFileConflict() to NameConflictTreeWalk
| | |   which tells whether the current path is part of a directory/file
| | |   conflict.
| | |
| | |   Change-Id: Iffcc7090aaec743dd6f3fd1a333cac96c587ae5d
| | |   Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
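
Note: a usage sketch; repository, headTreeId, and dirCache are assumed
to be in scope, with NameConflictTreeWalk and DirCacheIterator imported
from org.eclipse.jgit.treewalk and org.eclipse.jgit.dircache.

    NameConflictTreeWalk tw = new NameConflictTreeWalk(repository);
    tw.addTree(headTreeId);                     // e.g. the HEAD tree
    tw.addTree(new DirCacheIterator(dirCache)); // the index
    while (tw.next()) {
        if (tw.isDirectoryFileConflict())
            System.out.println("D/F conflict at " + tw.getPathString());
        if (tw.isSubtree())
            tw.enterSubtree();
    }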
* | | Stack Overflow in EGit History View  (Mathias Kinzler, 2010-07-28, 1 file, -2/+3)
| |/
|/|
| |   This is caused by a recursion in PlotWalk.getTags(). As a hotfix,
| |   the sort was simply removed. The sort must be re-implemented so that
| |   parseAny() is not called again (currently, this happens in the
| |   PlotRefComparator).
| |
| |   Change-Id: I060d26fda8a75ac803acaf89cfb7d3b4317328f3
| |   Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
| |   Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
* | Break dissimilar file pairs during diff  (Jeff Schumacher, 2010-07-27, 5 files, -21/+155)
| |   File pairs that are very dissimilar during a diff were not being
| |   broken apart into their constituent ADD/DELETE pairs. This leads to
| |   sub-optimal rename detection. Take, for example, this situation:
| |
| |   A file exists at src/a.txt containing "foo". A user renames src/a.txt
| |   to src/b.txt, then adds a new src/a.txt containing "bar".
| |
| |   Even though the old a.txt and the new b.txt are identical, the rename
| |   detection algorithm would not detect it as a rename since it was
| |   already paired in a MODIFY. I added code to split all MODIFYs below a
| |   certain score into their constituent ADD/DELETE pairs. This allows
| |   situations like the one I described above to be more correctly
| |   handled.
| |
| |   Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e
| |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
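
Note: roughly the following, where similarityScore(), asDelete(), and
asAdd() are hypothetical stand-ins for the rename detector's internals
(java.util imports assumed).

    static List<DiffEntry> breakDissimilar(List<DiffEntry> entries,
            int breakScore) {
        List<DiffEntry> out = new ArrayList<DiffEntry>();
        for (DiffEntry ent : entries) {
            if (ent.getChangeType() == DiffEntry.ChangeType.MODIFY
                    && similarityScore(ent) < breakScore) {
                out.add(asDelete(ent)); // DELETE of the old side
                out.add(asAdd(ent));    // ADD of the new side
            } else {
                out.add(ent);
            }
        }
        return out;
    }

In the example above, the DELETE half of the broken MODIFY (the old
"foo" content of src/a.txt) can then be re-paired with the ADD of
src/b.txt as a rename.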
* | Add methods which write MERGE_HEAD and MERGE_MSG  (Christian Halstrick, 2010-07-27, 1 file, -3/+60)
| |   Add methods to the Repository class which write into MERGE_HEAD and
| |   MERGE_MSG files. Since we have the read methods in the same class
| |   this seems to be the right place.
| |
| |   Change-Id: I5dd65306ceb06e008fcc71b37ca3a649632ba462
| |   Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
| |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
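
Note: a usage sketch of the new methods; that passing null clears the
files is an assumption here, and theirHead is an ObjectId in scope.

    // Record an interrupted merge so a later commit can pick it up:
    repository.writeMergeHeads(Collections.singletonList(theirHead));
    repository.writeMergeCommitMsg("Merge branch 'topic'");

    // After the merge commit succeeds, clear the state again:
    repository.writeMergeHeads(null);
    repository.writeMergeCommitMsg(null);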
* | Fix concurrent read / write issue in LockFile on Windows  (Jens Baumgart, 2010-07-27, 12 files, -25/+83)
| |   LockFile.commit fails if another thread concurrently reads the base
| |   file. The problem is fixed by retrying the rename operation if it
| |   fails.
| |
| |   Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb
| |   Bug: 308506
| |   Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
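
Note: the retry idea, sketched; the attempt count and sleep interval
are illustrative, not JGit's exact values.

    private static boolean renameWithRetry(File lock, File base)
            throws InterruptedException {
        for (int attempt = 0; attempt < 10; attempt++) {
            if (lock.renameTo(base))
                return true;
            // On Windows the rename fails while another thread still has
            // the base file open for reading; back off and try again.
            Thread.sleep(100);
        }
        return false;
    }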
* | Fix Javadoc warnings  (Robin Stocker, 2010-07-27, 4 files, -16/+15)
| |   There were some broken links, incorrect uses of @value, an invalid
| |   tag and an outdated comment.
| |
| |   Change-Id: I22886bcc869a4b62bd606ebed40669f7b4723664
| |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* | Make forPath(ObjectReader) variant in TreeWalk  (Shawn O. Pearce, 2010-07-27, 1 file, -8/+40)
| |   This simplifies the logic for those who already have an ObjectReader
| |   on hand and want to reuse it to look up a single path.
| |
| |   Change-Id: Ief17d6b2a0674ddb34bbc9f43121b756eae960fb
| |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* | Make StoredConfig an abstraction above FileBasedConfig  (Shawn O. Pearce, 2010-07-26, 3 files, -2/+94)
| |   This exposes a load and save method, allowing a Repository to denote
| |   that it has a persistent configuration of some kind which can be
| |   accessed by the application, without needing to know the exact
| |   details of how it is stored.
| |
| |   Change-Id: I7c414bc0f975b80f083084ea875eca25c75a07b2
| |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
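
Note: what the abstraction buys callers, as a sketch; the configuration
key used here is only an example.

    StoredConfig config = repository.getConfig();
    config.load(); // refresh from the persistent store, whatever it is
    config.setString("user", null, "email", "author@example.com");
    config.save(); // persist without knowing it is a file on disk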
* | Merge branch 'delta'  (Shawn O. Pearce, 2010-07-22, 150 files, -4981/+11691)
|\ \
| | |   * delta: (103 commits)
| | |     Discard the uncompressed delta as soon as its compressed
| | |     Honor pack.windowlimit to cap memory usage during packing
| | |     Honor pack.threads and perform delta search in parallel
| | |     Cache small deltas during packing
| | |     Implement delta generation during packing
| | |     debug-show-packdelta: Dump a pack delta to the console
| | |     Initial pack format delta generator
| | |     Add debugging toString() method to ObjectToPack
| | |     Make ObjectToPack clearReuseAsIs signal available to subclasses
| | |     Correctly classify the compressing objects phase
| | |     Refactor ObjectToPack's delta depth setting
| | |     Configure core.bigFileThreshold into PackWriter
| | |     Add doNotDelta flag to ObjectToPack
| | |     Add more configuration options to PackWriter
| | |     Save object path hash codes during packing
| | |     Add path hash code to ObjectWalk
| | |     Add getObjectSize to ObjectReader
| | |     Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB
| | |     Define a constant for 127 in DeltaEncoder
| | |     Cap delta copy instructions at 64k
| | |     ...
| | |
| | |   Conflicts:
| | |     org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/Diff.java
| | |     org.eclipse.jgit/resources/org/eclipse/jgit/JGitText.properties
| | |     org.eclipse.jgit/src/org/eclipse/jgit/JGitText.java
| | |     org.eclipse.jgit/src/org/eclipse/jgit/revwalk/RewriteTreeFilter.java
| | |
| | |   Change-Id: I7c7a05e443a48d32c836173a409ee7d340c70796
| * | Discard the uncompressed delta as soon as its compressed  (Shawn O. Pearce, 2010-07-16, 1 file, -0/+1)
| | |   The DeltaCache will most likely need to copy the compressed delta
| | |   into a new buffer in order to compact away the wasted space at the
| | |   end caused by over allocation. Since we don't need the uncompressed
| | |   format anymore, null out our only reference to it so the GC can
| | |   reclaim this memory if it needs to perform a collection in order to
| | |   satisfy the cache's allocation attempt.
| | |
| | |   Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Honor pack.windowlimit to cap memory usage during packing  (Shawn O. Pearce, 2010-07-09, 3 files, -4/+89)
| | |   The pack.windowlimit configuration parameter places an upper bound
| | |   on the number of bytes used by the DeltaWindow class as it scans
| | |   through the object list. If memory usage would exceed the limit the
| | |   window is temporarily decreased in size to keep memory used within
| | |   that bound.
| | |
| | |   Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Honor pack.threads and perform delta search in parallel  (Shawn O. Pearce, 2010-07-09, 7 files, -11/+358)
| | |   If we have multiple CPUs available, packing usually goes faster
| | |   when each CPU is assigned a slice of the available search space.
| | |   The number of threads to use is guessed from the runtime if it
| | |   wasn't set by the caller, or wasn't set in the configuration.
| | |
| | |   Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Cache small deltas during packing  (Shawn O. Pearce, 2010-07-09, 5 files, -21/+379)
| | |   PackWriter now caches small deltas, or deltas that are very tiny
| | |   compared to their source inputs, so that the writing phase goes
| | |   faster by reusing those cached deltas.
| | |
| | |   The cached data is stored compressed, which usually translates to a
| | |   bigger footprint due to deltas being very hard to compress, but
| | |   saves time during writing by avoiding the deflate step. They are
| | |   held under SoftReferences so that the JVM GC can clear out deltas
| | |   if memory gets very tight. We would rather continue working and
| | |   spend a bit more CPU time during writing than crash due to OOME. To
| | |   avoid OutOfMemoryErrors during the caching phase we also trap OOME
| | |   and just abort out of the caching.
| | |
| | |   Because deflateBound() always produces something larger than what
| | |   we need to actually store the deflated data, we copy it over into a
| | |   new buffer if the actual length doesn't match the buffer length.
| | |   When packing jgit.git this saves over 111 KiB in the cache, and is
| | |   thus a worthwhile hit on CPU time.
| | |
| | |   To further save memory we store the inflated size of the delta
| | |   (which we need for the object header) in the same field as the
| | |   pathHash, as the pathHash is no longer necessary by this phase of
| | |   the packing algorithm.
| | |
| | |   Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
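
Note: the SoftReference idea in miniature; this hypothetical cache
omits the size accounting and deflate step of the real DeltaCache
(imports from java.lang.ref and java.util are assumed, as is the
ObjectToPack type).

    class SmallDeltaCache {
        private final Map<ObjectToPack, SoftReference<byte[]>> map =
                new HashMap<ObjectToPack, SoftReference<byte[]>>();

        void put(ObjectToPack obj, byte[] deflatedDelta) {
            map.put(obj, new SoftReference<byte[]>(deflatedDelta));
        }

        byte[] get(ObjectToPack obj) {
            SoftReference<byte[]> ref = map.get(obj);
            return ref != null ? ref.get() : null; // null once GC evicts it
        }
    }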
| * | Implement delta generation during packing  (Shawn O. Pearce, 2010-07-09, 3 files, -1/+622)
| | |   PackWriter now produces new deltas if there is not a suitable delta
| | |   available for reuse from an existing pack file. This permits JGit
| | |   to send less data on the wire by sending a delta relative to an
| | |   object the other side already has, instead of sending the whole
| | |   object.
| | |
| | |   The delta searching algorithm is similar in style to what C Git
| | |   uses, but apparently has some differences (see below for more on
| | |   this). Briefly, objects that should be considered for delta
| | |   compression are pushed onto a list. This list is then sorted by a
| | |   rough similarity score, which is derived from the path name the
| | |   object was discovered at in the repository during object counting.
| | |   The list is then walked in order.
| | |
| | |   At each position in the list, up to $WINDOW objects prior to it are
| | |   attempted as delta bases. Each object in the window is tried, and
| | |   the shortest delta instruction sequence selects the base object.
| | |   Some rough rules are used to prevent pathological behavior during
| | |   this matching phase, like skipping pairings of objects that are not
| | |   similar enough in size.
| | |
| | |   PackWriter intentionally excludes commits and annotated tags from
| | |   this new delta search phase. In the JGit repository only 28 out of
| | |   2600+ commits can be delta compressed by C Git. As the commit count
| | |   tends to be a fair percentage of the total number of objects in the
| | |   repository, and they generally do not delta compress well, skipping
| | |   over them can improve performance with little increase in the
| | |   output pack size.
| | |
| | |   Because this implementation was rebuilt from scratch based on my
| | |   own memory of how the packing algorithm has evolved over the years
| | |   in C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use
| | |   exactly the same rules everywhere, and that leads JGit to produce
| | |   different (but logically equivalent) pack files.
| | |
| | |   Repository | Pack Size (bytes)             | Packing Time
| | |              | JGit - CGit = Difference      | JGit / CGit
| | |   -----------+-------------------------------+-------------------
| | |   git        | 25094348 - 24322890 = +771458 | 59.434s / 59.133s
| | |   jgit       | 5669515 - 5709046 = -39531    | 6.654s / 6.806s
| | |   linux-2.6  | 389M - 386M = +3M             | 20m02s / 18m01s
| | |
| | |   For the above tests pack.threads was set to 1, window size=10,
| | |   delta depth=50, and delta and object reuse was disabled for both
| | |   implementations. Both implementations were reading from an already
| | |   fully packed repository on local disk. The running time reported is
| | |   after 1 warm-up run of the tested implementation.
| | |
| | |   PackWriter is writing 771 KiB more data on git.git, 3M more on
| | |   linux-2.6, but is actually 39.5 KiB smaller on jgit.git. Being
| | |   larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an
| | |   extra 2 minutes to pack.
| | |
| | |   On the running time side, JGit is at a major disadvantage because
| | |   linux-2.6 doesn't fit into the default WindowCache of 20M, while C
| | |   Git is able to mmap the entire pack and have it available instantly
| | |   in physical memory (assuming hot cache). CGit also has a feature
| | |   where it caches deltas that were created during the compression
| | |   phase, and uses those cached deltas during the writing phase.
| | |   PackWriter does not implement this (yet), and therefore must create
| | |   every delta twice. This could easily account for the increased
| | |   running time we are seeing.
| | |
| | |   Change-Id: I6292edc66c2e95fbe45b519b65fdb3918068889c
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | debug-show-packdelta: Dump a pack delta to the console  (Shawn O. Pearce, 2010-07-09, 2 files, -4/+18)
| | |   This is a horribly crude application; it doesn't even verify that
| | |   the object it's dumping is delta encoded. Its method of getting the
| | |   delta is pretty abusive to the public PackWriter API, because right
| | |   now we don't want to expose the real internal low-level methods
| | |   actually required to do this.
| | |
| | |   Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Initial pack format delta generator  (Shawn O. Pearce, 2010-07-09, 4 files, -13/+865)
| | |   DeltaIndex is a simple pack style delta generator. The function
| | |   works by creating a compact index of a source buffer's blocks, and
| | |   then walking a sliding window along a desired result buffer,
| | |   searching for the window in the index. When a match is found, the
| | |   window is stretched to the longest possible length that is common
| | |   with the source buffer, and a copy instruction is created.
| | |
| | |   Rabin's polynomial hash function is used to compute the hash for a
| | |   block, permitting efficient sliding of the window in single byte
| | |   increments. The update function to slide one byte originated from
| | |   David Mazieres' work in LBFS, and our implementation of the update
| | |   step was certainly inspired by the initial work Geert Bosch
| | |   proposed for C Git in http://marc.info/?l=git&m=114565424620771&w=2.
| | |
| | |   To ensure the encoder runs in linear time with respect to the size
| | |   of the two input buffers (source and result), the maximum number of
| | |   blocks that can share the same position in the index's hashtable is
| | |   capped at a constant number. This prevents bad inputs from causing
| | |   the encoder to run in quadratic time, but comes with a penalty of
| | |   creating a longer delta due to fewer considered copy positions.
| | |
| | |   Strange hackery is used to cap the amount of memory used by the
| | |   index to be no more than 12 bytes for every 16 bytes of source
| | |   buffer, no matter what the JVM per-object overhead is. This permits
| | |   an index to always be no larger than 1.75x the source buffer
| | |   length, which is an important feature to support large windows of
| | |   candidates to match against while packing. Here the strange hackery
| | |   is nothing more than a manually managed chained hashtable, where
| | |   pointers are array indexes into storage arrays rather than object
| | |   references.
| | |
| | |   Computation of the hash function for a single fixed sized block is
| | |   done through an unrolled loop, where the first 4 iterations have
| | |   been manually reduced down to eliminate unnecessary instructions.
| | |   The pattern is derived from ObjectId.equals(byte[], int, byte[],
| | |   int), where we have unrolled the loop required to compare two 20
| | |   byte arrays. Hours of testing with the Sun 1.6 JRE concluded that
| | |   the non-obvious "foo[idx + 1]" style of reference is faster than
| | |   "foo[idx++]", and so that is what we use here during hashing.
| | |
| | |   Change-Id: If9fb2a1524361bc701405920560d8ae752221768
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Add debugging toString() method to ObjectToPack  (Shawn O. Pearce, 2010-07-09, 1 file, -0/+28)
| | |   It's useful to know what the flags are or what the base that was
| | |   selected is. Dump these out as part of the object's toString.
| | |
| | |   Change-Id: I8810067fb8337b08b4fcafd5f9ea3e1e31ca6726
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Make ObjectToPack clearReuseAsIs signal available to subclasses  (Shawn O. Pearce, 2010-07-09, 2 files, -1/+14)
| | |   A subclass may want to use this method to release handles that are
| | |   caching reuse information. Make it protected so they can override
| | |   it and update themselves.
| | |
| | |   Change-Id: I2277a56ad28560d2d2d97961cbc74bc7405a70d4
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Correctly classify the compressing objects phase  (Shawn O. Pearce, 2010-07-09, 1 file, -15/+6)
| | |   Searching for reuse candidates should be fast compared to actually
| | |   doing delta compression. So pull the progress monitor out of this
| | |   phase and rename it back to identify the compressing objects state.
| | |
| | |   Change-Id: I5eb80919f21c1251e0e3420ff7774126f1f79b27
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Refactor ObjectToPack's delta depth setting  (Shawn O. Pearce, 2010-07-09, 1 file, -8/+1)
| | |   Long ago when PackWriter was first written we thought that the
| | |   delta depth could be updated automatically, but it's never used.
| | |   Instead make this a simple standard setter so the caller can more
| | |   directly set the delta depth of this object. This permits us to
| | |   configure a depth that takes into account more than just the depth
| | |   of another object in this same pack.
| | |
| | |   Change-Id: I1d71b74f2edd7029b8743a2c13b591098ce8cc8f
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Configure core.bigFileThreshold into PackWriter  (Shawn O. Pearce, 2010-07-09, 2 files, -0/+31)
| | |   C Git's fast-import uses this to determine the maximum file size
| | |   that it tries to delta compress; anything equal to or above this
| | |   setting is stored as a whole object with simple deflate. Define the
| | |   configuration so we can use it later.
| | |
| | |   Change-Id: Iea46e787d019a1b6c51135cc73d7688a02e207f5
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Add doNotDelta flag to ObjectToPack  (Shawn O. Pearce, 2010-07-09, 2 files, -1/+16)
| | |   This flag will later control whether or not PackWriter searches for
| | |   a delta base for this object. Edge objects will never get searched,
| | |   as the writer won't be outputting them, so they should always have
| | |   this flag set on. Sometime in the future this flag should also be
| | |   set for file blobs on file paths that have the "-delta"
| | |   gitattribute set in the repository's attributes file.
| | |
| | |   Change-Id: I6e518e1a6996c8ce00b523727f1b605e400e82c6
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Add more configuration options to PackWriter  (Shawn O. Pearce, 2010-07-09, 2 files, -8/+152)
| | |   We now at least import other pack settings like pack.window, which
| | |   means we can later use these to control how we search for deltas.
| | |   The compression level was fixed to use pack.compression rather than
| | |   the loose object core.compression setting.
| | |
| | |   Change-Id: I72ff6d481c936153ceb6a9e485fa731faf075a9a
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Save object path hash codes during packing  (Shawn O. Pearce, 2010-07-09, 2 files, -5/+30)
| | |   We need to remember these so we can later cluster objects that have
| | |   similar file paths near each other as we search for deltas between
| | |   them.
| | |
| | |   Change-Id: I52cb1e4ca15c9c267a2dbf51dd0d795f885f4cf8
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Add path hash code to ObjectWalk  (Shawn O. Pearce, 2010-07-09, 2 files, -0/+30)
| | |   PackWriter wants to categorize objects that are similar in path
| | |   name, so blobs that are probably from the same file (or same sort
| | |   of file) can be delta compressed against each other. Avoid
| | |   converting into a string by performing the hashing directly against
| | |   the path buffer in the tree iterator. We only hash the last 16
| | |   bytes of the path, and we try to avoid any spaces, as we want the
| | |   suffix of a file such as ".java" to be more important than the
| | |   directory it is in, like "src".
| | |
| | |   Change-Id: I31770ee711526306769a6f534afb19f937e0ba85
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
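
Note: illustratively; the exact bit-mixing step here is a guess, and
the point is only hashing the trailing 16 bytes while skipping spaces.

    static int pathHashCode(byte[] path, int end) {
        int hash = 0;
        for (int i = Math.max(0, end - 16); i < end; i++) {
            byte c = path[i];
            if (c != ' ') // don't let spaces dilute the suffix
                hash = (hash >>> 2) + (c << 24);
        }
        return hash;
    }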
| * | Add getObjectSize to ObjectReader  (Shawn O. Pearce, 2010-07-09, 7 files, -1/+247)
| | |   This is an informational function used by PackWriter to help it
| | |   better organize objects for delta compression. Storage systems can
| | |   implement it to provide more detailed size information, or they can
| | |   simply rely on the default behavior that uses the ObjectLoader
| | |   obtained from open. For local file storage, we can obtain this
| | |   information faster through specialized routines that parse a pack
| | |   object header.
| | |
| | |   Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB  (Shawn O. Pearce, 2010-07-09, 1 file, -8/+25)
| | |   If the heap limit was set to something smaller than 8 KiB, we were
| | |   still allocating the full 8 KiB block size, and accepting input up
| | |   to the amount we allocated. Instead actually put a hard cap on the
| | |   limit.
| | |
| | |   Change-Id: Id1da26fde2102e76510b1da4ede8493928a981cc
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
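
Note: the cap itself amounts to one line; blockSize stands in for the
8 KiB block-size constant, and the method name is illustrative.

    private byte[] allocateFirstBlock(int limit) {
        final int blockSize = 8 * 1024;
        // Never allocate past the configured in-memory limit, even when
        // that limit is below the normal block size.
        return new byte[Math.min(blockSize, limit)];
    }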
| * | Define a constant for 127 in DeltaEncoder  (Shawn O. Pearce, 2010-07-07, 1 file, -1/+4)
| | |   The special value 127 here is the maximum number of bytes we can
| | |   put into a single insert command. Rather than use the magical value
| | |   127, let's name it to better document the code.
| | |
| | |   Change-Id: I5a326f4380f6ac87987fa833e9477700e984a88e
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
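
Note: a sketch of the named constant; the actual identifier chosen in
DeltaEncoder may differ.

    /** Largest literal insert one pack v2 command can carry (7-bit length). */
    private static final int MAX_INSERT_DATA_SIZE = 127;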
| * | Cap delta copy instructions at 64k  (Shawn O. Pearce, 2010-07-07, 1 file, -12/+50)
| | |   Although all modern delta decoders can process copy instructions
| | |   with a count as large as 0xffffff (~15.9 MiB), pack version 2
| | |   streams are only supposed to use delta copy instructions up to 64
| | |   KiB. Rewrite our copy instruction encode loop to use the lower 64
| | |   KiB limit, even though modern decoders would support longer copies.
| | |
| | |   To improve encoding performance we now try to encode up to four
| | |   full copy commands in our buffer before we flush it to the stream,
| | |   but we don't try to implement full buffering here. We are just
| | |   trying to amortize the virtual method call to the destination
| | |   stream when we have to do a large copy.
| | |
| | |   Change-Id: I9410a16e6912faa83180a9788dc05f11e33fabae
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
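
Note: the encode loop, sketched; encodeCopy(offset, n) is a
hypothetical stand-in for the code that emits one copy command.

    private void copy(long offset, int cnt) throws IOException {
        while (cnt > 0) {
            int n = Math.min(cnt, 0x10000); // cap each command at 64 KiB
            encodeCopy(offset, n);          // emit a single copy command
            offset += n;
            cnt -= n;
        }
    }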
| * | Fix DeltaEncoder header for objects 128 bytes long  (Shawn O. Pearce, 2010-07-07, 1 file, -1/+1)
| | |   The encode loop had the wrong condition: objects that are 128 bytes
| | |   in size need to have their length encoded as two bytes, not one.
| | |
| | |   Change-Id: I3bef85f2b774871ba6104042b341749eb8e7595c
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
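
Note: delta headers store sizes as little-endian 7-bit groups, so a
size of exactly 128 (0x80) still has high bits left over and needs a
second byte; the loop must run while the size is >= 0x80, not > 0x80.
A sketch of such an encoder:

    static void writeSizeHeader(java.io.OutputStream out, long size)
            throws java.io.IOException {
        while (size >= 0x80) { // "> 0x80" wrongly emitted one byte for 128
            out.write((int) (0x80 | (size & 0x7f)));
            size >>>= 7;
        }
        out.write((int) size);
    }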
| * | amend commit: Support large delta packed objects as streams  (Shawn O. Pearce, 2010-07-06, 4 files, -7/+7)
| | |   Rename the ByteWindow's inflate() method to setInput. We have
| | |   completely refactored the purpose of this method to be feeding part
| | |   (or all) of the window as input to the Inflater, and the actual
| | |   inflate activity happens in the caller.
| | |
| | |   Change-Id: Ie93a5bae0e9e637b5e822d56993ce6b562c6ad15
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | amend commit: Support large loose objects as streams  (Shawn O. Pearce, 2010-07-06, 1 file, -13/+63)
| | |   We need to validate the stream state after the InflaterInputStream
| | |   thinks the stream is done. Git expects a higher level of service
| | |   from the Inflater than the InflaterInputStream usually gives; we
| | |   need to ensure the embedded CRC is valid, and that there isn't
| | |   trailing garbage at the end of the file.
| | |
| | |   Change-Id: I1c9642a82dbd76b69e607dceccf8b85dc869a3c1
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
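
Note: the extra validation looks roughly like this; inflater, id, and
in (the raw file stream) are assumed to be in scope, and the error
messages are illustrative.

    // InflaterInputStream reported end-of-stream; now make sure the zlib
    // stream really consumed all input and nothing follows it in the file.
    if (!inflater.finished() || inflater.getRemaining() != 0)
        throw new CorruptObjectException(id, "partial zlib stream");
    if (in.read() != -1)
        throw new CorruptObjectException(id, "trailing garbage");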
| * | Improve description of isBare and NoWorkTreeException  (Shawn O. Pearce, 2010-07-03, 1 file, -10/+17)
| | |   Alex pointed out that my description of a bare repository might be
| | |   confusing for some readers. Reword the description of the error,
| | |   and make it consistent throughout the Repository class's API.
| | |
| | |   Change-Id: I87929ddd3005f578a7022f363270952d1f7f8664
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | amend commit: Refactor repository construction to builder class  (Shawn O. Pearce, 2010-07-03, 3 files, -3/+3)
| | |   During code review, Alex raised a few comments about commit
| | |   532421d98925 ("Refactor repository construction to builder class").
| | |   Due to the size of the related series we aren't going to go back
| | |   and rebase in something this minor, so resolve them as a follow-up
| | |   commit instead.
| | |
| | |   Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Remove pointless size test in PackFile decompress  (Shawn O. Pearce, 2010-07-03, 1 file, -2/+0)
| | |   Now that any large object is forced through a streaming loader when
| | |   it is bigger than getStreamFileThreshold(), and that threshold is
| | |   pegged at Integer.MAX_VALUE as its largest size, we will never be
| | |   able to reach this code path where we threw OutOfMemoryError. Robin
| | |   pointed out that we probably should include a message here, but the
| | |   code is effectively unreachable, so there isn't any value in adding
| | |   a message at this point. So remove it.
| | |
| | |   Change-Id: Ie611d005622e38a75537f1350246df0ab89dd500
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Avoid unbounded getCachedBytes during parseAny  (Shawn O. Pearce, 2010-07-03, 1 file, -3/+2)
| | |   Since we don't know the type of object we are parsing, we don't
| | |   know if it's a massive blob, or some small commit or annotated tag.
| | |   Avoid pulling the cached bytes until we have checked the type and
| | |   decided if we actually need them to continue parsing right now.
| | |   This way large blobs which won't fit in memory and would throw a
| | |   LargeObjectException don't abort parsing.
| | |
| | |   Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Make type and size lazy for large delta objects  (Shawn O. Pearce, 2010-07-03, 2 files, -22/+89)
| | |   Callers don't necessarily need the getSize() result from a large
| | |   delta. They should instead always be using openStream() or copyTo()
| | |   for blobs going to local files, or they should be checking the
| | |   result of the constant-time isLarge() method to determine the type
| | |   of access they can use on the ObjectLoader. Avoid inflating the
| | |   delta instruction stream twice by delaying the decoding of the size
| | |   until after we have created the DeltaStream and decoded the header.
| | |
| | |   Likewise with the type: callers don't necessarily always need it to
| | |   be present in an ObjectLoader. Delay looking at it as late as we
| | |   can, thereby avoiding an ugly O(N^2) loop looking up the type for
| | |   every single object in the entire delta chain.
| | |
| | |   Change-Id: I6487b75b52a5d201d811a8baed2fb4fcd6431320
| | |   Signed-off-by: Shawn O. Pearce <spearce@spearce.org>