summaryrefslogtreecommitdiffstats
path: root/org.eclipse.jgit
Commit message (Collapse)AuthorAgeFilesLines
...
| * | Change Repository.getConfig() to return non-file ConfigsShawn O. Pearce2010-06-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | A repository implementation might support storing configurations on a non-file storage system, so widen the return value to be any type of configuration. Change-Id: If9a0928f4b3ef29a24d270b0ce585a6e77f6fac6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Make lib.Repository abstract and lib.FileRepository its implementationShawn O. Pearce2010-06-255-337/+530
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support other storage models other than just the local filesystem, we split the Repository class into a nearly abstract interface and then create a concrete subclass called FileRepository with the file based IO implementation. We are using an abstract class for Repository rather than the much more generic interface, as implementers will want to inherit a large array of utility functions, such as resolve(String). Having these in a base class makes it easy to inherit them. This isn't the final home for lib.FileRepository. Future changes will rename it into storage.file.FileRepository, but to do that we need to also move a number of other related class, which we aren't quite ready to do. Change-Id: I1bd54ea0500337799a8e792874c272eb14d555f7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Consistently fail work tree methods on bare repositoriesShawn O. Pearce2010-06-251-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the working tree isn't available, it doesn't make any sense to obtain the merge heads, or the buffered commit message. The repository shouldn't have a partial merge state to read. Throw back the same exception we do when invoking getWorkDir() on a bare repository instance. Change-Id: I762c55890b7fe272a183da583f910671d1cadf71 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Consistently use getDirectory() for work tree stateShawn O. Pearce2010-06-251-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | This permits us to leave the implementation of these methods here in the Repository class, but later refactor how the directory is accessed into a subclass. Change-Id: I5785b2009c5b7cca0fb070a968e50814ce847076 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Add RepositoryState.BAREShawn O. Pearce2010-06-252-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bare repository cannot be checked out, committed to, etc. as it doesn't have a working directory. Define this as a state since the state enumeration exists only to describe how a working directory can be modified. Change-Id: I0a299013c6e42fef6cae3f6a9446f8f6c8e0514a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Rename Repository 'config' as 'repoConfig'Shawn O. Pearce2010-06-251-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | This better matches with the other configuration variable, 'userConfig', and helps to make it clear what config object we are dealing with. Change-Id: I2c585649aecc805e8e66db2f094828cd2649e549 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Remove RepositoryConfig and use FileBasedConfig insteadShawn O. Pearce2010-06-253-140/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the Repository API to use straight-up FileBasedConfig. This lets us remove the subclass RepositoryConfig and stop having a specialized configuration type for repository, letting us instead focus the config type heirarchy on type-of-storage rather than use. Change-Id: I7236800e8090624453a89cb0c7a9a632702691c6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Delegate repository access to refs, objectsShawn O. Pearce2010-06-251-15/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using the internal field directly to access references or objects, use the getter method to obtain the proper type of database, and follow down from there. This permits us to later do a refactoring that makes those methods abstract and strips the field out of the Repository class, moving it into a concrete base class that is more storage implementation specific. Change-Id: Ic21dd48800e68a04ce372965ad233485b2a84bef Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Cleanup Repository.create()Shawn O. Pearce2010-06-251-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | This method doesn't need to be synchronized, as its only a proxy to create(boolean), which is the real worker. While we are touching it try to improve the Javadoc and whitespace nearby. Change-Id: Ibdddec6e518ca6d7439cfad90fedfcdc2d6b7a2e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Move additional have enumeration to RepositoryShawn O. Pearce2010-06-252-16/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | This permits the repository implementation to know what its alternates concept means, and avoids needing to expose finer details about the ObjectDatabase to network code like the RefAdvertiser. Change-Id: Ic6d173f300cb72de34519c7607cf7b0ff3ea6882 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Refactor amazon-s3:// property file loading to support no directoryShawn O. Pearce2010-06-251-17/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the future getDirectory() can return null. Avoid an NPE here by refactoring the code to support conditionally skipping a check for the properties file in the repository directory, falling to only the user's ~/ file location. Change-Id: I76f5503d4063fdd9d24b7c1b58e1b09ddf1a5670 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Download pack-*.idx to /tmp if not on local filesystemShawn O. Pearce2010-06-251-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the destination repository doesn't use an ObjectDirectory to store its objects, we can't download to the object directory. Instead pull the pack-*.idx files down to temporary files in the JVM's default temporary directory. Change-Id: Ied16bc89be624d87110ba42ba52d698a6ea7d982 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | DirCache must use getIndexFileShawn O. Pearce2010-06-251-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | When reading or locking the index of a repository, we need to use the index file specified by the repository, to ensure we correctly honor what the repository was configured with. Change-Id: I5be366ce32d7923b888dc01d19335912b01b7c4c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Disable topological sorting in PackWriterShawn O. Pearce2010-06-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Its not strictly required that we sort topologically in order to produce a valid pack file. This was just something that Linus thought would be a good idea to do. In practice its not that important for most repositories. Local file IO quickly falls out of the pattern that topological sorting provides any sort of benefit for, so expending extra resources to enforce it when we make a pack isn't really worth it. I'm removing this sort in the pipeline because later changes would support really efficient COMMIT_TIME_DESC sorting on a non-file storage system, but TOPO sorting would be a bit more ugly to run, due to the in-memory delays it imposes. Change-Id: I0121453461c2140c6917cb10c6df584eb47e5795 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | UploadPack: Permit flushing progress messages under smart HTTPShawn O. Pearce2010-06-231-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If UploadPack invokes flush() on the output stream we pass it, its most likely the progress messages coming down the side band stream. As pack generation can take a while, we want to push that down at the client as early as we can, to keep the connection alive, and to let the user know we are still working on their behalf. Ensure we dump the temporary buffer whenever flush() is invoked, otherwise the messages don't get sent in a timely fashion to the user agent (in this case, git fetch). We specifically don't implement flush() for ReceivePack right now, as that protocol currently does not provide progress messages to the user, but it does invoke flush several times, as the different streams include '0000' type flush-pkts to denote various end points. Change-Id: I797c90a2c562a416223dc0704785f61ac64e0220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Rewrite resolve in terms of RevWalkShawn O. Pearce2010-06-231-119/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to eventually get rid of the mapCommit, mapTree APIs on Repository and force everyone into the faster parsers that exist in RevWalk. Rewriting resolve in terms of the faster parsers is a good first step. It actually simplifies the code a bit, as we no longer need to keep track of an ObjectId and an Object (the parsed form), since all RevObjects implicitly have their ObjectId readily available. Change-Id: I4d234630195616e2c263e7e70038b55a1be4e7a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Replace manual peel loops with RevWalk.peelShawn O. Pearce2010-06-234-23/+35
| | | | | | | | | | | | | | | | | | | | | | | | Instead of peeling things by hand in application level code, defer the peeling logic into RevWalk's new peel utility method. Change-Id: Idabd10dc41502e782f6a2eeb56f09566b97775a8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Use RevTag/RevCommit to sort in a PlotWalkShawn O. Pearce2010-06-231-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already have these objects parsed and cached in our object pool. We shouldn't be looking them up via the legacy mapObject API, but instead can use the pool and the faster parsing routines available through the RevWalk that we extend. While we are here fixing the code, lets also correct the tag date sorting to accept tags that have no tagger identity, because they were created before Git knew how to store that field. Change-Id: Id49a11f6d9c050c82b876e5e11058840c894b2d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Use CoreConfig, UserConfig and TransferConfig directlyShawn O. Pearce2010-06-235-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than relying on the helpers in RepositoryConfig to get these objects, obtain them directly through the Config API. Its only slightly more verbose, but permits us to work with the base Config class, which is more flexible than the highly file specific RepositoryConfig. This is what I really meant to do when I added the section parser and caching support to Config, we just failed to finish updating all of the call sites. Change-Id: I481cb365aa00bfa8c21e5ad0cd367ddd9c6c0edd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Use higher level Config types when possibleShawn O. Pearce2010-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't have to assume/depend on RepositoryConfig here, these two tests can use higher level versions of the class and still come up with the same test. That frees us up to do some changes to the RepositoryConfig API. Change-Id: Ia7b263c8c5efa3fae1054416d39c546867288132 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* | | Allow client of Add command to set a WorkingTreeIteratorStefan Lay2010-07-221-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is e.g. useful when a client of the AddCommand has additional rules to ignore files. In Eclipse a resource can be set to derived or be excluded by preferences. Change-Id: I6c47e54a1ce26315faf5ed0723298ad2c2db197c Signed-off-by: Stefan Lay <stefan.lay@sap.com>
* | | Allow for filepattern "." in AddCommandStefan Lay2010-07-221-1/+5
| | | | | | | | | | | | | | | | | | | | | Enable adding on repository root level. Change-Id: I415b10dc74cc9435578424d9f106c972fd703055 Signed-off-by: Stefan Lay <stefan.lay@sap.com>
* | | Do not add ignored files in Add commandStefan Lay2010-07-221-2/+6
| | | | | | | | | | | | Signed-off-by: Stefan Lay <stefan.lay@sap.com>
* | | Move ignore node handling into WorkingTreeIteratorShawn O. Pearce2010-07-217-451/+220
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The working tree iterator has perfect knowledge of the path structure as well as immediate information about whether or not an ignore file even exists at this level. We can exploit that to simplify the logic and running time for testing ignored file status by pushing all of the checks down into the iterator itself. Change-Id: I22ff534853e8c5672cc5c2d9444aeb14e294070e Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>
* | | Merge "Fix concurrent read / write issue in GitIndex on Windows"Shawn Pearce2010-07-215-4/+48
|\ \ \
| * | | Fix concurrent read / write issue in GitIndex on WindowsJens Baumgart2010-07-215-4/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GitIndex.write fails if another thread concurrently reads the index file. The problem is fixed by retrying the rename operation if it fails. Bug: 311051 Change-Id: Ib243d2a90adae312712d02521de4834d06804944 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
* | | | Check for racy git in WorkingTreeIteratorChristian Halstrick2010-07-201-11/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The WorkingTreeIterator has a method to check whether the current file differs from the corresponding index entry. This commit improves this check to also handle racy git situations. See http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD Change-Id: I3ad0897211dcbb2eac9eebcb19d095a5052fb06b Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* | | | Smudge racily clean index entries by truncating length (like git.git)Christian Halstrick2010-07-202-18/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To mark an entry racily clean we set its length to 0 (like native git does). Entries which are not racily clean and have zero length can be distinguished from racily clean entries by checking P_OBJECTID against the SHA1 of empty content. When length is 0 and P_OBJECTID is different from SHA1 of empty content we know the entry is marked racily clean. See http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00488.html Change-Id: I689552931441ab51964b430b303160c9126b66af Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* | | | Use proper constants for .gitignore and .git directoryShawn O. Pearce2010-07-201-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a constant for .gitignore, so use it. While we are in the same method, correct the reference of ".git" to be the actual GIT_DIR given. This might not be within the work tree if the GIT_DIR and GIT_WORK_TREE environment variables were used. Change-Id: I38e1cec13405109b9c347858b38dd9fb2f1f2560 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>
* | | | Remove gitIgnoreTimestamp from abstract iterator APIShawn O. Pearce2010-07-203-61/+11
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This never should have been exposed on the top of the AbstractTreeIterator type hierarchy. There is no concept of a timestamp in a canonical tree read from the object database, and the time in the DirCache isn't what we want here either. Actually all that we need is to find the files whose names are ".gitignore" and are below the root directory. We can accomplish that with a suffix filter, and process them immediately. Change-Id: Ib09cbf81a9e038452ce491385c65498312e2916b Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>
* | | Fix NPE in RenameDetectorShawn O. Pearce2010-07-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have two adds of the same object but no deletes the detector threw an NPE because the entry that came back from the deleted map was null (no matching objects). In this case we need to put the adds all back onto the list of left over additions since they did not match a delete. Change-Id: Ie68fbe7426b4dc0cb571a08911c7adbffff755d5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Jeffrey Schumacher" <jeffschu@google.com>
* | | IndexPack: Fix spurious pack file corruption errorsShawn O. Pearce2010-07-201-3/+3
|/ / | | | | | | | | | | | | | | | | | | | | | | | | We didn't correctly handle the zlib trailer for an object. If the trailer bytes were outside of the current buffer window but we had fully inflated the object itself, we broke out of the loop (as we had our target size) but inflate wasn't finished (as it did not yet get the trailer) so we failed the test and threw a corruption exception. Use an infinite loop and only break out when the inflater is done. Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* | Merge branch 'js/rename'Shawn O. Pearce2010-07-1620-223/+2303
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * js/rename: Implemented file path based tie breaking to exact rename detection Added more test cases for RenameDetector Added very small optimization to exact rename detection Fixed Misleading Javadoc Added file path similarity to scoring metric in rename detection Fixed potential div by zero bug Added file size based rename detection optimization Create FileHeader from DiffEntry log: Implement --follow Cache the diff configuration section log: Add whitespace ignore options Format submodule links during differences Redo DiffFormatter API to be easier to use log, diff: Add rename detection support Implement similarity based rename detection Added a preliminary version of rename detection Refactored code out of FileHeader to facilitate rename detection
| * | Implemented file path based tie breaking to exact rename detectionJeff Schumacher2010-07-163-47/+199
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During the exact rename detection phase in RenameDetector, ties were resolved on a first-found basis. I added support for file path based tie breaking during that phase. Basically, there are four situations that have to be handled: One add matching one delete: In this simple case, we pair them as a rename. One add matching many deletes: Find the delete whos path matches the add the closest, and pair them as a rename. Many adds matching one delete: Similar to the above case, we find the add that matches the delete the closest, and pair them as a rename. The other adds are marked as copies of the delete. Many adds matching many deletes: Build a scoring matrix similar to the one used for content- based matching, scoring instead by file path. Some of the utility functions in SimilarityRenameDetector are used in this case, as we use the same encoding scheme. Once the matrix is built, scan it for the best matches, marking them as renames. The rest are marked as copies. I don't particularly like the idea of using utility functions right out of SimilarityRenameDetector, but it works for the moment. A later commit will likely refactor this into a common utility class, as well as bringing exact rename detection out of RenameDetector and into a separate class, much like SimilarityRenameDetector. Change-Id: I1fb08390aebdcbf20d049aecf402a36506e55611
| * | Added very small optimization to exact rename detectionJeff Schumacher2010-07-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Optimized a small loop in findExactRenames. The loop would go through all the items in a list of DiffEntries even after it already found what it was looking for. I made it break out of the loop as soon as a good match was found. Change-Id: I28741e0c49ce52d8008930a87cd1db7037700a61
| * | Fixed Misleading JavadocJeff Schumacher2010-07-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The javadoc for the setRenameLimit method in RenameDetector said that you could only have limits in the range (0,100), implying that 0 and 100 were illegal inputs. The code, however, allowed 0 and 100. I changed the javadoc to say that the range [0,100] was legal. I also documented the IllegalArgumentException that is thrown if the limit is outside that range. Change-Id: I916838f254859f6f0e1516bb55b8e7dc87e57dc2
| * | Added file path similarity to scoring metric in rename detectionJeff Schumacher2010-07-122-4/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scoring method was not taking into account the similarity of the file paths and file names. I changed the metric so that it is 99% based on content (which used to be 100% of the old metric), and 1% based on path similarity. Of that 1%, half (.5% of the total final score) is based on the actual file names (e.g. "foo.java"), and half on the directory (e.g. "src/com/foo/bar/"). Change-Id: I94f0c23bf6413c491b10d5625f6ad7d2ecfb4def
| * | Fixed potential div by zero bugJeff Schumacher2010-07-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scoring logic in SimilarityIndex was dividing by the max file size. If both files are empty, this would cause a div by zero error. This case cannot currently happen, since two empty files would have the same SHA1, and would therefore be caught in the earlier SHA1 based detection pass. Still, if this logic eventually gets separated from that pass, a div by zero error would occur. I changed the logic to instead consider two empty files to have a similarity score of 100. Change-Id: Ic08e18a066b8fef25bb5e7c62418106a8cee762a
| * | Added file size based rename detection optimizationJeff Schumacher2010-07-121-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this change, files that were very different in size (enough so that they could not have enough in common to be detected as renames) were still having their scores calculated. I added an optimization to skip such files. For example, if the rename detection threshold is 60%, the larger file is 200kb, and the smaller file is 50kb, the pair cannot be counted as a rename since they cannot possibly share 60% of their content in common. (200*.6=120, 120>50) Change-Id: Icd8315412d5de6292839778e7cea7fe6f061b0fc
| * | Create FileHeader from DiffEntryJeff Schumacher2010-07-085-97/+210
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added support for converting DiffEntrys to FileHeaders. FileHeaders are DiffEntrys with a buffer containing the diff output as well as a list of HunkHeaders. The HunkHeaders contain EditLists. The createFileHeader(DiffEntry) method in DiffFormatter performs a Myers Diff on the files refered to by the DiffEntry, then puts the returned EditList into a single HunkHeader, which is then put into the FileHeader to be returned. It also generates the appropriate diff header an puts it into the FileHeader's buffer. The rest of the diff output, which would normally be parsed to generate the HunkHeaders, is not generated. In fact, the purpose of this method is to avoid the costly diff output generation and parsing normally required to create a FileHeader. Change-Id: I7d8b18c0f6c85e3d02ad58995d3d231e69af5887
| * | log: Implement --followShawn O. Pearce2010-07-033-2/+171
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The FollowFilter can be installed on a RevWalk to cause the path to be updated through rename detection when the affected file is found to be added to the project. The filter works reasonably well, for example we can follow the history of the fsck command in git-core: $ jgit log --name-status --follow builtin/fsck.c | grep ^R R100 builtin-fsck.c builtin/fsck.c R099 fsck.c builtin-fsck.c R099 fsck-objects.c fsck.c R099 fsck-cache.c fsck-objects.c Change-Id: I4017bcfd150126aa342fdd423a688493ca660a1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Cache the diff configuration sectionShawn O. Pearce2010-07-032-3/+70
| | | | | | | | | | | | | | | | | | | | | | | | This way we don't have to reparse for the rename limit every time we create a new rename detector for a repository. Change-Id: I669d031690b85ef4da5e39189be7173fb773fc56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | log: Add whitespace ignore optionsShawn O. Pearce2010-07-038-28/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to what we did with diff, implement whitespace ignore options for log too. This requires us to define some means of creating any RawText object type at will inside of DiffFormatter, so we define a new factory interface to construct RawText instances on demand. Unfortunately we have to copy the entire block of common options. args4j only processes the options/arguments on the one command class and Java doesn't support multiple inheritance. Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Format submodule links during differencesShawn O. Pearce2010-07-031-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | Instead of crashing, output a submodule link with the simple "Subproject commit $fullid\n" syntax used by C Git. Change-Id: Iae8646941683fb19b73fb038217d2e3bf5f77fa9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Redo DiffFormatter API to be easier to useShawn O. Pearce2010-07-033-86/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Passing around the OutputStream and the Repository is crazy. Instead put the stream in the constructor, since this formatter exists only to output to the stream, and put the repository as a member variable that can be optionally set. Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | log, diff: Add rename detection supportShawn O. Pearce2010-07-031-0/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement rename detection in the command line diff and log commands. Also support --name-status, -p and -U flags, as these can be quite useful to view more detail. All of the Git patch file formatting code is now moved over to the DiffFormatter class. This permits us to reuse it in any context, including inside of IDEs. Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Implement similarity based rename detectionShawn O. Pearce2010-07-037-134/+1014
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Content similarity based rename detection is performed only after a linear time detection is performed using exact content match on the ObjectIds. Any names which were paired up during that exact match phase are excluded from the inexact similarity based rename, which reduces the space that must be considered. During rename detection two entries cannot be marked as a rename if they are different types of files. This prevents a symlink from being renamed to a regular file, even if their blob content appears to be similar, or is identical. Efficiently comparing two files is performed by building up two hash indexes and hashing lines or short blocks from each file, counting the number of bytes that each line or block represents. Instead of using a standard java.util.HashMap, we use a custom open hashing scheme similiar to what we use in ObjecIdSubclassMap. This permits us to have a very light-weight hash, with very little memory overhead per cell stored. As we only need two ints per record in the map (line/block key and number of bytes), we collapse them into a single long inside of a long array, making very efficient use of available memory when we create the index table. We only need object headers for the index structure itself, and the index table, but not per-cell. This offers a massive space savings over using java.util.HashMap. The score calculation is done by approximating how many bytes are the same between the two inputs (which for a delta would be how much is copied from the base into the result). The score is derived by dividing the approximate number of bytes in common into the length of the larger of the two input files. Right now the SimilarityIndex table should average about 1/2 full, which means we waste about 50% of our memory on empty entries after we are done indexing a file and sort the table's contents. If memory becomes an issue we could discard the table and copy all records over to a new array that is properly sized. Building the index requires O(M + N log N) time, where M is the size of the input file in bytes, and N is the number of unique lines/blocks in the file. The N log N time constraint comes from the sort of the index table that is necessary to perform linear time matching against another SimilarityIndex created for a different file. To actually perform the rename detection, a SxD matrix is created, placing the sources (aka deletions) along one dimension and the destinations (aka additions) along the other. A simple O(S x D) loop examines every cell in this matrix. A SimilarityIndex is built along the row and reused for each column compare along that row, avoiding the costly index rebuild at the row level. A future improvement would be to load a smaller square matrix into SimilarityIndexes and process everything in that sub-matrix before discarding the column dimension and moving down to the next sub-matrix block along that same grid of rows. An optional ProgressMonitor is permitted to be passed in, allowing applications to see the progress of the detector as it works through the matrix cells. This provides some indication of current status for very long running renames. The default line/block hash function used by the SimilarityIndex may not be optimal, and may produce too many collisions. It is borrowed from RawText's hash, which is used to quickly skip out of a longer equality test if two lines have different hash functions. We may need to refine this hash in the future, in order to minimize the number of collisions we get on common source files. Based on a handful of test commits in JGit (especially my own recent rename repository refactoring series), this rename detector produces output that is very close to C Git. The content similarity scores are sometimes off by 1%, which is most probably caused by our SimilarityIndex type using a different hash function than C Git uses when it computes the delta size between any two objects in the rename matrix. Bug: 318504 Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
| * | Added a preliminary version of rename detectionJeff Schumacher2010-07-014-0/+275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | JGit does not currently do rename detection during diffs. I added a class that, given a TreeWalk to iterate over, can output a list of DiffEntry's for that TreeWalk, taking into account renames. This class only detects renames by SHA1's. More complex rename detection, along the lines of what C Git does will be added later. Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04
| * | Refactored code out of FileHeader to facilitate rename detectionJeff Schumacher2010-06-302-123/+176
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactored a superclass out of FileHeader called DiffEntry that holds the more general data from FileHeader that is useful in rename detection (old/new Ids, modes, names, as well as changeType and score). FileHeader is now a DiffEntry that adds Hunks, parsing abilities, etc. Change-Id: I8398728cd218f8c6e98f7a4a7f2f342391d865e4
* | | Fix infinite loop in IndexPackShawn O. Pearce2010-07-163-146/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A programming error using the Inflater API led to an infinite loop within IndexPack, caused by the Inflater returning 0 from the inflate() method, but it didn't want more input. This happens when it has reached the end of the stream, or has reached a spot asking for an external dictionary. Such a case is a failure for us, and we should abort out. Thanks to Alex for pointing out that we had 3 implementations of the inflate rountine, which should be consolidated into one and use a switch to determine where to load data from. Bug: 317416 Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>