Shawn O. Pearce [Fri, 20 Aug 2010 16:25:47 +0000 (09:25 -0700)]
Try really hard to load a commit or tag
When we need the canonical form of a commit or a tag in order to
parse it into our RevCommit or RevTag fields, we really need it as a
single contiguous byte array. However the ObjectDatabase may choose
to give us a large loader. In general, commits and tags are
under the several MiB limit, so even if the loader calls it "large"
we should still be able to afford the JVM heap memory required to
get a single byte array. Coerce even large loaders into a single
byte array anyway.
Change-Id: I04efbaa7b31c5f4b0a68fc074821930b1132cfcf Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ReadTreeTests relied on Repository.getIndex(), which on
platforms with coarse filesystem timers failed to detect
index modifications. Explicitly reloading and writing
the index solves this problem.
Change-Id: I0a98babfc2068a3b6b7d2257834988e1154f5b26 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Shawn O. Pearce [Thu, 19 Aug 2010 18:50:20 +0000 (11:50 -0700)]
Make ObjectId.compareTo final
Since equals() is now final and does not permit being overridden,
we should do the same thing with compareTo() to prevent different
subclasses from having different ordering behaviors. This could
lead to the same mess that we had with different equals() behaviors.
Change-Id: I35a849b6efccee5fe74cc5788a3566a1516004b7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Shawn O. Pearce [Thu, 19 Aug 2010 18:46:22 +0000 (11:46 -0700)]
Make ObjectId.hashCode final too
Since equals() is now final and does not permit being overridden,
we should do the same thing with hashCode() to prevent different
subclasses from having different hashing behaviors. This could
lead to the same mess that we had with different equals() behaviors.
Change-Id: I35a849b6efccee5fe74cc5788a3566a1516004b7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
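The effect of the two commits above can be illustrated with a minimal sketch. The classes `Id` and `RichId` are hypothetical stand-ins invented for this example, not JGit's ObjectId or RevCommit; the sketch only shows how declaring equals(), hashCode() and compareTo() final keeps every subclass on the same identity and ordering rules.

```java
/** Hypothetical stand-in for an ObjectId-like value type; not JGit's class. */
class Id implements Comparable<Id> {
    private final String hex;

    Id(String hex) {
        this.hex = hex;
    }

    // final: subclasses cannot change equality, hashing, or ordering.
    @Override
    public final boolean equals(Object o) {
        return o instanceof Id && hex.equals(((Id) o).hex);
    }

    @Override
    public final int hashCode() {
        return hex.hashCode();
    }

    @Override
    public final int compareTo(Id other) {
        return hex.compareTo(other.hex);
    }
}

/** A subclass may carry extra state but inherits the same identity rules. */
class RichId extends Id {
    final String note;

    RichId(String hex, String note) {
        super(hex);
        this.note = note;
    }
}
```

With this shape, a `RichId` can be passed directly to contains() or get() on a java.util collection keyed by `Id`, because both sides agree on equality and hashing.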
Shawn O. Pearce [Thu, 19 Aug 2010 18:43:39 +0000 (11:43 -0700)]
Remove unnecessary ObjectId.copy() calls
When RevObject overrode equals() to provide only reference equality
we used to need to convert a RevObject into an ObjectId by copy()
just to use standard Java tools like JUnit assertEquals(), or to
use contains() or get() on standard java.util collection types.
Now that we have removed this override and made ObjectId's equals()
final (preventing any of this mess in the future), some copy()
calls are unnecessary. Anytime the value is being used as an input
to a lookup routine, or to an equals, we can avoid the copy().
However we still want to use copy() anytime we are given an ObjectId
that may exist long-term, where we don't want the high cost of the
additional storage from a RevCommit extension. So we can't remove
all uses of copy(), just some of them.
Change-Id: Ief275dace435c0ddfa362ac8e5d93558bc7e9fc3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The MergeResult class is enhanced to report more data about a
three-way merge. Information about conflicts and the base, ours,
theirs commits can be retrieved.
Change-Id: Iaaf41a1f4002b8fe3ddfa62dc73c787f363460c2 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Chris Aniszczyk [Thu, 19 Aug 2010 01:56:27 +0000 (20:56 -0500)]
Remove getter and setter for author in Tag
There was a duplicated getter and setter for tagger in Tag.
There's no need to have two getters and setters that represent
the same things. The appropriate tests were updated also.
Change-Id: If46dc00c4c0f31ea4234c6d3bda3c03e6ebbafac Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
indexState() encodes the complete state of the index
into one readable String. This helps to write tests
against the index. indexState() is enhanced to optionally
also contain the content of the files in the index.
Change-Id: Ie988f93768d864f4cbd55809a786bd5759fc24a5 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Added a utility method to reset an index to match exactly
some content in the filesystem. This can be used by tests to prepare
commits in the working-tree and set the index in one shot.
[sp: Cleaned up formatting, added getEntryFile(), released inserter.]
Change-Id: If38b1f7cacaaf769f51b14541c5da0c1e24568a5 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Add a convenience API in FileRepository to pass in a String that
points to the GIT_DIR location. This is converted to a File and
sent through the usual constructor.
Matthias Sohn [Mon, 16 Aug 2010 15:42:41 +0000 (17:42 +0200)]
Set default file encoding used for JGit tests to UTF-8
This patch fixes the problem that JGit tests run from Maven fail on
Mac OS X [1]. In Eclipse the tests succeed since we set Eclipse
workspace encoding to UTF-8 via "Preferences > General > Workspace
> Text file encoding", checked via JConsole that this setting changes
the JVM system property of the test run. This change copies this
setting to the Maven test environment so that we get consistent test
results on all platforms.
Matthias Sohn [Sun, 15 Aug 2010 21:12:58 +0000 (23:12 +0200)]
Backout RevObject's object-identity based equals implementation
This restores the transitivity and symmetry properties of the equals
methods on the AnyObjectId type hierarchy as defined in [1].
Following [2] we declare these equals methods final to ensure that
semantics of equals are consistent across AnyObjectId's type hierarchy.
Shawn O. Pearce [Fri, 6 Aug 2010 16:33:55 +0000 (09:33 -0700)]
Fix ArrayIndexOutOfBounds on non-square exact rename matrix
If the exact rename matrix for a particular ObjectId isn't square we
crashed with an ArrayIndexOutOfBoundsException because the matrix
entries were encoded backwards. The encode function accepts the
source (aka deleted) index first, not second. Add a unit test to
cover this non-square case to ensure we don't have this regression
in the future.
Change-Id: I5b005e5093e1f00de2e3ec104e27ab6820203566 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
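A rough sketch of why argument order matters for a non-square matrix follows. The class `MatrixEntry` and its bit layout are assumptions made for illustration only; they are not taken from JGit's source. The one point carried over from the commit message is that the source (deleted) index goes first.

```java
final class MatrixEntry {
    // Hypothetical layout: score in the top bits, then source index, then destination index.
    private static final int BITS = 28;
    private static final long MASK = (1L << BITS) - 1;

    /** Packs a score with the source (deleted) index first, then the destination (added) index. */
    static long encode(int score, int srcIdx, int dstIdx) {
        return ((long) score << (2 * BITS)) | ((long) srcIdx << BITS) | dstIdx;
    }

    static int score(long e) {
        return (int) (e >>> (2 * BITS));
    }

    static int srcIdx(long e) {
        return (int) ((e >>> BITS) & MASK);
    }

    static int dstIdx(long e) {
        return (int) (e & MASK);
    }

    public static void main(String[] args) {
        String[] deleted = { "a.txt", "b.txt" };        // 2 sources
        String[] added = { "x.txt", "y.txt", "z.txt" }; // 3 destinations
        long e = encode(100, 1, 2);
        System.out.println(deleted[srcIdx(e)] + " -> " + added[dstIdx(e)]);
        // With the arguments swapped, encode(100, 2, 1), the decoded source
        // index 2 would be out of bounds for the 2-element deleted array.
    }
}
```

When the matrix is square the swapped form still indexes valid slots, which is one plausible reason such a bug can go unnoticed until the two lists differ in length.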
* changes:
Enable configuration of non-standard pack settings
Pass PackConfig down to PackWriter when packing
Simplify UploadPack use of options during writing
Move PackWriter configuration to PackConfig
Allow PackWriter callers to manage the thread pool
Rename getOldName,getNewName to getOldPath,getNewPath
TreeWalk calls this value "path", while "name" is the stuff after the
last slash. FileHeader should do the same thing to be consistent.
Rename getOldName to getOldPath and getNewName to getNewPath.
Bug: 318526
Change-Id: Ib2e372ad4426402d37939b48d8f233154cc637da Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Jeff Schumacher [Tue, 3 Aug 2010 23:59:30 +0000 (16:59 -0700)]
Fixed bug in scoring mechanism for rename detection
A bug in rename detection would cause file scores to be wrong. The
bug was due to the way rename detection would judge the similarity
between files. If file A has three lines containing 'foo', and file
B has 5 lines containing 'foo', the rename detection phase should
record that A and B have three lines in common (the minimum of the
number of times that line appears in both files). Instead, it would
choose the number of times the line appeared in the destination
file, in this case file B. I fixed the bug by having the
SimilarityIndex instead choose the minimum number, as it should. I
also added a test case to verify that the bug had been fixed.
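The corrected rule can be sketched as follows. This is a simplified illustration that counts whole lines in a map; the class `CommonLines` and its methods are hypothetical and do not reproduce how SimilarityIndex stores its data.

```java
import java.util.HashMap;
import java.util.Map;

final class CommonLines {
    /** Counts how often each line occurs in the text. */
    static Map<String, Integer> countLines(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : text.split("\n"))
            counts.merge(line, 1, Integer::sum);
        return counts;
    }

    /** Sums, per line, the minimum of its occurrence counts in both files. */
    static int common(String src, String dst) {
        Map<String, Integer> s = countLines(src);
        Map<String, Integer> d = countLines(dst);
        int common = 0;
        for (Map.Entry<String, Integer> e : s.entrySet()) {
            Integer inDst = d.get(e.getKey());
            if (inDst != null)
                common += Math.min(e.getValue(), inDst);
        }
        return common;
    }
}
```

For file A with three 'foo' lines and file B with five, `common` yields three regardless of which file is treated as the source.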
A utility method which was in RacyGitTests has been moved to
RepositoryTestCase. Also the javadoc has been improved.
This method waits until the filesystem timer has advanced.
This is useful when it has to be guaranteed that two file
modifications have different modification timestamps.
Change-Id: I2ebd7cd7818feba6acffb3f835101d8fd281bd5a Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
The indexState() method was enhanced to be more configurable. A bitmask
controls which of the optional parts are reported. Data about the
worktree is no longer reported by this method, which makes the
interface cleaner for users wanting to test only the state of the
index.
This was done because the previous version always reported so much
additional data that it was hard to write good assertions against it.
Change-Id: I9b481e97f8fcf3fcdbb785b801dc07bfa85dcc33 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Stefan Lay <stefan.lay@sap.com>
In order to test racy git situations we have to be able to control
the last-modification timestamps of the filesystem. Since we already
access the modification timestamps of files through an abstraction
(the WorkingTreeIterator), I add a new implementation of this iterator
which maps timestamp ranges to single constant timestamps.
For users of this iterator it looks like all files in that range
have been modified at exactly the same time.
With the help of this iterator a test has been written which
checks racy git handling (smudging, unsmudging, dirty detection).
Additionally add a method to RepositoryTestCase which encodes the
current index state in one String. This should include info about
paths, file/index modtime, smudgeState, clean-state. Make
sure timestamps are presented in a way that makes it easy to
write assertions against these strings (no concrete milliseconds
but t0,t1,...).
These two topics depend circularly on each other; that's why they have
been squashed into one commit.
Change-Id: I115c3f2f20fca9b481830bdc6b9d1ade2c3abdcf Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
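The idea of replacing concrete milliseconds with symbolic names can be sketched as below. The class `TimeLabels` is hypothetical, and numbering the labels in ascending timestamp order is an assumption of this sketch rather than something the commit message specifies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class TimeLabels {
    /** Maps each distinct timestamp to "t0", "t1", ... in ascending order. */
    static Map<Long, String> label(long... millis) {
        TreeSet<Long> sorted = new TreeSet<>();
        for (long m : millis)
            sorted.add(m);
        Map<Long, String> labels = new HashMap<>();
        int i = 0;
        for (Long m : sorted)
            labels.put(m, "t" + i++);
        return labels;
    }
}
```

An assertion can then compare against a string such as "t0" or "t1" and stays stable across runs, since only the relative order of the timestamps matters.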
Enable configuration of non-standard pack settings
For daemons we might want to disable delta compression entirely, or
in some strange case an administrator might need to turn off delta
reuse. Expose these normally internal pack settings through the pack
configuration section.
Change-Id: I39bfefee8384c864cc04ffac724f197240c8a11a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When we are creating a pack the higher level application should be able
to override the PackConfig used, allowing it to control the number of
threads used or how much memory is allocated per writer.
Change-Id: I47795987bb0d161d3642082acc2f617d7cb28d8c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We only use these variables once, so just put them at the proper
use site and avoid assigning the local variable. The code is a
bit shorter and the intent is a little bit more clear.
Change-Id: I70d120fb149b612ac93055ea39bc053b8d90a5db Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This refactoring permits applications to configure global per-process
settings for all packing and easily pass it through to per-request
PackWriters, ensuring that the process configuration overrides the
repository specific settings.
For example this might help in a daemon environment where the server
wants to cap the resources used to serve a dynamic upload pack
request, even though the repository's own pack.* settings might be
configured to be more aggressive. This allows fast but less bandwidth
efficient serving of clients, while still retaining good compression
through a cron managed `git gc`.
Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Meaningful error message when trying to check-out submodules
Currently, a NullPointerException occurs in this case. We should
instead throw a more meaningful Exception with a proper message.
This is a very "stupid" implementation which simply checks for
the existence of a ".gitmodules" file.
The following tests fail under Windows because certain input streams
are not closed and files cannot be deleted because of that. The
main problem I found is UnpackedObject.InflaterInputStream.close().
This method may throw exceptions found by checkValidEndOfStream()
but doesn't call super.close() before leaving. It is not clear to me
which resources a close() method should release before it throws an
exception. But those resources which are not published to the
outside and which therefore cannot be closed by other means have to
be closed in all cases.
I changed the close() method to call super.close() under all
circumstances.
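The pattern described above can be sketched with a try/finally block. `CheckedStream` is a hypothetical class written for this illustration; its byte-count check merely stands in for whatever checkValidEndOfStream() verifies in the real code.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream that validates its end state when closed. */
class CheckedStream extends FilterInputStream {
    private final long expected;
    private long seen;

    CheckedStream(InputStream in, long expected) {
        super(in);
        this.expected = expected;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0)
            seen++;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0)
            seen += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        try {
            // Validation may throw, as the end-of-stream check in the commit does.
            if (seen != expected)
                throw new IOException("stream ended after " + seen + " bytes");
        } finally {
            // Always release the wrapped stream, even when validation fails.
            super.close();
        }
    }
}
```

The caller still sees the validation exception, but the underlying file handle is released first, so the file can be deleted afterwards on platforms that refuse to delete open files.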
By deferring tag sorting until the commit is produced by the walker
we can avoid an infinite loop that was triggered by trying to sort
tags while allocating a commit. This also avoids needing to look
at commits which aren't going to be produced in the result.
Bug: 321103
Change-Id: I25acc739db2ec0221a50b72c2d2aa618a9a75f37 Reviewed-by: Mathias Kinzler <mathias.kinzler@sap.com> Reviewed-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These need to be dynamic based on the current thread's environment
at time of execution in order to be properly localized for the end
user that will be seeing these messages.
Change-Id: I4976f462cfe606edd2761c0e36b2f6b20f63d53c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Allow PackWriter callers to manage the thread pool
By permitting the caller of PackWriter to select the Executor it
uses for task execution, we give the caller the ability to manage
the lifecycle of the thread pool, including reusing it across
concurrent pack generators.
This is the first step to supporting application thread pools
within Daemon or another managed service like Gerrit Code Review.
Change-Id: I96bee7b9c30ff9885f2bd261d0b6daaac713b5a4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is caused by a recursion in PlotWalk.getTags().
As a hotfix, the sort was simply removed. The sort
must be re-implemented so that parseAny() is not called
again (currently, this happens in the PlotRefComparator).
Change-Id: I060d26fda8a75ac803acaf89cfb7d3b4317328f3 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Jeff Schumacher [Thu, 22 Jul 2010 19:55:28 +0000 (12:55 -0700)]
Break dissimilar file pairs during diff
File pairs that are very dissimilar during a diff were not being
broken apart into their constituent ADD/DELETE pairs. This leads to
sub-optimal rename detection. Take, for example, this situation:
A file exists at src/a.txt containing "foo". A user renames src/a.txt
to src/b.txt, then adds a new src/a.txt containing "bar".
Even though the old a.txt and the new b.txt are identical, the
rename detection algorithm would not detect it as a rename since
it was already paired in a MODIFY. I added code to split all
MODIFYs below a certain score into their constituent ADD/DELETE
pairs. This allows situations like the one I described above to be
more correctly handled.
Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
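The splitting step might look roughly like the sketch below. `BreakModifies`, its `Entry` type and the `breakScore` parameter are hypothetical names chosen for this example; JGit's DiffEntry carries considerably more state than is shown here.

```java
import java.util.ArrayList;
import java.util.List;

final class BreakModifies {
    enum Type { ADD, DELETE, MODIFY }

    /** Hypothetical, simplified diff entry: change type, paths, similarity score (0-100). */
    static final class Entry {
        final Type type;
        final String oldPath;
        final String newPath;
        final int score;

        Entry(Type type, String oldPath, String newPath, int score) {
            this.type = type;
            this.oldPath = oldPath;
            this.newPath = newPath;
            this.score = score;
        }
    }

    /** Replaces every MODIFY scoring below breakScore with a DELETE plus an ADD. */
    static List<Entry> breakModifies(List<Entry> in, int breakScore) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : in) {
            if (e.type == Type.MODIFY && e.score < breakScore) {
                out.add(new Entry(Type.DELETE, e.oldPath, null, 0));
                out.add(new Entry(Type.ADD, null, e.newPath, 0));
            } else {
                out.add(e);
            }
        }
        return out;
    }
}
```

In the src/a.txt example, the broken-out DELETE of the old a.txt then becomes available to be paired with the ADD of src/b.txt during rename detection.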
Add methods to the Repository class which write into MERGE_HEAD
and MERGE_MSG files. Since we have the read methods in the same
class this seems to be the right place.
Change-Id: I5dd65306ceb06e008fcc71b37ca3a649632ba462 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Make StoredConfig an abstraction above FileBasedConfig
This exposes a load and save method, allowing a Repository to denote
that it has a persistent configuration of some kind which can be
accessed by the application, without needing to know exact details
of how it's stored.
Change-Id: I7c414bc0f975b80f083084ea875eca25c75a07b2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* delta: (103 commits)
Discard the uncompressed delta as soon as its compressed
Honor pack.windowlimit to cap memory usage during packing
Honor pack.threads and perform delta search in parallel
Cache small deltas during packing
Implement delta generation during packing
debug-show-packdelta: Dump a pack delta to the console
Initial pack format delta generator
Add debugging toString() method to ObjectToPack
Make ObjectToPack clearReuseAsIs signal available to subclasses
Correctly classify the compressing objects phase
Refactor ObjectToPack's delta depth setting
Configure core.bigFileThreshold into PackWriter
Add doNotDelta flag to ObjectToPack
Add more configuration options to PackWriter
Save object path hash codes during packing
Add path hash code to ObjectWalk
Add getObjectSize to ObjectReader
Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB
Define a constant for 127 in DeltaEncoder
Cap delta copy instructions at 64k
...
Stefan Lay [Thu, 22 Jul 2010 12:57:00 +0000 (14:57 +0200)]
Allow client of Add command to set a WorkingTreeIterator
This is e.g. useful when a client of the AddCommand has
additional rules to ignore files. In Eclipse a resource can
be set to derived or be excluded by preferences.
Change-Id: I6c47e54a1ce26315faf5ed0723298ad2c2db197c Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Move ignore node handling into WorkingTreeIterator
The working tree iterator has perfect knowledge of the path structure
as well as immediate information about whether or not an ignore file
even exists at this level. We can exploit that to simplify the
logic and running time for testing ignored file status by pushing
all of the checks down into the iterator itself.
Change-Id: I22ff534853e8c5672cc5c2d9444aeb14e294070e Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>
The WorkingTreeIterator has a method to check whether
the current file differs from the corresponding index
entry. This commit improves this check to also handle
racy git situations.
See http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD
Change-Id: I3ad0897211dcbb2eac9eebcb19d095a5052fb06b Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Smudge racily clean index entries by truncating length (like git.git)
To mark an entry racily clean we set its length to 0 (like native git
does). Entries which are not racily clean and have zero length can be
distinguished from racily clean entries by checking P_OBJECTID
against the SHA1 of empty content. When length is 0 and P_OBJECTID is
different from SHA1 of empty content we know the entry is marked
racily clean.
See http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00488.html
Change-Id: I689552931441ab51964b430b303160c9126b66af Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
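The check described above can be sketched in plain Java. The class `RacyCheck` and its method names are hypothetical; the only Git-specific fact used is that a blob id is the SHA-1 of the header "blob <size>" followed by a NUL byte and the content, so empty content hashes just the header "blob 0" plus NUL.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RacyCheck {
    /** Git blob id of empty content: SHA-1 over the header "blob 0\0" with no body. */
    static String emptyBlobId() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] d = md.digest("blob 0\0".getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder();
            for (byte b : d)
                hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Length 0 with an id other than the empty blob's means the entry was smudged. */
    static boolean isSmudged(int length, String objectIdHex) {
        return length == 0 && !emptyBlobId().equals(objectIdHex);
    }
}
```

A genuinely empty file has length 0 and the empty blob's id, so it is not mistaken for a smudged entry.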
Use proper constants for .gitignore and .git directory
We have a constant for .gitignore, so use it. While we are in
the same method, correct the reference of ".git" to be the actual
GIT_DIR given. This might not be within the work tree if the
GIT_DIR and GIT_WORK_TREE environment variables were used.
Change-Id: I38e1cec13405109b9c347858b38dd9fb2f1f2560 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>
Remove gitIgnoreTimestamp from abstract iterator API
This never should have been exposed on the top of the
AbstractTreeIterator type hierarchy. There is no concept of a
timestamp in a canonical tree read from the object database, and
the time in the DirCache isn't what we want here either.
Actually all that we need is to find the files whose names are
".gitignore" and are below the root directory. We can accomplish
that with a suffix filter, and process them immediately.
Change-Id: Ib09cbf81a9e038452ce491385c65498312e2916b Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>
If we have two adds of the same object but no deletes the detector
threw an NPE because the entry that came back from the deleted map
was null (no matching objects). In this case we need to put the
adds all back onto the list of left over additions since they did
not match a delete.
We didn't correctly handle the zlib trailer for an object. If the
trailer bytes were outside of the current buffer window but we had
fully inflated the object itself, we broke out of the loop (as we had
our target size) but inflate wasn't finished (as it did not yet get
the trailer) so we failed the test and threw a corruption exception.
Use an infinite loop and only break out when the inflater is done.
Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
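A minimal sketch of the "loop until the inflater is done" approach, using java.util.zip.Inflater directly. The class `InflateUntilDone` and the `window` parameter (which mimics input arriving in limited buffer windows, and must be positive) are assumptions of this example and not JGit code.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class InflateUntilDone {
    /** Inflates z, handed over in windows of at most `window` bytes, into exactly `size` bytes. */
    static byte[] inflate(byte[] z, int window, int size) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            byte[] out = new byte[size];
            int outPos = 0;
            int inPos = 0;
            for (;;) {
                outPos += inf.inflate(out, outPos, size - outPos);
                // Reaching the target size is not enough: only finished()
                // tells us the zlib trailer has been consumed as well.
                if (inf.finished()) {
                    if (outPos != size)
                        throw new DataFormatException("wrong inflated size");
                    return out;
                }
                if (!inf.needsInput())
                    throw new DataFormatException("unexpected inflater state");
                if (inPos == z.length)
                    throw new DataFormatException("truncated stream");
                int n = Math.min(window, z.length - inPos);
                inf.setInput(z, inPos, n);
                inPos += n;
            }
        } finally {
            inf.end();
        }
    }
}
```

Calling it with a window that leaves the last few compressed bytes for a second pass exercises the case from the commit message, where the trailer lies outside the first window.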
Jonathan Gossage [Fri, 16 Jul 2010 23:53:23 +0000 (01:53 +0200)]
Fully implement Logger interface
On April 27, 2010 the Logger interface was upgraded with a number of new methods
to make it consistent with the implementations it was meant to support.
This patch makes RecordingLogger consistent with the Logger interface and also allows
the use of Jetty 7.1.5 released with Helios, which can be installed from the p2 repository
at http://download.eclipse.org/jetty/7.1.5.v20100705/repository
Change-Id: I5645436bbe7492f82d4069e4d9cbebede0bf764e Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Discard the uncompressed delta as soon as its compressed
The DeltaCache will most likely need to copy the compressed delta
into a new buffer in order to compact away the wasted space at the
end caused by over allocation. Since we don't need the uncompressed
format anymore, null out our only reference to it so the GC can
reclaim this memory if it needs to perform a collection in order
to satisfy the cache's allocation attempt.
Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* js/rename:
Implemented file path based tie breaking to exact rename detection
Added more test cases for RenameDetector
Added very small optimization to exact rename detection
Fixed Misleading Javadoc
Added file path similarity to scoring metric in rename detection
Fixed potential div by zero bug
Added file size based rename detection optimization
Create FileHeader from DiffEntry
log: Implement --follow
Cache the diff configuration section
log: Add whitespace ignore options
Format submodule links during differences
Redo DiffFormatter API to be easier to use
log, diff: Add rename detection support
Implement similarity based rename detection
Added a preliminary version of rename detection
Refactored code out of FileHeader to facilitate rename detection
A programming error using the Inflater API led to an infinite
loop within IndexPack, caused by the Inflater returning 0 from
the inflate() method, but it didn't want more input. This happens
when it has reached the end of the stream, or has reached a spot
asking for an external dictionary. Such a case is a failure for us,
and we should abort out.
Thanks to Alex for pointing out that we had 3 implementations of
the inflate routine, which should be consolidated into one and
use a switch to determine where to load data from.
Bug: 317416
Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
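The failure mode can be guarded against as in the sketch below. `StrictInflate` is a hypothetical helper that hands the whole input over at once; it is not the consolidated routine the commit introduces, only an illustration of treating a zero-byte inflate() that does not want more input as an error instead of looping.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class StrictInflate {
    /** Inflates a complete zlib stream, aborting whenever the inflater stalls. */
    static byte[] inflate(byte[] z) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            inf.setInput(z);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                } else if (inf.needsDictionary()) {
                    // Stream asks for an external dictionary we do not have.
                    throw new DataFormatException("preset dictionary required");
                } else if (inf.needsInput()) {
                    // All input was supplied up front, so the stream is cut short.
                    throw new DataFormatException("truncated stream");
                } else if (!inf.finished()) {
                    // Zero bytes, no input wanted, not finished: abort instead of spinning.
                    throw new DataFormatException("inflater made no progress");
                }
            }
            return out.toByteArray();
        } finally {
            inf.end();
        }
    }
}
```

Without the explicit branches, a stream that requests a preset dictionary would keep returning 0 from inflate() and the loop would never end.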
Jeff Schumacher [Tue, 13 Jul 2010 19:21:12 +0000 (12:21 -0700)]
Implemented file path based tie breaking to exact rename detection
During the exact rename detection phase in RenameDetector, ties were
resolved on a first-found basis. I added support for file path based
tie breaking during that phase. Basically, there are four situations
that have to be handled:
One add matching one delete:
    In this simple case, we pair them as a rename.
One add matching many deletes:
    Find the delete whose path matches the add the closest, and
    pair them as a rename.
Many adds matching one delete:
    Similar to the above case, we find the add that matches the
    delete the closest, and pair them as a rename. The other adds
    are marked as copies of the delete.
Many adds matching many deletes:
    Build a scoring matrix similar to the one used for content-
    based matching, scoring instead by file path. Some of the
    utility functions in SimilarityRenameDetector are used in
    this case, as we use the same encoding scheme. Once the
    matrix is built, scan it for the best matches, marking them
    as renames. The rest are marked as copies.
I don't particularly like the idea of using utility functions right
out of SimilarityRenameDetector, but it works for the moment. A later
commit will likely refactor this into a common utility class, as well
as bringing exact rename detection out of RenameDetector and into a
separate class, much like SimilarityRenameDetector.
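The "one add matching many deletes" case can be sketched as follows. The scoring function here (shared leading plus shared trailing characters) is an assumption made up for this illustration, as are the names `PathTieBreak`, `pathScore` and `closest`; the commit message does not say how path closeness is actually measured.

```java
import java.util.List;

final class PathTieBreak {
    /** Hypothetical path score: shared leading characters plus shared trailing characters. */
    static int pathScore(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int prefix = 0;
        while (prefix < max && a.charAt(prefix) == b.charAt(prefix))
            prefix++;
        int suffix = 0;
        // Stop at max - prefix so overlapping characters are not counted twice.
        while (suffix < max - prefix
                && a.charAt(a.length() - 1 - suffix) == b.charAt(b.length() - 1 - suffix))
            suffix++;
        return prefix + suffix;
    }

    /** Returns the candidate whose path is closest to target; ties keep the first found. */
    static String closest(String target, List<String> candidates) {
        String best = null;
        int bestScore = -1;
        for (String c : candidates) {
            int s = pathScore(target, c);
            if (s > bestScore) {
                best = c;
                bestScore = s;
            }
        }
        return best;
    }
}
```

The "many adds matching one delete" case is the mirror image: call `closest` with the deleted path as the target and the added paths as candidates, then treat the remaining adds as copies.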
Added possibility to compare the current entry of a WorkingTreeIterator
to a given DirCacheEntry. This is done to detect whether an entry
in the index is dirty or not. 'Dirty' means that the file in the working tree
is different from what's in the index. Merge algorithms will make use of
this to detect conflicts.
Change-Id: I3ff847f4bf392553dcbd6ee236c6ca32a13eedeb Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
After refactoring ReadTreeTest the tests failed for filesystems
with coarse modification time granularity. This is fixed by
explicitly telling the repo to reread the index after we build
a new index.
Additionally the test testDirectoryFileSimple was simplified
by using buildTree() instead of misusing GitIndex to construct
trees.
Change-Id: I20d2f097491e4cc8c657a696beabc7026b485017 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Charley Wang [Mon, 12 Jul 2010 22:34:15 +0000 (00:34 +0200)]
Add compatibility with gitignore specifications
This patch adds ignore compatibility to jgit. It encompasses
exclude files as well as .gitignore. Uses TreeWalk and
FileTreeIterator to find nodes and parses .gitignore
files when required. The patch includes a simple cache that
can be used to save results and avoid excessive gitignore
parsing.
CQ: 4302
Bug: 303925
Change-Id: Iebd7e5bb534accca4bf00d25bbc1f561d7cad11b Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Jeff Schumacher [Mon, 12 Jul 2010 18:48:56 +0000 (11:48 -0700)]
Added very small optimization to exact rename detection
Optimized a small loop in findExactRenames. The loop would go through
all the items in a list of DiffEntries even after it already found
what it was looking for. I made it break out of the loop as soon as
a good match was found.
Jeff Schumacher [Mon, 12 Jul 2010 17:37:10 +0000 (10:37 -0700)]
Fixed Misleading Javadoc
The javadoc for the setRenameLimit method in RenameDetector said
that you could only have limits in the range (0,100), implying
that 0 and 100 were illegal inputs. The code, however, allowed 0 and
100. I changed the javadoc to say that the range [0,100] was legal.
I also documented the IllegalArgumentException that is thrown if the
limit is outside that range.
Jeff Schumacher [Fri, 9 Jul 2010 22:11:54 +0000 (15:11 -0700)]
Added file path similarity to scoring metric in rename detection
The scoring method was not taking into account the similarity of
the file paths and file names. I changed the metric so that it is 99%
based on content (which used to be 100% of the old metric), and 1%
based on path similarity. Of that 1%, half (.5% of the total final
score) is based on the actual file names (e.g. "foo.java"), and half
on the directory (e.g. "src/com/foo/bar/").
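As a worked illustration of the weighting, here is a minimal sketch. `WeightedScore.combine` is a hypothetical helper using floating point for readability; the real implementation may well use integer arithmetic and a different scale.

```java
final class WeightedScore {
    /** Inputs are 0-100 scores; the result is also on a 0-100 scale. */
    static double combine(double content, double directory, double fileName) {
        // 99% content, 0.5% directory similarity, 0.5% file name similarity.
        return 0.99 * content + 0.005 * directory + 0.005 * fileName;
    }
}
```

For example, content similarity 80 with an identical directory (100) and an unrelated file name (0) gives 0.99 * 80 + 0.005 * 100 + 0 = 79.7, so the path can only nudge the result and mainly serves to separate candidates with equal content scores.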
Jeff Schumacher [Fri, 9 Jul 2010 19:53:57 +0000 (12:53 -0700)]
Fixed potential div by zero bug
The scoring logic in SimilarityIndex was dividing by the max file
size. If both files are empty, this would cause a div by zero
error. This case cannot currently happen, since two empty files
would have the same SHA1, and would therefore be caught in the
earlier SHA1 based detection pass. Still, if this logic eventually
gets separated from that pass, a div by zero error would occur.
I changed the logic to instead consider two empty files to have a
similarity score of 100.
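The guard can be sketched as below. `SafeScore.score` is a hypothetical simplification that takes the number of common bytes as an input and only shows the division and its zero check.

```java
final class SafeScore {
    /** Similarity on a 0-100 scale: common bytes relative to the larger file. */
    static int score(long common, long srcSize, long dstSize) {
        long max = Math.max(srcSize, dstSize);
        if (max == 0)
            return 100; // two empty files are treated as identical
        return (int) (common * 100 / max);
    }
}
```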
Jeff Schumacher [Fri, 9 Jul 2010 18:18:50 +0000 (11:18 -0700)]
Added file size based rename detection optimization
Prior to this change, files that were very different in size (enough
so that they could not have enough in common to be detected as
renames) were still having their scores calculated. I added an
optimization to skip such files. For example, if the rename detection
threshold is 60%, the larger file is 200kb, and the smaller file is
50kb, the pair cannot be counted as a rename since they cannot
possibly share 60% of their content. (200*.6=120, 120>50)
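The size pre-check can be written out as a short sketch. `SizeFilter.canReachThreshold` is a hypothetical name; the test assumes, as the example above does, that the score is measured against the larger file, so the smaller file bounds how much the two can share.

```java
final class SizeFilter {
    /** True if the smaller file is large enough for the pair to reach the threshold at all. */
    static boolean canReachThreshold(long sizeA, long sizeB, int thresholdPercent) {
        long min = Math.min(sizeA, sizeB);
        long max = Math.max(sizeA, sizeB);
        // Equivalent to min / max >= threshold / 100 without the division.
        return min * 100 >= max * thresholdPercent;
    }
}
```

With sizes 200 and 50 and a threshold of 60, the check compares 5000 against 12000 and rejects the pair before any content is scored.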
Honor pack.windowlimit to cap memory usage during packing
The pack.windowlimit configuration parameter places an upper bound
on the number of bytes used by the DeltaWindow class as it scans
through the object list. If memory usage would exceed the limit
the window is temporarily decreased in size to keep memory used
within that bound.
Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>