mirrors/jgit - jgit - source @ dussan.org

Commit graph

Author	SHA1	Message	Date
Adithya Chakilam	0bd2f4bf77	Introduce getMergedInto(RevCommit commit, Collection<Ref> refs) In cases where we need to determine if a given commit is merged into many refs, using isMergedInto(base, tip) for each ref would cause multiple unwanted walks. getMergedInto() marks the unreachable commits as uninteresting which would then avoid walking that same path again. Using the same api, also introduce isMergedIntoAny() and isMergedIntoAll() Change-Id: I65de9873dce67af9c415d1d236bf52d31b67e8fe Signed-off-by: Adithya Chakilam <quic_achakila@quicinc.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	3 years ago
Terry Parker	dbd05433ec	Move reachability checker generation into the ObjectReader object Reachability checkers are retrieved from RevWalk and ObjectWalk objects: * RevWalk.createReachabilityChecker() * ObjectWalk.createObjectReachabilityChecker() Since RevWalks and ObjectWalks are themselves directly instantiated in hundreds of places (e.g. UploadPack...) overriding them in a consistent way requires overloading 100s of methods, which isn't feasible. Moving reachability checker generation to a more central place solves that problem. The ObjectReader object seems a good place from which to get reachability checkers, because reachability checkers return information about relationships between objects. ObjectDatabases delegate many operations to ObjectReaders, and reachability bitmaps are attached to ObjectReaders. The Bitmapped and Pedestrian reachability checker objects were package private in the org.eclipse.jgit.revwalk package. This change makes them public and moves them to the org.eclipse.jgit.internal.revwalk package. Corresponding tests are also moved. Motivation: 1) Reachability checking algorithms need to scale. One of the internal Android repositories has ~2.4 million refs/changes/* references, causing bad long tail performance in reachability checks. 2) Reachability check performance is impacted by repository topography: number of refs, number of objects, amounts of related vs. unrelated history. 3) Reachability check performance is also affected by per-branch access (Gerrit branch permissions) since different users can see different branches. 4) Reachability check performance isn't affected by any state in a RevWalk or ObjectWalk. I don't yet know if a single algorithm will work for all cases in #2 and #3. We may need to evolve the ReachabilityChecker interfaces over time to solve the Gerrit branch permissions case, or use Gerrit-specific identity information to solve that in an efficient way. 
This change takes the existing public API and moves it to the ObjectReader/whole repository level, which is where we can do consistent customizations for #2 and #3. We intend to upstream the best of whatever works, but anticipate the need for multiple rounds of experimentation. Change-Id: I9185feff43551fb387957c436112d5250486833d Signed-off-by: Terry Parker <tparker@google.com>	3 years ago
Alex Spradlin	e498d43186	RevWalk: new topo sort to not mix lines of history The topological sort algorithm in TopoSortGenerator for RevWalk may mix multiple lines of history, producing results that differ from C git's git-log whose man page states: "Show no parents before all of its children are shown, and avoid showing commits on multiple lines of history intermixed." Lines of history are mixed because TopoSortGenerator merely delays producing a commit until all of its children have been produced; it does not immediately produce a commit after its last child has been produced. Therefore, add a new RevSort option called TOPO_KEEP_BRANCH_TOGETHER with a new topo sort algorithm in TopoNonIntermixGenerator. In the Generator, when the last child of a commit has been produced, unpop that commit so that it will be returned upon the subsequent call to next(). To avoid producing duplicates, mark commits that have not yet been produced as TOPO_QUEUED so that when a commit is popped, it is produced if and only if TOPO_QUEUED is set. To support nesting with other generators that may produce the same commit multiple times like DepthGenerator (for example, StartGenerator does this), do not increment parent inDegree for the same child commit more than once. Commit `b5e764abd2` modified the existing TopoSortGenerator to avoid mixing lines of history, but it was reverted in `e40c38ab08` because the new behavior caused problems for EGit users. This motivated adding a new Generator for the new behavior. Signed-off-by: Alex Spradlin <alexaspradlin@google.com> Change-Id: Icbb24eac98c00e45c175b01e1c8122554f617933	4 years ago
Alex Spradlin	e40c38ab08	Revert "RevWalk: stop mixing lines of history in topo sort" This reverts commit `b5e764abd2`. PlotWalk uses the TopoSortGenerator, which is causing problems for EGit users who rely on the emission of commits being somewhat based on date as in the previous topo-sort algorithm. Bug: 560529 Change-Id: I3dbd3598a7aeb960de3fc39352699b4f11a8c226 Signed-off-by: Alex Spradlin <alexaspradlin@google.com>	4 years ago
Alex Spradlin	b5e764abd2	RevWalk: stop mixing lines of history in topo sort The topological sort algorithm in TopoSortGenerator for RevWalk may mix multiple lines of history, producing results that differ from C git's git log whose man page states: "Show no parents before all of its children are shown, and avoid showing commits on multiple lines of history intermixed." Lines of history are mixed because TopoSortGenerator merely delays a commit until all of its children have been produced; it does not immediately produce a commit after its last child has been produced. Therefore, when the last child of a commit has been produced, unpop the commit so that it will be returned upon the subsequent call to next() in TopoSortGenerator. To avoid producing duplicates, mark commits that have not yet been produced as TOPO_QUEUED so that when a commit is popped, it is produced if and only if TOPO_QUEUED is set. To support nesting with other generators that may produce the same commit multiple times like DepthGenerator (for example, StartGenerator does this), do not increment parent inDegree for the same child commit more than once. Modify tests that assert that TopoSortGenerator mixes lines of commit history. Change-Id: I4ee03c7a8e5265d61230b2a01ae3858745b2432b Signed-off-by: Alex Spradlin <alexaspradlin@google.com>	4 years ago
Matthias Sohn	5c5f7c6b14	Update EDL 1.0 license headers to new short SPDX compliant format This is the format given by the Eclipse legal doc generator [1]. [1] https://www.eclipse.org/projects/tools/documentation.php?id=technology.jgit Bug: 548298 Change-Id: I8d8cabc998ba1b083e3f0906a8d558d391ffb6c4 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	4 years ago
Matthias Sohn	70258a9cb2	[error prone] fix ReferenceEquality warning in RevWalk#isMergedInto Change-Id: Ibef75e2bc76e90f6e29c4cb3ba1c1f6e67009b10 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	4 years ago
Thomas Wolf	758124fa9c	Correct @since in RevWalk for the --first-parent methods Fixes PDE API checks complaining: the methods were added in JGit 5.5.0. Change-Id: I9ff860c3408c6bb3891fa0da7547394d0fe9d0b6 Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>	4 years ago
Dave Borowitz	4973f05252	RevWalk: Add a setFirstParent that mimics C git's --first-parent RevWalk does not currently provide a --first-parent equivalent and the feature has been requested. Add a field to the RevWalk class to specify whether walks should traverse first parents only. Modify Generator implementations to support the feature. Change-Id: I4a9a0d5767f82141dcf6d08659d7cb77c585fae4 Signed-off-by: Dave Borowitz <dborowitz@google.com> Signed-off-by: Alex Spradlin <alexaspradlin@google.com>	5 years ago
Ivan Frade	5dce8614ab	RevWalk: new method createReachabilityChecker() Every caller would need to check if bitmaps are available in the repo to instantiate a reachability checker. Offer a method to create the reachability checker in the walk: the caller already has a walk over the repo, and the walk has all the information required. This allows us to make the implementation classes package-private. Change-Id: I355e47486fcd9d55baa7cb5700ec08dcc230eea5 Signed-off-by: Ivan Frade <ifrade@google.com>	5 years ago
Carsten Hammer	6a4c77e619	Use isEmpty() instead of size()==0 where possible Change-Id: I97f1367a2ea9f1f6146e264c27c3981b842f2a26 Signed-off-by: Carsten Hammer <carsten.hammer@t-online.de> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	5 years ago
Mincong HUANG	d09388e156	Avoid double words Change-Id: I0fdf595cba93f5a5cdd0496cee07ac91db304532 Signed-off-by: Mincong Huang <mincong.h@gmail.com>	5 years ago
Jonathan Nieder	a0cd400c37	Simplify RevWalk#iterator by factoring out common code Factor out a helper that calls next() and tunnels IOException in a RuntimeException, similar to TunnelException.tunnel(RevWalk::next) in Guava terms[1]. This should make the code a little more readable. No functional change intended. [1] https://github.com/google/guava/issues/2828#issuecomment-304187823 Change-Id: I97c062d03a17663d5c40895fd3d2c6a7306d4f39 Signed-off-by: Jonathan Nieder <jrn@google.com>	5 years ago
Jonathan Nieder	aeba003200	Simplify exception handling in RevWalk#iterator MissingObjectException and IncorrectObjectTypeException are subclasses of IOException. Change-Id: Ib4e1f37ce1b0b08e69ba3375bbdb6ee82ee4f036 Signed-off-by: Jonathan Nieder <jrn@google.com>	5 years ago
Terry Parker	115a740e2f	Correctly handle initialization of shallow commits In a new RevWalk, if the first object parsed is one of the shallow commits, the following happens: 1) RevCommit.parseCanonical() is called on a new "r1" RevCommit. 2) RevCommit.parseCanonical() immediately calls RevWalk.initializeShallowCommits(). 3) RevWalk.initializeShallowCommits() calls lookupCommit(id), creating and adding a new "r2" version of this same object and marking its parents empty. 4) RevCommit.parseCanonical() initializes the "r1" RevCommit's fields, including the parents. 5) RevCommit.parseCanonical()'s caller uses the "r1" commit that has parents, losing the fact that it is a shallow commit. This change passes the current RevCommit as an argument to RevWalk.initializeShallowCommits() so that method can set its parents empty rather than creating the duplicate "r2" commit. Change-Id: I67b79aa2927dd71ac7b0d8f8917f423dcaf08c8a Signed-off-by: Terry Parker <tparker@google.com>	6 years ago
Han-Wen Nienhuys	f3ec7cf3f0	Remove further unnecessary 'final' keywords Remove it from * package private functions. * try blocks * for loops this was done with the following python script: $ cat f.py import sys import re import os def replaceFinal(m): return m.group(1) + "(" + m.group(2).replace('final ', '') + ")" methodDecl = re.compile(r"^([\t ]*[a-zA-Z_ ]+)\(([^)]*)\)$") def subst(fn): input = open(fn) os.rename(fn, fn + "~") dest = open(fn, 'w') for l in input: l = methodDecl.sub(replaceFinal, l) dest.write(l) dest.close() for root, dirs, files in os.walk(".", topdown=False): for f in files: if not f.endswith('.java'): continue full = os.path.join(root, f) print full subst(full) Change-Id: If533a75a417594fc893e7c669d2c1f0f6caeb7ca Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>	6 years ago
Han-Wen Nienhuys	6d370d837c	Remove 'final' in parameter lists Change-Id: Id924f79c8b2c720297ebc49bf9c5d4ddd6d52547 Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>	6 years ago
David Pursehouse	94cf82dbc5	RevWalk: Annotate methods documented to return "Never null" as @NonNull Change-Id: If1a1bed4b04dd48c9573fd3c4eacbf73de40622f Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	6 years ago
Matthias Sohn	0cba440277	Fix javadoc in org.eclipse.jgit revwalk package Change-Id: I3fabab8afa284b1919ab7bc656cab19e56ed474e Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	6 years ago
David Pursehouse	3b4448637f	Enable and fix warnings about redundant specification of type arguments Since the introduction of generic type parameter inference in Java 7, it's not necessary to explicitly specify the type of generic parameters. Enable the warning in Eclipse, and fix all occurrences. Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0 Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	7 years ago
David Pursehouse	7ac182f4e4	Enable and fix 'Should be tagged with @Override' warning Set missingOverrideAnnotation=warning in Eclipse compiler preferences which enables the warning: The method <method> of type <type> should be tagged with @Override since it actually overrides a superclass method Justification for this warning is described in: http://stackoverflow.com/a/94411/381622 Enabling this causes in excess of 1000 warnings across the entire code-base. They are very easy to fix automatically with Eclipse's "Quick Fix" tool. Fix all of them except 2 which cause compilation failure when the project is built with mvn; add TODO comments on those for further investigation. Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>	7 years ago
Terry Parker	2a7897bb4c	RevWalk: Make fields available to DepthWalk DepthWalk needs to override toObjectWalkWithSameObjects() and thus needs to be able to directly set the objects and freeFlags fields, so make them package private. Change-Id: I24561b82c54ba3d6522582ca25105b204d777074 Signed-off-by: Terry Parker <tparker@google.com>	7 years ago
Andrey Loskutov	9bdbbd32ae	Don't call reader.close() 2 times on dispose() Bug: 479406 Change-Id: I6645a8f36ea349a5f04fd14d2c1ef2ecac2bcc37 Signed-off-by: Andrey Loskutov <loskutov@gmx.de>	8 years ago
Shawn Pearce	b46c446395	UploadPack: Verify clients send only commits for shallow lines If a client mistakenly tries to send a tag object as a shallow line JGit blindly assumes this is a commit and tries to parse the tag buffer using the commit parser. This can cause an obtuse error like: InvalidObjectIdException: Invalid id: t c0ff331234... The "t" comes from the "object c0ff331234..." line of the tag trying to be parsed as though it were the "tree" line of a commit. Run any client supplied shallow lines through the RevWalk to lookup the object types. Fail fast with a protocol exception if any of them are non-commit. Skip objects not known to this repository. This matches behavior with git-core's upload-pack, which silently skips over any shallow line object named by the client but not known by the server. Change-Id: Ic6c57a90a42813164ce65c2244705fc42e84d700	8 years ago
Matthias Sohn	686124bec3	Replace deprecated release() methods by close() See the discussion [1] in the Gerrit mailing list. [1] https://groups.google.com/forum/#!topic/repo-discuss/RRQT_xCqz4o Change-Id: I2c67384309c5c2e8511a7d0d4e088b4e95f819ff Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	9 years ago
Shawn Pearce	53e39094bf	ObjectWalk: make setRetainBody(false) the default Despite being the primary author of RevWalk and ObjectWalk I still fail to remember to setRetainBody(false) in application code using an ObjectWalk to examine the graph. Document the default for RevWalk is setRetainBody(true), where the application usually wants the commit bodies to display or inspect. Change the default for ObjectWalk to setRetainBody(false), as nearly all callers want only the graph shape and do not need the larger text inside a commit body. This allows some code in JGit to be simplified. Change-Id: I367e42209e805bd5e1f41b4072aeb2fa98ec9d99	9 years ago
Dave Borowitz	1e694f3847	RevWalk: Do not close reader passed explicitly to constructor The RevWalk(ObjectReader) constructor is explicitly to handle the case where the caller is responsible for opening and closing the reader. The reader should only be closed when it was created in the RevWalk(Repository) constructor. Change-Id: Ic0d595dc8d10de79e87549546c6c5ea2dc617e9b	9 years ago
Dave Borowitz	a91f87d9e1	RevWalk: Stop using deprecated ObjectReader#release() Change-Id: If4d34f18352bd17467aeded6fd3478f29244657b	9 years ago
Matthias Sohn	77030a5e94	Implement AutoCloseable interface on classes that used release() Implement AutoCloseable and deprecate the old release() method to give JGit consumers some time to adapt. Bug: 428039 Change-Id: Id664a91dc5a8cf2ac401e7d87ce2e3b89e221458 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	9 years ago
Shawn Pearce	048dbf4173	Add retainOnReset(RevFlag) to RevWalk to simplify reset usage Applications sometimes use a RevFlag instead of a Set<RevObject> to track boolean state bits about objects being processed. However this requires careful use of the resetRetain() methods to avoid an accidental clearing of the RevFlag bits, effectively clearing the Set<RevObject> the application wanted to track. Simplify that use case by offering retainOnReset, a collection of flags that are never cleared by the RevWalk. Change-Id: I4c05b89b1398e4a4f371eac3a5d1d5edddec838f	9 years ago
Dave Borowitz	dbf922ce91	RevWalk: Allow disabling parent rewriting Previously, setting any TreeFilter on a RevWalk triggered parent rewriting, which in the current StartGenerator implementation ends up buffering the entire commit history in memory. Aside from causing poor performance on large histories, this does not match the default behavior of `git rev-list`, which does not rewrite parent SHAs unless asked to via --parents/--children. Add a new method setRewriteParents() to RevWalk to disable this behavior. Continue rewriting parents by default to maintain backwards compatibility. Change-Id: I1f38e05526071c75ca58095e312663de5e6f334d	10 years ago
Gustaf Lundh	7d5e1f8497	Fixed RevWalk.isMergedInto() returning wrong results Under certain circumstances isMergedInto() returned false even though base is reachable from the tip. This will hinder pushes and receives by falsely detecting "non fast forward" merges.

      o---o---o---o---o
     /                 \
    /   o---o---A---o---M
   /   /
---2---1-

if M (tip) was compared to 1 (base), the method isMergedInto() could still return false, since two mergeBases will be detected and the return statement will only look at one of them: return next() == base; In most cases this would pass, but if "A" is a commit with an old timestamp, the Generator would walk down to "2" before completing the walk past "A" and first finding the other merge base "1". In this case, the first call to next() returns 2, which compared to base evaluates as false. This is fixed by iterating merge bases and returning true if base is found among them. Change-Id: If2ee1f4270f5ea4bee73ecb0e9c933f8234818da Signed-off-by: Gustaf Lundh <gustaf.lundh@sonymobile.com> Signed-off-by: Sven Selberg <sven.selberg@sonymobile.com>	10 years ago
Christian Halstrick	8352d1729c	Add a missing since tag Otherwise you get errors if you want to edit JGit in Eclipse Change-Id: I840d4388f159e2db27845a17030b511fc5708f43	10 years ago
Shawn Pearce	b0174a089c	Fix serving fetch of existing shallow client In certain cases a JGit server updating an existing shallow client selected a common ancestor that was behind the shallow edge of the client. This allowed the server to assume the client had some objects it did not have and allowed creation of pack deltas the client could never inflate. Any commit the client has advertised as shallow must be treated by UploadPack server as though it has no parents. With no parents the walker cannot visit graph history the client does not have, and PackWriter cannot consider delta base candidates the client is lacking. Change-Id: I4922b9354df9f490966a586fb693762e897345a2	10 years ago
Dave Borowitz	b0326235e1	Remove unused repository field from RevWalk The comment about legacy Tag and Object types no longer applies, though prior to Idb273d5a92849b42935ac14eed73b796b80aad50 the field was still being used by RewriteTreeFilter. Change-Id: I9ee5da8f8a3b61c9cf543817c03117ee0609dd8f	11 years ago
Marc Strapetz	67edd3eda7	RevWalk support for shallow clones StartGenerator now processes .git/shallow to have the RevWalk stop for shallow commits. See RevWalkShallowTest for tests. Bug: 394543 CQ: 6908 Change-Id: Ia5af1dab3fe9c7888f44eeecab1e1bcf2e8e48fe Signed-off-by: Chris Aniszczyk <zx@twitter.com>	11 years ago
Robin Stocker	c8ed4a5006	RevWalk: Add link to parseHeaders/parseBody in Javadoc of lookupCommit Change-Id: I7765d1a69d19968ebad603025a9c686f17633ebd	11 years ago
Kevin Sawicki	17fb542e9e	Remove 86 boxing warnings Use Integer, Character, and Long valueOf methods when passing parameters to MessageFormat and other places that expect objects instead of primitives Change-Id: I5942fbdbca6a378136c00d951ce61167f2366ca4	12 years ago
Robin Rosenberg	95d311f888	Move JGitText to an internal package Change-Id: I763590a45d75f00a09097ab6f89581a3bbd3c797	12 years ago
Kevin Sawicki	86e96b41e2	Correct typo in RevWalk.parseBody comment Change-Id: I0e65a5a6809a8d32d256322dbcae94b6aa603e5e Signed-off-by: Kevin Sawicki <kevin@github.com>	12 years ago
Shawn O. Pearce	53db854185	Speed up ObjectWalk by 6235 objects/sec The "Counting objects" phase of packing is the most time consuming part for any server providing access to Git repositories. Scanning through the entire project history, including every revision of every tree that has ever existed is expensive and takes an incredible amount of CPU time. Inline the tree parsing logic, unroll a number of loops, and setup to better handle the common case of seeing another occurrence of an object that was already marked SEEN. This change boosts the "Counting objects" phase when JGit is acting as a server and is packing the linux-2.6 repository for its client. Compared to CGit on the same hardware, a JGit daemon server is now 21883 objects/sec faster: CGit: Counted 2058062 objects in 38981 ms at 52796.54 objects/sec Counted 2058062 objects in 38920 ms at 52879.29 objects/sec Counted 2058062 objects in 39059 ms at 52691.11 objects/sec JGit (before): Counted 2058062 objects in 31529 ms at 65275.21 objects/sec Counted 2058062 objects in 30359 ms at 67790.84 objects/sec Counted 2058062 objects in 30033 ms at 68526.69 objects/sec JGit (this commit): Counted 2058062 objects in 28726 ms at 71644.57 objects/sec Counted 2058062 objects in 27652 ms at 74427.24 objects/sec Counted 2058062 objects in 27528 ms at 74762.50 objects/sec Above the first run was a "cold server". For JGit the JVM had just started up with `jgit daemon`, and for CGit we hadn't touched the repository "recently" (but it was certainly in kernel buffer cache). The second and third runs were against the running JGit JVM, allowing timing tests to better reflect the benefits of JGit's pack and index caching, as well as any optimizations the JIT may have performed. The timings are fair. CGit is opening, checking and mmap'ing both the pack and index during the timer. JGit is opening, checking and malloc+read'ing the pack and index data into its Java heap during the timer. 
Both processes are walking the same graph space, and are computing the "path hash" necessary to sort objects in the object table for delta compression. Since this commit only impacts the "Counting objects" phase, delta compression was obviously not included in the timings and JGit may still be performing delta compression slower than CGit, resulting in an overall slower server experience for clients. Change-Id: Ieb184bfaed8475d6960a494b1f3c870e0382164a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	12 years ago
Shawn O. Pearce	bd970007be	ObjectIdOwnerMap: More lightweight map for ObjectIds OwnerMap is about 200 ms faster than SubclassMap, more friendly to the GC, and uses less storage: testing the "Counting objects" part of PackWriter on 1886362 objects: ObjectIdSubclassMap: load factor 50% table: 4194304 (wasted 2307942) ms spent 36998 36009 34795 34703 34941 35070 34284 34511 34638 34256 ms avg 34800 (last 9 runs) ObjectIdOwnerMap: load factor 100% table: 2097152 (wasted 210790) directory: 1024 ms spent 36842 35112 34922 34703 34580 34782 34165 34662 34314 34140 ms avg 34597 (last 9 runs) The major difference with OwnerMap is entries must extend from ObjectIdOwnerMap.Entry, where the OwnerMap has injected its own private "next" field into each object. This allows the OwnerMap to use a singly linked list for chaining collisions within a bucket. By putting collisions in a linked list, we gain the entire table back for the SHA-1 bits to index their own "private" slot. Unfortunately this means that each object can appear in at most ONE OwnerMap, as there is only one "next" field within the object instance to thread into the map. For types that are very object map heavy like RevWalk (entity RevObject) and PackWriter (entity ObjectToPack) this is sufficient, these entity types are only put into one map by their container. By introducing a new map type, we don't break existing applications that might be trying to use ObjectIdSubclassMap to track RevCommits they obtained from a RevWalk. The OwnerMap uses less memory. Each object uses 1 reference more (so we're up 1,886,362 references), but the table is 1/2 the size (2^20 rather than 2^21). The table itself wastes only 210,790 slots, rather than 2,307,942. So OwnerMap is wasting 200k fewer references. OwnerMap is more friendly to the GC, because it hardly ever generates garbage. As the map reaches its 100% load factor target, it doubles in size by allocating additional segment arrays of 2048 entries. 
(So the first grow allocates 1 segment, second 2 segments, third 4 segments, etc.) These segments are hooked into the pre-allocated directory of 1024 spaces. This permits the map to grow to 2 million objects before the directory itself has to grow. By using segments of 2048 entries, we are asking the GC to acquire 8,204 bytes in a 32 bit JVM. This is easier to satisfy than 2,307,942 bytes (for the 512k table that is just an intermediate step in the SubclassMap). By reusing the previously allocated segments (they are re-hashed in-place) we don't release any memory during a table grow. When the directory grows, it does so by discarding the old one and using one that is 4x larger (so the directory goes to 4096 entries on its first grow). A directory of size 4096 can handle up to 8 million objects. The second directory grow (16384) goes to 33 million objects. At that point we're starting to really push the limits of the JVM heap, but at least it's many small arrays. Previously SubclassMap would need a table of 67108864 entries to handle that object count, which needs a single contiguous allocation of 256 MiB. That's hard to come by in a 32 bit JVM. Instead OwnerMap uses 8192 arrays of about 8 KiB each. This is much easier to fit into a fragmented heap. Change-Id: Ia4acf5cfbf7e9b71bc7faa0db9060f6a969c0c50 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	bc1af8459e	RevWalk: Don't reset ObjectReader when stopping Applications like UploadPack reset() and reuse the same RevWalk multiple times in very rapid succession. Releasing the ObjectReader's internal state on each use, only to allocate it again on the next cycle kills performance if the ObjectReader has internal caches, or even if the Inflater gets returned and pulled from the InflaterCache too frequently. Make releasing the ObjectReader the application's responsibility when it is done with the RevWalk, which most already do by wrapping their loop in a try/finally block. Change-Id: I3ad188a719e8d7f6bf27d1a7ca16d465534713f4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	5664fb3bfb	UploadPack: Donate parsed commits to PackWriter When UploadPack has computed the merge base between the client's have set and the want set, it has already loaded and parsed all of the interesting commits that PackWriter needs to transmit to the client. Switching the RevWalk and its object pool over to be an ObjectWalk saves PackWriter from needing to re-parse these same commits from the ObjectDatabase, reducing the startup latency for the enumeration phase of packing. UploadPack doesn't want to use an ObjectWalk for the okToGiveUp() tests because it's slower: during each commit popped it needs to cache the tree into the pendingObjects list, and during each reset() it discards a bunch of ObjectWalk specific state and reallocates some internal collections. ObjectWalk was never meant to be rapidly reset() like UploadPack does, so it's perhaps somewhat cleaner to allow "upgrading" a RevWalk to an ObjectWalk. Bug: 301639 Change-Id: I97ef52a0b79d78229c272880aedb7f74d0f7532f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	b505e2a558	Use 5 MiB for RevWalk default limit Instead of getting the limit from CoreConfig, use the larger of the reader's limit or 5 MiB, under the assumption that any annotated tag or commit of interest should be under 5 MiB. But if a repository was really insane and had bigger objects, the reader implementation can set its streaming limit higher in order to allow RevWalk to still process it. Change-Id: If2c15235daa3e2d1f7167e781aa83fedb5af9a30 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	127a5f95e1	Use limited getCachedBytes in RevWalk Parsing is rewritten to use the size limited form of getCachedBytes, thus freeing the revwalk infrastructure from needing to care about a large object vs. a small object when it gets an ObjectLoader. Right now we hardcode our upper bound for a commit or annotated tag to be 15 MiB. I don't know of any that is more than 1 MiB in the wild, so going 15x that should give us some reasonable headroom. Change-Id: If296c211d8b257d76e44908504e71dd9ba70ffa8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	13 years ago
Shawn O. Pearce	f048af3fd1	Implement async/batch lookup of object data An ObjectReader implementation may be very slow for a single object, but yet support bulk queries efficiently by batching multiple small requests into a single larger request. This easily happens when the reader is built on top of a database that is stored on another host, as the network round-trip time starts to dominate the operation cost. RevWalk, ObjectWalk, UploadPack and PackWriter are the first major users of this new bulk interface, with the goal being to support an efficient way to pack a repository for a fetch/clone client when the source repository is stored in a high-latency storage system. Processing the want/have lists is now done in bulk, to remove the high costs associated with common ancestor negotiation. PackWriter already performs object reuse selection in bulk, but it now can also do the object size lookup and object counting phases with higher efficiency. Actual object reuse, deltification, and final output are still doing sequential lookups, making them a bit more expensive to perform. Change-Id: I4c966f84917482598012074c370b9831451404ee Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	11a5bef8b1	Offer ObjectReaders advice about a RevWalk By giving the reader information about the roots of a revision traversal, some readers may be able to prefetch information from their backing store using background threads in order to reduce data access latency. However this isn't typically necessary so the default reader implementation doesn't react to the advice. Change-Id: I72c6cbd05cff7d8506826015f50d9f57d5cda77e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	384a19eee0	Deprecate all of the older Tree related code We want to get rid of these APIs, because they don't perform as well as DirCache/TreeWalk, or don't offer nearly as many features. Bug: 319145 Change-Id: I2b28f9cddc36482e1ad42d53e86e9d6461ba3bfc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	412ca65bd5	Avoid unbounded getCachedBytes during parseAny Since we don't know the type of object we are parsing, we don't know if it's a massive blob, or some small commit or annotated tag. Avoid pulling the cached bytes until we have checked the type and decided if we actually need them to continue parsing right now. This way large blobs which won't fit in memory and would throw a LargeObjectException don't abort parsing. Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
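
The getMergedInto() entry (0bd2f4bf77) describes checking one commit against many refs without repeating walks: commits already proven unable to reach the base are marked so later walks skip them. A minimal sketch of that idea, using only the Java standard library, might look like the following. It is not JGit's RevWalk code: strings stand in for commits, and the names `MergedIntoSketch`, `mergedInto` and `deadEnds` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; strings stand in for commits and ref tips.
class MergedIntoSketch {
    /**
     * Returns the tips from which base is reachable by following parents.
     * parents maps a commit id to its parent ids (a toy graph, an assumption).
     */
    static List<String> mergedInto(Map<String, List<String>> parents,
            String base, List<String> tips) {
        // Commits proven unable to reach base; shared across all tips.
        Set<String> deadEnds = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String tip : tips) {
            Set<String> visited = new HashSet<>();
            Deque<String> todo = new ArrayDeque<>();
            todo.push(tip);
            boolean found = false;
            while (!todo.isEmpty()) {
                String c = todo.pop();
                if (c.equals(base)) {
                    found = true;
                    break;
                }
                // Skip commits ruled out by an earlier walk or seen in this one.
                if (deadEnds.contains(c) || !visited.add(c)) {
                    continue;
                }
                for (String p : parents.getOrDefault(c, List.of())) {
                    todo.push(p);
                }
            }
            if (found) {
                result.add(tip);
            } else {
                // The walk ran to completion without meeting base, so nothing
                // it visited can reach base: never walk these commits again.
                deadEnds.addAll(visited);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("B", List.of("A"));
        parents.put("C", List.of("B"));
        parents.put("D", List.of("A"));
        parents.put("E", List.of("D"));
        System.out.println(mergedInto(parents, "B", List.of("E", "D", "C")));
    }
}
```

In the toy graph, the failed walk from `E` rules out `E`, `D` and `A`, so the check for `D` stops at its first commit instead of walking the same path again.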

60 Commits (0bd2f4bf77c856213e09d656a948e71f71cfd038)
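
The TOPO_KEEP_BRANCH_TOGETHER entry (e498d43186) describes its algorithm in words: hold back a commit until all of its children have been produced, push it to the front of the queue as soon as its last child is produced, and use a queued marker to avoid duplicates. The sketch below follows that description on a toy graph using only the Java standard library. It is an illustration under stated assumptions, not the TopoNonIntermixGenerator source: `TopoKeepTogetherSketch` and `sort` are hypothetical names, strings stand in for commits, and it assumes each parent is listed at most once per child.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; strings stand in for commits.
class TopoKeepTogetherSketch {
    /**
     * input:   commits in the order an upstream generator would hand them
     *          over (for example newest first by commit time).
     * parents: commit id -> parent ids.
     */
    static List<String> sort(List<String> input, Map<String, List<String>> parents) {
        // inDegree = number of not-yet-produced children of each commit.
        Map<String, Integer> inDegree = new HashMap<>();
        for (String c : input) {
            for (String p : parents.getOrDefault(c, List.of())) {
                inDegree.merge(p, 1, Integer::sum);
            }
        }
        Set<String> queued = new HashSet<>(input);   // plays the role of TOPO_QUEUED
        Deque<String> pending = new ArrayDeque<>(input);
        List<String> out = new ArrayList<>();
        while (!pending.isEmpty()) {
            String c = pending.pollFirst();
            // Skip commits already produced, or still waiting on a child.
            if (!queued.contains(c) || inDegree.getOrDefault(c, 0) > 0) {
                continue;
            }
            queued.remove(c);
            out.add(c);
            for (String p : parents.getOrDefault(c, List.of())) {
                // Last child produced: "unpop" the parent so it comes next.
                if (inDegree.merge(p, -1, Integer::sum) == 0 && queued.contains(p)) {
                    pending.addFirst(p);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("M", List.of("A2", "B2"));
        parents.put("A2", List.of("A1"));
        parents.put("B2", List.of("B1"));
        parents.put("A1", List.of("R"));
        parents.put("B1", List.of("R"));
        System.out.println(sort(List.of("M", "A2", "B2", "A1", "B1", "R"), parents));
    }
}
```

With the interleaved input `M, A2, B2, A1, B1, R`, this sketch emits `M, B2, B1, A2, A1, R`: each branch stays contiguous. A sort that only delays commits until their children are produced would emit the input order unchanged here, which interleaves the two branches.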