summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorDariusz Luksza <dariusz.luksza@gmail.com>2023-11-03 11:52:03 +0000
committerMatthias Sohn <matthias.sohn@sap.com>2023-11-07 01:29:39 +0100
commit3937300f3eb4dd557ec2d195f21793f737d6cb4e (patch)
tree9fe2d9492e028061cab1b1c38c109a51ed3dc4c7
parent1c320d0d41a944fe7a28158103d0318a01785e46 (diff)
downloadjgit-3937300f3eb4dd557ec2d195f21793f737d6cb4e.tar.gz
jgit-3937300f3eb4dd557ec2d195f21793f737d6cb4e.zip
Optimise Git protocol v2 `ref-prefix` scanning
Currenty JGit will go over all refs in the repository for each `ref-prefix`. This means that refs will be read multiple times, which leads to subpar performance. Native git, uses a different approach, where all refs are read once and then for each ref, all `ref-prefix` filter values are checked in one pass. This change implements this approach in JGit. And makes `ref-prefix` filtering ~28% faster for a repository with fully packed refs and ~5% when RefTable is used instead of refdir. Different implementations were tested on a synthetic file repository with 300k refs. Different implementations were tested for unpacked and fully packed refs (results are in seconds). Unpacked refs: Current Impl: 54.838 57.234 56.138 Nested for loops: 36.094 37.025 36.502 Nested stream's: 36.154 35.989 37.262 Parallel stream + stream: 36.923 37.272 35.362 Nested parallel stream's: 35.512 38.395 36.745 Stream + for loop: 34.950 36.164 37.191 Parallel stream + for loop: 37.695 35.511 35.378 Packed refs: Current Impl: 39.713 39.954 38.653 Nested for loops: 29.891 29.753 29.377 Nested stream's: 30.340 29.637 30.412 Parallel stream + stream: 28.653 28.254 29.138 Nested parallel stream's: 29.942 28.850 31.030 Stream + for loop: 29.405 29.576 30.539 Parallel stream + for loop: 29.012 29.215 29.380 RefTable: Current Impl: 0.273 0.294 0.330 Nested for loops: 0.252 0.169 0.215 Nested stream's: 0.252 0.228 0.213 Parallel stream + stream: 0.233 0.259 0.247 Nested parallel stream's: 0.416 0.309 0.340 Stream + for loop: 0.224 0.247 0.242 Parallel stream + for loop: 0.347 0.246 0.346 The elapsed time was measured around `getRefsByPrefix` call in `UploadPack.getFilteredRefs(Collection<String>)` (around lines 952 and 954). Based on the above results, the implementation with parallel stream and stream was selected. Bug: 578550 Change-Id: Iac3a3aacf897b87b3448c1d528cdac64ad312199 Signed-off-by: Dariusz Luksza <dariusz.luksza@gmail.com>
-rw-r--r--org.eclipse.jgit/src/org/eclipse/jgit/lib/RefDatabase.java8
1 files changed, 3 insertions, 5 deletions
diff --git a/org.eclipse.jgit/src/org/eclipse/jgit/lib/RefDatabase.java b/org.eclipse.jgit/src/org/eclipse/jgit/lib/RefDatabase.java
index 9e05a39731..f3726868cb 100644
--- a/org.eclipse.jgit/src/org/eclipse/jgit/lib/RefDatabase.java
+++ b/org.eclipse.jgit/src/org/eclipse/jgit/lib/RefDatabase.java
@@ -474,11 +474,9 @@ public abstract class RefDatabase {
*/
@NonNull
public List<Ref> getRefsByPrefix(String... prefixes) throws IOException {
- List<Ref> result = new ArrayList<>();
- for (String prefix : prefixes) {
- result.addAll(getRefsByPrefix(prefix));
- }
- return Collections.unmodifiableList(result);
+ return getRefsByPrefix(ALL).parallelStream().filter(
+ ref -> Stream.of(prefixes).anyMatch(ref.getName()::startsWith))
+ .collect(Collectors.toUnmodifiableList());
}