Fix bug in copyPackBypassCache's skip 'PACK' header logic
Bug caused the pack to be 12 bytes short when cold cache. Also added
test for copyPackAsIs method.
Change-Id: Idf8fb0e50d1215245d4b032e2e00df4b218c115f
Signed-off-by: Minh Thai <mthai@google.com>
This method tried to iterate spurious files which may exist in the
.git/refs folder, e.g. on Mac a .DS_Store may have been created there by
inspecting the folder using the finder application. This led to a
NotDirectoryException when deleteEmptyRefsFolders tried to create an
iterator for such a file entry. Skip files contained in the refs folder
to ensure the method only tries to iterate contained folders but not
files.
Change-Id: I5f31e733072a35db1e93908a9c69a8891ae5c206
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
dfs: Remove synchronization in BlockBasedFile#LazyChannel
As explained in 'The "Double-Checked Locking is Broken"
Declaration'[*], Java's memory model does not support double-checked
locking:
class LazyReadableChannel {
private ReachableChannel rc = null;
public ReadableChannel get() {
if (rc == null) {
synchronized (this) {
if (rc == null) {
rc = new ReadableChannel();
}
}
}
return rc;
}
}
With JDK 5 and newer, there is a formal memory model that ensures this
works if "rc" is volatile, but it is still not thread-safe without
that.
Fortunately, this ReadableChannelSupplier is never passed between
threads, so it does not need to be thread-safe. Simplify by removing
the synchronization.
[*] https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
Change-Id: I0698ee6618d734fc129dd4f63fc047c1c17c94a9
Signed-off-by: Jonathan Nieder <jrn@google.com>
Lazily open ReadableChannel in BlockBasedFile.getOrLoadBlock
To avoid opening the readable channel in case of DfsBlockCache
hits. Also cleaning up typos around DfsBlockCache.
Change-Id: I615e349cb4838387c1e6743cdc384d1b81b54369
Signed-off-by: Minh Thai <mthai@google.com>
DfsBlockCache: Consolidate where ReadableChannel is opened
Opening a readable channel can be expensive and the number of channels
can be limited in DFS. Ensure that caller of
BlockBasedFile.readOneBlock() is responsible for opening and closing
the file, and that the ReadableChannel is reused in the request. As a side
effect, this makes the code easier to read, with better use of
try-with-resources.
The downside is that this means a readable channel is always opened, even
when the entire pack is already available for copying from cache. This
should be an acceptable cost: it's rare enough not to overload the server
and from a client latency perspective, the latency cost is in the noise
relative to the data transfer cost involved in a clone. If this turns out
to be a problem in practice, we can reintroduce that optimization in a
followup change.
Change-Id: I340428ee4bacd2dce019d5616ef12339a0c85f0b
Signed-off-by: Minh Thai <mthai@google.com>
Signed-off-by: Jonathan Nieder <jrn@google.com>
DfsBlockCache to lock while loading object references
We see the same index being loaded by multiple threads. Each is
hundreds of MB and takes several seconds to load, causing server to
run out of memory. This change introduces a lock to avoid these
duplicate works. It uses a new set of locks similar in implementation
to the loadLocks for getOrLoad of blocks. The locks are kept separate
to prevent long-running index loading from blocking out fast block
loading. The cache instance can be configured with a consumer to
monitor the wait time of the new locks.
Change-Id: I44962fe84093456962d5981545e3f7851ecb6e43
Signed-off-by: Minh Thai <mthai@google.com>
Using findRef instead of getRef makes it clearer that the caller wants
to search for the ref in the search path, instead of looking for a ref
that exactly matches the input.
This change introduces the new findRef method and deprecates getRef.
It updates Repository#findRef to use the new method, ensuring some
test coverage. Other callers will be updated in followup changes.
A nice side effect of introducing the new findRef method is that it is
final and based on firstExactRef, so implementers can focus on
implementing the latter efficiently and do not have to carefully write
custom path search code respecting SEARCH_PATH.
Change-Id: Id3bb944344a9743705fd1f20193ab679298fa51c
Signed-off-by: Jonathan Nieder <jrn@google.com>
RefDirectory: Look up several exact refs in one shot
Override exactRef(String...) and firstExactRef(String...) with
implementations specific to FileRepository.
The specialized implementations are similar to the generic ones from
RefDatabase, but because these use readRef directly instead of
exactRef, they only need to call fireRefsChanged once.
This will allow replacing RefDirectory#getRef with a generic
implementation that uses firstExactRef without hurting performance.
Change-Id: I1be1948bd6121c1a1e8152e201aab97e7fb308bb
Signed-off-by: Jonathan Nieder <jrn@google.com>
RefDirectory: Do not use search path to find additional refs
Psuedorefs like FETCH_HEAD and MERGE_HEAD are supposed to be directly
under the .git directory, not in other locations in the SEARCH_PATH
like refs/ and refs/heads/. Use exactRef to access them.
Change-Id: Iab8ac47008822fa78fc0691e239e518c34d7a98e
Signed-off-by: Jonathan Nieder <jrn@google.com>
getRef and exactRef can produce recoverable exceptions --- for
example, a corrupt loose ref that cannot be parsed. If readRef was
called and updated looseRefs in the process, RefsChangedEvent should
still be fired.
Noticed while improving the implementation of getRef. This commit
only affects exactRef and getRef. Other methods might be similarly
skipping firing RefsChangedEvent in their error handling code, and
this change does not fix them.
Change-Id: I0f460f6c8d9a585ad8453a4a47c1c77e24a1fb83
Signed-off-by: Jonathan Nieder <jrn@google.com>
RefDirectory: Refactor getRef and exactRef to share code
Both getRef and exactRef look for a ref or pseudoref in the $GIT_DIR
directory, with careful error handling to handle non-refs like
.git/config.
Avoid the duplication by factoring out a helper that takes care of
this. This should make the code easier to understand and manipulate.
Change-Id: I2ea67816d2385e84e2d3394b897e23df5826ba50
Signed-off-by: Jonathan Nieder <jrn@google.com>
Now the reference carries its updateIndex, so the cursor doesn't need
to expose it.
Change-Id: Icbfca46f92a13f3d8215ad10b2a166a6f40b0b0f
Signed-off-by: Ivan Frade <ifrade@google.com>
RefDatabase/Ref: Add versioning to reference database
In DFS implementations the reference table can fall out of sync, but
it is not possible to check this situation in the current API.
Add a property to the Refs indicating the order of its updates. This
version is set only by RefDatabase implementations that support
versioning (e.g reftable based).
Caller is responsible to check if the reference db creates versioned
refs before accessing getUpdateIndex(). E.g:
Ref ref = refdb.exactRef(...);
if (refdb.hasVersioning()) {
ref.getUpdateIndex();
}
Change-Id: I0d5ec8e8df47c730301b2e12851a6bf3dac9d120
Signed-off-by: Ivan Frade <ifrade@google.com>
This continues what commit d9ac7ddf10
(Remove unnecessary modifiers from interfaces, 2018-11-15) started.
Change-Id: I89720985a5a986722a0dcb9b5e9bbc25996bd5b3
The usage of non-short-circuit logic is intentional, per the inline
comment added in change Ib4b35e357 as a follow-up to Ie3761ffb4 which
was a previously rejected attempt to "fix" a similar warning that had
been raised by FindBugs.
Change-Id: I3f6729f954d45d30ce697356d2ab3cc877d3ad54
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
DfsFsck: Check that .gitmodules in the repository have valid contents
Previous commits block the addition to the repo of dangerous .gitmodules
files, but some could have been committed before those safeguards where
in place.
Add a check in DfsFsck to validate the .gitmodules files in the repo.
Use the same validation than the ReceivePack, translating the
results to FsckErrors.
Note that *all* .gitmodules files in the storage will be checked, not
only the latest version.
Change-Id: I040cf1f31a779419aad0292ba5e6e76eb7f32b66
Signed-off-by: Ivan Frade <ifrade@google.com>
Explicitly specify charset when constructing BufferedReader
Replace explicit construction of BufferedReader with calls to the
utility method Files.newBufferedReader, which allows to specify
the charset.
Change-Id: I61b9451dbc8d9cf83fc8a5981292b8fdc713ce37
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
As reported by Error Prone:
An inner class should be static unless it references members of its
enclosing class. An inner class that is made non-static unnecessarily
uses more memory and does not make the intent of the class clear.
See https://errorprone.info/bugpattern/ClassCanBeStatic
Change-Id: Ib99d120532630dba63cf400cc1c61c318286fc41
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Externalize warning message in RefDirectory.delete()
Change-Id: Icec16c01853a3f5ea016d454b3d48624498efcce
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
(cherry picked from commit 5e68fe245f)
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Suppress warning for trying to delete non-empty directory
This is actually a fairly common occurrence; deleting the parent
directories can work only if the file deleted was the last one
in the directory.
Bug: 537872
Change-Id: I86d1d45e1e2631332025ff24af8dfd46c9725711
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
(cherry picked from commit d9e767b431)
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
FS_POSIX.createNewFile(File) failed to properly implement atomic file
creation on NFS using the algorithm [1]:
- name of the hard link must be unique to prevent that two processes
using different NFS clients try to create the same link. This would
render nlink useless to detect if there was a race.
- the hard link must be retained for the lifetime of the file since we
don't know when the state of the involved NFS clients will be
synchronized. This depends on NFS configuration options.
To fix these issues we need to change the signature of createNewFile
which would break API. Hence deprecate the old method
FS.createNewFile(File) and add a new method createNewFileAtomic(File).
The new method returns a LockToken which needs to be retained by the
caller (LockFile) until all involved NFS clients synchronized their
state. Since we don't know when the NFS caches are synchronized we need
to retain the token until the corresponding file is no longer needed.
The LockToken must be closed after the LockFile using it has been
committed or unlocked. On Posix, if core.supportsAtomicCreateNewFile =
false this will delete the hard link which guarded the atomic creation
of the file. When acquiring the lock fails ensure that the hard link is
removed.
[1] https://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
also see file creation flag O_EXCL in
http://man7.org/linux/man-pages/man2/open.2.html
Change-Id: I84fcb16143a5f877e9b08c6ee0ff8fa4ea68a90d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
GC: Avoid logging errors when deleting non-empty folders
I88304d34c and Ia555bce00 modified the way errors are handled when
trying to delete non-empty reference folders. Before, this error was
silently ignored as it was considered an expected output. Now, every
failed folder delete is logged which can be noisy.
Ignore the DirectoryNotEmptyException but log any other error avoiding
deletion of an eligible folder.
Signed-off-by: Hector Oswaldo Caballero <hector.caballero@ericsson.com>
Change-Id: I194512f67885231d62c03976ae683e5cc450ec7c
Fix NoSuchFileException in GC.deleteTempPacksIdx()
This exception is thrown in GC.deleteTempPacksIdx() if the repository
has no packs.
Bug: 538286
Change-Id: Ieb482be751226baf0843068a0f847e0cdc6e0cb6
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
On recent VMs, collection.toArray(new T[0]) is faster than
collection.toArray(new T[collection.size()]). Since it is also more
readable, it should now be the preferred way of collection to array
conversion.
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
Change-Id: I80388532fb4b2b0663ee1fe8baa94f5df55c8442
Signed-off-by: Michael Keppler <Michael.Keppler@gmx.de>
Suppress warning for trying to delete non-empty directory
This is actually a fairly common occurrence; deleting the parent
directories can work only if the file deleted was the last one
in the directory.
Bug: 537872
Change-Id: I86d1d45e1e2631332025ff24af8dfd46c9725711
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Since I3870cadb4, GC task was always delegated to an executor even when
background option was set to false. This was an issue because if more
than one GC object was instantiated and executed in parallel, only one GC
was actually running because of the single thread executor.
Change-Id: I8c587d22d63c1601b7d75914692644a385cd86d6
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
Remove completely the empty directories under refs/<namespace>
including the first level partition of the changes, when they are
completely empty.
Bug: 536777
Change-Id: I88304d34cc42435919c2d1480258684d993dfdca
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Use java.nio to delete path to get detailed errors
Get the full IOException of the reason why a directory
cannot be removed during GC.
Change-Id: Ia555bce009fa48087a73d677f1ce3b9c0b685b57
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Always send refs' objects despite "filter" in pack
In a0c9016abd ("upload-pack: send refs' objects despite "filter"",
2018-07-09), Git updated the "filter" option in the fetch-pack
upload-pack protocol to not filter objects explicitly specified in
"want" lines, even if they match the criterion of the filter. Update
JGit to match that behavior.
Change-Id: Ia4d74326edb89e61062e397e05483298c50f9232
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
GC: Trim more EWAHCompressedBitmaps to free unused memory
04b9f4436 fixed places where compressed bitmaps were holding on to their
full buffers, but missed this StoredBitmap.getBitmap() case where a
bitmap is resonstituted from an xor chain.
Change-Id: I7cf75d9e49c18a1a8a880a4df7e821502edc68a4
Signed-off-by: Terry Parker <tparker@google.com>
Make Reftable seek* and has* method names more consistent
Make the method names more consistent and their semantics simpler:
hasRef and seekRef to look up a single exact reference by name and
hasRefsByPrefix and seekRefsByPrefix to look up multiple references by
name prefix.
In particular, splitting hasRef into two separate methods for its
different uses makes DfsReftableDatabase.isNameConflicting easier to
follow.
[jn: fleshed out commit message]
Change-Id: I71106068ff3ec4f7e14dd9eb6ee6b5fab8d14d0b
Signed-off-by: Minh Thai <mthai@google.com>
Signed-off-by: Jonathan Nieder <jrn@google.com>
Reftable implementation of RefDatabase.getRefsByPrefix() should be
more performant, as references are filtered directly by prefix;
instead of fetching the whole subtree then filter by prefix.
Change-Id: If4f5f8c08285ea1eaec9efb83c3d864cea7a1321
Signed-off-by: Minh Thai <mthai@google.com>
GC: Trim EWAHCompressedBitmaps to free unused memory
The "Building bitmaps" GC phase fails for large repositories (repos with
10M objects use 1.25MB per uncompressed bitmap, and those with long
histories may build >25k bitmaps). Since these bitmaps xor well against
each other, the actual space needed for each compressed bitmap is
usually no more than a few KB. Calling trim() will ensure we aren't
holding on to excess memory.
Change-Id: I40bf78c730b9f6051da6025f9777ce27220a5b0a
Signed-off-by: Terry Parker <tparker@google.com>
This may be convenient for downstream implementers who require a dummy
StoredConfig implementation, rather than making them reimplement the two
abstract StoredConfig methods.
Change-Id: I2b7bc6250d722c2b95d9f99e4eff1e5bf97cb567
After packaging references, the folders containing these references are
not deleted. In a busy repository, this causes operations to slow down
as traversing the references tree becomes longer.
Delete empty reference folders after the loose references have been
packed.
To avoid deleting a folder that was just created by another concurrent
operation, only delete folders that were not modified in the last 30
seconds.
Signed-off-by: Hector Oswaldo Caballero <hector.caballero@ericsson.com>
Change-Id: Ie79447d6121271cf5e25171be377ea396c7028e0
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Log as warning when an attempt to remove a directory
fails. This helps troubleshooting some bugs like the GC leaving
behind empty directories.
Change-Id: Idb94ce17f8be9668a970c7ecae31436bf434073c
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Fix a GC scalability issue when selecting commit bitmaps
The previous algorithm selected commits by creating bitmaps at
each branch tip, doing a revwalk to populate each bitmap, and
looping in this way:
1) Select the remaining branch with the most commits (the branch
whose bitmap has the highest cardinality)
2) Select well-spaced bitmaps in that branch
3) Remove commits in the selected branch from the remaining
branch-tip bitmaps
4) Repeat at #1
This algorithm gave good commit selection on all branches but
a more uniform selection on "important" branches, where branch
length is the proxy for "important". However the algorithm
required N bitmaps of size M solely for the purpose of commit
selection, where N is the number of branch tips in the primary
GC pack, and M is the number of objects in the pack.
This new algorithm uses branch modification date as the proxy for
"important" branches, replacing the N*M memory allocation with a
single M-sized bitmap and N revwalks from new branch tips to
shared history (which will be short when there is a lot of shared
history).
GcCommitSelectionTest.testDistributionOnMultipleBranches verifies
that this algorithm still yields good coverage on all branches.
Change-Id: Ib6019b102b67eabb379e6b85623e4b5549590e6e
Signed-off-by: Terry Parker <tparker@google.com>
From the javadoc for Files.list:
"The returned stream encapsulates a DirectoryStream. If timely disposal
of file system resources is required, the try-with-resources construct
should be used to ensure that the stream's close method is invoked
after the stream operations are completed."
This is the only call to Files#newDirectoryStream that is not already in
a try-with-resources.
Change-Id: I91e6c56b5d74e8435457ad6ed9e6b4b24d2aa14e
(cherry picked from commit 1c16ea4601)