Rewrite reference handling to be abstract and accurate
This commit makes three major changes to the way references
are handled within JGit. Unfortunately they were easier to do as
a single massive commit than to break them up into smaller units.
Disambiguate symbolic references:
---------------------------------
Reporting a symbolic reference such as HEAD as though it were
any other normal reference like refs/heads/master causes subtle
programming errors. We have been bitten by this error on several
occasions, as have some downstream applications I have written.
Instead of reporting HEAD as a reference whose name differs from
its "original name", report it as an actual SymbolicRef object
whose type the application can test and whose target it can examine.
With this change, Ref is now an abstract type with different
subclasses for the different types.
In the classical example of "HEAD" being a symbolic reference to
branch "refs/heads/master", the Repository.getAllRefs() method
will now return:
Map<String, Ref> all = repository.getAllRefs();
SymbolicRef HEAD = (SymbolicRef) all.get("HEAD");
ObjectIdRef master = (ObjectIdRef) all.get("refs/heads/master");
assertSame(master, HEAD.getTarget());
assertSame(master.getObjectId(), HEAD.getObjectId());
assertEquals("HEAD", HEAD.getName());
assertEquals("refs/heads/master", master.getName());
A nice side-effect of this change is that the storage type of the
symbolic reference is no longer ambiguous with the storage type
of the underlying reference it targets. In the above example,
if master was only available in the packed-refs file, then the
following is also true:
assertSame(Ref.Storage.LOOSE, HEAD.getStorage());
assertSame(Ref.Storage.PACKED, master.getStorage());
(Prior to this change we returned the ambiguous storage of
LOOSE_PACKED for HEAD, which was confusing since it wasn't
actually true on disk).
Another nice side-effect of this change is all intermediate
symbolic references are preserved, and are therefore visible
to the application when they walk the target chain. We can
now correctly inspect chains of symbolic references.
As a result of this change the Ref.getOrigName() method has been
removed from the API. Applications should identify a symbolic
reference by testing for isSymbolic() and not by using an arcane
string comparison between properties.
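The detection pattern described above can be sketched with a toy model of the new hierarchy. These are simplified stand-ins for JGit's Ref, ObjectIdRef and SymbolicRef, not the real org.eclipse.jgit.lib classes; object ids are plain strings here instead of ObjectId instances:

```java
// Toy model of the Ref hierarchy: an abstract base with distinct
// subclasses for normal and symbolic references, so callers test
// isSymbolic() instead of comparing name strings.
public class RefModel {
    static abstract class Ref {
        final String name;
        Ref(String name) { this.name = name; }
        String getName() { return name; }
        abstract boolean isSymbolic();
        abstract String getObjectId();
        // Leaf of the symbolic chain; a non-symbolic ref is its own leaf.
        Ref getLeaf() { return this; }
    }

    static class ObjectIdRef extends Ref {
        final String objectId;
        ObjectIdRef(String name, String objectId) {
            super(name);
            this.objectId = objectId;
        }
        boolean isSymbolic() { return false; }
        String getObjectId() { return objectId; }
    }

    static class SymbolicRef extends Ref {
        final Ref target;
        SymbolicRef(String name, Ref target) {
            super(name);
            this.target = target;
        }
        boolean isSymbolic() { return true; }
        Ref getTarget() { return target; }
        // Resolve through the chain, so HEAD reports master's id.
        String getObjectId() { return target.getObjectId(); }
        @Override Ref getLeaf() { return target.getLeaf(); }
    }
}
```

Because getLeaf() recurses through SymbolicRef targets, chains of symbolic references resolve correctly instead of collapsing to a renamed copy of the final ref.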
Abstract the RefDatabase storage:
---------------------------------
RefDatabase is now abstract, similar to ObjectDatabase, and a
new concrete implementation called RefDirectory is used for the
traditional on-disk storage layout. In the future we plan to
support additional implementations, such as a pure in-memory
RefDatabase for unit testing purposes.
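The planned shape can be sketched roughly as follows. The method names here are illustrative only, not the real RefDatabase API, which is considerably richer:

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Rough sketch of the abstraction: an abstract RefDatabase with a pure
// in-memory implementation of the kind anticipated for unit tests.
public class RefDbSketch {
    static abstract class RefDatabase {
        // Look up one ref by exact name; null when absent.
        abstract String exactRef(String name);
        // All refs, name -> object id, sorted by name.
        abstract Map<String, String> getRefs();
    }

    static class InMemoryRefDatabase extends RefDatabase {
        private final TreeMap<String, String> refs = new TreeMap<>();
        void put(String name, String objectId) { refs.put(name, objectId); }
        String exactRef(String name) { return refs.get(name); }
        Map<String, String> getRefs() {
            return Collections.unmodifiableMap(refs);
        }
    }
}
```

An in-memory implementation like this lets tests exercise ref-handling code without touching the filesystem at all.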
Optimize RefDirectory:
----------------------
The implementation of the in-memory reference cache, reading, and
update routines has been completely rewritten. Much of the code
was heavily borrowed or cribbed from the prior implementation,
so copyright notices have been left intact as much as possible.
The RefDirectory cache no longer confuses symbolic references
with normal references. This permits the cache to resolve the
value of a symbolic reference as late as possible, ensuring it
is always current, without needing to maintain reverse pointers.
The cache is now 2 sorted RefLists, rather than 3 HashMaps.
Using sorted lists allows the implementation to reduce the
in-memory footprint when storing many refs. Using specialized
types for the elements allows the code to avoid additional map
lookups for auxiliary stat information.
To improve scan time during getRefs(), the lists are returned via
a copy-on-write contract. Most callers of getRefs() do not modify
the returned collections, so the copy-on-write semantics improve
access on repositories with a large number of packed references.
Iterator traversals of the returned Map<String,Ref> are performed
using a simple merge-join of the two cache lists, ensuring we can
perform the entire traversal in linear time as a function of the
number of references: O(PackedRefs + LooseRefs).
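The merge-join can be sketched over two name-sorted lists. This is a simplification of the real RefDirectory iteration: refs are bare name strings here, and an entry present in both lists is taken from the loose side, mirroring how an on-disk loose ref overrides a packed-refs entry:

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoin {
    // Merge two name-sorted ref lists in O(packed + loose) time.
    // When a name appears in both lists, the loose entry shadows
    // the packed one.
    static List<String> merge(List<String> packed, List<String> loose) {
        List<String> out = new ArrayList<>();
        int p = 0, l = 0;
        while (p < packed.size() && l < loose.size()) {
            int cmp = packed.get(p).compareTo(loose.get(l));
            if (cmp < 0)
                out.add(packed.get(p++));
            else if (cmp > 0)
                out.add(loose.get(l++));
            else {
                out.add(loose.get(l++)); // loose shadows packed
                p++;
            }
        }
        while (p < packed.size()) out.add(packed.get(p++));
        while (l < loose.size()) out.add(loose.get(l++));
        return out;
    }
}
```

Each element of either list is visited exactly once, which is what keeps the full traversal linear in the total number of references.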
Scans of the loose reference space to update the cache run in
O(LooseRefs log LooseRefs) time, as the directory contents
are sorted before being merged against the in-memory cache.
Since the majority of stable references are kept packed, there
typically are only a handful of reference names to be sorted,
so the sorting cost should not be very high.
Locking is reduced during getRefs() by taking advantage of the
copy-on-write semantics of the improved cache data structure.
This permits concurrent readers to pull back references without
blocking each other. If there is contention updating the cache
during a scan, one or more updates are simply skipped and will
get picked up again in a future scan.
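The non-blocking contract can be sketched with an AtomicReference holding an immutable snapshot. This is a simplification of the real cache, which holds the packed and loose RefLists rather than a single list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class CowRefCache {
    // Readers grab an immutable snapshot and never block. A scanner
    // that loses the compareAndSet race simply drops its update; the
    // next scan will observe the files again and retry.
    private final AtomicReference<List<String>> cache =
            new AtomicReference<>(Collections.<String>emptyList());

    List<String> snapshot() {
        return cache.get();
    }

    boolean tryPublish(List<String> observed, List<String> scanned) {
        List<String> fresh =
                Collections.unmodifiableList(new ArrayList<>(scanned));
        return cache.compareAndSet(observed, fresh);
    }
}
```

A publisher still holding a stale snapshot fails the compareAndSet and its update is skipped, exactly the "picked up again in a future scan" behavior described above.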
Writing to the $GIT_DIR/packed-refs during reference delete is
now fully atomic. The file is locked, reparsed fresh, and written
back out if a change is necessary. This avoids all race conditions
with concurrent external updates of the packed-refs file.
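The lock-reparse-write cycle can be sketched as below. This is a minimal sketch of the pattern only; the real code goes through JGit's lock-file machinery and re-reads the packed-refs file under the lock before deciding what to write back:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class PackedRefsRewrite {
    // Take "<file>.lock" exclusively (CREATE_NEW fails if another
    // writer holds it), write the new content there, then atomically
    // rename it over the original so readers always see either the
    // complete old file or the complete new one.
    static void rewrite(Path file, String newContent) throws IOException {
        Path lock = file.resolveSibling(file.getFileName() + ".lock");
        Files.write(lock, newContent.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        Files.move(lock, file,
                StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the rename is atomic, a concurrent reader can never observe a half-written packed-refs file, and a concurrent writer is rejected at the CREATE_NEW step instead of corrupting the file.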
The RefLogWriter class has been fully folded into RefDirectory
and is therefore deleted. Maintaining the reference's log is
the responsibility of the database implementation, and not all
implementations will use java.io for access.
Future work still remains to be done to abstract the ReflogReader
class away from local disk IO.
Change-Id: I26b9287c45a4b2d2be35ba2849daa316f5eec85d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
- /*
- * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
- * and other copyright owners as documented in the project's IP log.
- *
- * This program and the accompanying materials are made available
- * under the terms of the Eclipse Distribution License v1.0 which
- * accompanies this distribution, is reproduced below, and is
- * available at http://www.eclipse.org/org/documents/edl-v10.php
- *
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * - Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- *
- * - Redistributions in binary form must reproduce the above
- * copyright notice, this list of conditions and the following
- * disclaimer in the documentation and/or other materials provided
- * with the distribution.
- *
- * - Neither the name of the Eclipse Foundation, Inc. nor the
- * names of its contributors may be used to endorse or promote
- * products derived from this software without specific prior
- * written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
- * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
- * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
- * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
- * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
- * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
- * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
- * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
- * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
- package org.eclipse.jgit.transport;
-
- import static org.eclipse.jgit.transport.WalkRemoteObjectDatabase.ROOT_DIR;
-
- import java.io.IOException;
- import java.io.OutputStream;
- import java.util.ArrayList;
- import java.util.Collection;
- import java.util.LinkedHashMap;
- import java.util.List;
- import java.util.Map;
- import java.util.TreeMap;
-
- import org.eclipse.jgit.errors.TransportException;
- import org.eclipse.jgit.lib.AnyObjectId;
- import org.eclipse.jgit.lib.Constants;
- import org.eclipse.jgit.lib.ObjectId;
- import org.eclipse.jgit.lib.ObjectIdRef;
- import org.eclipse.jgit.lib.PackWriter;
- import org.eclipse.jgit.lib.ProgressMonitor;
- import org.eclipse.jgit.lib.Ref;
- import org.eclipse.jgit.lib.RefWriter;
- import org.eclipse.jgit.lib.Repository;
- import org.eclipse.jgit.lib.Ref.Storage;
- import org.eclipse.jgit.transport.RemoteRefUpdate.Status;
-
- /**
- * Generic push support for dumb transport protocols.
- * <p>
- * Since there are no Git-specific smarts on the remote side of the connection
- * the client side must handle everything on its own. The generic push support
- * requires being able to delete, create and overwrite files on the remote side,
- * as well as create any missing directories (if necessary). Typically this can
- * be handled through an FTP style protocol.
- * <p>
- * Objects not on the remote side are uploaded as pack files, using one pack
- * file per invocation. This simplifies the implementation as only two data
- * files need to be written to the remote repository.
- * <p>
- * Push support supplied by this class is not multiuser safe. Concurrent pushes
- * to the same repository may yield an inconsistent reference database which may
- * confuse fetch clients.
- * <p>
- * A single push is concurrently safe with multiple fetch requests, due to the
- * careful order of operations used to update the repository. Clients fetching
- * may receive transient failures due to short reads on certain files if the
- * protocol does not support atomic file replacement.
- *
- * @see WalkRemoteObjectDatabase
- */
- class WalkPushConnection extends BaseConnection implements PushConnection {
- /** The repository this transport pushes out of. */
- private final Repository local;
-
- /** Location of the remote repository we are writing to. */
- private final URIish uri;
-
- /** Database connection to the remote repository. */
- private final WalkRemoteObjectDatabase dest;
-
- /**
- * Packs already known to reside in the remote repository.
- * <p>
- * This is a LinkedHashMap to maintain the original order.
- */
- private LinkedHashMap<String, String> packNames;
-
- /** Complete listing of refs the remote will have after our push. */
- private Map<String, Ref> newRefs;
-
- /**
- * Updates which require altering the packed-refs file to complete.
- * <p>
- * If this collection is non-empty then any refs listed in {@link #newRefs}
- * with a storage class of {@link Storage#PACKED} will be written.
- */
- private Collection<RemoteRefUpdate> packedRefUpdates;
-
- WalkPushConnection(final WalkTransport walkTransport,
- final WalkRemoteObjectDatabase w) {
- Transport t = (Transport)walkTransport;
- local = t.local;
- uri = t.getURI();
- dest = w;
- }
-
- public void push(final ProgressMonitor monitor,
- final Map<String, RemoteRefUpdate> refUpdates)
- throws TransportException {
- markStartedOperation();
- packNames = null;
- newRefs = new TreeMap<String, Ref>(getRefsMap());
- packedRefUpdates = new ArrayList<RemoteRefUpdate>(refUpdates.size());
-
- // Filter the commands and issue all deletes first. This way we
- // can correctly handle a directory being cleared out and a new
- // ref using the directory name being created.
- //
- final List<RemoteRefUpdate> updates = new ArrayList<RemoteRefUpdate>();
- for (final RemoteRefUpdate u : refUpdates.values()) {
- final String n = u.getRemoteName();
- if (!n.startsWith("refs/") || !Repository.isValidRefName(n)) {
- u.setStatus(Status.REJECTED_OTHER_REASON);
- u.setMessage("funny refname");
- continue;
- }
-
- if (AnyObjectId.equals(ObjectId.zeroId(), u.getNewObjectId()))
- deleteCommand(u);
- else
- updates.add(u);
- }
-
- // If we have any updates we need to upload the objects first, to
- // prevent creating refs pointing at non-existent data. Then we
- // can update the refs, and the info-refs file for dumb transports.
- //
- if (!updates.isEmpty())
- sendpack(updates, monitor);
- for (final RemoteRefUpdate u : updates)
- updateCommand(u);
-
- // Is this a new repository? If so we should create additional
- // metadata files so it is properly initialized during the push.
- //
- if (!updates.isEmpty() && isNewRepository())
- createNewRepository(updates);
-
- RefWriter refWriter = new RefWriter(newRefs.values()) {
- @Override
- protected void writeFile(String file, byte[] content)
- throws IOException {
- dest.writeFile(ROOT_DIR + file, content);
- }
- };
- if (!packedRefUpdates.isEmpty()) {
- try {
- refWriter.writePackedRefs();
- for (final RemoteRefUpdate u : packedRefUpdates)
- u.setStatus(Status.OK);
- } catch (IOException err) {
- for (final RemoteRefUpdate u : packedRefUpdates) {
- u.setStatus(Status.REJECTED_OTHER_REASON);
- u.setMessage(err.getMessage());
- }
- throw new TransportException(uri, "failed updating refs", err);
- }
- }
-
- try {
- refWriter.writeInfoRefs();
- } catch (IOException err) {
- throw new TransportException(uri, "failed updating refs", err);
- }
- }
-
- @Override
- public void close() {
- dest.close();
- }
-
- private void sendpack(final List<RemoteRefUpdate> updates,
- final ProgressMonitor monitor) throws TransportException {
- String pathPack = null;
- String pathIdx = null;
-
- try {
- final PackWriter pw = new PackWriter(local, monitor);
- final List<ObjectId> need = new ArrayList<ObjectId>();
- final List<ObjectId> have = new ArrayList<ObjectId>();
- for (final RemoteRefUpdate r : updates)
- need.add(r.getNewObjectId());
- for (final Ref r : getRefs()) {
- have.add(r.getObjectId());
- if (r.getPeeledObjectId() != null)
- have.add(r.getPeeledObjectId());
- }
- pw.preparePack(need, have);
-
- // We don't have to continue further if the pack will
- // be an empty pack, as the remote has all objects it
- // needs to complete this change.
- //
- if (pw.getObjectsNumber() == 0)
- return;
-
- packNames = new LinkedHashMap<String, String>();
- for (final String n : dest.getPackNames())
- packNames.put(n, n);
-
- final String base = "pack-" + pw.computeName().name();
- final String packName = base + ".pack";
- pathPack = "pack/" + packName;
- pathIdx = "pack/" + base + ".idx";
-
- if (packNames.remove(packName) != null) {
- // The remote already contains this pack. We should
- // remove the index before overwriting to prevent bad
- // offsets from appearing to clients.
- //
- dest.writeInfoPacks(packNames.keySet());
- dest.deleteFile(pathIdx);
- }
-
- // Write the pack file, then the index, as readers look the
- // other direction (index, then pack file).
- //
- final String wt = "Put " + base.substring(0, 12);
- OutputStream os = dest.writeFile(pathPack, monitor, wt + "..pack");
- try {
- pw.writePack(os);
- } finally {
- os.close();
- }
-
- os = dest.writeFile(pathIdx, monitor, wt + "..idx");
- try {
- pw.writeIndex(os);
- } finally {
- os.close();
- }
-
- // Record the pack at the start of the pack info list. This
- // way clients are likely to consult the newest pack first,
- // and discover the most recent objects there.
- //
- final ArrayList<String> infoPacks = new ArrayList<String>();
- infoPacks.add(packName);
- infoPacks.addAll(packNames.keySet());
- dest.writeInfoPacks(infoPacks);
-
- } catch (IOException err) {
- safeDelete(pathIdx);
- safeDelete(pathPack);
-
- throw new TransportException(uri, "cannot store objects", err);
- }
- }
-
- private void safeDelete(final String path) {
- if (path != null) {
- try {
- dest.deleteFile(path);
- } catch (IOException cleanupFailure) {
- // Ignore the deletion failure. We probably are
- // already failing and were just trying to pick
- // up after ourselves.
- }
- }
- }
-
- private void deleteCommand(final RemoteRefUpdate u) {
- final Ref r = newRefs.remove(u.getRemoteName());
- if (r == null) {
- // Already gone.
- //
- u.setStatus(Status.OK);
- return;
- }
-
- if (r.getStorage().isPacked())
- packedRefUpdates.add(u);
-
- if (r.getStorage().isLoose()) {
- try {
- dest.deleteRef(u.getRemoteName());
- u.setStatus(Status.OK);
- } catch (IOException e) {
- u.setStatus(Status.REJECTED_OTHER_REASON);
- u.setMessage(e.getMessage());
- }
- }
-
- try {
- dest.deleteRefLog(u.getRemoteName());
- } catch (IOException e) {
- u.setStatus(Status.REJECTED_OTHER_REASON);
- u.setMessage(e.getMessage());
- }
- }
-
- private void updateCommand(final RemoteRefUpdate u) {
- try {
- dest.writeRef(u.getRemoteName(), u.getNewObjectId());
- newRefs.put(u.getRemoteName(), new ObjectIdRef.Unpeeled(
- Storage.LOOSE, u.getRemoteName(), u.getNewObjectId()));
- u.setStatus(Status.OK);
- } catch (IOException e) {
- u.setStatus(Status.REJECTED_OTHER_REASON);
- u.setMessage(e.getMessage());
- }
- }
-
- private boolean isNewRepository() {
- return getRefsMap().isEmpty() && packNames != null
- && packNames.isEmpty();
- }
-
- private void createNewRepository(final List<RemoteRefUpdate> updates)
- throws TransportException {
- try {
- final String ref = "ref: " + pickHEAD(updates) + "\n";
- final byte[] bytes = Constants.encode(ref);
- dest.writeFile(ROOT_DIR + Constants.HEAD, bytes);
- } catch (IOException e) {
- throw new TransportException(uri, "cannot create HEAD", e);
- }
-
- try {
- final String config = "[core]\n"
- + "\trepositoryformatversion = 0\n";
- final byte[] bytes = Constants.encode(config);
- dest.writeFile(ROOT_DIR + "config", bytes);
- } catch (IOException e) {
- throw new TransportException(uri, "cannot create config", e);
- }
- }
-
- private static String pickHEAD(final List<RemoteRefUpdate> updates) {
- // Try to use master if the user is pushing that, it is the
- // default branch and is likely what they want to remain as
- // the default on the new remote.
- //
- for (final RemoteRefUpdate u : updates) {
- final String n = u.getRemoteName();
- if (n.equals(Constants.R_HEADS + Constants.MASTER))
- return n;
- }
-
- // Pick any branch, under the assumption the user pushed only
- // one to the remote side.
- //
- for (final RemoteRefUpdate u : updates) {
- final String n = u.getRemoteName();
- if (n.startsWith(Constants.R_HEADS))
- return n;
- }
- return updates.get(0).getRemoteName();
- }
- }