Don't use interruptible pread() to access pack files

The J2SE NIO APIs require that FileChannel close the underlying file
descriptor if a thread is interrupted while it is inside of a read or
write operation on that channel. This is insane, because it means we
cannot share the file descriptor between threads. If a thread is in
the middle of the FileChannel variant of IO.readFully() and it
receives an interrupt, the pack will be automatically closed on us.
This causes the other threads trying to use that same FileChannel to
receive IOExceptions, which leads to the pack getting marked as
invalid. Once the pack is marked invalid, JGit loses access to its
entire contents and starts to report MissingObjectExceptions.
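
The failure mode described above can be reproduced outside of JGit. The
following is a minimal sketch, not JGit code: the class and method names
are hypothetical, and it sets the interrupt flag before the read rather
than racing a second thread, which is enough to make FileChannel close
itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of JGit.
final class InterruptClosesChannel {
    /** Returns true if the channel is still open after an interrupted read. */
    static boolean channelSurvivesInterrupt() throws IOException {
        Path tmp = Files.createTempFile("pack-demo", ".bin");
        try {
            Files.write(tmp, new byte[64]);
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                // Simulate an interrupt arriving while this thread uses the channel.
                Thread.currentThread().interrupt();
                try {
                    ch.read(ByteBuffer.allocate(16), 0); // positional read, like pread()
                } catch (ClosedByInterruptException e) {
                    // The channel closed itself; every other user of it is now broken.
                } finally {
                    Thread.interrupted(); // clear the flag so cleanup runs normally
                }
                return ch.isOpen();
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("channel still open: " + channelSurvivesInterrupt());
    }
}
```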

Because PackWriter must ensure that the chosen pack file stays
available until the current object's data is fully copied to the
output, JGit cannot simply reopen the pack when it's automatically
closed due to an interrupt being sent at the wrong time. The pack may
have been deleted by a concurrent `git gc` process, and that open file
descriptor might be the last reference to the inode on disk. Once it's
closed, the PackWriter loses access to that object representation, and
it cannot complete sending the object to the client.

Fortunately, RandomAccessFile's readFully method does not have this
problem. Interrupts during readFully() are ignored. However, it
requires us to first seek to the offset we need to read, then issue
the read call. This requires locking around the file descriptor to
prevent concurrent threads from moving the pointer before the read.
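
The seek-then-read pattern under a lock might look roughly like the
sketch below. It is an illustration only: LockedPackReader, readLock and
read() are hypothetical names, not the identifiers used in JGit's pack
code.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration; not the actual JGit class.
final class LockedPackReader implements AutoCloseable {
    private final RandomAccessFile fd;

    private final Object readLock = new Object();

    LockedPackReader(File path) throws IOException {
        fd = new RandomAccessFile(path, "r");
    }

    /** Reads exactly length bytes starting at position. */
    byte[] read(long position, int length) throws IOException {
        final byte[] buf = new byte[length];
        synchronized (readLock) {
            // seek() and readFully() share one file pointer, so both must
            // happen under the same lock or another thread could move it.
            fd.seek(position);
            fd.readFully(buf, 0, length);
        }
        return buf;
    }

    @Override
    public void close() throws IOException {
        synchronized (readLock) {
            fd.close();
        }
    }
}
```

Only one thread at a time can be inside read(), which is the loss of
concurrency the commit message goes on to discuss.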

This reduces the concurrency level, as now only one window can be
paged in at a time from each pack. However, the WindowCache should
already be holding most of the pages required to handle the working
set for a process, and its own internal locking was already limiting
us on the number of concurrent loads possible. Provided that most
concurrent accesses are getting hits in the WindowCache, or are for
different repositories on the same server, we shouldn't see a major
performance hit due to the more serialized loading.

I would have preferred to use a pool of RandomAccessFiles for each
pack, with threads borrowing an instance dedicated to that thread
whenever they needed to page in a window. This would permit much
higher levels of concurrency by using multiple file descriptors (and
file pointers) for each pack. However, the code became too complex to
develop in any reasonable period of time, so I've chosen to retrofit
the existing code with more serialization instead.
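
For comparison, the pooled alternative that was not implemented could be
sketched as follows. This is a deliberately naive illustration with
hypothetical names (PackFilePool, idleCount). It opens extra descriptors
by path, so it would fail once the pack has been unlinked by a concurrent
`git gc`, and it never bounds the number of open descriptors.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the pooled design; not JGit code.
final class PackFilePool implements AutoCloseable {
    private final File path;

    private final ConcurrentLinkedQueue<RandomAccessFile> idle =
            new ConcurrentLinkedQueue<>();

    PackFilePool(File path) {
        this.path = path;
    }

    /** Borrows a descriptor, reads length bytes at position, returns it. */
    byte[] read(long position, int length) throws IOException {
        RandomAccessFile fd = idle.poll();
        if (fd == null) {
            // Assumption: the pack is still reachable by path.
            fd = new RandomAccessFile(path, "r");
        }
        try {
            final byte[] buf = new byte[length];
            // No lock needed: this thread owns fd and its file pointer.
            fd.seek(position);
            fd.readFully(buf, 0, length);
            return buf;
        } finally {
            idle.offer(fd);
        }
    }

    int idleCount() {
        return idle.size();
    }

    @Override
    public void close() throws IOException {
        RandomAccessFile fd;
        while ((fd = idle.poll()) != null) {
            fd.close();
        }
    }
}
```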

Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
/*
 * Copyright (C) 2009-2010, Google Inc. and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

package org.eclipse.jgit.http.server;

import static javax.servlet.http.HttpServletResponse.SC_PARTIAL_CONTENT;
import static javax.servlet.http.HttpServletResponse.SC_REQUESTED_RANGE_NOT_SATISFIABLE;
import static org.eclipse.jgit.util.HttpSupport.HDR_ACCEPT_RANGES;
import static org.eclipse.jgit.util.HttpSupport.HDR_CONTENT_LENGTH;
import static org.eclipse.jgit.util.HttpSupport.HDR_CONTENT_RANGE;
import static org.eclipse.jgit.util.HttpSupport.HDR_IF_RANGE;
import static org.eclipse.jgit.util.HttpSupport.HDR_RANGE;

import java.io.EOFException;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.text.MessageFormat;
import java.time.Instant;
import java.util.Enumeration;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.util.FS;

/**
 * Dumps a file over HTTP GET (or its information via HEAD).
 * <p>
 * Supports a single byte range requested via {@code Range} HTTP header. This
 * feature supports a dumb client to resume download of a larger object file.
 */
final class FileSender {
    private final File path;

    private final RandomAccessFile source;

    private final Instant lastModified;

    private final long fileLen;

    private long pos;

    private long end;

    FileSender(File path) throws FileNotFoundException {
        this.path = path;
        this.source = new RandomAccessFile(path, "r");

        try {
            this.lastModified = FS.DETECTED.lastModifiedInstant(path);
            this.fileLen = source.getChannel().size();
            this.end = fileLen;
        } catch (IOException e) {
            try {
                source.close();
            } catch (IOException closeError) {
                // Ignore any error closing the stream.
            }

            final FileNotFoundException r;
            r = new FileNotFoundException(MessageFormat.format(
                    HttpServerText.get().cannotGetLengthOf, path));
            r.initCause(e);
            throw r;
        }
    }

    void close() {
        try {
            source.close();
        } catch (IOException e) {
            // Ignore close errors on a read-only stream.
        }
    }

    Instant getLastModified() {
        return lastModified;
    }

    String getTailChecksum() throws IOException {
        final int n = 20;
        final byte[] buf = new byte[n];
        source.seek(fileLen - n);
        source.readFully(buf, 0, n);
        return ObjectId.fromRaw(buf).getName();
    }

    void serve(final HttpServletRequest req, final HttpServletResponse rsp,
            final boolean sendBody) throws IOException {
        if (!initRangeRequest(req, rsp)) {
            rsp.sendError(SC_REQUESTED_RANGE_NOT_SATISFIABLE);
            return;
        }

        rsp.setHeader(HDR_ACCEPT_RANGES, "bytes");
        rsp.setHeader(HDR_CONTENT_LENGTH, Long.toString(end - pos));

        if (sendBody) {
            try (OutputStream out = rsp.getOutputStream()) {
                final byte[] buf = new byte[4096];
                source.seek(pos);
                while (pos < end) {
                    final int r = (int) Math.min(buf.length, end - pos);
                    final int n = source.read(buf, 0, r);
                    if (n < 0) {
                        throw new EOFException(MessageFormat.format(
                                HttpServerText.get().unexpectedeOFOn, path));
                    }
                    out.write(buf, 0, n);
                    pos += n;
                }
                out.flush();
            }
        }
    }

    private boolean initRangeRequest(final HttpServletRequest req,
            final HttpServletResponse rsp) throws IOException {
        final Enumeration<String> rangeHeaders = getRange(req);
        if (!rangeHeaders.hasMoreElements()) {
            // No range headers, the request is fine.
            return true;
        }

        final String range = rangeHeaders.nextElement();
        if (rangeHeaders.hasMoreElements()) {
            // To simplify the code we support only one range.
            return false;
        }

        final int eq = range.indexOf('=');
        final int dash = range.indexOf('-');
        if (eq < 0 || dash < 0 || !range.startsWith("bytes=")) {
            return false;
        }

        final String ifRange = req.getHeader(HDR_IF_RANGE);
        if (ifRange != null && !getTailChecksum().equals(ifRange)) {
            // If the client asked us to verify the ETag and it's not
            // what they expected we need to send the entire content.
            return true;
        }

        try {
            if (eq + 1 == dash) {
                // "bytes=-500" means last 500 bytes
                pos = Long.parseLong(range.substring(dash + 1));
                pos = fileLen - pos;
            } else {
                // "bytes=500-" (position 500 to end)
                // "bytes=500-1000" (position 500 to 1000)
                pos = Long.parseLong(range.substring(eq + 1, dash));
                if (dash < range.length() - 1) {
                    end = Long.parseLong(range.substring(dash + 1));
                    end++; // range was inclusive, want exclusive
                }
            }
        } catch (NumberFormatException e) {
            // We probably hit here because of a non-digit such as
            // "," appearing at the end of the first range telling
            // us there is a second range following. To simplify
            // the code we support only one range.
            return false;
        }

        if (end > fileLen) {
            end = fileLen;
        }
        if (pos >= end) {
            return false;
        }

        rsp.setStatus(SC_PARTIAL_CONTENT);
        rsp.setHeader(HDR_CONTENT_RANGE, "bytes " + pos + "-" + (end - 1) + "/"
                + fileLen);
        source.seek(pos);
        return true;
    }

    private static Enumeration<String> getRange(HttpServletRequest req) {
        return req.getHeaders(HDR_RANGE);
    }
}