The core.autocrlf variable can take on three values: false, true,
and input. Parsing it as a boolean is wrong, we instead need to
parse a tri-state enumeration.
Add support for parsing and setting enum values from Java from and
to the text based configuration file, and use that to handle the
autocrlf variable.
Bug: 301775
Change-Id: I81b9e33087a33d2ef2eac89ba93b9e83b7ecc223
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Use 8192 as default buffer size in ObjectLoader copyTo
As ObjectStreams are supposed to be buffered, most implementors will
be wrapping their underlying stream inside of a BufferedInputStream
in order to satisfy this requirement. Because developers are by
nature lazy, they will use the default buffer size rather than
specify their own.
The OpenJDk JRE implementations use 8192 as the default buffer
size, and when the higher level reader uses the same buffer size
the buffers "stack" nicely by avoiding a copy to the internal
buffer array. As OpenJDK is a popular virtual machine, we should
try to benefit from this nice stacking property during copyTo().
Change-Id: I69d53f273b870b841ced2be2e9debdfd987d98f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of getting the limit from CoreConfig, use the larger of the
reader's limit or 5 MiB, under the assumption that any annotated tag
or commit of interest should be under 5 MiB. But if a repository
was really insane and had bigger objects, the reader implementation
can set its streaming limit higher in order to allow RevWalk to
still process it.
Change-Id: If2c15235daa3e2d1f7167e781aa83fedb5af9a30
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
canResetHead now returns true.
Resetting mixed / hard works in EGit in merging state.
Change-Id: I1512145bbd831bb9734528ce8b71b1701e3e6aa9
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Currently, if a branch is created that has special chars ('#' in the bug),
Config will surround the subsection name with double quotes during
it's toText method which will result in an invalid file after saving the
Config.
Bug: 318249
Change-Id: I0a642f52def42d936869e4aaaeb6999567901001
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Use 3 different types of LargeObjectException for the 3 major ways
that we can fail to load an object. For each of these use a unique
string translation which describes the root cause better than just
the ObjectId.name() does.
Change-Id: I810c98d5691b74af9fc6cbd46fc9879e35a7bdca
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Buffer very large delta streams to reduce explosion of CPU work
Large delta streams are unpacked incrementally, but because a delta
can seek to a random position in the base to perform a copy we may
need to inflate the base repeatedly just to complete one delta.
So work around it by copying the base to a temporary file, and then
we can read from that temporary file using random seeks instead.
Its far more efficient because we now only need to inflate the
base once.
This is still really ugly because we have to dump to a temporary
file, but at least the code can successfully process a large
file without throwing OutOfMemoryError. If speed is an
issue, the user will need to increase the JVM heap and ensure
core.streamFileThreshold is set to a higher value, so we don't use
this code path as often.
Unfortunately we lose the "optimization" of skipping over portions
of a delta base that we don't actually need in the final result.
This is going to cause us to inflate and write to disk useless
regions that were deleted and do not appear in the final result.
We could later improve on our code by trying to flatten delta
instruction streams before we touch the bottom base object, and
then only store the portions of the base we really need for the
final result and that appear out-of-order. Since that is some
pretty complex code I'm punting on it for now and just doing this
simple whole-object buffering.
Because the process umask might be permitting other users to read
files we create, we put the temporary buffers into $GIT_DIR/objects.
We can reasonably assume that if a reader can read our temporary
buffer file in that directory, they can also read the base pack
file we are pulling it from and therefore its not a security breach
to expose the inflated content in a file. This requires a reader
to have write access to the repository, but only if the file is
really big. I'd rather err on the side of caution here and refuse
to read a very big file into /tmp than to possibly expose a secured
content because the Java 5 JVM won't let us create a protected
temporary file that only the current user can access.
Change-Id: I66fb80b08cbcaf0f65f2db0462c546a495a160dd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
PersonIdent should be parsable for an invalid commit which
contains multiple authors, like "A <a@a.org>, B <b@b.org>".
PersonIdent(String) constructor now delegates to
RawParseUtils.parsePersonIdent().
Change-Id: Ie9798d36d9ecfcc0094ca795f5a44b003136eaf7
Increase the default streaming threshold to 15 MiB
Applying deltas in the large streaming mode is horrifically slow.
Trying to pack icu4c is impossible because a single 11 MiB file
sits on top of a 15 MiB file though a 10 deep delta chain, which
results in this very slow inflate process.
Upping the default limit to 15 MiB lets us process this large in a
reasonable time, but its still sufficiently low enough to prevent
exploding the heap of a very large process like Eclipse or Gerrit
Code Review.
We have to revisit the streaming delta application process and do
something much smarter, like flatten the delta chain before we apply
it to the base. But even that is ugly, I've seen a 155 MiB delta
sitting on top of a 450 MiB file to produce a 300 MiB result object.
If the chain is deep, we may have trouble flatting it down.
Change-Id: If5a0dcbf9d14ea683d75546f104b09bb8cd8fdbb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We didn't fully cover what we support and what we don't. It was
also a bit hard to follow the syntaxes supported. Clean that up
by documenting it.
Change-Id: I7b96fa6cbefcc2364a51f336712ad361ae42df2d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We can now resolve expressions that reference a path within a
commit, designating a specific revision of a specific tree or
file in the project.
Change-Id: Ie6a8be629d264d72209db894bd680c5900035cc0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We now match on the -gABBREV style output created by git describe
when its describing a non-tagged commit, and resolve that back to
the full ObjectId using the abbreviation resolution feature that
we already support.
Change-Id: Ib3033f9483d9e1c66c8bb721ff48d4485bcdaef1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Throw AmbiguousObjectException during resolve if its ambiguous
Its wrong to return null if we are resolving an abbreviation and we
have proven it matches more than one object. We know how to resolve
it if we had more nybbles, as there are two or more objects with the
same prefix. Declare that to the caller quite clearly by giving them
an AmbiguousObjectException.
Change-Id: I01bb48e587e6d001b93da8575c2c81af3eda5a32
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Use limited getCachedBytes code to reduce duplication
Rather than duplicating this block everywhere, reuse the limited size
form of getCachedBytes to acquire the content of an object.
Change-Id: I2e26a823e6fd0964d8f8dbfaa0fc2e8834c179c1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Add brute force byte array loading to ObjectLoader
Some algorithms are coded in a way that requires us to provide them
the entire object contents as a contiguous byte array. The parsers
in RevCommit and RevTag, or our RawText objects are really good
examples of these.
Instead of duplicating this logic everywhere, lets put it into the
base ObjectLoader type. That way the caller only needs to give us
their upper size bound, and we'll do the rest of the heavy work to
figure out if the object still fits within that bound, and get them
an array that has the complete contents.
Change-Id: Id95a7f79d2b97e39f6949370ccca2f2c9cfb1a0f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
If the loader's stream is broken and returns to us more content
than it originally declared as the size of the object, don't
copy that onto the output stream. Instead throw EOFException
and abort fast. This way we don't follow an infinite stream,
but instead will at least stop when the size was reached.
Change-Id: I7ec0c470c875f03b1f12a74a9b4d2f6e73b659bb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the stream is a delta decompression stream, getting the size
can be expensive. Its cheaper to get it from the stream itself
rather than from the object loader.
Change-Id: Ia7f0af98681f6d56ea419a48c6fa8eea09274b28
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectReader implementations are now responsible for creating the
unique abbreviation of an ObjectId, or for resolving an abbreviation
back to its full form. In this latter case the reader can offer up
multiple candidates to the caller, who may be able to disambiguate
them based on context.
Repository.resolve() doesn't take multiple candidates into account
right now, but it could in the future by looking for a remaining
^0 or ^{commit} suffix and take an expansion if there is only one
commit that matches the input abbreviation. It could also use
the distance from an annotated tag to resolve "tag-NNN-gcommit"
style strings that are often output by `git describe`.
Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectWriter is a deprecated API that people shouldn't be using.
So get rid of it in favor of the ObjectInserter API.
Change-Id: I6218bcb26b6b9ffb64e3e470dba5dca2e0a62fd4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Move commit and tag formatting to CommitBuilder, TagBuilder
These objects should be responsible for their own formatting,
rather than delegating it to some obtuse type called ObjectInserter.
While we are at it, simplify the way we insert these into a database.
Passing in the type and calling format in application code turned
out to be a huge mistake in terms of ease-of-use of the insert API.
Change-Id: Id5bb95ee56aa2a002243e9b7853b84ec8df1d7bf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since these types no longer support reading, calling them a Builder
is a better description of what they do. They help the caller to
build a commit or a tag object.
Change-Id: I53cae5a800a66ea1721b0fe5e702599df31da05d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Add documentation explaining how to read Commit and Tag
Since we stopped supporting these types for reading, but their
name is a natural candidate for someone to try and use in code,
explain where they should be looking instead.
Change-Id: I091a1b0ef71b842016020f938ba3161431aab9c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Perform automatic CRLF to LF conversion during WorkingTreeIterator
WorkingTreeIterator now optionally performs CRLF to LF conversion for
text files. A basic framework is left in place to support enabling
(or disabling) this feature based on gitattributes, and also to
support the more generic smudge/clean filter system. As there is
no gitattribute support yet in JGit this is left unimplemented,
but the mightNeedCleaning(), isBinary() and filterClean() methods
will provide reasonable places to plug that into in the future.
[sp: All bugs inside of WorkingTreeIterator are my fault, I wrote
most of it while cherry-picking this patch and building it on
top of Marc's original work.]
CQ: 4419
Bug: 301775
Change-Id: I0ca35cfbfe3f503729cbfc1d5034ad4abcd1097e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
An ObjectReader implementation may be very slow for a single object,
but yet support bulk queries efficiently by batching multiple small
requests into a single larger request. This easily happens when the
reader is built on top of a database that is stored on another host,
as the network round-trip time starts to dominate the operation cost.
RevWalk, ObjectWalk, UploadPack and PackWriter are the first major
users of this new bulk interface, with the goal being to support an
efficient way to pack a repository for a fetch/clone client when the
source repository is stored in a high-latency storage system.
Processing the want/have lists is now done in bulk, to remove
the high costs associated with common ancestor negotiation.
PackWriter already performs object reuse selection in bulk, but it
now can also do the object size lookup and object counting phases
with higher efficiency. Actual object reuse, deltification, and
final output are still doing sequential lookups, making them a bit
more expensive to perform.
Change-Id: I4c966f84917482598012074c370b9831451404ee
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
By giving the reader information about the roots of a revision
traversal, some readers may be able to prefetch information from
their backing store using background threads in order to reduce
data access latency. However this isn't typically necessary so
the default reader implementation doesn't react to the advice.
Change-Id: I72c6cbd05cff7d8506826015f50d9f57d5cda77e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Storage implementations or application code using an ObjectReader
may want to access this constant without being inside of a subclass
of the reader.
Change-Id: I6c871a03d5846b9bb899de4d14a265e8b204d8e0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This permits formatting in hex into an existing byte array
supplied by the caller, and mirrors our copyRawTo method
with the same parameter signature.
Change-Id: Ia078d83e338b09b903bfd2d04284e5283f885a19
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Tag class now only supports the creation of an annotated tag
object. To read an annotated tag, applictions should use RevTag.
This permits us to have exactly one implementation, and RevTag's
is faster and more bug-free.
Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Commit class now only supports the creation of a commit object.
To read a commit, applictions should use RevCommit. This permits
us to have exactly one implementation, and RevCommit's is faster
and more bug-free.
Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Correct PersonIdent hashCode() and equals() to ignore milliseconds
Git doesn't store millisecond accuracy in person identity lines,
so a line that we create in Java and round-trip through a Git object
wouldn't compare as being equal. Truncate to seconds when comparing
values to ensure the same identity is equal.
Change-Id: Ie4ebde64061f52c612714e89ad34de8ac2694b07
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since equals() is now final and does not permit being overridden,
we should do the same thing with compareTo() to prevent different
subclasses from having different ordering behaviors. This could
lead to the same mess that we had with different equals() behaviors.
Change-Id: I35a849b6efccee5fe74cc5788a3566a1516004b7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since equals() is now final and does not permit being overridden,
we should do the same thing with hashCode() to prevent different
subclasses from having different hashing behaviors. This could
lead to the same mess that we had with different equals() behaviors.
Change-Id: I35a849b6efccee5fe74cc5788a3566a1516004b7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Configuration change events were not being triggered, now they are
forwarded from the FileConfig up to the Repository's listeners.
Change-Id: Ida94a59f5a2b7fa8ae0126e33c13343275483ee5
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We should be more lenient when tagging without an
tagger or message. Currently, we will throw an NPE
which is incorrect behavior.
Change-Id: I04e30ce25a9432e4ca56c3f29658ecb24fb18d24
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There was a duplicated getter and setter for tagger in Tag.
There's no needed to have two getters and setters that represent
the same things. The appropriate tests were updated also.
Change-Id: If46dc00c4c0f31ea4234c6d3bda3c03e6ebbafac
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Exclude ignored files from IndexDiff tree walk.
This makes EGit commit much faster.
Change-Id: I398499510c22c37667b7612db32eac3b31d325f0
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Add gitignore support to IndexDiff and use TreeWalk
IndexDiff was re-implemented and now uses TreeWalk instead
of GitIndex. Additionally, gitignore support and retrieval of
untracked files was added.
Change-Id: Ie6a8e04833c61d44c668c906b161202b200bb509
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
This refactoring permits applications to configure global per-process
settings for all packing and easily pass it through to per-request
PackWriters, ensuring that the process configuration overrides the
repository specific settings.
For example this might help in a daemon environment where the server
wants to cap the resources used to serve a dynamic upload pack
request, even though the repository's own pack.* settings might be
configured to be more aggressive. This allows fast but less bandwidth
efficient serving of clients, while still retaining good compression
through a cron managed `git gc`.
Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Meaningful error message when trying to check-out submodules
Currently, a NullPointerException occurs in this case. We should
instead throw a more meaningful Exception with a proper message.
This is a very "stupid" implementation which simply checks for
the existence of a ".gitmodules" file.
Bug: 300731
Bug: 306765
Bug: 308452
Bug: 314853
Change-Id: I155aa340a85cbc5d7d60da31dba199fc30689b67
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Add methods to the Repository class which write into MERGE_HEAD
and MERGE_MSG files. Since we have the read methods in the same
class this seems to be the right place.
Change-Id: I5dd65306ceb06e008fcc71b37ca3a649632ba462
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix concurrent read / write issue in LockFile on Windows
LockFile.commit fails if another thread concurrently reads
the base file. The problem is fixed by retrying the rename
operation if it fails.
Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb
Bug: 308506
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Make StoredConfig an abstraction above FileBasedConfig
This exposes a load and save method, allowing a Repository to denote
that it has a persistent configuration of some kind which can be
accessed by the application, without needing to know exact details
of how its stored .
Change-Id: I7c414bc0f975b80f083084ea875eca25c75a07b2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix concurrent read / write issue in GitIndex on Windows
GitIndex.write fails if another thread concurrently reads
the index file. The problem is fixed by retrying the rename
operation if it fails.
Bug: 311051
Change-Id: Ib243d2a90adae312712d02521de4834d06804944
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Smudge racily clean index entries by truncating length (like git.git)
To mark an entry racily clean we set its length to 0 (like native git
does). Entries which are not racily clean and have zero length can be
distinguished from racily clean entries by checking P_OBJECTID
against the SHA1 of empty content. When length is 0 and P_OBJECTID is
different from SHA1 of empty content we know the entry is marked
racily clean.
See http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00488.html
Change-Id: I689552931441ab51964b430b303160c9126b66af
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>