Base Java version for JGit is now Java 7. The java.text.Normalizer
class was available in Java 6. Reflection is no longer required to
normalize strings for Mac OS X.
Change-Id: I98e14b72629a7a729a2d40a3aa275932841274e8
Add fsck.allowInvalidPersonIdent to accept invalid author/committers
A larger than expected number of real-world repositories found on
the Internet contain invalid author, committer and tagger lines
in their history. Many of these seem to be caused by users misusing
the user.name and user.email fields, e.g.:
[user]
name = Au Thor <author@example.com>
email = author@example.com
that some version of Git (or a reimplementation thereof) copied
directly into the object header. These headers are not valid and
are rejected by a strict fsck, making it impossible to transfer
the repository with JGit/EGit.
Another form is an invalid committer line with double negative for
the time zone, e.g.
committer Au Thor <a@b> 1288373970 --700
The real world is messy. :(
Allow callers and users to weaken the fsck settings to accept these
sorts of breakages if they really want to work on a repo that has
broken history. Most routines will still function fine, however
commit timestamp sorting in RevWalk may become confused by a corrupt
committer line and sort commits out of order. This is mostly fine if
the corrupted chain is shorter than the slop window.
Change-Id: I6d529542c765c131de590f4f7ef8e7c1c8cb9db9
ObjectChecker: Disallow names potentially mapping to ".git" on HFS+
Mac's HFS+ folds concatentations of ".git" and ignorable Unicode
characters [1] to ".git" [2]. Hence we need to disallow all names which
could potentially be a shortname for ".git". Example: in an empty
directory create a folder ".g\U+200Cit". Now you can't create another
folder ".git".
The following characters are ignorable Unicode which are ignored on
HFS+:
unicode hex name
-------------------------------------------------
U+200C 0xe2808c ZERO WIDTH NON-JOINER
U+200D 0xe2808d ZERO WIDTH JOINER
U+200E 0xe2808e LEFT-TO-RIGHT MARK
U+200F 0xe2808f RIGHT-TO-LEFT MARK
U+202A 0xe280aa LEFT-TO-RIGHT EMBEDDING
U+202B 0xe280ab RIGHT-TO-LEFT EMBEDDING
U+202C 0xe280ac POP DIRECTIONAL FORMATTING
U+202D 0xe280ad LEFT-TO-RIGHT OVERRIDE
U+202E 0xe280ae RIGHT-TO-LEFT OVERRIDE
U+206A 0xe281aa INHIBIT SYMMETRIC SWAPPING
U+206B 0xe281ab ACTIVATE SYMMETRIC SWAPPING
U+206C 0xe281ac INHIBIT ARABIC FORM SHAPING
U+206D 0xe281ad ACTIVATE ARABIC FORM SHAPING
U+206E 0xe281ae NATIONAL DIGIT SHAPES
U+206F 0xe281af NOMINAL DIGIT SHAPES
U+FEFF 0xefbbbf ZERO WIDTH NO-BREAK SPACE
[1] http://www.unicode.org/versions/Unicode7.0.0/ch05.pdf#G40025http://www.unicode.org/reports/tr31/#Layout_and_Format_Control_Characters
[2] http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.html#UnicodeSubtleties
Change-Id: Ib6a1dd090b2649bdd8ec16387c994ed29de2860d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Windows creates shortnames for all non-8.3 files (see [1]). Hence we
need to disallow all names which could potentially be a shortname for
".git". Example: in an empty directory create a folder "GIT~1". Now you
can't create another folder ".git".
The path "GIT~1" may map to ".git" on Windows. A potential victim to
such an attack first has to initialize a git repository in order to
receive any git commits. Hence the .git folder created by init will get
the shortname "GIT~1". ".git" will only get a different shortname if the
user has created a file "GIT~1" before initialization of the git
repository.
[1] http://en.wikipedia.org/wiki/8.3_filename
Change-Id: I9978ab8f2d2951c46c1b9bbde57986d64d26b9b2
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Windows treats "foo." and "foo " as "foo". The ".git" directory is
special, as it contains metadata for a local Git repository. Disallow
variations that Windows considers to be the same.
Change-Id: I28eb48859a95a89111b4987c91de97557e3bb539
Always ignore case when forbidding .git in ObjectChecker
The component name ".GIT" inside a tree entry could confuse a
case insensitive filesystem into looking at a submodule and
not a directory entry.
Disallow any case permutations of ".git" to prevent this
confusion from entering a repository and showing up at a
later date on a case insensitive system.
Change-Id: Iaa3f768931d0d5764bf07ac5f6f3ff2b1fdda01b
Move checkPath from DirCacheCheckout to ObjectChecker
The bulk of the "is this sane" logic is inside of ObjectChecker. The
only caller for the version in DirCacheCheckout is an obtuse usage for
the static isValidRefName() method in Repository.
Deprecate the weird single use method in DirCacheCheckout and move all
code for checking a sequence of path components into ObjectChecker,
where it makes sense alongside the existing code that checks a single
component at a time.
Reuse a single ObjectChecker for the local platform, to avoid looking
up the system properties on each path string considered.
Change-Id: Iae6e769f2bfcad05c166e70ff255f9cf9fcdc87e
When safeForMacOS is enabled the checker verifies a name does not
match against another name in the same tree after normalization to
NFC. The check was incorrect and failed when the first name was put
in, rejecting simple trees containing only one file like "F".
Add a test for this simple tree to verify it is accepted.
Fix the test for NFC normalization to actually normalize
and have a collision.
Change-Id: I39e7d71150948872bff6cd2b06bf8dae52aa3c33
Check for duplicate names after folding case in ObjectChecker
Mac OS X and Windows filesystems are generally case insensitive and
will fold 'a' and 'A' to the same directory entry. If the checker is
enforcing safe semantics for these platforms, track all names and
look for duplicates after folding case and normalizing to NFC.
Change-Id: I170b6f649a72d6ef322b7254943d4c604a8d25b9
Change DirCacheCheckout to verify path using ObjectChecker
Reuse the generic logic in ObjectChecker to examine paths.
This required extracting the scanner loop to check for bad
characters within the path name segment.
Change-Id: I02e964d114fb544a0c1657790d5367c3a2b09dff
Most Mac OS X systems use a case insensitive HFS+ volume. Like
Windows ".git" and ".GIT" are the same path and can confuse a Git
program into expecting a repository where one does not exist.
Change-Id: Iec6ce9e6c2872f8b0850cc6aec023fa0fcb05ae4
Reject special Windows device names in ObjectChecker
If Windows rejection is enabled reject special device names like
NUL and PRN, including NUL.txt. This prevents a tree that might
be used on a Windows client from referencing a confusing name.
Change-Id: Ic700ea8fa68724509e0357d4b758a41178c4d70c
Allow an ObjectChecker to reject special characters for Windows
Repositories that are frequently checked out on Windows platforms
may need to ensure trees do not contain strange names that cause
problems on those systems. Follow the MSDN guidelines and refuse
to accept a tree containing a special character, or names that end
with " " (space) or "." (dot).
Since Windows filesystems are usually case insensitive, also reject
mixed case versions of the reserved ".git" name.
Change-Id: Ic3042444b1e162c6d01b88c7e6ea39b2a73c4eca
Using .git as a name in a tree is invalid for most Git repositories.
This can confuse clients into thinking there is a submodule or another
repository deeper in the tree, which is incorrect.
Change-Id: I90a1eaf25d45e91557f3f548b69cdcd8f7cddce1
Extract path segment check function in ObjectChecker
Start pulling out the path segment checking. This will be used
later to support DirCacheCheckout verification of paths, after
folding that logic into this location.
Change-Id: I66eaee5c988eb7d425fb7a708ef6f5419ab77348
Permit ObjectChecker to optionally accept leading '0' in trees
The leading '0' is a broken mode that although incorrect in the
Git canonical tree format was created by a couple of libraries
frequently used on a popular Git hosting site. Some projects have
these modes stuck in their ancient history and cannot easily
repair the damage without a full history rewrite. Optionally permit
ObjectChecker to ignore them.
Bug: 307291
Change-Id: Ib921dfd77ce757e89280d1c00328a88430daef35
A few classes such as Constanrs are marked with @SuppressWarnings, as are
toString() methods with many liternal, but otherwise $NLS-n$ is used for
string containing text that should not be translated. A few literals may
fall into the gray zone, but mostly I've tried to only tag the obvious
ones.
Change-Id: I22e50a77e2bf9e0b842a66bdf674e8fa1692f590
Use Integer, Character, and Long valueOf methods when
passing parameters to MessageFormat and other places
that expect objects instead of primitives
Change-Id: I5942fbdbca6a378136c00d951ce61167f2366ca4
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Relax ObjectChecker to permit missing tagger lines
Annotated tags created with C Git versions before the introduction
of c818566 ([PATCH] Update tags to record who made them, 2005-07-14),
do not have a "tagger" line present in the object header. This line
did not appear in C Git until v0.99.1~9.
Ancient projects such as the Linux kernel contain such tags, for
example Linux 2.6.12 is older than when this feature first appeared
in C Git. Linux v2.6.13-rc4 in late July 2005 is the first kernel
version tag to actually contain a tagger line.
It is therefore acceptable for the header to be missing, and for
the RevTag.getTaggerIdent() method to return null.
Since the Javadoc for getTaggerIdent() already explained that the
identity may be null, we just need to test that this is true when
the header is missing, and allow the ObjectChecker to pass anyway.
Change-Id: I34ba82e0624a0d1a7edcf62ffba72260af6f7e5d
See: http://code.google.com/p/gerrit/issues/detail?id=399
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since Constants.OBJECT_ID_LENGTH is a compile time constant we
can be sure that it will always be inlined. The same goes for the
associated constant STR_LEN which is now refactored to the Constant
class and given a name better suited for wider use.
Change-Id: I03f52131e64edcd0aa74bbbf36e7d42faaf4a698
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>