
WorkingTreeIterator.java 48KB

Fix core.autocrlf for non-normalized index

With text=auto or core.autocrlf=true, git does not normalize on check-in if the file in the index already contains CR/LFs. The documentation says: "When text is set to "auto", the path is marked for automatic end-of-line conversion. If Git decides that the content is text, its line endings are converted to LF on checkin. When the file has been committed with CRLF, no conversion is done."[1]

Implement the last bit as in canonical git: check the blob in the index for CR/LFs. For very large files, we check only the first 8000 bytes, like RawText.isBinary() and AutoLFInputStream do. In Auto(CR)LFInputStream, ensure that the buffer is filled as much as possible for the isBinary() check.

Regarding these content checks, there are a number of inconsistencies:

* Canonical git considers files containing lone CRs as binary.
* RawText checks the first 8000 bytes.
* Auto(CR)LFInputStream checks the first 8096 (not 8192!) bytes.

None of these are changed with this commit. It appears that canonical git will check the whole blob, not just the first 8k bytes.

Also note: the check for CR/LF text won't work with LFS (neither in JGit nor in git) since the blob data is not run through the smudge filter. Cf. [2].

Two tests in AddCommandTest actually tested that normalization was done even if the file had already been committed with CR/LF. These tests had to be adapted.

I find the git documentation unclear about the case where core.autocrlf=input, but from [3] it looks as if this non-normalization also applies in that case.

Add new tests in CommitCommandTest covering the case where the index entry is for a merge conflict. In that case, canonical git uses the "ours" version.[4] Do the same.

[1] https://git-scm.com/docs/gitattributes
[2] https://github.com/git/git/blob/3434569fc/convert.c#L225
[3] https://github.com/git/git/blob/3434569fc/convert.c#L529
[4] https://github.com/git/git/blob/f2b6aa98b/read-cache.c#L3281

Bug: 470643
Change-Id: Ie7310539fbe6c737d78b1dcc29e34735d4616b88
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
5 years ago
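A minimal sketch of the check described in that commit message — hypothetical names, not JGit's actual implementation — illustrating how an indexed blob could be scanned for CR/LF within the first 8000 bytes:

final class CrLfCheck {
	// Look only at the first 8000 bytes, mirroring RawText.isBinary()
	// and AutoLFInputStream as described in the commit message.
	private static final int FIRST_FEW_BYTES = 8000;

	// Hypothetical helper: true if the blob already contains a CR/LF
	// pair, in which case check-in normalization is skipped.
	static boolean blobHasCrLf(byte[] blob) {
		int limit = Math.min(blob.length, FIRST_FEW_BYTES);
		for (int i = 1; i < limit; i++) {
			if (blob[i] == '\n' && blob[i - 1] == '\r') {
				return true;
			}
		}
		return false;
	}
}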
Fix computation of id in WorkingTreeIterator with autocrlf and smudging

JGit failed to do checkouts when the index contained smudged entries and autocrlf was on. In such cases the WorkingTreeIterator sometimes calculated the SHA1 on content which was not correctly filtered: the SHA1 was computed on content which went through an lf->crlf conversion twice.

We used to tell the treewalk whether it was a checkin or checkout operation and always used the related filters when reading any content. If, on Windows with autocrlf=true, we did a checkout operation, then we always applied an lf->crlf conversion to any text content. That's not correct: even during a checkout we sometimes need the crlf->lf conversion. E.g. when calculating the content-id for working-tree content we need to use crlf->lf filtering although the overall operation type is checkout.

Often this bug has no effect because we seldom compute the content-id of filesystem content during a checkout. But we do need to know whether a file is dirty before we overwrite it during a checkout, and if the index entries are smudged we don't trust the index and compute filesystem-content SHA1s explicitly. This caused EGit to no longer be able to switch branches on Windows when autocrlf was true: EGit denied the checkout because it thought working-tree files were dirty, since content SHA1s were computed on wrongly filtered content.

Bug: 493360
Change-Id: I1072a57b4c529ba3aaa50b7b02d2b816bb64a9b8
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
8 years ago
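The idea of that fix, as a simplified sketch (illustrative only; JGit streams content through filters rather than buffering it like this): when computing the content id of working-tree content, apply the check-in (crlf->lf) conversion even though the surrounding operation is a checkout, then hash in git's blob format.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class ContentIdSketch {
	// Hash working-tree bytes as git would hash a blob, after crlf->lf
	// normalization, so the result is comparable to the index entry.
	static byte[] contentId(byte[] workingTreeBytes)
			throws NoSuchAlgorithmException {
		byte[] normalized = crLfToLf(workingTreeBytes);
		MessageDigest md = MessageDigest.getInstance("SHA-1");
		md.update(("blob " + normalized.length + "\0")
				.getBytes(StandardCharsets.US_ASCII));
		return md.digest(normalized);
	}

	private static byte[] crLfToLf(byte[] in) {
		byte[] out = new byte[in.length];
		int n = 0;
		for (int i = 0; i < in.length; i++) {
			if (in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n') {
				continue; // drop the CR of a CR/LF pair
			}
			out[n++] = in[i];
		}
		return Arrays.copyOf(out, n);
	}
}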
Significantly speed up FileTreeIterator on Windows

Getting attributes of files on Windows is an expensive operation. Windows stores file attributes in the directory, so they are basically available "for free" when a directory is listed. The implementation of Java's Files.walkFileTree() takes advantage of that (at least in the OpenJDK implementation for Windows) and provides the attributes from the directory to a FileVisitor.

Using Files.walkFileTree() with a maximum depth of 1 is thus a good approach on Windows to get both the file names and the attributes in one go, as shown in the sketch below. In my tests, this gives a significant speed-up of FileTreeIterator over the "normal" way: using File.listFiles() and then reading the attributes of each file individually. The speed-up is hard to quantify exactly, but in my tests I've observed consistently 30-40% for staging 500 files one after another, each individually, and up to 50% for individual TreeWalks with a FileTreeIterator.

On Unix, this technique is detrimental. Unix stores file attributes differently, and getting attributes of individual files is not costly. On Unix, the old way of doing a listFiles() and getting individual attributes (both native operations) is about three times faster than using walkFileTree(), which is implemented in Java.

Therefore, move the operation to FS/FS_Win32 and call it from FileTreeIterator, so that we can have different implementations depending on the file system. A little performance test program is included as a JUnit test (to be run manually).

While this does speed up things on Windows, it doesn't solve the basic problem of bug 532300: the iterator always gets the full directory listing and the attributes of all files, and the more files there are, the longer that takes.

Bug: 532300
Change-Id: Ic5facb871c725256c2324b0d97b95e6efc33282a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
6 years ago
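A minimal sketch of that technique (not the actual FS/FS_Win32 code): list one directory level with Files.walkFileTree() at maximum depth 1, taking the attributes straight from the visitor callback instead of querying each file separately.

import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

final class DirListSketch {
	// On Windows, OpenJDK fills 'attrs' from the directory listing
	// itself, avoiding one attribute lookup per file. At the maximum
	// depth, subdirectories are also delivered to visitFile().
	static List<String> listWithAttributes(Path dir) throws IOException {
		List<String> result = new ArrayList<>();
		Files.walkFileTree(dir, EnumSet.noneOf(FileVisitOption.class), 1,
				new SimpleFileVisitor<Path>() {
					@Override
					public FileVisitResult visitFile(Path file,
							BasicFileAttributes attrs) {
						result.add(file.getFileName() + " size="
								+ attrs.size() + " mtime="
								+ attrs.lastModifiedTime());
						return FileVisitResult.CONTINUE;
					}
				});
		return result;
	}
}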
/*
 * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
 * Copyright (C) 2010, Christian Halstrick <christian.halstrick@sap.com>
 * Copyright (C) 2010, Matthias Sohn <matthias.sohn@sap.com>
 * Copyright (C) 2012-2021, Robin Rosenberg and others
 *
 * This program and the accompanying materials are made available under the
 * terms of the Eclipse Distribution License v. 1.0 which is available at
 * https://www.eclipse.org/org/documents/edl-v10.php.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

package org.eclipse.jgit.treewalk;

import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.file.Path;
import java.text.MessageFormat;
import java.time.Instant;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

import org.eclipse.jgit.api.errors.FilterFailedException;
import org.eclipse.jgit.attributes.AttributesNode;
import org.eclipse.jgit.attributes.AttributesRule;
import org.eclipse.jgit.attributes.FilterCommand;
import org.eclipse.jgit.attributes.FilterCommandRegistry;
import org.eclipse.jgit.diff.RawText;
import org.eclipse.jgit.dircache.DirCacheEntry;
import org.eclipse.jgit.dircache.DirCacheIterator;
import org.eclipse.jgit.errors.CorruptObjectException;
import org.eclipse.jgit.errors.LargeObjectException;
import org.eclipse.jgit.errors.MissingObjectException;
import org.eclipse.jgit.errors.NoWorkTreeException;
import org.eclipse.jgit.ignore.FastIgnoreRule;
import org.eclipse.jgit.ignore.IgnoreNode;
import org.eclipse.jgit.internal.JGitText;
import org.eclipse.jgit.lib.ConfigConstants;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.CoreConfig.CheckStat;
import org.eclipse.jgit.lib.CoreConfig.EolStreamType;
import org.eclipse.jgit.lib.CoreConfig.SymLinks;
import org.eclipse.jgit.lib.FileMode;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectLoader;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.submodule.SubmoduleWalk;
import org.eclipse.jgit.treewalk.TreeWalk.OperationType;
import org.eclipse.jgit.util.FS;
import org.eclipse.jgit.util.FS.ExecutionResult;
import org.eclipse.jgit.util.FileUtils;
import org.eclipse.jgit.util.Holder;
import org.eclipse.jgit.util.IO;
import org.eclipse.jgit.util.Paths;
import org.eclipse.jgit.util.RawParseUtils;
import org.eclipse.jgit.util.TemporaryBuffer;
import org.eclipse.jgit.util.TemporaryBuffer.LocalFile;
import org.eclipse.jgit.util.io.EolStreamTypeUtil;
import org.eclipse.jgit.util.sha1.SHA1;
/**
 * Walks a working directory tree as part of a
 * {@link org.eclipse.jgit.treewalk.TreeWalk}.
 * <p>
 * Most applications will want to use the standard implementation of this
 * iterator, {@link org.eclipse.jgit.treewalk.FileTreeIterator}, as that does
 * all IO through the standard <code>java.io</code> package. Plugins for a Java
 * based IDE may however wish to create their own implementations of this class
 * to allow traversal of the IDE's project space, as well as benefit from any
 * caching the IDE may have.
 *
 * @see FileTreeIterator
 */
public abstract class WorkingTreeIterator extends AbstractTreeIterator {
	private static final int MAX_EXCEPTION_TEXT_SIZE = 10 * 1024;

	/** An empty entry array, suitable for {@link #init(Entry[])}. */
	protected static final Entry[] EOF = {};

	/** Size we perform file IO in if we have to read and hash a file. */
	static final int BUFFER_SIZE = 2048;

	/**
	 * Maximum size of files which may be read fully into memory for
	 * performance reasons.
	 */
	private static final long MAXIMUM_FILE_SIZE_TO_READ_FULLY = 65536;

	/** Inherited state of this iterator, describing working tree, etc. */
	private final IteratorState state;

	/** The {@link #idBuffer()} for the current entry. */
	private byte[] contentId;

	/** Index within {@link #entries} that {@link #contentId} came from. */
	private int contentIdFromPtr;

	/** List of entries obtained from the subclass. */
	private Entry[] entries;

	/** Total number of entries in {@link #entries} that are valid. */
	private int entryCnt;

	/** Current position within {@link #entries}. */
	private int ptr;

	/** If there is a .gitignore file present, the parsed rules from it. */
	private IgnoreNode ignoreNode;

	/**
	 * cached clean filter command. Use a Ref in order to distinguish between
	 * the ref not cached yet and the value null
	 */
	private Holder<String> cleanFilterCommandHolder;

	/**
	 * cached eol stream type. Use a Ref in order to distinguish between the
	 * ref not cached yet and the value null
	 */
	private Holder<EolStreamType> eolStreamTypeHolder;

	/** Repository that is the root level being iterated over */
	protected Repository repository;

	/** Cached canonical length, initialized from {@link #idBuffer()} */
	private long canonLen = -1;

	/** The offset of the content id in {@link #idBuffer()} */
	private int contentIdOffset;

	/** A comparator for {@link Instant}s. */
	private final InstantComparator timestampComparator = new InstantComparator();
	/**
	 * Create a new iterator with no parent.
	 *
	 * @param options
	 *            working tree options to be used
	 */
	protected WorkingTreeIterator(WorkingTreeOptions options) {
		super();
		state = new IteratorState(options);
	}

	/**
	 * Create a new iterator with no parent and a prefix.
	 * <p>
	 * The prefix path supplied is inserted in front of all paths generated by
	 * this iterator. It is intended to be used when an iterator is being
	 * created for a subsection of an overall repository and needs to be
	 * combined with other iterators that are created to run over the entire
	 * repository namespace.
	 *
	 * @param prefix
	 *            position of this iterator in the repository tree. The value
	 *            may be null or the empty string to indicate the prefix is the
	 *            root of the repository. A trailing slash ('/') is
	 *            automatically appended if the prefix does not end in '/'.
	 * @param options
	 *            working tree options to be used
	 */
	protected WorkingTreeIterator(final String prefix,
			WorkingTreeOptions options) {
		super(prefix);
		state = new IteratorState(options);
	}

	/**
	 * Create an iterator for a subtree of an existing iterator.
	 *
	 * @param p
	 *            parent tree iterator.
	 */
	protected WorkingTreeIterator(WorkingTreeIterator p) {
		super(p);
		state = p.state;
		repository = p.repository;
	}

	/**
	 * Initialize this iterator for the root level of a repository.
	 * <p>
	 * This method should only be invoked after calling {@link #init(Entry[])},
	 * and only for the root iterator.
	 *
	 * @param repo
	 *            the repository.
	 */
	protected void initRootIterator(Repository repo) {
		repository = repo;
		Entry entry;
		if (ignoreNode instanceof PerDirectoryIgnoreNode)
			entry = ((PerDirectoryIgnoreNode) ignoreNode).entry;
		else
			entry = null;
		ignoreNode = new RootIgnoreNode(entry, repo);
	}

	/**
	 * Define the matching {@link org.eclipse.jgit.dircache.DirCacheIterator},
	 * to optimize ObjectIds.
	 *
	 * Once the DirCacheIterator has been set this iterator must only be
	 * advanced by the TreeWalk that is supplied, as it assumes that itself and
	 * the corresponding DirCacheIterator are positioned on the same file path
	 * whenever {@link #idBuffer()} is invoked.
	 *
	 * @param walk
	 *            the walk that will be advancing this iterator.
	 * @param treeId
	 *            index of the matching
	 *            {@link org.eclipse.jgit.dircache.DirCacheIterator}.
	 */
	public void setDirCacheIterator(TreeWalk walk, int treeId) {
		state.walk = walk;
		state.dirCacheTree = treeId;
	}

	/**
	 * Retrieves the {@link DirCacheIterator} at the current entry if
	 * {@link #setDirCacheIterator(TreeWalk, int)} was called.
	 *
	 * @return the DirCacheIterator, or {@code null} if not set or not at the
	 *         current entry
	 * @since 5.0
	 */
	protected DirCacheIterator getDirCacheIterator() {
		if (state.dirCacheTree >= 0 && state.walk != null) {
			return state.walk.getTree(state.dirCacheTree,
					DirCacheIterator.class);
		}
		return null;
	}

	/**
	 * Defines whether this {@link WorkingTreeIterator} walks ignored
	 * directories.
	 *
	 * @param includeIgnored
	 *            {@code false} to skip ignored directories, if possible;
	 *            {@code true} to always include them in the walk
	 * @since 5.0
	 */
	public void setWalkIgnoredDirectories(boolean includeIgnored) {
		state.walkIgnored = includeIgnored;
	}

	/**
	 * Tells whether this {@link WorkingTreeIterator} walks ignored
	 * directories.
	 *
	 * @return {@code true} if it does, {@code false} otherwise
	 * @since 5.0
	 */
	public boolean walksIgnoredDirectories() {
		return state.walkIgnored;
	}
	/** {@inheritDoc} */
	@Override
	public boolean hasId() {
		if (contentIdFromPtr == ptr)
			return true;
		return (mode & FileMode.TYPE_MASK) == FileMode.TYPE_FILE;
	}

	/** {@inheritDoc} */
	@Override
	public byte[] idBuffer() {
		if (contentIdFromPtr == ptr)
			return contentId;

		if (state.walk != null) {
			// If there is a matching DirCacheIterator, we can reuse
			// its idBuffer, but only if we appear to be clean against
			// the cached index information for the path.
			DirCacheIterator i = state.walk.getTree(state.dirCacheTree,
					DirCacheIterator.class);
			if (i != null) {
				DirCacheEntry ent = i.getDirCacheEntry();
				if (ent != null && compareMetadata(ent) == MetadataDiff.EQUAL
						&& ((ent.getFileMode().getBits()
								& FileMode.TYPE_MASK) != FileMode.TYPE_GITLINK)) {
					contentIdOffset = i.idOffset();
					contentIdFromPtr = ptr;
					return contentId = i.idBuffer();
				}
				contentIdOffset = 0;
			} else {
				contentIdOffset = 0;
			}
		}
		switch (mode & FileMode.TYPE_MASK) {
		case FileMode.TYPE_SYMLINK:
		case FileMode.TYPE_FILE:
			contentIdFromPtr = ptr;
			return contentId = idBufferBlob(entries[ptr]);
		case FileMode.TYPE_GITLINK:
			contentIdFromPtr = ptr;
			return contentId = idSubmodule(entries[ptr]);
		}
		return zeroid;
	}
	/** {@inheritDoc} */
	@Override
	public boolean isWorkTree() {
		return true;
	}

	/**
	 * Get submodule id for given entry.
	 *
	 * @param e
	 *            a {@link org.eclipse.jgit.treewalk.WorkingTreeIterator.Entry}
	 *            object.
	 * @return non-null submodule id
	 */
	protected byte[] idSubmodule(Entry e) {
		if (repository == null)
			return zeroid;
		File directory;
		try {
			directory = repository.getWorkTree();
		} catch (NoWorkTreeException nwte) {
			return zeroid;
		}
		return idSubmodule(directory, e);
	}

	/**
	 * Get submodule id using the repository at the location of the entry
	 * relative to the directory.
	 *
	 * @param directory
	 *            a {@link java.io.File} object.
	 * @param e
	 *            a {@link org.eclipse.jgit.treewalk.WorkingTreeIterator.Entry}
	 *            object.
	 * @return non-null submodule id
	 */
	protected byte[] idSubmodule(File directory, Entry e) {
		try (Repository submoduleRepo = SubmoduleWalk.getSubmoduleRepository(
				directory, e.getName(),
				repository != null ? repository.getFS() : FS.DETECTED)) {
			if (submoduleRepo == null) {
				return zeroid;
			}
			ObjectId head = submoduleRepo.resolve(Constants.HEAD);
			if (head == null) {
				return zeroid;
			}
			byte[] id = new byte[Constants.OBJECT_ID_LENGTH];
			head.copyRawTo(id, 0);
			return id;
		} catch (IOException exception) {
			return zeroid;
		}
	}
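
	// A minimal sketch of what idSubmodule() computes: the submodule's id is
	// the ObjectId of the HEAD commit of the repository nested at the entry's
	// path. "repo" and the path "lib" below are illustrative.
	//
	//   try (Repository sub = SubmoduleWalk.getSubmoduleRepository(
	//           repo.getWorkTree(), "lib", repo.getFS())) {
	//       ObjectId head = sub != null ? sub.resolve(Constants.HEAD) : null;
	//       // head == null (or sub == null) maps to zeroid above
	//   }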
	private static final byte[] digits = { '0', '1', '2', '3', '4', '5', '6',
			'7', '8', '9' };

	private static final byte[] hblob = Constants
			.encodedTypeString(Constants.OBJ_BLOB);

	private byte[] idBufferBlob(Entry e) {
		try {
			final InputStream is = e.openInputStream();
			if (is == null)
				return zeroid;
			try {
				state.initializeReadBuffer();

				final long len = e.getLength();
				InputStream filteredIs = possiblyFilteredInputStream(e, is, len,
						OperationType.CHECKIN_OP);
				return computeHash(filteredIs, canonLen);
			} finally {
				safeClose(is);
			}
		} catch (IOException err) {
			// Can't read the file? Don't report the failure either.
			return zeroid;
		}
	}
	private InputStream possiblyFilteredInputStream(final Entry e,
			final InputStream is, final long len) throws IOException {
		return possiblyFilteredInputStream(e, is, len, null);
	}

	private InputStream possiblyFilteredInputStream(final Entry e,
			final InputStream is, final long len, OperationType opType)
			throws IOException {
		if (getCleanFilterCommand() == null
				&& getEolStreamType(opType) == EolStreamType.DIRECT) {
			canonLen = len;
			return is;
		}

		if (len <= MAXIMUM_FILE_SIZE_TO_READ_FULLY) {
			ByteBuffer rawbuf = IO.readWholeStream(is, (int) len);
			rawbuf = filterClean(rawbuf.array(), rawbuf.limit(), opType);
			canonLen = rawbuf.limit();
			return new ByteArrayInputStream(rawbuf.array(), 0, (int) canonLen);
		}

		if (getCleanFilterCommand() == null && isBinary(e)) {
			canonLen = len;
			return is;
		}

		final InputStream lenIs = filterClean(e.openInputStream(),
				opType);
		try {
			canonLen = computeLength(lenIs);
		} finally {
			safeClose(lenIs);
		}
		return filterClean(is, opType);
	}
	private static void safeClose(InputStream in) {
		try {
			in.close();
		} catch (IOException err2) {
			// Suppress any error related to closing an input
			// stream. We don't care, we should not have any
			// outstanding data to flush or anything like that.
		}
	}

	private static boolean isBinary(Entry entry) throws IOException {
		InputStream in = entry.openInputStream();
		try {
			return RawText.isBinary(in);
		} finally {
			safeClose(in);
		}
	}
	private ByteBuffer filterClean(byte[] src, int n, OperationType opType)
			throws IOException {
		InputStream in = new ByteArrayInputStream(src);
		try {
			return IO.readWholeStream(filterClean(in, opType), n);
		} finally {
			safeClose(in);
		}
	}

	private InputStream filterClean(InputStream in) throws IOException {
		return filterClean(in, null);
	}

	private InputStream filterClean(InputStream in, OperationType opType)
			throws IOException {
		in = handleAutoCRLF(in, opType);
		String filterCommand = getCleanFilterCommand();
		if (filterCommand != null) {
			if (FilterCommandRegistry.isRegistered(filterCommand)) {
				LocalFile buffer = new TemporaryBuffer.LocalFile(null);
				FilterCommand command = FilterCommandRegistry
						.createFilterCommand(filterCommand, repository, in,
								buffer);
				while (command.run() != -1) {
					// loop as long as command.run() tells there is work to do
				}
				return buffer.openInputStreamWithAutoDestroy();
			}
			FS fs = repository.getFS();
			ProcessBuilder filterProcessBuilder = fs.runInShell(filterCommand,
					new String[0]);
			filterProcessBuilder.directory(repository.getWorkTree());
			filterProcessBuilder.environment().put(Constants.GIT_DIR_KEY,
					repository.getDirectory().getAbsolutePath());
			ExecutionResult result;
			try {
				result = fs.execute(filterProcessBuilder, in);
			} catch (IOException | InterruptedException e) {
				throw new IOException(new FilterFailedException(e,
						filterCommand, getEntryPathString()));
			}
			int rc = result.getRc();
			if (rc != 0) {
				throw new IOException(new FilterFailedException(rc,
						filterCommand, getEntryPathString(),
						result.getStdout().toByteArray(MAX_EXCEPTION_TEXT_SIZE),
						result.getStderr().toString(MAX_EXCEPTION_TEXT_SIZE)));
			}
			return result.getStdout().openInputStreamWithAutoDestroy();
		}
		return in;
	}

	private InputStream handleAutoCRLF(InputStream in, OperationType opType)
			throws IOException {
		return EolStreamTypeUtil.wrapInputStream(in, getEolStreamType(opType));
	}
	/**
	 * Returns the working tree options used by this iterator.
	 *
	 * @return working tree options
	 */
	public WorkingTreeOptions getOptions() {
		return state.options;
	}

	/**
	 * Retrieves the {@link Repository} this {@link WorkingTreeIterator}
	 * operates on.
	 *
	 * @return the {@link Repository}
	 * @since 5.9
	 */
	public Repository getRepository() {
		return repository;
	}

	/** {@inheritDoc} */
	@Override
	public int idOffset() {
		return contentIdOffset;
	}

	/** {@inheritDoc} */
	@Override
	public void reset() {
		if (!first()) {
			ptr = 0;
			if (!eof())
				parseEntry();
		}
	}

	/** {@inheritDoc} */
	@Override
	public boolean first() {
		return ptr == 0;
	}

	/** {@inheritDoc} */
	@Override
	public boolean eof() {
		return ptr == entryCnt;
	}

	/** {@inheritDoc} */
	@Override
	public void next(int delta) throws CorruptObjectException {
		ptr += delta;
		if (!eof()) {
			parseEntry();
		}
	}

	/** {@inheritDoc} */
	@Override
	public void back(int delta) throws CorruptObjectException {
		ptr -= delta;
		parseEntry();
	}

	private void parseEntry() {
		final Entry e = entries[ptr];
		mode = e.getMode().getBits();

		final int nameLen = e.encodedNameLen;
		ensurePathCapacity(pathOffset + nameLen, pathOffset);
		System.arraycopy(e.encodedName, 0, path, pathOffset, nameLen);
		pathLen = pathOffset + nameLen;

		canonLen = -1;
		cleanFilterCommandHolder = null;
		eolStreamTypeHolder = null;
	}

	/**
	 * Get the raw byte length of this entry.
	 *
	 * @return size of this file, in bytes.
	 */
	public long getEntryLength() {
		return current().getLength();
	}
	/**
	 * Get the filtered input length of this entry.
	 *
	 * @return size of the filtered content, in bytes
	 * @throws java.io.IOException
	 *             if the content cannot be read or filtered
	 */
	public long getEntryContentLength() throws IOException {
		if (canonLen == -1) {
			long rawLen = getEntryLength();
			if (rawLen == 0)
				canonLen = 0;
			InputStream is = current().openInputStream();
			try {
				// canonLen gets updated here
				possiblyFilteredInputStream(current(), is, current()
						.getLength());
			} finally {
				safeClose(is);
			}
		}
		return canonLen;
	}
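
	// Illustrative contrast between the two lengths: for a text file checked
	// out with CRLF line endings while core.autocrlf=true, the filtered
	// (check-in) length is smaller because each CR/LF pair shrinks to LF.
	// "iterator" is an illustrative WorkingTreeIterator on a file entry.
	//
	//   long raw = iterator.getEntryLength();          // bytes on disk (CRLF)
	//   long canon = iterator.getEntryContentLength(); // bytes after clean
	//                                                  // filtering (LF)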
	/**
	 * Get the last modified time of this entry.
	 *
	 * @return last modified time of this file, in milliseconds since the epoch
	 *         (Jan 1, 1970 UTC).
	 * @deprecated use {@link #getEntryLastModifiedInstant()} instead
	 */
	@Deprecated
	public long getEntryLastModified() {
		return current().getLastModified();
	}

	/**
	 * Get the last modified time of this entry.
	 *
	 * @return last modified time of this file
	 * @since 5.1.9
	 */
	public Instant getEntryLastModifiedInstant() {
		return current().getLastModifiedInstant();
	}

	/**
	 * Obtain an input stream to read the file content.
	 * <p>
	 * Efficient implementations are not required. The caller will usually
	 * obtain the stream only once per entry, if at all.
	 * <p>
	 * The input stream should not use buffering if the implementation can
	 * avoid it. The caller will buffer as necessary to perform efficient
	 * block IO operations.
	 * <p>
	 * The caller will close the stream once complete.
	 *
	 * @return a stream to read from the file.
	 * @throws java.io.IOException
	 *             the file could not be opened for reading.
	 */
	public InputStream openEntryStream() throws IOException {
		InputStream rawis = current().openInputStream();
		if (getCleanFilterCommand() == null
				&& getEolStreamType() == EolStreamType.DIRECT) {
			return rawis;
		}
		return filterClean(rawis);
	}
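
	// Usage sketch: openEntryStream() yields the content as it would be
	// checked in, i.e. with clean filters and EOL conversion applied. "wt" is
	// an illustrative WorkingTreeIterator positioned on a file entry.
	//
	//   try (InputStream in = new BufferedInputStream(wt.openEntryStream())) {
	//       // read canonical (repository-side) content
	//   }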
	/**
	 * Determine if the current entry path is ignored by an ignore rule.
	 *
	 * @return true if the entry was ignored by an ignore rule file.
	 * @throws java.io.IOException
	 *             a relevant ignore rule file exists but cannot be read.
	 */
	public boolean isEntryIgnored() throws IOException {
		return isEntryIgnored(pathLen);
	}

	/**
	 * Determine if the entry path is ignored by an ignore rule.
	 *
	 * @param pLen
	 *            the length of the path in the path buffer.
	 * @return true if the entry is ignored by an ignore rule.
	 * @throws java.io.IOException
	 *             a relevant ignore rule file exists but cannot be read.
	 */
	protected boolean isEntryIgnored(int pLen) throws IOException {
		return isEntryIgnored(pLen, mode);
	}

	/**
	 * Determine if the entry path is ignored by an ignore rule.
	 *
	 * @param pLen
	 *            the length of the path in the path buffer.
	 * @param fileMode
	 *            the original iterator file mode
	 * @return true if the entry is ignored by an ignore rule.
	 * @throws IOException
	 *             a relevant ignore rule file exists but cannot be read.
	 */
	private boolean isEntryIgnored(int pLen, int fileMode)
			throws IOException {
		// The ignore code wants path to start with a '/' if possible.
		// If we have the '/' in our path buffer because we are inside
		// a sub-directory include it in the range we convert to string.
		//
		final int pOff = 0 < pathOffset ? pathOffset - 1 : pathOffset;
		String pathRel = TreeWalk.pathOf(this.path, pOff, pLen);
		String parentRel = getParentPath(pathRel);

		// CGit processes .gitignore files by starting at the root of the
		// repository and then recursing into subdirectories. With this
		// approach, top-level ignored directories are processed first, which
		// allows skipping entire subtrees and further .gitignore-file
		// processing within these subtrees.
		//
		// We follow the same approach by marking directories as "ignored"
		// here. This allows for a simplified FastIgnore.checkIgnore()
		// implementation (both in terms of code and computational complexity):
		//
		// Without the "ignored" flag, we would have to apply the ignore-check
		// to a path and all of its parents always(!), to determine whether a
		// path is ignored directly or by one of its parent directories; with
		// the "ignored" flag, we know at this point that the parent directory
		// is definitely not ignored, thus the path can only become ignored if
		// there is a rule matching the path itself.
		if (isDirectoryIgnored(parentRel)) {
			return true;
		}

		IgnoreNode rules = getIgnoreNode();
		final Boolean ignored = rules != null
				? rules.checkIgnored(pathRel, FileMode.TREE.equals(fileMode))
				: null;
		if (ignored != null) {
			return ignored.booleanValue();
		}
		return parent instanceof WorkingTreeIterator
				&& ((WorkingTreeIterator) parent).isEntryIgnored(pLen,
						fileMode);
	}
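
	// Usage sketch: skipping ignored entries while walking the working tree
	// ("repo" is illustrative).
	//
	//   try (TreeWalk tw = new TreeWalk(repo)) {
	//       tw.addTree(new FileTreeIterator(repo));
	//       tw.setRecursive(true);
	//       while (tw.next()) {
	//           WorkingTreeIterator it = tw.getTree(0,
	//                   WorkingTreeIterator.class);
	//           if (it != null && it.isEntryIgnored()) {
	//               continue;
	//           }
	//           // process tw.getPathString()
	//       }
	//   }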
	private IgnoreNode getIgnoreNode() throws IOException {
		if (ignoreNode instanceof PerDirectoryIgnoreNode)
			ignoreNode = ((PerDirectoryIgnoreNode) ignoreNode).load();
		return ignoreNode;
	}

	/**
	 * Retrieves the {@link org.eclipse.jgit.attributes.AttributesNode} for the
	 * current entry.
	 *
	 * @return the {@link org.eclipse.jgit.attributes.AttributesNode} for the
	 *         current entry.
	 * @throws IOException
	 *             if the attributes file exists but cannot be read.
	 */
	public AttributesNode getEntryAttributesNode() throws IOException {
		if (attributesNode instanceof PerDirectoryAttributesNode)
			attributesNode = ((PerDirectoryAttributesNode) attributesNode)
					.load();
		return attributesNode;
	}

	private static final Comparator<Entry> ENTRY_CMP = (Entry a,
			Entry b) -> Paths.compare(a.encodedName, 0, a.encodedNameLen,
					a.getMode().getBits(), b.encodedName, 0, b.encodedNameLen,
					b.getMode().getBits());
	/**
	 * Constructor helper.
	 *
	 * @param list
	 *            files in the subtree of the work tree this iterator operates
	 *            on
	 */
	protected void init(Entry[] list) {
		// Filter out nulls, . and .. as these are not valid tree entries,
		// also cache the encoded forms of the path names for efficient use
		// later on during sorting and iteration.
		//
		entries = list;
		int i, o;

		final CharsetEncoder nameEncoder = state.nameEncoder;
		for (i = 0, o = 0; i < entries.length; i++) {
			final Entry e = entries[i];
			if (e == null)
				continue;
			final String name = e.getName();
			if (".".equals(name) || "..".equals(name)) //$NON-NLS-1$ //$NON-NLS-2$
				continue;
			if (Constants.DOT_GIT.equals(name))
				continue;
			if (Constants.DOT_GIT_IGNORE.equals(name))
				ignoreNode = new PerDirectoryIgnoreNode(
						TreeWalk.pathOf(path, 0, pathOffset)
								+ Constants.DOT_GIT_IGNORE,
						e);
			if (Constants.DOT_GIT_ATTRIBUTES.equals(name))
				attributesNode = new PerDirectoryAttributesNode(e);
			if (i != o)
				entries[o] = e;
			e.encodeName(nameEncoder);
			o++;
		}
		entryCnt = o;
		Arrays.sort(entries, 0, entryCnt, ENTRY_CMP);

		contentIdFromPtr = -1;
		ptr = 0;
		if (!eof())
			parseEntry();
		else if (pathLen == 0) // see bug 445363
			pathLen = pathOffset;
	}

	/**
	 * Obtain the current entry from this iterator.
	 *
	 * @return the currently selected entry.
	 */
	protected Entry current() {
		return entries[ptr];
	}
	/**
	 * The result of a metadata comparison between the current entry and a
	 * {@link DirCacheEntry}.
	 */
	public enum MetadataDiff {
		/**
		 * The entries are equal by metadata (mode, length,
		 * modification-timestamp), or the <code>assumeValid</code> attribute
		 * of the index entry is set.
		 */
		EQUAL,

		/**
		 * The entries are not equal by metadata (mode, length), or the
		 * <code>isUpdateNeeded</code> attribute of the index entry is set.
		 */
		DIFFER_BY_METADATA,

		/** The index entry is smudged - can't use that entry for comparison. */
		SMUDGED,

		/**
		 * The entries are equal by metadata (mode, length) but differ by
		 * modification-timestamp.
		 */
		DIFFER_BY_TIMESTAMP
	}
	/**
	 * Is the file mode of the current entry different from the given raw mode?
	 *
	 * @param rawMode
	 *            an int.
	 * @return true if different, false otherwise
	 */
	public boolean isModeDifferent(int rawMode) {
		// Determine difference in mode-bits of file and index-entry. In the
		// bitwise presentation of modeDiff we'll have a '1' when the two modes
		// differ at this position.
		int modeDiff = getEntryRawMode() ^ rawMode;

		if (modeDiff == 0)
			return false;

		// Do not rely on filemode differences in case of symbolic links
		if (getOptions().getSymLinks() == SymLinks.FALSE)
			if (FileMode.SYMLINK.equals(rawMode))
				return false;

		// Ignore the executable file bits if WorkingTreeOptions tell me to
		// do so. Ignoring is done by setting the bits representing an
		// EXECUTABLE_FILE to '0' in modeDiff.
		if (!state.options.isFileMode())
			modeDiff &= ~FileMode.EXECUTABLE_FILE.getBits();
		return modeDiff != 0;
	}
	/**
	 * Compare the metadata (mode, length, modification-timestamp) of the
	 * current entry and a {@link org.eclipse.jgit.dircache.DirCacheEntry}.
	 *
	 * @param entry
	 *            the {@link org.eclipse.jgit.dircache.DirCacheEntry} to
	 *            compare with
	 * @return a
	 *         {@link org.eclipse.jgit.treewalk.WorkingTreeIterator.MetadataDiff}
	 *         which tells whether and how the entries' metadata differ
	 */
	public MetadataDiff compareMetadata(DirCacheEntry entry) {
		if (entry.isAssumeValid())
			return MetadataDiff.EQUAL;

		if (entry.isUpdateNeeded())
			return MetadataDiff.DIFFER_BY_METADATA;

		if (isModeDifferent(entry.getRawMode()))
			return MetadataDiff.DIFFER_BY_METADATA;

		// Don't check for length or lastmodified on folders
		int type = mode & FileMode.TYPE_MASK;
		if (type == FileMode.TYPE_TREE || type == FileMode.TYPE_GITLINK)
			return MetadataDiff.EQUAL;

		if (!entry.isSmudged() && entry.getLength() != (int) getEntryLength())
			return MetadataDiff.DIFFER_BY_METADATA;

		// Cache and file timestamps may differ in resolution. Therefore don't
		// compare instants directly but use a comparator that compares only
		// up to the lower apparent resolution of either timestamp.
		//
		// If core.checkstat is set to "minimal", compare only the seconds part.
		Instant cacheLastModified = entry.getLastModifiedInstant();
		Instant fileLastModified = getEntryLastModifiedInstant();
		if (timestampComparator.compare(cacheLastModified, fileLastModified,
				getOptions().getCheckStat() == CheckStat.MINIMAL) != 0) {
			return MetadataDiff.DIFFER_BY_TIMESTAMP;
		}

		if (entry.isSmudged()) {
			return MetadataDiff.SMUDGED;
		}

		// The file is clean when comparing timestamps
		return MetadataDiff.EQUAL;
	}
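
	// Usage sketch: interpreting the metadata comparison against the index
	// entry at the same path. "wt" and "dc" are illustrative iterators
	// positioned on the same path by one TreeWalk.
	//
	//   DirCacheEntry cacheEntry = dc.getDirCacheEntry();
	//   switch (wt.compareMetadata(cacheEntry)) {
	//   case EQUAL:
	//       break; // trust the index, no content check needed
	//   case DIFFER_BY_TIMESTAMP:
	//   case SMUDGED:
	//       break; // metadata is inconclusive; compare content hashes
	//   case DIFFER_BY_METADATA:
	//       break; // mode or length changed; treat as modified
	//   }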
	/**
	 * Checks whether this entry differs from a given entry from the
	 * {@link org.eclipse.jgit.dircache.DirCache}.
	 *
	 * File status information is used; if the status is the same, we consider
	 * the file identical to the state in the working directory. Native git
	 * uses more stat fields than we have accessible in Java.
	 *
	 * @param entry
	 *            the entry from the dircache we want to compare against
	 * @param forceContentCheck
	 *            true if the actual file content should be checked if
	 *            modification time differs.
	 * @param reader
	 *            access to repository objects if necessary. Should not be null.
	 * @return true if content is most likely different.
	 * @throws java.io.IOException
	 *             if the content cannot be read.
	 * @since 3.3
	 */
	public boolean isModified(DirCacheEntry entry, boolean forceContentCheck,
			ObjectReader reader) throws IOException {
		if (entry == null)
			return !FileMode.MISSING.equals(getEntryFileMode());
		MetadataDiff diff = compareMetadata(entry);
		switch (diff) {
		case DIFFER_BY_TIMESTAMP:
			if (forceContentCheck) {
				// But we are told to look at content even though timestamps
				// tell us about modification
				return contentCheck(entry, reader);
			}
			// We are told to assume a modification if timestamps differ
			return true;
		case SMUDGED:
			// The file is clean by timestamps but the entry was smudged.
			// Let's do a content check
			return contentCheck(entry, reader);
		case EQUAL:
			if (mode == FileMode.SYMLINK.getBits()) {
				return contentCheck(entry, reader);
			}
			return false;
		case DIFFER_BY_METADATA:
			if (mode == FileMode.TREE.getBits()
					&& entry.getFileMode().equals(FileMode.GITLINK)) {
				byte[] idBuffer = idBuffer();
				int idOffset = idOffset();
				if (entry.getObjectId().compareTo(idBuffer, idOffset) == 0) {
					return true;
				} else if (ObjectId.zeroId().compareTo(idBuffer,
						idOffset) == 0) {
					Path p = repository.getWorkTree().toPath()
							.resolve(entry.getPathString());
					return FileUtils.hasFiles(p);
				}
				return false;
			} else if (mode == FileMode.SYMLINK.getBits())
				return contentCheck(entry, reader);
			return true;
		default:
			throw new IllegalStateException(MessageFormat.format(
					JGitText.get().unexpectedCompareResult, diff.name()));
		}
	}
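
	// Usage sketch: a dirty-check of the current file against its index entry,
	// forcing a content comparison when timestamps alone are inconclusive
	// ("wt", "dc", and "repo" are illustrative).
	//
	//   DirCacheEntry cacheEntry = dc.getDirCacheEntry();
	//   try (ObjectReader reader = repo.newObjectReader()) {
	//       boolean dirty = wt.isModified(cacheEntry, true, reader);
	//   }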
	/**
	 * Get the file mode to use for the current entry when it is to be updated
	 * in the index.
	 *
	 * @param indexIter
	 *            {@link org.eclipse.jgit.dircache.DirCacheIterator} positioned
	 *            at the same entry as this iterator or null if no
	 *            {@link org.eclipse.jgit.dircache.DirCacheIterator} is
	 *            available at this iterator's current entry
	 * @return index file mode
	 */
	public FileMode getIndexFileMode(DirCacheIterator indexIter) {
		final FileMode wtMode = getEntryFileMode();
		if (indexIter == null) {
			return wtMode;
		}
		final FileMode iMode = indexIter.getEntryFileMode();
		if (getOptions().isFileMode() && iMode != FileMode.GITLINK
				&& iMode != FileMode.TREE) {
			return wtMode;
		}
		if (!getOptions().isFileMode()) {
			if (FileMode.REGULAR_FILE == wtMode
					&& FileMode.EXECUTABLE_FILE == iMode) {
				return iMode;
			}
			if (FileMode.EXECUTABLE_FILE == wtMode
					&& FileMode.REGULAR_FILE == iMode) {
				return iMode;
			}
		}
		if (FileMode.GITLINK == iMode
				&& FileMode.TREE == wtMode
				&& !getOptions().isDirNoGitLinks()) {
			return iMode;
		}
		if (FileMode.TREE == iMode
				&& FileMode.GITLINK == wtMode) {
			return iMode;
		}
		return wtMode;
	}
	/**
	 * Compares the entry's content with the content in the filesystem.
	 * Unsmudges the entry when it is detected that it is clean.
	 *
	 * @param entry
	 *            the entry to be checked
	 * @param reader
	 *            access to repository data if necessary
	 * @return <code>true</code> if the content doesn't match,
	 *         <code>false</code> if it matches
	 * @throws IOException
	 *             if the content cannot be read.
	 */
	private boolean contentCheck(DirCacheEntry entry, ObjectReader reader)
			throws IOException {
		if (getEntryObjectId().equals(entry.getObjectId())) {
			// Content has not changed.
			// We know the entry can't be racily clean because it's still clean.
			// Therefore we unsmudge the entry!
			// If by any chance we now unsmudge although we are still in the
			// same time-slot as the last modification to the index file the
			// next index write operation will smudge again.
			// Caution: we are unsmudging just by setting the length of the
			// in-memory entry object. It's the caller's task to detect that we
			// have modified the entry and to persist the modified index.
			entry.setLength((int) getEntryLength());
			return false;
		}
		if (mode == FileMode.SYMLINK.getBits()) {
			return !new File(readSymlinkTarget(current())).equals(
					new File(readContentAsNormalizedString(entry, reader)));
		}
		// Content differs: that's a real change
		return true;
	}
	private static String readContentAsNormalizedString(DirCacheEntry entry,
			ObjectReader reader) throws MissingObjectException, IOException {
		ObjectLoader open = reader.open(entry.getObjectId());
		byte[] cachedBytes = open.getCachedBytes();
		return FS.detect().normalize(RawParseUtils.decode(cachedBytes));
	}

	/**
	 * Reads the target of a symlink as a string. This default implementation
	 * fully reads the entry's input stream and converts it to a normalized
	 * string. Subclasses may override to provide more specialized
	 * implementations.
	 *
	 * @param entry
	 *            to read
	 * @return the entry's content as a normalized string
	 * @throws java.io.IOException
	 *             if the entry cannot be read or does not denote a symlink
	 * @since 4.6
	 */
	protected String readSymlinkTarget(Entry entry) throws IOException {
		if (!entry.getMode().equals(FileMode.SYMLINK)) {
			throw new java.nio.file.NotLinkException(entry.getName());
		}
		long length = entry.getLength();
		byte[] content = new byte[(int) length];
		try (InputStream is = entry.openInputStream()) {
			int bytesRead = IO.readFully(is, content, 0);
			return FS.detect()
					.normalize(RawParseUtils.decode(content, 0, bytesRead));
		}
	}
	private static long computeLength(InputStream in) throws IOException {
		// Since we only care about the length, use skip. The stream
		// may be able to more efficiently wade through its data.
		//
		long length = 0;
		for (;;) {
			long n = in.skip(1 << 20);
			if (n <= 0)
				break;
			length += n;
		}
		return length;
	}

	private byte[] computeHash(InputStream in, long length) throws IOException {
		SHA1 contentDigest = SHA1.newInstance();
		final byte[] contentReadBuffer = state.contentReadBuffer;

		contentDigest.update(hblob);
		contentDigest.update((byte) ' ');

		long sz = length;
		if (sz == 0) {
			contentDigest.update((byte) '0');
		} else {
			final int bufn = contentReadBuffer.length;
			int p = bufn;
			do {
				contentReadBuffer[--p] = digits[(int) (sz % 10)];
				sz /= 10;
			} while (sz > 0);
			contentDigest.update(contentReadBuffer, p, bufn - p);
		}
		contentDigest.update((byte) 0);

		for (;;) {
			final int r = in.read(contentReadBuffer);
			if (r <= 0)
				break;
			contentDigest.update(contentReadBuffer, 0, r);
			sz += r;
		}
		if (sz != length)
			return zeroid;
		return contentDigest.digest();
	}
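
	// computeHash() follows git's object-id convention: the digest is taken
	// over the header "blob <decimal-length>\0" followed by the (filtered)
	// content. A minimal equivalent via the public API, where "data" is an
	// illustrative byte array:
	//
	//   try (ObjectInserter.Formatter fmt = new ObjectInserter.Formatter()) {
	//       ObjectId blobId = fmt.idFor(Constants.OBJ_BLOB, data);
	//   }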
	/**
	 * A single entry within a working directory tree.
	 *
	 * @since 5.0
	 */
	public abstract static class Entry {
		byte[] encodedName;

		int encodedNameLen;

		void encodeName(CharsetEncoder enc) {
			final ByteBuffer b;
			try {
				b = enc.encode(CharBuffer.wrap(getName()));
			} catch (CharacterCodingException e) {
				// This should never happen.
				throw new RuntimeException(MessageFormat.format(
						JGitText.get().unencodeableFile, getName()), e);
			}

			encodedNameLen = b.limit();
			if (b.hasArray() && b.arrayOffset() == 0)
				encodedName = b.array();
			else
				b.get(encodedName = new byte[encodedNameLen]);
		}

		@Override
		public String toString() {
			return getMode().toString() + " " + getName(); //$NON-NLS-1$
		}
		/**
		 * Get the type of this entry.
		 * <p>
		 * <b>Note: Efficient implementation required.</b>
		 * <p>
		 * The implementation of this method must be efficient. If a subclass
		 * needs to compute the value they should cache the reference within an
		 * instance member instead.
		 *
		 * @return a file mode constant from {@link FileMode}.
		 */
		public abstract FileMode getMode();

		/**
		 * Get the byte length of this entry.
		 * <p>
		 * <b>Note: Efficient implementation required.</b>
		 * <p>
		 * The implementation of this method must be efficient. If a subclass
		 * needs to compute the value they should cache the reference within an
		 * instance member instead.
		 *
		 * @return size of this file, in bytes.
		 */
		public abstract long getLength();

		/**
		 * Get the last modified time of this entry.
		 * <p>
		 * <b>Note: Efficient implementation required.</b>
		 * <p>
		 * The implementation of this method must be efficient. If a subclass
		 * needs to compute the value they should cache the reference within an
		 * instance member instead.
		 *
		 * @return time since the epoch (in ms) of the last change.
		 * @deprecated use {@link #getLastModifiedInstant()} instead
		 */
		@Deprecated
		public abstract long getLastModified();

		/**
		 * Get the last modified time of this entry.
		 * <p>
		 * <b>Note: Efficient implementation required.</b>
		 * <p>
		 * The implementation of this method must be efficient. If a subclass
		 * needs to compute the value they should cache the reference within an
		 * instance member instead.
		 *
		 * @return time of the last change.
		 * @since 5.1.9
		 */
		public abstract Instant getLastModifiedInstant();

		/**
		 * Get the name of this entry within its directory.
		 * <p>
		 * Efficient implementations are not required. The caller will obtain
		 * the name only once and cache it once obtained.
		 *
		 * @return name of the entry.
		 */
		public abstract String getName();

		/**
		 * Obtain an input stream to read the file content.
		 * <p>
		 * Efficient implementations are not required. The caller will usually
		 * obtain the stream only once per entry, if at all.
		 * <p>
		 * The input stream should not use buffering if the implementation can
		 * avoid it. The caller will buffer as necessary to perform efficient
		 * block IO operations.
		 * <p>
		 * The caller will close the stream once complete.
		 *
		 * @return a stream to read from the file.
		 * @throws IOException
		 *             the file could not be opened for reading.
		 */
		public abstract InputStream openInputStream() throws IOException;
	}
	/** Magic type indicating we know rules exist, but they aren't loaded. */
	private static class PerDirectoryIgnoreNode extends IgnoreNode {
		protected final Entry entry;

		private final String name;

		PerDirectoryIgnoreNode(String name, Entry entry) {
			super(Collections.<FastIgnoreRule> emptyList());
			this.name = name;
			this.entry = entry;
		}

		IgnoreNode load() throws IOException {
			IgnoreNode r = new IgnoreNode();
			try (InputStream in = entry.openInputStream()) {
				r.parse(name, in);
			}
			return r.getRules().isEmpty() ? null : r;
		}
	}

	/** Magic type indicating there may be rules for the top level. */
	private static class RootIgnoreNode extends PerDirectoryIgnoreNode {
		final Repository repository;

		RootIgnoreNode(Entry entry, Repository repository) {
			super(entry != null ? entry.getName() : null, entry);
			this.repository = repository;
		}

		@Override
		IgnoreNode load() throws IOException {
			IgnoreNode r;
			if (entry != null) {
				r = super.load();
				if (r == null)
					r = new IgnoreNode();
			} else {
				r = new IgnoreNode();
			}

			FS fs = repository.getFS();
			Path path = repository.getConfig().getPath(
					ConfigConstants.CONFIG_CORE_SECTION, null,
					ConfigConstants.CONFIG_KEY_EXCLUDESFILE, fs, null, null);
			if (path != null) {
				loadRulesFromFile(r, path.toFile());
			}

			File exclude = fs.resolve(repository.getDirectory(),
					Constants.INFO_EXCLUDE);
			loadRulesFromFile(r, exclude);

			return r.getRules().isEmpty() ? null : r;
		}

		private static void loadRulesFromFile(IgnoreNode r, File exclude)
				throws FileNotFoundException, IOException {
			if (FS.DETECTED.exists(exclude)) {
				try (FileInputStream in = new FileInputStream(exclude)) {
					r.parse(exclude.getAbsolutePath(), in);
				}
			}
		}
	}

	/** Magic type indicating we know rules exist, but they aren't loaded. */
	private static class PerDirectoryAttributesNode extends AttributesNode {
		final Entry entry;

		PerDirectoryAttributesNode(Entry entry) {
			super(Collections.<AttributesRule> emptyList());
			this.entry = entry;
		}

		AttributesNode load() throws IOException {
			AttributesNode r = new AttributesNode();
			try (InputStream in = entry.openInputStream()) {
				r.parse(in);
			}
			return r.getRules().isEmpty() ? null : r;
		}
	}
	private static final class IteratorState {
		/** Options used to process the working tree. */
		final WorkingTreeOptions options;

		/** File name character encoder. */
		final CharsetEncoder nameEncoder;

		/** Buffer used to perform {@link #contentId} computations. */
		byte[] contentReadBuffer;

		/** TreeWalk with a (supposedly) matching DirCacheIterator. */
		TreeWalk walk;

		/** Position of the matching {@link DirCacheIterator}. */
		int dirCacheTree = -1;

		/** Whether the iterator shall walk ignored directories. */
		boolean walkIgnored = false;

		final Map<String, Boolean> directoryToIgnored = new HashMap<>();

		IteratorState(WorkingTreeOptions options) {
			this.options = options;
			this.nameEncoder = UTF_8.newEncoder();
		}

		void initializeReadBuffer() {
			if (contentReadBuffer == null) {
				contentReadBuffer = new byte[BUFFER_SIZE];
			}
		}
	}
	/**
	 * Get the clean filter command for the current entry.
	 *
	 * @return the clean filter command for the current entry or
	 *         <code>null</code> if no such command is defined
	 * @throws java.io.IOException
	 *             if the attributes defining the filter cannot be read
	 * @since 4.2
	 */
	public String getCleanFilterCommand() throws IOException {
		if (cleanFilterCommandHolder == null) {
			String cmd = null;
			if (state.walk != null) {
				cmd = state.walk
						.getFilterCommand(Constants.ATTR_FILTER_TYPE_CLEAN);
			}
			cleanFilterCommandHolder = new Holder<>(cmd);
		}
		return cleanFilterCommandHolder.get();
	}

	/**
	 * Get the eol stream type for the current entry.
	 *
	 * @return the eol stream type for the current entry or <code>null</code>
	 *         if it cannot be determined. When state or state.walk is null or
	 *         the {@link org.eclipse.jgit.treewalk.TreeWalk} is not based on a
	 *         {@link org.eclipse.jgit.lib.Repository} then null is returned.
	 * @throws java.io.IOException
	 *             if the eol attributes cannot be read
	 * @since 4.3
	 */
	public EolStreamType getEolStreamType() throws IOException {
		return getEolStreamType(null);
	}
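
	// Usage sketch: wrapping a raw stream according to the determined stream
	// type, as handleAutoCRLF() does above ("raw" is illustrative).
	//
	//   EolStreamType type = iterator.getEolStreamType();
	//   InputStream canonical = EolStreamTypeUtil.wrapInputStream(raw, type);
	//   // with AUTO_LF, CR/LF is converted to LF unless the content looks
	//   // binary; with DIRECT, the bytes pass through unchanged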
	/**
	 * @param opType
	 *            the operation type (checkin/checkout) which should be used
	 * @return the eol stream type for the current entry or <code>null</code>
	 *         if it cannot be determined. When state or state.walk is null or
	 *         the {@link TreeWalk} is not based on a {@link Repository} then
	 *         null is returned.
	 * @throws IOException
	 *             if the eol attributes cannot be read
	 */
	private EolStreamType getEolStreamType(OperationType opType)
			throws IOException {
		if (eolStreamTypeHolder == null) {
			EolStreamType type = null;
			if (state.walk != null) {
				type = state.walk.getEolStreamType(opType);
				OperationType operationType = opType != null ? opType
						: state.walk.getOperationType();
				if (OperationType.CHECKIN_OP.equals(operationType)
						&& EolStreamType.AUTO_LF.equals(type)
						&& hasCrLfInIndex(getDirCacheIterator())) {
					// If text=auto (or core.autocrlf=true) and the file has
					// already been committed with CR/LF, then don't convert.
					type = EolStreamType.DIRECT;
				}
			} else {
				switch (getOptions().getAutoCRLF()) {
				case FALSE:
					type = EolStreamType.DIRECT;
					break;
				case TRUE:
				case INPUT:
					type = EolStreamType.AUTO_LF;
					break;
				}
			}
			eolStreamTypeHolder = new Holder<>(type);
		}
		return eolStreamTypeHolder.get();
	}
	/**
	 * Determines whether the file was committed un-normalized. If the iterator
	 * points to a conflict entry, checks the "ours" (stage 2) version.
	 *
	 * @param dirCache
	 *            iterator pointing to the current entry for the file in the
	 *            index
	 * @return {@code true} if the file in the index is not binary and has
	 *         CR/LF line endings, {@code false} otherwise
	 */
	private boolean hasCrLfInIndex(DirCacheIterator dirCache) {
		if (dirCache == null) {
			return false;
		}
		// Read blob from index and check for CR/LF-delimited text.
		DirCacheEntry entry = dirCache.getDirCacheEntry();
		if ((entry.getRawMode() & FileMode.TYPE_MASK) == FileMode.TYPE_FILE) {
			ObjectId blobId = entry.getObjectId();
			if (entry.getStage() > 0
					&& entry.getStage() != DirCacheEntry.STAGE_2) {
				blobId = null;
				// Merge conflict: check ours (stage 2)
				byte[] name = entry.getRawPath();
				int i = 0;
				while (!dirCache.eof()) {
					dirCache.next(1);
					i++;
					entry = dirCache.getDirCacheEntry();
					if (entry == null
							|| !Arrays.equals(name, entry.getRawPath())) {
						break;
					}
					if (entry.getStage() == DirCacheEntry.STAGE_2) {
						if ((entry.getRawMode()
								& FileMode.TYPE_MASK) == FileMode.TYPE_FILE) {
							blobId = entry.getObjectId();
						}
						break;
					}
				}
				dirCache.back(i);
			}
			if (blobId != null) {
				try (ObjectReader reader = repository.newObjectReader()) {
					ObjectLoader loader = reader.open(blobId,
							Constants.OBJ_BLOB);
					try {
						return RawText.isCrLfText(loader.getCachedBytes());
					} catch (LargeObjectException e) {
						try (InputStream in = loader.openStream()) {
							return RawText.isCrLfText(in);
						}
					}
				} catch (IOException e) {
					// Ignore and return false below
				}
			}
		}
		return false;
	}
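
	// Sketch of the check performed above: a blob committed with CR/LF line
	// endings suppresses AUTO_LF conversion at check-in. "reader" and "blobId"
	// are illustrative.
	//
	//   ObjectLoader loader = reader.open(blobId, Constants.OBJ_BLOB);
	//   boolean keepAsIs = RawText.isCrLfText(loader.getCachedBytes());
	//   // isCrLfText() reports false for binary-looking content, so binary
	//   // blobs do not disable the conversion here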
	private boolean isDirectoryIgnored(String pathRel) throws IOException {
		final int pOff = 0 < pathOffset ? pathOffset - 1 : pathOffset;
		final String base = TreeWalk.pathOf(this.path, 0, pOff);
		final String pathAbs = concatPath(base, pathRel);
		return isDirectoryIgnored(pathRel, pathAbs);
	}

	private boolean isDirectoryIgnored(String pathRel, String pathAbs)
			throws IOException {
		assert pathRel.length() == 0 || (pathRel.charAt(0) != '/'
				&& pathRel.charAt(pathRel.length() - 1) != '/');
		assert pathAbs.length() == 0 || (pathAbs.charAt(0) != '/'
				&& pathAbs.charAt(pathAbs.length() - 1) != '/');
		assert pathAbs.endsWith(pathRel);

		Boolean ignored = state.directoryToIgnored.get(pathAbs);
		if (ignored != null) {
			return ignored.booleanValue();
		}

		final String parentRel = getParentPath(pathRel);
		if (parentRel != null && isDirectoryIgnored(parentRel)) {
			state.directoryToIgnored.put(pathAbs, Boolean.TRUE);
			return true;
		}

		final IgnoreNode node = getIgnoreNode();
		for (String p = pathRel; node != null
				&& !"".equals(p); p = getParentPath(p)) { //$NON-NLS-1$
			ignored = node.checkIgnored(p, true);
			if (ignored != null) {
				state.directoryToIgnored.put(pathAbs, ignored);
				return ignored.booleanValue();
			}
		}

		if (!(this.parent instanceof WorkingTreeIterator)) {
			state.directoryToIgnored.put(pathAbs, Boolean.FALSE);
			return false;
		}

		final WorkingTreeIterator wtParent = (WorkingTreeIterator) this.parent;
		final String parentRelPath = concatPath(
				TreeWalk.pathOf(this.path, wtParent.pathOffset, pathOffset - 1),
				pathRel);
		assert concatPath(TreeWalk.pathOf(wtParent.path, 0,
				Math.max(0, wtParent.pathOffset - 1)), parentRelPath)
						.equals(pathAbs);
		return wtParent.isDirectoryIgnored(parentRelPath, pathAbs);
	}

	private static String getParentPath(String path) {
		final int slashIndex = path.lastIndexOf('/', path.length() - 2);
		if (slashIndex > 0) {
			return path.substring(path.charAt(0) == '/' ? 1 : 0, slashIndex);
		}
		return path.length() > 0 ? "" : null; //$NON-NLS-1$
	}

	private static String concatPath(String p1, String p2) {
		return p1 + (p1.length() > 0 && p2.length() > 0 ? "/" : "") + p2; //$NON-NLS-1$ //$NON-NLS-2$
	}
}