path: root/modules/indexer
Commit message · Author · Age · Files · Lines
* Refactor Git Attribute & performance optimization (#34154) · Lunny Xiao · 4 days · 1 · -1/+2
  This PR moves the git attribute code into the `modules/git/attribute` sub-package and the language statistics code into the `modules/git/languagestats` sub-package to make them easier to maintain. It also improves performance by using `git check-attr --source`, which can run in a bare git repository, so a git index file no longer needs to be created. The `--source` option requires git >= 2.40; with older git versions, the previous implementation is used as a fallback.
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
  Co-authored-by: yp05327 <576951401@qq.com>
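The version gate described above can be sketched as follows. This is a minimal sketch, not Gitea's code: the helper names `supportsCheckAttrSource` and `checkAttrArgs` are hypothetical. It also assumes that the caller has already stripped the `git version ` prefix, and that the fallback path uses `--cached` against an index file that the caller prepares. `--source`, `--cached` and `-z` are real `git check-attr` options.

```go
package main

import (
	"strconv"
	"strings"
)

// supportsCheckAttrSource reports whether a git version such as "2.40.1"
// is at least 2.40, the minimum stated for `git check-attr --source`.
// Hypothetical helper; the "git version " prefix must already be removed.
func supportsCheckAttrSource(version string) bool {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return false
	}
	return major > 2 || (major == 2 && minor >= 40)
}

// checkAttrArgs builds the `git check-attr` argument list. On new enough git
// it reads .gitattributes directly from a tree-ish via --source, so it works
// in a bare repository without an index. Otherwise it falls back to --cached,
// which reads attributes from an index file that the caller must prepare.
func checkAttrArgs(version, treeish string, attrs, paths []string) []string {
	args := []string{"check-attr", "-z"}
	if supportsCheckAttrSource(version) {
		args = append(args, "--source", treeish)
	} else {
		args = append(args, "--cached")
	}
	args = append(args, attrs...)
	args = append(args, "--")
	return append(args, paths...)
}
```

With git 2.45.1, `checkAttrArgs("2.45.1", "HEAD", []string{"linguist-vendored"}, []string{"a.go"})` yields `check-attr -z --source HEAD linguist-vendored -- a.go`.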
* feat: Add sorting by exclusive labels (issue priority) (#33206) · Thomas E Lackey · 5 days · 4 · -14/+35
  Fix #2616. This PR adds a new sort option for exclusive labels. Exclusive labels expose a new property called "order", and the UI automatically adds an option to the `Sort` column for each exclusive label scope.
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Enable additional linters (#34085) · TheFox0x7 · 2025-04-01 · 9 · -33/+34
  Enables the mirror, usestdlibvars and perfsprint linters. Part of: https://github.com/go-gitea/gitea/issues/34083
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Enable testifylint rules (#34075) · TheFox0x7 · 2025-03-31 · 4 · -18/+18
  Enables the testifylint rules that were disabled in https://github.com/go-gitea/gitea/pull/34054.
* enable staticcheck QFxxxx rules (#34064) · TheFox0x7 · 2025-03-29 · 1 · -2/+3
* Decouple Batch from git.Repository to simplify usage without requiring the creation of a Repository struct. (#34001) · Lunny Xiao · 2025-03-27 · 2 · -14/+2
  No logic change.
* Fix incorrect code search indexer options (#33992) · wxiaoguang · 2025-03-24 · 1 · -3/+2
  Fix #33798
  Co-authored-by: Giteabot <teabot@gitea.io>
* Fix file name could not be searched if the file was not a text file when using the Bleve indexer (#33959) · charles · 2025-03-21 · 1 · -1/+2
  Close #33828
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Allow filtering issues by any assignee (#33343) · Andreas Svanberg · 2025-03-21 · 8 · -30/+91
  This is the opposite of the "No assignee" filter: it matches all issues that have at least one assignee.
  Before: ![Before change](https://github.com/user-attachments/assets/4aea194b-9add-4a84-8d6b-61bfd8d9e58e)
  After: ![After change with any filter](https://github.com/user-attachments/assets/99f1205d-ba9f-4a0a-a60b-cc1a0c0823fe)
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
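The sketch below shows how "any assignee" relates to "no assignee". It is only a model: `assigneeFilter`, the `filter*` constants and `matchesAssignee` are hypothetical, and Gitea's real filter representation may differ. The point it illustrates is that "any" is the exact complement of "none".

```go
package main

// Hypothetical filter kinds; Gitea's real representation may differ.
const (
	filterUser = iota // match one specific assignee
	filterNone        // "No assignee": issues with zero assignees
	filterAny         // "Any assignee": issues with at least one assignee
)

type assigneeFilter struct {
	kind   int
	userID int64 // only used when kind == filterUser
}

// matchesAssignee reports whether an issue with the given assignee IDs
// passes the filter.
func matchesAssignee(f assigneeFilter, assignees []int64) bool {
	switch f.kind {
	case filterNone:
		return len(assignees) == 0
	case filterAny:
		return len(assignees) > 0
	default:
		for _, id := range assignees {
			if id == f.userID {
				return true
			}
		}
		return false
	}
}
```

In a search-engine backend, the same idea usually becomes an "exists / not exists" condition on the assignee field, not a per-issue loop.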
* Refactor functions to reduce repopath expose (#33892) · Lunny Xiao · 2025-03-16 · 1 · -5/+5
* Make SearchMode have default value and add comments (#33863) · wxiaoguang · 2025-03-14 · 8 · -19/+34
  * Give `SearchMode` a default value when it is empty
  * Add comments for the "match" queries
  * Fix a copy-paste mistake in `buildMatchQuery` (`db.go`)
  * Add the missing `q.Analyzer = repoIndexerAnalyzer`; it was in the old code, although I do not see a real difference ....
* Improve issue & code search (#33860) · wxiaoguang · 2025-03-13 · 18 · -101/+214
  Each indexer should declare the search modes it supports. This PR also removes the "fuzzy" search mode for code search.
* Remove context from git struct (#33793) · TheFox0x7 · 2025-03-04 · 3 · -10/+10
  The context argument moves from struct initialization to the command's run call, which lets us remove the context from the struct.
* Use test context in tests and new loop system in benchmarks (#33648) · TheFox0x7 · 2025-02-20 · 4 · -32/+29
  Replaces all contexts in tests with Go 1.24's `t.Context()`.
  Co-authored-by: Giteabot <teabot@gitea.io>
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Fix project issues list and counting (#33594) · Lunny Xiao · 2025-02-17 · 1 · -2/+2
  Co-authored-by: delvh <dev.lh@web.de>
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* enable literal string for code search (#33590) · Darren Hoo · 2025-02-16 · 7 · -15/+147
  Close: #33588
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
  Co-authored-by: Giteabot <teabot@gitea.io>
* Fix unnecessary comment when moving issue on the same project column (#33496) · Lunny Xiao · 2025-02-05 · 1 · -1/+6
  Fix #33482
* Refactor older tests to use testify (#33140) · TheFox0x7 · 2025-01-09 · 3 · -66/+24
  * Refactor checks to use assert/require
  * Use require.Eventually for waiting in the Elasticsearch and Meilisearch tests
  * Use require to exit early instead of assert
* Fix bleve fuzziness search (#33078) · wxiaoguang · 2025-01-03 · 4 · -8/+15
  Close #31565
* Enable tenv and testifylint rules (#32852) · TheFox0x7 · 2024-12-15 · 1 · -19/+19
  Enables the tenv and testifylint linters. Closes: https://github.com/go-gitea/gitea/issues/32842
* Update golangci-lint to v1.62.2, fix issues (#32845) · silverwind · 2024-12-15 · 2 · -8/+8
  Updates golangci-lint and fixes the new issues reported by `redefines-builtin-id`.
* Add `is_archived` option for issue indexer (#32735) · yp05327 · 2024-12-12 · 8 · -11/+55
  Try to fix #32697. Reason: `is_archived` is already defined in the query options, but it is not implemented in the indexer.
* Add label/author/assignee filters to the user/org home issue list (#32779) · wxiaoguang · 2024-12-11 · 3 · -9/+9
  Replace #26661, fix #25979. Not perfect, but usable and much better than before. Because it is quite complex, I am not sure whether there will be any regressions; if there are, I will fix them right away. I have tested the related pages many times: issue list, milestone issue list, project view, user issue list, org issue list.
* Fix markup render regression and fix some tests (#32640) · wxiaoguang · 2024-11-26 · 1 · -2/+0
  Fix #32639, https://github.com/go-gitea/gitea/issues/32608#issuecomment-2497918210. Also fixes some incorrect SQL (use single quotes, not double quotes).
* Reduce integration test overhead (#32475) · Rowan Bohde · 2024-11-14 · 1 · -7/+4
  While profiling integration tests, I found a couple of places where per-test overhead could be reduced:
  * Avoid disk IO by synchronizing the test Git repository data instead of deleting and copying it. This saves ~100ms per test on my machine.
  * When flushing queues in `PrintCurrentTest`, invoke `FlushWithContext` in parallel.
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Updated tokenizer to better matching when search for code snippets (#32261) · Bruno Sofiato · 2024-11-06 · 5 · -7/+77
  This PR improves the accuracy of Gitea's code search. Currently, Gitea does not treat statements such as `console.log("hello")` as hits when the user searches for `log`. The culprit is how both ES and Bleve tokenize the file contents: in both cases, `console.log` is a single token.
  For ES, we changed the tokenizer to [simple_pattern_split](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simplepatternsplit-tokenizer.html#:~:text=The%20simple_pattern_split%20tokenizer%20uses%20a,the%20tokenization%20is%20generally%20faster.), so tokens are words formed by digits and letters. For Bleve, we use a [letter](https://blevesearch.com/docs/Tokenizers/) tokenizer.
  Resolves #32220
  Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
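The effect of the new tokenization can be approximated in a few lines. This is an illustrative approximation, not the ES or Bleve configuration. It splits on every character that is neither a letter nor a digit, so it behaves like the ES rule ("words formed by digits and letters"). Bleve's letter tokenizer keeps only letters. The lower-casing step is an assumption of a typical analyzer chain; the tokenizers themselves do not lower-case.

```go
package main

import (
	"strings"
	"unicode"
)

// tokenize splits source text into lower-cased runs of letters and digits.
// `console.log("hello")` becomes console, log, hello, so a query for "log"
// can now match. Lower-casing is an assumed extra filter step.
func tokenize(s string) []string {
	fields := strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for i, f := range fields {
		fields[i] = strings.ToLower(f)
	}
	return fields
}
```

Under the old behavior, the whole of `console.log` was one token, so `log` alone never matched it.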
* Update go dependencies (#32389) · wxiaoguang · 2024-10-31 · 1 · -6/+2
* Allow code search by filename (#32210) · Bruno Sofiato · 2024-10-11 · 9 · -40/+534
  This is a large and complex PR, so let me explain its changes in detail.
  First, I had to create new index mappings for Bleve and Elasticsearch, because the current ones do not support searching by filename. This requires Gitea to recreate the code search indexes (I do not know whether this counts as a breaking change, but I feel it deserves a heads-up).
  I used [this approach](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-pathhierarchy-tokenizer.html) to model the filename index. It lets us search efficiently for both the full path and the name of a file. Bleve does not support this out of the box, so I wrote a new [token filter](https://blevesearch.com/docs/Token-Filters/) to generate the search terms.
  I also overhauled the `indexer_test.go` file. It now asserts the order of the expected results, which matters because matches on a file's name are more relevant than matches on its content. I added new test scenarios for searching by filename; they use a new repo included in the Gitea fixtures.
  The screenshot below shows how Gitea displays the search results. Content-based results are shown the same way as in the current version. For filename-based matches, the first seven lines of the file contents are shown (this is also how GitHub does it).
  ![image](https://github.com/user-attachments/assets/9d938d86-1a8d-4f89-8644-1921a473e858)
  Resolves #32096
  Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
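One way to make both "full path" and "file name" queries hit the same document is to index every trailing run of path segments. The sketch below shows that idea. It is not the Bleve token filter written for this PR. The `pathTerms` function is hypothetical, and the choice of suffix terms is an assumption, modeled on Elasticsearch's `path_hierarchy` tokenizer with `reverse` enabled.

```go
package main

import "strings"

// pathTerms returns every trailing run of path segments, longest first:
// "modules/indexer/code/bleve.go" -> the full path, "indexer/code/bleve.go",
// "code/bleve.go", "bleve.go". Indexing these terms lets an exact-term query
// match either the full path or just the file name.
func pathTerms(path string) []string {
	path = strings.Trim(path, "/")
	if path == "" {
		return nil
	}
	segments := strings.Split(path, "/")
	terms := make([]string, 0, len(segments))
	for i := range segments {
		terms = append(terms, strings.Join(segments[i:], "/"))
	}
	return terms
}
```

A file therefore produces as many terms as its path has segments. This is cheap for typical repository depths.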
* Fixed race condition when deleting documents by repoId in ElasticSearch (#32185) · Bruno Sofiato · 2024-10-03 · 1 · -1/+27
  Resolves #32184
  Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
* Change the code search to sort results by relevance (#32134) · Bruno Sofiato · 2024-09-28 · 2 · -2/+6
  Resolves #32129
  Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
* Fix index too many file names bug (#31903) · Lunny Xiao · 2024-09-01 · 1 · -7/+31
  Try to fix #31884. Fix #28584.
* Refactor the usage of batch catfile (#31754) · Lunny Xiao · 2024-08-20 · 2 · -18/+22
  Opening a repository calls `ensureValidRepository` and also `CatFileBatch`. These are often never used before the repository is closed, so invoking three git commands for every opened repository wastes CPU. This PR removes all of them from `OpenRepository` and keeps only the check that the folder exists. When a batch is needed, the necessary functions are invoked at that point.
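The "start it only when needed" pattern can be sketched with `sync.Once`. Everything here is hypothetical: `repo`, `batchProc`, `openRepo` and `catFileBatch` stand in for Gitea's repository and cat-file batch types. Gitea's actual mechanism may differ, for example in how batches are reused or released. Opening the repository does no git work, and the expensive helper starts at most once, on first use.

```go
package main

import "sync"

// batchProc stands in for a long-running `git cat-file --batch` process.
type batchProc struct{ repoPath string }

// repo is a hypothetical repository handle with a lazily started batch.
type repo struct {
	path      string
	batchOnce sync.Once
	batch     *batchProc
	starts    int // how many times a batch was started (for illustration)
}

// openRepo only records the path; the folder-existence check that the
// commit keeps is omitted from this sketch.
func openRepo(path string) *repo { return &repo{path: path} }

// catFileBatch starts the batch on first call and reuses it afterwards.
func (r *repo) catFileBatch() *batchProc {
	r.batchOnce.Do(func() {
		r.starts++
		r.batch = &batchProc{repoPath: r.path}
	})
	return r.batch
}
```

A repository that is opened and closed without reading any objects never pays for the batch process.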
* Properly filter issue list given no assignees filter (#31522) · Kemal Zebari · 2024-07-23 · 2 · -1/+12
  Quick fix for #31520. This issue is related to #31337.
* Allow searching issues by ID (#31479) · Carsten Klein · 2024-07-17 · 4 · -2/+54
  When you enter a number in the issue search, you most likely want the issue with that ID (internally, the issue index). So when a number is detected, the issue with the corresponding ID is now added to the results.
  Fixes #4479
  Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
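The numeric-keyword rule can be sketched as a post-processing step on the result list. This is a minimal sketch under assumptions. `withIssueByIndex` and its `lookup` callback are hypothetical, and `results` holds issue indexes for simplicity (real search results carry database IDs). Putting the matched issue first and deduplicating it are also choices made for this sketch; the commit only says that the issue is added to the results.

```go
package main

import (
	"strconv"
	"strings"
)

// withIssueByIndex adds the issue whose index equals the keyword when the
// keyword is a plain positive number and lookup confirms the issue exists.
// The match is placed first and removed from later positions.
func withIssueByIndex(keyword string, results []int64, lookup func(int64) bool) []int64 {
	index, err := strconv.ParseInt(strings.TrimSpace(keyword), 10, 64)
	if err != nil || index <= 0 || !lookup(index) {
		return results
	}
	out := []int64{index}
	for _, id := range results {
		if id != index {
			out = append(out, id)
		}
	}
	return out
}
```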
* Fix bug filtering issues which have no project (#31337) · Lunny Xiao · 2024-06-14 · 1 · -1/+6
  Fix #31327. This is a quick patch to fix the bug. Some parameters use 0 and some use -1; I think a refactor is needed to make them consistent, but that will be another PR.
* Rename project board -> column to make the UI less confusing (#30170) · Lunny Xiao · 2024-05-27 · 9 · -23/+23
  This PR splits `Board` into two parts: the struct is renamed to `Column`, and a separate `Template Type` is introduced. To keep the review manageable, this PR does not change the database schema; these are only renames. Schema changes may follow in future PRs.
  Co-authored-by: silverwind <me@silverwind.io>
  Co-authored-by: yp05327 <576951401@qq.com>
* Fix bleve fuzziness (#30799) · wxiaoguang · 2024-05-01 · 3 · -9/+15
  Fix #30797, fix #30317
* Fix tautological conditions (#30735) · silverwind · 2024-04-30 · 1 · -6/+0
  As discovered by https://github.com/go-gitea/gitea/pull/30729.
  Co-authored-by: Giteabot <teabot@gitea.io>
* Resolve lint for unused parameter and unnecessary type arguments (#30750) · Chongyi Zheng · 2024-04-29 · 1 · -8/+4
  Resolves all cases of `unused parameter` and `unnecessary type arguments`. Related: #30729
  Co-authored-by: Giteabot <teabot@gitea.io>
* Perform Newest sort type correctly when sorting issues (#30644) · Kemal Zebari · 2024-04-23 · 1 · -2/+2
  Should resolve #30642. Before this commit, only an empty `?sort=` query parameter selected the intended default sort (issues in descending order of their creation UNIX time). `sort=latest` was not handled as a case, so it fell through to the `default` branch of the switch statement and sorted by most recently updated. This commit fixes this by treating the empty string, "latest", and any other string not listed in the switch statement as sorting by newest.
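The corrected switch can be sketched as follows. Only the "newest" fallback comes from the commit message. The `orderByForSort` function, the "oldest" and "recentupdate" keys, and the `created_unix`/`updated_unix` column names are assumptions for illustration. The key design point is that the `default` branch is the newest sort, so "", "latest" and unknown values all land there.

```go
package main

// orderByForSort maps a ?sort= value to an ORDER BY clause.
// Keys other than the default and the column names are assumptions.
func orderByForSort(sortType string) string {
	switch sortType {
	case "oldest":
		return "created_unix ASC"
	case "recentupdate":
		return "updated_unix DESC"
	default:
		// "", "latest", and any unrecognized value all sort by newest.
		return "created_unix DESC"
	}
}
```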
* Enable more `revive` linter rules (#30608) · silverwind · 2024-04-22 · 2 · -3/+0
  Notable additions:
  - `redefines-builtin-id`: forbids variable names that shadow Go builtins
  - `empty-lines`: removes unnecessary empty lines that `gofumpt` does not remove for some reason
  - `superfluous-else`: eliminates more superfluous `else` branches
  The rules are also sorted alphabetically, and I cleaned up various parts of `.golangci.yml`.
* Render embedded code preview by permalink in markdown (#30234) · wxiaoguang · 2024-04-02 · 1 · -7/+9
  Permalinks in markdown are rendered as code preview blocks, like on GitHub.
  Co-authored-by: silverwind <me@silverwind.io>
* Use db.ListOptions directly instead of Paginator interface to make it easier to use and fix performance of /pulls and /issues (#29990) · Lunny Xiao · 2024-03-24 · 6 · -16/+39
  This PR uses `db.ListOptions` instead of `Paginator` to make the code simpler. It also fixes the performance problem when viewing /pulls or /issues: previously, counting the results also ran the full search.
  Co-authored-by: Jason Song <i@wolfogre.com>
  Co-authored-by: silverwind <me@silverwind.io>
* Support repo code search without setting up an indexer (#29998) · wxiaoguang · 2024-03-24 · 1 · -17/+18
  Because this uses git's own search capability, end users (especially on small instances) can benefit from the code search feature without enabling the indexer.
  Fix #29996
  ![image](https://github.com/go-gitea/gitea/assets/2114189/11b7e458-88a4-480d-b4d7-72ee59406dd1)
  ![image](https://github.com/go-gitea/gitea/assets/2114189/0fe777d5-c95c-4288-a818-0427680805b6)
  Co-authored-by: silverwind <me@silverwind.io>
* Determine fuzziness of bleve indexer by keyword length (#29706) · 6543 · 2024-03-23 · 3 · -28/+22
  Bleve also ran an exact match when a fuzzy search was requested, and vice versa; this PR fixes that bug too.
* Use db.ListOptionsAll instead of db.ListOptions{ListAll: true} (#29995) · Lunny Xiao · 2024-03-22 · 2 · -35/+17
* Meilisearch double quote on "match" query (#29740) · 6543 · 2024-03-16 · 2 · -70/+37
  Makes `nonFuzzyWorkaround` unnecessary. cc @Kerollmops
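Meilisearch treats a query enclosed in double quotes as a phrase search, which is how a "match" (non-fuzzy) query can be expressed without a separate workaround. The sketch below is a minimal version under assumptions. `phraseQuery` is hypothetical. Stripping any embedded double quotes, instead of escaping them, is a choice made for this sketch and is not taken from the Gitea change.

```go
package main

import "strings"

// phraseQuery wraps a keyword in double quotes so that Meilisearch runs it
// as a phrase search. Embedded double quotes are removed (an assumption of
// this sketch) so they cannot terminate the phrase early.
func phraseQuery(keyword string) string {
	keyword = strings.TrimSpace(strings.ReplaceAll(keyword, `"`, ""))
	if keyword == "" {
		return ""
	}
	return `"` + keyword + `"`
}
```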
* Refactor code_indexer to use a SearchOptions struct for PerformSearch (#29724) · 6543 · 2024-03-16 · 6 · -35/+53
  Similar to how it is already done for the issue_indexer.
  *Sponsored by Kithara Software GmbH*
* Refactor to use optional.Option for issue index search option (#29739) · 6543 · 2024-03-13 · 9 · -187/+140
  Signed-off-by: 6543 <6543@obermui.de>
* Use repo object format name instead of detecting from git repository (#29702) · Lunny Xiao · 2024-03-10 · 1 · -8/+4
  Detecting the object format from the git repository is unnecessary; the repository's stored object format name is used instead.