aboutsummaryrefslogtreecommitdiffstats
path: root/modules/indexer/code/indexer_test.go
Commit message (Collapse)AuthorAgeFilesLines
* Enable testifylint rules (#34075)TheFox0x72025-03-311-1/+1
| | | | enable testifylint rules disabled in: https://github.com/go-gitea/gitea/pull/34054
* Make SearchMode have default value and add comments (#33863)wxiaoguang2025-03-141-1/+2
| | | | | | | * Make `SearchMode` have default value if it is empty * Add some comments for the "match" queries * Fix a copy-paste mistake in `buildMatchQuery` (`db.go`) * Add missing `q.Analyzer = repoIndexerAnalyzer`, it is in old code, although I do not see real difference ....
* Improve issue & code search (#33860)wxiaoguang2025-03-131-16/+19
| | | | Each "indexer" should provide the "search modes" they support by themselves. And we need to remove the "fuzzy" search for code.
* Use test context in tests and new loop system in benchmarks (#33648)TheFox0x72025-02-201-8/+7
| | | | | | | | Replace all contexts in tests with go1.24 t.Context() --------- Co-authored-by: Giteabot <teabot@gitea.io> Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Fix bleve fuzziness search (#33078)wxiaoguang2025-01-031-1/+3
| | | Close #31565
* Fix markup render regression and fix some tests (#32640)wxiaoguang2024-11-261-2/+0
| | | | | | | Fix #32639, https://github.com/go-gitea/gitea/issues/32608#issuecomment-2497918210 By the way, fix some incorrect SQLs (use single quote but not double quote)
* Reduce integration test overhead (#32475)Rowan Bohde2024-11-141-7/+4
| | | | | | | | | | | | | In profiling integration tests, I found a couple places where per-test overhead could be reduced: * Avoiding disk IO by synchronizing instead of deleting & copying test Git repository data. This saves ~100ms per test on my machine * When flushing queues in `PrintCurrentTest`, invoke `FlushWithContext` in a parallel. --------- Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Updated tokenizer to better matching when search for code snippets (#32261)Bruno Sofiato2024-11-061-0/+49
| | | | | | | | | | | | | | | | | | | | This PR improves the accuracy of Gitea's code search. Currently, Gitea does not consider statements such as `onsole.log("hello")` as hits when the user searches for `log`. The culprit is how both ES and Bleve are tokenizing the file contents (in both cases, `console.log` is a whole token). In ES' case, we changed the tokenizer to [simple_pattern_split](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simplepatternsplit-tokenizer.html#:~:text=The%20simple_pattern_split%20tokenizer%20uses%20a,the%20tokenization%20is%20generally%20faster.). In such a case, tokens are words formed by digits and letters. In Bleve's case, it employs a [letter](https://blevesearch.com/docs/Tokenizers/) tokenizer. Resolves #32220 --------- Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
* Allow code search by filename (#32210)Bruno Sofiato2024-10-111-15/+169
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a large and complex PR, so let me explain in detail its changes. First, I had to create new index mappings for Bleve and ElasticSerach as the current ones do not support search by filename. This requires Gitea to recreate the code search indexes (I do not know if this is a breaking change, but I feel it deserves a heads-up). I've used [this approach](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-pathhierarchy-tokenizer.html) to model the filename index. It allows us to efficiently search for both the full path and the name of a file. Bleve, however, does not support this out-of-box, so I had to code a brand new [token filter](https://blevesearch.com/docs/Token-Filters/) to generate the search terms. I also did an overhaul in the `indexer_test.go` file. It now asserts the order of the expected results (this is important since matches based on the name of a file are more relevant than those based on its content). I've added new test scenarios that deal with searching by filename. They use a new repo included in the Gitea fixture. The screenshot below depicts how Gitea shows the search results. It shows results based on content in the same way as the current version does. In matches based on the filename, the first seven lines of the file contents are shown (BTW, this is how GitHub does it). ![image](https://github.com/user-attachments/assets/9d938d86-1a8d-4f89-8644-1921a473e858) Resolves #32096 --------- Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
* Refactor code_indexer to use an SearchOptions struct for PerformSearch (#29724)65432024-03-161-1/+10
| | | | | | | | similar to how it's already done for the issue_indexer --- *Sponsored by Kithara Software GmbH*
* Patch in exact search for meilisearch (#29671)65432024-03-091-1/+1
| | | | | | | | | | | | | | | | | | | meilisearch does not have an search option to contorl fuzzynes per query right now: - https://github.com/meilisearch/meilisearch/issues/1192 - https://github.com/orgs/meilisearch/discussions/377 - https://github.com/meilisearch/meilisearch/discussions/1096 so we have to create a workaround by post-filter the search result in gitea until this is addressed. For future works I added an option in backend only atm, to enable fuzzynes for issue indexer too. And also refactored the code so the fuzzy option is equal in logic to code indexer --- *Sponsored by Kithara Software GmbH*
* Replace assert.Fail with assert.FailNow (#27578)Nanguan Lin2023-10-111-4/+2
| | | | | | | | | assert.Fail() will continue to execute the code while assert.FailNow() not. I thought those uses of assert.Fail() should exit immediately. PS: perhaps it's a good idea to use [require](https://pkg.go.dev/github.com/stretchr/testify/require) somewhere because the assert package's default behavior does not exit when an error occurs, which makes it difficult to find the root error reason.
* make writing main test easier (#27270)Lunny Xiao2023-09-281-4/+1
| | | | | | | | | This PR removed `unittest.MainTest` the second parameter `TestOptions.GiteaRoot`. Now it detects the root directory by current working directory. --------- Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* move repository deletion to service layer (#26948)Lunny Xiao2023-09-081-0/+2
| | | Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Refactor indexer (#25174)Jason Song2023-06-231-2/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor `modules/indexer` to make it more maintainable. And it can be easier to support more features. I'm trying to solve some of issue searching, this is a precursor to making functional changes. Current supported engines and the index versions: | engines | issues | code | | - | - | - | | db | Just a wrapper for database queries, doesn't need version | - | | bleve | The version of index is **2** | The version of index is **6** | | elasticsearch | The old index has no version, will be treated as version **0** in this PR | The version of index is **1** | | meilisearch | The old index has no version, will be treated as version **0** in this PR | - | ## Changes ### Split Splited it into mutiple packages ```text indexer ├── internal │   ├── bleve │   ├── db │   ├── elasticsearch │   └── meilisearch ├── code │   ├── bleve │   ├── elasticsearch │   └── internal └── issues ├── bleve ├── db ├── elasticsearch ├── internal └── meilisearch ``` - `indexer/interanal`: Internal shared package for indexer. - `indexer/interanal/[engine]`: Internal shared package for each engine (bleve/db/elasticsearch/meilisearch). - `indexer/code`: Implementations for code indexer. - `indexer/code/internal`: Internal shared package for code indexer. - `indexer/code/[engine]`: Implementation via each engine for code indexer. - `indexer/issues`: Implementations for issues indexer. ### Deduplication - Combine `Init/Ping/Close` for code indexer and issues indexer. - ~Combine `issues.indexerHolder` and `code.wrappedIndexer` to `internal.IndexHolder`.~ Remove it, use dummy indexer instead when the indexer is not ready. - Duplicate two copies of creating ES clients. - Duplicate two copies of `indexerID()`. ### Enhancement - [x] Support index version for elasticsearch issues indexer, the old index without version will be treated as version 0. - [x] Fix spell of `elastic_search/ElasticSearch`, it should be `Elasticsearch`. - [x] Improve versioning of ES index. We don't need `Aliases`: - Gitea does't need aliases for "Zero Downtime" because it never delete old indexes. - The old code of issues indexer uses the orignal name to create issue index, so it's tricky to convert it to an alias. - [x] Support index version for meilisearch issues indexer, the old index without version will be treated as version 0. - [x] Do "ping" only when `Ping` has been called, don't ping periodically and cache the status. - [x] Support the context parameter whenever possible. - [x] Fix outdated example config. - [x] Give up the requeue logic of issues indexer: When indexing fails, call Ping to check if it was caused by the engine being unavailable, and only requeue the task if the engine is unavailable. - It is fragile and tricky, could cause data losing (It did happen when I was doing some tests for this PR). And it works for ES only. - Just always requeue the failed task, if it caused by bad data, it's a bug of Gitea which should be fixed. --------- Co-authored-by: Giteabot <teabot@gitea.io>
* Use more specific test methods (#24265)KN4CK3R2023-04-221-1/+1
| | | | Co-authored-by: silverwind <me@silverwind.io> Co-authored-by: Giteabot <teabot@gitea.io>
* Implement FSFE REUSE for golang files (#21840)flynnnnnnnnnn2022-11-271-2/+1
| | | | | | | | | Change all license headers to comply with REUSE specification. Fix #16132 Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github> Co-authored-by: John Olheiser <john.olheiser@gmail.com>
* Use a struct as test options (#19393)Lunny Xiao2022-04-141-1/+3
| | | | | | | * Use a struct as test options * Fix name * Fix test
* Automatically pause queue if index service is unavailable (#15066)Lauris BH2022-01-271-1/+2
| | | | | | * Handle keyword search error when issue indexer service is not available * Implement automatic disabling and resume of code indexer queue
* format with gofumpt (#18184)65432022-01-201-40/+38
| | | | | | | | | | | * gofumpt -w -l . * gofumpt -w -l -extra . * Add linter * manual fix * change make fmt
* Propagate context and ensure git commands run in request context (#17868)zeripath2022-01-191-1/+2
| | | | | | | | | This PR continues the work in #17125 by progressively ensuring that git commands run within the request context. This now means that the if there is a git repo already open in the context it will be used instead of reopening it. Signed-off-by: Andrew Thornton <art27@cantab.net>
* Add missing `X-Total-Count` and fix some related bugs (#17968)qwerty2872021-12-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add missing `X-Total-Count` and fix some related bugs Adds `X-Total-Count` header to APIs that return a list but doesn't have it yet. Fixed bugs: * not returned after reporting error (https://github.com/qwerty287/gitea/blob/39eb82446c6fe5da3d79124e1f701f3795625b69/routers/api/v1/user/star.go#L70) * crash with index out of bounds, API issue/issueSubscriptions I also found various endpoints that return lists but do not apply/support pagination yet: ``` /repos/{owner}/{repo}/issues/{index}/labels /repos/{owner}/{repo}/issues/comments/{id}/reactions /repos/{owner}/{repo}/branch_protections /repos/{owner}/{repo}/contents /repos/{owner}/{repo}/hooks/git /repos/{owner}/{repo}/issue_templates /repos/{owner}/{repo}/releases/{id}/assets /repos/{owner}/{repo}/reviewers /repos/{owner}/{repo}/teams /user/emails /users/{username}/heatmap ``` If this is not expected, an new issue should be opened. Closes #13043 * fmt * Update routers/api/v1/repo/issue_subscription.go Co-authored-by: KN4CK3R <admin@oldschoolhack.me> * Use FindAndCount Co-authored-by: KN4CK3R <admin@oldschoolhack.me> Co-authored-by: 6543 <6543@obermui.de>
* Move repository model into models/repo (#17933)Lunny Xiao2021-12-101-0/+1
| | | | | | | | | | | | | | | * Some refactors related repository model * Move more methods out of repository * Move repository into models/repo * Fix test * Fix test * some improvements * Remove unnecessary function
* Decouple unit test code from business code (#17623)wxiaoguang2021-11-121-2/+3
|
* Move db related basic functions to models/db (#17075)Lunny Xiao2021-09-191-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | * Move db related basic functions to models/db * Fix lint * Fix lint * Fix test * Fix lint * Fix lint * revert unnecessary change * Fix test * Fix wrong replace string * Use *Context * Correct committer spelling and fix wrong replaced words Co-authored-by: zeripath <art27@cantab.net>
* Fixed assert statements. (#16089)KN4CK3R2021-06-071-1/+1
|
* [Feature] add precise search type for Elastic Search (#12869)Jui-Nan Lin2021-01-271-1/+1
| | | | | | | | * feat: add type query parameters for specifying precise search * feat: add select dropdown in search box Co-authored-by: Lauris BH <lauris@nix.lv> Co-authored-by: techknowlogick <techknowlogick@gitea.io>
* Support elastic search for code search (#10273)Lunny Xiao2020-08-301-0/+83
* Support elastic search for code search * Finished elastic search implementation and add some tests * Enable test on drone and added docs * Add new fields to elastic search * Fix bug * remove unused changes * Use indexer alias to keep the gitea indexer version * Improve codes * Some code improvements * The real indexer name changed to xxx.v1 Co-authored-by: zeripath <art27@cantab.net>