Diffstat (limited to 'vendor/github.com/blevesearch/zap/v12/zap.md')
-rw-r--r-- vendor/github.com/blevesearch/zap/v12/zap.md 177
1 files changed, 177 insertions, 0 deletions
diff --git a/vendor/github.com/blevesearch/zap/v12/zap.md b/vendor/github.com/blevesearch/zap/v12/zap.md
new file mode 100644
index 0000000000..d74dc548b8
--- /dev/null
+++ b/vendor/github.com/blevesearch/zap/v12/zap.md
@@ -0,0 +1,177 @@
+# ZAP File Format
+
+## Legend
+
+### Sections
+
+ |========|
+ | | section
+ |========|
+
+### Fixed-size fields
+
+ |--------| |----| |--| |-|
+ | | uint64 | | uint32 | | uint16 | | uint8
+ |--------| |----| |--| |-|
+
+### Varints
+
+ |~~~~~~~~|
+ | | varint(up to uint64)
+ |~~~~~~~~|
+
+### Arbitrary-length fields
+
+ |--------...---|
+ | | arbitrary-length field (string, vellum, roaring bitmap)
+ |--------...---|
+
+### Chunked data
+
+ [--------]
+ [ ]
+ [--------]
+
+## Overview
+
+The footer section describes the configuration of a particular ZAP file. The footer format is version-dependent, so the `V` field must be checked before parsing the remaining fields.
+
+ |==================================================|
+ | Stored Fields |
+ |==================================================|
+ |-----> | Stored Fields Index |
+ | |==================================================|
+ | | Dictionaries + Postings + DocValues |
+ | |==================================================|
+ | |---> | DocValues Index |
+ | | |==================================================|
+ | | | Fields |
+ | | |==================================================|
+ | | |-> | Fields Index |
+ | | | |========|========|========|========|====|====|====|
+ | | | | D# | SF | F | FDV | CF | V | CC | (Footer)
+ | | | |========|====|===|====|===|====|===|====|====|====|
+ | | | | | |
+ |-+-+-----------------| | |
+ | |--------------------------| |
+ |-------------------------------------|
+
+ D#. Number of Docs.
+ SF. Stored Fields Index Offset.
+ F. Fields Index Offset.
+ FDV. Field DocValue Offset.
+ CF. Chunk Factor.
+ V. Version.
+ CC. CRC32.
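To make the footer layout concrete, here is a minimal Go sketch that decodes it from the end of a file. The field order and widths are taken from the diagram above (four `uint64` values followed by three `uint32` values, 44 bytes in total). The big-endian byte order, the `Footer` type, and the `parseFooter` name are assumptions made for illustration; the document itself does not state the byte order.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// footerSize follows the diagram: four uint64 fields and three uint32 fields.
const footerSize = 4*8 + 3*4

// Footer mirrors the fields listed in the Overview legend (hypothetical type).
type Footer struct {
	NumDocs           uint64 // D#
	StoredIndexOffset uint64 // SF
	FieldsIndexOffset uint64 // F
	DocValueOffset    uint64 // FDV
	ChunkFactor       uint32 // CF
	Version           uint32 // V
	CRC               uint32 // CC
}

// parseFooter decodes the last footerSize bytes of data.
// Assumption: integers are big-endian and appear in the order drawn above.
func parseFooter(data []byte) (Footer, error) {
	if len(data) < footerSize {
		return Footer{}, errors.New("file too small to hold a footer")
	}
	b := data[len(data)-footerSize:]
	return Footer{
		NumDocs:           binary.BigEndian.Uint64(b[0:8]),
		StoredIndexOffset: binary.BigEndian.Uint64(b[8:16]),
		FieldsIndexOffset: binary.BigEndian.Uint64(b[16:24]),
		DocValueOffset:    binary.BigEndian.Uint64(b[24:32]),
		ChunkFactor:       binary.BigEndian.Uint32(b[32:36]),
		Version:           binary.BigEndian.Uint32(b[36:40]),
		CRC:               binary.BigEndian.Uint32(b[40:44]),
	}, nil
}

func main() {
	// Build a fake file: 16 payload bytes followed by a footer.
	file := make([]byte, 16+footerSize)
	f := file[16:]
	binary.BigEndian.PutUint64(f[0:8], 3)    // D#
	binary.BigEndian.PutUint64(f[8:16], 8)   // SF
	binary.BigEndian.PutUint32(f[36:40], 12) // V (example value only)

	footer, err := parseFooter(file)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("docs=%d storedIndex=%d version=%d\n",
		footer.NumDocs, footer.StoredIndexOffset, footer.Version)
}
```

A real reader would check `Version` before trusting the other fields and would also verify the checksum; the sketch only reads the `CC` value because the checksum's polynomial and coverage are not described here.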
+
+## Stored Fields
+
+The Stored Fields Index is `D#` consecutive 64-bit unsigned integers: the offsets at which the corresponding Stored Fields Data records are located.
+
+ 0 [SF] [SF + D# * 8]
+ | Stored Fields | Stored Fields Index |
+ |================================|==================================|
+ | | |
+ | |--------------------| ||--------|--------|. . .|--------||
+ | |-> | Stored Fields Data | || 0 | 1 | | D# - 1 ||
+ | | |--------------------| ||--------|----|---|. . .|--------||
+ | | | | |
+ |===|============================|==============|===================|
+ | |
+ |-------------------------------------------|
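The index lookup in the diagram above amounts to reading one `uint64` at `SF + docNum * 8`. The following Go sketch shows that arithmetic. The function name `storedFieldsOffset` is hypothetical, and big-endian encoding is an assumption that this document does not confirm.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// storedFieldsOffset returns the offset of the Stored Fields Data record for
// docNum, read from the index that starts at sfIndexOffset (SF in the footer).
// Assumption: index entries are big-endian uint64 values.
func storedFieldsOffset(data []byte, sfIndexOffset, numDocs, docNum uint64) (uint64, error) {
	if docNum >= numDocs {
		return 0, errors.New("document number out of range")
	}
	pos := sfIndexOffset + docNum*8
	if pos+8 > uint64(len(data)) {
		return 0, errors.New("index entry lies outside the file")
	}
	return binary.BigEndian.Uint64(data[pos : pos+8]), nil
}

func main() {
	// Two documents: records at offsets 0 and 5, index starting at offset 16.
	data := make([]byte, 32)
	binary.BigEndian.PutUint64(data[16:24], 0)
	binary.BigEndian.PutUint64(data[24:32], 5)

	off, err := storedFieldsOffset(data, 16, 2, 1)
	fmt.Println(off, err)
}
```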
+
+Stored Fields Data is an arbitrary-size record consisting of metadata and [Snappy](https://github.com/golang/snappy)-compressed data.
+
+ Stored Fields Data
+ |~~~~~~~~|~~~~~~~~|~~~~~~~~...~~~~~~~~|~~~~~~~~...~~~~~~~~|
+ | MDS | CDS | MD | CD |
+ |~~~~~~~~|~~~~~~~~|~~~~~~~~...~~~~~~~~|~~~~~~~~...~~~~~~~~|
+
+ MDS. Metadata size.
+ CDS. Compressed data size.
+ MD. Metadata.
+ CD. Snappy-compressed data.
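As an illustration of the record layout above, this Go sketch splits one Stored Fields Data record into its metadata and its still-compressed payload. It assumes that `MDS` and `CDS` are unsigned varints in the encoding used by Go's `encoding/binary` package; `splitStoredRecord` is a hypothetical helper name.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// splitStoredRecord splits a record that begins at rec[0] into the metadata
// bytes (MD) and the compressed data bytes (CD), without decompressing.
// Assumption: MDS and CDS are unsigned varints as produced by binary.PutUvarint.
func splitStoredRecord(rec []byte) (meta, compressed []byte, err error) {
	mds, n := binary.Uvarint(rec)
	if n <= 0 {
		return nil, nil, errors.New("bad metadata size varint")
	}
	cds, m := binary.Uvarint(rec[n:])
	if m <= 0 {
		return nil, nil, errors.New("bad compressed data size varint")
	}
	pos := n + m
	rest := uint64(len(rec) - pos)
	if mds > rest || cds > rest-mds {
		return nil, nil, errors.New("record is shorter than its declared sizes")
	}
	metaEnd := pos + int(mds)
	return rec[pos:metaEnd], rec[metaEnd : metaEnd+int(cds)], nil
}

func main() {
	// MDS=2, CDS=3, then 2 metadata bytes and 3 payload bytes.
	rec := []byte{2, 3, 'a', 'b', 'x', 'y', 'z'}
	meta, compressed, err := splitStoredRecord(rec)
	fmt.Printf("%q %q %v\n", meta, compressed, err)
}
```

The returned `compressed` slice would then be passed to a Snappy decoder, for example `snappy.Decode` from the linked package; that step is left out so the sketch depends only on the standard library.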
+
+## Fields
+
+The Fields Index section is located between addresses `F` and `len(file) - len(footer)` and consists of `uint64` values (`F1`, `F2`, ...), which are offsets to records in the Fields section. There are `F# = (len(file) - len(footer) - F) / sizeof(uint64)` fields.
+
+
+ (...) [F] [F + F# * 8]
+ | Fields | Fields Index. |
+ |================================|================================|
+ | | |
+ | |~~~~~~~~|~~~~~~~~|---...---|||--------|--------|...|--------||
+ ||->| Dict | Length | Name ||| 0 | 1 | | F# - 1 ||
+ || |~~~~~~~~|~~~~~~~~|---...---|||--------|----|---|...|--------||
+ || | | |
+ ||===============================|==============|=================|
+ | |
+ |----------------------------------------------|
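The `F#` formula and the field record layout (`Dict`, `Length`, `Name`) can be sketched in Go as follows. The 44-byte footer length comes from the Overview diagram. Big-endian index entries, varint-encoded `Dict` and `Length`, and the names `fieldCount`, `readField`, and `FieldRecord` are assumptions made for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// footerLen is four uint64 plus three uint32 fields, per the Overview diagram.
const footerLen = 44

// fieldCount computes F# = (len(file) - len(footer) - F) / sizeof(uint64).
func fieldCount(fileLen, fieldsIndexOffset uint64) uint64 {
	if fileLen < footerLen || fieldsIndexOffset > fileLen-footerLen {
		return 0
	}
	return (fileLen - footerLen - fieldsIndexOffset) / 8
}

// FieldRecord holds one decoded entry of the Fields section (hypothetical type).
type FieldRecord struct {
	DictOffset uint64
	Name       string
}

// readField follows index entry i to its record and decodes Dict, Length, Name.
// Assumption: index entries are big-endian; Dict and Length are unsigned varints.
func readField(data []byte, fieldsIndexOffset, i uint64) (FieldRecord, error) {
	pos := fieldsIndexOffset + i*8
	if pos+8 > uint64(len(data)) {
		return FieldRecord{}, errors.New("index entry lies outside the file")
	}
	addr := binary.BigEndian.Uint64(data[pos : pos+8])
	if addr >= uint64(len(data)) {
		return FieldRecord{}, errors.New("field record lies outside the file")
	}
	rec := data[addr:]
	dict, n := binary.Uvarint(rec)
	if n <= 0 {
		return FieldRecord{}, errors.New("bad dictionary offset varint")
	}
	nameLen, m := binary.Uvarint(rec[n:])
	if m <= 0 {
		return FieldRecord{}, errors.New("bad name length varint")
	}
	start := n + m
	if nameLen > uint64(len(rec)-start) {
		return FieldRecord{}, errors.New("field name is truncated")
	}
	return FieldRecord{DictOffset: dict, Name: string(rec[start : start+int(nameLen)])}, nil
}

func main() {
	// One field record at offset 0, a one-entry index at offset 6, then a footer.
	data := make([]byte, 6+8+footerLen)
	copy(data, []byte{5, 4, 'n', 'a', 'm', 'e'})
	n := fieldCount(uint64(len(data)), 6)
	rec, err := readField(data, 6, 0)
	fmt.Println(n, rec.DictOffset, rec.Name, err)
}
```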
+
+
+## Dictionaries + Postings
+
+Each field has its own dictionary, encoded in [Vellum](https://github.com/couchbase/vellum) format. A dictionary consists of `(term, offset)` pairs, where `offset` is the position of the postings (the list of documents) for that term.
+
+ |================================================================|- Dictionaries +
+ | | Postings +
+ | | DocValues
+ | Freq/Norm (chunked) |
+ | [~~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~] |
+ | |->[ Freq | Norm (float32 under varint) ] |
+ | | [~~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~] |
+ | | |
+ | |------------------------------------------------------------| |
+ | Location Details (chunked) | |
+ | [~~~~~~|~~~~~|~~~~~~~|~~~~~|~~~~~~|~~~~~~~~|~~~~~] | |
+ | |->[ Size | Pos | Start | End | Arr# | ArrPos | ... ] | |
+ | | [~~~~~~|~~~~~|~~~~~~~|~~~~~|~~~~~~|~~~~~~~~|~~~~~] | |
+ | | | |
+ | |----------------------| | |
+ | Postings List | | |
+ | |~~~~~~~~|~~~~~|~~|~~~~~~~~|-----------...--| | |
+ | |->| F/N | LD | Length | ROARING BITMAP | | |
+ | | |~~~~~|~~|~~~~~~~~|~~~~~~~~|-----------...--| | |
+ | | |----------------------------------------------| |
+ | |--------------------------------------| |
+ | Dictionary | |
+ | |~~~~~~~~|--------------------------|-...-| |
+ | |->| Length | VELLUM DATA : (TERM -> OFFSET) | |
+ | | |~~~~~~~~|----------------------------...-| |
+ | | |
+ |======|=========================================================|- DocValues Index
+ | | |
+ |======|=========================================================|- Fields
+ | | |
+ | |~~~~|~~~|~~~~~~~~|---...---| |
+ | | Dict | Length | Name | |
+ | |~~~~~~~~|~~~~~~~~|---...---| |
+ | |
+ |================================================================|
+
+## DocValues
+
+The DocValues Index is `F#` pairs of varints, one pair per field. Each pair gives the start and end points of that field's DocValues slice.
+
+ |================================================================|
+ | |------...--| |
+ | |->| DocValues |<-| |
+ | | |------...--| | |
+ |==|=================|===========================================|- DocValues Index
+ ||~|~~~~~~~~~|~~~~~~~|~~| |~~~~~~~~~~~~~~|~~~~~~~~~~~~||
+ || DV1 START | DV1 STOP | . . . . . | DV(F#) START | DV(F#) END ||
+ ||~~~~~~~~~~~|~~~~~~~~~~| |~~~~~~~~~~~~~~|~~~~~~~~~~~~||
+ |================================================================|
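Reading the DocValues Index is a loop over `F#` varint pairs starting at the `FDV` offset from the footer. A minimal Go sketch follows, assuming unsigned varints in the `encoding/binary` encoding; `dvRange` and `readDocValuesIndex` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// dvRange is the start and end point of one field's DocValues slice.
type dvRange struct {
	Start, End uint64
}

// readDocValuesIndex decodes numFields (F#) pairs of varints beginning at
// fdvOffset (FDV in the footer).
// Assumption: both values of each pair are unsigned varints.
func readDocValuesIndex(data []byte, fdvOffset, numFields uint64) ([]dvRange, error) {
	if fdvOffset > uint64(len(data)) {
		return nil, errors.New("DocValues Index offset lies outside the file")
	}
	b := data[fdvOffset:]
	var ranges []dvRange
	for i := uint64(0); i < numFields; i++ {
		start, n := binary.Uvarint(b)
		if n <= 0 {
			return nil, errors.New("bad start varint")
		}
		end, m := binary.Uvarint(b[n:])
		if m <= 0 {
			return nil, errors.New("bad end varint")
		}
		ranges = append(ranges, dvRange{Start: start, End: end})
		b = b[n+m:]
	}
	return ranges, nil
}

func main() {
	// One padding byte, then two pairs: (1, 4) and (128, 144).
	data := []byte{0xFF, 1, 4, 0x80, 0x01, 0x90, 0x01}
	ranges, err := readDocValuesIndex(data, 1, 2)
	fmt.Println(ranges, err)
}
```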
+
+DocValues are chunked, Snappy-compressed values for each document and field.
+
+ [~~~~~~~~~~~~~~~|~~~~~~|~~~~~~~~~|-...-|~~~~~~|~~~~~~~~~|--------------------...-]
+ [ Doc# in Chunk | Doc1 | Offset1 | ... | DocN | OffsetN | SNAPPY COMPRESSED DATA ]
+ [~~~~~~~~~~~~~~~|~~~~~~|~~~~~~~~~|-...-|~~~~~~|~~~~~~~~~|--------------------...-]
+
+The last 16 bytes describe the chunks.
+
+ |~~~~~~~~~~~~...~|----------------|----------------|
+ | Chunk Sizes | Chunk Size Arr | Chunk# |
+ |~~~~~~~~~~~~...~|----------------|----------------|
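One possible reading of the trailer above is sketched in Go below. It interprets `Chunk Size Arr` as the byte length of the varint-encoded `Chunk Sizes` array and `Chunk#` as the number of chunks, both stored as big-endian `uint64` values. That interpretation, the byte order, and the name `readChunkSizes` are assumptions and are not confirmed by this document.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readChunkSizes decodes the chunk description at the end of one field's
// DocValues slice dv.
// Assumptions: the last 8 bytes hold the chunk count, the 8 bytes before them
// hold the byte length of the varint array, and both are big-endian.
func readChunkSizes(dv []byte) ([]uint64, error) {
	if len(dv) < 16 {
		return nil, errors.New("slice too small for the 16-byte trailer")
	}
	end := len(dv) - 16
	arrLen := binary.BigEndian.Uint64(dv[end : end+8])
	numChunks := binary.BigEndian.Uint64(dv[end+8 : end+16])
	if arrLen > uint64(end) {
		return nil, errors.New("chunk size array is larger than the slice")
	}
	arr := dv[end-int(arrLen) : end]

	var sizes []uint64
	for i := uint64(0); i < numChunks; i++ {
		v, n := binary.Uvarint(arr)
		if n <= 0 {
			return nil, errors.New("bad chunk size varint")
		}
		sizes = append(sizes, v)
		arr = arr[n:]
	}
	return sizes, nil
}

func main() {
	// 3 bytes of chunk data, sizes [2, 1] as varints, then the 16-byte trailer.
	dv := []byte{9, 9, 9, 2, 1}
	trailer := make([]byte, 16)
	binary.BigEndian.PutUint64(trailer[0:8], 2)  // byte length of the varint array
	binary.BigEndian.PutUint64(trailer[8:16], 2) // number of chunks
	dv = append(dv, trailer...)

	sizes, err := readChunkSizes(dv)
	fmt.Println(sizes, err)
}
```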