# zap file format

Advanced ZAP File Format Documentation is [here](zap.md).

The file is written in the reverse order that we typically access data. This helps us write in one pass since later sections of the file require file offsets of things we've already written.

Current usage:

- mmap the entire file
- crc-32 bytes and version are in fixed position at end of the file
- reading remainder of footer could be version specific
- remainder of footer gives us (see the footer-reading sketch just after this list):
  - 3 important offsets (docValue, fields index and stored data index)
  - 2 important values (number of docs and chunk factor)
- field data is processed once and memoized onto the heap so that we never have to go back to disk for it
- access to stored data by doc number means first navigating to the stored data index, then accessing a fixed position offset into that slice, which gives us the actual address of the data. the first bytes of that section tell us the size of the data so that we know where it ends.
- access to all other indexed data follows this pattern:
  - first know the field name -> convert to id
  - next navigate to the term dictionary for that field
    - some operations stop here and do dictionary ops
  - next use the dictionary to navigate to the posting list for a specific term
  - walk the posting list
  - if necessary, walk posting details as we go
  - if location info is desired, consult the location bitmap to see if it is there
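To make the fixed-position footer access concrete, here is a minimal Go read-side sketch, assuming the footer layout listed in the footer section at the end of this document (four big endian uint64 values followed by three big endian uint32 values). The struct and function names are illustrative, not the identifiers used in the zap code.

```go
package sketch

import (
	"encoding/binary"
	"fmt"
)

// footer mirrors the values described in the footer section below.
// The field names here are illustrative, not the ones used in the zap code.
type footer struct {
	numDocs           uint64
	storedIndexOffset uint64
	fieldsIndexOffset uint64
	docValueOffset    uint64
	chunkFactor       uint32
	version           uint32
	crc               uint32
}

// parseFooter reads the fixed-size footer from the end of the mmapped file
// contents, assuming the 4*8 + 3*4 = 44 byte layout described below.
func parseFooter(mem []byte) (footer, error) {
	const footerSize = 4*8 + 3*4
	if len(mem) < footerSize {
		return footer{}, fmt.Errorf("file too small for footer: %d bytes", len(mem))
	}
	f := mem[len(mem)-footerSize:]
	return footer{
		numDocs:           binary.BigEndian.Uint64(f[0:8]),
		storedIndexOffset: binary.BigEndian.Uint64(f[8:16]),
		fieldsIndexOffset: binary.BigEndian.Uint64(f[16:24]),
		docValueOffset:    binary.BigEndian.Uint64(f[24:32]),
		chunkFactor:       binary.BigEndian.Uint32(f[32:36]),
		version:           binary.BigEndian.Uint32(f[36:40]),
		crc:               binary.BigEndian.Uint32(f[40:44]),
	}, nil
}
```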
## stored fields section

- for each document
  - preparation phase:
    - produce a slice of metadata bytes and data bytes
    - produce these slices in field id order
    - field value is appended to the data slice
    - metadata slice is varint encoded with the following values for each field value
      - field id (uint16)
      - field type (byte)
      - field value start offset in uncompressed data slice (uint64)
      - field value length (uint64)
      - field number of array positions (uint64)
      - one additional value for each array position (uint64)
    - compress the data slice using snappy
  - file writing phase:
    - remember the start offset for this document
    - write out meta data length (varint uint64)
    - write out compressed data length (varint uint64)
    - write out the metadata bytes
    - write out the compressed data bytes
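As an illustration of the per-document preparation and writing phases just listed, the following Go sketch builds the metadata varints, snappy-compresses the data slice, and writes the two length prefixes followed by the two byte slices. All identifiers are made up for this sketch and error handling is minimal.

```go
package sketch

import (
	"encoding/binary"
	"io"

	"github.com/golang/snappy"
)

// storedFieldValue is an illustrative stand-in for one field value of a document.
type storedFieldValue struct {
	fieldID   uint16
	fieldType byte
	value     []byte
	arrayPos  []uint64
}

// writeStoredDoc encodes one document's stored fields as described above and
// returns the start offset that the stored fields idx will later record.
// vals must already be sorted in field id order.
func writeStoredDoc(w io.Writer, curOffset uint64, vals []storedFieldValue) (uint64, error) {
	var meta, data []byte
	var tmp [binary.MaxVarintLen64]byte
	putUvarint := func(v uint64) {
		n := binary.PutUvarint(tmp[:], v)
		meta = append(meta, tmp[:n]...)
	}

	for _, v := range vals {
		start := uint64(len(data))
		data = append(data, v.value...) // field value appended to the data slice
		// metadata: id, type, start offset, length, number of array positions, positions
		putUvarint(uint64(v.fieldID))
		putUvarint(uint64(v.fieldType))
		putUvarint(start)
		putUvarint(uint64(len(v.value)))
		putUvarint(uint64(len(v.arrayPos)))
		for _, ap := range v.arrayPos {
			putUvarint(ap)
		}
	}

	compressed := snappy.Encode(nil, data) // compress the data slice using snappy

	docStart := curOffset // remember the start offset for this document
	var out []byte
	n := binary.PutUvarint(tmp[:], uint64(len(meta)))
	out = append(out, tmp[:n]...) // meta data length (varint)
	n = binary.PutUvarint(tmp[:], uint64(len(compressed)))
	out = append(out, tmp[:n]...) // compressed data length (varint)
	out = append(out, meta...)
	out = append(out, compressed...)

	_, err := w.Write(out)
	return docStart, err
}
```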
## stored fields idx

- for each document
  - write start offset (remembered from previous section) of stored data (big endian uint64)

With this index and a known document number, we have direct access to all the stored field data.
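The read side of that direct access could look like the sketch below: index into the stored fields idx by doc number, follow the recorded offset, then read the two varint lengths to slice out the metadata and the snappy-compressed data. Names are illustrative and bounds checks are omitted.

```go
package sketch

import "encoding/binary"

// storedDoc locates the metadata bytes and snappy-compressed data bytes for
// docNum, given the whole mmapped file and the stored field index location
// from the footer. Bounds checks are omitted for brevity.
func storedDoc(mem []byte, storedIndexOffset, docNum uint64) (meta, compressed []byte) {
	// fixed position offset into the index slice gives the document's address
	idxPos := storedIndexOffset + 8*docNum
	docStart := binary.BigEndian.Uint64(mem[idxPos : idxPos+8])

	// the first bytes at that address tell us the sizes of what follows
	metaLen, n1 := binary.Uvarint(mem[docStart:])
	dataLen, n2 := binary.Uvarint(mem[docStart+uint64(n1):])

	metaStart := docStart + uint64(n1) + uint64(n2)
	meta = mem[metaStart : metaStart+metaLen]
	compressed = mem[metaStart+metaLen : metaStart+metaLen+dataLen]
	return meta, compressed
}
```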
## posting details (freq/norm) section

- for each posting list
  - produce a slice containing multiple consecutive chunks (each chunk is varint stream)
  - produce a slice remembering offsets of where each chunk starts
  - preparation phase:
    - for each hit in the posting list
      - if this hit is in next chunk close out encoding of last chunk and record offset start of next
      - encode term frequency (uint64)
      - encode norm factor (float32)
  - file writing phase:
    - remember start position for this posting list details
    - write out number of chunks that follow (varint uint64)
    - write out length of each chunk (each a varint uint64)
    - write out the byte slice containing all the chunk data

If you know the doc number you're interested in, this format lets you jump to the correct chunk (docNum/chunkFactor) directly and then seek within that chunk until you find it.
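A rough sketch of the chunking described above follows. It assumes the float32 norm travels as its IEEE-754 bit pattern inside a varint; that detail, like the type and function names, is an illustrative assumption rather than a statement of the exact on-disk encoding. The file writing phase would then emit the chunk count, each chunk length, and the data slice, as listed above.

```go
package sketch

import (
	"encoding/binary"
	"math"
)

// hit is an illustrative posting hit: the document it occurs in, its term
// frequency, and its norm factor.
type hit struct {
	docNum uint64
	freq   uint64
	norm   float32
}

// encodeFreqNormChunks produces the chunk data slice plus the length of each
// chunk, as described in the preparation phase above. hits must be in
// ascending docNum order.
func encodeFreqNormChunks(hits []hit, chunkFactor uint64) (data []byte, chunkLens []uint64) {
	if len(hits) == 0 {
		return nil, nil
	}
	var tmp [binary.MaxVarintLen64]byte
	put := func(v uint64) {
		n := binary.PutUvarint(tmp[:], v)
		data = append(data, tmp[:n]...)
	}

	curChunk := uint64(0)
	chunkStart := 0
	for _, h := range hits {
		chunk := h.docNum / chunkFactor
		for curChunk < chunk {
			// this hit belongs to a later chunk: close out the current (possibly empty) chunk
			chunkLens = append(chunkLens, uint64(len(data)-chunkStart))
			chunkStart = len(data)
			curChunk++
		}
		put(h.freq)                           // term frequency
		put(uint64(math.Float32bits(h.norm))) // norm factor (float32 bits, assumed encoding)
	}
	chunkLens = append(chunkLens, uint64(len(data)-chunkStart))
	return data, chunkLens
}
```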
## posting details (location) section

- for each posting list
  - produce a slice containing multiple consecutive chunks (each chunk is varint stream)
  - produce a slice remembering offsets of where each chunk starts
  - preparation phase:
    - for each hit in the posting list
      - if this hit is in next chunk close out encoding of last chunk and record offset start of next
      - encode field (uint16)
      - encode field pos (uint64)
      - encode field start (uint64)
      - encode field end (uint64)
      - encode number of array positions to follow (uint64)
      - encode each array position (each uint64)
  - file writing phase:
    - remember start position for this posting list details
    - write out number of chunks that follow (varint uint64)
    - write out length of each chunk (each a varint uint64)
    - write out the byte slice containing all the chunk data

If you know the doc number you're interested in, this format lets you jump to the correct chunk (docNum/chunkFactor) directly and then seek within that chunk until you find it.
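On the read side, the docNum/chunkFactor jump mentioned twice above can be sketched as follows, along with decoding one location entry in the field/pos/start/end/array-positions order from the list. It assumes the chunk lengths have already been parsed into a slice; names are illustrative and bounds checks are omitted.

```go
package sketch

import "encoding/binary"

// location mirrors one encoded location entry from the list above.
type location struct {
	field           uint16
	pos, start, end uint64
	arrayPositions  []uint64
}

// chunkFor returns the sub-slice of chunkData holding the chunk that docNum
// falls into, assuming chunkLens has already been read from the file.
func chunkFor(chunkData []byte, chunkLens []uint64, docNum, chunkFactor uint64) []byte {
	idx := docNum / chunkFactor // jump directly to the right chunk
	var start uint64
	for i := uint64(0); i < idx; i++ {
		start += chunkLens[i]
	}
	return chunkData[start : start+chunkLens[idx]]
}

// decodeLocation reads a single location entry from the front of buf and
// returns it along with the number of bytes consumed.
func decodeLocation(buf []byte) (location, int) {
	var loc location
	read := 0
	next := func() uint64 {
		v, n := binary.Uvarint(buf[read:])
		read += n
		return v
	}
	loc.field = uint16(next())
	loc.pos = next()
	loc.start = next()
	loc.end = next()
	numArrayPos := next()
	for i := uint64(0); i < numArrayPos; i++ {
		loc.arrayPositions = append(loc.arrayPositions, next())
	}
	return loc, read
}
```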
## postings list section

- for each posting list
  - preparation phase:
    - encode roaring bitmap posting list to bytes (so we know the length)
  - file writing phase:
    - remember the start position for this posting list
    - write freq/norm details offset (remembered from previous, as varint uint64)
    - write location details offset (remembered from previous, as varint uint64)
    - write length of encoded roaring bitmap
    - write the serialized roaring bitmap data
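A sketch of writing one posting list entry, using the RoaringBitmap Go library for the bitmap serialization; the offset bookkeeping and names are illustrative.

```go
package sketch

import (
	"bytes"
	"encoding/binary"
	"io"

	"github.com/RoaringBitmap/roaring"
)

// writePostingsList writes one posting list entry: the freq/norm and location
// offsets remembered from the earlier sections, then the serialized roaring
// bitmap, length-prefixed. It returns the start offset of this posting list.
func writePostingsList(w io.Writer, curOffset uint64, bm *roaring.Bitmap,
	freqNormOffset, locOffset uint64) (uint64, error) {

	// encode the bitmap to bytes first so we know its length
	var bmBuf bytes.Buffer
	if _, err := bm.WriteTo(&bmBuf); err != nil {
		return 0, err
	}

	postingsStart := curOffset // remember the start position for this posting list

	var out []byte
	var tmp [binary.MaxVarintLen64]byte
	putUvarint := func(v uint64) {
		n := binary.PutUvarint(tmp[:], v)
		out = append(out, tmp[:n]...)
	}
	putUvarint(freqNormOffset)          // freq/norm details offset
	putUvarint(locOffset)               // location details offset
	putUvarint(uint64(bmBuf.Len()))     // length of encoded roaring bitmap
	out = append(out, bmBuf.Bytes()...) // the serialized roaring bitmap data

	_, err := w.Write(out)
	return postingsStart, err
}
```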
## dictionary

- for each field
  - preparation phase:
    - encode vellum FST with dictionary data pointing to file offset of posting list (remembered from previous)
  - file writing phase:
    - remember the start position of this persistDictionary
    - write length of vellum data (varint uint64)
    - write out vellum data
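A sketch of the dictionary build using the vellum FST library. It assumes terms arrive in ascending byte order (vellum requires sorted insertion) and that the posting list offsets remembered in the previous section are available per term; the buffering and length prefix mirror the steps above, but the names are illustrative.

```go
package sketch

import (
	"bytes"
	"encoding/binary"
	"io"

	"github.com/couchbase/vellum"
)

// writeDictionary builds a vellum FST mapping each term to the file offset of
// its posting list, then writes it out length-prefixed as described above.
// terms must be in ascending byte order, as vellum requires.
func writeDictionary(w io.Writer, curOffset uint64, terms []string,
	postingsOffsets map[string]uint64) (uint64, error) {

	var fstBuf bytes.Buffer
	builder, err := vellum.New(&fstBuf, nil)
	if err != nil {
		return 0, err
	}
	for _, term := range terms {
		// dictionary data points at the posting list offset for this term
		if err := builder.Insert([]byte(term), postingsOffsets[term]); err != nil {
			return 0, err
		}
	}
	if err := builder.Close(); err != nil {
		return 0, err
	}

	dictStart := curOffset // remember the start position of this dictionary

	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], uint64(fstBuf.Len()))
	if _, err := w.Write(tmp[:n]); err != nil { // length of vellum data (varint)
		return 0, err
	}
	_, err = w.Write(fstBuf.Bytes()) // the vellum data itself
	return dictStart, err
}
```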
## fields section

- for each field
  - file writing phase:
    - remember start offset for each field
    - write dictionary address (remembered from previous) (varint uint64)
    - write length of field name (varint uint64)
    - write field name bytes

## fields idx

- for each field
  - file writing phase:
    - write big endian uint64 of start offset for each field

NOTE: currently we don't know or record the length of this fields index. Instead we rely on the fact that we know it immediately precedes a footer of known size.
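The fields section and fields idx can be sketched together. Because the index's own length is not recorded, a reader locates it from the footer's field index location and the fact that it sits immediately before the fixed-size footer; the write side below uses illustrative names.

```go
package sketch

import (
	"encoding/binary"
	"io"
)

// writeFields writes each field's entry (dictionary address, name length, name
// bytes) followed by the fields idx of big endian start offsets. dictAddrs[i]
// is the dictionary offset remembered for field id i.
func writeFields(w io.Writer, curOffset uint64, fieldNames []string, dictAddrs []uint64) error {
	fieldStarts := make([]uint64, len(fieldNames))

	var tmp [binary.MaxVarintLen64]byte
	for i, name := range fieldNames {
		fieldStarts[i] = curOffset // remember start offset for each field

		var out []byte
		n := binary.PutUvarint(tmp[:], dictAddrs[i])
		out = append(out, tmp[:n]...) // dictionary address (varint)
		n = binary.PutUvarint(tmp[:], uint64(len(name)))
		out = append(out, tmp[:n]...) // length of field name (varint)
		out = append(out, name...)    // field name bytes

		if _, err := w.Write(out); err != nil {
			return err
		}
		curOffset += uint64(len(out))
	}

	// fields idx: one big endian uint64 start offset per field; its own length
	// is not recorded, the reader finds it just before the fixed-size footer.
	for _, off := range fieldStarts {
		var buf [8]byte
		binary.BigEndian.PutUint64(buf[:], off)
		if _, err := w.Write(buf[:]); err != nil {
			return err
		}
	}
	return nil
}
```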
## fields DocValue

- for each field
  - preparation phase:
    - produce a slice containing multiple consecutive chunks, where each chunk is composed of a meta section followed by compressed columnar field data
    - produce a slice remembering the length of each chunk
  - file writing phase:
    - remember the start position of this first field DocValue offset in the footer
    - write out number of chunks that follow (varint uint64)
    - write out length of each chunk (each a varint uint64)
    - write out the byte slice containing all the chunk data

NOTE: currently the meta header inside each chunk gives a clue to the location offsets and size of the data pertaining to a given docID, and any read operation leverages that meta information to extract the document-specific data from the file.
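The file writing phase here repeats the same "chunk count, chunk lengths, chunk data" pattern used by the two posting details sections; a shared helper might look like the sketch below, with the per-chunk meta and compressed columnar data assumed to be prepared elsewhere.

```go
package sketch

import (
	"encoding/binary"
	"io"
)

// writeChunkedSlice writes the chunk count, the length of each chunk (all as
// varints) and then the concatenated chunk data, as described above.
func writeChunkedSlice(w io.Writer, chunkLens []uint64, chunkData []byte) error {
	var out []byte
	var tmp [binary.MaxVarintLen64]byte
	putUvarint := func(v uint64) {
		n := binary.PutUvarint(tmp[:], v)
		out = append(out, tmp[:n]...)
	}

	putUvarint(uint64(len(chunkLens))) // number of chunks that follow
	for _, l := range chunkLens {
		putUvarint(l) // length of each chunk
	}
	out = append(out, chunkData...) // all the chunk data

	_, err := w.Write(out)
	return err
}
```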
## footer

- file writing phase
  - write number of docs (big endian uint64)
  - write stored field index location (big endian uint64)
  - write field index location (big endian uint64)
  - write field docValue location (big endian uint64)
  - write out chunk factor (big endian uint32)
  - write out version (big endian uint32)
  - write out file CRC of everything preceding this (big endian uint32)
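For symmetry with the read sketch near the top of this document, here is a write-side sketch of the footer. It assumes a hash.Hash32 (for example crc32.NewIEEE) that has already been fed every byte written so far, so the final CRC covers everything preceding it, including the earlier footer fields; names are illustrative.

```go
package sketch

import (
	"encoding/binary"
	"hash"
	"io"
)

// writeFooter writes the footer in the order listed above. crcHasher must
// already have been fed every byte written to w so far, so that the final
// CRC covers everything preceding it, including the footer fields themselves.
func writeFooter(w io.Writer, crcHasher hash.Hash32, numDocs, storedIndexOffset,
	fieldsIndexOffset, docValueOffset uint64, chunkFactor, version uint32) error {

	var buf [4*8 + 2*4]byte
	binary.BigEndian.PutUint64(buf[0:8], numDocs)
	binary.BigEndian.PutUint64(buf[8:16], storedIndexOffset)
	binary.BigEndian.PutUint64(buf[16:24], fieldsIndexOffset)
	binary.BigEndian.PutUint64(buf[24:32], docValueOffset)
	binary.BigEndian.PutUint32(buf[32:36], chunkFactor)
	binary.BigEndian.PutUint32(buf[36:40], version)

	crcHasher.Write(buf[:]) // keep the running CRC up to date with the footer fields
	if _, err := w.Write(buf[:]); err != nil {
		return err
	}

	var crcBuf [4]byte
	binary.BigEndian.PutUint32(crcBuf[:], crcHasher.Sum32())
	_, err := w.Write(crcBuf[:]) // file CRC of everything preceding this
	return err
}
```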