
Format of Documents File and Meta File

Format of Documents File

The Documents file is part of a fraction's data. It has the *.docs extension and contains the documents stored in the fraction.

The file is a sequence of DocBlocks, one following another:

<DocBlock1> <DocBlock2> <DocBlock3> ... <DocBlockN>

In the current implementation, each DocBlock holds the data of a single bulk request, so every bulk request creates exactly one DocBlock.

DocBlock Format

A DocBlock consists of:

  • a fixed-size header with the block's metadata
  • an arbitrary payload, which may be compressed (currently with LZ4).

DocBlock Header

The header is a 33-byte area with 5 fields:

| Field     | Size    | Type   |
|-----------|---------|--------|
| Codec     | 1 byte  | byte   |
| Length    | 8 bytes | uint64 |
| RawLength | 8 bytes | uint64 |
| Ext1      | 8 bytes | binary |
| Ext2      | 8 bytes | binary |
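The header layout can be sketched in Python with the standard `struct` module. This is a minimal illustration, not the seq-db implementation: the byte order is not stated here, so little-endian is an assumption, as is the idea that the fields are packed in table order with no padding. The names `HEADER`, `DocBlockHeader`, and `parse_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumption: little-endian, fields packed in table order without padding.
# B = 1-byte Codec, Q = uint64 (Length, RawLength), 8s = 8 raw bytes (Ext1, Ext2).
HEADER = struct.Struct("<BQQ8s8s")  # 1 + 8 + 8 + 8 + 8 = 33 bytes


class DocBlockHeader(NamedTuple):
    codec: int       # Codec field
    length: int      # Length field
    raw_length: int  # RawLength field
    ext1: bytes      # Ext1, kept as opaque bytes
    ext2: bytes      # Ext2, kept as opaque bytes


def parse_header(buf: bytes) -> DocBlockHeader:
    """Decode the first 33 bytes of buf as a DocBlock header."""
    if len(buf) < HEADER.size:
        raise ValueError(f"need {HEADER.size} bytes, got {len(buf)}")
    return DocBlockHeader(*HEADER.unpack_from(buf))


# Illustrative values only; they are not taken from a real fraction.
raw = HEADER.pack(1, 100, 250, b"\x00" * 8, b"\x01" * 8)
header = parse_header(raw)
```

Ext1 and Ext2 are left as raw bytes because the table only describes them as binary.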

DocBlock Payload Format

The payload of a DocBlock in the Documents file is generated by seq-proxy from incoming documents. It is a sequence of records, each with two fields:

<SIZE_1> <BINARY_DATA_1> <SIZE_2> <BINARY_DATA_2> <SIZE_3> <BINARY_DATA_3> ... <SIZE_N> <BINARY_DATA_N>

  • Size - a uint32 giving the length of the binary data that follows
  • Binary data - Size bytes, which must be a valid JSON document
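The record layout above can be read with a short loop. The sketch below works on an already decompressed payload and assumes the uint32 size prefix is little-endian, which this page does not state. `iter_records` and `encode_records` are hypothetical helper names, and the sample documents are made up.

```python
import json
import struct
from typing import Iterable, Iterator

# Assumption: the uint32 Size prefix is little-endian.
SIZE = struct.Struct("<I")


def iter_records(payload: bytes) -> Iterator[bytes]:
    """Yield each BINARY_DATA item from an uncompressed DocBlock payload."""
    offset = 0
    while offset < len(payload):
        if offset + SIZE.size > len(payload):
            raise ValueError("truncated size prefix")
        (size,) = SIZE.unpack_from(payload, offset)
        offset += SIZE.size
        end = offset + size
        if end > len(payload):
            raise ValueError("record runs past end of payload")
        yield payload[offset:end]
        offset = end


def encode_records(docs: Iterable[bytes]) -> bytes:
    """Build a payload by prefixing each document with its size."""
    return b"".join(SIZE.pack(len(d)) + d for d in docs)


# Round trip with two made-up documents.
payload = encode_records([b'{"a": 1}', b'{"b": "x"}'])
docs = [json.loads(record) for record in iter_records(payload)]
```

The bounds checks make a truncated payload fail with an error instead of silently yielding a short record.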

Format of Meta File

The Meta file is part of an active fraction. It has the *.meta extension and contains the metadata of documents.

This file has almost the same format as the Documents file, with two differences:

  1. Different BINARY_DATA. Each BINARY_DATA item corresponds to one document in the Documents file. In the Meta file, BINARY_DATA is a JSON object of the following form:
{
"mid": int,
"rid": int,
"s": int, // document size
"t":[ // tokens
["field1", "value1"],
["field2", "value2"],
["field3", "value3"],
// ...
]
}
  2. Ext1 usage. Each DocBlock in the Meta file stores, in the Ext1 field of its header, the size of the corresponding DocBlock in the Documents file.
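As an illustration of the Meta record shape, the sketch below decodes one BINARY_DATA item into a small Python structure. `MetaRecord` and `parse_meta` are hypothetical names, and the sample values are invented. The schema above uses `//` comments only as annotations, so a real record is assumed to be plain JSON without them.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MetaRecord:
    mid: int
    rid: int
    size: int                      # "s": document size
    tokens: List[Tuple[str, str]]  # "t": (field, value) pairs


def parse_meta(binary_data: bytes) -> MetaRecord:
    """Decode one Meta file BINARY_DATA item."""
    obj = json.loads(binary_data)
    return MetaRecord(
        mid=obj["mid"],
        rid=obj["rid"],
        size=obj["s"],
        tokens=[(field, value) for field, value in obj["t"]],
    )


# Made-up sample record, for illustration only.
sample = b'{"mid": 1, "rid": 2, "s": 17, "t": [["level", "error"], ["service", "api"]]}'
record = parse_meta(sample)
```

Turning each token into a tuple makes the (field, value) pairing explicit and fails early if a token does not have exactly two elements.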

Diagram of Docs/Meta File Format
