Format of Documents File and Meta File
See also
- Fractions
- Format of Documents and Metadata File
- Format of Index File of Sealed Fraction
Format of Documents File
The Documents file is part of the fraction data. It has the *.docs extension and contains the documents stored in the fraction.
The file is a sequence of DocBlocks, one immediately after another:
<DocBlocks1> <DocBlocks2> <DocBlocks3> ... <DocBlocksN>
In the current implementation, each DocBlock holds the data of a single bulk request, so every bulk request creates exactly one DocBlock.
DocBlock Format
A DocBlock has:
- a fixed header with the block's metadata
- an arbitrary payload, which may be compressed (lz4 for now)
DocBlock Header
It is a 33-byte area with 5 fields:
Field | Size | Type |
---|---|---|
Codec | 1 byte | byte |
Length | 8 bytes | uint64 |
RawLength | 8 bytes | uint64 |
Ext1 | 8 bytes | binary |
Ext2 | 8 bytes | binary |
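The header layout above can be sketched as a fixed-size `struct` unpack. This is a minimal illustration, not the actual implementation: the byte order is not stated in this document, so little-endian is assumed here, and the names `DocBlockHeader` and `parse_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# ASSUMPTION: little-endian byte order ("<"); the format description does not state it.
# B = Codec (1 byte), Q = Length (uint64), Q = RawLength (uint64),
# 8s = Ext1 (8 raw bytes), 8s = Ext2 (8 raw bytes)
HEADER_FORMAT = "<BQQ8s8s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 1 + 8 + 8 + 8 + 8 = 33


class DocBlockHeader(NamedTuple):
    """Hypothetical container for the five header fields."""
    codec: int
    length: int
    raw_length: int
    ext1: bytes
    ext2: bytes


def parse_header(buf: bytes) -> DocBlockHeader:
    """Decode the first 33 bytes of buf as a DocBlock header."""
    if len(buf) < HEADER_SIZE:
        raise ValueError(f"need {HEADER_SIZE} bytes, got {len(buf)}")
    return DocBlockHeader(*struct.unpack_from(HEADER_FORMAT, buf))
```

Ext1 and Ext2 are kept as raw bytes because the table lists them as binary; how they are interpreted depends on the file type.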
DocBlocks Payload Format
The payload of a DocBlock in the Documents file is generated by a seq-proxy from incoming documents and consists of a sequence of records, each with two fields:
<SIZE_1> <BINARY_DATA_1> <SIZE_2> <BINARY_DATA_2> <SIZE_3> <BINARY_DATA_3> ... <SIZE_N> <BINARY_DATA_N>
- Size: a uint32
- Binary data of Size bytes: it must be a valid JSON document
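As a sketch of how such a payload could be walked, the following reads length-prefixed records from an already decompressed payload. It assumes a little-endian uint32 size prefix (the byte order is not stated in this document), and `iter_records` and `encode_records` are hypothetical helper names used only for illustration.

```python
import json
import struct
from typing import Any, Iterable, Iterator

# ASSUMPTION: the uint32 size prefix is little-endian; the document does not say.
SIZE_FORMAT = "<I"
SIZE_LEN = struct.calcsize(SIZE_FORMAT)  # 4 bytes


def iter_records(payload: bytes) -> Iterator[Any]:
    """Yield each JSON document from an uncompressed DocBlock payload."""
    offset = 0
    while offset < len(payload):
        if offset + SIZE_LEN > len(payload):
            raise ValueError("truncated size field")
        (size,) = struct.unpack_from(SIZE_FORMAT, payload, offset)
        offset += SIZE_LEN
        end = offset + size
        if end > len(payload):
            raise ValueError("truncated binary data")
        yield json.loads(payload[offset:end])
        offset = end


def encode_records(docs: Iterable[Any]) -> bytes:
    """Build a payload in the same layout: <SIZE> <BINARY_DATA> per document."""
    out = bytearray()
    for doc in docs:
        data = json.dumps(doc).encode("utf-8")
        out += struct.pack(SIZE_FORMAT, len(data)) + data
    return bytes(out)
```

For example, `list(iter_records(encode_records([{"a": 1}])))` round-trips the single document. A compressed payload would have to be decompressed first, according to the header's Codec field.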
Format of Meta File
The Meta file is part of an active fraction. It has the *.meta extension and contains the metadata of documents.
This file has almost the same format as the Documents file, with two differences:
- Different BINARY_DATA. Each BINARY_DATA item corresponds to one document from the Documents file. In the Meta file, BINARY_DATA is a JSON object:
{
"mid": int,
"rid": int,
"s": int, // document size
"t":[ // tokens
["field1", "value1"],
["field2", "value2"],
["field3", "value3"],
// ...
]
}
- Different use of Ext1. In each DocBlock of the Meta file, the Ext1 header field stores the size of the corresponding DocBlock in the Documents file.
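The two differences can be illustrated with a short sketch that decodes one meta record and reads the Ext1 field. Both helper names (`parse_meta_record`, `docs_block_size`) are hypothetical, and decoding Ext1 as a little-endian uint64 is an assumption: the header table lists Ext1 only as 8 bytes of binary data.

```python
import json
import struct
from typing import List, NamedTuple, Tuple


class MetaRecord(NamedTuple):
    """Hypothetical container for one decoded BINARY_DATA item of the Meta file."""
    mid: int
    rid: int
    size: int                      # "s": document size
    tokens: List[Tuple[str, str]]  # "t": (field, value) pairs


def parse_meta_record(data: bytes) -> MetaRecord:
    """Decode one meta BINARY_DATA item (a JSON object) into a MetaRecord."""
    meta = json.loads(data)
    # Tokens stay a list of pairs, so the original order of "t" is preserved.
    tokens = [(field, value) for field, value in meta["t"]]
    return MetaRecord(meta["mid"], meta["rid"], meta["s"], tokens)


def docs_block_size(ext1: bytes) -> int:
    """Read Ext1 of a meta DocBlock as the size of the matching documents DocBlock.

    ASSUMPTION: the 8 bytes hold a little-endian uint64.
    """
    (size,) = struct.unpack("<Q", ext1)
    return size
```

Tokens are returned as a list rather than a dict so that nothing is lost if the same field name appears more than once.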