Index types and mappings
seq-db doesn't index any fields from the ingested data by default. Instead, indexing is controlled through a special file called the mapping file. The mapping file specifies the indexed fields and the used index types.
Index types
Below is a description of mapping types seq-db currently supports. There are several index types with different behaviors.
keyword
mapping type
The keyword
mapping type treats the whole value as a single token, without breaking it up.
Usually used for content like statuses, namespaces, tags or any other data where a full match search is required.
Note that keyword
index should be used with care with high-cardinality values like trace_id
, span_id
or
other unique request identifiers, as the indexing of these fields might blow up the index.
Example of a mapping for a keyword field:
mapping-list:
- name: level
type: keyword
path
mapping type
This mapping type indexes hierarchical path-like values.
It is very similar to the
elasticsearch's path tokenizer.
When a field is indexed with the path
index, its value is broken
into hierarchical terms.
Used for searching by beginning of a path or full path. For example, the following documents will match query uri:"/my/path"
:
[
{"uri": "/my/path"},
{"uri": "/my/path/1"},
{"uri": "/my/path/1/"},
{"uri": "/my/path/1/one"}
]
Example of a mapping for a path field:
mapping-list:
- name: uri
type: path
text
mapping type
Index for content like error messages or request bodies where full-text search is required. This mapping type is used to index fields containing unstructured, natural language text, as well as human-written messages, descriptions, error messages and other free-text fields.
For example, the search query message:"error"
will emit all the documents that contain token error
. And query message:"error code"
will emit all the documents that contain both error
and code
.
Example of a mapping for a text field:
mapping-list:
- name: message
type: text
exists
mapping type
Used when the presence of the field is important and not the value.
This mapping type should be used when a field might or might not be
present in a message.
For example, for query _exists_:service
all documents that have a service
field will be found regardless of the field value.
Note that the _exists_
query will work on other types of mapping as well.
Example of a mapping for an exists field:
mapping-list:
- name: service
type: exists
Configuration parameters
-
--partial-indexing
- if true, will index only the first part of long fields, otherwise will skip entry if length of field value is greater than threshold. -
--max-token-size
- max size of a single token, default is 72. -
--case-sensitive
- if false, will convert values to lower case. -
constant
consts.MaxTextFieldValueLength
- limits maximum length of the text field value. Current threshold is 32768 bytes.
Object indexing
seq-db can also index logs containing nested structured data.
In this case, the parent field should have the object
index type,
and contain a mapping-list
object inside, that would specify
how exactly its nested fields should be indexed.
For example:
mapping-list:
- name: "myobject" # key that contains nested json data
type: "object"
mapping-list: # mapping for the nested fields
- type: "keyword"
name: "nested"
- type: "text"
name: "nestedtext"
Field names containing dots and nested objects
seq-db doesn't differentiate between fields indexed
in a nested object and fields containing a dot in their name.
For example, the mapping displayed below, will index both name
keys nested inside the object with the key user
, and the user.name
field
inside the root log.
mapping-list:
- name: user
type: object
mapping-list:
- type: keyword
name: name
- user.name: keyword
Multiple indexes on a single field
A single field can be indexed with multiple types at the same time. This allows to combine multiple indexing strategies enabling more flexible search capabilities.
For instance, a field can be indexed as both keyword
and text
,
allowing both full-text search and exact filtering.
If multiple indexing types are used, the additional indexing type should have titles, that will be used during data search. Say we have this mapping:
mapping-list:
- name: message
types:
- type: text
- title: keyword
type: keyword
size: 18
In this case, the type with empty title
will be default one.
For types with title
new "implicit" fields will be created: message.keyword
in our case. Use message.keyword
to search over message as keyword and message
to search as text.
The title of implicit field consists of values of name
and title
joined together with a dot between them.
Illustrated mapping example
Let's walk through a practical example. We'll define a mapping and analyze how seq-db would index a sample document.
Say we have this example mapping with explicit field types: text
, keyword
, path
, exists
.
mapping-list:
- name: message
types:
- type: text
- title: keyword
type: keyword
size: 18
- name: level
type: keyword
- name: foo
type: exists
- name: bar
type: keyword
- name: uri
type: path