fts-elastic is a Dovecot full-text search indexing plugin that uses ElasticSearch as a backend.
Dovecot communicates with Elasticsearch using HTTP/JSON queries. The plugin supports automatic indexing and searching of e-mail. For mailboxes with more than 10000 messages it uses the Elasticsearch scroll API.
Requirements:
- Dovecot 2.2+
- JSON-C
- ElasticSearch 6.x, 7.x
- Autoconf 2.53+
You can provide the path to your Dovecot source tree by passing --with-dovecot=<path> to ./configure.
Install the build dependencies:
# sudo apt install dovecot
sudo apt install gcc make libjson-c-dev dovecot-dev
An example build may look like:
./autogen.sh
./configure --with-dovecot=/usr/lib/dovecot/
make
make install
sudo ln -s /usr/lib/dovecot/lib21_fts_elastic_plugin.so /usr/lib/dovecot/modules/lib21_fts_elastic_plugin.so
Create an index in Elasticsearch:
curl -X PUT "localhost:9200/m?pretty"
Create /etc/dovecot/conf.d/90-fts.conf with the following content:
mail_plugins = $mail_plugins fts fts_elastic
plugin {
  fts = elastic
  fts_elastic = debug url=http://localhost:9200/m/ bulk_size=5000000 refresh=fts rawlog_dir=/var/log/fts-elastic/
  # no: new emails are indexed only when a user runs a search
  # yes: every email is indexed when it is delivered
  fts_autoindex = no
  fts_autoindex_exclude = \Junk
  fts_autoindex_exclude2 = \Trash
}
and (re)start dovecot:
dovecot stop; dovecot
Options for fts_elastic:
- url=<elasticsearch url> Required. Elasticsearch URL including the index name; must end with a slash (/)
- bulk_size=<positive integer> Size, in bytes, of the bulk requests sent to Elasticsearch (default=5000000)
- refresh={fts,index,never} When to refresh the Elasticsearch index so that new emails become searchable
  - fts: when the Dovecot fts plugin requests it (typically before a search)
  - index: after each bulk update, using the ?refresh=true query parameter (produces inefficient indexes when combined with fts_autoindex=yes)
  - never: leave refreshing to Elasticsearch; indexed emails may not be searchable immediately
- debug Enables HTTP debugging
- rawlog_dir=<directory> Directory where HTTP communication with the Elasticsearch server is written (useful for debugging the plugin or the Elasticsearch schema)
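Because the url option must end with a slash, a quick pre-flight check can catch a misconfigured value before Dovecot is restarted. The following is only a minimal shell sketch; the function name check_fts_url is hypothetical and not part of the plugin.

```shell
#!/bin/sh
# check_fts_url is a hypothetical helper, not part of fts-elastic.
# It succeeds only when the given URL ends with a slash, as the
# url option requires; otherwise it prints a message to stderr.
check_fts_url() {
    case "$1" in
        */) return 0 ;;
        *)  echo "url must end with a slash: $1" >&2
            return 1 ;;
    esac
}

check_fts_url "http://localhost:9200/m/" && echo "ok"
```

The check is purely syntactic: it does not contact Elasticsearch or verify that the index exists.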
This plugin stores all messages in a single Elasticsearch index. You can use sharding to support large numbers of users. Because it uses a routing key, each update and search accesses only one shard. The _id has the form "_id":"uid/mbox-guid/user@domain", for example: "_id":"3/f40efa2f8f44ad54424000006e8130ae/filip.hanes@example.com"
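To make the _id layout concrete, the three parts can be joined as in the sketch below. The helper name make_doc_id is hypothetical; it only mirrors the documented format and is not how the plugin itself builds the value.

```shell
#!/bin/sh
# make_doc_id is a hypothetical helper that mirrors the documented
# _id layout: uid/mbox-guid/user@domain.
# Arguments: $1 = message uid, $2 = mailbox GUID, $3 = user name.
make_doc_id() {
    printf '%s/%s/%s\n' "$1" "$2" "$3"
}

# Prints: 3/f40efa2f8f44ad54424000006e8130ae/filip.hanes@example.com
make_doc_id 3 f40efa2f8f44ad54424000006e8130ae filip.hanes@example.com
```

Note that the resulting _id contains slashes, so it would need URL-encoding before being used in an Elasticsearch request path.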
You can set up the index mapping on Elasticsearch 6.x with the command:
curl -X PUT "http://elasticIP:9200/m?pretty" -H 'Content-Type: application/json' -d "@elastic6-schema.json"
Elasticsearch 7.x uses a different date format parser, so it needs a different schema:
curl -X PUT "http://elasticIP:9200/m?pretty" -H 'Content-Type: application/json' -d "@elastic7-schema.json"
The fields box and user need to be keyword fields, as you can see in the file elastic-schema.json.
Our schema keeps _source enabled because disabling it does not save much storage, and the Elasticsearch documentation does not recommend disabling it either.
This plugin doesn't use _source and explicitly excludes it from the responses to its queries. You can still use it for better management of and insight into indexed emails, or when you want to use Elasticsearch for purposes other than Dovecot FTS (analysis, spammer detection, ...).
_source is also needed if you ever reindex within Elasticsearch.
You can reindex a user's mailboxes at any time with these doveadm commands:
doveadm fts rescan -u user@example.com
doveadm index -u user@domain -q '*'
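To reindex several users, the two commands above can be wrapped in a loop. The sketch below is a dry run under stated assumptions: reindex_cmds is a hypothetical helper that reads one user name per line and only prints the doveadm commands, so the output can be reviewed before it is executed. It assumes user names contain no single quotes.

```shell
#!/bin/sh
# reindex_cmds is a hypothetical helper, not part of Dovecot or fts-elastic.
# It reads user names from stdin (one per line) and prints the rescan and
# index commands for each user without executing them.
reindex_cmds() {
    while IFS= read -r user; do
        # Skip empty lines.
        [ -n "$user" ] || continue
        printf "doveadm fts rescan -u '%s'\n" "$user"
        printf "doveadm index -u '%s' -q '*'\n" "$user"
    done
}

printf '%s\n' user@example.com | reindex_cmds
```

Once the printed commands look right, they can be piped to sh to run them. The list of users has to come from your own setup, for example a text file with one address per line.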
An example of pushed document:
{
  "user": "filip.hanes@example.com",
  "box": "f40efa2f8f44ad54424000006e8130ae",
  "uid": 3,
  "date": "Thu, 08 Jan 2015 00:20:05 +0000",
  "from": "josh <josh@localhost.localdomain>",
  "sender": "Filip Hanes",
  "to": "<filip.hanes@example.com>",
  "cc": "User <user@example.com>",
  "bcc": "\"Test User\" <test@example.com>",
  "subject": "Test #3",
  "message-id": "<20150107132005.07DA3140314@example.com>",
  "body": "This is the body of test #3.\n"
}
An example search:
curl -X POST "http://elasticIP:9200/m/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"user": "filip.hanes@example.com"}},
        {"term": {"box": "f40efa2f8f44ad54424000006e8130ae"}}
      ],
      "must": [
        {
          "multi_match": {
            "query": "test",
            "operator": "and",
            "fields": ["from","to","cc","bcc","sender","subject","body"]
          }
        }
      ]
    }
  },
  "size": 100
}
'
TODO:
- URL parameterized by user or mbox_guid, e.g. url=http://127.0.0.1/m-%u/ would use the index http://127.0.0.1/m-filip.hanes@example.com/
- Multiple mailbox lookup (for clients that call lookup_multi; need to find one)
- Optimisation (if any)
This plugin borrows heavily from Dovecot itself, particularly for the automatic detection of dovecot-config (see m4/dovecot.m4). The fts-solr and fts-squat plugins were also used as reference material for understanding the Dovecot FTS API. fts-lucene was used as a reference for implementing proper rescan.