This page lists some open tasks for the Sphinx project. If you're interested in tackling one of them, or have any suggestions, please feel free to contact us.
- Replace the Sphinx UTF-8 decoder with this one (Difficulty: Easy)
Sphinx works with UTF-8 data extensively, and this implementation is faster than the current one. The task mostly involves testing the two decoders and comparing them for correctness and speed.
- Improve indextool --check (Difficulty: Medium)
This tool checks index consistency, but it tends to lag behind changes to the index formats and does not detect some index errors. For example, JSON attributes in plain indexes are not checked at all.
- CRC32 SSE (Difficulty: Medium)
Sphinx uses two hash functions, CRC32 and FNV64, and calls both very frequently. Speeding up CRC32 with SSE instructions could therefore improve Sphinx's overall performance.
- JIT compiler for the expression evaluation engine (Difficulty: Hard)
This is a long-standing missing feature. Even a simple JIT compiler would speed up Sphinx expressions dramatically and help a lot of users.
- Multi-threaded indexing (Difficulty: Hard)
Parallelizing the indexing process could improve indexing speed and put otherwise idle CPU cores to work.
- Add new functions to the expression evaluator (Difficulty: Medium)
There are many possibilities here. One example is GROUP_SET_CONCAT, which would act much like GROUP_CONCAT but return only unique values.
- Improve built-in ranking formulas (Difficulty: Hard)
Search quality is a major concern for every search engine. Currently the best ranker in terms of search quality is proximity_bm25, but it is not ideal. A new, fast ranking formula would be a welcome addition. Comparing the quality of different formulas will require relevance assessments.
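For the UTF-8 decoder task, a differential test is a natural way to compare the two implementations: decode the same inputs with both and check that they agree, including on malformed input, before benchmarking speed. The sketch below is a hypothetical reference decoder (`DecodeUtf8` and `DecodeAll` are illustrative names, not Sphinx functions) that applies the RFC 3629 validity rules. Mapping every invalid sequence to -1 is an assumption; Sphinx's own handling of bad bytes may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical reference decoder, not Sphinx's own. Decodes one code point
// starting at s[pos] and advances pos. On malformed input (bad lead byte,
// truncated or broken sequence, overlong form, surrogate, > U+10FFFF) it
// returns -1 and resumes at the first byte that broke the sequence.
int32_t DecodeUtf8(const std::string& s, size_t& pos) {
    assert(pos < s.size());
    const unsigned char c = static_cast<unsigned char>(s[pos]);
    int len;
    int32_t cp;
    if (c < 0x80) { pos += 1; return c; }
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else { pos += 1; return -1; }  // stray continuation byte or 0xF8..0xFF

    for (int i = 1; i < len; ++i) {
        if (pos + i >= s.size()) { pos += i; return -1; }  // truncated sequence
        const unsigned char cc = static_cast<unsigned char>(s[pos + i]);
        if ((cc & 0xC0) != 0x80) { pos += i; return -1; }  // not a continuation byte
        cp = (cp << 6) | (cc & 0x3F);
    }
    pos += len;

    // Smallest code point that legitimately needs 2, 3 or 4 bytes.
    static const int32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
    return cp;
}

// Decodes a whole string; a differential test compares this output with the
// candidate decoder's output for the same input.
std::vector<int32_t> DecodeAll(const std::string& s) {
    std::vector<int32_t> out;
    for (size_t pos = 0; pos < s.size();) out.push_back(DecodeUtf8(s, pos));
    return out;
}
```

A fuzzer or a corpus of real documents can then drive both decoders and report the first input on which their outputs differ.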
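For the CRC32 SSE task, one caveat is worth knowing up front: the SSE4.2 `crc32` instruction (exposed as `_mm_crc32_u8` and `_mm_crc32_u64` in `<nmmintrin.h>`) computes CRC-32C with the Castagnoli polynomial, not the IEEE CRC-32 used by zlib. If Sphinx's CRC32 is the IEEE variant (an assumption here), switching to the instruction would change hash values; PCLMULQDQ-based folding is the usual way to accelerate IEEE CRC-32 itself. The sketch below pairs a bitwise reference CRC-32C with an SSE4.2 version behind a runtime CPU check. It assumes GCC or Clang on x86-64 for `__attribute__((target))` and `__builtin_cpu_supports`, and falls back to the portable code elsewhere; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>
#define HAVE_X86_CRC 1
#endif

// Bitwise reference CRC-32C (reflected polynomial 0x82F63B78). Slow but easy
// to trust, so it serves as the oracle when testing a faster version.
uint32_t Crc32cRef(const void* data, size_t len) {
    assert(data != nullptr || len == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#ifdef HAVE_X86_CRC
// SSE4.2 version: 8 bytes per instruction, then the remaining tail bytes.
__attribute__((target("sse4.2")))
uint32_t Crc32cSse42(const void* data, size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint64_t crc = 0xFFFFFFFFu;
    for (; len >= 8; p += 8, len -= 8) {
        uint64_t chunk;
        std::memcpy(&chunk, p, 8);  // safe unaligned load
        crc = _mm_crc32_u64(crc, chunk);
    }
    uint32_t crc32 = static_cast<uint32_t>(crc);
    for (; len > 0; ++p, --len) crc32 = _mm_crc32_u8(crc32, *p);
    return ~crc32;
}
#endif

// Runtime dispatch: use the instruction only when the CPU supports it.
uint32_t Crc32c(const void* data, size_t len) {
#ifdef HAVE_X86_CRC
    if (__builtin_cpu_supports("sse4.2")) return Crc32cSse42(data, len);
#endif
    return Crc32cRef(data, len);
}
```

Checking both paths against the standard CRC-32C check value (0xE3069283 for the ASCII string "123456789") and against each other on random buffers is a cheap first correctness test before benchmarking.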
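For the JIT task, the overhead a JIT removes is easiest to see in a tree-walking evaluator, where every row pays one virtual call per expression node. The sketch below is a simplified, hypothetical model: the `Expr`, `Const`, `Attr`, `Add` and `Mul` classes are illustrative and are not Sphinx's actual expression classes. Evaluating `a*2+b` here walks five nodes per row, whereas JIT-compiled code for the same tree could be a single straight-line computation equivalent to `row[0] * 2.0 + row[1]`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical AST node: each Eval() is a virtual call, made once per node per row.
struct Expr {
    virtual ~Expr() = default;
    virtual double Eval(const std::vector<double>& row) const = 0;
};

struct Const : Expr {
    double v;
    explicit Const(double value) : v(value) {}
    double Eval(const std::vector<double>&) const override { return v; }
};

struct Attr : Expr {
    size_t idx;  // column of the attribute in the row
    explicit Attr(size_t i) : idx(i) {}
    double Eval(const std::vector<double>& row) const override {
        assert(idx < row.size());
        return row[idx];
    }
};

// Binary operator node; Op is a standard function object such as std::plus.
template <typename Op>
struct Binary : Expr {
    std::unique_ptr<Expr> l, r;
    Binary(std::unique_ptr<Expr> a, std::unique_ptr<Expr> b)
        : l(std::move(a)), r(std::move(b)) {}
    double Eval(const std::vector<double>& row) const override {
        return Op()(l->Eval(row), r->Eval(row));
    }
};

using Add = Binary<std::plus<double>>;
using Mul = Binary<std::multiplies<double>>;

// Builds the tree for a*2+b, where a is column 0 and b is column 1.
std::unique_ptr<Expr> BuildExample() {
    return std::make_unique<Add>(
        std::make_unique<Mul>(std::make_unique<Attr>(0), std::make_unique<Const>(2.0)),
        std::make_unique<Attr>(1));
}
```

Timing `BuildExample()->Eval(row)` over many rows against a hand-written `row[0] * 2.0 + row[1]` loop gives a rough upper bound on what a JIT could gain for this expression shape.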
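For the multi-threaded indexing task, one common approach (an assumption here, not a description of Sphinx's indexer) is to split the documents into chunks, build a partial inverted index per chunk in its own thread, and merge the partial indexes at the end. Since each thread owns its chunk and its output, no locking is needed until the merge. The sketch below uses whitespace tokenization and `std::map` postings to stay short; `BuildIndexParallel` and `IndexChunk` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// word -> sorted list of ids of the documents containing it
using Index = std::map<std::string, std::vector<size_t>>;

// Indexes docs[begin, end); runs in its own thread and writes only to `out`.
static void IndexChunk(const std::vector<std::string>& docs, size_t begin, size_t end,
                       Index& out) {
    assert(begin <= end && end <= docs.size());
    for (size_t id = begin; id < end; ++id) {
        std::istringstream words(docs[id]);
        std::string w;
        while (words >> w) {
            auto& postings = out[w];
            if (postings.empty() || postings.back() != id) postings.push_back(id);
        }
    }
}

Index BuildIndexParallel(const std::vector<std::string>& docs, size_t threads) {
    threads = std::max<size_t>(1, threads);
    std::vector<Index> parts(threads);
    std::vector<std::thread> pool;
    const size_t chunk = (docs.size() + threads - 1) / threads;
    for (size_t t = 0; t < threads; ++t) {
        const size_t begin = std::min(docs.size(), t * chunk);
        const size_t end = std::min(docs.size(), begin + chunk);
        pool.emplace_back(IndexChunk, std::cref(docs), begin, end, std::ref(parts[t]));
    }
    for (auto& th : pool) th.join();

    // Merge: chunks cover increasing id ranges, so appending keeps postings sorted.
    Index merged;
    for (const Index& part : parts)
        for (const auto& kv : part) {
            auto& dst = merged[kv.first];
            dst.insert(dst.end(), kv.second.begin(), kv.second.end());
        }
    return merged;
}
```

Building the index with one thread and with several, then checking that the results are identical, is a simple regression test for any parallel indexer.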
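For the proposed GROUP_SET_CONCAT function, the aggregation itself is simple: for each group, remember the values already emitted and append a value to the result only the first time it appears. The sketch below assumes a comma-separated result in first-seen order; the ordering and separator are design choices left to whoever implements the function, and `GroupSetConcat` and `Aggregate` are hypothetical helpers, not Sphinx code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Accumulator for one group: remembers which values were already emitted.
class GroupSetConcat {
public:
    void Add(const std::string& value) {
        if (!seen_.insert(value).second) return;  // duplicate: skip it
        if (seen_.size() > 1) result_ += ',';      // not the first unique value
        result_ += value;
    }
    const std::string& Result() const { return result_; }

private:
    std::unordered_set<std::string> seen_;
    std::string result_;
};

// Applies the aggregate to (group key, value) rows, similar to GROUP BY key.
std::map<int, std::string> Aggregate(const std::vector<std::pair<int, std::string>>& rows) {
    std::map<int, GroupSetConcat> groups;
    for (const auto& row : rows) groups[row.first].Add(row.second);
    std::map<int, std::string> out;
    for (const auto& g : groups) out[g.first] = g.second.Result();
    return out;
}
```

Keeping a per-group hash set costs memory proportional to the number of distinct values in the group, which is the main trade-off compared with plain GROUP_CONCAT.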
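For the ranking task, relevance assessments are usually turned into a per-query score so that formulas can be compared numerically. The sketch below computes NDCG@k (normalized discounted cumulative gain), one standard metric, using gain $2^{rel}-1$ and discount $\log_2(rank+1)$. It assumes graded judgments (0 = irrelevant, higher = more relevant) are available for each returned document; the choice of metric and grading scale is an assumption, and `NdcgAtK` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// DCG@k with gain 2^rel - 1 and a log2(rank + 1) discount (rank is 1-based).
double Dcg(const std::vector<int>& rels, size_t k) {
    double dcg = 0.0;
    for (size_t i = 0; i < std::min(k, rels.size()); ++i)
        dcg += (std::pow(2.0, rels[i]) - 1.0) / std::log2(static_cast<double>(i) + 2.0);
    return dcg;
}

// NDCG@k: `ranked` holds the judged relevance of each result in the order the
// ranker returned them; `judged` holds all judgments for the query, in any order.
// Returns 0 when the query has no relevant documents at all.
double NdcgAtK(const std::vector<int>& ranked, std::vector<int> judged, size_t k) {
    std::sort(judged.begin(), judged.end(), std::greater<int>());
    const double ideal = Dcg(judged, k);
    return ideal > 0.0 ? Dcg(ranked, k) / ideal : 0.0;
}
```

Running each candidate formula over the same set of assessed queries and averaging NDCG@k yields one number per formula; a value of 1.0 means every query's results came back in the ideal order.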