💫 Release v0.36.0

@punndcoder28

Release Note (`0.36.0`)

Release time: 2023-07-18 14:43:28

This release contains 2 new features, 1 performance improvement, 2 refactorings, 5 bug fixes and 1 documentation improvement.

🆕 Features

JAX Integration (#1646)

You can now use JAX with DocArray. We have introduced `JaxArray` as a new type option for your documents, so JAX can natively process array-like data stored in your DocArray documents. Here's how to use it:

```python
from docarray import BaseDoc
from docarray.typing import JaxArray
import jax.numpy as jnp


class MyDoc(BaseDoc):
    arr: JaxArray
    image_arr: JaxArray[3, 224, 224]  # For images of shape (3, 224, 224)
    square_crop: JaxArray[3, 'x', 'x']  # For any square image, regardless of dimensions
    random_image: JaxArray[3, ...]  # For any image with 3 color channels, and arbitrary other dimensions
```

As you can see, `JaxArray` type hints are flexible: they can fix every dimension, tie dimensions together by name, or leave the remaining dimensions open.
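To make the shape syntax concrete, here is a minimal pure-Python sketch of how a specification such as `(3, 'x', 'x')` or `(3, ...)` could be matched against a concrete shape. The `matches_shape` helper is hypothetical: it only mirrors the semantics described in the comments above and is not DocArray's actual validation code.

```python
from typing import Dict, Sequence, Tuple


def matches_shape(spec: Sequence[object], shape: Tuple[int, ...]) -> bool:
    """Return True if `shape` satisfies `spec` (hypothetical helper).

    Integers must match exactly, equal strings must bind to the same size,
    and a trailing `...` accepts any number of remaining dimensions.
    """
    if len(spec) > 0 and spec[-1] is Ellipsis:
        fixed = spec[:-1]
        if len(shape) < len(fixed):
            return False
        # Only the dimensions before `...` need to be checked.
        shape = shape[: len(fixed)]
        spec = fixed
    elif len(spec) != len(shape):
        return False

    bindings: Dict[str, int] = {}
    for expected, actual in zip(spec, shape):
        if isinstance(expected, int):
            if expected != actual:
                return False
        elif isinstance(expected, str):
            # A named dimension must take the same size everywhere it appears.
            if bindings.setdefault(expected, actual) != actual:
                return False
    return True


print(matches_shape((3, 224, 224), (3, 224, 224)))  # True
print(matches_shape((3, 'x', 'x'), (3, 64, 64)))    # True
print(matches_shape((3, 'x', 'x'), (3, 64, 32)))    # False
print(matches_shape((3, ...), (3, 128, 256)))       # True
print(matches_shape((3, ...), (4, 128, 256)))       # False
```

Named dimensions are resolved with a small bindings dictionary, which is what lets `'x'` express "square" without fixing the actual size.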

Creating a Document with Tensors

Creating a document with tensors is straightforward. Here is an example:

```python
doc = MyDoc(
    arr=jnp.zeros((128,)),
    image_arr=jnp.zeros((3, 224, 224)),
    square_crop=jnp.zeros((3, 64, 64)),
    random_image=jnp.zeros((3, 128, 256)),
)
```

Redis Integration (#1550)

You can now use Redis as a document index backend in DocArray. Here's a simple usage example:

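```python
import numpy as np
from docarray import BaseDoc
from docarray.index import RedisDocumentIndex
from docarray.typing import NdArray


class MyDoc(BaseDoc):
    text: str
    embedding: NdArray[10]


# Create 10 documents with random 10-dimensional embeddings, plus a query vector
docs = [MyDoc(text=f'text {i}', embedding=np.random.rand(10)) for i in range(10)]
query = np.random.rand(10)

# Connect to a Redis server on localhost (it must support vector search, e.g. Redis Stack)
db = RedisDocumentIndex[MyDoc](host='localhost')
db.index(docs)

# Return the 10 documents whose embeddings are closest to the query
results = db.find(query, search_field='embedding', limit=10)
```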

In this example, we define a document class with a text field and a 10-dimensional embedding, initialize a Redis-backed document index, index the documents, and finally run a vector search on the `embedding` field.

Supported Functionalities

- Find: vector search for efficient retrieval of similar documents.
- Filter: filter on textual and numeric fields using Redis query syntax.
- Text search: find relevant documents with text search methods such as BM25.
- Get/Del: fetch or delete specific documents from the index.
- Hybrid search: combine find and filter for more refined results. Currently, only these two operations can be combined.
- Subindex: search through nested data.
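Conceptually, hybrid search narrows the candidate set with a filter and then ranks the remaining documents by vector similarity. The sketch below shows that idea in plain Python with a hypothetical `hybrid_search` helper and cosine similarity; it does not use Redis or DocArray, and Redis may combine the two steps differently (for example, by applying the filter inside the vector query).

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    docs: List[Dict],
    query: Sequence[float],
    predicate: Callable[[Dict], bool],
    limit: int = 10,
) -> List[Dict]:
    # Step 1 (filter): keep only documents that satisfy the predicate.
    candidates = [d for d in docs if predicate(d)]
    # Step 2 (find): rank the survivors by similarity to the query vector.
    candidates.sort(key=lambda d: cosine(d['embedding'], query), reverse=True)
    return candidates[:limit]


docs = [
    {'text': 'red shoe', 'price': 40, 'embedding': [1.0, 0.0]},
    {'text': 'blue shoe', 'price': 120, 'embedding': [0.9, 0.1]},
    {'text': 'red hat', 'price': 15, 'embedding': [0.0, 1.0]},
]
results = hybrid_search(docs, query=[1.0, 0.0], predicate=lambda d: d['price'] < 100, limit=2)
print([d['text'] for d in results])  # ['red shoe', 'red hat']
```

Note that `blue shoe` is excluded even though its embedding is very close to the query, because it fails the price filter.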

🚀 Performance

Speedup `HnswDocumentIndex` by caching num docs (#1706)

We've optimized the `num_docs()` operation by caching the document count, which previously slowed down searches. Indexing becomes slightly slower as a result, but searches are significantly faster.

```python
import time

import numpy as np
from docarray import BaseDoc, DocList
from docarray.index import HnswDocumentIndex
from docarray.typing import NdArray


class MyDoc(BaseDoc):
    text: str
    embedding: NdArray[128]


# 20,000 documents with random 128-dimensional embeddings
docs = [MyDoc(text='hey', embedding=np.random.rand(128)) for _ in range(20000)]
index = HnswDocumentIndex[MyDoc](work_dir='tst', index_name='index')

# Time indexing
index_start = time.time()
index.index(docs=DocList[MyDoc](docs))
index_time = time.time() - index_start

query = docs[0]

# Time a single top-10 vector search
find_start = time.time()
matches, _ = index.find(query, search_field='embedding', limit=10)
find_time = time.time() - find_start

print(f'index: {index_time:.4f}s, find: {find_time:.4f}s')
```

In the above experiment, `find()` became roughly 13x faster, with its execution time dropping from 0.0238 to 0.0018 seconds.
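The idea behind this optimization can be sketched as follows: rather than recomputing the document count on every call, keep a counter that is updated whenever documents are added or deleted. The `CountingIndex` class below is hypothetical and uses an in-memory dictionary in place of real storage; the actual `HnswDocumentIndex` implementation may differ.

```python
from typing import Dict, Iterable, Tuple


class CountingIndex:
    """Hypothetical index that keeps its document count cached."""

    def __init__(self) -> None:
        self._storage: Dict[str, object] = {}
        self._num_docs = 0  # cached count, kept in sync on every write

    def index(self, docs: Iterable[Tuple[str, object]]) -> None:
        for doc_id, doc in docs:
            if doc_id not in self._storage:
                # Small extra cost paid at indexing time.
                self._num_docs += 1
            self._storage[doc_id] = doc  # re-indexing an ID overwrites it

    def delete(self, doc_ids: Iterable[str]) -> None:
        for doc_id in doc_ids:
            if doc_id in self._storage:
                del self._storage[doc_id]
                self._num_docs -= 1

    def num_docs(self) -> int:
        # O(1): no scan of the underlying storage is needed at search time.
        return self._num_docs


index = CountingIndex()
index.index([('a', 'doc a'), ('b', 'doc b'), ('a', 'doc a v2')])
print(index.num_docs())  # 2
index.delete(['b', 'missing'])
print(index.num_docs())  # 1
```

This matches the trade-off described above: every write does a little extra bookkeeping so that reading the count during search is constant-time.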

⚙ Refactoring

Put Contains method in the base class (#1701)

We've moved the contains method into the base class. With this refactoring, checking whether a document exists is delegated to each backend implementation through the new `_doc_exists` method.
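This follows the template-method pattern: the base class owns the public membership check, and each backend only answers whether a given ID exists. Below is a minimal sketch with hypothetical class names; it assumes the check is exposed through Python's `in` operator (`__contains__`) and that documents carry an `id`, which may not match DocArray's exact signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Set


@dataclass
class Doc:
    id: str


class BaseIndex(ABC):
    def __contains__(self, item: object) -> bool:
        # Shared logic lives in the base class: validate the input once...
        if not isinstance(item, Doc):
            raise TypeError(f'expected a Doc, got {type(item).__name__}')
        # ...then delegate the backend-specific lookup.
        return self._doc_exists(item.id)

    @abstractmethod
    def _doc_exists(self, doc_id: str) -> bool:
        """Return True if a document with this ID is stored in the backend."""


class InMemoryIndex(BaseIndex):
    """Hypothetical backend: only needs to implement `_doc_exists`."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def add(self, doc: Doc) -> None:
        self._ids.add(doc.id)

    def _doc_exists(self, doc_id: str) -> bool:
        return doc_id in self._ids


index = InMemoryIndex()
index.add(Doc(id='a'))
print(Doc(id='a') in index)  # True
print(Doc(id='b') in index)  # False
```

A new backend then gets the membership check for free by implementing a single lookup method.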

More robust method to detect duplicate index (#1651)

We have implemented a more robust method of detecting existing indices for `WeaviateDocumentIndex`.

🐞 Bug Fixes

`WeaviateDocumentIndex` handles lowercase index names (#1711)

We've addressed an issue in `WeaviateDocumentIndex` where passing a lowercase index name led to mismatches and subsequent errors. The mismatch occurred because the index name was automatically capitalized when the index was created.
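Because the stored name is capitalized, one way to avoid the mismatch is to normalize the requested name the same way before every lookup. The `normalize_index_name` helper below is a hypothetical sketch of that approach, not necessarily the fix applied in #1711.

```python
def normalize_index_name(name: str) -> str:
    # Upper-case only the first character and leave the rest untouched
    # (str.capitalize() would also lower-case the remaining characters).
    return name[:1].upper() + name[1:]


# The name as it ends up stored after creation
created = {normalize_index_name('myDocs')}

requested = 'myDocs'
print(requested in created)                        # False: raw name mismatches
print(normalize_index_name(requested) in created)  # True
```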

`QdrantDocumentIndex` unable to see `index_name` (#1705)

We've resolved an issue where `QdrantDocumentIndex` did not recognize the `index_name` parameter. Previously, the specified `index_name` was ignored and the schema name was used instead.

Fix search in `InMemoryExactNNIndex` with `AnyEmbedding` (#1696)

You can now perform search operations in `InMemoryExactNNIndex` on fields of type `AnyEmbedding`.

Use `safe_issubclass` everywhere (#1691)

We now use `safe_issubclass` instead of `issubclass` because it also accepts non-class inputs, which helps us avoid unexpected errors.
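The built-in `issubclass` raises `TypeError` when its first argument is not a class, for example an integer or a typing construct such as `List[int]`. Below is a minimal sketch of a safe variant that returns `False` in those cases; DocArray's own `safe_issubclass` may handle more cases, so treat this as an illustration rather than its implementation.

```python
import inspect
from typing import Any, List


def safe_issubclass(x: Any, a_tuple: Any) -> bool:
    """Like issubclass(), but returns False when `x` is not a class."""
    return inspect.isclass(x) and issubclass(x, a_tuple)


print(safe_issubclass(bool, int))        # True
print(safe_issubclass(List[int], list))  # False instead of TypeError
print(safe_issubclass(3, int))           # False instead of TypeError

try:
    issubclass(3, int)
except TypeError as err:
    print(f'issubclass raised: {err}')
```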

Avoid converting `DocLists` in the base index (#1685)

We added a check to avoid passing `DocList`s to a function that converts a list of dictionaries into a `DocList`.
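Such a guard amounts to an `isinstance` check before conversion, so input that is already converted passes through untouched instead of being rebuilt. The sketch below uses a hypothetical `TypedList` class and `to_typed_list` helper as stand-ins for `DocList` and the base index's conversion function; their names and signatures are assumptions.

```python
from typing import Any, Dict, List, Union


class TypedList(list):
    """Hypothetical stand-in for DocList."""


def to_typed_list(data: Union[TypedList, List[Dict[str, Any]]]) -> TypedList:
    # Guard: input is already converted, so return it as-is (no re-conversion).
    if isinstance(data, TypedList):
        return data
    # Otherwise convert a plain list of dicts, copying each record.
    return TypedList(dict(item) for item in data)


raw = [{'text': 'a'}, {'text': 'b'}]
converted = to_typed_list(raw)
print(type(converted).__name__)               # TypedList
print(to_typed_list(converted) is converted)  # True: passed through unchanged
```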

📗 Documentation Improvements

Add docs for `dict()` method (#1643)

🤟 Contributors

We would like to thank all contributors to this release:

Puneeth K (@punndcoder28)
Joan Fontanals (@JoanFM)
Saba Sturua (@jupyterjazz)
Aman Agarwal (@agaraman0)
samsja (@samsja)
Shukri (@hsm207)
