DocArray offers a number of predefined documents, like [ImageDoc][docarray.documents.ImageDoc] and [TextDoc][docarray.documents.TextDoc].
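
For instance, a predefined `TextDoc` can be instantiated like any other document. This is a minimal, illustrative sketch that only touches the predefined `text` and `embedding` fields:

```python
import numpy as np

from docarray.documents import TextDoc

# a predefined document type with a text payload and an (optional) embedding field
doc = TextDoc(text='hello world', embedding=np.random.rand(128))
```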

If you try to use these directly as a schema for a Document Index, you will get unexpected behavior:
Depending on the backend, an exception will be raised, or no vector index for ANN lookup will be built.

The reason for this is that predefined documents don't hold information about the dimensionality of their `.embedding`
field. But this is crucial information for any vector database to work properly!
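
To make the problem concrete, here is a minimal sketch of the problematic direct use, assuming the `HnswDocumentIndex` backend from the examples below (the working directory name is arbitrary):

```python
from docarray.documents import TextDoc
from docarray.index import HnswDocumentIndex

# TextDoc declares an `embedding` field but not its dimensionality,
# so the backend has no way to set up an ANN index for it.
# Depending on the backend this raises an exception, or silently
# builds no vector index for the field.
db = HnswDocumentIndex[TextDoc](work_dir='test_db_plain')
```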

You can work around this problem by subclassing the predefined document and adding the dimensionality information:

=== "Using type hint"

    ```python
    from docarray.documents import TextDoc
    from docarray.typing import NdArray
    from docarray.index import HnswDocumentIndex


    class MyDoc(TextDoc):
        # parametrizing NdArray pins the embedding dimensionality to 128
        embedding: NdArray[128]


    db = HnswDocumentIndex[MyDoc](work_dir='test_db')
    ```

=== "Using Field()"

    ```python
    from docarray.documents import TextDoc
    from docarray.typing import AnyTensor
    from docarray.index import HnswDocumentIndex
    from pydantic import Field


    class MyDoc(TextDoc):
        # Field(dim=...) carries the same dimensionality information
        embedding: AnyTensor = Field(dim=128)


    db = HnswDocumentIndex[MyDoc](work_dir='test_db3')
    ```

Once you have defined the schema of your Document Index in this way, the data that you index can be either the predefined Document type or your custom Document type.

The [next section](#index) goes into more detail about data indexing, but note that if you have some `TextDoc`s, `ImageDoc`s etc. that you want to index, you _don't_ need to cast them to `MyDoc`:
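
For example, a minimal sketch, assuming the `HnswDocumentIndex[MyDoc]` instance `db` from above and 128-dimensional embeddings:

```python
import numpy as np

from docarray import DocList
from docarray.documents import TextDoc

# plain TextDoc instances, not MyDoc
data = DocList[TextDoc](
    [TextDoc(text='hello world', embedding=np.random.rand(128)) for _ in range(3)]
)

# they can be indexed into the MyDoc-typed Document Index directly
db.index(data)
```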