pgml-cms/docs/introduction/import-your-data/README.md (+2 -2)
@@ -16,7 +16,7 @@ If your intention is to use PostgresML as your primary database, your job here i
If your primary database is hosted elsewhere, for example AWS RDS or Azure Postgres, you can get your data replicated to PostgresML in real time using logical replication. Having immediate access to your data accelerates your machine learning use cases and removes the need to move data multiple times between microservices. Latency-sensitive applications should consider using this approach.
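For a sense of what this involves, here is a minimal sketch of the primary-side setup using core Postgres logical replication (the publication and table names are placeholders, and the primary must be running with `wal_level = logical`):

```sql
-- On the primary database (e.g. RDS or Azure Postgres):
-- publish the tables you want replicated to PostgresML.
-- "users" and "orders" are example table names.
CREATE PUBLICATION postgresml_pub FOR TABLE users, orders;

-- Or publish everything:
-- CREATE PUBLICATION postgresml_pub FOR ALL TABLES;
```

The PostgresML side then subscribes to this publication; see the logical replication guide below.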
@@ -25,7 +25,7 @@ accelerate your machine learning use cases and removes the need for moving data
Foreign data wrappers are a set of PostgreSQL extensions that allow making direct connections from inside the database to other databases, even if they aren't running Postgres. For example, Postgres has foreign data wrappers for MySQL, S3, Snowflake, and many others.
-<figure class="my-3 py-3"><img src="../../../.gitbook/assets/Getting-Started_FDW-Diagram.svg" alt="Foreign data wrappers" width="80%"><figcaption></figcaption></figure>
+<figure class="my-3 py-3"><img src="../../.gitbook/assets/Getting-Started_FDW-Diagram.svg" alt="Foreign data wrappers" width="80%"><figcaption></figcaption></figure>
FDWs are useful when data access is infrequent and not latency-sensitive. For many use cases, like offline batch workloads and low-traffic websites, this approach is suitable and easy to get started with.
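As a rough sketch of the setup, using the core `postgres_fdw` extension (the server name, host, and credentials below are placeholders):

```sql
-- Install the wrapper and describe how to reach the remote database.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER production_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'production.example.com', port '5432', dbname 'production');

-- Map the local user to credentials on the remote side.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER production_db
    OPTIONS (user 'readonly_user', password 'secret');
```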
pgml-cms/docs/introduction/import-your-data/foreign-data-wrappers.md (+1 -1)
@@ -6,7 +6,7 @@ description: Connect your production database to PostgresML using Foreign Data W
Foreign data wrappers are a set of Postgres extensions that allow making direct connections to other databases from inside your PostgresML database. Other databases can be your production Postgres database on RDS or Azure, or another database engine like MySQL, Snowflake, or even an S3 bucket.
-<figure class="my-3 py-3"><img src="../../../.gitbook/assets/Getting-Started_FDW-Diagram.svg" alt="Foreign data wrappers" width="80%"><figcaption></figcaption></figure>
+<figure class="my-3 py-3"><img src="../../.gitbook/assets/Getting-Started_FDW-Diagram.svg" alt="Foreign data wrappers" width="80%"><figcaption></figcaption></figure>
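Once a foreign server is defined (as sketched in the README section above), remote tables can be imported and queried as if they were local. A hedged example, reusing the hypothetical `production_db` server and assuming a remote `public.users` table:

```sql
-- Pull the remote table definition into a local schema.
CREATE SCHEMA IF NOT EXISTS production;
IMPORT FOREIGN SCHEMA public
    LIMIT TO (users)
    FROM SERVER production_db
    INTO production;

-- Each query against the foreign table is forwarded to the remote
-- database, so expect network latency on every read.
SELECT id, email FROM production.users LIMIT 10;
```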
pgml-cms/docs/introduction/import-your-data/logical-replication/README.md (+1 -1)
@@ -6,7 +6,7 @@ description: Stream data from your primary database to PostgresML in real time u
Logical replication allows PostgresML to copy data from your primary database in real time. As soon as your customers make changes to their data on your website, those changes become available in PostgresML.
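On the PostgresML side, this boils down to subscribing to a publication defined on the primary. A minimal sketch, reusing the hypothetical `postgresml_pub` publication from earlier and placeholder connection details:

```sql
-- On the PostgresML database: existing rows are copied first,
-- then subsequent changes stream in continuously.
CREATE SUBSCRIPTION postgresml_sub
    CONNECTION 'host=production.example.com port=5432 dbname=production user=repl_user password=secret'
    PUBLICATION postgresml_pub;
```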
pgml-cms/docs/open-source/pgml/guides/chatbots/README.md (+4 -4)
@@ -30,7 +30,7 @@ Here is an example flowing from:
text -> tokens -> LLM -> probability distribution -> predicted token -> text
-<figure><img src="../../.gitbook/assets/Chatbots_Limitations-Diagram.svg" alt=""><figcaption><p>The flow of inputs through an LLM. In this case the inputs are "What is Baldur's Gate 3?" and the output token "14" maps to the word "I"</p></figcaption></figure>
+<figure><img src="../../../../.gitbook/assets/Chatbots_Limitations-Diagram.svg" alt=""><figcaption><p>The flow of inputs through an LLM. In this case the inputs are "What is Baldur's Gate 3?" and the output token "14" maps to the word "I"</p></figcaption></figure>
{% hint style="info" %}
We have simplified the tokenization process. Words do not always map directly to tokens. For instance, the word "Baldur's" may actually map to multiple tokens. For more information on tokenization, check out [HuggingFace's summary](https://huggingface.co/docs/transformers/tokenizer\_summary).
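In PostgresML, this whole loop can be exercised from SQL. A hedged sketch using `pgml.transform` (the model name and argument values are illustrative; tokenization, sampling, and decoding all happen inside the call):

```sql
-- Tokenize the prompt, run it through the LLM, and decode the
-- predicted tokens back into text.
SELECT pgml.transform(
    task   => '{"task": "text-generation", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0"}'::JSONB,
    inputs => ARRAY['What is Baldur''s Gate 3?'],
    args   => '{"max_new_tokens": 64}'::JSONB
);
```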
@@ -108,11 +108,11 @@ What does an `embedding` look like? `Embeddings` are just vectors (for our use c
-<figure><img src="../../.gitbook/assets/Chatbots_King-Diagram.svg" alt=""><figcaption><p>The flow of word -> token -> embedding</p></figcaption></figure>
+<figure><img src="../../../../.gitbook/assets/Chatbots_King-Diagram.svg" alt=""><figcaption><p>The flow of word -> token -> embedding</p></figcaption></figure>
`Embeddings` aren't limited to words; we have models that can embed entire sentences.
-<figure><img src="../../.gitbook/assets/Chatbots_Tokens-Diagram.svg" alt=""><figcaption><p>The flow of sentence -> tokens -> embedding</p></figcaption></figure>
+<figure><img src="../../../../.gitbook/assets/Chatbots_Tokens-Diagram.svg" alt=""><figcaption><p>The flow of sentence -> tokens -> embedding</p></figcaption></figure>
Why do we care about `embeddings`? `Embeddings` have a very interesting property: words and sentences with close [semantic similarity](https://en.wikipedia.org/wiki/Semantic\_similarity) sit closer to one another in vector space than those that do not.
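This property is easy to check from SQL. A hedged sketch using `pgml.embed` with an illustrative model (`intfloat/e5-small-v2`, which produces 384-dimensional vectors) and pgvector's cosine distance operator `<=>`; the `::vector` cast is part of the assumption:

```sql
-- Smaller cosine distance means closer semantic similarity.
WITH e AS (
    SELECT
        pgml.embed('intfloat/e5-small-v2', 'The king ruled the country')::vector(384)  AS king,
        pgml.embed('intfloat/e5-small-v2', 'The queen governed the realm')::vector(384) AS queen,
        pgml.embed('intfloat/e5-small-v2', 'I ate pancakes for breakfast')::vector(384) AS pancakes
)
SELECT
    king <=> queen    AS related_pair,   -- expected: small distance
    king <=> pancakes AS unrelated_pair  -- expected: larger distance
FROM e;
```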
@@ -157,7 +157,7 @@ print(context)
There is a lot going on with this, so let's check out this diagram and step through it.
-<figure><img src="../../.gitbook/assets/Chatbots_Flow-Diagram.svg" alt=""><figcaption><p>The flow of taking a document, splitting it into chunks, embedding those chunks, and then retrieving a chunk based off of a user's query</p></figcaption></figure>
+<figure><img src="../../../../.gitbook/assets/Chatbots_Flow-Diagram.svg" alt=""><figcaption><p>The flow of taking a document, splitting it into chunks, embedding those chunks, and then retrieving a chunk based off of a user's query</p></figcaption></figure>
Step 1: We take the document and split it into chunks. Chunks are typically a paragraph or two in size. There are many ways to split documents into chunks; for more information, check out [this guide](https://www.pinecone.io/learn/chunking-strategies/).
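A hedged, end-to-end sketch of the chunk -> embed -> retrieve flow from the diagram above (the table, column, and model names are illustrative, and the chunks are toy examples):

```sql
-- Steps 1-2: split text into chunks (hard-coded here) and store
-- each chunk alongside its embedding.
CREATE TABLE chunks (
    id        BIGSERIAL PRIMARY KEY,
    body      TEXT NOT NULL,
    embedding VECTOR(384)
);

INSERT INTO chunks (body, embedding)
SELECT chunk, pgml.embed('intfloat/e5-small-v2', chunk)::vector(384)
FROM unnest(ARRAY[
    'Baldur''s Gate 3 is a role-playing game released in 2023.',
    'The game was developed and published by Larian Studios.'
]) AS chunk;

-- Step 3: embed the user's query and retrieve the closest chunk.
SELECT body
FROM chunks
ORDER BY embedding <=> pgml.embed('intfloat/e5-small-v2', 'What is Baldur''s Gate 3?')::vector(384)
LIMIT 1;
```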