
Initial_audit_changes #1436

Open · wants to merge 5 commits into master

Changes from 1 commit

Update HuggingFace
Modify HuggingFace to Hugging Face in multiple pages.
craigmoore1 committed May 6, 2024
commit c3d1f8f9293980486bde7ffde8655da075aaf24c
2 changes: 1 addition & 1 deletion pgml-cms/docs/api/sql-extension/pgml.tune.md
@@ -45,7 +45,7 @@ translation

\===

This HuggingFace dataset stores the data as language key pairs in a JSON document. To use it with PostgresML, we'll need to provide a `VIEW` that structures the data into more primitively typed columns.
This Hugging Face dataset stores the data as language key pairs in a JSON document. To use it with PostgresML, we'll need to provide a `VIEW` that structures the data into more primitively typed columns.
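
As a minimal sketch, such a `VIEW` can pull each language out of the JSON document with PostgreSQL's `->>` operator; the table and column names below are illustrative assumptions, not taken from this page:

```sql
-- Sketch: flatten JSON language key pairs into typed text columns.
-- Table name and language keys are assumptions for illustration.
CREATE OR REPLACE VIEW translations AS
SELECT
    translation ->> 'en' AS en,  -- English text as a plain TEXT column
    translation ->> 'es' AS es   -- Spanish text as a plain TEXT column
FROM my_dataset;
```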

\=== "SQL"

6 changes: 3 additions & 3 deletions pgml-cms/docs/use-cases/chatbots/README.md
@@ -33,7 +33,7 @@ text -> tokens -> LLM -> probability distribution -> predicted token -> text
<figure><img src="https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FrvfCoPdoQeoovZiqNG90%2Fuploads%2FPzJzmVS3uNhbvseiJbgi%2FScreenshot%20from%202023-12-13%2013-19-33.png?alt=media&#x26;token=11d57b2a-6aa3-4374-b26c-afc6f531d2f3" alt=""><figcaption><p>The flow of inputs through an LLM. In this case the inputs are "What is Baldur's Gate 3?" and the output token "14" maps to the word "I"</p></figcaption></figure>

{% hint style="info" %}
We have simplified the tokenization process. Words do not always map directly to tokens. For instance, the word "Baldur's" may actually map to multiple tokens. For more information on tokenization checkout [HuggingFace's summary](https://huggingface.co/docs/transformers/tokenizer\_summary).
We have simplified the tokenization process. Words do not always map directly to tokens. For instance, the word "Baldur's" may actually map to multiple tokens. For more information on tokenization, check out [Hugging Face's summary](https://huggingface.co/docs/transformers/tokenizer\_summary).
{% endhint %}
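
To see that splitting concretely, here is a short sketch using the `transformers` library's `AutoTokenizer`; the model name is borrowed from the example later on this page, and any Hugging Face tokenizer would do:

```python
from transformers import AutoTokenizer

# Any Hugging Face tokenizer works; this model appears later on this page.
tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

# A single word like "Baldur's" may split into several tokens.
print(tokenizer.tokenize("What is Baldur's Gate 3?"))

# The integer token ids the LLM actually consumes.
print(tokenizer("What is Baldur's Gate 3?")["input_ids"])
```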

To be very specific, modern LLMs are [function approximators](https://en.wikipedia.org/wiki/Function\_approximation) for the next most likely `token` given a list of `tokens`. They are not perfect, and the level of accuracy depends on a number of factors like the model architecture, the training algorithms, and, perhaps most importantly, the data they were trained on.
@@ -356,7 +356,7 @@ By chaining these special tags we can build a conversation that Hermes has been
This example highlights that modern LLMs are stateless function approximators. Notice we have included the first question we asked and the model's response in our input. Every time we ask the model a new question in our conversation, we have to supply the entire conversation history if we want it to know what we already discussed. LLMs have no built-in way to remember past questions and conversations.
{% endhint %}

Doing this by hand seems very tedious, how do we actually accomplish this in the real world? We use [Jinja](https://jinja.palletsprojects.com/en/3.1.x/) templates. Conversational models on HuggingFace typical come with a Jinja template which can be found in the `tokenizer_config.json`. [Checkout `teknium/OpenHermes-2.5-Mistral-7B`'s Jinja template in the `tokenizer_config.json`](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B/blob/main/tokenizer\_config.json). For more information on Jinja templating check out [HuggingFace's introduction](https://huggingface.co/docs/transformers/main/chat\_templating).
Doing this by hand seems very tedious, so how do we actually accomplish it in the real world? We use [Jinja](https://jinja.palletsprojects.com/en/3.1.x/) templates. Conversational models on Hugging Face typically come with a Jinja template, which can be found in the `tokenizer_config.json`. [Check out `teknium/OpenHermes-2.5-Mistral-7B`'s Jinja template in the `tokenizer_config.json`](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B/blob/main/tokenizer\_config.json). For more information on Jinja templating, check out [Hugging Face's introduction](https://huggingface.co/docs/transformers/main/chat\_templating).
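
For illustration, here is a sketch of rendering a conversation through that bundled template using the stock `transformers` API (this is not the `pgml` library discussed below; the message contents are assumptions):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

messages = [
    {"role": "system", "content": "You are a helpful assistant named Hermes."},
    {"role": "user", "content": "What is your name?"},
]

# apply_chat_template renders the Jinja template shipped in
# tokenizer_config.json, wrapping each message in the model's special tags.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```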

Luckily for everyone reading this, our `pgml` library automatically handles templating and formatting inputs correctly so we can skip a bunch of boring code. We do want to change up our program a little bit to take advantage of this automatic templating:

@@ -393,7 +393,7 @@ Hello! My name is Hermes. How can I help you today?
You just asked me what my name is, and I am a friendly and helpful chatbot named Hermes. How can I assist you today? Feel free to ask me any questions or seek any assistance you need.
```

Note that we have a list of dictionaries called `history` we use to store the chat history, and instead of feeding text into our model, we are inputting the `history` list. Our library automatically converts this list of dictionaries into the format expected by the model. Notice the `roles` in the dictionaries are the same as the `roles` of the messages in the previous example. This list of dictionaries with keys `role` and `content` as a storage system for messages is pretty standard and used by us as well as OpenAI and HuggingFace.
Note that we have a list of dictionaries called `history` that we use to store the chat history; instead of feeding text into our model, we input the `history` list. Our library automatically converts this list of dictionaries into the format expected by the model. Notice the `roles` in the dictionaries are the same as the `roles` of the messages in the previous example. Storing messages as a list of dictionaries with `role` and `content` keys is a fairly standard convention, used by us as well as by OpenAI and Hugging Face.
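
As a sketch of how such a `history` list grows turn by turn (the assistant reply is copied from the transcript above; the system prompt is an assumption):

```python
history = [
    {"role": "system", "content": "You are a friendly and helpful chatbot named Hermes"},
    {"role": "user", "content": "What is your name?"},
]

# Append the model's reply, then the next user turn, so every call
# receives the full conversation; the model itself remembers nothing.
history.append({"role": "assistant", "content": "Hello! My name is Hermes. How can I help you today?"})
history.append({"role": "user", "content": "What did I just ask you?"})
```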

Let's ask it the dreaded question:

@@ -1,12 +1,12 @@
# Personalize embedding results with application data in your database

PostgresML makes it easy to generate embeddings using open source models from Huggingface and perform complex queries with vector indexes and application data unlike any other database. The full expressive power of SQL as a query language is available to seamlessly combine semantic, geospatial, and full text search, along with filtering, boosting, aggregation, and ML reranking in low latency use cases. You can do all of this faster, simpler and with higher quality compared to applications built on disjoint APIs like OpenAI + Pinecone. Prove the results in this series to your own satisfaction, for free, by signing up for a GPU accelerated database.
PostgresML makes it easy to generate embeddings using open source models from Hugging Face and to perform complex queries with vector indexes and application data, unlike any other database. The full expressive power of SQL as a query language is available to seamlessly combine semantic, geospatial, and full-text search with filtering, boosting, aggregation, and ML reranking in low-latency use cases. You can do all of this faster, more simply, and with higher quality than applications built on disjoint APIs like OpenAI + Pinecone. Prove the results in this series to your own satisfaction, for free, by signing up for a GPU-accelerated database.
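
For a flavor of what that looks like, here is a hedged sketch combining in-database embedding generation with a pgvector similarity search; the table, column, and model names are illustrative assumptions:

```sql
-- Embed the query text in-database with an open-source Hugging Face model,
-- then rank rows by cosine distance. Schema and model are assumptions.
SELECT title
FROM documents
ORDER BY embedding <=> pgml.embed('intfloat/e5-small-v2', 'gift ideas for a sci-fi fan')::vector
LIMIT 5;
```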

## Introduction

This article is the third in a multipart series that will show you how to build a post-modern semantic search and recommendation engine, including personalization, using open source models. You may want to start with the previous articles in the series if you aren't familiar with PostgresML's capabilities.

1. Generating LLM Embeddings with HuggingFace models
1. Generating LLM Embeddings with Hugging Face models
2. Tuning vector recall with pgvector
3. Personalizing embedding results with application data
4. Optimizing semantic results with an XGBoost ranking model - coming soon!