Offline AI Magic: Implementing GPT4All Locally with Python

No costs, no surprises. How to install GPT4All locally on your PC and use it with your own documents!

Christophe Atten · Published in DataDrivenInvestor · 10 min read · May 24

Photo by BoliviaInteligente on Unsplash

Ready to dive into the world of AI without breaking the bank? Say hello to GPT4All, your new best friend for setting up a personal AI helpdesk right on your PC. No internet? No problem! No hidden costs? You bet! I've got a step-by-step guide to take you from download to deployment, transforming your PC into a self-hosted AI powerhouse. Ready to roll up your sleeves and jump in? Let's make GPT4All your secret weapon for success based on your own documents, no matter who you are!

What is the difference between ChatGPT and GPT4All?

GPT4All is an ecosystem of open-source, assistant-style large language models that run locally on consumer-grade CPUs. It allows for the training and deployment of powerful and customized large language models. A GPT4All model is a 3GB to 8GB file that you can download and plug into the GPT4All open-source ecosystem software. The goal of GPT4All is to be the best instruction-tuned, assistant-style language model that any person or enterprise can freely use, distribute, and build on. It provides a chat client that lets any GPT4All model run natively on a desktop.

In contrast, ChatGPT is a language model developed by OpenAI. It has no specific local system requirements because it is accessed via the internet.

Web Scraping + Indexing + LangChain + GPT4All = Powerful AIBot!

Creating your own AIBot based on your documents requires just a few simple steps and takes less than an hour. The process is efficient, speedy, and straightforward once you understand the underlying principles.

The desired concept

I aimed to build an AI bot by following a series of steps:

1. Importing various types of text data (starting with HTML and eventually including PDFs).
2. Utilizing these documents as a knowledge base.
3. Determining the relevant information needed by the GPT4All model based on the user's query.
4. Supplying this identified knowledge to the GPT4All model.
5. Receiving a fitting response from the model.

So in my quest to develop an independent offline AIBot, I conducted extensive research on the internet, hoping to find exceptional Python code that would meet my specific requirements. However, I was unable to find any existing solution that fully satisfied my needs. During my search, I came across:

* articles discussing OpenAI, which required a key for access.
* articles about Hugging Face, which similarly necessitated a key for implementation.
* some offline working alternatives, which did not meet the desired level of performance.

Consequently, I took matters into my own hands and developed my own AIBot based on a combination of the various pieces of information available on the internet.
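Before we start building, it helps to see what "runs locally" looks like in practice. Here is a minimal, self-contained sketch of my own using the official gpt4all Python bindings (pip install gpt4all); note that this is an alternative interface to the LangChain wrapper used later in this article, and the model file name is only an example from the GPT4All catalog:

from gpt4all import GPT4All

# Example model name; downloaded once on first use and cached locally,
# after which no internet connection is needed at all.
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

response = model.generate("Explain in one sentence what GPT4All is.", max_tokens=60)
print(response)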
Why did I pursue this approach for an offline model?

Numerous situations arise where users prefer to keep their information confidential and avoid sharing it with external entities. Moreover, many systems are deliberately isolated from the internet. It is precisely for these people that I have crafted the following code!

What do you need?

In order to create your own AIBot you need only a few things:

* Some URLs containing your knowledge: later I will publish an article showing how to use any kind of PDF files instead.
* Python and some specific Python packages: I use Anaconda3.
* A GPT4All model: https://huggingface.co/mrgaang/aira/blob/main/gptgall-
* Internet for the initial setup (no personal data leaves your machine!): if internet is not available, download all the models beforehand and copy them onto the isolated system.

Python packages

Python packages needed for the offline GPT4All model:

# Install all those packages
pip install langchain
pip install requests
pip install beautifulsoup4
pip install sentence-transformers==2.2.2
pip install faiss-cpu
# plus the GPT4All backend required by your LangChain version (e.g. pygpt4all or gpt4all)

Download the GPT4All model

Download the model and store it at the same location as your code. You can also create a models folder to separate the code and the models.

Components Explained

GPT4All: GPT4All is a chatbot that is not only free to use but also operates locally, ensuring privacy. There's no need for a GPU or an internet connection to use it.

LangChain: Essentially, LangChain serves as a foundational framework centered on large language models (LLMs). It can be utilized for a wide array of applications, including chatbots, generative question answering (GQA), and summarization, among others. The fundamental concept behind the library is the ability to link various elements in a "chain", thereby enabling the development of more sophisticated use cases involving LLMs.

Sentence Transformers: The sentence-transformers library offers user-friendly techniques to calculate embeddings (dense vector representations) for sentences, paragraphs, and even images. By placing texts in a vector space in a way that ensures proximity for similar content, it opens up possibilities for applications such as semantic search, clustering, and information retrieval.

BeautifulSoup4: Beautiful Soup is a Python library that is used for web scraping purposes, to pull data out of HTML and XML files. It creates a parse tree from page source code that can be used to extract data in a hierarchical and readable manner.
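To make the Sentence Transformers component concrete, here is a tiny, self-contained sketch of my own (using the same all-MiniLM-L6-v2 model that appears later in the article) showing how semantically similar texts end up close together in vector space:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = [
    "Eurovision Song Contest 2023",
    "Who won Eurovision in 2023?",
    "How do I install Python on Windows?",
]
embeddings = model.encode(sentences)

# Cosine similarity: related sentences score high, unrelated ones low
print(util.cos_sim(embeddings[0], embeddings[1]))  # high
print(util.cos_sim(embeddings[0], embeddings[2]))  # low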
The whole Python code, split and explained

The Python code comprises several components that collectively generate a robust AIBot, leveraging either your personal data or publicly available information. In this use case, I will utilize data from Wikipedia, specifically focusing on the Eurovision Song Contest 2023. However, you can customize the code to extract data from any desired website. Additionally, I will soon publish another article outlining the process of utilizing your own PDF files to upgrade your AIBot.

1. Web scraping
2. Indexing
3. LangChain + GPT4All

1. Web scraping

To construct a basic web scraper and convert the HTML content into the desired format, several steps need to be followed. These steps are crucial not just for creating this offline AI bot that relies on indexing and LangChain, but are also key to another project I am currently developing, so stay tuned.

# Import the following packages for the web scraper
from bs4 import BeautifulSoup
import requests
import json

The following code will iterate over a list of URLs (in my example, only one) and create the desired structure of "prompt", "response" and "source".

urls = ["https://en.wikipedia.org/wiki/Eurovision_Song_Contest_2023"]
result = []

# Send an HTTP request to the specified URL and save the response from the server
for url in urls:
    response = requests.get(url)

    # Create a BeautifulSoup object and specify the parser library
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all the heading and paragraph tags on the page
    headers = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p'])

    current_prompt = ""
    current_response = ""
    for tag in headers:
        if tag.name in ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
            if current_prompt and current_response:  # ensuring both prompt and response exist
                result.append({"prompt": current_prompt, "response": current_response.strip(), "source": url})
            current_prompt = tag.text
            current_response = ""
        elif tag.name == 'p':
            current_response += ' ' + tag.text

    # Don't forget the last one
    if current_prompt and current_response:
        result.append({"prompt": current_prompt, "response": current_response.strip(), "source": url})

# Convert the list to JSON
json_result = json.dumps(result, indent=4)

Below you can find a part of the "json_result".

{
    "prompt": "Eurovision Song Contest 2023",
    "response": "The Eurovision Song Contest 2023 was the 67th edition of the ...",
    "source": "https://en.wikipedia.org/wiki/Eurovision_Song_Contest_2023"
}

So far so good. We have our knowledge. Now we need to create a vector store for information retrieval based on our questions. The questions are the way the user interacts with the AIBot.

2. Indexing: Using SentenceTransformers and FAISS

In order to make the knowledge retrieved in the previous step accessible, we will use SentenceTransformers and FAISS. FAISS is a library for efficient similarity search.

# Import the indexing packages
from sentence_transformers import SentenceTransformer
import faiss

# Load the sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Convert all the entries into embeddings, based on the prompt
entries = [{'prompt': entry['prompt'], 'response': entry['response']} for entry in result]

# Generate the embeddings for the prompts
prompt_embeddings = model.encode([entry['prompt'] for entry in entries])

Having generated prompt embeddings with the SentenceTransformer model, we're now prepared to feed these into FAISS to establish an index database.

# Dimension of the embeddings
dimension = prompt_embeddings.shape[1]

# Configure the FAISS index (a flat L2 index)
index = faiss.IndexFlatL2(dimension)

# Add vectors to the index
index.add(prompt_embeddings)
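One optional addition of my own, not from the original article: because the goal is offline operation, it is worth persisting the FAISS index and the scraped entries to disk, so that later runs need neither re-scraping nor re-embedding. A minimal sketch, continuing from the code above:

# Persist the index and the entries once they are built...
faiss.write_index(index, 'knowledge.index')
with open('entries.json', 'w') as f:
    json.dump(entries, f)

# ...and load them back in a later, fully offline session
index = faiss.read_index('knowledge.index')
with open('entries.json') as f:
    entries = json.load(f)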
3. LangChain + GPT4All: The final superpower!

With just a few more lines of code, we're about to wrap up. Up to this point, we've constructed a web scraper using BeautifulSoup4, and we've created an index database using SentenceTransformers and FAISS, all with just a handful of code lines.

In order to find the best matching prompt, we need the following function. The function will search the index for the most accurate prompt based on the user's question.

def find_best_matching_prompt(question, index):
    # Convert the question into an embedding
    question_embedding = model.encode([question])

    # Perform a search for the closest prompt in the index
    D, I = index.search(question_embedding, 1)

    # Get the best matching entry
    best_match_index = I[0][0]
    return entries[best_match_index]

Are you ready for the final three code snippets? Keep going, it will be worth it!

# Import the LangChain components
from langchain.llms import GPT4All
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

gpt4all_path = './models/gpt4all-converted.bin'

# Callback manager for handling the calls with the model
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = GPT4All(model=gpt4all_path, callback_manager=callback_manager, verbose=True)

A prompt to interact with the user and to retrieve the best_matching_entry based on the user_question.

# User question
# user_question = "Which mechanisms does Dataiku DSS provide for Python code?"
user_question = input()
best_matching_entry = find_best_matching_prompt(user_question, index)

Now comes the moment where all the pieces come together. We'll utilize a template prompt, incorporate the context, and add the user's question. All of this will then be injected into the LLMChain to generate our final response. It's quite magical how it all works together.

template = """Given the following extracted parts of a long document and a question, create a final answer.
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
Respond in English.

CONTEXT: {context}

QUESTION: {question}

FINAL ANSWER IN ENGLISH:"""

# Creating the context
context = best_matching_entry['response']

# Build the chain and generate the answer
prompt = PromptTemplate(template=template, input_variables=['context', 'question'])
llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run({'context': context, 'question': user_question}))

Impressive, isn't it? We've successfully fed our own specific information into the LLMChain: information that the model wasn't previously aware of. This is the final answer to the following question:

> Question: What was the location of the Eurovision Song Contest in 2023?

Answer: The Eurovision Song Contest 2023 took place on May 13th at Liverpool Arena hosted by BBC and EBU. There were thirty-seven participating countries, which is less than last year's contest due to the global energy crisis of that time period as Bulgaria, Montenegro & North Macedonia had ceased their participation for financial reasons. The winner was Sweden with a song entitled "Tattoo" performed by Loreen and written by her together with five others. Finland, Israel, Italy & Norway came second through fifth respectively in the top 5 of this contest. Sweden won both combined vote as well televote rounding out their win doubled success from last year when Johnny Logan did it for Ireland's second time victories.
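To turn these snippets into the personal AI helpdesk promised at the beginning, you can wrap retrieval and generation in a simple loop. This is a sketch of my own that reuses the find_best_matching_prompt function and the llm_chain defined above:

# Minimal interactive loop around the retrieval + generation pipeline
while True:
    user_question = input("Your question (or 'quit' to exit): ")
    if user_question.strip().lower() == 'quit':
        break
    best_matching_entry = find_best_matching_prompt(user_question, index)
    context = best_matching_entry['response']
    print(llm_chain.run({'context': context, 'question': user_question}))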
Conclusion

This model, after the initial setup, can function completely independently of any internet connection. This makes it an ideal solution for any organization that:

* is keen on keeping its information confidential and not sharing it with third parties.
* wants to avoid paying fees for each query to OpenAI or other third parties.

Moreover, such a self-contained model offers improved privacy and data security, as the data never leaves your local system. Additionally, it offers cost efficiency over time: there are no ongoing per-query fees, which makes it a sustainable solution for handling large volumes of queries. It also offers the flexibility to be used in environments with limited or no internet access.

The whole Python code can be found in my Git repo: https://github.com/vashAl/GPT4All_OwnDocuments_Offline

My most recent posts:

* Deep Learning and Financial Inclusion: Opportunities and Challenges
* Top 5 Books for Mastering Deep Learning in Finance
* Data Privacy in the Age of Artificial Intelligence in Finance

Did you enjoy it? Follow me, Christophe Atten, and clap 50 times. If you'd like to support my work, use my referral link to join Medium. You can find me on Medium, Twitter and LinkedIn. Let's enjoy Data Science, Machine Learning and Innovations together!

Subscribe to DDIntel here. Visit our website: https://www.datadriveninvestor.com. Join our network: https://datadriveninvestor.com/collaborate
