Natural Language Processing



Introduction:
This chapter gives a detailed view of how NLP is utilized in this project. It covers the
basic principles and techniques as well as the use of LLMs and the library packages needed
for the implementation of the project.
Natural language processing is a branch of artificial intelligence that explores the ways in
which computers and human beings communicate. The main objective of NLP techniques is to
analyze, interpret, and respond to human queries in the form of speech or text. As a branch of
computational linguistics, it combines computer science, linguistics, and artificial intelligence
to study the complexities of human communication.
An NLP pipeline can be briefly described in the following two steps:
- Data preprocessing:
In data preprocessing the data is cleaned and put into a readable form so that an NLP
algorithm can analyze it (see the sketch after this list). Common techniques include
tokenization, which breaks text into units such as words, punctuation marks, and phrases;
lemmatization, which reduces words to their base form; part-of-speech (POS) tagging,
which labels words as nouns, verbs, and other parts of speech; and parsing, which deals
with the relations between words.
- Algorithm development:
Algorithm development means extracting useful information from the data so that NLP
algorithms can be applied to it. Common NLP tasks include sentiment analysis, which
determines the emotional tone of the input and labels it as positive, negative, or neutral.
Named entity recognition identifies people, locations, dates, and organizations. Topic
modeling groups similar words and phrases to infer the general themes of a collection of
texts. Machine translation converts one language to another with the application of
machine learning. Language modeling underlies autocomplete, autocorrect, and
speech-to-text systems.
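
A minimal preprocessing sketch using the NLTK library (an assumption for illustration; the project itself does its preprocessing inside the chatbot code described later in this chapter):

    import nltk
    from nltk.stem import WordNetLemmatizer

    # First run only:
    # nltk.download("punkt"); nltk.download("wordnet")
    # nltk.download("averaged_perceptron_tagger")

    sentence = "The patients were asking better questions."
    tokens = nltk.word_tokenize(sentence)        # tokenization
    tags = nltk.pos_tag(tokens)                  # POS tagging
    lemmatizer = WordNetLemmatizer()
    lemmas = [lemmatizer.lemmatize(t.lower()) for t in tokens]  # base forms
    print(tokens, tags, lemmas, sep="\n")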

Basic technique (BOW):


Bag of words (BOW) is the simplest of the text-processing algorithms. It represents text in
the form of numbers: it records which words occur in a document and how often, while
disregarding grammar, sentence structure, and all other associated characteristics.
The technique is used because it stores text as fixed-length vectors, which are easier for
machine learning models to process, and it gives us numerical data to work with instead of
raw text.
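A minimal sketch of the representation over a toy two-document corpus:

    docs = ["the dog barks", "the cat and the dog"]
    vocab = sorted({w for d in docs for w in d.split()})          # fixed vocabulary
    vectors = [[d.split().count(w) for w in vocab] for d in docs] # word counts
    # vocab   -> ['and', 'barks', 'cat', 'dog', 'the']
    # vectors -> [[0, 1, 0, 1, 1], [1, 0, 1, 1, 2]]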
The first implementation of the chatbot used BOW. The first part is to train a neural
network for the program. The intents are loaded from the intent file, with punctuation marks
ignored. The file is iterated over, and three empty lists named 'words', 'classes' and
'documents' are created to store the processed data. The program tokenizes the data, updates
these lists with the words, and classifies the data according to its intent. Training then
follows: all the data is lemmatized and the occurrence of each word in the documents is
counted. A feed-forward neural network is then built using Keras' Sequential API. It consists
of an input layer with 128 neurons, a dropout layer to prevent overfitting, a hidden layer with
64 neurons, another dropout layer, and finally an output layer with neurons equal to the
number of classes, using softmax activation for multi-class classification. The model is
compiled with the stochastic gradient descent (SGD) optimizer, using a learning rate of 0.01
and momentum of 0.9. The loss function is categorical cross-entropy, and accuracy is used as
the evaluation metric.
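
A minimal sketch of this network in Keras, assuming train_x (bag-of-words vectors) and train_y (one-hot intent labels) were built in the steps above; the epoch and batch-size values are illustrative, not the project's exact settings:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout
    from tensorflow.keras.optimizers import SGD

    model = Sequential([
        Dense(128, input_shape=(len(train_x[0]),), activation="relu"),
        Dropout(0.5),                               # guards against overfitting
        Dense(64, activation="relu"),
        Dropout(0.5),
        Dense(len(classes), activation="softmax"),  # one neuron per intent class
    ])
    model.compile(
        loss="categorical_crossentropy",
        optimizer=SGD(learning_rate=0.01, momentum=0.9),
        metrics=["accuracy"],
    )
    model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5)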
The training data, containing the words and classes, is then stored for use at run time.
The chatbot is then run to observe its behavior. The input text is first preprocessed using the
stored intents, words, and classes before being passed to the model. The user input is
converted into a bag-of-words vector, the model predicts the intent of the user, and an
appropriate response is generated.
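
A minimal inference sketch, where words, classes, and model are the artifacts saved by the training step; the tokenization here is simplified to whitespace splitting for brevity:

    import numpy as np

    def bag_of_words(sentence, words):
        tokens = [t.lower() for t in sentence.split()]
        return np.array([1 if w in tokens else 0 for w in words])

    def predict_intent(sentence):
        bow = bag_of_words(sentence, words)
        probs = model.predict(np.array([bow]))[0]   # one score per class
        return classes[int(np.argmax(probs))]       # highest-scoring intent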
This technique, however, is limited in its capacity: it cannot handle longer inputs, it gives
randomized answers from within the matched intent, and it loses the context of the
conversation.
LLM:
Large language models are built on a class of deep learning architectures called transformer
networks. Transformers are neural network models that keep track of the context and meaning
of sentences by tracking the sequence of data in the words and phrases of a sentence.
Transformer models were first described in the 2017 paper "Attention Is All You Need"
(https://www.ibm.com/topics/transformer-model) by Ashish Vaswani, a team at Google Brain,
and a group from the University of Toronto. Transformer technology is the primary tool
utilized in the development of modern NLP.
A transformer consists of multiple layers: self-attention layers, feed-forward layers, and
normalization layers, which together decode the input stream to anticipate the output. Layers
can be stacked to make the LLM more powerful. The innovation of transformer models lies
mainly in two features, self-attention and positional encodings.
Positional encoding allows words to be fed to the model non-sequentially by embedding the
order in which the input occurs.
Self-attention enables the model to consider which parts of the input are most relevant to the
context and to the output. This is done by assigning weights to the words, a learned behavior
that is refined as the model trains on vast amounts of data. It lets the model take large streams
of input and process them properly while keeping track of the context and keeping relevant
information in the loop.
This non-sequential analysis involves many computations, which are generally performed on
GPUs, as they work on multiple inputs in parallel.
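
A minimal NumPy sketch of the scaled dot-product attention at the core of self-attention (illustrative only; real transformers add learned projections, multiple heads, and masking):

    import numpy as np

    def attention(Q, K, V):
        # Relevance of every key position to every query position
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ V                               # weighted mix of values

    x = np.random.rand(4, 8)           # 4 tokens, 8-dimensional embeddings
    print(attention(x, x, x).shape)    # (4, 8)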
-Llama: Llama 2 is a collection of pretrained and fine-tuned LLMs ranging from 7 billion to
70 billion parameters. The fine-tuned models, called Llama 2-Chat, are optimized for dialogue
use cases. The models outperform open-source chat models on all benchmarks tested and,
based on human evaluation of helpfulness and safety, are a suitable substitute for
closed-source models.[1]
For this project I have used "TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-
chat.Q4_K_M.gguf", which has 7B parameters quantized to 4 bits. The model is recommended
for most use cases such as ours and is compatible with the operating hardware. It takes
4.08 GB of space and requires a maximum of 6.83 GB of RAM to run.
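
The project serves this model through LM Studio (described next), but for illustration the same GGUF file could be loaded directly, assuming the llama-cpp-python bindings are installed:

    from llama_cpp import Llama

    llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=64,
    )
    print(out["choices"][0]["message"]["content"])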

The Llama 2 chat model, or any other LLM, can be utilized in the following manner.
LM Studio:
LM Studio is a desktop application used for running local, open-source LLMs on PCs. The
software allows the user to download models from Hugging Face; it currently supports the
GGML Llama and StarCoder models available there. LM Studio can run LLMs entirely
offline on the CPU, without the need for an internet connection. It can also expose the loaded
model through a local HTTP server with an OpenAI-compatible API.
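
A minimal sketch of talking to that server, assuming LM Studio's default address http://localhost:1234/v1 and the v1 interface of the openai Python client (the API key can be any placeholder string for a local server):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    reply = client.chat.completions.create(
        model="local-model",   # LM Studio routes this to whichever model is loaded
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(reply.choices[0].message.content)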
Prompt engineering:
Prompt engineering is a new niche developing as we move towards a more AI-driven world.
The thought process behind it already exists in Google search: we can get a variety of results
on the same topic by changing a few keywords and then choose the closest possible result.
Prompt engineering can be defined as the practice of designing and refining prompts to guide
an AI model towards a specific output.
A few key components play a role in prompt engineering, such as:
- Model architecture
- Training and tokenization
- Model parameters
- Temperature and Top-k sampling
- Loss function and gradients
An understanding of all of these is important, but temperature is the parameter a prompt
engineer deals with most directly, as it influences the randomness of the response and is
essential to how the model behaves in a typical setting.
The following are the requisites of a good prompt:
-Instruction: It clearly states what is expected from the prompt and can be decisive for the
output.
-Context: It gives the model the ability to focus and provide a clearer response about what is
expected from it. It is useful when making the model work on more specific topics.
-Input data: Along with the context, this is the information we want the model to work with.
It can include paragraphs, numbers, code, and words.
-Output indicator: In role-playing scenarios, such as building a bot for assistive measures and
communication, it tells the model what output we need from it and what tone is expected.
For my project, I utilized prompt engineering and tested the output across various iterations
with multiple temperature settings. The final prompt is suitable for the task at hand; the input
context is continuously appended to keep track of the conversation, and the fields are divided
into system and user.
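
A minimal sketch of how the four components above map onto the system and user fields; the wording is illustrative, not the project's exact prompt:

    prompt = {
        # Output indicator and instruction: role and expected tone
        "system": ("You are a caring companion for a patient. "
                   "Keep replies short and friendly."),
        # Context plus input data: what the model should work with
        "user": ("Context: the patient enjoys gardening.\n"
                 "Patient says: tell me something nice about roses."),
    }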
RAG (Retrieval-Augmented Generation):
AI models are well suited to tasks such as sentiment analysis and named entity recognition;
however, smaller models may not be updated with the latest information and only understand
the data they were trained on, so they cannot reliably state accurate facts. The remedy is to
give them access to external documents, which serve as a knowledge base for the model and
allow more accurate responses.
This is mainly useful in knowledge-intensive tasks, such as a chatbot that answers questions
about the documentation of a company's existing technologies. For this, RAG is utilized. The
RAG flowchart shown in Fig. 1 shows the difference when ChatGPT is asked a question about
a recent phenomenon with and without access to documentation, clearly illustrating the
contrast between the two use cases.
Figure 1: RAG with and without retrieval of external documents [2]
In my project RAG was initially tried, but it does not enhance the current scope of the project,
which is to have generic discussions with the user and does not need a specialized knowledge
base to do so.
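A minimal, library-free sketch of the RAG idea tried here (the project's trial used LangChain; a real system would score passages with embedding vectors and a vector store rather than word overlap):

    def retrieve(query, documents, top_k=2):
        # Toy relevance score: word overlap between query and each document
        q = set(query.lower().split())
        ranked = sorted(documents,
                        key=lambda d: -len(q & set(d.lower().split())))
        return ranked[:top_k]

    def augment_prompt(query, documents):
        # Prepend the retrieved passages so the model can ground its answer
        context = "\n".join(retrieve(query, documents))
        return f"Answer using this context:\n{context}\n\nQuestion: {query}"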

Fine Tuning:
Chat model fine-tuning is modifying a pre-trained language model (e.g., GPT-3) to make it more
efficient for certain jobs or to conform to certain user needs. In order to accomplish this, the
model is usually trained again using a specialized dataset that represents the intended domain,
tone, or kind of interaction. Fine-tuning improves performance and relevance by modifying the
model's parameters to help it comprehend the subtleties and context unique to the new data. For
example, a dataset of customer service chats can be used to refine a general-purpose chatbot
model and improve its ability to handle support inquiries. By ensuring the chatbot can respond
with precision and awareness of context, this technique improves user experience and increases
the model's usefulness for specific applications.

In this project, fine-tuning is implemented so that the responses are more tailored towards the
user; a sketch of a typical training example follows.
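
A minimal sketch of one training example in the common JSONL "messages" format used for chat fine-tuning; the content is illustrative, not the project's data:

    import json

    example = {
        "messages": [
            {"role": "system", "content": "You are a patient-care companion."},
            {"role": "user", "content": "I feel a bit lonely today."},
            {"role": "assistant",
             "content": "I'm here with you. Would you like to talk about your day?"},
        ]
    }
    print(json.dumps(example))   # one such line per example in the dataset file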
Software Implementation:
For the implementation of this project we mainly used Python for the programming, with
Anaconda Navigator for the smooth setup of environments and variables.
Python:
Python is a general-purpose programming language. It is dynamically typed and
garbage-collected. It is widely used across many programming applications, as it supports
multiple programming paradigms such as structured, object-oriented, and functional
programming.
Multiple features of the language are utilized in this project: data management, web services,
voice processing, and AI integration. The main components used are summarized below.
Anaconda: Anaconda is a software platform that distributes the Python and R programming
languages. It has a sizable repository of over 1,500 data science packages, such as Matplotlib,
NumPy, and Pandas. It also enables the user to establish virtual environments in which
packages of various versions can be installed without causing conflicts. Package management,
the Jupyter Notebook interface, code execution and deployment, and library integration are a
few more of its features. Anaconda was used in this project to create and save the virtual
environment.

- IDE: For this project we used VS Code as the main IDE. VS Code is a free, open-source
code editor. It supports multiple languages and is highly capable, with features such as
auto-completion and syntax highlighting.
- Libraries:
The following libraries provide the main functionality of the program.
-VOSK: Vosk is an open-source speech recognition toolkit with offline streaming
capabilities. The Vosk library is easy to download and supports more than 20 languages
and dialects[3]. For this project we used the English-IN model, i.e. English with an Indian
accent, as it gave the best results when compared against other dialects.
-pyttsx3: This is a text-to-speech conversion library that provides the voice output of the
system, complementing the speech recognition components. It works offline and runs with
both Python 2 and 3.
-speech_recognition: This is a library with speech-to-text capabilities. It can be utilized
offline as well as online, and it supports a large number of engines and APIs.
-Langchain: LangChain is an open-source framework for developing applications with
LLMs. Its toolkit and APIs ease the process of building chatbots and virtual assistants that
harness the capabilities of LLMs. In this project LangChain was used for the trial
implementation of RAG, which is explained in the code implementation part.
-OpenAi: OpenAI is a Python library that provides convenient access to the OpenAI REST
API, which is needed to reach the local LLM server established with LM Studio along with
all its necessary functions.
-Firebase_admin: This library provides back-end services to Python applications, such as a
real-time database, authentication, app connection, and analytics, to name a few. For this
project we used the real-time database; the detailed implementation is in the next section.
-Tkinter: Tkinter, the standard Python GUI (Graphical User Interface) module, offers a
quick and simple way to develop graphical programs. It is a small object-oriented layer built
on top of the robust, cross-platform Tcl/Tk GUI toolkit.
Code implementation:
-speech-to-text conversion:
An offline speech recognition system is initialized using the Vosk-based code. The first step
imports the required libraries: Vosk's Model and KaldiRecognizer, plus subprocess, json, and
time. A Vosk model is then initialized from the language model (vosk-model-small-en-in-0.4),
and a KaldiRecognizer is created to analyze audio frames with word-level detail enabled.
The speech_recognition function continuously processes audio frames from a queue
(recordings): it obtains audio frames, processes them with rec.AcceptWaveform(), and
extracts the recognized text from the result. The text is then appended to the output with
output.append_stdout(), and a short sleep interval controls the timing of the loop.
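
A minimal sketch of that loop; the queue name recordings and the model path follow the description above, and the audio frames are assumed to be 16 kHz PCM bytes:

    import json, time, queue
    from vosk import Model, KaldiRecognizer

    model = Model("vosk-model-small-en-in-0.4")
    rec = KaldiRecognizer(model, 16000)   # 16 kHz sample rate
    rec.SetWords(True)                    # enable word-level detail

    recordings = queue.Queue()            # filled elsewhere with audio frames

    def speech_recognition():
        while True:
            frames = recordings.get()
            if rec.AcceptWaveform(frames):
                text = json.loads(rec.Result())["text"]
                print(text)               # the project appends this to its output
            time.sleep(0.1)               # pace the loop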
Alternatively, we can use the speech_recognition Python library, which provides a
speech-to-text engine in the program. We initialize a Microphone as the source and a
Recognizer as the recognizer, open the source, and call the listen() function; we then use
recognize_google() for the online transcription service.
This is enclosed in a function called listen_for_command(). When the program is executed, it
starts listening but does not enter the command loop until it hears the trigger word. It then
prompts the user to state their command.
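
A minimal sketch of this online path:

    import speech_recognition as sr

    recognizer = sr.Recognizer()

    def listen_for_command():
        with sr.Microphone() as source:
            audio = recognizer.listen(source)
        try:
            # Online transcription via Google's free web API
            return recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            return ""   # speech was unintelligible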

-perform_command(command, ref): The recognized text is tested against the use cases defined
in the code, which are mapped to specific functions, such as an IoT system to control
appliances (explained in detail in the next section). There is also a feature that lets the patient
send custom messages to the caretaker by giving an audio cue for the function; this is
explained in detail in the app development section. The main highlight of this section is the
communication with the LLM running in the background. When the user utters the phrase
"let's chat", "chat", or a similar synonym, the function enters the server-communication
function defined as chat_with_assistant() and prompts the user to raise any topic the patient
wants to discuss. The user's queries are then promptly answered by the language model. The
model continues to chat with the user until prompted to exit or go to sleep, at which point the
program returns to listening for the trigger word. The loop continues in perpetuity.
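
A minimal sketch of the dispatch, with the project's helper functions stubbed so the snippet stands alone (their real bodies are described below):

    def chat_with_assistant(): ...      # sketched later in this section
    def message_to_caretaker(): ...     # sketched later in this section
    def respond(text): print(text)      # sketched below

    def perform_command(command, ref=None):
        command = command.lower()
        if "chat" in command:                      # "let's chat", "chat", ...
            chat_with_assistant()
        elif "send a message" in command:
            message_to_caretaker()
        else:
            respond("Sorry, I did not catch that.")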
- reply from server: We define a respond(text) function, which is used to deliver the
program's replies to the user based on their query. All audio output from the system goes
through this function.
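
A minimal sketch of respond(), assuming the voice output comes from pyttsx3 as listed among the libraries above:

    import pyttsx3

    engine = pyttsx3.init()           # select the default TTS driver for the OS

    def respond(text):
        print("Assistant:", text)     # optional console echo
        engine.say(text)
        engine.runAndWait()           # block until the utterance finishes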
- context memory: The chat_with_assistant() function continuously stores the responses of
the chat application as well as the queries of the user. This allows the chatbot to keep track of
the context of the conversation and give more accurate responses to the user.
-chat_with_assistant(): This function utilizes the components of the OpenAI library and
imports functionality directly from it. We define the working parameters of the function, such
as the model, max tokens, temperature, API key, and prompt. It also appends messages to
store context, as mentioned above.
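
A minimal sketch of chat_with_assistant() with context memory, assuming LM Studio's default local server address; input() stands in for the voice capture described above, and the system prompt wording and parameter values are illustrative:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    history = [{"role": "system",
                "content": "You are a friendly companion for the patient."}]

    def chat_with_assistant():
        while True:
            user_text = input("You: ")        # stand-in for listen_for_command()
            if "go to sleep" in user_text.lower():
                break
            history.append({"role": "user", "content": user_text})
            reply = client.chat.completions.create(
                model="local-model",          # LM Studio serves the loaded model
                messages=history,             # full history keeps the context
                temperature=0.7,
                max_tokens=256,
            )
            answer = reply.choices[0].message.content
            history.append({"role": "assistant", "content": answer})
            print("Assistant:", answer)      # the project speaks this via respond()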
-message_to_caretaker(): This function is called when the phrase "send a message" or similar
keywords are spoken. The user is then prompted to record a message of their choosing for
their caretaker. The message is sent to the Android Studio app; a detailed explanation of the
application is given in the app development section.
(flowchart)

[1] H. Touvron et al., "Llama 2: Open Foundation and Fine-Tuned Chat Models," 2023.
Available: https://arxiv.org/pdf/2307.09288
[2] Y. Gao et al., "Retrieval-Augmented Generation for Large Language Models: A Survey,"
Mar. 2024. Available: https://arxiv.org/pdf/2312.10997
[3] "Vosk Speech Recognition Toolkit," GitHub, Nov. 01, 2021. Available:
https://github.com/alphacep/vosk-api
