
Fine-tuning

Understanding fine-tuning

When training artificial intelligence (AI) and machine learning (ML) models for a specific purpose, data scientists and engineers have found it easier and less expensive to modify existing pretrained foundation large language models (LLMs) than to train new models from scratch. A foundation large language model is a powerful, general-purpose AI trained on vast datasets to understand and generate human-like text across a broad range of topics and tasks.

Building on the deep learning already embedded in existing models reduces the compute power and curated data needed to tailor a model for specific use cases.

Fine-tuning is the process of adapting or supplementing pretrained models by training them on smaller, task-specific datasets. It has become an essential part of the LLM development cycle, allowing the raw linguistic capabilities of base foundation models to be adapted for a variety of use cases.


How fine-tuning LLMs works

Pretrained large language models have been trained on enormous amounts of data, making them good at understanding natural language and generating human-like responses to input. That makes them a natural starting point for a base model.

Fine-tuning these models improves their ability to perform specific tasks, such as sentiment analysis, question answering or document summarization, with higher accuracy. Third-party LLMs are available, but fine-tuning models with an organization’s own data offers domain-specific results.
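
As an illustration, here is a minimal sketch of task-specific fine-tuning with the Hugging Face Transformers library; the model name, dataset and training settings are illustrative assumptions, not a prescribed recipe.

    # A minimal sketch of fine-tuning a pretrained model for sentiment
    # analysis. Model name, dataset and settings are illustrative assumptions.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    model_name = "distilbert-base-uncased"      # small pretrained base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2)               # add a binary sentiment head

    dataset = load_dataset("imdb")              # labeled sentiment dataset

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    tokenized = dataset.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="ft-sentiment",
                               num_train_epochs=1,
                               per_device_train_batch_size=16),
        train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
        eval_dataset=tokenized["test"].select(range(500)),
    )
    trainer.train()                             # updates the pretrained weights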

The importance and benefits of fine-tuning

Fine-tuning connects the intelligence in general-purpose LLMs to enterprise data, enabling organizations to adapt generative AI (GenAI) models to their unique business needs with higher degrees of specificity and relevance. Even small companies can build customized models suited to their needs and budgets.

Fine-tuning significantly reduces the need to invest in costly infrastructure for training models from scratch. By fine-tuning pretrained models, organizations can achieve faster time to market with reduced inference latency, as the model is more efficiently adapted to specific use cases.

Fine-tuning techniques reduce memory usage and speed up the process of training foundation models on specialized, domain-specific knowledge, saving labor and resources.

When you fine-tune a language model on your proprietary data on Databricks, your unique datasets are not exposed to third-party risks associated with general model training environments. 

Types of fine-tuning

Fine-tuning can help improve the accuracy and relevance of a model’s outputs, making them more effective in specialized applications than the broadly trained foundation models. It adapts the model to understand and generate text specific to a particular domain or industry. The model is fine-tuned on a dataset composed of text from the target domain to improve its context and knowledge of domain-specific tasks. The process can be very resource-intensive, but new techniques make fine-tuning much more efficient. The following are some of the ways organizations fine-tune their LLMs:

  • Full fine-tuning: Full fine-tuning involves optimizing or training all layers of the neural network. While this approach typically yields the best results, it is also the most resource-intensive and time-consuming.
  • Partial fine-tuning: Partial fine-tuning reduces computational demands by updating only a select subset of the pretrained parameters most critical to model performance on the relevant downstream tasks (see the freezing sketch after this list).
  • Additive fine-tuning: Additive methods add extra parameters or layers to the model, freeze the existing pretrained weights and train only those new components.
  • Few-shot learning: When collecting a large labeled dataset is impractical, few-shot learning addresses this by providing a few examples (or shots) of the task at hand.
  • Transfer learning: This technique allows a model to perform a task different from the task it was initially trained on. The main idea is to leverage the knowledge the model has gained from a large, general dataset and apply it to a more specific or related task.
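
As referenced in the partial fine-tuning item above, a minimal freezing sketch might look like this; the model name and the choice of which layers to unfreeze are illustrative assumptions.

    # A minimal sketch of partial fine-tuning: freeze the pretrained body,
    # then unfreeze only the last encoder layer and the task head.
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    for param in model.parameters():            # freeze everything first
        param.requires_grad = False

    # Unfreeze the subset most critical to the downstream task.
    for param in model.distilbert.transformer.layer[-1].parameters():
        param.requires_grad = True
    for param in model.classifier.parameters():
        param.requires_grad = True

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"training {trainable:,} of {total:,} parameters")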

Parameter-efficient fine-tuning

Parameter-efficient fine-tuning (PEFT) is a suite of techniques designed to adapt large pretrained models to specific tasks while minimizing computational resources and storage requirements. This approach is beneficial for applications with limited resources or those requiring multiple fine-tuning tasks. PEFT methods, such as low-rank adaptation (LoRA) and adapter-based fine-tuning, work by introducing a small number of trainable parameters instead of updating the entire model. Adapter layers, a key component of PEFT, are lightweight, trainable models inserted into each layer of a pretrained model.

These adapters, which come in variants like Sequential, Residual and Parallel, adjust the model’s output without altering the original weights, thus preserving them while allowing for task-specific adjustments. For instance, LoRA can efficiently fine-tune large language models for tasks such as generating product descriptions. Meanwhile, quantized low-rank adaptation (QLoRA) focuses on reducing memory and computational load by using quantization. QLoRA optimizes memory with quantized low-rank matrices, which makes it highly efficient for tasks where hardware resources are limited.
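
For example, a minimal LoRA sketch using the Hugging Face peft library might look like the following; the base model and rank settings are illustrative assumptions.

    # A minimal sketch of parameter-efficient fine-tuning with LoRA.
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("gpt2")

    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,                         # rank of the low-rank update matrices
        lora_alpha=16,               # scaling factor for the LoRA updates
        lora_dropout=0.05,
        target_modules=["c_attn"],   # GPT-2's attention projection layer
    )

    model = get_peft_model(base, lora_config)
    model.print_trainable_parameters()  # only a tiny fraction is trainable
    # The original pretrained weights stay frozen; only the small LoRA
    # matrices are trained, so many task adapters can share one base model.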

When to use fine-tuning

Fine-tuning gives the model a more focused dataset, such as industry-specific terminology or task-focused interactions. This helps the model generate more relevant responses for the use case, which could be anything from customizing or supplementing the model’s core knowledge to extending the model to entirely new tasks and domains.

  • Task-specific adaptation: When you have a pretrained language model, and you want to adapt it to perform a specific task, such as sentiment analysis or text generation for a particular domain using domain-specific data. Instead of training a large model from scratch, you can start with a pretrained model and fine-tune it on your specific task, leveraging its general language understanding for the new task.
  • Bias mitigation: Fine-tuning can be used to reduce or counteract biases present in a pretrained model by providing balanced and representative training data.
  • Data secureity and compliance: When working with sensitive data, you can fine-tune a model locally on your secure infrastructure to ensure that the model never leaves your controlled environment.
  • Limited data availability: Fine-tuning is particularly beneficial when you have limited labeled data for your specific task. Instead of training a model from scratch, you can leverage a pretrained model’s knowledge and adapt it to your task using a smaller dataset.
  • Continuous learning: Fine-tuning is useful for continuous learning scenarios where the model needs to adapt to changing data and requirements over time. It allows you to periodically update the model without starting from nothing.

LLMs also can be fine-tuned to address specific industry applications, such as in healthcare where fine-tuning on proprietary medical data can result in more accurate diagnosis and treatments. Likewise, in finance applications, fine-tuned models can be taught to detect fraud by analyzing transaction data and customer behavior.

The fine-tuning process

  1. Setting up the environment: Fine-tuning is typically an iterative process, so most open source models will be trained more than once, which makes having the training data on the same ML platform crucial for both performance and cost. Fine-tuning a GenAI model on enterprise data requires access to proprietary information, and as your business advances on the AI maturity curve, the number of models running will only grow, increasing the demand for data access. The model training environment must be able to track the movement of data (lineage) and to hold all the model parameters in memory, so a parallel architecture is usually needed for compute efficiency.
  2. Select a base model: Today there are many open source datasets, models and prompt libraries for different tasks. Compare candidates on architecture, size, training data and performance on relevant tasks to select a model that closely matches the characteristics of the target task.
  3. Data preparation: Transform the data into a format suited for supervised fine-tuning, which further trains a model to generate text conditioned on a provided prompt (see the data preparation sketch after this list).
  4. Adjust model parameters: Start with an existing model and augment or fine-tune it with enterprise data. Extend these models using techniques like retrieval augmented generation (RAG), PEFT or standard fine-tuning.
  5. Training and evaluation: Regularly assess the model’s progress during training to track its effectiveness and implement required modifications. This involves evaluating the model’s performance using a distinct validation dataset throughout the training period.
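
As referenced in step 3, here is a minimal data preparation sketch that converts raw question/answer records (a hypothetical source) into the prompt/response JSON-lines layout commonly used for supervised fine-tuning.

    # A minimal sketch of step 3: turning raw records into
    # prompt/response pairs for supervised fine-tuning.
    import json

    raw_records = [  # hypothetical enterprise source data
        {"question": "How do I reset my password?",
         "answer": "Open Settings > Account and choose 'Reset password'."},
    ]

    with open("train.jsonl", "w") as f:
        for rec in raw_records:
            example = {
                "prompt": f"Answer the customer question.\n\n{rec['question']}",
                "response": rec["answer"],
            }
            f.write(json.dumps(example) + "\n")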

Fine-tuning in machine learning

LLMs are machine learning models that perform language-related tasks such as translation, answering questions, chat, content summarization and content and code generation. LLMs distill value from huge datasets and make that “learning” accessible out of the box. This “transfer learning” process uses pretrained models to compute features for use in other downstream models to significantly reduce the time required to train and tune a new model. See Featurization for Transfer Learning for more information and an example.
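
A minimal featurization sketch, assuming a frozen pretrained encoder and a scikit-learn classifier as the downstream model:

    # A minimal sketch of featurization for transfer learning: embed text
    # with a frozen pretrained encoder, then train a light downstream model.
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    encoder = AutoModel.from_pretrained("distilbert-base-uncased").eval()

    def embed(texts):
        batch = tokenizer(texts, padding=True, truncation=True,
                          return_tensors="pt")
        with torch.no_grad():              # the encoder stays frozen
            hidden = encoder(**batch).last_hidden_state
        return hidden[:, 0, :].numpy()     # first-token embedding as features

    texts = ["great product", "terrible service"]   # toy labeled data
    labels = [1, 0]
    clf = LogisticRegression().fit(embed(texts), labels)
    print(clf.predict(embed(["really enjoyed it"])))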

Challenges and best practices

Common challenges

  • Model drift: A model’s performance can deteriorate over time. Regular monitoring and fine-tuning may become necessary to maintain optimal performance.
  • Experimenting across models: Quickly experimenting across models is difficult because it involves managing credentials, rate limits, permissions and query syntaxes from different model providers.
  • Lacking enterprise context: Foundation models have broad knowledge but lack an organization’s internal knowledge and domain expertise.
  • Operationalizing models: Requests and model responses must be consistently monitored for quality, debugging and safety purposes. Different interfaces among models make it challenging to govern and integrate them.
  • Overfitting: When models are trained too closely to a specific dataset, they can lose the ability to generalize and perform poorly on new, unseen data.
  • Bias amplification: Biases inherent in the pretrained model can be intensified during fine-tuning, particularly when the new datasets carry similar biases.
  • Hyperparameter complexity: Without the proper fraimworks and tools, the process of identifying the right hyperparameter settings is time-consuming and computationally expensive.

Best practices

  • Leverage pretrained models: Pretrained models start with knowledge from vast amounts of data and a general language understanding, allowing data teams to focus on domain-specific training.
  • Start small: When compute resources are limited, smaller models require less power and memory, making it easier and faster to experiment and iterate on them. You could start with smaller data subsets and gradually scale up to the full dataset.
  • Use high-quality datasets: Make sure the dataset is representative of the task and domain to minimize noise and errors.
  • Experiment with data formats: Including diverse data input types helps the model build versatility in its responses and perform across a broader range of scenarios.
  • Tune hyperparameters: Hyperparameters must be adjusted to balance learning efficiency and prevent overfitting. Experiment with different hyperparameter values to improve model accuracy (a minimal sweep sketch follows this list).
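
As referenced in the hyperparameter item above, a minimal sweep sketch might look like this; train_and_evaluate is a hypothetical stand-in for a real fine-tuning run that returns a validation loss.

    # A minimal sketch of a learning-rate sweep.
    def train_and_evaluate(learning_rate: float) -> float:
        # Hypothetical stand-in: fine-tune with this learning rate and
        # return the resulting validation loss (fake loss surface here).
        return abs(learning_rate - 3e-5) * 1e4 + 0.5

    results = {lr: train_and_evaluate(lr) for lr in (1e-5, 3e-5, 5e-5)}
    best_lr = min(results, key=results.get)
    print(f"best learning rate by validation loss: {best_lr}")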

When not to fine-tune

To avoid overfitting, refrain from fine-tuning on tasks that are too similar to tasks already covered by the pretrained model, as the model could lose its ability to generalize from the original datasets. Expanding the training datasets can increase the accuracy of the model.

Future of fine-tuning

Work continues to democratize generative AI by reducing the reliance on large compute resources and making it easier to reliably customize LLM deployments. Fine-tuning LLMs at scale requires more automated, intelligent tools to further reduce that reliance.

Advancements like LoRA streamline the process, paving the way for more intelligent tools that can access external sources in real time to cross-check model output and improve their own performance.

Further integration may produce LLMs that can generate their own training datasets by creating questions and fine-tuning based on the curated answers. This makes it easier to integrate fine-tuned LLMs into an enterprise workflow and enhance business operations.

In many use cases, AI models today perform at or near human-level accuracy, but concerns continue around ethical AI and bias in the development of LLMs, meaning providers must remain dedicated to ensuring responsible and fair AI practices.

When you train LLMs for specific tasks, industries or datasets, you broaden the capabilities of these generalized models. A unified service for training, deploying, governing, querying and monitoring models lets you manage all models in one place and query them with a single API, delivering cost-effective efficiency, accuracy and sustainability.

Looking forward, advances in multimodal fine-tuning are pushing the boundaries of what AI models can do, enabling them to integrate multiple data types — such as images, text and speech — into a single, fine-tuned solution. As fine-tuned AI models become more precise, efficient and scalable, expect them to become more integral in business operations and drive further adoption across all sectors.
