Kubernetes for Generative AI Solutions

Kubernetes for Generative AI Solutions: A complete guide to designing, optimizing, and deploying Generative AI workloads on Kubernetes

Ashok Srirama, Sukirti Gupta
€26.98 €29.99
eBook Jun 2025 334 pages 1st Edition
eBook
€26.98 €29.99
Paperback
€37.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Kubernetes for Generative AI Solutions

Generative AI Fundamentals

Generative AI (GenAI) has revolutionized our world and has grabbed everyone’s attention since the introduction of ChatGPT in November of 2022 by OpenAI (https://openai.com/index/chatgpt/). However, the foundational concepts of this technology have been around for quite some time. In this chapter, we will introduce the key concepts of GenAI and how it has evolved over time. We will then discuss how to think about a GenAI project and align it with the business objectives, covering the entire process for developing and deploying GenAI workloads, along with potential use cases across different industries.

In this chapter, we’re going to cover the following main topics:

  • Artificial intelligence versus GenAI
  • The evolution of machine learning
  • Transformer architecture
  • The GenAI project life cycle
  • The GenAI deployment stack
  • GenAI project use cases

Artificial Intelligence versus GenAI

Before we dive deeper into GenAI concepts, let’s discuss the differences between Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and GenAI, as these terms are often used interchangeably.

Figure 1.1 shows the relationships between these concepts.

Figure 1.1 – Relationships between AI, ML, DL, and GenAI

Let’s learn more about these relationships:

  • AI: AI refers to a system or algorithm that is capable of performing tasks that would otherwise typically require human intelligence. These tasks include reasoning, learning, problem-solving, perception, and language understanding. AI is a broad category and can include rule-based systems, expert systems, neural networks, and GenAI algorithms. The evolution of AI algorithms has provided machines with human-like senses and capabilities, such as vision to analyze the world around them, listening and speaking to understand...

Evolution of machine learning

Since this book is about GenAI, what could be a better way to start it than asking ChatGPT to summarize the evolution of AI and ML over the last decade?

Prompt: "Why did the chicken cross the road?" Describe how that question's answer evolved using AI/ML over the last decade.

ChatGPT Response (ChatGPT-4o, June 16th, 2024):

The evolution of AI/ML responses to the question "Why did the chicken cross the road?" over the past decade reflects significant advancements in language processing and understanding. Here's a detailed description of how these responses have evolved:
Early 2010s: Rule-Based Systems
Response Type: Traditional, rule-based
Example Answer: "To get to the other side."
Description: In the early 2010s, AI systems were primarily rule-based. These systems relied on pre-defined patterns and rules coded by developers. When asked "Why did the chicken cross the road?" the AI would output...

Transformer architecture

A transformer model uses an encoder-decoder architecture, where the encoder maps the input sequences/tokens through a self-attention mechanism. This mapped data is used by the decoder to generate the output sequence. The mapping of input tokens retains not only their intrinsic values but also their context and weight in the original sequence. Let’s go through some key aspects of the encoder architecture in the following figure:

Figure 1.3 – Transformer architecture from the Attention Is All You Need paper

Here is a breakdown of the concepts highlighted in Figure 1.3:

  • Input embeddings: Marked as 1 in the figure, this is a key part of the transformer model, which converts input sequences/tokens into high-dimensional vector embeddings. In real-world applications, output embeddings from a trained model may be stored in high-dimensional vector databases, such as Elasticsearch, Milvus, or Pinecone. Vector databases...
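
To make the idea of storing and querying embeddings concrete, here is a minimal sketch in plain Python. It is not how any particular vector database works internally: the three 4-dimensional vectors are made-up values, and `cosine_similarity` and `nearest` are hypothetical helper names used only for illustration. Real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math

# Hypothetical 4-dimensional embeddings (made-up values for illustration).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0, 0.2],
    "kitten": [0.8, 0.2, 0.1, 0.3],
    "truck": [0.0, 0.9, 0.8, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store):
    """Return the key in `store` whose vector is most similar to `query`."""
    return max(store, key=lambda key: cosine_similarity(query, store[key]))


# Look up the stored vector closest to "kitten", excluding "kitten" itself.
others = {k: v for k, v in EMBEDDINGS.items() if k != "kitten"}
print(nearest(EMBEDDINGS["kitten"], others))  # prints "cat"
```

A production system would replace the linear scan in `nearest` with an approximate nearest-neighbor index, which is one of the main services a vector database provides.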

GenAI project life cycle

Enterprise spending on GenAI projects has been growing rapidly since 2023, with C-suite executives planning to spend even more on GenAI projects (https://www.gartner.com/en/newsroom/press-releases/2023-10-11-gartner-says-more-than-80-percent-of-enterprises-will-have-used-generative-ai-apis-or-deployed-generative-ai-enabled-applications-by-2026). However, there is growing concern about how to quantify the Return on Investment (ROI) of these efforts, such as revenue impact, efficiency, and accuracy gains. Moving forward, ROI will become a critical part of the conversation, as enterprises look for new GenAI projects. So, before starting a new GenAI project, it is recommended to think about the entire project life cycle, which we cover in this section.

Let’s first look at the following figure, which outlines the end-to-end GenAI project life cycle, starting from defining business objectives, or KPIs. This is followed...

GenAI deployment stack

As we discuss GenAI application development and deployment over Kubernetes, it is a good idea to understand the entire deployment stack, which can help us to think about the right infrastructure, orchestration platform, and libraries. The following figure shows the various layers of the GenAI deployment stack, from the foundational infrastructure layer comprising compute, storage, and networking through to the orchestration, tools, and deployment layers.

Figure 1.6 – Deployment stack for GenAI applications

Let’s take a closer look at each of these layers:

  • Infrastructure layer: We will start from the foundation layer of the stack and move upward. The foundation of this stack is the infrastructure layer, which covers compute, networking, and storage options:
    • Compute: For compute, we can use options such as CPUs, GPUs, custom accelerators, or a combination of these. As explained previously, LLMs are very computationally...
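
As a rough illustration of how a compute choice surfaces at the orchestration layer, the sketch below builds a Kubernetes Pod manifest that asks for one GPU. It assumes the cluster runs the NVIDIA device plugin, which exposes GPUs under the `nvidia.com/gpu` resource name; the function name `gpu_pod_manifest`, the Pod name, and the image `example.com/llm-server:latest` are hypothetical placeholders, not values from this book.

```python
import json


def gpu_pod_manifest(name, image, gpu_count=1):
    """Build a Pod manifest (as a dict) that requests NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image name
                    "resources": {
                        # Extended resources such as GPUs are declared under limits.
                        "limits": {"nvidia.com/gpu": gpu_count},
                    },
                }
            ],
        },
    }


manifest = gpu_pod_manifest("llm-inference", "example.com/llm-server:latest")
print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output can be saved to a file and applied to a test cluster as-is.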

GenAI use cases

GenAI is transforming all industries. As per McKinsey (https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier#key-insights), it is expected to add trillions of dollars to the economy by 2030. The following are some of the industry verticals affected and their use cases. This is by no means a comprehensive list, as the list of applications is growing very rapidly.

However, learning about some of these will give you an idea of the potential GenAI carries:

  • Retail/e-commerce:
    • Product design: GenAI can be used at every stage of product design, such as summarizing market and user research, creating visuals/animations of possible product options, refining concepts based on user feedback, and creating product descriptions and marketing campaigns.
    • Personalized recommendation: GenAI can enable more natural engagement for e-commerce websites. Users can ask in natural language for product...

Summary

In this chapter, we covered the differences between AI and GenAI. AI is a very broad term that refers to technologies that enable machines to emulate human intelligence and encompasses a broad range of applications, including GenAI, whereas GenAI specifically focuses on creating new content, such as text, images, and videos.

We then looked at the evolution of ML, understanding its progression from CNNs/RNNs to the transformer architecture introduced in 2017. Transformers have revolutionized AI with their ability to process sequences of data efficiently, making them fundamental to many GenAI applications, particularly in NLP.

The chapter also outlined the life cycle of a GenAI project, which includes business objectives and KPIs, foundational model selection, model training, evaluation, and deployment. Each stage is critical, with continuous iterations based on performance feedback.

Finally, the chapter covered various use cases of GenAI across different sectors, including...

Appendix 1B – Transformer mathematical models for the self-attention mechanism

In this section, we will provide a basic overview of how the transformer model works, including a mathematical explanation of its functionality. We discussed the concepts of queries, keys, and values as part of transformer analysis earlier in this chapter. For a given attention head i, the following are the query, key, and value vectors:

$$Q = X W_i^{Q}$$

$$K = X W_i^{K}$$

$$V = X W_i^{V}$$

Here, $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the weight matrices of attention head i for the queries, keys, and values. These weights are the parameters that we optimize as we train the model.

To understand the computational complexity of these calculations, let’s look at the dimensionality of each term:

  • $X = [n, d_{model}]$, where $n$ is the number of tokens in the input sequence and $d_{model}$ is the dimensionality of the multi-dimensional space.
  • Weight vectors: $W_i^{Q}, W_i^{K}, W_i^{V} = [d_{model}, d_k]$, where $d_k = d_{model}$ / # of attention...
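
The shapes above can be checked with a small sketch in plain Python. The sizes (`n = 3`, `d_model = 8`, two heads) and the random weights are illustrative assumptions, and the helper names are hypothetical. After the three projections, the sketch applies scaled dot-product attention, softmax(QKᵀ / √d_k)V, as formulated in the Attention Is All You Need paper; the remainder of this appendix may present that step in a different form.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (r x m) by matrix b (m x c), both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Matrix of uniform random values standing in for learned weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def attention_head(x, w_q, w_k, w_v):
    """One attention head: project X, then apply scaled dot-product attention."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)  # each [n, d_k]
    d_k = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K: [d_k, n]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]  # [n, n]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)  # [n, d_k]


rng = random.Random(0)
n, d_model, num_heads = 3, 8, 2  # illustrative sizes only
d_k = d_model // num_heads
x = random_matrix(n, d_model, rng)  # stand-in for the input embeddings X
w_q, w_k, w_v = (random_matrix(d_model, d_k, rng) for _ in range(3))

out = attention_head(x, w_q, w_k, w_v)
print(len(out), len(out[0]))  # prints "3 4", that is [n, d_k]
```

The intermediate `scores` matrix has shape [n, n], which is why the cost of self-attention grows quadratically with the number of input tokens.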

Key benefits

  • Build and deploy your first Generative AI workload on Kubernetes with confidence
  • Learn to optimize costly resources such as GPUs using fractional allocation, Spot Instances, and automation
  • Gain hands-on insights into observability, infrastructure automation, and scaling Generative AI workloads
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

Generative AI (GenAI) is revolutionizing industries, from chatbots to recommendation engines to content creation, but deploying these systems at scale poses significant challenges in infrastructure, scalability, security, and cost management. This book is your practical guide to designing, optimizing, and deploying GenAI workloads with Kubernetes (K8s), the leading container orchestration platform trusted by AI pioneers. Whether you’re working with large language models, transformer systems, or other GenAI applications, this book helps you confidently take projects from concept to production.

You’ll get to grips with foundational concepts in machine learning and GenAI, understanding how to align projects with business goals and KPIs. From there, you’ll set up Kubernetes clusters in the cloud, deploy your first workload, and build a solid infrastructure.

But your learning doesn’t stop at deployment. The chapters highlight essential strategies for scaling GenAI workloads in production, covering model optimization, workflow automation, scaling, GPU efficiency, observability, security, and resilience. By the end of this book, you’ll be fully equipped to confidently design and deploy scalable, secure, resilient, and cost-effective GenAI solutions on Kubernetes.

Who is this book for?

This book is for solutions architects, product managers, engineering leads, DevOps teams, GenAI developers, and AI engineers. It's also suitable for students and academics learning about GenAI, Kubernetes, and cloud-native technologies. A basic understanding of cloud computing and AI concepts is needed, but no prior knowledge of Kubernetes is required.

What you will learn

  • Explore GenAI deployment stack, agents, RAG, and model fine-tuning
  • Implement HPA, VPA, and Karpenter for efficient autoscaling
  • Optimize GPU usage with fractional allocation, MIG, and MPS setups
  • Reduce cloud costs and monitor spending with Kubecost tools
  • Secure GenAI workloads with RBAC, encryption, and service meshes
  • Monitor system health and performance using Prometheus and Grafana
  • Ensure high availability and disaster recovery for GenAI systems
  • Automate GenAI pipelines for continuous integration and delivery

Product Details

Publication date: Jun 06, 2025
Length: 334 pages
Edition: 1st
Language: English
ISBN-13: 9781836209928

Table of Contents

19 Chapters
Part 1: GenAI and Kubernetes Foundation
Chapter 1: Generative AI Fundamentals
Chapter 2: Kubernetes – Introduction and Integration with GenAI
Chapter 3: Getting Started with Kubernetes in the Cloud
Part 2: Productionalizing GenAI Workloads Using K8s
Chapter 4: GenAI Model Optimization for Domain-Specific Use Cases
Chapter 5: Working with GenAI on K8s: Chatbot Example
Chapter 6: Scaling GenAI Applications on Kubernetes
Chapter 7: Cost Optimization of GenAI Applications on Kubernetes
Chapter 8: Networking Best Practices for Deploying GenAI on K8s
Chapter 9: Security Best Practices for Deploying GenAI on Kubernetes
Chapter 10: Optimizing GPU Resources for GenAI Applications in Kubernetes
Part 3: Operating GenAI Workloads on K8s
Chapter 11: GenAIOps: Data Management and the GenAI Automation Pipeline
Chapter 12: Observability – Getting Visibility into GenAI on K8s
Chapter 13: High Availability and Disaster Recovery for GenAI Applications
Chapter 14: Wrapping Up: GenAI Coding Assistants and Further Reading
Index
Other Books You May Enjoy

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made using a credit card, debit card, or PayPal).
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.
