Chapter 3

DevOps for ML Systems

--------------------------------------------------------------------------------------------------------------------

3.1 Introduction to DevOps

DevOps is a cultural and technical movement that combines software development (Dev) and IT operations (Ops) to
improve collaboration, efficiency, and the delivery of software products. It aims to shorten the software development
lifecycle (SDLC) while delivering features, updates, and fixes frequently and reliably through automation and
collaboration.

Key Principles of DevOps

1. Collaboration and Communication: DevOps fosters a collaborative culture between developers and operations
teams, breaking down silos to ensure seamless communication.
2. Automation: Automating repetitive tasks such as testing, deployment, and monitoring enhances productivity
and minimizes human error. Two automation practices are central:
o Continuous Integration (CI): Developers regularly integrate code into a shared repository, followed
by automated testing to detect issues early.
o Continuous Delivery (CD): Ensures that the code is always in a deployable state, automating
deployment pipelines to push code changes into production.
3. Infrastructure as Code (IaC): Treating infrastructure as software enables teams to define and manage
resources (e.g., servers, networks) through code, ensuring consistency and repeatability (a small scripted
sketch follows this list).
4. Monitoring and Feedback: Continuous monitoring and feedback loops provide insights into system
performance and user behavior, allowing teams to improve software iteratively.
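
To make the IaC idea concrete, here is a small scripted provisioning sketch, assuming the boto3 AWS SDK is installed and AWS credentials are configured; the bucket name and region are illustrative. Dedicated IaC tools such as Terraform are declarative rather than imperative, but the underlying principle of defining resources in code, so that setup is repeatable and reviewable, is the same.

# Scripted provisioning sketch (assumes boto3 and configured AWS
# credentials; bucket name and region are illustrative).
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Creating the bucket in code makes the setup repeatable and
# reviewable, unlike clicking through a web console.
s3.create_bucket(
    Bucket="example-ml-artifacts-bucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)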

Benefits of DevOps

1. Faster Time-to-Market:
Automation and streamlined processes allow for faster development, testing, and deployment cycles.
2. Improved Collaboration:
Breaking down silos between development and operations ensures that teams work toward shared goals.
3. Higher Quality Software:
Continuous testing and integration reduce bugs and ensure higher-quality code.
4. Enhanced Scalability:
Automation and IaC make it easier to scale systems up or down based on demand.
5. Better Customer Satisfaction:
Faster delivery of features and fixes leads to better user experiences.

DevOps Toolchain

A DevOps toolchain comprises tools used across the SDLC stages to automate processes and improve efficiency. Some
popular tools include:

1. Version Control:
o Git, GitHub, GitLab, Bitbucket
2. CI/CD:
o Jenkins, CircleCI, Travis CI, GitHub Actions
3. Containerization:
o Docker, Podman
4. Container Orchestration:
o Kubernetes, Docker Swarm
5. Configuration Management:
o Ansible, Puppet, Chef, SaltStack
6. Infrastructure as Code (IaC):
o Terraform, AWS CloudFormation
7. Monitoring and Logging:
o Prometheus, Grafana, Splunk, ELK Stack

Challenges in DevOps

1. Cultural Resistance:
Adopting a DevOps mindset requires significant cultural change, which can face resistance from traditional
teams.
2. Tool Overload:
Choosing and managing the right tools can be overwhelming given the wide variety of options available.
3. Complexity in CI/CD Pipelines:
Building and maintaining robust CI/CD pipelines requires expertise and continuous effort.
4. Security Concerns:
Integrating security into DevOps (DevSecOps) is a challenge that demands additional focus and resources.
5. Scalability:
Scaling DevOps practices across large organizations can be complex.

3.1.1 DevOps Lifecycle

DevOps is a combination of Development and Operations, emphasizing collaboration, automation, and
integration between development teams and IT operations. The goal is to shorten the software development
lifecycle while delivering high-quality software continuously. The DevOps lifecycle can be broken down into
several phases, each emphasizing different activities that align with continuous integration, continuous
testing, and continuous deployment.
1. Plan: Teams determine the business need and gather end-user feedback at this stage. They design a
project plan to maximize business impact and produce the intended result.
2. Code: The code is developed at this stage. To simplify the process, the development team uses DevOps
tools and extensions such as Git that help them avoid security problems and poor coding practices.
3. Build: Once developers have completed their tasks, they commit the code to the shared repository
using build tools such as Maven and Gradle.
4. Test: To assure software quality, the build is deployed to a test environment where various kinds of
testing are performed, such as user acceptance testing, security testing, integration testing, and
performance testing, using tools such as JUnit and Selenium.
5. Release: At this point, the build is ready to be deployed to the production environment. Once the build
passes all checks, the DevOps team schedules releases or pushes multiple versions to production
according to organizational needs.
6. Deploy: Infrastructure-as-Code helps provision the production infrastructure, after which the build is
published using various DevOps deployment tools.
7. Operate: The release is now available for users. The operations team handles server configuration and
deployment at this stage with tools such as Chef.
8. Monitor: The DevOps workflow is observed at this stage based on data gathered from user behavior,
application performance, and other sources. Visibility into the complete environment helps teams
identify bottlenecks affecting the development and operations teams' performance.

3.1.2 7 Cs of DevOps

The seven Cs of DevOps are:

1. Continuous Development
2. Continuous Integration
3. Continuous Testing
4. Continuous Deployment/Continuous Delivery
5. Continuous Monitoring
6. Continuous Feedback
7. Continuous Operations

1. Continuous Development

In Continuous Development, code is written in small, continuous increments rather than all at once. This is
important in DevOps because it improves efficiency: every time a piece of code is created, it is tested, built,
and deployed into production. Continuous Development raises the standard of the code and streamlines the
process of fixing flaws, vulnerabilities, and defects. It enables developers to concentrate on creating
high-quality code.

2. Continuous Integration

Continuous Integration can be explained mainly in 4 stages in DevOps. They are as follows:
1. Getting the SourceCode from SCM
2. Building the code
3. Code quality review
4. Storing the build artifacts
The stages above form the flow of Continuous Integration, and we can use whichever tool suits our
requirements at each stage. Among the most popular tools, GitHub is used for source code management (SCM):
when a developer writes code on a local machine, it is pushed to the remote repository on GitHub, from where
anyone with access can pull or clone it and make the required changes. From there, Maven builds the code
into the required package (WAR, JAR, or EAR) and runs the JUnit test cases. SonarQube performs the code
quality review, measuring the quality of the source code and generating a report in HTML or PDF format.
Nexus stores the build artifacts produced by Maven. The whole process is orchestrated by a Continuous
Integration tool such as Jenkins.

3. Continuous Testing

Any firm can deploy continuous testing with the use of the agile and DevOps methodologies.
Depending on our needs, we can perform continuous testing using automation testing tools
such as Testsigma, Selenium, LambdaTest, etc. With these tools, we can test our code and
prevent problems and code smells, as well as test more quickly and intelligently. With the aid
of a continuous integration platform like Jenkins, the entire process can be automated, which is
another added benefit.

4. Continuous Deployment/ Continuous Delivery

Continuous Deployment: Continuous Deployment is the process of automatically deploying an application
into the production environment once it has completed the testing and build stages. Here, we automate
everything from obtaining the application's source code to deploying it.

Continuous Delivery: Continuous Delivery is the process of deploying an application to production servers
manually once it has completed the testing and build stages. Here, we automate the continuous integration
processes, but manual involvement is still required to deploy to the production environment.

5. Continuous Monitoring

The DevOps lifecycle is incomplete without Continuous Monitoring. Continuous Monitoring can be achieved
with tools such as Prometheus and Grafana, which let us monitor the system continuously and get notified
before anything goes wrong. With Prometheus we can gather many performance metrics, including CPU and
memory utilization, network traffic, application response times, and error rates. Grafana makes it possible
to visualize and track time-series data such as CPU and memory utilization.
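
As a minimal sketch of how such metrics are exposed, assuming the prometheus_client Python library is installed (the metric names and port are illustrative): Prometheus scrapes the endpoint the script exposes, and Grafana charts the resulting time series.

# Minimal instrumentation sketch (assumes prometheus_client is
# installed; metric names and port are illustrative).
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Gauge("app_request_latency_seconds", "Latency of the last request")

def handle_request():
    """Simulate handling one request and record metrics about it."""
    started = time.time()
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    REQUESTS.inc()                         # count the request
    LATENCY.set(time.time() - started)     # record its latency

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        handle_request()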

6. Continuous Feedback

Once the application is released into the market the end users will use the application and
they will give us feedback about the performance of the application and any glitches
affecting the user experience after getting multiple feedback from the end users’ the
DevOps team will analyze the feedbacks given by end users and they will reach out to the
developer team tries to rectify the mistakes they are performed in that piece of code by this
we can reduce the errors or bugs that which we are currently developing and can produce
much more effective results for the end users also we reduce any unnecessary steps to
deploy the application. Continuous Feedback can increase the performance of the
application and reduce bugs in the code making it smooth for end users to use the
application.

7. Continuous Operations

We sustain higher application uptime by implementing Continuous Operations, which helps us cut down on
the maintenance downtime that negatively impacts end users' experience. Higher output, lower operating
costs, and better quality control are benefits of Continuous Operations.

3.2 Fundamentals of CI/CD

CI/CD is a set of practices that automate the building, testing, and deployment of
applications, ensuring they are always in a releasable state. It promotes faster development
cycles, improved software quality, and reduced time to market. A CI/CD (Continuous
Integration/Continuous Deployment) pipeline automates the software development process,
from code integration and testing (CI) to deployment and delivery (CD). In MLOps, it
streamlines machine learning model development and deployment.

1. Continuous Integration (CI):

 Definition: Continuous Integration is a practice where developers integrate code into
a shared repository frequently, ideally multiple times a day. Each integration is
automatically tested to detect and fix bugs early in the development cycle.
 Purpose: The main goal of CI is to prevent integration issues and reduce the time it
takes to detect and address problems in the codebase. By running automated tests on
every integration, issues are caught and resolved quickly.
 Process:
1. Developers commit code changes to a version control system (like Git).
2. These changes trigger an automatic build process.
3. Automated tests run on the newly built code.
4. If the tests pass, the code is successfully integrated; if not, developers are
alerted to fix the issues. (A minimal example of such an automated test follows this list.)
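
As a minimal example of the kind of automated test run in step 3, written in the pytest style (the function and file names are illustrative, not from the chapter); a CI server such as Jenkins would run it on every commit:

# test_pricing.py -- a tiny unit test a CI server runs on every commit.
def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(100.0, 20) == 80.0

def test_no_discount():
    assert apply_discount(59.99, 0) == 59.99

If either assertion fails, the build is marked as failed and the developer is alerted, which is exactly the fast feedback CI is designed to provide.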

2. Continuous Delivery (CD):

 Definition: Continuous Delivery builds on CI by automating the release of code
changes to a staging environment for further testing. The process ensures that the
codebase is always in a deployable state and can be pushed to production at any time.
 Purpose: The goal is to ensure that the latest version of the code is always ready for
release. However, the deployment to production is still a manual step.
 Process:
1. After passing automated tests in CI, code changes are automatically deployed
to a staging or pre-production environment.
2. Additional testing or approval is done in this environment.
3. When ready, the code is manually released to production.

3. Continuous Deployment (CD):


 Definition: Continuous Deployment is the next step after Continuous Delivery. In
this case, every code change that passes automated tests is automatically released to
the production environment without any manual intervention.
 Purpose: The main goal of Continuous Deployment is to eliminate manual steps in
the release process, allowing developers to deliver updates quickly and frequently to
users.
 Process:
1. Like Continuous Delivery, code changes are automatically deployed to staging.
2. Once approved, the code is automatically released to production.

Key Concepts in CI/CD:

1. Version Control:
o All code changes are committed to a shared version control system, such as
Git. This allows teams to track changes, collaborate, and revert to earlier
versions if necessary.
2. Automated Build:
o Every time code is integrated, the system automatically builds the software,
ensuring the code compiles correctly and dependencies are properly managed.
3. Automated Testing:
o Automated testing is crucial in CI/CD to catch bugs early. Unit tests,
integration tests, and acceptance tests are run automatically after each
commit.
4. Staging Environment:
o In Continuous Delivery, after passing initial tests, the code is deployed to a
staging environment. This environment mirrors production but allows for
final checks and approvals before the code is released to users.
5. Deployment Automation:
o The CD pipeline automates deployment to staging and production
environments, reducing human error and ensuring consistency.
6. Monitoring and Alerts:
o Once deployed to production, the system is monitored for performance issues,
bugs, or failures. If any issues arise, alerts are sent to the team to fix them
promptly.

Advantages of CI/CD:

1. Faster Time to Market:
o Automated build and testing processes enable teams to release updates
more frequently, resulting in faster delivery of new features and bug fixes.
2. Reduced Risk:
o Automated testing at every stage ensures that bugs are caught early, reducing
the chances of major issues in production.
3. Improved Quality:
o Continuous testing, combined with early feedback, helps improve the
overall quality of the codebase.
4. Greater Collaboration:
o By integrating code frequently, teams work more collaboratively,
avoiding integration issues that arise when code diverges for extended
periods.
5. Automated Deployment:
o Reduces the time and effort required for deployments, minimizing
human intervention and errors.

3.3 ETL Pipeline

An ETL Pipeline (Extract, Transform, Load) is a process used in data engineering to extract data
from various sources, transform it into a usable format, and load it into a target destination, such
as a database, data warehouse, or data lake. ETL pipelines are fundamental for preparing data for
analytics, reporting, and machine learning applications.
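
A minimal ETL sketch in Python, assuming pandas is installed; the file, table, and column names are illustrative assumptions, not a prescribed implementation:

# Extract from a CSV file, transform with pandas, load into SQLite.
import sqlite3

import pandas as pd

# Extract: read raw records from a source file (assumed to exist).
raw = pd.read_csv("sales_raw.csv")

# Transform: clean and reshape the data into a usable format.
clean = raw.dropna(subset=["order_id", "amount"]).copy()
clean["amount"] = clean["amount"].astype(float)
clean["order_date"] = pd.to_datetime(clean["order_date"])

# Load: write the transformed data into the target database.
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("sales", conn, if_exists="replace", index=False)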
3.3.1 ETL Pipeline vs CI/CD Pipeline

Feature | ETL Pipeline | CI/CD Pipeline
Purpose | Data extraction, transformation, and loading for analytics. | Automating software build, test, and deployment processes.
Focus | Data processing and integration. | Software development and delivery.
Stages | Extract, Transform, Load. | Build, Test, Deploy.
Tools | Data integration tools (e.g., Apache NiFi, Talend, AWS Glue). | CI/CD tools (e.g., Jenkins, GitHub Actions, Azure Pipelines).
Data vs. Code | Primarily focuses on data. | Primarily focuses on application code.
Frequency | Often run on a scheduled basis (batch processing). | Continuous and often triggered by code changes.
Use Cases | Data integration for reporting and analytics, data warehousing. | Automating software delivery, reducing deployment times, enhancing collaboration.
Outputs | Transformed data loaded into data storage systems (e.g., data warehouses). | Deployed applications or services in production or staging environments.
Testing | Focuses on data validation and quality checks during transformation. | Focuses on unit, integration, and acceptance testing of code.
Infrastructure | Typically involves data warehouses, databases, and ETL tools. | Involves application servers, cloud services, and CI/CD tools.

3.3.2 Stages of MLOps pipeline

An MLOps pipeline is the sequence of steps and tools used to develop, deploy, monitor, and
maintain machine learning models in production. The stages of an MLOps pipeline include:
1. Code and Data Versioning:
 Version control for code and data.
 Collaboration and traceability.
2. Data Preprocessing:
 Cleaning, transforming, and feature engineering.
 Ensure data quality.
3. Model Training:
 Developing and training machine learning models.
 Hyperparameter tuning and cross-validation.
4. Model Evaluation:
 Assessing model performance using metrics.
 Validation data separation.
5. Model Deployment:
 Containerization, orchestration, and API endpoints.
 Automating deployment via CI/CD.
6. Model Monitoring:
 Continuous tracking of model performance.
 Alerts for anomalies and drift.
7. Feedback and Iteration:
 Incorporate user feedback into model updates.
 Iterate on models for improvement.
An MLOps pipeline ensures a systematic and automated approach to managing machine learning
models throughout their lifecycle.
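
A minimal sketch of the Model Training and Model Evaluation stages, assuming scikit-learn and joblib are installed; the synthetic dataset and file name are illustrative:

# Train, evaluate, and persist a toy model (assumes scikit-learn).
from joblib import dump
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for the output of the data preprocessing stage.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model Training stage.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Model Evaluation stage, on held-out validation data.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"validation accuracy: {accuracy:.3f}")

# Persist the trained artifact so the deployment stage can pick it up.
dump(model, "model.joblib")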

3.3.3 MLOps vs DevOps

MLOps applies DevOps principles to machine learning systems but extends them, because ML systems depend on
data and trained models in addition to code:

Aspect | DevOps | MLOps
Artifacts | Application code. | Code, data, and trained models.
Testing | Unit, integration, and acceptance tests. | Code tests plus data validation and model evaluation.
Versioning | Source code versioning (e.g., Git). | Versioning of code, datasets, and model artifacts.
Deployment | Deploying application builds. | Deploying trained models (e.g., as APIs or batch scoring jobs).
Monitoring | Application health and performance. | Application health plus model performance, data drift, and concept drift.
Retraining | Not applicable. | Models are retrained as data and user behavior change.

3.4 Fundamentals of Jenkins

Jenkins is an open-source automation server used to streamline and automate various tasks in
software development. It is widely known for enabling Continuous Integration (CI) and
Continuous Delivery/Deployment (CD), helping teams integrate code changes, test them, and
deploy applications efficiently. Jenkins is highly extensible with hundreds of plugins to support
building, deploying, and automating any project.

Key Features of Jenkins

1. Open-Source: Jenkins is free and has an active community contributing to its continuous
development.
2. Platform Independent: Runs on Windows, macOS, and Linux, and supports all major
development environments.
3. Extensibility: Over 1,800 plugins available for integration with various tools and platforms (e.g.,
Docker, Kubernetes, Git, Maven).
4. Distributed Builds: Supports a master-slave architecture for distributed builds, enabling parallel
execution.
5. Pipeline Support: Facilitates complex workflows through pipelines defined in code.
6. Integration: Supports popular tools like Git, JIRA, Docker, Selenium, and Kubernetes.

3.4.1 Jenkins Pipeline

Jenkins Pipeline is a suite of plugins that support implementing and integrating continuous
delivery pipelines into Jenkins. It allows you to define the entire build, test, and deployment
process of your applications as code. This approach provides greater flexibility, reusability, and
maintainability compared to traditional Jenkins jobs, which were typically defined through the
UI.

Key Concepts

1. Pipeline: The entire process defined in a Jenkinsfile that describes how the software will
be built, tested, and deployed.
2. Jenkinsfile: A text file that contains the definition of a Jenkins Pipeline and is stored in
the version control system along with the application code.
3. Stages: Logical segments within a pipeline that define different parts of the build process,
such as "Build," "Test," and "Deploy."
4. Steps: Individual tasks that are executed within a stage, such as running a shell
command, invoking another job, or sending notifications.
5. Declarative and Scripted Pipelines: Two types of syntax used to define pipelines:
o Declarative Pipelines: A more structured and easier-to-read syntax.
Recommended for most use cases.
o Scripted Pipelines: A more flexible and powerful syntax, using Groovy, allowing
for complex logic and conditions.

Example of a Simple Declarative Jenkins Pipeline

Here’s a basic example of a Jenkinsfile using the declarative syntax:

pipeline {
    agent any // Run on any available agent

    stages {
        stage('Build') {
            steps {
                echo 'Building...'
                sh 'make' // Run the build command
            }
        }

        stage('Test') {
            steps {
                echo 'Testing...'
                sh 'make test' // Run the tests
            }
        }

        stage('Deploy') {
            steps {
                echo 'Deploying...'
                sh './deploy.sh' // Run the deployment script
            }
        }
    }

    post {
        success {
            echo 'Pipeline completed successfully!'
        }
        failure {
            echo 'Pipeline failed!'
        }
    }
}

A Jenkins Pipeline is a set of steps to define the workflow of a CI/CD process in code. It
consists of:

1. Declarative Pipeline:
o Easier to use and recommended for beginners.
o Example:

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                echo 'Building...'
                sh 'make build'
            }
        }
        stage('Test') {
            steps {
                echo 'Testing...'
                sh 'make test'
            }
        }
        stage('Deploy') {
            steps {
                echo 'Deploying...'
                sh 'make deploy'
            }
        }
    }
}

2. Scripted Pipeline:
o More flexible and powerful.
o Written in Groovy scripting language.
o Example:

node {
    stage('Build') {
        echo 'Building...'
        sh 'make build'
    }
    stage('Test') {
        echo 'Testing...'
        sh 'make test'
    }
    stage('Deploy') {
        echo 'Deploying...'
        sh 'make deploy'
    }
}

Key Components of Jenkins

o Jobs: Tasks configured to automate processes like building, testing, or deploying.
o Nodes: Jenkins master and agents (slaves) used for distributed builds.
o Plugins: Extend Jenkins' functionality (e.g., Git plugin, Docker plugin, Kubernetes plugin).
o Pipeline: Automates complex workflows with code.

Advantages of Jenkins

1. Automation: Simplifies repetitive tasks like testing and deployment.
2. Extensibility: Wide range of plugins for seamless integration.
3. Ease of Use: User-friendly interface and dashboards.
4. Scalability: Distributed builds for large-scale projects.
5. Community Support: Extensive documentation and active community.

Challenges of Jenkins

1. Complexity: Managing large pipelines with many plugins can be challenging.
2. Performance: High resource usage when handling multiple jobs.
3. Maintenance: Requires regular updates to avoid plugin/version conflicts.

3.5 Different Model Packaging and Deployment Types

1. Containerization:
 Packaging models in containers (e.g., Docker) for consistent deployment.
 Portable and works across various environments (a minimal serving sketch follows this list).
2. Serverless:
 Deploying models as functions on serverless platforms (e.g., AWS Lambda).
 Automatically scales based on demand.
3. On-Premises Servers:
 Deploying models on in-house servers for control and security.
 Requires infrastructure management.
4. Cloud Services:
 Hosting models on cloud platforms (e.g., AWS SageMaker, Azure ML).
 Managed services for easy deployment and scaling.
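
As a concrete illustration of option 1, here is a minimal model-serving sketch, assuming Flask and joblib are installed and that a trained model was saved as model.joblib (as in the training sketch in 3.3.2); the route and field names are illustrative. Packaging this app and its dependencies into a Docker image is what gives the same behavior in every environment, which is the portability point above.

# Minimal model-serving app (assumes Flask, joblib, and a trained
# model saved at model.joblib; route and field names are illustrative).
from flask import Flask, jsonify, request
from joblib import load

app = Flask(__name__)
model = load("model.joblib")  # trained artifact from the pipeline

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body such as {"features": [[0.1, 0.2, ...]]}.
    payload = request.get_json()
    prediction = model.predict(payload["features"]).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    # Bind to all interfaces so the app is reachable from a container.
    app.run(host="0.0.0.0", port=8080)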

Platforms for Hosting Models:

1. Amazon SageMaker:
 Provides end-to-end ML model development and hosting on AWS.
2. Azure Machine Learning:
 Azure's platform for model development and deployment.
3. Google AI Platform:
 Google Cloud's ML platform for model hosting and management.
4. Kubernetes:
 Container orchestration platform for custom model deployment.
5. TensorFlow Serving:
 A framework for serving TensorFlow models.
6. Apache Spark:
 Used for distributed model deployment in big data environments.
Choose the packaging and deployment type and platform that best fit your project's
requirements and infrastructure.

3.6 Batch processing vs Stream processing with respect to MLOps

Batch processing and stream processing are two fundamental approaches for handling data in the
context of MLOps (Machine Learning Operations). Each has its strengths and weaknesses
depending on the use case and requirements of machine learning workflows. Here’s a
comparison of the two approaches:

Feature | Batch Processing | Stream Processing
Definition | Processes large volumes of data at once in fixed intervals (batches). | Processes data in real time as it arrives (streams).
Data Latency | Higher latency; data is processed after accumulation. | Low latency; data is processed immediately upon arrival.
Use Cases | Suitable for scenarios where immediate insights are not critical, such as training models on historical data or running large-scale data analysis. | Ideal for real-time applications, such as monitoring, anomaly detection, or making predictions based on live data streams.
Complexity | Typically simpler to implement; involves scheduled jobs and predefined data sets. | More complex due to the need for real-time data handling, often involving event-driven architectures.
Resource Utilization | Can optimize resource usage by processing data in bulk. | May require more resources to handle continuous data input and processing.
Data Volume | Handles large volumes of data efficiently in a single run. | Deals with potentially infinite data streams, focusing on processing small chunks of data in real time.
Model Training | Often used for offline model training, retraining models periodically with accumulated data. | Enables online learning or continuous retraining based on the latest data.
Frameworks and Tools | Apache Hadoop, Apache Spark (batch mode), and traditional ETL tools. | Apache Kafka, Apache Flink, Apache Storm, and stream-processing frameworks.
Feedback Loop | Slower feedback loop; insights are available only after batch processing completes. | Faster feedback loop; immediate insights can lead to real-time adjustments in models or operations.
Scalability | Scales well with larger datasets, but latency increases with larger batch sizes. | Can scale horizontally to handle increased data rates without significantly increasing latency.
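
A minimal sketch of the difference from a model-scoring point of view; the toy "model", the records, and the simulated stream are illustrative assumptions, not from the chapter:

# Contrast batch scoring (all at once) with stream scoring (per event).
import time
from typing import Iterable, Iterator

def score(record: float) -> float:
    """Stand-in for model.predict on a single record."""
    return record * 2.0

def batch_score(records: list) -> list:
    # Batch: records have already accumulated; score them in one run.
    return [score(r) for r in records]

def stream_score(source: Iterable) -> Iterator:
    # Stream: score each record the moment it arrives.
    for record in source:
        yield score(record)

def live_feed() -> Iterator:
    """Simulate records arriving over time."""
    for value in [0.5, 1.0, 1.5]:
        time.sleep(0.1)  # pretend to wait for the next event
        yield value

if __name__ == "__main__":
    print("batch:", batch_score([0.5, 1.0, 1.5]))  # one result set at the end
    for prediction in stream_score(live_feed()):
        print("stream:", prediction)               # one result per event

The batch path produces all predictions only after the data has accumulated, while the stream path emits each prediction immediately, which is the latency difference summarized in the table above.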

Some Important Questions

 Explain the concept of DevOps.
 Explain the DevOps lifecycle in detail.
 What is the difference between MLOps and DevOps?
 Describe the fundamentals of CI/CD.
 What is a CI/CD pipeline? Explain the various stages involved in the MLOps pipeline.
 Explain the different model packaging and deployment types and the platforms for hosting models.
 Compare the ETL pipeline and the CI/CD pipeline.
 Explain the fundamentals of Jenkins.
 Write a short note on Jenkins Pipeline.
 Compare batch processing and stream processing with respect to MLOps.
