Docker
Steps to Run
Let us assume we don't have MySQL installed and we decide to explore MySQL with Docker.
Step 1: We can pull a Docker image from Docker Hub with the pull command; here we pull the mysql image from Docker Hub for the first time.
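As a minimal sketch (assuming the official mysql image on Docker Hub), the pull looks like this:
docker pull mysql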
Step 2: Once we issue a pull command, Docker checks for a local copy of the image. If one is available, it compares the hash value of the image with that from Docker Hub to check for any updates. If there are updates, it pulls the most recent image from Docker Hub.
Here we are running a container named c1 with the MySQL password my-secret-pwd (-e is used to set the password environment variable), built from the mysql image and running in the background (using -d).
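A hedged reconstruction of that run command (MYSQL_ROOT_PASSWORD is the environment variable the official mysql image expects; the container name c1 and the password are from the text):
docker run --name c1 -e MYSQL_ROOT_PASSWORD=my-secret-pwd -d mysql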
Here we go inside the container c1 with exec -it and run the command bash. On executing the command we interact with the container and go inside it; we can see the command prompt change from home/somanath to <container-ID>. Then we use mysql -u root with the root password we gave earlier, and we come out of the container using exit.
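Roughly, the sequence of commands is (a sketch based on the steps above):
docker exec -it c1 bash
mysql -u root -p    # enter the root password set earlier
exit                # leave mysql, then exit again to leave the container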
Difference between Container and Image
Now if we come out of the container, shut down the machine, and recreate a container, all the data will be lost. To check this, note that Docker is based on a layer concept. Say, for example, we have a mysql image; when we run it we create a container c1, and we create a database inside the container. We now have the original image and, on top of it, a layer of changes, as shown below.
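One possible way to see this for ourselves (a sketch; c1 and the password are from the text, c2 is a hypothetical second container):
docker stop c1
docker rm c1
docker run --name c2 -e MYSQL_ROOT_PASSWORD=my-secret-pwd -d mysql
# any database created inside c1 will not exist inside c2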
What if we want to persist the data for future use? Say we have done a lot of analysis on a Docker mysql container and have created a test DB; if the container is lost during a shutdown, all the data will be lost. To avoid this we mount a volume, so that the data changes made inside the container are mapped to a physical location on my machine.
We can use the --mount flag to attach a volume on the local system to a volume in the container.
To find out the location where we should attach the volume, we have to check for the VOLUME keyword in the image's Dockerfile.
The mysql Dockerfile can be found in the docker-library/mysql repository on github.com.
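The run command referred to below is not reproduced here; a plausible reconstruction, assuming the volume name mysqldata and the /var/lib/mysql target declared in the mysql Dockerfile, is:
docker run --name c1 -e MYSQL_ROOT_PASSWORD=my-secret-pwd -d \
  --mount source=mysqldata,target=/var/lib/mysql \
  mysql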
The above command creates a volume named mysqldata, so we will log in to the container and create a database named test.
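For example (a sketch; the database name test is from the text):
docker exec -it c1 mysql -u root -p
# then, at the mysql prompt:
CREATE DATABASE test;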
Now we can see the test db under
/var/lib/docker/volumes
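On a typical Linux host, the named volume's data can be inspected like this (the _data layout is Docker's default and is shown here as an assumption):
sudo ls /var/lib/docker/volumes/mysqldata/_data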
Now we have a persistent volume, but the only issue is that I have to go inside the Docker container every time to execute queries, which will be very difficult in the long run.
Sometimes we may use Workbench, so we might need to connect it to the MySQL instance inside the Docker container. For that we have to forward the connection from the port inside Docker (3306) to some available port on localhost (33360, for example).
So the last step is, instead of only running mysql inside Docker, to expose the Docker port to the IP of the running machine using port forwarding.
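Putting it all together, a hedged sketch of the final run command (ports 3306 and 33360 are from the text):
docker run --name c1 -e MYSQL_ROOT_PASSWORD=my-secret-pwd -d \
  -p 33360:3306 \
  --mount source=mysqldata,target=/var/lib/mysql \
  mysql
# Workbench (or any other client) can now connect to localhost:33360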
Conclusion
Once we install Docker, we use volume mounts to store persistent data, and finally we expose the container to the outside world using port forwarding, which allows you to access MySQL running on port 3306 inside Docker at localhost:33360 (the port is forwarded from 3306 to 33360).
A Dockerfile is the blueprint or config file from which Docker builds an image.
Let us see a simple use case where we will run a small server using Python.
The Docker daemon uses an abstraction called the build context, which basically ships all the content of the directory where the Dockerfile is present; it then executes the instructions one after another and builds layer after layer.
As Docker builds layers one after another, we should write the Dockerfile by stacking the layers that rarely change first and the frequently changing layers last. For example, for a Python app we pull the base image and install the required Python modules as the first two steps, since they won't change, and we add the application code as the last step. Docker uses a build cache, so unless a layer changes it will not be rebuilt and the existing layer is reused.
As shown above, if we add the code with the COPY command as the second step, Docker has to rebuild everything after it every time, since the cache is invalidated; whereas if we copy the files from the local system as the last step, only that last step is rebuilt. That is why the same build took only 10 s the second time, compared to 35 s the first time.
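A minimal sketch of this ordering for a Python app (the image tag, paths and file names are assumptions, not the article's exact Dockerfile):
FROM python:3.8-slim
# rarely changes: install the required modules first so the layer stays cached
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# changes often: add the application code as the last layer
COPY . /app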
The more commands in a Dockerfile, the more layers; the more layers, the larger the image; and the larger the image, the more GC and memory issues arise while deploying images in prod.
So it is common to chain Unix commands into a single layer while writing a Dockerfile, as shown below.
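For example, chaining three commands into a single RUN produces one layer instead of three (the package names are illustrative):
RUN apt-get update && \
    apt-get install -y curl wget unzip && \
    rm -rf /var/lib/apt/lists/*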
Let us create a simple HTTP server and look at the common commands used with Docker below.
Step 2: We create a directory where we will keep the HTML files (RUN can be used to run any command).
Step 3: We declare it as a volume so that we can use a mount to change the data served by the app (this will be used in docker run).
Step 4: Since the http.server module will run on port 8000, we expose this port so that we can use the -p flag to connect from outside Docker (this will be used in docker run).
Step 5: We declare a WORKDIR so that the app starts from this directory. A sketch of such a Dockerfile follows these steps.
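A sketch of such a Dockerfile, assuming a python base image and a /data directory for the HTML files (both names are assumptions; the final CMD starts the http.server module mentioned in Step 4):
FROM python:3.8-slim
# Step 2: directory that will hold the html files
RUN mkdir /data
# Step 3: declare it as a volume so it can be mounted at run time
VOLUME /data
# Step 4: http.server listens on 8000; expose it for use with -p
EXPOSE 8000
# Step 5: the app starts from this directory
WORKDIR /data
# serve the contents of the working directory over HTTP
CMD ["python", "-m", "http.server", "8000"]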
The below command builds an image named pythonapp using the Dockerfile in the current working directory.
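The command is along these lines (pythonapp is the image name used in the text):
docker build -t pythonapp .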
Conclusion
In this part 2 of the Docker series, we successfully built an image from a Dockerfile with a volume and an exposed port to run an HTTP server; once the image was built, we mounted a volume, exposed port 4040, and successfully curled the data.
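For reference, a hedged sketch of that run-and-test sequence (the container name webapp and the local html directory are assumptions; ports 4040 and 8000 are from the text):
docker run -d --name webapp \
  -p 4040:8000 \
  --mount type=bind,source="$(pwd)"/html,target=/data \
  pythonapp
curl http://localhost:4040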
==================-======================================================
Containers have a long history that dates back to the '60s. Over time, this technology has advanced a great deal and has become one of the most useful tools in the software industry. Today, Docker has become synonymous with containers.
In one of our previous articles, we discussed how Docker is helping in the Machine Learning
space. Today, we will implement one of the many use cases of Docker in the development of ML
applications.
Introduction To Docker
Installing & Setting Up Docker
Getting Started With Docker
TensorFlow 2.0 Container
o Downloading Tensorflow 2.0-Docker
o Firing Up Container
o Accessing The Jupyter Notebook
o Sharing Files
o Installing Missing Dependencies
o Committing Changes & Saving The container Instance
o Running Container From The New Image
Introduction To Docker
Docker is a very popular and widely-used container technology. Docker has an entire ecosystem
for managing containers which includes a repository of images, container registries and
command-line interfaces, among others. Docker also comes with cluster management for
containers which allows multiple containers to be managed collectively in a distributed
environment.
Head to https://hub.docker.com/ and sign up with a Docker ID. Once you are in, you will see the
following page.
Once the file is downloaded, open it to install Docker Desktop. Follow the standard procedure
for installation based on your operating system and preferences.
You can click on the icon to set your Docker preferences and to update it.
If you see a green dot which says Docker Desktop is running we are all set to fire up containers.
Also, execute the following command in the terminal or command prompt to ensure that
everything is perfect:
docker --version
Output:
Docker version 19.03.4, build 9013bf5
Before we begin, there are a few basic things that you need to know.
Containers: Containers are the running instances of a docker image. We use an image to fire up
multiple containers.
docker pull <repository>:<tag>
The above command downloads the specified version of a docker image from the specified repository.
docker images
The above command will return a table of images in your local (local machine) repository.
docker run
The above command creates and starts a container from a given image.
docker ps
The above command will return a table of all the running docker containers.
docker ps -a -q
The above command will display all the containers, both running and inactive; the -q flag prints only their container IDs.
docker rmi <image_id/name>
The above command can be used to delete a docker image from the local repository.
docker stop <container_id/name>
The above command stops a running Docker container. To delete or remove a container, use docker rm <container_id/name>; the -f flag force removes the container if it's running. Like images, containers also have IDs and names.
We will be using the above commands a lot when dealing with Docker containers. We will also
learn some additional commands in the following sections.
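The pull command itself appears in the article as a screenshot; a plausible form, assuming the nightly-py3-jupyter tag referenced later in this series, is:
docker pull tensorflow/tensorflow:nightly-py3-jupyter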
Once all the downloading and extracting is complete, type the docker images command to list the Docker images on your machine.
We can now list the running containers in the system using docker ps command.
To stop the container, use docker stop <container_id>. The container ID will be returned by the docker ps command.
On successful execution of the run command, the Jupyter Notebook will be served on port 1234
of the localhost.
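The run command appears in the article as a screenshot; a plausible reconstruction, assuming the image tag above and the local directory mentioned below, is:
docker run -it -p 1234:8888 -v /Users/aim/Documents/Docker:/tf tensorflow/tensorflow:nightly-py3-jupyter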
Copy the token from the logs and use it to log in to Jupyter Notebook.
Once logged in you will see an empty directory which is pointing to the /tf/ directory in the
container. This directory is mapped to the Documents/Docker directory of the local machine.
Sharing Files
While running the container, we mounted a local volume to the container that maps to the /tf/
directory within the container.
To share any files with the container, simply copy the required files into the local folder that was
mounted to the container. In this case copy the file to /Users/aim/Documents/Docker to access it
in the Jupyter Notebook.
Once you copy and refresh the notebook, you will find your files there.
Installing Missing Dependencies
Find an example notebook below. In the following notebook we will try to predict the cost of
used cars from MachineHack’s Predicting The Costs Of Used Cars – Hackathon. Sign up to
download the datasets for free.
Download the above notebook along with the datasets and copy them into your mounted directory (/Users/aim/Documents/Docker in my case).
Now let’s start from where we left off with our Jupyter Notebook running on docker.
Open the notebook and try to import some of the necessary modules.
import tensorflow as tf
print(tf.__version__)
import numpy as np
import pandas as pd
Output:
You will find that most of the modules are missing. Now let’s fix this.
There are two ways to fix this. We can either use pip install from the Jupyter Notebook and commit the changes in the container, or we can go inside the container, install all the missing dependencies, and commit the changes.
Note:
Since we have used -it flag we will not be able to use the existing terminal /command prompt
window. Open a new terminal for the following process.
Get the container id using docker ps and use the following command to enter inside the running
container.
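The command is not reproduced in the text; a typical way to enter a running container is:
docker exec -it <container_id> bash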
So let’s install all those necessary modules that we need. For this example, I will install 4
modules that I found missing.
Inside the container do pip install for all the missing libraries:
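The exact list of four modules is not reproduced here; based on the imports shown above, the installs would look something like this (extend it with whatever else turns out to be missing):
pip install numpy pandas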
Note:
The easiest way to do it is to list all the missing modules in a requirements.txt file on your local machine, copy it into the shared directory of the container, and run pip install -r requirements.txt. You can also use the pip freeze > requirements.txt command to export the installed modules from your local environment into a requirements.txt file.
Now go back to your Jupyter Notebook and try importing all those modules again.
Hooray! No more missing modules error!
Now that we have our development environment ready with all dependencies, let's save it so that we don't have to install all of them again.
Use the following command to commit the changes made to the container and save it as a new
image/version.
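The command appears in the article as a screenshot; its general form is (the new image name and tag are placeholders):
docker commit <container_id> <new_image_name>:<tag>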
Now that we have a new image with all the dependencies installed, we can remove the downloaded image and use the new one instead.
Note:
To remove an image you must first kill all the running containers of that image. Use docker stop
command to stop and docker rm command to remove containers.
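For example (a sketch; the image tag is the one assumed earlier):
docker stop <container_id>
docker rm <container_id>
docker rmi tensorflow/tensorflow:nightly-py3-jupyter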
To fire up containers from the new image, use the following command:
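A hedged sketch of that command, reusing the earlier port mapping and mount, with the committed image name as a placeholder:
docker run -it -p 1234:8888 -v /Users/aim/Documents/Docker:/tf <new_image_name>:<tag>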
Note:
You can create as many containers as your machine would permit. In the above example, you
can run multiple Jupyter notebooks by mapping different local machine ports to different
containers.
Great! You can now set up different development environments for each of your projects!
======================---------------------------------
In one of our previous articles, we learned how to set up Docker and use the latest Tensorflow
2.0 image to create a development environment for data science projects.
Docker is the most trusted and established tool for containerization, and so it comes with a lot of pre-built images for all the major frameworks, software and tools used in the industry. Docker Hub, which is Docker's repository for images, contains official images for popular tools used by Data Scientists across the world, and the tensorflow:nightly-py3-jupyter image which we used last time is one of them.
In this article, we will take a conventional approach to set up docker containers from scratch.
Although this approach is not as straightforward as downloading an image and running it, it
gives us flexibility in terms of creating a custom environment for any project.
To follow along, you must have a basic understanding of Docker and must have it installed on
your local machine.
Containers: Containers are the running instances of a docker image. We use an image to fire up
multiple containers.
Dockerfile: A Dockerfile is a simple text file that defines the environment to be built. It is used
to build a docker image and contains all the commands a user could call on the command line to
assemble an image.
Let’s look at the simplest dockerfile and try to understand how it works.
# The base image - the container is built on top of this image
# reference: https://hub.docker.com/_/ubuntu/
FROM ubuntu:18.04
LABEL version="1.0"
# Set the environment language parameter for the Linux environment
# (the exact value is an assumption; the original line was not reproduced)
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
# Create an empty directory in the root folder of the image
# (the directory name is an assumption)
RUN mkdir /volume
# Install and update packages
RUN apt-get update && apt-get install -y \
    wget \
    bzip2 \
    ca-certificates \
    build-essential \
    curl \
    git-core \
    htop \
    pkg-config \
    unzip \
    unrar \
    tree \
    freetds-dev
# Clean up
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
# Install Python 3.6 and pip
RUN apt-get update && apt-get install -y \
    python3.6 \
    python3-pip
Reading through the above file, you will notice that everything inside is essentially shell script. If you are a Linux person, writing a dockerfile is a piece of cake. Even otherwise, it's not much of an effort. So let's try to understand the above script.
While the major portion of a dockerfile is occupied by Linux commands, there are some Docker-specific commands that tell the docker engine how to create an image. For example, the commands in uppercase letters such as FROM, RUN and LABEL are Docker-specific commands.
FROM: Initializes a new build stage and sets the Base Image for subsequent instructions. The
specified image forms the base layer of the container.
RUN: Executes the command that follows within the environment and commits the changes.
LABEL: Adds metadata to an image. It is often used for versioning images.
Initializes a new image with the specified base image ubuntu:18.04 with the FROM command.
Adds versioning to the image with the LABEL command.
Sets the environment language parameter for the Linux environment.
Creates an empty directory (here /volume) in the root folder of the new image/container.
Installs and updates packages for the new image.
Runs the cleanup command to clean up packages.
Installs Python 3.6 and pip inside the new image.
So far we have just written instructions to build a docker image. The docker engine will use these
instructions to build a docker image. Let’s do that.
Building Image
Within the same directory where the dockerfile resides, execute the following command to build an image.
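The command appears in the article as a screenshot; its general form is (the image name is a placeholder):
docker build -t <image_name> .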
The build command builds a docker image using the instructions specified in the dockerfile. The -t flag allows us to specify a tag or name for the new image.
Output:
docker images
To fire up a container from the newly created image use the docker run command as follows:
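A plausible form of that command (the image name is the one built above):
docker run -it <image_name> /bin/bash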
The -it flag runs the container in interactive mode. This allows us to enter directly into the
container as it is fired up.
The above command runs the container and enters the root directory of the container.
We now have a separate Linux instance running with all the specific dependencies of the
dockerfile.
Thus we can have our own custom environment depending on the project requirements. We can
also use the newly created image as the base image for creating new images.
===========================-====================================
Introduction
My first encounter with Docker was not to solve a Data Science problem, but to install MySQL.
Yes, to install MySQL! Quite an anti-climactic start, right? At times, you stumble upon jewels
while going through StackOverflow and Docker was one of them. What started with a one-off
use case, ended up becoming a useful tool in my daily workflow.
I got a taste of Docker when I tried to install TensorFlow on my system. Just to give you the context, TensorFlow is a deep learning library which requires a series of setup steps, and installing the Nvidia graphics drivers in particular is extremely complex. I literally had to reinstall my operating system countless times. That loop stopped only when I shifted to Docker, thankfully!
Docker provides you with an easy way to share your working environments including libraries
and drivers. This enables us to create reproducible data science workflows.
This article aims to provide the perfect starting point to nudge you to use Docker for your Data
Science workflows! I will cover both the useful aspects of Docker – namely, setting up your
system without installing the tools and creating your own data science environment.
Table of Contents
1. What is Docker?
2. Use Cases for Data Science
3. Docker terminology
4. Docker Hello-World
5. Data Science tools without installation
6. Your first Docker Image
7. Docker eco-system
1. What is Docker?
Docker is a software technology providing containers, promoted by the company Docker, Inc.
Docker provides an additional layer of abstraction and automation of operating-system-level
virtualization on Windows and Linux.
The underlying concept that Docker promotes is the usage of containers, which are essentially "boxes" of self-contained software. Containers existed before Docker and were quite successful, but 2015 saw huge adoption of containerization by the software community to solve day-to-day issues.
2. Use Cases for Data Science
When you walk into a cubicle of Data Science folks, they are either doing data processing or struggling to set something up on their workstations/laptops. Okay, that might be an exaggeration, but you get the sense of helplessness. To give a small example, there are more than 30 unique ways for someone to set up a Caffe environment. And trust me, you'll end up creating a new blog post just to show all the steps!
You get the idea. Anaconda distribution has made virtual environments and replicating
environments using a standardized method a reality…yet things do get muddled and sometimes
we miss the bullet points in the README file, carefully created to replicate those.
In my last article, we looked at how wrapping an ML model in an API helps make it available to your consumers. This is just one part of it. For small teams with no independent DevOps team to take care of deployments, Docker and the ecosystem around it (docker-compose, docker-machine) help to ease the problems at a small scale.
The sales folks need to present an RShiny application but don't want to run the code? Docker can help you with that!
3. Docker Terminology
I've been going on about containers and containerization in the previous sections. Let's understand the Docker terminology first.
A Dockerfile can be considered as an automated setup file. This small file helps to create/modify
Docker images. All the talk makes no sense, until there’s some proof. Let’s dive in and fire up
your terminals.
4. Docker: Hello-World
To install Docker, below are the links for the major operating systems:
o Linux
o Mac OS
o Windows
After installation, to test if Docker has been successfully installed run:
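The check itself appears in the article as a screenshot; a common way to verify the installation (the output will vary with your install) is:
docker --version
# e.g. Docker version 19.03.4, build 9013bf5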
The above output means that the Docker CLI is ready. The next step is to download an image; so how do we get a Docker image? Docker has a repository for that, similar to a GitHub repo, called Dockerhub. Visit Dockerhub to know more.
After you have logged in, you would see your dashboard (which would be empty at first). Do a
quick-search using the Search button and type in: hello-world. (Below is my dashboard)
Searching hello-world would give you the results below:
Click on the first result, which also happens to be the official image (created by the good folks at Docker; try to always use the official images if there's a choice, or create your own).
The command: docker pull hello-world is what you need to run on your terminal. That’s
how you download images to your local system.
o To know which images are already present, run: docker images
You go to Dockerhub and search for the official Docker image for TensorFlow. All you need
to run on your terminal is: docker pull tensorflow/tensorflow
As discussed above (in the Docker Terminology section), the tensorflow docker image is also a layered object made up of intermediate image layers. Once all the intermediate layers are downloaded, run docker images to check whether our docker pull was successful.
To run the image, run the command: docker run -it -p 8888:8888 tensorflow/tensorflow
[NOTE: At the time of writing, port 8888 was already used up so running it on 8889. You can
run it on any port though *shrugs*]
Now, the above docker run command packs in a few more command-line arguments. A few that you need to know better are as follows:
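An annotated version of the run command used here (port 8889 is from the note above; the flag descriptions are standard Docker behaviour):
# -it : run interactively, attaching a terminal to the container
# -p 8889:8888 : forward port 8888 inside the container to port 8889 on the host
# -d : alternative to -it, runs the container detached in the background
docker run -it -p 8889:8888 tensorflow/tensorflow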
Now since a docker container is created, you can visit: http://localhost:8889 where you can try
out tensorflow.
Wasn't that easy? Now, as an exercise, replace -it in the docker run command with -d. See whether you can get the TensorFlow Jupyter environment again or not.