Docker for Data Scientists

This content originally appeared on Level Up Coding - Medium and was authored by Vasukumar P

A Beginner's Guide to Docker for Data Science Practitioners.

Learn the fundamental concepts of Docker hands-on.

What is Docker?

Docker is an open-source platform designed to automate the deployment, scaling, and management of applications using containerization technology. It allows developers to package applications and their dependencies into a standardized unit called a container, which can run consistently across different computing environments.

Key Features of Docker

Containerization: Docker uses containers to encapsulate an application and its environment. This ensures that the application runs the same way, regardless of where it is deployed (on a developer’s machine, a testing environment, or in production).
Isolation: Each container runs in its own isolated environment, which means that multiple containers can run on the same host without interfering with each other. This isolation helps to manage dependencies and minimize conflicts.
Portability: Since containers include all necessary dependencies and configurations, they can be easily moved between different environments (e.g., from development to production) without compatibility issues.
Efficiency: Containers are lightweight and share the host OS kernel, which makes them more efficient in terms of resource utilization compared to traditional virtual machines (VMs). This allows for faster startup times and reduced overhead.
Scalability: Docker makes it easy to scale applications up or down by adding or removing containers as needed. This is particularly useful for microservice architectures, where different components of an application can be scaled independently.
Version Control: Docker images (the templates for containers) can be versioned, allowing developers to track changes and roll back to previous versions if necessary.
Ecosystem: Docker has a rich ecosystem of tools and services, including Docker Hub (a public repository for sharing and storing Docker images), Docker Compose (for defining and managing multi-container applications), and Docker Swarm (for container orchestration).

Getting Started with Docker

Docker was originally developed for Linux, so many Docker commands resemble Linux commands. If you are a Linux user, most of them will feel familiar. On other operating systems, some commands (such as those for starting the Docker service) may differ.

Install Docker Desktop for your operating system by following the documentation at docs.docker.com.

You can then verify the Docker installation by running the following command in the terminal.

docker --version

Ways to access Docker

Docker Desktop: A graphical application in which you manage images and containers by clicking through its options.
Docker CLI: A terminal interface in which you manage images and containers by typing commands. The CLI (command line interface) sends each command to the Docker daemon, the background service that does the actual work.

Difference between Image and Container

Docker Image: Docker images are built from Dockerfiles. An image is a static, immutable package that contains the code, dependencies, and configuration for the application. Images do not have a lifecycle; they exist as stored files, either locally or in a registry.
Docker Container: When a Docker image is run, the running instance is called a container. Each run creates a new container from the image. A container is a dynamic, running instance of an image and provides an isolated environment (operating-system-level virtualization) in which to run an application. Containers have a lifecycle (create, start, stop, delete).
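To make the contrast concrete, here is a small toy model in Python. It is only a sketch: the Container class, the state names, and the transition table are illustrative assumptions that mirror the lifecycle listed above (create, start, stop, delete), not Docker's internal implementation.

```python
# Toy model of the container lifecycle. The class, state names, and
# transition table are illustrative assumptions, not Docker internals.
ALLOWED = {
    ("created", "start"): "running",
    ("running", "stop"): "stopped",
    ("stopped", "start"): "running",
    ("created", "delete"): "deleted",
    ("stopped", "delete"): "deleted",
}


class Container:
    def __init__(self, image):
        self.image = image      # the image is only referenced, never changed
        self.state = "created"  # every new container starts its own lifecycle

    def apply(self, action):
        """Move to the next state, or fail if the action is not allowed."""
        key = (self.state, action)
        if key not in ALLOWED:
            raise ValueError(f"cannot {action} a {self.state} container")
        self.state = ALLOWED[key]
        return self.state


# Two containers created from the same image evolve independently.
first = Container("nginx:1.23")
second = Container("nginx:1.23")
first.apply("start")
first.apply("stop")
print(first.state, second.state)
```

The image name stays the same throughout, while each container tracks its own state; that is the sense in which images are static and containers are dynamic.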

Difference between Registry and Repository

Registry: A Docker registry is a service that stores Docker images. It can be public or private. Registries are where images are uploaded (pushed) to and downloaded (pulled) from. For example, Docker Hub, AWS ECR (Elastic Container Registry), and GCR (Google Container Registry) are registries. A registry is the home for repositories.
Repository: A repository is a collection of related Docker images, typically containing multiple versions (tags) of a specific image. For example, an image repository for a web application might have tags for ‘latest’, ‘1.0’, and ‘1.1’.
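The relationship between registry, repository, and tag shows up directly in image names. The following Python sketch splits a name into those three parts. It is a simplified illustration, not Docker's own parser: the function name split_image_reference is made up for this example, digests (name@sha256:...) are not handled, and registry.example.com:5000/team/web_app is a hypothetical private image.

```python
def split_image_reference(ref):
    """Split 'registry/repository:tag' into (registry, repository, tag).

    Simplified sketch: digest references are not supported.
    """
    name, tag = ref, "latest"  # Docker uses 'latest' when no tag is given
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon only marks a tag if it comes after the last slash;
    # otherwise it belongs to a registry port such as ':5000'.
    if last_colon > last_slash:
        name, tag = ref[:last_colon], ref[last_colon + 1:]

    first, sep, rest = name.partition("/")
    # The first path component is treated as a registry host when it looks
    # like one (contains a dot or a port, or is 'localhost').
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    return "docker.io", name, tag  # docker.io is Docker Hub


for ref in ["nginx:1.23", "nginx", "registry.example.com:5000/team/web_app"]:
    print(ref, "->", split_image_reference(ref))
```

A name without a registry part, such as nginx:1.23, is looked up on Docker Hub, which is why the short form works in the commands later in this guide.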

Basic Docker Commands

On Linux with Docker Desktop, the following systemctl commands start, check, restart, and stop Docker Desktop (and with it the Docker daemon) from the CLI.

# Start the Docker daemon
systemctl --user start docker-desktop

# Check the status of the Docker daemon
systemctl --user status docker-desktop

# Restart the Docker daemon
systemctl --user restart docker-desktop

# Stop the Docker daemon
systemctl --user stop docker-desktop

Commands to Run

First, we will list the Docker images that are available on our system. The following two commands do the same thing:

docker images

docker image ls

The following command shows the currently running Docker containers; ‘ps’ stands for process status.

docker ps

The ‘pull’ command downloads a Docker image from Docker Hub. Here we pull nginx with the tag 1.23; the tag identifies a version, and a single image can have multiple versions.

docker pull nginx:1.23

The ‘run’ command creates and starts a container from an image.

If the image is not available locally, ‘run’ downloads it first, so a single command is enough to pull an image and run a container.

docker run nginx:1.23

The ‘-d’ flag stands for detached mode. It starts the container in the background and immediately returns control to your terminal, so you can keep using the terminal for other tasks.

docker run -d nginx:1.23

The ‘logs’ command prints the log output of a container. You can pass either the container name or the container ID.

docker logs ea491royfgv5f

The ‘stop’ command shuts down a running container. Again, you can pass either the container name or the container ID.

docker stop ea491royfgv5f

Port Binding

Port binding in Docker is the process of mapping a port on the host machine to a port inside the Docker container. This allows you to access services running inside a container (like a web server, database, etc.) through a specific port on the host.

The ‘-p’ flag stands for publish. It publishes a port from the container to the host machine, allowing external access to services running inside the container. The format is host_port:container_port.

In the following Docker command, port 80 of the nginx:1.23 container (the port nginx listens on) is bound to port 9000 on the host. You can then reach the container at localhost:9000.

docker run -d -p 9000:80 nginx:1.23

# A common convention is to use the same port number on the host and in the container
docker run -d -p 80:80 nginx:1.23
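Once a port is published, a quick way to confirm the binding from Python is to try opening a TCP connection to the host port. This is a minimal sketch using only the standard library; the helper name port_is_open is an assumption for this example, and port 9000 matches the first command above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumes the container was started with: docker run -d -p 9000:80 nginx:1.23
    print(port_is_open("localhost", 9000))
```

The check only shows that something accepts connections on the host port. It does not verify that the service behind it is nginx.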

The ‘--name’ option assigns a custom name to the container (e.g., web_app).

docker run --name web_app -d -p 80:80 nginx:1.23
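If you launch containers from a Python script or notebook, it can help to assemble the same flags programmatically. The sketch below only builds the argument list and does not talk to Docker. The function docker_run_command and its parameters are hypothetical names for this example; the flags themselves (--name, -d, -p) are the ones introduced above.

```python
import shlex


def docker_run_command(image, name=None, detach=False, ports=None):
    """Build the argument list for 'docker run' from a few common options."""
    args = ["docker", "run"]
    if name:
        args += ["--name", name]
    if detach:
        args.append("-d")
    # ports maps host port -> container port, matching '-p host:container'
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)  # the image always comes after the options
    return args


cmd = docker_run_command("nginx:1.23", name="web_app", detach=True, ports={80: 80})
print(shlex.join(cmd))  # prints: docker run --name web_app -d -p 80:80 nginx:1.23
```

To actually start the container, the list can be passed to subprocess.run(cmd, check=True), provided Docker is installed and the daemon is running.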

The ‘-a’ flag stands for all. It lists all containers, both running and stopped.

docker ps -a

The ‘-q’ flag stands for quiet. Used with the ‘ps’ command, it shows only the container IDs.

docker ps -q

We can start or stop more than one Docker container at a time by listing several container names (web_app) or IDs (916c89dt549i) in a single command.

# start or stop more than one container
docker start 916c89dt549i web_app eq13f78u5t

docker stop web_app 916c89dt549i
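Because ‘docker ps -q’ prints only container IDs, one per line, its output is easy to feed into a multi-container stop. The sketch below shows that step in Python without calling Docker. The helper name stop_command is made up for this example, and the sample text reuses the placeholder IDs from the commands above.

```python
def stop_command(ps_quiet_output):
    """Turn the text printed by 'docker ps -q' into a 'docker stop' command."""
    container_ids = ps_quiet_output.split()  # one ID per line
    if not container_ids:
        return []  # nothing is running, so there is nothing to stop
    return ["docker", "stop", *container_ids]


# Placeholder output; in practice this text would come from 'docker ps -q'.
sample_output = "916c89dt549i\neq13f78u5t\n"
print(stop_command(sample_output))
```

In a real script, the input could be captured with subprocess.run(["docker", "ps", "-q"], capture_output=True, text=True).stdout.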

Commands to Create

The following Dockerfile instructions are used to create a Docker image.

FROM: Sets the base image that our image is built on (e.g., a lightweight Linux plus Python).
RUN: Executes commands in a new layer on top of the current image and commits the results. It’s often used to install packages or perform other configuration tasks. You can use terminal commands or specify scripts to run.
COPY: Copies files or directories from your host machine into the image. This instruction can be used to include application code, configuration files, or other resources.
WORKDIR: Sets the working directory for subsequent instructions. If the directory does not exist, it will be created. This allows you to organize your file structure within the image.
CMD: Specifies the default executable or script to run when the container starts.

Dockerfile

The following Dockerfile is an example for a FastAPI application in a data science project. An image built from a Dockerfile consists of layers. Each instruction adds a build step, and instructions that change the filesystem (such as RUN and COPY) create a new layer. These layers are stacked, and each one is a delta (difference) of the changes from the previous layer.

# Use the official Python image as the base image
FROM python:3.9

# Set the working directory in the container
WORKDIR /app

# Copy the requirements.txt file into the container
COPY requirements.txt .

# Install the required packages
RUN pip install --no-cache-dir -r requirements.txt

# Copy the entire content of the project into the container
COPY . .

# Document the port that the app listens on (it still has to be published with -p at run time)
EXPOSE 8000

# Define the command to run the FastAPI application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
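The layering described before the Dockerfile also explains why requirements.txt is copied and installed before the rest of the project: when rebuilding, Docker can reuse cached layers up to the first instruction whose inputs changed, and it rebuilds everything after that point. The Python sketch below is a toy model of that idea under simplifying assumptions. It is not Docker's actual cache algorithm, and layer_keys, changed_layers, and the digest strings are hypothetical.

```python
import hashlib

INSTRUCTIONS = [
    "FROM python:3.9",
    "WORKDIR /app",
    "COPY requirements.txt .",
    "RUN pip install --no-cache-dir -r requirements.txt",
    "COPY . .",
    "EXPOSE 8000",
    'CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]',
]


def layer_keys(instructions, file_digests):
    """Return one cache key per instruction; each key depends on its parent."""
    keys, parent = [], ""
    for instruction in instructions:
        material = parent + "\n" + instruction
        if instruction.startswith("COPY"):
            source = instruction.split()[1]
            # Copied content is part of the key (assumed digest per source).
            material += "\n" + file_digests.get(source, "")
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append(parent)
    return keys


def changed_layers(old_keys, new_keys):
    """Indexes of the steps that can no longer be served from the cache."""
    return [i for i, (old, new) in enumerate(zip(old_keys, new_keys)) if old != new]


before = layer_keys(INSTRUCTIONS, {"requirements.txt": "req-v1", ".": "src-v1"})
after = layer_keys(INSTRUCTIONS, {"requirements.txt": "req-v1", ".": "src-v2"})
print(changed_layers(before, after))  # prints: [4, 5, 6]
```

In this model, editing only the source code invalidates the steps from COPY . . onward, so the pip install step (index 3) is reused. If the project were copied before installing dependencies, every code change would trigger a fresh install.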

After defining the Dockerfile, we build the Docker image from it with the following command. The (.) is the build context, here the current directory, which is also where Docker looks for the Dockerfile by default; you can pass the path to another directory instead. The ‘-t’ flag stands for tag: it names the image (test_image) and gives it a version tag (1.1).

docker build -t test_image:1.1 .

Summary

Docker is a tool used to build, run, and ship applications in a consistent manner. It standardizes the process of running services in any environment. Docker helps us build an image that consists of the application’s source code, dependencies, and configuration. This image also contains the set of instructions to run the application.

The image typically includes a lightweight Linux base along with required software like Python and pip. Docker provides operating-system-level virtualization, creating an isolated environment in which an application runs without conflicting with other applications on the host. These environments are known as Docker containers, which are running instances of Docker images.

Thanks for Reading!

