Database Sharding 101 With Python.

Sharding 101Hello Everyone,Sharding is a crucial concept in database system design. I explored it with reference to this very helpful Udemy course: Database Engines Crash Course.Image Credit FreepikIntuitionThe backbone of this blog is the usage of con…


This content originally appeared on Level Up Coding - Medium and was authored by Rahul Beniwal

Sharding 101

Hello Everyone,

Sharding is a crucial concept in database system design. I explored it with reference to this very helpful Udemy course: Database Engines Crash Course.

Image Credit Freepik

Intuition

The backbone of this blog is the usage of consistent ring or hashing to select the right database server to serve incoming requests.

Both sharding and consistent hashing are important concepts in system design, so I recommend delving deeply into both topics. This blog focuses on implementation, but I may cover the theoretical aspects in the future.

Folder Structure

.
├── constants.py
├── db
│ ├── Dockerfile
│ └── init.sql
├── docker-compose.yml
├── Dockerfile
├── main.py
├── README.md
├── requirements.txt
├── utils.py
└── venv

Step 1 : SQL for creating table for storing URL info.

CREATE TABLE url_table {
id SERIAL NOT NULL PRIMARY KEY,
url TEXT,
url_id Character(10)
};

Build the Docker image:

docker build -t pgshard .

Step 2: Dockerfile for Spinning Up DB Servers

from postgres
COPY init.sql /docker-entrypoint-initdb.d

Step 3: Creating Virtual Environment and Installing Dependencies

I am using venv to manage virtual environments and Python 3.12.4. These are the required dependencies:

annotated-types==0.7.0
anyio==4.4.0
asyncpg==0.29.0
click==8.1.7
fastapi==0.110.0
h11==0.14.0
idna==3.7
pydantic==2.8.2
pydantic_core==2.20.1
sniffio==1.3.1
starlette==0.36.3
typing_extensions==4.12.2
uhashring==2.3
uvicorn==0.29.0
watchfiles==0.21.0

Step 4: Creating a Basic FastAPI App

Inside main.py, we will place code for the application server:

from fastapi import FastAPI

app = FastAPI()

I will complete this first let me write my docker-compose file.

Step 5: Dockerfile for Application Server

FROM python:3.12.4-slim

WORKDIR /app

COPY ./requirements.txt /app
# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt

COPY . /app

# Define environment variable
ENV PYTHONUNBUFFERED=1

# Run app.py when the container launches
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Step 6: Docker Compose File for Spinning Up Containers

version: '3.8'

services:
pgshard1:
image: pgshard:latest
container_name: pgshard1
ports:
- "5432:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
POSTGRES_DB: postgres

pgshard2:
image: pgshard:latest
container_name: pgshard2
ports:
- "5433:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
POSTGRES_DB: postgres

pgshard3:
image: pgshard:latest
container_name: pgshard3
ports:
- "5434:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
POSTGRES_DB: postgres

fastapi:
build:
context: .
dockerfile: Dockerfile
container_name: fastapi
ports:
- "8000:8000"
depends_on:
- pgshard1
- pgshard2
- pgshard3
environment:
PG_SHARD1_HOST: pgshard1
PG_SHARD2_HOST: pgshard2
PG_SHARD3_HOST: pgshard3
PG_USER: postgres
PG_PASSWORD: password
PG_DATABASE: postgres

Step 7: Creating Basic Utilities for the Project

utils
import os
from hashlib import md5

from uhashring import HashRing

import constants

PG_SHARD1_HOST = os.getenv('PG_SHARD1_HOST')
PG_SHARD2_HOST = os.getenv('PG_SHARD2_HOST')
PG_SHARD3_HOST = os.getenv('PG_SHARD3_HOST')
PG_USER = os.getenv('PG_USER', 'postgres')
PG_PASSWORD = os.getenv('PG_PASSWORD', 'password')
PG_DATABASE = os.getenv('PG_DATABASE', 'postgres')


def hash_url(url):
return md5(url.encode("utf-8")).hexdigest()


def get_url_id(url_digest):
return url_digest[:10]


def get_db_configs():
return {
"pgshard1": {
"user": PG_USER,
"password": PG_PASSWORD,
"host": PG_SHARD1_HOST,
"port": "5432",
"database": PG_DATABASE,
},
"pgshard2": {
"user": PG_USER,
"password": PG_PASSWORD,
"host": PG_SHARD2_HOST,
"port": "5432",
"database": PG_DATABASE,
},
"pgshard3": {
"user": PG_USER,
"password": PG_PASSWORD,
"host": PG_SHARD3_HOST,
"port": "5432",
"database": PG_DATABASE,
},
}


def get_consistent_hash_obj():
servers = [constants.PG_SHARD1, constants.PG_SHARD2, constants.PG_SHARD3]
return HashRing(nodes=servers)
  • hash_url:This function make a md5 hex digest of requested URL.
  • get_url_id :First 10 characters from right will be treated as id.
  • get_db_configs :Required db configs.
  • get_consistent_hash_obj:Prepare an object to implement consistent hashing which take available servers as options.
Constants
PG_SHARD1 = "pgshard1"
PG_SHARD2 = "pgshard2"
PG_SHARD3 = "pgshard3"

Final Step: Implementation Views

In this final step i will implement one get and post view to get url detail and create short url

Request and Response Formats
from pydantic import BaseModel


class Url(BaseModel):
url: str


class UrlResponse(BaseModel):
shard: str
url_id: str
url: str

Connecting Databases

from contextlib import asynccontextmanager
import asyncpg


@asynccontextmanager
async def lifespan(app: FastAPI):
global pool1
global pool2
global pool3

pool1 = await asyncpg.create_pool(
**db_configs["pgshard1"],
)
pool2 = await asyncpg.create_pool(
**db_configs["pgshard2"],
)
pool3 = await asyncpg.create_pool(
**db_configs["pgshard3"],
)

yield
Complete Code
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI
from pydantic import BaseModel

import constants
from utils import get_consistent_hash_obj, get_db_configs, get_url_id, hash_url


class Url(BaseModel):
url: str


class UrlResponse(BaseModel):
shard: str
url_id: str
url: str


db_configs = get_db_configs()
servers = get_consistent_hash_obj()

# postgresql connection pool
pool1 = None
pool2 = None
pool3 = None


@asynccontextmanager
async def lifespan(app: FastAPI):
global pool1
global pool2
global pool3

pool1 = await asyncpg.create_pool(
**db_configs["pgshard1"],
)
pool2 = await asyncpg.create_pool(
**db_configs["pgshard2"],
)
pool3 = await asyncpg.create_pool(
**db_configs["pgshard3"],
)

yield


app = FastAPI(lifespan=lifespan)


@app.get("/url/{url_id}")
async def get_url(url_id: str):
shard = servers.get_node(url_id)

pools = {
constants.PG_SHARD1: pool1,
constants.PG_SHARD2: pool2,
constants.PG_SHARD3: pool3
}

pool = pools.get(shard, pool3)
async with pool.acquire() as connection:
async with connection.transaction():
response = await connection.fetchrow(
"SELECT * FROM url_table WHERE url_id = $1", url_id
)
response_dict = dict(response or {})
if response_dict:
return UrlResponse(
shard=shard,
url_id=response_dict["url_id"],
url=response_dict["url"],
)
else:
return {"error": "Url not exists"}


@app.post("/url")
async def create_url(url: Url):
url = url.url
url_hash = hash_url(url)
url_id = get_url_id(url_hash)

shard = servers.get_node(url_id)
pools = {
constants.PG_SHARD1: pool1,
constants.PG_SHARD2: pool2,
constants.PG_SHARD3: pool3
}

pool = pools.get(shard, pool3)

async with pool.acquire() as connection:
async with connection.transaction():
await connection.execute(
"INSERT INTO url_table (url_id, url) VALUES ($1, $2)", url_id, url
)
return UrlResponse(
shard=shard,
url_id=url_id,
url=url,
)

Step 8: POC

You can also check complete documentation here.

DB Sharding with Python

Conclusion

In this blog, we explored the practical implementation of database sharding using FastAPI. We covered the essential steps to set up a sharded database environment, from creating the necessary SQL tables and Docker configurations to implementing the FastAPI application with consistent hashing. This approach helps in distributing the data across multiple database servers.

To see complete working code,

GitHub - Rahulbeniwal26119/url-shortner-with-db-sharding: Quick Demonstration of DB Sharding using FastAPI to create URL Shortner

If you found this blog helpful, leave a 👏 and follow Rahul Beniwal for more related content. If you want to learn about Backend, Python and Databases you can leave a comment and we can discuss further.

Thanks for reading till end. You can also check my other similar blogs


Database Sharding 101 With Python. was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.


This content originally appeared on Level Up Coding - Medium and was authored by Rahul Beniwal


Print Share Comment Cite Upload Translate Updates
APA

Rahul Beniwal | Sciencx (2024-07-14T17:24:16+00:00) Database Sharding 101 With Python.. Retrieved from https://www.scien.cx/2024/07/14/database-sharding-101-with-python/

MLA
" » Database Sharding 101 With Python.." Rahul Beniwal | Sciencx - Sunday July 14, 2024, https://www.scien.cx/2024/07/14/database-sharding-101-with-python/
HARVARD
Rahul Beniwal | Sciencx Sunday July 14, 2024 » Database Sharding 101 With Python.., viewed ,<https://www.scien.cx/2024/07/14/database-sharding-101-with-python/>
VANCOUVER
Rahul Beniwal | Sciencx - » Database Sharding 101 With Python.. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2024/07/14/database-sharding-101-with-python/
CHICAGO
" » Database Sharding 101 With Python.." Rahul Beniwal | Sciencx - Accessed . https://www.scien.cx/2024/07/14/database-sharding-101-with-python/
IEEE
" » Database Sharding 101 With Python.." Rahul Beniwal | Sciencx [Online]. Available: https://www.scien.cx/2024/07/14/database-sharding-101-with-python/. [Accessed: ]
rf:citation
» Database Sharding 101 With Python. | Rahul Beniwal | Sciencx | https://www.scien.cx/2024/07/14/database-sharding-101-with-python/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.