This content originally appeared on Level Up Coding - Medium and was authored by Rahul Beniwal
Sharding 101
Hello Everyone,
Sharding is a crucial concept in database system design. I explored it with reference to this very helpful Udemy course: Database Engines Crash Course.
Intuition
The backbone of this blog is the usage of consistent ring or hashing to select the right database server to serve incoming requests.
Both sharding and consistent hashing are important concepts in system design, so I recommend delving deeply into both topics. This blog focuses on implementation, but I may cover the theoretical aspects in the future.
Folder Structure
.
├── constants.py
├── db
│ ├── Dockerfile
│ └── init.sql
├── docker-compose.yml
├── Dockerfile
├── main.py
├── README.md
├── requirements.txt
├── utils.py
└── venv
Step 1 : SQL for creating table for storing URL info.
CREATE TABLE url_table {
id SERIAL NOT NULL PRIMARY KEY,
url TEXT,
url_id Character(10)
};
Build the Docker image:
docker build -t pgshard .
Step 2: Dockerfile for Spinning Up DB Servers
from postgres
COPY init.sql /docker-entrypoint-initdb.d
Step 3: Creating Virtual Environment and Installing Dependencies
I am using venv to manage virtual environments and Python 3.12.4. These are the required dependencies:
annotated-types==0.7.0
anyio==4.4.0
asyncpg==0.29.0
click==8.1.7
fastapi==0.110.0
h11==0.14.0
idna==3.7
pydantic==2.8.2
pydantic_core==2.20.1
sniffio==1.3.1
starlette==0.36.3
typing_extensions==4.12.2
uhashring==2.3
uvicorn==0.29.0
watchfiles==0.21.0
Step 4: Creating a Basic FastAPI App
Inside main.py, we will place code for the application server:
from fastapi import FastAPI
app = FastAPI()
I will complete this first let me write my docker-compose file.
Step 5: Dockerfile for Application Server
FROM python:3.12.4-slim
WORKDIR /app
COPY ./requirements.txt /app
# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt
COPY . /app
# Define environment variable
ENV PYTHONUNBUFFERED=1
# Run app.py when the container launches
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Step 6: Docker Compose File for Spinning Up Containers
version: '3.8'
services:
pgshard1:
image: pgshard:latest
container_name: pgshard1
ports:
- "5432:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
POSTGRES_DB: postgres
pgshard2:
image: pgshard:latest
container_name: pgshard2
ports:
- "5433:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
POSTGRES_DB: postgres
pgshard3:
image: pgshard:latest
container_name: pgshard3
ports:
- "5434:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
POSTGRES_DB: postgres
fastapi:
build:
context: .
dockerfile: Dockerfile
container_name: fastapi
ports:
- "8000:8000"
depends_on:
- pgshard1
- pgshard2
- pgshard3
environment:
PG_SHARD1_HOST: pgshard1
PG_SHARD2_HOST: pgshard2
PG_SHARD3_HOST: pgshard3
PG_USER: postgres
PG_PASSWORD: password
PG_DATABASE: postgres
Step 7: Creating Basic Utilities for the Project
utils
import os
from hashlib import md5
from uhashring import HashRing
import constants
PG_SHARD1_HOST = os.getenv('PG_SHARD1_HOST')
PG_SHARD2_HOST = os.getenv('PG_SHARD2_HOST')
PG_SHARD3_HOST = os.getenv('PG_SHARD3_HOST')
PG_USER = os.getenv('PG_USER', 'postgres')
PG_PASSWORD = os.getenv('PG_PASSWORD', 'password')
PG_DATABASE = os.getenv('PG_DATABASE', 'postgres')
def hash_url(url):
return md5(url.encode("utf-8")).hexdigest()
def get_url_id(url_digest):
return url_digest[:10]
def get_db_configs():
return {
"pgshard1": {
"user": PG_USER,
"password": PG_PASSWORD,
"host": PG_SHARD1_HOST,
"port": "5432",
"database": PG_DATABASE,
},
"pgshard2": {
"user": PG_USER,
"password": PG_PASSWORD,
"host": PG_SHARD2_HOST,
"port": "5432",
"database": PG_DATABASE,
},
"pgshard3": {
"user": PG_USER,
"password": PG_PASSWORD,
"host": PG_SHARD3_HOST,
"port": "5432",
"database": PG_DATABASE,
},
}
def get_consistent_hash_obj():
servers = [constants.PG_SHARD1, constants.PG_SHARD2, constants.PG_SHARD3]
return HashRing(nodes=servers)
- hash_url:This function make a md5 hex digest of requested URL.
- get_url_id :First 10 characters from right will be treated as id.
- get_db_configs :Required db configs.
- get_consistent_hash_obj:Prepare an object to implement consistent hashing which take available servers as options.
Constants
PG_SHARD1 = "pgshard1"
PG_SHARD2 = "pgshard2"
PG_SHARD3 = "pgshard3"
Final Step: Implementation Views
In this final step i will implement one get and post view to get url detail and create short url
Request and Response Formats
from pydantic import BaseModel
class Url(BaseModel):
url: str
class UrlResponse(BaseModel):
shard: str
url_id: str
url: str
Connecting Databases
from contextlib import asynccontextmanager
import asyncpg
@asynccontextmanager
async def lifespan(app: FastAPI):
global pool1
global pool2
global pool3
pool1 = await asyncpg.create_pool(
**db_configs["pgshard1"],
)
pool2 = await asyncpg.create_pool(
**db_configs["pgshard2"],
)
pool3 = await asyncpg.create_pool(
**db_configs["pgshard3"],
)
yield
Complete Code
from contextlib import asynccontextmanager
import asyncpg
from fastapi import FastAPI
from pydantic import BaseModel
import constants
from utils import get_consistent_hash_obj, get_db_configs, get_url_id, hash_url
class Url(BaseModel):
url: str
class UrlResponse(BaseModel):
shard: str
url_id: str
url: str
db_configs = get_db_configs()
servers = get_consistent_hash_obj()
# postgresql connection pool
pool1 = None
pool2 = None
pool3 = None
@asynccontextmanager
async def lifespan(app: FastAPI):
global pool1
global pool2
global pool3
pool1 = await asyncpg.create_pool(
**db_configs["pgshard1"],
)
pool2 = await asyncpg.create_pool(
**db_configs["pgshard2"],
)
pool3 = await asyncpg.create_pool(
**db_configs["pgshard3"],
)
yield
app = FastAPI(lifespan=lifespan)
@app.get("/url/{url_id}")
async def get_url(url_id: str):
shard = servers.get_node(url_id)
pools = {
constants.PG_SHARD1: pool1,
constants.PG_SHARD2: pool2,
constants.PG_SHARD3: pool3
}
pool = pools.get(shard, pool3)
async with pool.acquire() as connection:
async with connection.transaction():
response = await connection.fetchrow(
"SELECT * FROM url_table WHERE url_id = $1", url_id
)
response_dict = dict(response or {})
if response_dict:
return UrlResponse(
shard=shard,
url_id=response_dict["url_id"],
url=response_dict["url"],
)
else:
return {"error": "Url not exists"}
@app.post("/url")
async def create_url(url: Url):
url = url.url
url_hash = hash_url(url)
url_id = get_url_id(url_hash)
shard = servers.get_node(url_id)
pools = {
constants.PG_SHARD1: pool1,
constants.PG_SHARD2: pool2,
constants.PG_SHARD3: pool3
}
pool = pools.get(shard, pool3)
async with pool.acquire() as connection:
async with connection.transaction():
await connection.execute(
"INSERT INTO url_table (url_id, url) VALUES ($1, $2)", url_id, url
)
return UrlResponse(
shard=shard,
url_id=url_id,
url=url,
)
Step 8: POC
You can also check complete documentation here.
Conclusion
In this blog, we explored the practical implementation of database sharding using FastAPI. We covered the essential steps to set up a sharded database environment, from creating the necessary SQL tables and Docker configurations to implementing the FastAPI application with consistent hashing. This approach helps in distributing the data across multiple database servers.
To see complete working code,
If you found this blog helpful, leave a 👏 and follow Rahul Beniwal for more related content. If you want to learn about Backend, Python and Databases you can leave a comment and we can discuss further.
Thanks for reading till end. You can also check my other similar blogs
- Is This Year a Leap Year? A SQL Solution
- Demystifying Database Partitioning: Explaining Impact With Practical Examples.
- Unlocking 5 Invaluable Django ORM Insights: Gain an Early Learning Advantage from My Experience
Database Sharding 101 With Python. was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
This content originally appeared on Level Up Coding - Medium and was authored by Rahul Beniwal
Rahul Beniwal | Sciencx (2024-07-14T17:24:16+00:00) Database Sharding 101 With Python.. Retrieved from https://www.scien.cx/2024/07/14/database-sharding-101-with-python/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.