Building an NBA Sport Data Lake Analytic using AWS Services


This content originally appeared on DEV Community and was authored by Ameh Mathias Ejeh

Overview

The NBA Sport Data Lake Analytic project is a cloud-native solution that builds a scalable data lake for NBA analytics. By leveraging AWS services, this project automates data ingestion, cataloging, and querying, enabling efficient storage and analysis of NBA-related data.

Architecture

The architecture of the project is designed to process and analyze NBA data efficiently. The main components are:

  • Amazon S3: Stores raw and processed data.
  • AWS Glue: Automates data cataloging and schema creation.
  • Amazon Athena: Enables SQL querying of the data stored in S3.

[Figure: Architecture Diagram]

Workflow

  • Data Ingestion: Fetch data from SportsData.io's NBA API.
  • Data Storage: Store the raw data in Amazon S3.
  • Data Cataloging: Use AWS Glue to create a database and table schema.
  • Data Querying: Query the data using Amazon Athena for analytics.
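The first two workflow steps can be sketched roughly as follows. This is only an illustration, not the repository's actual code: the function names are hypothetical, the sketch uses the standard library instead of whatever HTTP client the project depends on, and it assumes the API key is passed in the Ocp-Apim-Subscription-Key request header. Writing one JSON object per line matters because Athena's JSON SerDes expect each record on its own line.

```python
import json
import urllib.request


def build_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for the NBA endpoint.

    Assumption: the key is sent in the Ocp-Apim-Subscription-Key header.
    """
    return urllib.request.Request(
        endpoint, headers={"Ocp-Apim-Subscription-Key": api_key}
    )


def fetch_players(endpoint: str, api_key: str) -> list:
    """Call the API and return the decoded JSON array of player records."""
    request = build_request(endpoint, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def to_line_delimited_json(records: list) -> str:
    """Serialize records as one JSON object per line, ready for upload to S3."""
    return "\n".join(json.dumps(record) for record in records)
```

The string returned by to_line_delimited_json is what would be uploaded to the raw-data folder in S3.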

Prerequisites

Required Accounts and Tools

  • SportsData.io API Key: Sign up at SportsData.io to get access to the NBA API.
  • AWS Account: An active AWS account with permissions to use S3, Glue, and Athena.
  • Python Environment: Python 3 installed locally, plus a virtual environment for dependency management.

Permissions

Ensure the IAM user or role has the following AWS permissions:

  • S3: s3:CreateBucket, s3:PutObject, s3:DeleteBucket, s3:ListBucket
  • Glue: glue:CreateDatabase, glue:CreateTable, glue:DeleteDatabase, glue:DeleteTable
  • Athena: athena:StartQueryExecution, athena:GetQueryResults
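As a rough illustration, the permissions above could be collected into a single IAM policy document. The sketch below only builds and prints the JSON. The statement IDs and the choice of "*" as the resource for Glue and Athena are assumptions made for brevity, and you may want to scope them more tightly.

```python
import json

# Bucket name used by the setup script in this guide.
BUCKET = "sports-analytics-data-lake-0429"


def build_policy(bucket: str) -> dict:
    """Return an IAM policy document covering the actions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DataLakeBucket",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "s3:CreateBucket",
                    "s3:PutObject",
                    "s3:DeleteBucket",
                    "s3:ListBucket",
                ],
                # Bucket-level and object-level ARNs.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
            {
                "Sid": "GlueCatalog",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "glue:CreateDatabase",
                    "glue:CreateTable",
                    "glue:DeleteDatabase",
                    "glue:DeleteTable",
                ],
                "Resource": "*",  # assumption: not scoped to one catalog
            },
            {
                "Sid": "AthenaQueries",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",  # assumption: not scoped to one workgroup
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(BUCKET), indent=2))
```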

Setup Guide

Step 1: Clone the Repository

git clone https://github.com/ameh0429/ameh0429-NBA-Sport-Data-Lake-Analytic.git
cd ameh0429-NBA-Sport-Data-Lake-Analytic

Step 2: Install Dependencies

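  • Create and activate a virtual environment (for example, python3 -m venv venv followed by source venv/bin/activate), then install the dependencies: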
pip install -r requirements.txt

Step 3: Configure Environment Variables

  • Create a .env file with your API key and endpoint:
echo "SPORTS_DATA_API_KEY=your_api_key" >> .env
echo "NBA_ENDPOINT=https://api.sportsdata.io/v3/nba/scores/json/Players" >> .env
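To show how a script might pick these values up, here is a minimal sketch that parses the .env file with the standard library only. The repository may well use a helper library such as python-dotenv instead. The function name load_env_file is hypothetical.

```python
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

With the file created above, load_env_file()["SPORTS_DATA_API_KEY"] would hold the API key and load_env_file()["NBA_ENDPOINT"] the players endpoint.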

Step 4: Run the Data Lake Setup Script

  • In the terminal, make sure setup_nba_data_lake.py contains the setup script (paste it into the file if needed).
  • Run the script:
python setup_nba_data_lake.py

The script performs the following actions:

  • Creates an S3 bucket named sports-analytics-data-lake-0429.
  • Uploads NBA player data to the raw-data folder.
  • Configures a Glue database and table.
  • Sets up Athena for querying.
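The Glue step in the list above might look something like the sketch below. It is not the script's actual code. The column list is inferred from the test queries later in this guide, the column types and the OpenX JSON SerDe are assumptions, and the database name passed in is up to you. The glue argument is expected to be a boto3 Glue client, for example boto3.client("glue").

```python
def build_table_input(bucket: str, table_name: str = "nba_players") -> dict:
    """Describe an external table over the line-delimited JSON in raw-data/."""
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "json"},
        "StorageDescriptor": {
            # Assumed columns and types, based on the queries in this guide.
            "Columns": [
                {"Name": "PlayerID", "Type": "int"},
                {"Name": "FirstName", "Type": "string"},
                {"Name": "LastName", "Type": "string"},
                {"Name": "Team", "Type": "string"},
                {"Name": "Position", "Type": "string"},
            ],
            "Location": f"s3://{bucket}/raw-data/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": (
                "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
            ),
            "SerdeInfo": {
                "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
            },
        },
    }


def create_catalog(glue, database: str, bucket: str) -> None:
    """Create the Glue database, then the table that points at the bucket."""
    glue.create_database(DatabaseInput={"Name": database})
    glue.create_table(
        DatabaseName=database, TableInput=build_table_input(bucket)
    )
```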


Step 5: Validate Setup

  • S3: Verify the bucket and data file in the AWS Management Console.


  • Athena: Run a test query:

Query 1

SELECT FirstName, LastName, Position, Team
FROM nba_players
WHERE Position = 'PG';

Query 2

SELECT PlayerID, FirstName, LastName, Team, Position
FROM nba_players
WHERE Team = 'LAL';
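The same queries can also be run from Python rather than the console. The sketch below is one possible way to do it with a boto3 Athena client, for example boto3.client("athena"). The function name, the database name, and the S3 output location you pass in are your own choices, not values taken from the project. Note that polling with get_query_execution also requires the athena:GetQueryExecution permission, and that get_query_results returns only the first page of results here.

```python
import time


def run_query(athena, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> list:
    """Start an Athena query, wait for it to finish, and return its rows."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    result = athena.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of cells; empty cells have no VarCharValue key.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

For SELECT statements, the first row returned is typically the header row with the column names.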


Cleanup

To delete all the resources created by the project, run the cleanup script:

python delete_resources.py

This will:

  • Remove the S3 bucket and its contents.
  • Delete the Glue database and table.
  • Clean up Athena configurations.
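For reference, the S3 and Glue parts of such a cleanup could be sketched as below. This is an illustration rather than the contents of delete_resources.py. It assumes boto3 S3 and Glue clients are passed in, that the bucket is not versioned, and that the caller also has the s3:DeleteObject permission, which deleting the bucket's contents requires. The function names are hypothetical.

```python
def empty_and_delete_bucket(s3, bucket: str) -> int:
    """Delete every object in the bucket, then the bucket itself.

    Returns the number of objects removed.
    """
    deleted = 0
    while True:
        # Each call lists at most 1000 keys, which is also the
        # maximum that a single delete_objects call accepts.
        page = s3.list_objects_v2(Bucket=bucket)
        objects = page.get("Contents", [])
        if not objects:
            break
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": obj["Key"]} for obj in objects]},
        )
        deleted += len(objects)
    s3.delete_bucket(Bucket=bucket)
    return deleted


def delete_catalog(glue, database: str, table_name: str = "nba_players") -> None:
    """Remove the Glue table first, then the database that contained it."""
    glue.delete_table(DatabaseName=database, Name=table_name)
    glue.delete_database(Name=database)
```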


Ameh Mathias Ejeh | Sciencx (2025-01-23T22:36:30+00:00) Building an NBA Sport Data Lake Analytic using AWS Services. Retrieved from https://www.scien.cx/2025/01/23/building-an-nba-sport-data-lake-analytic-using-aws-services/