Harnessing AWS Bedrock: Create a Generative AI PDF Chatbot

This content originally appeared on DEV Community and was authored by Ndzenyuy Jones Tsangue

In this project, we will develop an advanced AI-powered chatbot designed to facilitate seamless interaction with PDF documents. Users will have the capability to upload files of up to 200MB and pose questions to the chatbot, enabling them to explore and extract valuable insights from the content of the documents.

This application provides a conversational interface that simplifies retrieving information from complex PDF files. Using natural language processing techniques, the chatbot interprets user questions and answers them from the document's content, making it a useful tool in both educational and professional settings.

The development of this application will utilize a range of sophisticated tools and technologies, ensuring robust performance and an intuitive user experience. The tools used to build this application include:

Amazon Bedrock
Amazon S3
Amazon EC2
Docker
LangChain
Streamlit

Architecture

Principle

In this application, users can upload a PDF file through the user interface (UI). The PDF is then processed using PyPDF, which divides the content into manageable chunks. These chunks are transformed into vectors, providing a machine-learning-friendly representation of the PDF's content. The resulting vectors are subsequently stored in an Amazon S3 bucket for efficient access and retrieval.
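
The chunking step can be illustrated with a short plain-Python sketch. It is not the project's actual code, which relies on PyPDF and LangChain; the function name split_into_chunks and the chunk_size and overlap defaults are assumptions chosen for illustration. Overlapping consecutive chunks is a common way to keep some shared context across chunk boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap their neighbours.

    Hypothetical helper for illustration only; sizes are measured in characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

With chunk_size=4 and overlap=1, each chunk shares its last character with the start of the next one.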
When a user queries the PDF, the application searches the vectors loaded from S3 for the chunks most similar to the query and builds a prompt from the query and that context. The prompt is then passed to the Jurassic-2 Mid LLM, which generates the answer returned to the user.
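
The query step can be sketched in plain Python as a similarity search followed by prompt assembly. This is only an illustration under stated assumptions: the real application presumably delegates the search to a vector store through LangChain, and the names cosine_similarity, top_k_chunks, and build_prompt, as well as the prompt wording, are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float],
                 indexed_chunks: list[tuple[str, list[float]]],
                 k: int = 3) -> list[str]:
    """Return the text of the k chunks whose vectors are closest to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved context and the user's question into one prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Toy two-dimensional vectors stand in for real embeddings.
    index = [("cats", [1.0, 0.0]), ("dogs", [0.0, 1.0]), ("kittens", [0.9, 0.1])]
    best = top_k_chunks([1.0, 0.0], index, k=2)
    print(build_prompt("What is this about?", best))
```

In a real deployment the vectors would come from the embedding model rather than being written by hand, and the prompt returned by build_prompt would be sent to the text model.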

The application operates within a Docker container, utilizing Streamlit to create a visually appealing and user-friendly interface. This combination of technologies ensures a smooth and interactive experience for users as they engage with the content of their uploaded PDFs.

How to build it

Launch an EC2 instance

Log in to the AWS console and launch a t2.micro instance with the following configuration:

Name: pdf-Chat-Bot
Instance type: t2.micro
AMI: Ubuntu (latest)
Volume: 8 GiB
Security group: create new, with inbound rules
=> allow 8083 from everywhere
=> allow SSH from my IP
User data (launch script):

#!/bin/bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y

Create an IAM Role for the EC2 instance to access Bedrock and S3

Go to AWS console -> IAM -> Roles -> Create role

Name: pdfBotRole
Attach policies:
- AmazonBedrockFullAccess
- AmazonS3FullAccess
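
The two managed policies above are the simplest option for a demo. As an optional, narrower alternative, the sketch below builds an inline policy document limited to invoking Bedrock models and reading and writing one bucket. It is an assumption that these actions are all the application needs; the function name build_policy and the bucket name used in the example are hypothetical.

```python
import json


def build_policy(bucket_name: str, region: str = "us-east-1") -> dict:
    """Return an IAM policy document scoped to Bedrock invocation and one bucket.

    Hypothetical helper: the action list is an assumption about what the app
    needs, not something taken from the project's source code.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeBedrockModels",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/*",
            },
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM console's policy editor.
    print(json.dumps(build_policy("bedrock-chatpdf-123"), indent=2))
```

If the application fails with an access-denied error under such a policy, falling back to the managed policies listed above is the quickest way to confirm which permission is missing.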

Attach role to EC2 instance

Go to the EC2 console, select the instance then select Actions -> Security -> Modify IAM role
Select the IAM role previously created "pdfBotRole" and click Apply

Create S3 bucket

In the console, search for S3, then create a bucket with the following settings (xxx are random digits that make the bucket name unique):

Name: bedrock-chatpdf-xxx
Region: us-east-1
Keep the defaults for everything else and choose Create bucket.
Copy the bucket name, as we'll use it in the next steps.
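
Picking the random suffix and sanity-checking the name can be sketched as follows. The helper names make_bucket_name and is_plausible_bucket_name are hypothetical, and the check covers only the basic S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit); it is not a complete validator.

```python
import re
import secrets

# Basic shape of an S3 bucket name; deliberately not the full rule set.
_BUCKET_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def make_bucket_name(prefix: str = "bedrock-chatpdf-") -> str:
    """Append three random digits to the prefix, e.g. bedrock-chatpdf-042."""
    return f"{prefix}{secrets.randbelow(1000):03d}"


def is_plausible_bucket_name(name: str) -> bool:
    """Check the basic S3 naming rules (partial check, see lead-in)."""
    return _BUCKET_PATTERN.fullmatch(name) is not None and ".." not in name


if __name__ == "__main__":
    name = make_bucket_name()
    print(name, is_plausible_bucket_name(name))
```

Bucket names are globally unique, so a name that passes this check can still be rejected if someone else already owns it.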

SSH into the instance and clone Source code

Copy the public address of the instance and open a terminal
ssh -i "path/to/.pem-file" ubuntu@public-ip-address
Verify docker installation

docker ps

If Docker is installed, you will see a table of existing Docker containers, which will of course be empty. If you get a permission error instead, run the command with sudo.

Clone source

In the SSH session, run the following:
git clone https://github.com/Ndzenyuy/chatPdf.git
cd chatPdf

Access for LLM models in Amazon Bedrock

In the console, go to Amazon Bedrock -> Base models -> Model access.
Make sure you have access to Jurassic-2 Ultra and Titan Embeddings G1 - Text; if not, request access to them.
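
Once access is granted, the two models can be called through the Bedrock runtime API. The sketch below shows one way this might look; it is not the project's code, which presumably goes through LangChain. The function names are hypothetical, the text model ID follows the Jurassic-2 Ultra entry above (the Principle section mentions Jurassic-2 Mid, whose ID is ai21.j2-mid-v1), and the maxTokens and temperature values are arbitrary. The client is passed in as a parameter so the functions can be exercised without AWS credentials.

```python
import json

EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v1"  # Titan Embeddings G1 - Text
TEXT_MODEL_ID = "ai21.j2-ultra-v1"                 # Jurassic-2 Ultra


def embed_text(client, text: str) -> list:
    """Return the embedding vector for one piece of text."""
    response = client.invoke_model(
        modelId=EMBEDDING_MODEL_ID,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]


def generate_answer(client, prompt: str, max_tokens: int = 300) -> str:
    """Send a prompt to the text model and return the generated completion."""
    response = client.invoke_model(
        modelId=TEXT_MODEL_ID,
        body=json.dumps(
            {"prompt": prompt, "maxTokens": max_tokens, "temperature": 0.2}
        ),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["completions"][0]["data"]["text"]


# Usage on the EC2 instance (requires boto3 and the IAM role attached earlier):
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   vector = embed_text(client, "some chunk of the PDF")
#   answer = generate_answer(client, "Question: ...\nAnswer:")
```

An access-denied error from either call usually means that model access has not been granted yet or that the instance role lacks the Bedrock permission.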

Build and Run App docker image

Make sure you are inside the chatPdf folder and run the following command (Docker image names must be lowercase):

docker build -t chatpdf-app .

Once the image is built, run it with the following command, replacing yourBucketName with the name of your bucket:

docker run -d -e BUCKET_NAME="yourBucketName" -p 8083:8083 chatpdf-app
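
The -e BUCKET_NAME flag passes the bucket name into the container as an environment variable. A minimal sketch of how application code might read it is shown below; the helper name get_bucket_name is hypothetical and the project's actual code may read the variable differently.

```python
import os


def get_bucket_name(env=None) -> str:
    """Read BUCKET_NAME from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    name = env.get("BUCKET_NAME", "").strip()
    if not name:
        raise RuntimeError(
            "BUCKET_NAME is not set; pass it with docker run -e BUCKET_NAME=..."
        )
    return name


if __name__ == "__main__":
    # A plain dict stands in for the container environment here.
    print(get_bucket_name({"BUCKET_NAME": "bedrock-chatpdf-123"}))
```

Failing at startup with a clear message is usually easier to debug than an S3 error that appears only when the first PDF is uploaded.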

Now copy the public IP of the EC2 instance and enter it in the browser, followed by the port number 8083. For instance:

XX.XX.XX.XX:8083

How to use the App

The landing page first asks the user to upload a PDF document.

Either drag and drop the file or click the "Browse files" button. Once the PDF document is loaded, ask questions based on its content.

Conclusion

In this project, we have successfully developed an innovative PDF chatbot application designed to significantly reduce the time spent on research and reading. By transforming hours of traditional document review into just a few minutes of interactive engagement, users can efficiently understand the content, authorship, summaries, and key findings of PDF documents.

This application serves as a valuable tool for students, enhancing their ability to engage with academic articles and literature. By streamlining the process of extracting relevant information from complex texts, the chatbot not only saves time but also fosters critical thinking and enhances research skills. This project represents a significant advancement in educational technology, paving the way for improved student capabilities in navigating and relating to scholarly articles.
