This content originally appeared on DEV Community and was authored by Jad Tounsi
Multi-Agent System for ANY AI/ML Model: Web Scraping & Content Analysis Powered by the AI/ML API
This project demonstrates a multi-agent system that automates web scraping, content analysis, and summary generation using the AI/ML API. It is built using Streamlit for the user interface, BeautifulSoup for web scraping, and the AI/ML API for text generation and analysis.
The app enables you to dynamically change the model and modify any agent in the workflow to suit different use cases. Simply provide your AI/ML API key, and you can use any model supported by the AI/ML API.
Get Your AI/ML API Key
You can obtain your AI/ML API key by visiting the following link:
Features
- Web Scraping: Scrapes the content of a given website URL using BeautifulSoup.
- Content Analysis: Analyzes the scraped content to extract key insights using the AI/ML API.
- Summary Generation: Generates a detailed summary of the analyzed content.
- Streamlit UI: Interactive user interface that allows users to enter the website URL and view the generated report.
- Flexible AI Models: Supports any model from the AI/ML API. You can change the model used for content analysis and summary generation dynamically.
- Agent Customization: Modify the behavior of each agent (scraping, analyzing, summarizing) by changing the instructions, functions, or models.
How It Works
- AI/ML API Key Input
  - The app dynamically sets the API key using an input field. The key is stored in the environment and used for making API calls to the AI/ML API.
- Web Scraping
  - The app scrapes the provided website URL using BeautifulSoup and extracts the text content from the website's HTML.
- Content Analysis
  - The scraped content is analyzed by the AI/ML API using a chat completion model to extract key insights.
- Summary Generation
  - A detailed summary is generated using the AI/ML API based on the content analysis.
- Download Report
  - The final summary can be downloaded as a text file directly from the Streamlit interface.
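The scraping step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `extract_text` and `scrape_website`, the 10-second timeout, and the decision to drop `script` and `style` tags are all assumptions.

```python
import requests
from bs4 import BeautifulSoup


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as a single string."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: script and style blocks are noise for content analysis.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Collapse runs of whitespace so the prompt sent to the model stays compact.
    return " ".join(soup.get_text(separator=" ").split())


def scrape_website(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its text content (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Surface HTTP errors instead of silently parsing an error page.
    response.raise_for_status()
    return extract_text(response.text)
```

Keeping the HTML parsing in its own function means it can be checked against a literal HTML string without making a network request.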
Installation
Prerequisites
- Python 3.10+
- Streamlit for the interactive web interface.
- BeautifulSoup for web scraping.
- Requests for handling HTTP requests.
- AI/ML API Key for making API calls.
Steps
- Clone the Repository:
    git clone https://github.com/jadouse5/aimlapi-webscraper-agents.git
    cd aimlapi-webscraper-agents
- Set Up a Virtual Environment:
    python3 -m venv myenv
    source myenv/bin/activate  # On macOS/Linux
    myenv\Scripts\activate     # On Windows
- Install Required Packages:
    pip install -r requirements.txt
- Set Up API Keys:
  Create a .env file in the project root and add your AI/ML API key:
    echo "AIMLAPI_API_KEY=your-api-key-here" > .env
- Run the Application:
    streamlit run app.py
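How the app reads the .env file is not shown in this post. As a rough illustration, assuming the file holds simple `KEY=value` lines like the one written above, a dependency-free loader could look like this. `load_env_file` is a hypothetical helper; the actual project may rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=value lines from a file and export them to os.environ."""
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        # Values already set in the environment take precedence.
        os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` once at startup would make `os.environ["AIMLAPI_API_KEY"]` available to the rest of the app, under the assumptions stated above.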
Usage
- Open the Web Interface: Once the application is running, it will open in your default browser. If not, go to http://localhost:8501 manually.
- Set Your AI/ML API Key: Input your AI/ML API Key in the text box to authenticate and allow the app to access the API.
- Input Website URL: Enter the URL of the website you want to scrape in the provided input box.
- Run Workflow: Click the "Run Workflow" button to start scraping the website, analyzing its content, and generating a summary report.
- Modify Models or Agents: You can modify the AI models used in each agent by adjusting the code, allowing you to experiment with different models for scraping, analysis, or summarizing.
- Download Report: Once the workflow completes, you can download the generated report by clicking the "Download Report" button.
Key Components
- Web Scraping: Scrapes the text content from the provided website URL using BeautifulSoup.
- Content Analysis: The scraped content is analyzed using the AI/ML API, extracting key insights.
- Summary Generation: A detailed summary is generated based on the analysis using another AI model call.
Code Example
Here's an example of how the system orchestrates the workflow:
    def orchestrate_workflow(client, url):
        # Step 1: Scrape the website
        scraped_content = scrape_website(url)

        # Step 2: Analyze the scraped content
        messages = [
            {"role": "system", "content": "You are an agent that analyzes content and extracts key insights."},
            {"role": "user", "content": f"Analyze the following content: {scraped_content}"}
        ]
        response = client.chat.completions.create(
            model="gpt-4o-mini-2024-07-18",
            messages=messages
        )
        analysis_summary = response.choices[0].message.content

        # Step 3: Write the summary based on the analysis
        messages = [
            {"role": "system", "content": "You are an agent that writes summaries of research."},
            {"role": "user", "content": f"Write a summary based on this analysis: {analysis_summary}"}
        ]
        response = client.chat.completions.create(
            model="gpt-4o-mini-2024-07-18",
            messages=messages
        )
        final_summary = response.choices[0].message.content
        return final_summary
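The `client` argument is not defined in the snippet above. Because the calls follow the OpenAI chat-completions shape, one plausible way to build it is with the `openai` Python package pointed at the AI/ML API. This is a sketch under assumptions: the base URL `https://api.aimlapi.com/v1` and the helper names `client_settings` and `make_client` are not taken from this post, so check the AI/ML API documentation before relying on them.

```python
import os

# Assumption: the AI/ML API exposes an OpenAI-compatible endpoint at this URL.
AIMLAPI_BASE_URL = "https://api.aimlapi.com/v1"


def client_settings(api_key=None) -> dict:
    """Resolve the API key (argument first, then environment) and base URL."""
    key = api_key or os.environ.get("AIMLAPI_API_KEY")
    if not key:
        raise ValueError("No AI/ML API key provided or found in AIMLAPI_API_KEY")
    return {"api_key": key, "base_url": AIMLAPI_BASE_URL}


def make_client(api_key=None):
    """Build a chat-completions client (hypothetical helper)."""
    # Imported here so the settings logic can be used without the package installed.
    from openai import OpenAI

    return OpenAI(**client_settings(api_key))
```

With such a helper, the workflow could be started with `orchestrate_workflow(make_client(), url)`.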
Customization
Using Different Models
You can change the models used in the agents by modifying the model parameter in the orchestrate_workflow function. The AI/ML API supports multiple models, allowing you to experiment with different models for each task:
- Scraping Agent: Modify the scraping agent to handle different types of content or preprocess the data differently.
- Analysis Agent: Choose a model that best suits your analysis needs, such as summarization or topic extraction.
- Summary Agent: Use a model that generates detailed, concise, or creative summaries depending on your goal.
Modify Agents
Each agent is highly customizable. Adjust the instructions or add new functions for more advanced workflows.
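One way to make this kind of customization easier is to keep each agent's model and instructions in a small data structure, so that swapping either one is a one-line change. The sketch below is an illustration only: the `Agent` class, `run_agent`, `run_pipeline`, and the `ANALYST`/`WRITER` constants are hypothetical names, and `client` is assumed to expose the same `chat.completions.create` call used in the code example.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical container for one step's model and instructions."""
    name: str
    model: str
    instructions: str


def run_agent(client, agent: Agent, user_content: str) -> str:
    """Send one chat-completion request on behalf of an agent."""
    response = client.chat.completions.create(
        model=agent.model,
        messages=[
            {"role": "system", "content": agent.instructions},
            {"role": "user", "content": user_content},
        ],
    )
    return response.choices[0].message.content


# Change the model string or the instructions here to customize an agent.
ANALYST = Agent(
    name="analyst",
    model="gpt-4o-mini-2024-07-18",
    instructions="You are an agent that analyzes content and extracts key insights.",
)
WRITER = Agent(
    name="writer",
    model="gpt-4o-mini-2024-07-18",
    instructions="You are an agent that writes summaries of research.",
)


def run_pipeline(client, scraped_content: str) -> str:
    """Analyze the scraped text, then summarize the analysis."""
    analysis = run_agent(client, ANALYST, f"Analyze the following content: {scraped_content}")
    return run_agent(client, WRITER, f"Write a summary based on this analysis: {analysis}")
```

Because `run_agent` only depends on the shape of the client, it can also be exercised with a stand-in object during testing, without spending API credits.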
Future Improvements
- Advanced Scraping: Improve the scraper to handle dynamic content (e.g., JavaScript-heavy sites).
- More Detailed Analysis: Expand the analysis to include sentiment analysis or categorization.
- Multilingual Support: Extend the app to support scraping, analyzing, and summarizing content in multiple languages.
- CAPTCHA Handling: Add support for bypassing or manually entering CAPTCHAs when scraping protected websites.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contact
Developed by: Jad Tounsi El Azzoiani
GitHub: Jad Tounsi El Azzoiani
LinkedIn: Jad Tounsi El Azzoiani
Jad Tounsi | Sciencx (2024-10-20T21:13:23+00:00) Multi-Agent System for ANY AI/ML Model: Web Scraping & Content Analysis Powered by the AI/ML API. Retrieved from https://www.scien.cx/2024/10/20/multi-agent-system-for-%f0%9f%9a%80-any-ai-ml-model-%f0%9f%8c%90-web-scraping-%f0%9f%93%9d-content-analysis-powered-by-the-%f0%9f%94%97-ai-ml-api/