This content originally appeared on DEV Community and was authored by Krinskumar Vaghasia
Scrappy
This week, I started my own open source project called scrappy. Scrappy is a command-line tool that converts any website that can be scraped into Markdown. It's just a normal classroom project that fetches the website using a URL -> extracts the body -> converts the body into MD using an LLM. The magic comes after this: everyone, along with me, will contribute to the project to make it better and add more functionality.
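The post doesn't say what language scrappy is written in or how each step is implemented, so the following is only a hypothetical Python sketch of the fetch -> extract body -> convert pipeline. The names `fetch_html`, `extract_body`, and `url_to_markdown` are illustrative assumptions, the body is pulled out with a naive regex (a real HTML parser would be more robust), and the LLM call is left as a `to_markdown` callable you would supply yourself.

```python
import re
import urllib.request
from typing import Callable

# Naive pattern (assumption): captures everything between <body ...> and the last </body>.
BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body>", re.IGNORECASE | re.DOTALL)


def fetch_html(url: str) -> str:
    """Download the page at `url`, decoding it as UTF-8 (an assumption; real pages vary)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_body(html: str) -> str:
    """Return the inner HTML of <body>, or the whole document if no body tag is found."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html.strip()


def url_to_markdown(
    url: str,
    to_markdown: Callable[[str], str],
    fetch: Callable[[str], str] = fetch_html,
) -> str:
    """Fetch -> extract body -> convert. `to_markdown` is a hypothetical stand-in for the LLM step."""
    html = fetch(url)
    body = extract_body(html)
    return to_markdown(body)
```

Passing stub functions lets you exercise the pipeline without the network or an LLM:

```python
page = "<html><head><title>t</title></head><body><h1>OSD600</h1></body></html>"
md = url_to_markdown(
    "https://example.com",
    to_markdown=lambda b: b.replace("<h1>", "# ").replace("</h1>", ""),
    fetch=lambda u: page,
)
print(md)  # prints "# OSD600"
```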
Features
- Input: The main feature is that you can convert any website into md. For this, the tool needs the URL of the page, which you can provide either in a file or as a command-line argument.
# with url in the file
scrappy files/input.txt
# with url in the args
scrappy --url https://www.senecapolytechnic.ca/cgi-bin/subject?s1=OSD600
or
scrappy -u https://www.senecapolytechnic.ca/cgi-bin/subject?s1=OSD600
- Output: The converted md can be stored in a file of your choice by passing it with the -0 or --output flag.
# the md will be saved to output.md in this case
scrappy files/input.txt -0 files/output
or
scrappy files/input.txt --output files/output
# without an output file, a new file is created in the same dir
scrappy files/input.txt
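The input and output options above could be handled roughly like this. It is a hypothetical Python sketch using `argparse`, not scrappy's actual code: the helper names `build_parser`, `resolve_url`, and `resolve_output` are made up for illustration, reading the URL from the first non-empty line of the input file is an assumption, and appending `.md` to the output path is inferred from the `files/output` -> `output.md` comment. The post doesn't specify how the default output file is named, so that case is left as `None`.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags described in the post; the real tool's parser may differ.
    parser = argparse.ArgumentParser(prog="scrappy")
    parser.add_argument("input_file", nargs="?", help="file containing the URL")
    parser.add_argument("-u", "--url", help="URL passed directly on the command line")
    parser.add_argument("-0", "--output", help="where to save the converted Markdown")
    return parser


def resolve_url(args: argparse.Namespace) -> str:
    """Prefer --url/-u; otherwise read the first non-empty line of the input file (assumption)."""
    if args.url:
        return args.url
    if args.input_file:
        text = Path(args.input_file).read_text(encoding="utf-8")
        for line in text.splitlines():
            if line.strip():
                return line.strip()
        raise ValueError(f"{args.input_file} contains no URL")
    raise ValueError("provide an input file or --url/-u")


def resolve_output(args: argparse.Namespace) -> Optional[Path]:
    """Append .md when missing, so files/output becomes files/output.md (assumption)."""
    if args.output is None:
        return None  # default file naming is not specified in the post
    out = Path(args.output)
    return out if out.suffix == ".md" else out.with_name(out.name + ".md")
```

For example, `resolve_url(build_parser().parse_args(["--url", "https://example.com/a"]))` returns the URL unchanged, while passing an input file path instead makes the sketch read the URL from that file.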
Example
To use the tool, just call it with the input file that contains the link; the output file is optional. The tool then produces the md for the link in the input file, as shown in the gif below.
Krinskumar Vaghasia | Sciencx (2024-09-21T03:25:40+00:00) URL -> Markdown. Retrieved from https://www.scien.cx/2024/09/21/url-markdown/