URL -> Markdown

This content originally appeared on DEV Community and was authored by Krinskumar Vaghasia

Scrappy

This week, I started my own open source project called Scrappy. Scrappy is a command-line tool that converts any website that can be scraped into Markdown. It's a simple classroom project that fetches the website from a URL, extracts the body, and converts the body into Markdown using an LLM. The magic comes after this, when everyone, along with me, contributes to the project to make it better and add more functionality.
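To make the three steps concrete, here is a rough sketch of the fetch, extract, convert pipeline in Python. This is not Scrappy's actual code: the language, the function names (`fetch_html`, `extract_body`, `url_to_markdown`) and the prompt text are all assumptions made for illustration, and the LLM is passed in as a plain callable so that no particular provider or API is implied.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.request import urlopen


class BodyTextExtractor(HTMLParser):
    """Collects the text found inside <body>, skipping scripts and styles."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.in_body and self.skip_depth == 0 and text:
            self.chunks.append(text)


def fetch_html(url: str) -> str:
    """Step 1: download the page (hypothetical helper)."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_body(html: str) -> str:
    """Step 2: keep only the visible text of the <body>."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def url_to_markdown(
    url: str,
    llm: Callable[[str], str],
    fetch: Callable[[str], str] = fetch_html,
) -> str:
    """Step 3: hand the body to an LLM callable that returns Markdown."""
    body = extract_body(fetch(url))
    # The prompt wording is an assumption, not Scrappy's real prompt.
    prompt = "Convert the following web page content into Markdown:\n\n" + body
    return llm(prompt)
```

In this sketch `llm` can be any function that takes a prompt string and returns text, which keeps the pipeline easy to try out with a stub before wiring in a real model.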

Features

  • Input: The main feature is that you can convert any website into Markdown. For this, Scrappy needs the URL of the page. You can provide the URL either in a file or as a command-line argument.


# with url in the file
scrappy files/input.txt
# with url in the args
scrappy --url "https://www.senecapolytechnic.ca/cgi-bin/subject?s1=OSD600"
or
scrappy -u "https://www.senecapolytechnic.ca/cgi-bin/subject?s1=OSD600"
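The two ways of passing the URL could be handled with something like the sketch below, written in Python with the standard `argparse` module. It is only an illustration under stated assumptions: `build_parser` and `resolve_url` are hypothetical names, and the input file is assumed to contain nothing but the URL.

```python
import argparse
from pathlib import Path
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scrappy")
    # Optional positional argument: a text file that contains the URL.
    parser.add_argument("input_file", nargs="?", help="file containing the URL")
    # The same URL can instead be given directly with --url or -u.
    parser.add_argument("-u", "--url", help="URL of the page to convert")
    return parser


def resolve_url(argv: Sequence[str]) -> str:
    """Return the URL from --url/-u if present, otherwise from the input file."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.url:
        return args.url
    if args.input_file:
        # Assumption: the file holds only the URL, possibly with whitespace.
        return Path(args.input_file).read_text(encoding="utf-8").strip()
    parser.error("provide an input file or --url")
```

Here `resolve_url(["files/input.txt"])` would read the URL from the file, while `resolve_url(["-u", "<some url>"])` would return the URL given on the command line.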

  • Output: The converted Markdown can be stored in a file of your choice if you pass that file with the -0 flag.


# the md will be saved in output.md in this case
scrappy files/input.txt -0 files/output
or
scrappy files/input.txt --output files/output
# with no output file, a new file is created in the same directory
scrappy files/input.txt
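One way to decide where the Markdown ends up is sketched below in Python. The rules in it are guesses based on the commands above, not Scrappy's confirmed behaviour: `resolve_output_path` and `save_markdown` are hypothetical names, a missing `.md` extension is assumed to be appended, and the default file is assumed to sit next to the input file with the same base name.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(input_file: str, output: Optional[str] = None) -> Path:
    """Pick the file the Markdown is written to."""
    if output is None:
        # Assumed default: same directory and base name as the input file.
        return Path(input_file).with_suffix(".md")
    path = Path(output)
    # Assumed rule: "files/output" becomes "files/output.md".
    if path.suffix == ".md":
        return path
    return path.with_name(path.name + ".md")


def save_markdown(markdown: str, destination: Path) -> Path:
    """Write the Markdown, creating the parent directory if needed."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_text(markdown, encoding="utf-8")
    return destination
```

With these assumptions, `resolve_output_path("files/input.txt", "files/output")` points at `files/output.md`, and leaving the output out points at `files/input.md`.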

Example

To use the tool, call it with the input file that contains the link; the output file is optional. This writes the Markdown for the link that the input file contains, as shown in the GIF below.

Image description


Krinskumar Vaghasia | Sciencx (2024-09-21T03:25:40+00:00) URL -> Markdown. Retrieved from https://www.scien.cx/2024/09/21/url-markdown/