Using pysimilar to compute similarity between texts


This content originally appeared on DEV Community and was authored by Jordan Kalebu

Hi guys,
I recently wrote an article titled "How to detect plagiarism in text using Python", in which I showed how you can easily detect plagiarism between documents manually, as the title says, using cosine similarity.

I republished that article on multiple platforms, including here on dev.to and on Hackernoon. It's one of my most-viewed articles, and its repository is the most-starred of my article repositories on GitHub.

That made me think again about refactoring the code and the article so they are easier and friendlier to get started with, even for absolute beginners. This led me to build a Python library, pysimilar, which I can say simplifies the process to the maximum.

Getting started with Pysimilar

To get started comparing text documents with pysimilar, you first need to install it, either directly from GitHub or with pip.

Here's how to install pysimilar using pip:

$ pip install pysimilar

Here's how to install it directly from GitHub:

$ git clone https://github.com/Kalebu/pysimilar
$ cd pysimilar
$ python setup.py install

With pysimilar, you can either compare text documents passed directly as strings or specify the paths to files containing the text documents.

Comparing strings directly

You can easily compare strings with pysimilar's compare() function, as illustrated below:

>>> from pysimilar import compare
>>> compare('very light indeed', 'how fast is light')
0.17077611319011649
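As a rough check on where 0.1708 comes from, here is the arithmetic for this example. It assumes pysimilar uses scikit-learn's TfidfVectorizer defaults: lowercasing, tokens of two or more word characters (so "is" counts), raw term counts, and the smoothed idf below with n = 2 documents. That assumption is not stated in the article, but under it the numbers agree with the output shown. Terms missing from a document get weight 0.

```latex
\begin{aligned}
\operatorname{idf}(t) &= \ln\frac{1+n}{1+\operatorname{df}(t)} + 1, \qquad n = 2\\
\operatorname{idf}(\text{light}) &= \ln\tfrac{3}{3} + 1 = 1, \qquad
\operatorname{idf}(\text{any other term}) = \ln\tfrac{3}{2} + 1 \approx 1.4055\\
\mathbf{a} &: \text{very}\mapsto 1.4055,\ \text{light}\mapsto 1,\ \text{indeed}\mapsto 1.4055\\
\mathbf{b} &: \text{how}\mapsto 1.4055,\ \text{fast}\mapsto 1.4055,\ \text{is}\mapsto 1.4055,\ \text{light}\mapsto 1\\
\mathbf{a}\cdot\mathbf{b} &= 1 \cdot 1 = 1 \qquad (\text{only ``light'' is shared})\\
\lVert\mathbf{a}\rVert &= \sqrt{2\,(1.4055)^2 + 1} \approx 2.2250, \qquad
\lVert\mathbf{b}\rVert = \sqrt{3\,(1.4055)^2 + 1} \approx 2.6317\\
\cos(\mathbf{a},\mathbf{b}) &= \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
\approx \frac{1}{2.2250 \times 2.6317} \approx 0.1708
\end{aligned}
```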

Comparing strings contained in files

To compare strings contained in files, you just need to explicitly set the isfile parameter to True, as illustrated below:

>>> compare('README.md', 'LICENSE', isfile=True)
0.25545580376557886

Well, that's all for this article.

GitHub: Kalebu / pysimilar

A Python library for computing the similarity between two strings (texts) based on cosine similarity.


How does it work?

It uses a TF-IDF vectorizer to transform the texts into vectors, converts the resulting vectors into arrays of numbers, and finally computes the cosine similarity between them; the output indicates how similar the texts are.
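The pipeline described above can be sketched without any dependencies. This is a minimal from-scratch version, assuming pysimilar follows scikit-learn's TfidfVectorizer defaults (lowercasing, tokens of two or more word characters, raw term counts, smoothed idf). The names tokenize, tfidf_vectors, cosine and text_similarity are hypothetical and are not part of pysimilar's API.

```python
import math
import re
from collections import Counter

# Assumption: mirrors scikit-learn's TfidfVectorizer defaults, which the
# README's "Tfidf Vectorizer" hints at; pysimilar's real internals may differ.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # words of 2+ word characters


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep tokens of two or more word characters."""
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    # Smoothed idf, as in scikit-learn's default smooth_idf=True.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    # Raw term count times idf gives each term's weight.
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(text_a: str, text_b: str) -> float:
    """Hypothetical stand-in for pysimilar's compare() on two strings."""
    vec_a, vec_b = tfidf_vectors([text_a, text_b])
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(text_similarity("very light indeed", "how fast is light"))
```

Running it should print roughly 0.1708, in line with the compare() example earlier in the article. The idf is computed over just the two documents being compared, which is why a word shared by both ("light") gets the lowest weight.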

APA

Jordan Kalebu | Sciencx (2021-04-07T19:48:42+00:00) Using pysimilar to compute similarity between texts. Retrieved from https://www.scien.cx/2021/04/07/using-pysimilar-to-compute-similarity-between-texts/