Comparing Language Detection Libraries (& API) Using Java/ColdFusion/CFML


This content originally appeared on DEV Community and was authored by James Moberg

Language detection is a feature that we needed in a past project. I wrote an article in 2020 about using the kju2 fork of the Optimaize Language Detector Java library. The Optimaize library hasn't been updated since 2015, and the kju2 fork was placed in read-only mode on Apr 16, 2023.

I evaluated the Lingua Java library. It claims to be "The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike" and also appears to be actively updated and supported. In my small unit test, Lingua was slower than kju2 on most samples and couldn't identify the Malay text (it returned UNKNOWN).

The detection time for both Java libraries varied widely for English. Sometimes a call would return in 295 ms and other times it would take 48,000+ ms. (Maybe it's just my developer PC.) kju2 seemed to be faster on average.
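One common cause of this kind of spread on the JVM is first-call cost (class loading, JIT compilation, resources loaded on first use), so a single timed call can be misleading. The following is a minimal sketch of a warm-up-then-median timing harness, not the code used for the table. The `DetectionTimer` class and the `detector` function are hypothetical stand-ins; a real test would wrap a call into kju2 or Lingua in the `Function`.

```java
import java.util.Arrays;
import java.util.function.Function;

class DetectionTimer {

    /** Returns the median of the given timings (nanoseconds). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /**
     * Calls the detector `warmups` times without timing, then times `runs`
     * further calls and returns each elapsed time in nanoseconds.
     */
    static long[] time(Function<String, String> detector, String text,
                       int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            detector.apply(text);
        }
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            detector.apply(text);
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: replace with a call into kju2 or Lingua.
        Function<String, String> detector = text -> "ENGLISH";
        long[] elapsed = time(detector, "A great way to learn Spanish vocabulary", 5, 21);
        System.out.println("median ms: " + median(elapsed) / 1_000_000.0);
    }
}
```

Reporting the median of several runs rather than one measurement keeps a single slow call from dominating the comparison.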

I also found a third-party Detect Language API that supports 165 languages and claims to have "high accuracy". It requires an API key and offers both free & premium plans.
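As a rough sketch of what a call to such an API could look like from Java 11+, the snippet below only builds the HTTP request. The endpoint URL, the `q` form parameter, and the Bearer-token header are assumptions based on my recollection of the service's public documentation, not something stated in this article, so check them against the current API docs. The `DetectLanguageRequest` class name is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class DetectLanguageRequest {

    // Assumption: endpoint as recalled from the service docs; verify before use.
    static final String ENDPOINT = "https://ws.detectlanguage.com/0.2/detect";

    /** Builds (but does not send) a form-encoded POST carrying the text to detect. */
    static HttpRequest build(String apiKey, String text) {
        // Assumption: the text is passed as a form field named "q".
        String form = "q=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                // Assumption: the API key is sent as a Bearer token.
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("YOUR_API_KEY", "Hello World");
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient()
        //     .send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Unlike the two libraries, every detection here is a network round trip, which is worth keeping in mind when reading the API timings.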

If you're performing language detection with Java and/or ColdFusion/CFML, what are you using?

| Should Be | kju2 lang | kju2 ms | lingua lang | lingua ms | api lang | api ms | Sample Text |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ENGLISH | ENGLISH | 2272 | ENGLISH | 570 | en | 537 | A great way to learn Spanish vocabulary is by reading texts, stories or articles that are completely in the language. That is why we have written are own short reading passages in Spanish about different topics. |
| GREEK | GREEK | 6 | GREEK | 12 | el | 105 | Βίβλος γενέσεως Ἰησοῦ Χριστοῦ υἱοῦ Δαυεὶδ υἱοῦ Ἀβραάμ. |
| FRENCH | FRENCH | 61 | FRENCH | 78 | fr | 70 | En hiver, il fait froid en France. Le soleil se lève tard. Il fait encore nuit quand je vais au travail. Parfois, il y a même de la neige. |
| HEBREW | HEBREW | 3 | HEBREW | 11 | iw | 110 | כל ישראל יש להם חלק לעולם הבא, שנאמר ועמך כולם צדיקים, לעולם יירשו ארץ, נצר מטעי מעשה ידי להתפאר. |
| ARABIC | ARABIC | 2 | ARABIC | 19 | ar | 81 | عندما يريد العالم أن ‪يتكلّم ‬ ، فهو يتحدّث بلغة يونيكود. تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود |
| CHINESE | CHINESE | 2 | CHINESE | 8 | zh | 74 | 虽然它长得不好看,但是它有一颗无比善良的心。小猴子乐乐的家被大水冲垮了,无家可归。丑丑就让乐乐住在自己的家,还把自己最喜欢吃的巧克力分给乐乐吃。不仅如此,谁头痛、生病了,没钱买药,它都会尽其所能进行帮助。 |
| KOREAN | KOREAN | 12 | KOREAN | 3 | ko | 120 | 안녕하십니까 할리데이비슨 대구점 MC 우제헌입니다. 포티에잇 문의 전달받고 전화 드렸습니다만 연결되지 않아 문자 드립니다. |
| SPANISH | SPANISH | 0 | SPANISH | 116 | es | 92 | Habitualmente este término se aplica a todas las pistas donde aterrizan aviones, sin embargo el término correcto es aeródromo. |
| THAI | THAI | 1 | THAI | 14 | th | 105 | ข้อ 1 มนุษย์ทั้งหลายเกิดมามีอิสระและเสมอภาคกันในเกียรติศักด[เกียรติศักดิ์]และสิทธิ ต่างมีเหตุผลและมโนธรรม และควรปฏิบัติต่อกันด้วยเจตนารมณ์แห่งภราดรภาพ |
| VIETNAMESE | VIETNAMESE | 2 | VIETNAMESE | 14 | vi | 98 | Tất cả mọi người sinh ra đều được tự do và bình đẳng về nhân phẩm và quyền lợi. Mọi con người đều được tạo hóa ban cho lý trí và lương tâm và cần phải đối xử với nhau trong tình anh em. |
| TURKISH | TURKISH | 3 | TURKISH | 93 | tr | 212 | Yukarda mavi gök, asağıda yağız yer yaratıldıkta; ikisinin arasında insan oğlu yaratılmış. İnsan oğulları üzerine ecdadım Bumın hakan, İstemi hakan tahta oturmuş; oturarak Türk milletinin ülkesini, türesini, idare edivermiş, tanzim edivermis. Dört taraf hep düşman imiş. Asker sevk edip dört taraftaki kavmi hep (itaati altına) almış hep muti kılmış. Başlılara baş eğdirmiş, dizlilere diz çöktürmüş. |
| JAPANESE | JAPANESE | 3 | JAPANESE | 7 | ja | 194 | 幸運こううんにも、息子むすこはこの四月しがつから保育園ほいくえんに入はいることができ、私わたしはまた働はたらき始はじめた。 |
| RUSSIAN | RUSSIAN | 2 | RUSSIAN | 1393 | ru | 96 | Все люди рождаются свободными и равными в своем достоинстве и правах. Они наделены разумом и совестью и должны поступать в отношении друг друга в духе братства. |
| FINNISH | FINNISH | 6 | FINNISH | 383 | fi | 125 | Jokaisella on oikeus saada opetusta. Opetuksen on oltava ainakin alkeis- ja perusopetuksen osalta maksutonta. Alkeisopetuksen on oltava pakollinen. Teknistä ja ammattiopetusta on oltava yleisesti saatavilla, ja korkeamman opetuksen on oltava avoinna yhtäläisesti kaikille heidän kykyjensä mukaan. |
| URDU | URDU | 1 | URDU | 22 | ur | 104 | ایک ملک پر سخت گیر بادشاہ حکومت کرتا تھا۔وہ رعایا پر طرح طرح کے ٹیکس عائد کرتا اور ٹیکس کے پیسے دوسرے ممالک میں جاکر فضولیات میں ضائع کرتا۔ |
| THAI | THAI | 1 | THAI | 2 | th | 78 | Hello World สวัสดีชาวโลก |
| MALAY | MALAY | 2 | UNKNOWN | 291 | id | 88 | Pesan moral dari Cerita Rakyat Bawang Merah Bawang Putih adalah Jangan terlalu tamak dan serakah. Setiap orang sudah memiliki rezekinya masing-masing. Orang yang terlalu serakah akan mendapatkan balasan yang setimpal dengan perbuatannya. Selalu berbuat baik lah dalam setiap tingkah laku, maka kita akan mendapatkan kebaikan dan kebahagiaan. |
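The "api lang" values are two-letter codes (`en`, `el`, `iw`, `id`), while the two library columns show uppercase language names, so the results can't be compared as plain strings. One possible way to line them up, sketched here with only the JDK's `java.util.Locale`, is to turn each code into its English display name. The `LanguageCodes` class and `toLibraryName` method are hypothetical names for illustration, and the sketch assumes the resulting display names match the names each library reports.

```java
import java.util.Locale;

class LanguageCodes {

    /** Converts a two-letter language code into an uppercase English name. */
    static String toLibraryName(String isoCode) {
        String name = Locale.forLanguageTag(isoCode)
                .getDisplayLanguage(Locale.ENGLISH);
        return name.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Codes taken from the "api lang" column above.
        for (String code : new String[] {"en", "el", "iw", "id"}) {
            System.out.println(code + " -> " + toLibraryName(code));
        }
    }
}
```

Mapped this way, `iw` (the legacy code for Hebrew) matches the HEBREW row, while `id` resolves to INDONESIAN rather than MALAY, so the API's answer for the last sample is a mismatch against the expected language as well.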

Source Code

https://gist.github.com/JamoCA/b883fbddf0303df8f4b0d597cfc2ae25


James Moberg | Sciencx (2024-09-05T00:00:12+00:00) Comparing Language Detection Libraries (& API) Using Java/ColdFusion/CFML. Retrieved from https://www.scien.cx/2024/09/05/comparing-language-detection-libraries-api-using-java-coldfusion-cfml/