PHP: Guess the language of a given text
There is an open-source project called LibTextCat. I've used this library in many projects with great results. It receives a text as a parameter and returns the language in which that text is written.
It is a great fit for web crawlers and other kinds of projects that need text categorization. The only "problem" (which is not really a problem) is that I haven't found a PHP port of it.
So I've implemented the LibTextCat algorithm in PHP, and the result can be found here.
This package also earned me my first nomination for the PHPClasses innovation award.
A text can be written in many different idioms. Without a prior knowledge of the idiom on which a text is written, it is hard for a human to guess and eventually use an appropriate idiom translation tool.
This class can be used to guess the idiom of a text. It takes prebuilt data files that are used to give different weights to the presence of certain characters in a text that are more associated to an idiom.
This way the class can give a good idea of the idioms on which a given text is more likely to be written.
Manuel Lemos
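To make the idea in that description more concrete, here is a minimal sketch of how this kind of guessing can work. It is not the code of the class. It follows the ranked character n-gram approach that libtextcat is generally described as using, and every name in it (ngramProfile, outOfPlace, guessLanguage) is hypothetical. The language profiles are built on the fly from two toy sentences instead of prebuilt data files, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: build a ranked character n-gram profile of a text.
// Returns an array mapping n-gram => rank (0 = most frequent).
function ngramProfile(string $text, int $maxN = 3, int $limit = 300): array
{
    // Lowercase and collapse every run of non-letters into a single "_".
    $text = preg_replace('/[^\p{L}]+/u', '_', mb_strtolower($text, 'UTF-8'));
    $text = '_' . trim($text, '_') . '_';

    $counts = [];
    $len = mb_strlen($text, 'UTF-8');
    for ($n = 1; $n <= $maxN; $n++) {
        for ($i = 0; $i + $n <= $len; $i++) {
            $gram = mb_substr($text, $i, $n, 'UTF-8');
            if ($gram === '_') {
                continue; // a lone separator carries no information
            }
            $counts[$gram] = ($counts[$gram] ?? 0) + 1;
        }
    }

    arsort($counts); // most frequent n-grams first
    $top = array_slice(array_keys($counts), 0, $limit);
    return array_flip($top); // n-gram => rank
}

// Hypothetical helper: "out-of-place" distance between two profiles.
// N-grams missing from the language profile cost a fixed penalty.
function outOfPlace(array $doc, array $lang, int $penalty = 300): int
{
    $distance = 0;
    foreach ($doc as $gram => $rank) {
        $distance += isset($lang[$gram]) ? abs($rank - $lang[$gram]) : $penalty;
    }
    return $distance;
}

// Hypothetical helper: rank the candidate languages, closest first.
function guessLanguage(string $text, array $profiles): array
{
    $doc = ngramProfile($text);
    $ranking = [];
    foreach ($profiles as $name => $profile) {
        $ranking[$name] = outOfPlace($doc, $profile);
    }
    asort($ranking); // smallest distance = most likely language
    return $ranking;
}

// Toy profiles; a real setup would load profiles trained on much more text.
$profiles = [
    'english' => ngramProfile('the quick brown fox jumps over the lazy dog and then the dog runs away with the other animals that were there'),
    'spanish' => ngramProfile('el rápido zorro marrón salta sobre el perro perezoso y luego el perro se va con los otros animales que estaban allí'),
];

$ranking = guessLanguage('this is the text that we want to check', $profiles);
print_r($ranking);
```

The first key of the printed array is the best guess. With profiles this small the distances are only illustrative, so results on short or mixed-language input should not be trusted.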
Here is a little example of how it works:
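<?php
// Load the class file that ships with the package.
require_once "saddorlibtextcat.php";

$libtext = new SaddorLibTextCat();

// Analyze the text; the ranking is read from $libtext->ranking afterwards.
$libtext->WhatLang("This is a text in English, so the first option when you print the ranking array has to be English! So, does it work?");

// Print the ranking of candidate languages.
print "<pre>";
print_r($libtext->ranking);
print "</pre>";
?>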
Also, this project is currently in the public domain.
Download it.