Cesar D. Rodas, web development. Technology news. PHP, MySQL, Apache, C, Bash, ASM

PHP: Guess the language of a given text

There is an open-source project called LibTextCat. I’ve used it in many projects with great results. It takes a text as input and returns the language in which that text is written.

It is a great fit for web crawlers and other projects that need text categorization. The only “problem” (which is not really a problem) is that I haven’t found a PHP port of it.

So I’ve implemented the LibTextCat algorithm in PHP, and the result can be found here.

This package also earned me my first nomination for the PHPClasses innovation award.

A text can be written in many different idioms. Without a prior knowledge of the idiom on which a text is written, it is hard for a human to guess and eventually use an appropriate idiom translation tool.

This class can be used to guess the idiom of a text. It takes prebuilt data files that are used to give different weights to the presence of certain characters in a text that are more associated to an idiom.

This way the class can give a good idea of the idioms on which a given text is more likely to be written.

Manuel Lemos
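
To make the idea in the description above more concrete: LibTextCat is based on the n-gram approach of Cavnar and Trenkle, where each language has a profile of its most frequent character sequences, and a text is compared against every profile with an "out-of-place" distance. What follows is only a minimal sketch of that technique, not the code of this class. The function names (ngramProfile, outOfPlace, guessLanguage), the limit of 400 n-grams, and the use of short sample texts instead of prebuilt data files are all assumptions made for illustration.

```php
<?php
// Illustrative sketch only. These functions are hypothetical and are NOT
// part of SaddorLibTextCat or LibTextCat.

// Build a profile: the text's 1- to $maxN-character n-grams, most frequent first.
function ngramProfile(string $text, int $maxN = 3, int $limit = 400): array
{
    // Lowercase (UTF-8 aware when mbstring is available).
    $text = function_exists('mb_strtolower') ? mb_strtolower($text, 'UTF-8') : strtolower($text);
    // Replace every run of non-letters with "_" so word boundaries are kept.
    $text = '_' . (preg_replace('/[^\p{L}]+/u', '_', $text) ?? '') . '_';
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    $len = count($chars);
    for ($n = 1; $n <= $maxN; $n++) {
        for ($i = 0; $i + $n <= $len; $i++) {
            $gram = implode('', array_slice($chars, $i, $n));
            $counts[$gram] = ($counts[$gram] ?? 0) + 1;
        }
    }
    arsort($counts); // highest count first
    return array_slice(array_keys($counts), 0, $limit);
}

// Out-of-place distance: sum of rank differences between the two profiles.
// An n-gram missing from the language profile costs a fixed $penalty.
function outOfPlace(array $docProfile, array $langProfile, int $penalty = 400): int
{
    $rankOf = array_flip($langProfile); // n-gram => rank
    $distance = 0;
    foreach ($docProfile as $rank => $gram) {
        $distance += isset($rankOf[$gram]) ? abs($rank - $rankOf[$gram]) : $penalty;
    }
    return $distance;
}

// Rank candidate languages for $text; the smallest distance is the best guess.
// $samples maps a language name to a sample text (an assumption of this
// sketch; the real class reads prebuilt data files instead).
function guessLanguage(string $text, array $samples): array
{
    $doc = ngramProfile($text);
    $ranking = [];
    foreach ($samples as $lang => $sample) {
        $ranking[$lang] = outOfPlace($doc, ngramProfile($sample));
    }
    asort($ranking); // ascending: best match first
    return $ranking;
}
```

With realistic profiles built from large texts, the first key of the returned array would be the most likely language. Tiny samples like the ones you might pass here by hand are only good enough to see the mechanism at work.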

Here is a little example of how it works:


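<?php
// Load the class and create an instance.
include "saddorlibtextcat.php";
$libtext = new SaddorLibTextCat();

// Analyze the text; the result is stored in $libtext->ranking.
$libtext->WhatLang("This is a text in english, so the first option when you print the array of ranking it has to be english!!!, so is it work???");

// Print the ranking of candidate languages.
print "<pre>";
print_r($libtext->ranking);
print "</pre>";

?>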

The project is currently in the public domain.

Download it.


