Cesar D. Rodas, web development. Technology news. PHP, MySQL, Apache, C, Bash, ASM

Archive for March, 2008

Contributing with WordPress

Sunday, March 23rd, 2008

As you may know, Google Summer of Code 2008 is about to begin. Many projects take part, and you can contribute by writing free software for them (most of the time by extending functionality or writing plugins). You can send proposals to any of these projects, and if your proposal is accepted, you get to work on the project.

I have written two proposals (and I hope they get accepted) to write applications for the WordPress blog publishing system.

I will talk a little about my proposals; please leave your feedback if you can.

WordPress Cache

This project aims to help blogs with huge traffic. Unlike most caches, which focus only on the server side, it will cache on both the server and the client, because a server-side cache offloads the server but doesn’t save bandwidth.

Project features:

  • Cache in client.

    • The cache will send cache headers with the TTL to the client, so the browser can cache the contents. The browser can also send a special request to check whether its cached copy is still valid.

  • Cache in server

    • Serves compressed and uncompressed pages: this is useful to save bandwidth, and it will not overload the server by compressing on every request, because the cache will hold two versions of each page, one compressed and one uncompressed, and will select which one to serve.

    • Storage independent: it will come with file system (cache entries are plain files), APC and Memcached storage support. Any developer will be free to write a storage driver by defining a simple class with a few methods.
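To make the features above a bit more concrete, here is a rough sketch of how they could fit together. It is written in Python only for illustration (the real plugin would be PHP), and every name in it (DictStorage, PageCache, store, serve, the key suffixes) is made up for this example and is not part of the proposal. It assumes a storage driver is just an object with get and set methods, that both page variants are built once at store time, and that the "special request" is an HTTP conditional request using If-None-Match.

```python
import gzip
import hashlib


class DictStorage:
    """Hypothetical in-memory driver; a file, APC or Memcached driver
    would only need to offer the same get/set methods."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class PageCache:
    def __init__(self, storage, ttl=300):
        self.storage = storage
        self.ttl = ttl  # seconds the browser may reuse its own copy

    def store(self, url, html):
        raw = html.encode("utf-8")
        # Compress once here, so serving never compresses per request.
        self.storage.set(url + "|raw", raw)
        self.storage.set(url + "|gzip", gzip.compress(raw))
        # Weak validator shared by both variants of the page.
        etag = 'W/"%s"' % hashlib.md5(raw).hexdigest()
        self.storage.set(url + "|etag", etag)

    def serve(self, url, request_headers):
        """Return (status, headers, body), or None on a cache miss."""
        etag = self.storage.get(url + "|etag")
        if etag is None:
            return None
        headers = {
            "ETag": etag,
            "Cache-Control": "max-age=%d" % self.ttl,
            "Vary": "Accept-Encoding",
        }
        # The browser asks whether its cached copy is still valid.
        if request_headers.get("If-None-Match") == etag:
            return 304, headers, b""
        # Pick the stored variant the client can accept.
        if "gzip" in request_headers.get("Accept-Encoding", ""):
            headers["Content-Encoding"] = "gzip"
            return 200, headers, self.storage.get(url + "|gzip")
        return 200, headers, self.storage.get(url + "|raw")
```

A 304 answer has an empty body, which is where the bandwidth saving on the client side would come from; the pre-compressed variant saves bandwidth on every full response without extra CPU work per request.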

 

Category suggester

This project is an idea I got while reading the paper “N-Gram-Based Text Categorization” written by William B. Cavnar and John M. Trenkle. The idea is to suggest categories for a new post while it is being written, based on its similarity to the categories of previous posts.

Using n-grams (an n-gram is a sequence of n letters) instead of words has many advantages:

  • Language independent: there is no need for a stemming algorithm to reduce words to their root (e.g. working, worked → work). It also doesn’t work only for English, since the algorithm learns from previous posts.

  • Easy tokenization: there is a single way to parse the text and extract features; since it does not work with words, it doesn’t care about the language.

  • Perfect for blogs: most authors do not write their posts with a program that checks spelling, so most posts contain errors. N-grams are tolerant of spelling errors.

The project will be based on Cavnar’s work and will also extend it using a Naive Bayes classifier and other methods used today to classify text.
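As a rough idea of how the n-gram part could work, here is a small sketch in the spirit of the rank-based profiles from the Cavnar and Trenkle paper. It is Python for illustration only, and it is a simplification under my own assumptions: the function names, the "_" padding character, n-grams of length 1 to 3 and the profile size of 300 are all choices made for this example. The Naive Bayes extension is not shown.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top=300):
    """Map the most frequent character n-grams (length 1..n_max) to their rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category pay a fixed penalty."""
    max_penalty = len(cat_profile)
    total = 0
    for gram, rank in doc_profile.items():
        if gram in cat_profile:
            total += abs(rank - cat_profile[gram])
        else:
            total += max_penalty
    return total


def suggest_category(draft, posts_by_category):
    """Return the category whose previous posts are closest to the draft."""
    doc = ngram_profile(draft)
    distances = {
        category: out_of_place(doc, ngram_profile(" ".join(posts)))
        for category, posts in posts_by_category.items()
    }
    return min(distances, key=distances.get)
```

It would be called with the text of the draft and a dictionary that maps each category name to the list of its previous posts. A misspelled word still shares most of its n-grams with the correct spelling, so the distance changes only a little.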
