Monday, May 2, 2011

Creating a Word Density Checker Function - PHP

While looking for help writing my keyword density checker class, I found this function, and it is worth sharing. Have a look.


The code for the word density function is as follows:



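<?php
function calculate_word_popularity($string, $min_word_char = 2, $exclude_words = array())
{
    $string = strip_tags($string);

    // Total word count of the original text; every percentage is relative to it
    $total_words = count(str_word_count($string, 1));

    if ($total_words === 0) {
        return array(); // nothing to count, and avoids a division by zero below
    }

    $new_string = $string;

    foreach ($exclude_words as $filter_word) {
        // strip excluded words; preg_quote() escapes any regex metacharacters in the word
        $new_string = preg_replace('/\b' . preg_quote($filter_word, '/') . '\b/i', '', $new_string);
    }

    $words_array = str_word_count($new_string, 1);

    // Keep only words of at least $min_word_char characters.
    // A closure replaces create_function(), which was removed in PHP 8.
    $words_array = array_filter($words_array, function ($var) use ($min_word_char) {
        return strlen($var) >= $min_word_char;
    });

    $popularity = array();

    // Lowercase before de-duplicating so "The" and "the" are reported once
    // (the match below is case-insensitive anyway)
    $unique_words_array = array_unique(array_map('strtolower', $words_array));

    foreach ($unique_words_array as $key => $word) {
        preg_match_all('/\b' . preg_quote($word, '/') . '\b/i', $string, $out);

        $count = count($out[0]);

        $percent = number_format(($count * 100) / $total_words, 2);

        $popularity[$key]['word'] = $word;
        $popularity[$key]['count'] = $count;
        $popularity[$key]['percent'] = $percent . '%';
    }

    // Sort by count, lowest first, as in the original; wrap the result in
    // array_reverse() for most frequent first. An inline closure avoids the
    // "cannot redeclare cmp()" error a nested named function causes on a second call.
    usort($popularity, function ($a, $b) {
        return $a['count'] - $b['count'];
    });

    return $popularity;
}
?>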


"This is a function that is meant to calculate the density of the words from a text. Since there are many words that have fewer than 3 characters, I’ve decided to add a filter that will not take into account words that aren’t longer than (X) characters (examples: if, or, is, it etc.). Also, you can set up an array with a list of words that you do not want to add in the ranking calculation." - from http://www.bitrepository.com/word-popularity-script.html
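
To make the two filters described above concrete (a minimum word length and a list of excluded words), here is a minimal sketch of the same idea done in a single pass with array_count_values(). It is not the function from bitrepository.com: the name word_density_sketch() and its parameters are made up for this illustration, it counts whole tokens rather than regex matches, and it assumes plain ASCII text because str_word_count() does not handle multibyte words.

```php
<?php
// Hypothetical helper for illustration only; not part of the original script.
function word_density_sketch($text, $min_len = 3, array $exclude = array())
{
    // Lowercase, strip markup and split into words (ASCII-style words only).
    $words = str_word_count(strtolower(strip_tags($text)), 1);
    $total = count($words);

    if ($total === 0) {
        return array(); // empty input: nothing to rank
    }

    $exclude = array_map('strtolower', $exclude);

    // Count every distinct word once, then drop short and excluded words.
    $result = array();
    foreach (array_count_values($words) as $word => $count) {
        if (strlen($word) < $min_len || in_array($word, $exclude, true)) {
            continue;
        }
        $result[$word] = array(
            'count'   => $count,
            // Density is relative to ALL words in the text, not only the kept ones.
            'percent' => number_format($count * 100 / $total, 2) . '%',
        );
    }

    // Most frequent words first.
    uasort($result, function ($a, $b) {
        return $b['count'] - $a['count'];
    });

    return $result;
}

$text = 'The cat sat on the mat. The cat is happy.';
$density = word_density_sketch($text, 3, array('the'));
print_r($density);
```

With this sample sentence, "cat" appears 2 times out of 10 words, so it comes first with a density of 20.00%. "the" is dropped by the exclude list, and "on" and "is" are dropped by the length filter.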

Thanks.
