Saturday, April 30, 2011

PHP Simple HTML DOM Parser - an Introduction

PHP Simple HTML DOM Parser is one of the easiest DOM manipulation libraries written in PHP 5+. It is open source and free to use under the MIT License. Because it handles invalid HTML, it is more robust than PHP scripts that rely on complicated regular expressions to extract information from web pages. It lets you find tags on an HTML page using selectors, just like jQuery.

Some usage examples:



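Extracting all the images and links from a web page:
// Include the library (simple_html_dom.php from the project download)
include('simple_html_dom.php');

// Create DOM from URL or file
$html = file_get_html('http://www.microsoft.com/');

// Extract links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';

// Extract images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';

// Free the memory held by the DOM tree when done
$html->clear();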
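The snippet above needs the simple_html_dom.php file from the project. For comparison, here is a minimal sketch of the same link-and-image extraction using PHP's built-in DOM extension (DOMDocument), which is a different library and not part of Simple HTML DOM. It parses a hypothetical inline HTML string instead of fetching a live site, so it runs offline. Note that DOMDocument looks tags up by name with getElementsByTagName rather than with jQuery-like selectors.

```php
<?php
// Hypothetical sample markup so the sketch runs offline (no network access).
$markup = '<html><body>'
    . '<a href="/about">About</a>'
    . '<p>Hello <b>world</b><img src="logo.png"></p>'
    . '<a href="https://example.com/">Example</a>'
    . '</body></html>';

// Collect libxml parse warnings internally instead of printing them;
// real-world pages often contain markup libxml complains about.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

// Gather the href attribute of every <a> tag, in document order.
$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

// Gather the src attribute of every <img> tag, in document order.
$images = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    $images[] = $img->getAttribute('src');
}

echo implode('<br>', $links), '<br>';
echo implode('<br>', $images), '<br>';
```

DOMDocument ships with most PHP builds (the dom extension), so it avoids an extra include, at the cost of a more verbose API than Simple HTML DOM's find().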


Retrieving just the plain text of a web page:

echo file_get_html('http://www.yahoo.com/')->plaintext;
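For comparison, a rough equivalent of ->plaintext can be sketched with PHP's built-in DOMDocument instead of Simple HTML DOM. This is a swapped-in technique, not the library's implementation: it reads the textContent of the body element of a hypothetical inline page. One assumption to keep in mind is that textContent simply concatenates all descendant text nodes, so text from adjacent elements (such as a heading followed by a paragraph) runs together without a separating space, which may differ from what plaintext returns.

```php
<?php
// Hypothetical sample markup; in practice this would be a downloaded page.
$markup = '<html><head><title>Demo</title></head>'
    . '<body><h1>Welcome</h1><p>Plain <em>text</em> only.</p></body></html>';

// Tolerate sloppy real-world HTML without printing libxml warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

// textContent joins every text node under <body>, dropping the tags.
// The <title> lives in <head>, so it is not included.
$body = $doc->getElementsByTagName('body')->item(0);
$text = ($body !== null) ? $body->textContent : '';

echo $text;
```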


And there is a lot more that can be done with it.
Thanks for reading this. I'll be writing more about it in the future; as the title suggests, this is just an introduction, and there is a lot more to come :).
