PHP
RSS (Rich Site Summary) is an XML application in which individual items (such as news headlines) are marked up as separate elements. This lets other sites fetch the latest news and display it on their own pages in whatever format they wish, following the usual rules of XML data exchange.
RSS version 0.91 was developed by Netscape for its "My Netscape Network" portal. It lets you create an XML file that contains information about the website as well as individual items, each with a "title", "link" and "description". That's great, but once we have received an RSS file, how do we extract the information from it and publish it as HTML? Every language has its own way of doing this; here we'll use PHP with its built-in XML parser. PHP uses James Clark's "expat" library, which you already have if you have installed Apache version 1.3.9 or later. To parse XML documents with PHP, you need to compile PHP from source with the "--with-xml" configure option.
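Before going further, it can be worth confirming that the XML functions are actually present in your PHP build. The following is a minimal sketch; the variable name $hasXml is only an illustration and is not part of the script developed below.

```php
<?php
// Check that the XML extension is compiled into this PHP build
// and that the parser constructor is callable.
$hasXml = extension_loaded('xml') && function_exists('xml_parser_create');

if ($hasXml) {
    echo "XML parser functions are available\n";
} else {
    echo "XML support is missing; rebuild PHP with XML enabled\n";
}
```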
We'll write a simple script that parses an RSS file, extracts information from it, formats it and displays it as regular HTML. It serves as an example of how to parse XML in PHP, and it can also be included in any other script to display this information.
The first thing we need to do is create a class that will store our headlines:
class xItem {
var $xTitle;
var $xLink;
var $xDescription;
}
Then we need to define a few global variables for the basic information about the website, and an array to store the headline objects:
$sTitle = "";
$sLink = "";
$sDescription = "";
$arItems = array();
$itemCount = 0;
Then our first two functions:
function startElement($parser, $name, $attrs) {
global $curTag;
$curTag .= "^$name";
}
function endElement($parser, $name) {
global $curTag;
$caret_pos = strrpos($curTag, "^");
$curTag = substr($curTag,0,$caret_pos);
}
To parse XML in PHP, you need to define functions that are called in these cases:
- the parser encounters the start tag of an element;
- the parser encounters the end tag of an element;
- the parser encounters data between the start and end tags.
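To see the three callbacks in action, here is a small sketch that records every event the parser reports for a tiny, made-up document. The $events array and the sample XML string are assumptions for illustration only; closures are used instead of named functions so the example stays self-contained.

```php
<?php
// Collect every parser event in order so the callback sequence is visible.
$events = [];

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // called for each start tag
    function ($p, $name, $attrs) use (&$events) { $events[] = "start $name"; },
    // called for each end tag
    function ($p, $name) use (&$events) { $events[] = "end $name"; }
);
xml_set_character_data_handler(
    $parser,
    // called for text between tags; whitespace-only chunks are skipped
    function ($p, $data) use (&$events) {
        if (trim($data) !== '') { $events[] = "data " . trim($data); }
    }
);

xml_parse($parser, '<rss><channel><title>News</title></channel></rss>', true);

echo implode("\n", $events), "\n";
```

The start and end events arrive in document order, nested like the tags themselves, with the data event for the title appearing between its start and end events.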
We implement them as follows: the global variable $curTag holds a string containing the current tag and all of its parent tags, separated by ^. For example, for an XML structure like this:
<rss>
<channel>
<item>
the variable $curTag will look like this (the parser converts tag names to upper case by default):
^RSS^CHANNEL^ITEM
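The way this path string grows and shrinks can be tried without a parser at all. The sketch below uses two hypothetical helpers, pushTag and popTag, which are not part of the script; they only mirror what startElement and endElement do to $curTag.

```php
<?php
// Hypothetical helper: what startElement does to the path.
function pushTag(string $path, string $name): string
{
    return $path . '^' . $name;
}

// Hypothetical helper: what endElement does to the path.
// Cuts everything from the last caret onward.
function popTag(string $path): string
{
    $pos = strrpos($path, '^');
    return $pos === false ? '' : substr($path, 0, $pos);
}

$path = '';
foreach (['RSS', 'CHANNEL', 'ITEM'] as $tag) {
    $path = pushTag($path, $tag);
}
echo $path, "\n";          // ^RSS^CHANNEL^ITEM
echo popTag($path), "\n";  // ^RSS^CHANNEL
```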
All we need to do is notice when the parser reaches the right $curTag and extract the data accordingly. All of this is done in the characterData function. Here it is:
function characterData($parser, $data) {
global $curTag; // get the channel information first
global $sTitle, $sLink, $sDescription;
$titleKey = "^RSS^CHANNEL^TITLE";
$linkKey = "^RSS^CHANNEL^LINK";
$descKey = "^RSS^CHANNEL^DESCRIPTION";
if ($curTag == $titleKey) {
$sTitle = $data;
} elseif ($curTag == $linkKey) {
$sLink = $data;
} elseif ($curTag == $descKey) {
$sDescription = $data;
}
// now get the items
global $arItems, $itemCount;
$itemTitleKey = "^RSS^CHANNEL^ITEM^TITLE";
$itemLinkKey = "^RSS^CHANNEL^ITEM^LINK";
$itemDescKey = "^RSS^CHANNEL^ITEM^DESCRIPTION";
if ($curTag == $itemTitleKey) {
// create a new xItem
$arItems[$itemCount] = new xItem();
// set the properties of the new element
$arItems[$itemCount]->xTitle = $data;
} elseif ($curTag == $itemLinkKey) {
$arItems[$itemCount]->xLink = $data;
} elseif ($curTag == $itemDescKey) {
$arItems[$itemCount]->xDescription = $data;
// increase the counter
$itemCount++;
}
}
This function checks whether $curTag contains one of the strings we need and, if so, assigns the data to the corresponding variable. It first extracts the basic information about the website, then checks for items. For each item it creates an xItem, inserts it into the $arItems array and sets its properties from the corresponding data in the RSS file.
Now that the functions are defined, we use PHP's standard way of working with the XML parser:
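One thing to keep in mind: the parser may deliver the text of a single element in several calls to the character data handler (for example around entities such as &amp;), so a plain assignment can end up holding only the last piece. The following is a sketch of a variation that appends instead of overwriting. The variable names $path and $title and the sample document are assumptions made for this illustration.

```php
<?php
$path  = '';   // plays the same role as $curTag
$title = '';   // accumulated text of the channel title

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$path) { $path .= "^$name"; },
    function ($p, $name) use (&$path) {
        $path = substr($path, 0, strrpos($path, '^'));
    }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$path, &$title) {
        if ($path === '^RSS^CHANNEL^TITLE') {
            $title .= $data;   // append each chunk rather than overwrite
        }
    }
);

xml_parse($parser, '<rss><channel><title>Tom &amp; Jerry</title></channel></rss>', true);
echo $title, "\n";   // Tom & Jerry
```

Because the chunks are concatenated, the result is the same whether the parser reports the title in one call or in several.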
// main loop
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($uFile,"r"))) {
die ("Can not get RSS");
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
Everything in the above code that begins with "xml_" is a standard PHP XML function. We tell the parser which of our functions to call when it meets a start tag, an end tag or data; then we open the RSS file (the variable $uFile must be set to the correct RSS file) and run the parser (xml_parse).
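The same error-reporting calls can be tried on a string instead of a remote file, which is handy when experimenting. Below is a minimal sketch; checkXml is a hypothetical helper name, not something the script defines, and it passes true as the last argument of xml_parse because the whole document is supplied in one piece.

```php
<?php
// Hypothetical helper: parse a complete XML string and describe the first
// error found, or return null when the document is well formed.
function checkXml(string $xml): ?string
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    return sprintf(
        'XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

var_dump(checkXml('<a><b/></a>'));     // well formed
var_dump(checkXml('<a><b></a>'));      // mismatched tags
```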
Now that our data has been split into individual variables, it is not difficult to transform it into HTML. The full listing at the end does this, and adds a few variables that control whether descriptions are shown, as well as the font and its size.
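As an illustration of this output step, here is a sketch of a small rendering function. The name renderItems, the array keys and the sample data are assumptions for this example; the script itself stores the same fields on xItem objects. The sketch also escapes the text with htmlspecialchars, which the original script does not do, so treat that as an optional hardening step.

```php
<?php
// Hypothetical renderer: turns parsed items into an HTML fragment.
// Each item is a plain array with 'title', 'link' and 'description' keys.
function renderItems(array $items, bool $showDesc): string
{
    $html = '';
    foreach ($items as $item) {
        // headline as a link
        $html .= sprintf(
            '<a href="%s">%s</a><br>' . "\n",
            htmlspecialchars($item['link']),
            htmlspecialchars($item['title'])
        );
        // optional description, like the $bDesc switch in the full listing
        if ($showDesc) {
            $html .= htmlspecialchars($item['description']) . "<br>\n";
        }
    }
    return $html;
}

$sample = [[
    'title'       => 'Tom & Jerry',
    'link'        => 'http://example.com/a?x=1&y=2',
    'description' => 'A <b>short</b> note',
]];
echo renderItems($sample, true);
```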
When it comes to data exchange, it is hard to find anything to rival XML. Defining an XML format that many people can use (like RSS) is just one of the benefits of this complex but elegant technology.
The full source code follows:
<?php
class xItem {
var $xTitle;
var $xLink;
var $xDescription;
}
// general vars
$sTitle = "";
$sLink = "";
$sDescription = "";
$arItems = array();
$itemCount = 0;
// ********* Start User-Defined Vars ************
// rss url goes here
$uFile = "http://www.wirelessdevnet.com/wirelessnews/rss/dailynews.rss";
// descriptions (true or false) goes here
$bDesc = true;
// font goes here
$uFont = "Verdana, Arial, Helvetica, sans-serif";
$uFontSize = "2";
// ********* End User-Defined Vars **************
function startElement($parser, $name, $attrs) {
global $curTag;
$curTag .= "^$name";
}
function endElement($parser, $name) {
global $curTag;
$caret_pos = strrpos($curTag, "^");
$curTag = substr($curTag,0,$caret_pos);
}
function characterData($parser, $data) {
global $curTag; // get the Channel information first
global $sTitle, $sLink, $sDescription;
$titleKey = "^RSS^CHANNEL^TITLE";
$linkKey = "^RSS^CHANNEL^LINK";
$descKey = "^RSS^CHANNEL^DESCRIPTION";
if ($curTag == $titleKey) {
$sTitle = $data;
}
elseif ($curTag == $linkKey) {
$sLink = $data;
}
elseif ($curTag == $descKey) {
$sDescription = $data;
}
// now get the items
global $arItems, $itemCount;
$itemTitleKey = "^RSS^CHANNEL^ITEM^TITLE";
$itemLinkKey = "^RSS^CHANNEL^ITEM^LINK";
$itemDescKey = "^RSS^CHANNEL^ITEM^DESCRIPTION";
if ($curTag == $itemTitleKey) {
// make new xItem
$arItems[$itemCount] = new xItem();
// set new item objects properties
$arItems[$itemCount]->xTitle = $data;
}
elseif ($curTag == $itemLinkKey) {
$arItems[$itemCount]->xLink = $data;
}
elseif ($curTag == $itemDescKey) {
$arItems[$itemCount]->xDescription = $data;
// increment item counter
$itemCount++;
}
}
// main loop
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($uFile,"r"))) {
die ("could not open RSS for input");
}
while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
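// write out the items (simple markup using the font settings above)
?>
<font face="<?php echo($uFont); ?>" size="<?php echo($uFontSize); ?>">
<a href="<?php echo($sLink); ?>"><?php echo($sTitle); ?></a><br><br>
<?php
for ($i = 0; $i < count($arItems); $i++) {
$txItem = $arItems[$i];
?>
<a href="<?php echo($txItem->xLink); ?>"><?php echo($txItem->xTitle); ?></a><br>
<?php
if ($bDesc) {
?>
<?php echo($txItem->xDescription); ?><br>
<?php
}
echo ("<br>");
}
?>
</font>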