Use Perl to harness XML Data Sources
A step-by-step guide to processing the Moreover XML headline feeds

19th February 2001

Web gurus constantly tell us that great content is the key to an even greater website. But with the available 'content wizards' and JavaScript snippets offering only a limited way to pull in content from external sources, where do you turn for headlines, statistics and other useful data for your website? The answer: XML!

The Perl server-side scripting language is an ideal partner for XML because it lets you actually use the data from XML sources. From this point on we assume a basic knowledge of Perl and of how to upload and maintain scripts on a server.

Perl lets developers fetch web pages (and other files) from the web via its LWP module. The following script downloads a web page and passes it on to the user:

#!/usr/bin/perl 
use LWP::Simple; 

$WebPage = get('http://www.freesticky.com/stickyweb/default.asp'); # $WebPage now holds the FreeSticky.com page 

print "Content-type: text/html\n\n"; # HTTP header for web browser viewing (double quotes so \n is interpolated) 

print $WebPage; 

When uploaded to a Perl/CGI-enabled host and viewed through a web browser, this script should display the FreeSticky.com homepage. As you'll have guessed, get() can just as easily be used to retrieve XML documents.
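Note that get() returns undef when the request fails, so in practice it is worth checking the result before trying to parse it. A minimal sketch of that check (the feed URL and the fallback message are just examples):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Fetch a page or feed, returning its body, or undef on failure.
# LWP::Simple's get() already returns undef when the request fails,
# so the caller only has to test definedness once.
sub fetch_feed {
    my ($url) = @_;
    return get($url);
}

my $xml = fetch_feed('http://p.moreover.com/cgi-local/page?c=Microsoft%20news&o=xml');
if (defined $xml) {
    print "Content-type: text/html\n\n";
    # ... parse and display the feed here ...
} else {
    print "Content-type: text/html\n\n";
    print "Sorry, the headline feed is unavailable right now.";
}
```

This way a dead feed produces a polite message instead of a blank or broken page.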

You'll find many sources of XML-formatted data on the web, though some may restrict commercial use of the content. Moreover.com is famous for its JavaScript-generating web wizard, but did you know it also offers its content as XML for your own customized use? A full list of XML addresses from Moreover is available here.

For the following example we are going to use the Microsoft Corporation XML feed (http://p.moreover.com/cgi-local/page?c=Microsoft%20news&o=xml).

On Moreover, content is offered in the form:

<article id="ARTICLE_ID">
  <url>ARTICLE_URL</url>
  <headline_text>HEADLINE_SNIPPET</headline_text>
  <source>ORIGINATION_OF_ARTICLE</source>
  <media_type>text</media_type>
  <cluster>moreover...</cluster>
  <tagline> </tagline>
  <document_url>ORIGINATION_WEB_ADDRESS</document_url>
  <harvest_time>TIME_HARVESTED</harvest_time>
  <access_registration> </access_registration>
  <access_status> </access_status>
</article>

Of course, document headers surround these repeating clusters of data, but these are the pieces of data we’ll be working with.

So, to write a Perl script that collects, parses and redisplays this data, we'll begin with the mandatory headers:

#!/usr/bin/perl

use LWP::Simple;

$_ = get('http://p.moreover.com/cgi-local/page?c=Microsoft%20news&o=xml');

You may want to replace the XML address with your preferred feed. At this point the entire XML page is in $_. Now we can run a loop: while the script can still find the start of a new article (<article id="ARTICLE_ID">), it extracts each piece of information (headline text, source, URL and so on) and places it in individual arrays.
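If Perl's special match variables are unfamiliar: after a successful match, $` holds the text before the match (the prematch) and $' holds the text after it (the postmatch). A quick illustration of the find-and-chop technique the loop relies on:

```perl
#!/usr/bin/perl

$_ = '<url>http://example.com/story</url><headline_text>Hello</headline_text>';

m/<url>/;   # Match the opening tag
$_ = $';    # Keep only the text after '<url>'
m#</url>#;  # Match the closing tag
print $`;   # $` is now the URL: prints http://example.com/story
```

Repeating this match-then-chop step for each tag walks through the document one field at a time.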

$ArticleNumber = 0; # Start storing at the first slot of each array 

while (m/<article id="/) { #Find start of new article 

#First let's get the URL 
$_ = $'; #Now $_ contains all data after the latest '<article id="' 
m/<url>/; #Get first piece of article data - a link 
$_ = $'; #$_ contains URL and rest of data 
m#</url>#; #$` contains text before latest find of '</url>' and $' contains text after 
$URL[$ArticleNumber] = $`; 

#Now retrieve headline text 
$_ = $'; #Set $_ to contain data after last find 
m/<headline_text>/; #Get the headline start 
$_ = $'; #$_ contains headline and rest of data 
m#</headline_text>#; #$` contains text before latest find of '</headline_text>' and $' contains text after 
$Headline[$ArticleNumber] = $`; #$Headline[$ArticleNumber] contains headline 

#Now retrieve source of article 
$_ = $'; #Set $_ to contain data after last find 
m/<source>/; #Get the source start 
$_ = $'; #$_ contains source and rest of data 
m#</source>#; #$` contains text before find of '</source>' and $' contains text after 
$Source[$ArticleNumber] = $`; #$Source[$ArticleNumber] contains article headline source 

#Now retrieve media type of article 
$_ = $'; #Set $_ to contain data after last find 
m/<media_type>/; #Get the media type start 
$_ = $'; #$_ contains media type and rest of data 
m#</media_type>#; #$` contains text before find of '</media_type>' and $' contains text after 
$MediaType[$ArticleNumber] = $`; #$MediaType[$ArticleNumber] contains the article's media type 

#Now retrieve tagline of article 
$_ = $'; #Set $_ to contain data after last find 
m/<tagline>/; #Get the tagline start 
$_ = $'; #$_ contains tagline and rest of data 
m#</tagline>#; #$` contains text before find of '</tagline>' and $' contains text after 
$Tagline[$ArticleNumber] = $`; #$Tagline[$ArticleNumber] contains the article's tagline 

#Now retrieve document URL of article 
$_ = $'; #Set $_ to contain data after last find 
m/<document_url>/; #Get the document URL start 
$_ = $'; #$_ contains document URL and rest of data 
m#</document_url>#; #$` contains text before find of '</document_url>' and $' contains text after 
$DocumentURL[$ArticleNumber] = $`; #$DocumentURL[$ArticleNumber] contains the article's document URL 

#Now retrieve harvest time of article 
$_ = $'; #Set $_ to contain data after last find 
m/<harvest_time>/; #Get the harvest time start 
$_ = $'; #$_ contains harvest time and rest of data 
m#</harvest_time>#; #$` contains text before find of '</harvest_time>' and $' contains text after 
$HarvestTime[$ArticleNumber] = $`; #$HarvestTime[$ArticleNumber] contains the article's time of harvest 

#Now retrieve access registration of article 
$_ = $'; #Set $_ to contain data after last find 
m/<access_registration>/; #Get the access registration start 
$_ = $'; #$_ contains access registration and rest of data 
m#</access_registration>#; #$` contains text before find of '</access_registration>' and $' contains text after 
$AccessRegistration[$ArticleNumber] = $`; #$AccessRegistration[$ArticleNumber] contains the article's access registration 

#Now retrieve access status of article 
$_ = $'; #Set $_ to contain data after last find 
m/<access_status>/; #Get the access status start 
$_ = $'; #$_ contains access status and rest of data 
m#</access_status>#; #$` contains text before find of '</access_status>' and $' contains text after 
$AccessStatus[$ArticleNumber] = $`; #$AccessStatus[$ArticleNumber] contains the article's access status 

$ArticleNumber++; # Move to the next slot in each array, ready for the next article 

} #End of article loop 

We now have 9 arrays of article data whose elements correspond by index: for example, the document URL for the headline in $Headline[5] can be found in $DocumentURL[5]. What can be done with the data now in the arrays? The main thing you'll probably want to do is simply display it. A simple piece of code to follow the loop above is:

print "Content-type: text/html\n\n"; # HTTP header for viewing the page through a web browser 

for ($Article = 0; $Article < $ArticleNumber; $Article++) {     # Go through each article 

print "<A HREF=\"$DocumentURL[$Article]\">$Headline[$Article]</A><BR>$HarvestTime[$Article] from <A HREF=\"$URL[$Article]\">$Source[$Article]</A><BR><BR>"; 

} # End of display loop 
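Parallel arrays work, but many Perl programmers would instead keep all of an article's fields together in one hash and collect those hashes in a single array. A minimal sketch of the same extraction using that structure (the sample data and field list are shortened for brevity, and the hash keys are my own choice, not Moreover's):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two sample articles in the same shape as the Moreover feed.
my $xml = '<article id="1"><url>http://a.example/1</url>'
        . '<headline_text>First story</headline_text></article>'
        . '<article id="2"><url>http://a.example/2</url>'
        . '<headline_text>Second story</headline_text></article>';

my @articles;
# Grab each <article>...</article> block with a global match, then
# pull the fields out of that block and store them together in one
# hash per article.
while ($xml =~ m#<article id="[^"]*">(.*?)</article>#g) {
    my $block = $1;
    my %article;
    $article{url}      = $1 if $block =~ m#<url>(.*?)</url>#;
    $article{headline} = $1 if $block =~ m#<headline_text>(.*?)</headline_text>#;
    push @articles, \%article;
}

# Everything about one article now travels together:
print "$_->{headline} ($_->{url})\n" for @articles;
```

This prints each headline with its link, and a whole article can now be passed around as a single reference rather than as an index into nine separate arrays.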

The possibilities for XML are clearly endless: limitless distribution and representation of data from sources anywhere in the world, easily parsed and updated automatically. What is more, Moreover.com is just one of many suppliers of harvested data, and the market is growing as Microsoft promotes this area. The outlook is certainly great for XML and for the content kings of the online world.

Copyright © 2001 Adam Waude. All Rights Reserved.

Author Information:
Adam Waude - adamwaude@hotmail.com 

See Also:
W3 Standards: www.w3.org/XML
The centre for standards and standards-setting information from W3.org


Copyright © FreeSticky.com, 2000 - 2014. All Rights Reserved.
The products and external links referenced in this website are provided by parties other than FreeSticky.com.
FreeSticky.com makes no representations regarding either the products or external links.