Tap into the newsfeeds
For sites that publish regular news updates, the best option is to tap into the newsfeeds they have already prepared for syndication in the standard Rich Site Summary (RSS) format. Written in XML, an RSS newsfeed contains the headline and sometimes a brief description of each article, along with the URL of the complete article. This is an excellent way to retrieve the essential information, and it makes it easy to keep up with news from the sectors that matter to you.
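To show what such a feed holds, here is a minimal Python sketch (not part of Watznew) that reads an RSS 2.0 document with the standard library's xml.etree.ElementTree and pulls out the title, link and optional description of each item. The sample feed and the FeedItem and parse_rss names are made up for illustration. RSS 1.0 feeds place their items in an XML namespace and would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeedItem:
    title: str
    link: str
    description: Optional[str]  # the description element is optional in RSS 2.0


def parse_rss(xml_text: str) -> List[FeedItem]:
    """Return the <item> entries of an RSS 2.0 document (no namespaces)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            FeedItem(
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                description=item.findtext("description"),  # None if absent
            )
        )
    return items


# Hypothetical sample feed used only for illustration.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <item>
      <title>First headline</title>
      <link>http://example.com/articles/1</link>
      <description>A short summary.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/articles/2</link>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for entry in parse_rss(SAMPLE):
        print(entry.title, "->", entry.link)
```

Each item carries exactly the three pieces of information described above, which is why a newsfeed is so much lighter to monitor than a full Web page.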
Many Web sites that publish news regularly also provide a newsfeed. In addition, specialized news aggregators such as http://w.moreover.com/categories/category_list_xml.html build their own channels on specific subjects, combining leading newsfeeds from individual sites on each theme. Both types of newsfeed can be displayed in Watznew’s single interface and updated automatically at regular intervals.
To find interesting newsfeeds, we used the http://www.newsisfree.com/ and http://www.syndic8.com/ Web sites, which maintain directories of many thousands of newsfeeds covering just about every subject dealt with on the Web. Just find a newsfeed that interests you, enter its URL into Watznew, and the latest news will arrive on your desktop at regular intervals. This is perhaps one of the best uses of Watznew. Although Watznew can be configured to check sources at any frequency, even every minute, in practice NewsIsFree limits users to one update per feed per hour to avoid overloading its servers.
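A client that respects such a limit only needs to remember when it last polled each feed. The Python sketch below is a hypothetical illustration, not Watznew's scheduler: the FeedScheduler class and its parameters are invented here, and the 3,600-second default simply mirrors the one-update-per-hour rule described above. The effective interval is the longer of the interval the user requests and the provider's minimum.

```python
import time
from typing import Dict, Optional


class FeedScheduler:
    """Hypothetical helper deciding when each feed may be polled again."""

    def __init__(self, requested_interval_s: float,
                 provider_min_interval_s: float = 3600.0):
        # Never poll more often than the feed provider allows.
        self.interval_s = max(requested_interval_s, provider_min_interval_s)
        self._last_poll: Dict[str, float] = {}

    def due(self, feed_url: str, now: Optional[float] = None) -> bool:
        """True if the feed has never been polled or the interval has elapsed."""
        now = time.time() if now is None else now
        last = self._last_poll.get(feed_url)
        return last is None or now - last >= self.interval_s

    def mark_polled(self, feed_url: str, now: Optional[float] = None) -> None:
        """Record the time of a successful poll."""
        self._last_poll[feed_url] = time.time() if now is None else now
```

Asking for a one-minute refresh therefore still results in at most one request per hour to each feed, which is the behaviour NewsIsFree expects.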
Watznew’s HTTP channels can also monitor HTML pages on the Web, based on the elements you specify using HTML tags and a character range. This can be a useful option where no newsfeed is available. However, as the publisher points out, the software is not really suited to monitoring entire pages, since scanning a whole page for changes would be too time-consuming. Monitoring an entire site is simply not an option with Watznew.
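The idea of watching only a delimited element can be sketched in a few lines of Python. This is an assumption about how such a channel might work, not Watznew's code: the extract_fragment, fingerprint and has_changed helpers are hypothetical, the HTML tags act as start and end markers, and the character range is modelled as a maximum length for the captured text. Only that fragment is hashed and compared, so changes elsewhere on the page are ignored.

```python
import hashlib
from typing import Optional


def extract_fragment(html: str, start_marker: str, end_marker: str,
                     max_chars: Optional[int] = None) -> Optional[str]:
    """Return the text between two markers, optionally truncated to max_chars."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    fragment = html[start:end]
    return fragment[:max_chars] if max_chars is not None else fragment


def fingerprint(fragment: str) -> str:
    """Hash the watched fragment so only a short digest needs to be stored."""
    return hashlib.sha256(fragment.encode("utf-8")).hexdigest()


def has_changed(old_html: str, new_html: str, start_marker: str,
                end_marker: str, max_chars: Optional[int] = None) -> bool:
    """Compare only the watched fragment of two versions of a page."""
    old = extract_fragment(old_html, start_marker, end_marker, max_chars)
    new = extract_fragment(new_html, start_marker, end_marker, max_chars)
    if old is None or new is None:
        # Fragment appeared or disappeared: treat as a change unless both are missing.
        return old != new
    return fingerprint(old) != fingerprint(new)
```

Because only a small slice of the page is examined, the check stays cheap, which is also why this approach does not scale to watching whole pages or whole sites.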
The second type of channel is for POP3 e-mail accounts. These channels are straightforward to set up: the subject, sender and date of new e-mails are displayed in Watznew, and clicking a message launches your e-mail client.
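For readers curious about what such a channel does behind the scenes, the following Python sketch uses the standard poplib and email modules to list the subject, sender and date of each message without downloading message bodies, via the POP3 TOP command. The function names, the SSL connection on port 995 and the credentials are assumptions for illustration only, not a description of Watznew's implementation. TOP is an optional POP3 command, so a server that lacks it would require a full RETR instead.

```python
import email
import poplib
from email.header import decode_header, make_header
from typing import List, Tuple


def summarize_headers(raw_message: bytes) -> Tuple[str, str, str]:
    """Return (subject, sender, date) parsed from raw message bytes."""
    msg = email.message_from_bytes(raw_message)

    def text(name: str) -> str:
        value = msg.get(name)
        if value is None:
            return ""
        # Decode RFC 2047 encoded words such as =?utf-8?q?...?=
        return str(make_header(decode_header(value)))

    return text("Subject"), text("From"), text("Date")


def list_messages(host: str, user: str, password: str,
                  port: int = 995) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: fetch only the headers of every message on the server.

    Deciding which messages are new (e.g. by remembering UIDL values)
    is left out of this sketch.
    """
    server = poplib.POP3_SSL(host, port)
    try:
        server.user(user)
        server.pass_(password)
        count, _mailbox_size = server.stat()
        summaries = []
        for number in range(1, count + 1):
            # TOP n 0 returns the headers and no body lines.
            _resp, lines, _octets = server.top(number, 0)
            summaries.append(summarize_headers(b"\r\n".join(lines)))
        return summaries
    finally:
        server.quit()
```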
Finally, you can also configure your own channels using Perl scripts. Several basic scripts are included, such as IPCONFIG.pl and Environment Variables.pl, which display your machine's network data and its environment variables respectively. For automatic page monitoring, only two modules are really of interest: HTTP GET.pl, which downloads a page and extracts various elements from it, and possibly Ping, to check server availability.
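A rough Python counterpart to a server availability check is shown below. Strictly speaking it is not a ping: ICMP echo requests usually require raw sockets and elevated privileges, so this sketch, an assumption rather than a port of Watznew's Ping script, instead tests whether a TCP connection to a given port (80 by default, a hypothetical choice) can be opened within a timeout. The is_reachable name is invented for illustration.

```python
import socket


def is_reachable(host: str, port: int = 80, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False
```

Checking the Web server's own port has the advantage of confirming that the service itself answers, not merely that the machine is up.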
These Perl modules integrate properly with Watznew. Configuring a new channel for a news site such as Yahoo News is straightforward for anyone who has mastered regular expressions. All that is required is a short piece of code that identifies and extracts elements from a document, such as each article's title and URL, and displays them in the Watznew interface. Some sources have specific protection against automated agents (user-agent detection, session cookies, multiple redirections…). Watznew handles some of these protections itself, but in other cases one or two lines of Perl must be added to the capture script. For debugging purposes, the interface can display the HTTP headers and cookies that are sent. Using Perl and its powerful regular expressions is a good option in Watznew, although this feature is reserved for experienced users.
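The same approach can be sketched in Python rather than Perl. The example below assumes, purely for illustration, that the monitored page marks each headline as <a class="headline" href="...">; a real site such as Yahoo News would need its own pattern. extract_headlines applies a regular expression to capture each title and URL, while fetch sends a browser-like User-Agent header and keeps session cookies in a cookie jar, two of the workarounds for the anti-agent protections mentioned above; urllib follows HTTP redirects on its own. All names, the User-Agent string and the markup are hypothetical.

```python
import html
import re
import urllib.request
from http.cookiejar import CookieJar
from typing import List, Optional, Tuple

# Hypothetical markup: each headline is <a class="headline" href="...">Title</a>.
HEADLINE_RE = re.compile(
    r'<a\s+class="headline"\s+href="([^"]*)"\s*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def extract_headlines(page: str) -> List[Tuple[str, str]]:
    """Return (title, url) pairs captured by HEADLINE_RE."""
    results = []
    for url, raw_title in HEADLINE_RE.findall(page):
        # Drop inline tags such as <b> and decode entities like &amp;.
        title = html.unescape(TAG_RE.sub("", raw_title)).strip()
        results.append((title, html.unescape(url)))
    return results


def fetch(url: str, jar: Optional[CookieJar] = None,
          user_agent: str = "Mozilla/5.0 (compatible; example-monitor)") -> str:
    """Download a page with a custom User-Agent, storing cookies in jar.

    Cookies set by intermediate redirect responses are sent on the
    redirected requests; reuse the same jar to keep a session across calls.
    """
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar if jar is not None else CookieJar())
    )
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In use, a channel would call extract_headlines(fetch(page_url)) at each refresh; only the regular expression needs to change from one site to the next.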