Huddled Masses

Joel Bennett's development blog...

HuddledParser 2.0

Started by Jaykul · 9 months ago

A long time ago, on a domain not my own, I wrote a PHP parser that handled pretty much all versions of RSS/RDF XML feeds. I released it for free, and a few people used it. It wasn’t pretty code, but it was small, and it was easy to use, and it didn’t [...]

10 comments

  • Great little parser.
    It has saved me a lot of work and headaches as this one handles the atom feeds as well (gmail/blogger etc.) in contrast to most others.
    I'm using it from my private page to view my feeds.
    I hope you don't mind that I built a little on it.
    I have adjusted it a little to use cURL instead of fopen.
    This in turn allowed me to add some more code to view authenticated pages (Gmail in my case) simply by supplying it with a URL like: https://gmailusername:password@gmail.google.com....
    The only other thing I changed was moving the $showSummary boolean to the parseFeed function, as for some reason I didn't get summaries with the original setup.
    If you are interested I'll be happy to send you the changed code.
    Thanks and good luck
    Jan
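
Jan's modified code is not shown in the thread, so the following is only a minimal sketch of the general idea: pull the "user:password@" part out of the URL and hand it to cURL as HTTP Basic credentials. The function names splitCredentials and fetchFeedWithCurl, the example URL and the 10-second default are assumptions for illustration, not part of HuddledParser.

```php
<?php
// Hypothetical helper (not part of HuddledParser): separates "user:password@"
// from an absolute URL so the credentials can be passed to cURL explicitly.
// Returns [url without credentials, "user:password" or null].
function splitCredentials(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }
    $clean = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/')
        . (isset($parts['query']) ? '?' . $parts['query'] : '');
    $userpwd = isset($parts['user'])
        ? rawurldecode($parts['user']) . ':' . rawurldecode($parts['pass'] ?? '')
        : null;
    return [$clean, $userpwd];
}

// Hypothetical fetcher: returns the response body as a string, or false on
// failure. Requires PHP's cURL extension.
function fetchFeedWithCurl(string $url, int $timeoutSeconds = 10)
{
    [$clean, $userpwd] = splitCredentials($url);
    $ch = curl_init($clean);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    if ($userpwd !== null) {
        curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
        curl_setopt($ch, CURLOPT_USERPWD, $userpwd);
    }
    return curl_exec($ch);
}
```

A call would look like fetchFeedWithCurl('https://user:secret@example.com/feed/atom'), with the returned string then handed to the parser. Keeping credentials in a URL is convenient for a private page, but they tend to leak into logs and cache file names, which is one reason to split them off before doing anything else with the URL.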
  • Well, I admit I hadn't tried it before, mainly because I'm just using this on web pages, and I don't have any private feeds that I want to make public ;) but I just tried it on _my_ Gmail Atom feed, and it works fine with https://user:password@host using fopen.
  • Joel, I would suggest you use fsockopen instead of fopen - just because fsockopen supports a connection timeout to the remote URL. This way, if a feed is unavailable, you can handle it.

    Other than that it sounds like a nice little package. I will have to download it and see if it will suit my needs better than the current parser I use, lastRSS (which also uses fopen but is easily modified to use fsockopen).
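
For readers who want the timeout without writing raw socket code, here is a rough sketch of a different route to the same goal: keep the fopen-style wrapper but give it a stream context with a timeout. This is not what the comment above proposes (that would be PHP's fsockopen, which takes a timeout argument), and it is not how HuddledParser fetches feeds; the helper name fetchFeedOrNull and the 5-second default are made up for illustration.

```php
<?php
// Hypothetical helper (not part of HuddledParser): returns the feed body,
// or null if it could not be fetched within $timeoutSeconds.
function fetchFeedOrNull(string $url, float $timeoutSeconds = 5.0): ?string
{
    // The 'timeout' option of the http wrapper is given in seconds.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // '@' suppresses the warning; failure is reported through the return value.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

The caller can then fall back to a cached copy when the result is null, which is the "you can handle it" part of the suggestion.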
  • Joel, one more thing - to guarantee uniqueness on your saved cache file name you can try this:

    $cacheName = $this->cacheFolder . '/xmlcache_' . md5($url);

    Using the MD5 hash of the URL will pretty much guarantee that you get a unique file name for each cached feed.
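
To make the naming scheme above concrete, here is a small standalone sketch. It wraps the same expression in a hypothetical function, cacheNameFor, instead of the $this->cacheFolder property used in the comment; the folder and URLs are placeholders.

```php
<?php
// Hypothetical standalone version of the expression in the comment above.
// md5() always returns 32 hexadecimal characters, so the result is a safe
// file name no matter which characters the URL contains.
function cacheNameFor(string $cacheFolder, string $url): string
{
    return $cacheFolder . '/xmlcache_' . md5($url);
}

$a = cacheNameFor('/tmp/cache', 'http://example.com/feed.xml');
$b = cacheNameFor('/tmp/cache', 'http://example.com/feed.xml?page=2');

// The same URL always maps to the same file; different URLs map to
// different files, barring an MD5 collision.
var_dump($a === cacheNameFor('/tmp/cache', 'http://example.com/feed.xml'));
var_dump($a !== $b);
```

The "pretty much" in the comment is the right hedge: two different URLs could in principle hash to the same value, but for a handful of feed URLs that is not a practical concern.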
  • Any thoughts on allowing for enclosure fetching?
  • I have been looking around for a PHP script that can parse feeds but preferably liberally - from what you say this seems to do that (Magpie doesn't). However you only parse for summaries and my aim is to build a complete server side personal RSS aggregator (the current ones about don't 'float my boat', I have 'needs').

    Any ideas? Is this suitable with modifications?

    Thanks
  • Of course you could use this and modify it ... it's really just a simple example of how to use PHP's XML parsing ;) The only reason it's limited to what it is, is that I'm just not interested in competing with existing server-side aggregators; all I wanted was something I could use on my site.
  • Is there a live demo of this? I would really like to see this at work.
  • Is there a n0ob-proof manual for installing this script? It would be great if the script could be used by people other than nerds. Thank you very much for your reply.

    Ben
  • Hello.

    Thank you for HuddledParser. I've been trying forever to figure out how to pull the link element from Blogger and HuddledParser is doing it, except Blogger's feed includes two link elements and it's adding both to the href attribute.

    Example: href="https://www.blogger.com/atom/8090863/109380437787746685http://onlytheweb.blogspot.com/2004/08/quick-reply-in-opera-mail.html" title="">Quick Reply in Opera Mail

    How can I remove the first link?

    Thanks.
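
The thread does not include an answer to this question, so what follows is only a sketch of one possible approach, not HuddledParser's actual code. It assumes the two link elements in the feed are told apart by their rel attribute and that the link to the post itself is the one marked rel="alternate" (the relation Atom uses for the human-readable page). The function name alternateLinks and the sample feed with its example.com URLs are made up for illustration.

```php
<?php
// Hypothetical helper: returns the href of every <link rel="alternate">
// element in an XML string and ignores links with any other rel value.
// Uses PHP's event-based XML parser (the xml extension).
function alternateLinks(string $xml): array
{
    $links = [];
    $parser = xml_parser_create();
    // Keep element and attribute names in their original case.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        // Start-tag handler: record href only for rel="alternate" links.
        function ($parser, string $name, array $attrs) use (&$links) {
            if ($name === 'link'
                && ($attrs['rel'] ?? '') === 'alternate'
                && isset($attrs['href'])) {
                $links[] = $attrs['href'];
            }
        },
        // End-tag handler: nothing to do.
        function ($parser, string $name) {
        }
    );
    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $links;
}

// Made-up sample entry with one editing link and one page link.
$sample = '<feed><entry>'
    . '<link rel="service.edit" href="https://example.com/atom/edit/1"/>'
    . '<link rel="alternate" href="http://example.com/2004/08/post.html"/>'
    . '<title>Quick Reply in Opera Mail</title>'
    . '</entry></feed>';

print_r(alternateLinks($sample));
```

If the feed in question distinguishes its links some other way, the condition in the start-tag handler is the only part that would need to change.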
