ATOM to RSS filter.


    ATOM to RSS filter.

    Part 1 of 3.

    While this topic is specific to vBulletin, it affects RSS aggregators in general, so I thought it would be good information to re-post here.


    I've been pulling my hair out for some time now trying to figure out why my RSS aggregator sometimes works fine and other times crashes.
    I finally pulled all of the logs and sorted through each and every RSS feed request until I came up with a list of broken feeds.
    It is annoying because if vB encounters a broken feed it just stops pulling all other feeds, rather than flagging the one feed as an error and continuing to the next in the list.
    What threw me is that when I manually tested the RSS, it failed with an XML read error!
    When I used an .xml extension in place of .rss, it gave me a line read error instead.
    In each case, the sites had been recently upgraded. The RSS pull had worked before the updates.
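
    For what it's worth, the "flag the one feed and continue" behaviour can be sketched as below. This is a minimal sketch in Python rather than PHP, purely for illustration; `parse_feeds` and its dict-of-strings input are hypothetical names and have nothing to do with vB's actual code.

```python
import xml.etree.ElementTree as ET


def parse_feeds(feeds):
    """Parse each feed independently so one bad feed never stops the rest.

    `feeds` maps a feed name to its raw XML text (a hypothetical input shape).
    Returns (parsed, errors): root elements that parsed, and per-feed messages.
    """
    parsed, errors = {}, {}
    for name, xml_text in feeds.items():
        try:
            parsed[name] = ET.fromstring(xml_text)
        except ET.ParseError as exc:
            # Record the failure for this feed only, then move on to the next.
            errors[name] = str(exc)
    return parsed, errors
```

    The point of the design is simply that the `try`/`except` sits inside the loop, so a parse failure is recorded against one feed and the loop carries on.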

    I opened a ticket about this issue and this is the reply I got-
    Hello,

    Thank you for contacting vBulletin Support. The sites in question are using a markup called ATOM.
    We only support XML/RSS/RSS2 in the RSS Feed Manager. ATOM's markup is incompatible.
    ATOM uses characters and formatting which resembles RSS but does not conform to RSS standards.

    All the best,
    Wayne Luke

    So, I went to the URLs and examined the page source for all the feeds no longer working.
    Sure enough!

    Code:
    <?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom">
    This URL ends with RSS, reads as XML, yet is in fact ATOM.

    So, here is my question for the community.

    Does anyone know of a (free) script or process by which these ATOM feeds can be filtered and aggregated into vB?
    The only thing I could find which came close is a third party service called Feed Rinse, but that is a pay service, and presently down for upgrades.
    I would prefer a script I can host on my own. e.g.

    Code:
    https://myforum.php?external_script.php&http://somesite.com/thereATOMfeed.rss
    Scrapers don't work, as they take the entire page content rather than going link by link and passing it as RSS.
    Perhaps some modification of a scraper would work, but RSS encoding is not my forte.
    So far as I can tell, vB has no plans to include ATOM into its aggregator.
    I, and no doubt the multitudes of vB users experiencing this issue, would be extremely grateful.
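
    To make the request concrete, here is a rough sketch of the kind of conversion being asked for, mapping Atom elements onto a minimal RSS 2.0 document. It is written in Python rather than PHP for illustration only; `atom_to_rss` and the helper names are hypothetical, and the mapping (subtitle to description, id to guid, updated to pubDate) is one reasonable choice, not a standard. It does not resolve relative links against `xml:base` and ignores XHTML content, so treat it as a starting point under those assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def _text(parent, tag):
    """Return the stripped text of an Atom child element, or '' if absent."""
    node = parent.find(ATOM + tag)
    return (node.text or "").strip() if node is not None else ""


def _link(parent):
    """Pick the first 'alternate' link (the default rel) of a feed or entry."""
    for node in parent.findall(ATOM + "link"):
        if node.get("rel", "alternate") == "alternate":
            return node.get("href", "")
    return ""


def _rfc822(atom_date):
    """Convert an Atom (RFC 3339) timestamp to the RFC 822 style RSS uses."""
    try:
        parsed = datetime.fromisoformat(atom_date.replace("Z", "+00:00"))
    except ValueError:
        return ""  # unparseable or missing date: leave pubDate out
    return format_datetime(parsed)


def atom_to_rss(atom_xml):
    """Map an Atom feed onto a minimal RSS 2.0 document, returned as a string."""
    feed = ET.fromstring(atom_xml)
    if feed.tag != ATOM + "feed":
        raise ValueError("not an Atom feed")

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = _text(feed, "title")
    ET.SubElement(channel, "link").text = _link(feed)
    # RSS 2.0 requires a channel description; Atom's nearest match is <subtitle>.
    ET.SubElement(channel, "description").text = _text(feed, "subtitle")

    for entry in feed.findall(ATOM + "entry"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = _text(entry, "title")
        ET.SubElement(item, "link").text = _link(entry)
        ET.SubElement(item, "description").text = (
            _text(entry, "summary") or _text(entry, "content")
        )
        # Atom ids are not guaranteed to be URLs, so mark the guid accordingly.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = _text(entry, "id")
        pub_date = _rfc822(_text(entry, "updated"))
        if pub_date:
            ET.SubElement(item, "pubDate").text = pub_date

    return ET.tostring(rss, encoding="unicode")
```

    Hosted behind a small web script that fetches the source URL and returns the result of `atom_to_rss`, something like this could stand in for a third-party converter, though fetching, caching and error handling are all left out here.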

    FYI, in my search to resolve this issue, I have found several PHP and XSL scripts, most from around 2008, and none providing the desired effect.
    I have also tested several 3rd party services, none of which produced vB acceptable formatting, with one exception.

    Code:
    http://feedmix.novaclic.com/atom2rss.php?source=
    along with the ATOM feed URL.
    e.g. http://feedmix.novaclic.com/atom2rss...ting/index.rss

    This is NOT an RSS feed. And it does not aggregate into vB. It is an ATOM feed.
    But once processed through this service, it works in vB just fine.

    This is a free 3rd party host that seems to work perfectly for the conversion process!
    However...
    Firstly, it is a 3rd party host, which I am loath to use.
    Secondly, it is hosted at 178.32.28.114 by OVH SAS, a French provider that, as far as I know, has a reputation for spyware and monitoring data flow. This means anything you aggregate through this service could be tracked and monitored. Not a good idea if your vB content is controversial.

    So, if anyone can provide a script like that used in this service, it will resolve the vB ATOM issue.

    RSS (Rich Site Summary; originally RDF Site Summary; often called Really Simple Syndication) uses a family of standard web feed formats to publish frequently updated information: blog entries, news headlines, audio, video. An RSS document (called "feed", "web feed", or "channel") includes full or summarized text, and metadata, like publishing date and author's name.

    RSS feeds enable publishers to syndicate data automatically. A standard XML file format ensures compatibility with many different machines/programs. RSS feeds also benefit users who want to receive timely updates from favourite websites or to aggregate data from many sites.

    Subscribing to a website RSS removes the need for the user to manually check the website for new content. Instead, their browser constantly monitors the site and informs the user of any updates. The browser can also be commanded to automatically download the new data for the user.

    The name Atom applies to a pair of related Web standards. The Atom Syndication Format is an XML language used for web feeds, while the Atom Publishing Protocol (AtomPub or APP) is a simple HTTP-based protocol for creating and updating web resources.

    Web feeds allow software programs to check for updates published on a website. To provide a web feed, the site owner may use specialized software (such as a content management system) that publishes a list (or "feed") of recent articles or content in a standardized, machine-readable format. The feed can then be downloaded by programs that use it, like websites that syndicate content from the feed, or by feed reader programs that allow Internet users to subscribe to feeds and view their content.

    A feed contains entries, which may be headlines, full-text articles, excerpts, summaries, and/or links to content on a website, along with various metadata.

    The Atom format was developed as an alternative to RSS. Ben Trott, an advocate of the new format that became Atom, believed that RSS had limitations and flaws, such as lack of ongoing innovation and its necessity to remain backward compatible, and that there were advantages to a fresh design.

    Proponents of the new format formed the IETF Atom Publishing Format and Protocol Workgroup. The Atom syndication format was published as an IETF proposed standard in RFC 4287 (December 2005), and the Atom Publishing Protocol was published as RFC 5023 (October 2007).


    ATOM, however, was never as widely adopted, and because ATOM feeds are often presented as if they were RSS feeds, many compatibility issues arose.
    Although ATOM was created a few years after RSS 2.0 (2002) and has many improvements over RSS, users didn't want to change software that already worked.

    If these sites had used, say, an .atom extension instead of .rss, it would have made things a whole lot simpler.

    Now, the global internet is about 75% RSS and 25% ATOM. That is nearly a decade after ATOM's release.

    Feed managers/News Agents/Aggregators that support RSS but not ATOM often report bad formatting, XML errors, bad links or just crash when they reach an ATOM feed thought to be RSS.
    Newer tools can read all the formats, but for those that can't, this is an annoying issue.

    Some additional information: I decided to look into vB's repair work on the RSS aggregator. Imagine my surprise to discover that all they did was go back to the 5.1.x version that worked. This means that even as of 5.2.1a5, vBulletin is ONLY designed to use RSS and not ATOM.
    You'll note the GUID is a dead giveaway for RSS.

    RSS formatting being:
    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>Example RSS Feed</title>
        <item>
          <title>Example Item</title>
          <description>A summary.</description>
          <link>http://www.example.com/foo</link>
          <guid>http://www.example.com/foo</guid>
          <pubDate>Mon, 23 Sep 2013 03:00:05 GMT</pubDate>
        </item>
      </channel>
    </rss>
    Whereas ATOM format is thus:
    Code:
    <feed xmlns="http://www.w3.org/2005/Atom"
          xml:lang="en"
          xml:base="http://www.example.org">
      <id>http://www.example.org/myfeed</id>
      <title>My Simple Feed</title>
      <updated>2005-07-15T12:00:00Z</updated>
      <link href="/blog" />
      <link rel="self" href="/myfeed" />
      <entry>
        <id>http://www.example.org/entries/1</id>
        <title>A simple blog entry</title>
        <link href="/blog/2005/07/1" />
        <updated>2005-07-15T12:00:00Z</updated>
        <summary>This is a simple blog entry</summary>
      </entry>
    </feed>
    With ATOM the giveaway is the first line of code:
    Code:
    <feed xmlns="http://www.w3.org/2005/Atom"
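
    Since the giveaway is the root element and not the URL or file extension, an aggregator could check that before deciding how to process a feed. A minimal sketch, in Python for illustration (the function name `detect_feed_format` and its return strings are made up for this example):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_format(xml_text):
    """Classify a feed by its root element rather than by its URL or extension."""
    root = ET.fromstring(xml_text)
    if root.tag == "{%s}feed" % ATOM_NS:
        return "atom"
    if root.tag == "rss":
        # RSS 0.9x and 2.0 share the <rss> root; the version attribute tells them apart.
        return "rss " + root.get("version", "unknown")
    if root.tag == "{%s}RDF" % RDF_NS:
        # RSS 1.0 is RDF-based and uses <rdf:RDF> as its root element.
        return "rss 1.0 (RDF)"
    return "unknown"
```

    With a check like this, a feed whose URL ends in .rss but whose root is `<feed>` would be reported as "atom" instead of being handed to an RSS-only parser.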

    If someone a little better at PHP than I am could look this over and come up with an ATOM subroutine, you'd be a hero to many.

    #2
    Part 2 of 3.


    Code:
    <?php if (!defined('VB_ENTRY')) die('Access denied.');
    /*========================================================================*\
    || ###################################################################### ||
    || # vBulletin 5.2.1# ||
    || # ------------------------------------------------------------------ # ||
    || # Copyright 2000-2016 vBulletin Solutions Inc. All Rights Reserved.  # ||
    || # This file may not be redistributed in whole or significant part.   # ||
    || # ----------------- VBULLETIN IS NOT FREE SOFTWARE ----------------- # ||
    || # http://www.vbulletin.com | http://www.vbulletin.com/license.html   # ||
    || ###################################################################### ||
    \*========================================================================*/
    
    class vB_External_Export_Rss extends vB_External_Export
    {
        // RSS information needed to fill format fields
        protected $rssinfo = array();
    
        // default language information
        protected $defaultLang = array();
    
        // assertor
        protected $assertor;
    
        protected function __construct()
        {
            parent::__construct();
            $this->loadDefLanguage();
            $this->assertor = vB::getDbAssertor();
        }
    
        protected function buildOutputFromItems($items, $options)
        {
            $this->loadRssInfo($options);
        }
    
        /**
         * Loads default language data needed for RSS output
         */
        protected function loadDefLanguage()
        {
            $langid = vB::getDatastore()->getOption('languageid');
            $languages = vB_Api::instanceInternal('language')->fetchAll();
            $this->defaultLang = $languages[$langid];
        }
    
        /**
         * Loads information needed for RSS output
         *
         * @param     array     Options to be considered for feed.
         */
        protected function loadRssInfo($options)
        {
            $description = $this->getPhraseFromGuid(vB_Page::PAGE_HOME, 'metadesc');
    
            $stylevars = vB_Api::instanceInternal('style')->fetchStylevars(array(vB::getDatastore()->getOption('styleid')));
            $imgdir = (!empty($stylevars['imgdir_misc']) AND !empty($stylevars['imgdir_misc']['imagedir'])) ? $stylevars['imgdir_misc']['imagedir'] : '';
            $this->rssinfo = array(
                'title' => vB::getDatastore()->getOption('bbtitle'),
                'link' => vB::getDatastore()->getOption('frontendurl'),
                'icon' => $imgdir . '/rss.png',
                'description' => $description,
                'ttl' => 60
            );
    
            $this->rssinfo = $this->applyRssOptions($options, $this->rssinfo);
        }
    
    
        /**
         *
         * Gather needed channel information for RSS items.
         * Like htmltitle which is a clean version of channel title.
         *
         * @param     Array     List of items to fetch channel information for.
         *
         * @return     Array     Array containing the needed channels information.
         */
        protected function getItemsChannelInfo($items)
        {
            $info = array();
            foreach ($items AS $id => $item)
            {
                if (!isset($info[$item['content']['channelid']]))
                {
                    $info[$item['content']['channelid']] = vB_Library::instance('node')->getNodeBare($item['content']['channelid']);
                }
            }
    
            return $info;
        }
    
        /**
         * Builds description tag content used in RSS outputs.
         *
         *     @param         String     Text to build description from.
         *    @param         Array     Options to consider building description.
         *
         *     @return     String     Description.
         *
         */
        protected function getItemDescription($text, $options)
        {
            // @TODO VBV-11108 description should be plain text only, replace this to use plain text parser when implemented
            if (!empty($options['fulldesc']))
            {
                $description = vB_String::htmlSpecialCharsUni(
                    vB_String::fetchCensoredText(
                        vB_String::stripBbcode($text, true, false, true, true)
                    )
                );
            }
            else
            {
                $description = vB_String::htmlSpecialCharsUni(
                    vB_String::fetchCensoredText(
                        vB_String::fetchTrimmedTitle(
                            vB_String::stripBbcode($text, true, false, true, true), vB::getDatastore()->getOption('threadpreview')
                        )
                    )
                );
            }
    
            return $description;
        }
    
        /**
         * Modifies RSS information used for output from given options
         *
         * @param     array     Options.
         */
        private function applyRssOptions($options, $info)
        {
            foreach ($this->options AS $name => $val)
            {
                if (isset($options[$name]))
                {
                    switch ($name)
                    {
                        case 'nodeid':
                            if (sizeof($options[$name]) == 1)
                            {
                                $channel = $this->assertor->getRow('vBForum:getPageInfoFromChannelId', array(
                                    'nodeid' => $options[$name]
                                ));
    
                                $info['title'] = vB_Phrase::fetchSinglePhrase('external_x_hyphen_y', array($info['title'], $this->getPhraseFromGuid($channel['guid'], 'title')));
                                $info['description'] = $this->getPhraseFromGuid($channel['guid'], 'metadesc');
                            }
                            else
                            {
                                $info['title'] = vB_Phrase::fetchSinglePhrase('external_x_hyphen_y', array($info['title'], implode(', ', $options[$name])));
                            }
                            break;
                        default:
                            break;
                    }
                }
            }
    
            return $info;
        }
    
        /**
         * Get metadescription phrase from a given page.guid
         *
         * @param     string     GUID
         * @param     string     Field to render phrase (title, metadesc)
         *
         * @return     string     Phrase
         */
        private function getPhraseFromGuid($guid, $phrase)
        {
            $guidforphrase = vB_Library::instance('phrase')->cleanGuidForPhrase($guid);
            $rows = $this->assertor->getRows('vBForum:phrase', array('languageid' => array($this->defaultLang['languageid'], 0, -1),
                'varname' => ('page_' . $guidforphrase . '_' . $phrase)
            ));
    
            $description = '';
            if (!empty($rows) AND is_array($rows) AND !isset($rows['errors']))
            {
                foreach ($rows AS $row)
                {
                    // get default lang phrase if possible
                    if ($row['languageid'] == $this->defaultLang['languageid'])
                    {
                        $description = $row['text'];
                    }
                    // default install set lang -1 for page phrases which change to the right langid or 0 on page edit/save.
                    else if (in_array($row['languageid'], array(0, -1)))
                    {
                        $description = $row['text'];
                    }
                }
            }
            else
            {
                $page = $this->assertor->getRow('vBForum:page', array('guid' => $guid));
                $description = $page[($phrase == 'title' ? $phrase : 'metadescription')];
            }
    
            return $description;
        }
    }
    
    /*=========================================================================*\
    || #######################################################################
    || # Downloaded: 13:52, Sun Mar 13th 2016
    || # CVS: $RCSfile$ - $Revision: 85802 $
    || #######################################################################
    \*=========================================================================*/



      #3
      Part 3 of 3.

      hmmm, interesting...

      In doing a little research and testing I have discovered something more about hxxp://feedmix.novaclic.com/atom2rss.php
      They are doing far more than JUST an ATOM to RSS conversion.

      I have a Celtic site to which I've long wanted to add a Tumblr blog, hxxp://loki-in-myth.tumblr.com/rss
      Tumblr, though, is notorious for badly formatted RSS.
      In this example one can see that the feed outputs RSS (not ATOM); that it declares RSS 2.0 alongside the Dublin Core 1.1 namespace; that description entries appear at both the channel and item level; and that the formatting does not look proper for any version of RSS.

      Code:
      <?xml version="1.0" encoding="UTF-8"?>
      <rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
      <channel>
      <description>A blog detailing Loki and his roles and interactions in Norse Mythology
      There is a lot of conflicting information in Norse mythology, so nothing can really be considered a perfect truth about Loki.  However, I aim to provide the most reasonable conclusions and explanations that I can, based on my research.
      At times, I will speculate on implications of Loki’s character/role/history from literary, mythological, and anthropological standpoints.  I will attempt to give proper explanation for each perspective.
      Father: Farbauti (or Bergelmir) (male)Mother: Laufey (or Nal) (female)Possible Siblings:
      Odin (m) and Hoenir (m) - contested in favour of idea that Odin and Loki are merely brothers in blood oathor
      Byleist (m) and Helblind (m) - most popular conception, but also most easily contested, as both these names are names used by Odinor
      Hler (m) [water], Karl (m) [air], and Ran (f) [the sea] - also contested, but prominent among Rokkr beliefs of a pantheon of gods predating the AesirRacial Affiliation: AsSpecies: God (a term used here to include As, Van, and Jotunn, as all are of the same species)Home: AsgardSpouse: Glut (1), Sigyn (2)Children: by Glut (f) (mother) - Eisa (f), Einmyria (f)
      by Angrboda (f) (father) - Fenrir (m), Midgardsormr (?), Hela (f)
      by Svadilfari (m) (father) - Sleipnir (?)
      by Sigyn (f) (mother) - Vali (m), Narfi (m)</description>
      <title>Loki: A Humble Dose of Mythos</title>
      <generator>Tumblr (3.0; @loki-in-myth)</generator>
      <link>http://loki-in-myth.tumblr.com/</link>
      <item>
      <title>What evidence is there that intertwined snakes were used as a symbol of Loki?</title>
      <description>&lt;p&gt;Sorry for the confusion.  There is a lot of archaeological evidence connecting the symbol to Loki, but I don’t have any specific articles to recommend for that, so I simply cited archaeological evidence in general.  My citation of the Rundkvist article was more secondary in nature, simply to identify the prolific appearances of the symbol.  I’m planning to completely reformat this blog, so it can be more helpful to people trying to do their own research.  At that time, I will have combed through all of my articles/papers/texts/sources, so I will assign a more proper source than “archaeological evidence”.  That won’t happen for another few months though, because my current studies take up all of my time.  I’m sorry for the lack of new posts as well, but it is for the same reason.  I am so incredibly thankful for your interest though, despite the current inactivity of this blog.  I’m very excited to finish my classes, so I can get back to this!&lt;/p&gt;</description>
      <link>http://loki-in-myth.tumblr.com/post/42625087535</link>
      <guid>http://loki-in-myth.tumblr.com/post/42625087535</guid>
      <pubDate>Fri, 08 Feb 2013 17:14:03 -0800</pubDate>
      <category>Loki Laufeyson</category>
      </item>
      </channel>
      </rss>
      I tested the feed in vB, and as expected it came back with an error:
      XML Error: Not well-formed (invalid token) at Line 2
      On a whim, I filtered the feed through hxxp://feedmix.novaclic.com/atom2rss.php and to my surprise, IT WORKED.
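
      An error like "invalid token at Line 2" can be narrowed down outside of vB by running the raw feed through any strict XML parser and asking it for the position. A minimal sketch in Python (for illustration only; `find_xml_error` is a hypothetical helper, and it only locates the problem, it does not say what caused it in this particular feed):

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_text):
    """Return None if the document is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # ParseError.position is a (line, column) tuple for the failure point.
        line, column = exc.position
        return line, column, str(exc)
    return None
```

      Run against the bytes exactly as downloaded (not a copy pasted from a browser), this gives the line and column to inspect, which is usually enough to spot a stray character or an unescaped ampersand.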

      So, what have I learned from this experiment?
      A. Many websites use terrible feed formatting
      B. vB is set up to ONLY process RSS 2.0 and has no error trapping for any other version or bad formatting (it will error and crash)
      C. hxxp://feedmix.novaclic.com/atom2rss.php is doing more than just ATOM to RSS processing. It is testing for the presence of multiple RSS versions, even when ATOM is supposed to be passed to it; it is reorganizing the code; and it is outputting it into proper RSS 2.0 format which works in vB.

      There has to be a way to rewrite the existing vB RSS.php to do the same thing.
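
      What that service does internally isn't known, so the following is only a guess at the "reorganizing" step: read whatever RSS comes in and re-emit a bare RSS 2.0 document containing just the core elements. It is a Python sketch for illustration, not a patch for vB's PHP; `rebuild_rss2` and `CORE_ITEM_TAGS` are hypothetical names. It assumes the input is already well-formed XML, so it cannot repair a feed that fails to parse in the first place.

```python
import xml.etree.ElementTree as ET

# Item-level elements kept in the output; everything else is dropped.
CORE_ITEM_TAGS = ("title", "link", "description", "guid", "pubDate")


def rebuild_rss2(xml_text):
    """Re-emit an RSS feed as a bare RSS 2.0 document with core elements only."""
    root = ET.fromstring(xml_text)
    source = root.find("channel")
    if root.tag != "rss" or source is None:
        raise ValueError("expected an <rss> document with a <channel>")

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag in ("title", "link", "description"):
        # findtext returns the first match only, so repeated elements collapse to one.
        ET.SubElement(channel, tag).text = (source.findtext(tag) or "").strip()

    for old_item in source.findall("item"):
        item = ET.SubElement(channel, "item")
        for tag in CORE_ITEM_TAGS:
            value = old_item.findtext(tag)
            if value is not None:
                ET.SubElement(item, tag).text = value.strip()

    return ET.tostring(rss, encoding="unicode")
```

      Namespaced extras such as Dublin Core elements and `<category>` are dropped on purpose, on the assumption that a strict RSS 2.0 consumer only needs the core fields.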
