Part 1 of 3.
While this topic is specific to vBulletin, it affects RSS aggregators in general, so I thought it would be worth re-posting here.
I've been pulling my hair out for some time now trying to figure out why my RSS aggregator sometimes works fine and other times crashes.
I finally pulled all of the logs and sorted out each and every RSS feed request until I came up with a list of broken feeds.
It is annoying because if vB encounters a broken feed it just stops pulling all the other feeds, rather than flagging an error on that one feed and continuing with the next in the list.
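A minimal sketch of the behaviour one would want instead, in Python rather than vB's PHP. This is not vBulletin code: `pull_all` and the `fetch` callback are hypothetical names used only for illustration. The idea is simply that each feed gets its own error handling, so a broken feed is recorded and the loop moves on to the next one.

```python
from typing import Callable, Dict, List, Tuple


def pull_all(
    feeds: List[str], fetch: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Fetch every feed; record failures instead of aborting the whole run.

    `fetch` is a hypothetical callback that returns the feed body for a URL
    and raises an exception when the feed cannot be read or parsed.
    """
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for url in feeds:
        try:
            results[url] = fetch(url)
        except Exception as exc:
            # One bad feed must not stop the rest of the list.
            errors[url] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

The `errors` dictionary then gives the list of broken feeds directly, without having to dig through the logs.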
What threw me is that when I manually tested the RSS URL, it failed with an XML read error!
When I swapped the RSS extension for an XML one, it gave me a line read error instead.
In each case, the sites had been recently upgraded. The RSS pull had worked before the updates.
I opened a ticket about this issue and this is the reply I got:
Hello,
Thank you for contacting vBulletin Support. The sites in question are using a markup called ATOM.
We only support XML/RSS/RSS2 in the RSS Feed Manager. ATOM's markup is incompatible.
ATOM uses characters and formatting which resembles RSS but does not conform to RSS standards.
All the best,
Wayne Luke
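So, I went to the URLs and examined the page source of all the feeds that were no longer working.
Sure enough!
Code:
<?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom">
This URL ends with RSS, reads as XML, yet is in fact ATOM.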
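That check can be automated. Below is a rough Python sketch (again, not vB code; `detect_feed_format` is a made-up helper name) that ignores the URL extension and looks only at the root element of the document, which is what actually tells Atom and RSS apart.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_format(xml_text: str) -> str:
    """Return 'atom', 'rss', 'rdf' or 'unknown' based on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as '{namespace}localname'.
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    if root.tag == "rss":
        return "rss"
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rdf"  # RSS 1.0 feeds use an rdf:RDF root
    return "unknown"
```

Running this over a feed list before handing it to an RSS-only aggregator would flag the Atom feeds up front, whatever extension their URLs carry.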
So, here is my question for the community.
Does anyone know of a (free) script or process by which these ATOM feeds can be filtered and aggregated into vB?
The only thing I could find which came close is a third-party service called Feed Rinse, but that is a paid service, and presently down for upgrades.
I would prefer a script I can host on my own, e.g.:
Code:
https://myforum.com/external_script.php?source=http://somesite.com/theirATOMfeed.rss
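Scrapers don't work, as they take the entire page content rather than going link by link and passing each one along as RSS.
Perhaps some modification of a scraper would work, but RSS encoding is not my forte.
So far as I can tell, vB has no plans to add ATOM support to its aggregator.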
I, and no doubt the multitudes of vB users experiencing this issue, would be extremely grateful.
FYI, in my search to resolve this issue, I have found several PHP and XSL scripts, most from around 2008, and none providing the desired effect.
I have also tested several third-party services, none of which produced vB-acceptable formatting, with one exception:
Code:
http://feedmix.novaclic.com/atom2rss.php?source=
along with the ATOM feed URL.
e.g. http://feedmix.novaclic.com/atom2rss...ting/index.rss
The source feed in this example is NOT an RSS feed, and it does not aggregate into vB. It is an ATOM feed.
But once processed through this service, it works in vB just fine.
This is a free 3rd party host that seems to work perfectly for the conversion process!
However...
Firstly, it is a third-party host, which I am loath to use.
Secondly, it is hosted by OVH SAS, a French provider, at 178.32.28.114, which as far as I know has a reputation for spyware and traffic monitoring. That means anything you aggregate through this service may be tracked and monitored. Not a good idea if your vB content is controversial.
So, if anyone can provide a script like the one used by this service, it would resolve the vB ATOM issue.
RSS (Rich Site Summary; originally RDF Site Summary; often called Really Simple Syndication) uses a family of standard web feed formats to publish frequently updated information: blog entries, news headlines, audio, video. An RSS document (called "feed", "web feed", or "channel") includes full or summarized text, and metadata, like publishing date and author's name.
RSS feeds enable publishers to syndicate data automatically. A standard XML file format ensures compatibility with many different machines/programs. RSS feeds also benefit users who want to receive timely updates from favourite websites or to aggregate data from many sites.
Subscribing to a website RSS removes the need for the user to manually check the website for new content. Instead, their browser constantly monitors the site and informs the user of any updates. The browser can also be commanded to automatically download the new data for the user.
The name Atom applies to a pair of related Web standards. The Atom Syndication Format is an XML language used for web feeds, while the Atom Publishing Protocol (AtomPub or APP) is a simple HTTP-based protocol for creating and updating web resources.
Web feeds allow software programs to check for updates published on a website. To provide a web feed, the site owner may use specialized software (such as a content management system) that publishes a list (or "feed") of recent articles or content in a standardized, machine-readable format. The feed can then be downloaded by programs that use it, like websites that syndicate content from the feed, or by feed reader programs that allow Internet users to subscribe to feeds and view their content.
A feed contains entries, which may be headlines, full-text articles, excerpts, summaries, and/or links to content on a website, along with various metadata.
The Atom format was developed as an alternative to RSS. Ben Trott, an advocate of the new format that became Atom, believed that RSS had limitations and flaws, such as lack of ongoing innovation and its necessity to remain backward compatible, and that there were advantages to a fresh design.
Proponents of the new format formed the IETF Atom Publishing Format and Protocol Workgroup. The Atom syndication format was published as an IETF proposed standard in RFC 4287 (December 2005), and the Atom Publishing Protocol was published as RFC 5023 (October 2007).
ATOM, however, was never as widely adopted, and because ATOM feeds are often published in a way that makes them look like RSS feeds, many compatibility issues arose.
Although ATOM was standardised a few years after RSS 2.0 and has many improvements over RSS, users worldwide didn't want to change software that already worked.
If these feeds had used, say, an .atom extension instead of .rss, it would have made things a whole lot simpler.
Now, the global internet is about 75% RSS and 25% ATOM. That is nearly a decade after ATOM's release.
Feed managers/News Agents/Aggregators that support RSS but not ATOM often report bad formatting, XML errors, bad links or just crash when they reach an ATOM feed thought to be RSS.
Newer tools can read all the formats, but for those that can't, this is an annoying issue.
Some additional information. I decided to look into vB's repair work on the RSS Aggregator. Imagine my surprise to discover that all they did was go back to the 5.1.x version that worked. This means that even as of 5.2.1a5, vBulletin is ONLY designed to use RSS and not ATOM.
You'll note the GUID is a dead giveaway for RSS.
RSS formatting being:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example RSS Feed</title>
    <item>
      <title>Example Item</title>
      <description>A summary.</description>
      <link>http://www.example.com/foo</link>
      <guid>http://www.example.com/foo</guid>
      <pubDate>Mon, 23 Sep 2013 03:00:05 GMT</pubDate>
    </item>
  </channel>
</rss>
Whereas ATOM format is thus:
Code:
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en" xml:base="http://www.example.org">
  <id>http://www.example.org/myfeed</id>
  <title>My Simple Feed</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <link href="/blog" />
  <link rel="self" href="/myfeed" />
  <entry>
    <id>http://www.example.org/entries/1</id>
    <title>A simple blog entry</title>
    <link href="/blog/2005/07/1" />
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>This is a simple blog entry</summary>
  </entry>
</feed>
With ATOM the giveaway is the first line of code:
Code:
<feed xmlns="http://www.w3.org/2005/Atom"
If someone a little better at PHP than I am could look this over and come up with an ATOM subroutine, you'd be a hero to many.
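As a starting point for anyone who picks this up, here is a rough sketch of the conversion itself. It is written in Python rather than PHP, it is not the script the feedmix service uses, and every function name in it is made up for illustration. It maps the Atom elements shown above onto their RSS 2.0 counterparts (entry to item, id to guid, summary to description, updated to pubDate), resolves relative links against xml:base, and rewrites the Atom timestamp in the date style RSS expects. It assumes plain-text or escaped-HTML summaries; XHTML content and fractional-second timestamps are not handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def _rfc822(atom_date: str) -> str:
    """Convert an Atom (RFC 3339) timestamp to the RFC 822 style used by RSS."""
    dt = datetime.fromisoformat(atom_date.strip().replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: treat naive times as UTC
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def _link(elem: ET.Element, base: str) -> str:
    """Pick the 'alternate' link (rel absent or 'alternate') and resolve it."""
    for link in elem.findall(ATOM + "link"):
        if link.get("rel", "alternate") == "alternate":
            return urljoin(base, link.get("href", ""))
    return ""


def _sub(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Append a child element with the given text and return it."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def atom_to_rss(atom_xml: str) -> str:
    """Return an RSS 2.0 document built from an Atom feed."""
    feed = ET.fromstring(atom_xml)
    if feed.tag != ATOM + "feed":
        raise ValueError("not an Atom feed")
    base = feed.get(XML_BASE, "")

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    _sub(channel, "title", feed.findtext(ATOM + "title", default=""))
    _sub(channel, "link", _link(feed, base))
    _sub(channel, "description", feed.findtext(ATOM + "subtitle", default=""))

    for entry in feed.findall(ATOM + "entry"):
        item = ET.SubElement(channel, "item")
        _sub(item, "title", entry.findtext(ATOM + "title", default=""))
        _sub(item, "link", _link(entry, base))
        guid = _sub(item, "guid", entry.findtext(ATOM + "id", default=""))
        guid.set("isPermaLink", "false")  # an Atom id need not be a fetchable URL
        summary = entry.findtext(ATOM + "summary")
        if summary is None:
            summary = entry.findtext(ATOM + "content", default="")
        _sub(item, "description", summary)
        updated = entry.findtext(ATOM + "updated")
        if updated:
            _sub(item, "pubDate", _rfc822(updated))

    return ET.tostring(rss, encoding="unicode")
```

Fed the Atom example above, this yields an `<rss version="2.0">` document with one `<item>` whose link is http://www.example.org/blog/2005/07/1. A self-hosted script would fetch the Atom URL, pass the body through a function like this, and serve the result to the aggregator.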