RssFeed not recognizing "media" Extension

Topics: Argotic.Core, Argotic.Extensions
Feb 2, 2008 at 4:55 AM
I am trying to consume an RSS feed on PhotoBucket that uses the media extension from Yahoo. However, when the feed is consumed, every tag with the media prefix is omitted.

Here is an example album feed:
http://s268.photobucket.com/albums/jj22/star_wisher2013/feed.rss

There are two namespaces declared in the <rss> tag (media and dc), but only dc appears in the output of rssFeed.ToString(), even though the raw feed clearly contains the media declaration. Moreover, the <media:content> elements (and any others with the media prefix) are completely missing from the <item> elements.

I iterated through the SyndicationAdapter.FrameworkExtensions dictionary, and both the dc and media extensions are there.

I read through another thread describing a similar problem (http://www.codeplex.com/Argotic/Thread/View.aspx?ThreadId=11614), but its example appears to use an older version. I tried it anyway with the more current classes (e.g. YahooMediaExtension), with no luck:

Uri pbUri = UrlBuilder.ToUri("http://s268.photobucket.com/albums/jj22/star_wisher2013/feed.rss");

SyndicationFeedSettings settings = new SyndicationFeedSettings();
YahooMediaExtension ext = new YahooMediaExtension();
settings.SupportedExtensions.Add(ext.Namespace, ext);

RssFeed feed = RssFeed.Create(pbUri, settings);

In either case, here is the output:

Raw RSS
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">

RSS after calling RssFeed.Create
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">

I would really appreciate some help with this. Thanks - Joe.
Feb 2, 2008 at 5:08 AM
Ok I figured it out. The YahooMediaExtension class has this code:

public YahooMediaExtension() : base()
{
this.Namespace = new Uri("http://search.yahoo.com/mrss");
// ...
}

And the PhotoBucket feed has this as the media namespace URI:
http://search.yahoo.com/mrss/

As you can see, the extra slash at the end is causing the problem. I saved the feed to a file, removed the slash, and it suddenly worked.
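To illustrate why the single trailing slash matters, here is a minimal sketch that uses only System.Uri and no Argotic types. NamespaceSlashDemo and SameNamespace are hypothetical names made up for this example, and the sketch assumes the namespaces are compared as Uri values; how Argotic actually compares them internally is not shown in this thread.

```csharp
using System;

public static class NamespaceSlashDemo
{
    // Hypothetical helper: true when both namespace URIs are equal as System.Uri values.
    // The paths "/mrss" and "/mrss/" are different, so the two URIs are not equal.
    public static bool SameNamespace(string left, string right)
    {
        return new Uri(left).Equals(new Uri(right));
    }
}
```

Calling SameNamespace("http://search.yahoo.com/mrss", "http://search.yahoo.com/mrss/") returns false, while comparing the slash-terminated URI with itself returns true.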

For the sake of completeness, I checked the Yahoo spec and it has the slash omitted, so this is technically a PhotoBucket bug. However, how should the Argotic code handle these slight discrepancies?
Coordinator
Feb 4, 2008 at 5:07 PM
Dear Joe,

I thought I had set up the framework to handle publishers that treat an XML namespace URI as a URL, but obviously I missed something. The next version automatically handles the case where a publisher uses an XML namespace that does not conform to the actual specification (which happens more often than you would think). Your comment is now going to be a test case to ensure the Argotic framework handles this common publisher mistake.
Coordinator
Feb 20, 2008 at 10:24 PM
Created work item "Syndication extensions that are published with incorrect XML namespace information fail to load" to verify that handling for this issue is added as a feature in the next release.
Feb 27, 2008 at 5:55 PM
Sorry for the delayed reply. I wound up implementing a workaround where I sanitize the feed XML by stripping the extra slash.
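A minimal sketch of what such a sanitizing step could look like, not necessarily the exact workaround used here. FeedSanitizer and StripTrailingSlash are hypothetical names, and the sketch assumes the declaration uses the media prefix and double quotes exactly as in the PhotoBucket feed shown earlier.

```csharp
using System;

public static class FeedSanitizer
{
    // Hypothetical helper: rewrites the slash-terminated media namespace declaration
    // to the slash-less URI that YahooMediaExtension expects.
    // Assumes the exact attribute form xmlns:media="..." with double quotes.
    public static string StripTrailingSlash(string feedXml)
    {
        return feedXml.Replace(
            "xmlns:media=\"http://search.yahoo.com/mrss/\"",
            "xmlns:media=\"http://search.yahoo.com/mrss\"");
    }
}
```

The sanitized string can then be loaded in place of the original feed. A plain string replace like this is brittle: it misses single-quoted attributes or a different prefix.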

I was thinking that even if you include some support for detecting variations in the URI, there will always be the possibility that a particular incorrect URI still goes undetected. For instance, let's say you started using a regex pattern to detect extra slashes and it worked, but then another feed provider mistyped a letter. The consuming app would be back to implementing a workaround.

You may want to include some support for consumer applications to supply their own overriding namespace URIs in these cases. For example, let's say I had an app that aggregated several feeds all using the media extension, and some of them had varying and incorrect namespace URIs. Instead of sanitizing the XML beforehand, I could populate a dictionary (probably static) within your framework with all the incorrect URIs that map to the media extension. When RssFeed objects are created and the XML is loaded, your code that checks the existing dictionary of supported extensions would then pick up these "temp" URIs. Giving the app more control over these inconsistencies would spare the framework the code bloat of trying to anticipate every quirky case.
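A rough sketch of the dictionary idea, independent of Argotic. NamespaceAliases, Register, and Resolve are hypothetical names invented for this illustration and are not part of the framework; a real implementation would also need to consider thread safety for a static dictionary.

```csharp
using System;
using System.Collections.Generic;

public static class NamespaceAliases
{
    // Maps a known-incorrect namespace URI to the URI the extension expects.
    private static readonly Dictionary<string, string> aliases =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Called by the consuming app for each incorrect URI it has seen in the wild.
    public static void Register(string incorrectUri, string canonicalUri)
    {
        aliases[incorrectUri] = canonicalUri;
    }

    // Returns the canonical URI for a registered alias, or the input unchanged.
    public static string Resolve(string namespaceUri)
    {
        string canonical;
        return aliases.TryGetValue(namespaceUri, out canonical) ? canonical : namespaceUri;
    }
}
```

The app would call Register("http://search.yahoo.com/mrss/", "http://search.yahoo.com/mrss") once at startup, and the parser would pass each declared namespace through Resolve before looking up the supported extension.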
Coordinator
Feb 27, 2008 at 9:03 PM
Joe,

The abstract SyndicationExtension base class that framework and custom extensions inherit from defines a protected virtual method called ExistsInSource(XPathNavigator), which returns a boolean value indicating whether the extension is present in the syndication data source being parsed. If developers encounter a feed that is malformed or non-conformant per the extension specification, they can sub-class one or more of the framework's syndication extension implementations and override this method to plug in their own custom logic.

However, the current base implementation determines whether an extension exists in the source by extracting all of the XML namespaces on the source using XPathNavigator.GetNamespacesInScope(XmlNamespaceScope.ExcludeXml). It first checks whether the XML namespace for the extension exists in the collection's values and, if that fails, checks whether the XML prefix for the extension exists in the collection's keys. While you will occasionally find feeds that do not use the correct XML namespace, the XML prefix is rarely incorrect, so this default implementation should catch the quirky cases. This does introduce the possibility of collisions, since prefixes are not the actual identifiers, but in practice this would be a very rare occurrence.
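The namespace-then-prefix check described above can be sketched roughly as follows. This is a standalone illustration built on System.Xml.XPath only, not the framework's actual source; ExtensionDetection, its method signatures, and the RootElement helper are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class ExtensionDetection
{
    // Hypothetical stand-in for the check: match on namespace URI first,
    // then fall back to the declared prefix.
    public static bool ExistsInSource(XPathNavigator source, string extensionNamespace, string extensionPrefix)
    {
        // Keys are prefixes, values are namespace URIs.
        IDictionary<string, string> namespaces =
            source.GetNamespacesInScope(XmlNamespaceScope.ExcludeXml);

        foreach (string uri in namespaces.Values)
        {
            if (string.Equals(uri, extensionNamespace, StringComparison.Ordinal))
            {
                return true;
            }
        }

        // Fallback: the publisher used a non-conforming URI but the usual prefix.
        return namespaces.ContainsKey(extensionPrefix);
    }

    // Helper for trying the sketch: returns a navigator positioned on the document element.
    public static XPathNavigator RootElement(string xml)
    {
        XPathNavigator navigator = new XPathDocument(new StringReader(xml)).CreateNavigator();
        navigator.MoveToChild(XPathNodeType.Element);
        return navigator;
    }
}
```

With a root element that declares xmlns:media="http://search.yahoo.com/mrss/", a call to ExistsInSource(root, "http://search.yahoo.com/mrss", "media") fails the namespace comparison but succeeds on the prefix fallback. The same fallback is where a collision would occur if an unrelated vocabulary happened to be bound to the media prefix.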

Hope this clears up this issue, and hopefully a quick email to the 'offending' sites will get them to fix their non-compliant feeds, and thus reduce the headaches developers like me face. :-)