Accents are converted to "??"

Topics: Argotic.Core
Oct 11, 2007 at 10:03 AM
Hello,
I have a problem with feeds that contain accented characters. The feeds use UTF-8 encoding, but the accents are always converted to "??".
I'm using the GenericFeed class to read these feeds.

Here is a feed that is not displayed correctly: http://feeds.feedburner.com/capitaine-commerce/Sfkd

Any ideas?
Thanks!
Coordinator
Oct 11, 2007 at 1:42 PM
Djhi,

This issue has been fixed in the 2007.2 release. I am able to retrieve the feed you specified without losing any foreign-language characters.

Are you using the current release? If so, can you provide some sample code, as I am not seeing an encoding issue. This issue was addressed by the work item: Loss of valid UTF8 Encoded foreign language characters.
Coordinator
Oct 11, 2007 at 1:43 PM
Djhi,

I was not checking the GenericFeed class. I will investigate, as the fix for this issue may not have been applied properly to that class, and I will get back to you ASAP.
Coordinator
Oct 11, 2007 at 1:45 PM
Djhi,

The GenericFeed class also retrieves the feed you specified without losing any foreign-language characters. I hope this is just a case of you not using the 2007.2 version, but if you are, please post some sample code so I can investigate further.
Oct 11, 2007 at 3:30 PM
Oppositional,

I'm using the 2007.1 release... I'll try the 2007.2 release.

Thanks :)
Oct 11, 2007 at 3:48 PM
Oppositional,

That was the problem, thanks again !
Coordinator
Oct 11, 2007 at 3:52 PM
Djhi,

Thanks for letting me know. It is good to get confirmation that the fix is working as expected.
Nov 5, 2007 at 12:11 PM
I’m still having this problem :(
When I'm loading this feed (http://blog.tv2.dk/englen20/rss.xml), I'm still seeing some bad conversions.
Coordinator
Nov 7, 2007 at 9:12 PM
Dear Morten,

This is caused by not using a native Unicode encoding (UTF-8) when scanning the feed data for hexadecimal characters that are invalid in XML. This issue was fixed by the work item: Framework not handling foreign language encoded feeds properly. The fix will be public in the next release. Thanks for the feedback!
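A minimal Python sketch of the idea behind this fix (an illustration only, not Argotic's actual .NET code): decode the stream to Unicode text first, and only then remove the characters that XML 1.0 forbids, so that multi-byte UTF-8 sequences such as é are never split apart. The function name `clean_feed_text` is hypothetical; the allowed ranges come from the XML 1.0 `Char` production.

```python
import re

# XML 1.0 "Char" production: #x9 | #xA | #xD | [#x20-#xD7FF]
# | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Everything else is invalid in XML.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_feed_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode first, then strip invalid XML characters.

    Because the regex runs on decoded text, a two-byte UTF-8 sequence
    such as the one for 'é' is already a single character and survives.
    """
    text = raw.decode(encoding)
    return _INVALID_XML_CHARS.sub("", text)


# A vertical tab (U+000B) is invalid in XML and is removed; 'é' is kept.
print(clean_feed_text("caf\u00e9\u000b".encode("utf-8")) == "caf\u00e9")  # True
```

Decoding the same UTF-8 bytes with a single-byte codec such as iso-8859-1 before cleaning would instead turn é into the two characters Ã and ©.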
Jun 17, 2008 at 11:36 AM
Edited Jun 23, 2008 at 12:00 PM
Hey,
I'm Jan, and I'm building a website whose content comes from RSS feeds.
I'm using release 2008.0.1.0 of Argotic and I still have the problem: some characters are read as a "rectangle" and saved in the SQL database as "?". We also sometimes get Arabic characters and other special characters like Â.

Maybe it's useful to mention that we are from Belgium and all our RSS items are in Dutch (Nederlands).

I looked into the source code of our Argotic version and found that garazy's solution to the gzip problem has been applied:
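// GZipStream and CompressionMode live in System.IO.Compression.
if (contentEncoding.Contains("GZIP"))
{
    // Wrap the response stream so it is decompressed before the XML is parsed.
    stream = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress);
}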
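The order of operations matters here: the response body has to be decompressed before any character decoding is attempted. A minimal Python sketch of that step follows; the helper `decompress_body` is hypothetical and not part of Argotic. HTTP content-coding values are case-insensitive, so whether a check for the literal string "GZIP" works depends on whether the header was normalized beforehand, which the C# fragment does not show; the sketch normalizes explicitly.

```python
import gzip
from typing import Optional


def decompress_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Hypothetical helper: undo HTTP gzip content-coding, if present.

    Returns raw bytes; character decoding (UTF-8, Windows-1252, ...)
    is a separate step that must happen afterwards.
    """
    # Content-coding tokens are case-insensitive ("gzip" == "GZIP").
    if content_encoding and "gzip" in content_encoding.lower():
        return gzip.decompress(body)
    return body


payload = "Oud Belgi\u00eb".encode("utf-8")
print(decompress_body(gzip.compress(payload), "gzip").decode("utf-8") == "Oud Belgi\u00eb")  # True
```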



It's really difficult to give you an example because the RSS items change too often.
Example URL (23 June 2008): http://www.site.kifkif.be/kifkif/rss.php?cat_id=4&page_class=three&open_menu_id=24
The item "De controverse rond l’affaire Guigue" from this RSS feed is read as "De controverse rond l[]affaire Guigue" (I use [] to stand for the rectangle, because I can't paste the rectangle character into this text editor) and is saved in the SQL database as "De controverse rond l?affaire Guigue".

Part of the feed source:
<?xml version="1.0" encoding="iso-8859-1" ?><rss version="2.00" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
...
<div style="clear: both;"/><div class="entry"><h3><a href="http://site.kifkif.be/kifkif/nieuws.php?nws_id=1729&amp;page_class=three&amp;open_menu_id=24">De controverse rond l’affaire Guigue</a>
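One plausible explanation for both symptoms, assuming (not verified against the live feed) that the file is really written in Windows-1252 even though it declares iso-8859-1: the character ’ does not exist in ISO-8859-1 at all. Windows-1252 stores it as the single byte 0x92, and ISO-8859-1 maps that byte to an invisible C1 control character. The following Python sketch reproduces this with a hard-coded byte string; the last step additionally assumes a non-Unicode (Windows-1252) database column.

```python
# Assumption: the feed's bytes are really Windows-1252, so the apostrophe
# in "l’affaire" is the single byte 0x92.
raw = b"De controverse rond l\x92affaire Guigue"

# Decoding as the declared iso-8859-1 maps 0x92 to U+0092, an invisible
# C1 control character (a likely source of the "rectangle").
as_declared = raw.decode("iso-8859-1")

# Decoding as Windows-1252 maps 0x92 to U+2019 RIGHT SINGLE QUOTATION MARK.
as_cp1252 = raw.decode("cp1252")

# Assumption: a Windows-1252 database column. U+0092 has no mapping there,
# so an encoder that replaces unmappable characters stores "?".
stored = as_declared.encode("cp1252", errors="replace")

print(repr(as_declared))  # 'De controverse rond l\x92affaire Guigue'
print(as_cp1252)          # De controverse rond l’affaire Guigue
print(stored)             # b'De controverse rond l?affaire Guigue'
```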


Has anyone found a working solution for this problem?
Coordinator
Jul 1, 2008 at 6:06 PM
The framework pre-processes feed data to isolate common issues such as invalid hexadecimal characters in XML. It inspects the encoding attribute of the XML declaration to determine which encoding to use when processing the stream data. If the feed content is encoded in a format different from the one specified in the XML declaration, you can pass a SyndicationResourceLoadSettings instance as a parameter, which lets you specify the correct encoding for the feed.
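The behavior described above, where the declared encoding is used unless the caller supplies a different one, can be sketched in Python as follows. This only illustrates the idea: `choose_encoding` and its `override` parameter are hypothetical stand-ins for what `SyndicationResourceLoadSettings` does, and the regular expression is a simplification that ignores byte-order marks and UTF-16 input.

```python
import re
from typing import Optional

# Matches encoding="..." or encoding='...' inside a leading XML declaration.
_XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def choose_encoding(raw: bytes, override: Optional[str] = None) -> str:
    """Hypothetical helper: pick the encoding used to decode a feed.

    An explicit override wins; otherwise the declaration's encoding
    attribute is used; otherwise UTF-8, the XML default without a BOM.
    """
    if override:
        return override
    # Tolerate leading whitespace, which some real-world feeds contain.
    match = _XML_DECL_ENCODING.match(raw.lstrip())
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

For the kifkif.be excerpt, `choose_encoding(b'<?xml version="1.0" encoding="iso-8859-1" ?><rss/>')` returns `"iso-8859-1"`, while passing `override="cp1252"` returns `"cp1252"` regardless of the declaration.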

Hope this helps. If not, please open a new issue and, if possible, provide sample XML data to test the issue against.
Jul 16, 2008 at 7:41 AM
Edited Jul 16, 2008 at 7:44 AM
Thank you for the reply.
I tried your solution and set CharacterEncoding to Default (setting it to Unicode gives an error).

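            Dim settings As New SyndicationResourceLoadSettings()
            ' Override the encoding declared in the feed. On the .NET Framework, Encoding.Default
            ' is the system's ANSI code page (for example Windows-1252), not a Unicode encoding.
            settings.CharacterEncoding = System.Text.Encoding.Default

            ' RssFeed.Create returns the loaded feed, so there is no need to create one with New first.
            Dim feed As Argotic.Syndication.RssFeed = Argotic.Syndication.RssFeed.Create(New Uri(rssItem.Hyperlink), settings)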


This resolves the rectangle and "?" conversion problem.
But now we have problems with other characters, such as é and ë, which come out garbled:

                 ?John Mayer wijst huwelijksaanzoek af?  (before)
                 ’John Mayer wijst huwelijksaanzoek af’   (after)

Example:        Meintjes café nog één keer open    (before we applied the settings solution)
                Meintjes café nog één keer open    (with CharacterEncoding set to Default)

                Oud België    (before)
                Oud België    (after)
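The new symptoms are consistent with the opposite mistake: these particular feeds appear to be UTF-8, while `Encoding.Default` on the .NET Framework is the machine's single-byte ANSI code page (assumed here to be Windows-1252, the usual choice on Western European Windows installations). The ’ in the John Mayer "after" line is exactly what the three UTF-8 bytes of ’ look like when read as Windows-1252. A small Python sketch of the effect, with hypothetical helper names:

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    # Each accented letter is two bytes in UTF-8 (three for ’), and a
    # single-byte code page turns every one of those bytes into a character.
    return text.encode("utf-8").decode("cp1252")


def undo_misread(garbled: str) -> str:
    """Reverse exactly that mistake: recover the bytes, decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


print(misread_utf8_as_cp1252("Meintjes café nog één keer open"))
# Meintjes cafÃ© nog Ã©Ã©n keer open
print(misread_utf8_as_cp1252("Oud België"))
# Oud BelgiÃ«
print(misread_utf8_as_cp1252("\u2019John Mayer"))
# ’John Mayer
```

`undo_misread` is only safe for text that went through exactly this UTF-8 to Windows-1252 misreading; decoding each feed correctly at load time is the better fix.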

Is there a way to autodetect the character encoding and read the feed in that encoding? Because we have a very large number of feeds (around 1,500), it's difficult to get them all right with a single setting.
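A common heuristic for a large, mixed set of feeds is to detect the encoding per feed instead of using one global setting, and then pass the result in through `CharacterEncoding`. The Python sketch below assumes the raw bytes and the declared encoding (if any) are already available; `detect_encoding` is a hypothetical helper, not an Argotic API, and a heuristic like this can still guess wrong. It relies on the fact that legacy single-byte text containing accents rarely happens to form valid UTF-8, and it treats ISO-8859-1 labels as Windows-1252, which is also what the WHATWG Encoding Standard specifies for browsers.

```python
import codecs
from typing import Optional

# A subset of the labels the WHATWG Encoding Standard maps to windows-1252.
_WINDOWS_1252_LABELS = {"iso-8859-1", "iso8859-1", "iso_8859-1", "latin1", "l1",
                        "us-ascii", "ascii", "windows-1252", "cp1252"}


def detect_encoding(raw: bytes, declared: Optional[str] = None,
                    fallback: str = "cp1252") -> str:
    """Hypothetical helper: guess the codec name to pass to bytes.decode()."""
    # 1. A byte-order mark is authoritative. (UTF-32 BOMs are not handled.)
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. Bytes that decode cleanly as UTF-8 are very likely UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Otherwise trust the declaration, reading ISO-8859-1 as Windows-1252.
    if declared:
        label = declared.strip().lower()
        if label in _WINDOWS_1252_LABELS:
            return "cp1252"
        try:
            name = codecs.lookup(label).name
        except LookupError:
            name = None
        # A declared UTF-8 was already ruled out by step 2.
        if name is not None and name != "utf-8":
            return label
    # 4. Last resort for undeclared or unusable declarations.
    return fallback
```

Decoding with `raw.decode(detect_encoding(raw, declared), errors="replace")` is the safer call, since a few bytes (such as 0x81) are undefined in Windows-1252.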