<?xml version="1.0" encoding="UTF-8"?><rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
> <channel><title>Comments on: Language detection, a usability enhancer?</title> <atom:link href="http://friendlybit.com/other/language-detection-a-usability-enhancer/feed/" rel="self" type="application/rss+xml" /><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/</link> <description>You have found Friendly Bit, a web development blog. I focus on client side technologies like CSS, HTML and Javascript. You find my articles below and categories to the right.</description> <lastBuildDate>Thu, 11 Mar 2010 16:54:48 +0000</lastBuildDate> <generator>http://wordpress.org/?v=2.9.2</generator> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>By: Emil Stenström</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-22388</link> <dc:creator>Emil Stenström</dc:creator> <pubDate>Thu, 15 Mar 2007 22:14:12 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-22388</guid> <description>@Peter Vigren: I&#039;ll see what I can do, I&#039;m moving atm so not too much extra time available.</description> <content:encoded><![CDATA[<p>@Peter Vigren: I&#8217;ll see what I can do, I&#8217;m moving atm so not too much extra time available.</p> ]]></content:encoded> </item> <item><title>By: Peter Vigren</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-22298</link> <dc:creator>Peter Vigren</dc:creator> <pubDate>Sun, 11 Mar 2007 16:33:12 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-22298</guid> <description>I am also interested in example code for PHP. I find this quite intriguing, maybe because I find programming AND languages so fascinating. :-) But one thing: if you remove or translate characters like é, won&#039;t that make the result a bit weird? After all, if one language uses é a lot and others don&#039;t, shouldn&#039;t that be significant? In any case, I really enjoyed this article. It is the first time I have ever heard of this, so it tickles my brain like crazy! Thank you. :-)</description> <content:encoded><![CDATA[<p>I am also interested in example code for PHP. I find this quite intriguing, maybe because I find programming AND languages so fascinating. :-) But one thing: if you remove or translate characters like é, won&#8217;t that make the result a bit weird? After all, if one language uses é a lot and others don&#8217;t, shouldn&#8217;t that be significant? In any case, I really enjoyed this article. It is the first time I have ever heard of this, so it tickles my brain like crazy! Thank you. :-)</p> ]]></content:encoded> </item> <item><title>By: Emil Stenström</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-4010</link> <dc:creator>Emil Stenström</dc:creator> <pubDate>Sun, 27 Aug 2006 09:56:57 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-4010</guid> <description>@Dan Pettersson: so your algorithm gives different numbers for the exact same texts? That&#039;s some bug in your code. If you post a link to it (&lt;a href=&quot;http://se2.php.net/highlight_file&quot; rel=&quot;nofollow&quot;&gt;syntax highlight it&lt;/a&gt;), I&#039;ll have a look.</description> <content:encoded><![CDATA[<p>@Dan Pettersson: so your algorithm gives different numbers for the exact same texts? That&#8217;s some bug in your code. If you post a link to it (<a
href="http://se2.php.net/highlight_file">syntax highlight it</a>), I&#8217;ll have a look.</p> ]]></content:encoded> </item> <item><title>By: Dan Pettersson</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-3994</link> <dc:creator>Dan Pettersson</dc:creator> <pubDate>Sat, 26 Aug 2006 22:02:59 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-3994</guid> <description>Does anyone have an example of code (e.g. in PHP) that would compare two texts and give you a number... In my own tests squaring and square rooting messes things up. E.g. if I duplicate a text it won&#039;t show the same result. I&#039;ve solved it by making my own version without the squares, and I think that works fine...</description> <content:encoded><![CDATA[<p>Does anyone have an example of code (e.g. in PHP) that would compare two texts and give you a number&#8230;</p><p>In my own tests squaring and square rooting messes things up. E.g. if I duplicate a text it won&#8217;t show the same result.</p><p>I&#8217;ve solved it by making my own version without the squares, and I think that works fine&#8230;</p> ]]></content:encoded> </item> <item><title>By: Emil Stenström</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-1653</link> <dc:creator>Emil Stenström</dc:creator> <pubDate>Mon, 19 Jun 2006 10:36:20 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-1653</guid> <description>@Adam Zakreski: The link indeed handles only blog content that Technorati indexes. &lt;a href=&quot;http://ausweb.scu.edu.au/aw03/papers/edwards2/paper.html&quot; rel=&quot;nofollow&quot;&gt;Other sources&lt;/a&gt; have managed to approximate the number to 68%, which means you are right there. Well done :) &lt;a href=&quot;http://global-reach.biz/globstats/index.php3&quot; rel=&quot;nofollow&quot;&gt;Global Reach&lt;/a&gt; measures the number of people online by language and finds 35%, something that says a lot about the future of the web.</description> <content:encoded><![CDATA[<p>@Adam Zakreski: The link indeed handles only blog content that Technorati indexes. <a
href="http://ausweb.scu.edu.au/aw03/papers/edwards2/paper.html">Other sources</a> have managed to approximate the number to 68%, which means you are right there. Well done :)</p><p><a
href="http://global-reach.biz/globstats/index.php3">Global Reach</a> measures the number of people online by language and finds 35%, something that says a lot about the future of the web.</p> ]]></content:encoded> </item> <item><title>By: Adam Zakreski</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-1490</link> <dc:creator>Adam Zakreski</dc:creator> <pubDate>Fri, 16 Jun 2006 21:54:32 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-1490</guid> <description>I find your statement, &quot;In fact, most of the content online is not in English,&quot; a little hard to swallow. Even with the citation, the original source seems to be referring only to a certain type of blog post. I believe saying, &quot;most blog content online is not in English,&quot; would be more accurate (though still a stretch). I do find it a lot easier to believe that the Japanese are more avid bloggers than English speakers, though.</description> <content:encoded><![CDATA[<p>I find your statement, &#8220;In fact, most of the content online is not in English,&#8221; a little hard to swallow. Even with the citation, the original source seems to be referring only to a certain type of blog post. I believe saying, &#8220;most blog content online is not in English,&#8221; would be more accurate (though still a stretch). I do find it a lot easier to believe that the Japanese are more avid bloggers than English speakers, though.</p> ]]></content:encoded> </item> <item><title>By: Emil Stenström</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-667</link> <dc:creator>Emil Stenström</dc:creator> <pubDate>Tue, 16 May 2006 08:46:04 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-667</guid> <description>@Jesse Skinner: I wouldn&#039;t use a dictionary. Since the texts you will be testing are ordinary texts, you should base your text data on that kind of text. Say English contains a lot of &quot;in&quot;. Then that bigram should have a high percentage, whether that is because it appears in many different words or because those words are common.</description> <content:encoded><![CDATA[<p>@Jesse Skinner: I wouldn&#8217;t use a dictionary. Since the texts you will be testing are ordinary texts, you should base your text data on that kind of text. Say English contains a lot of &#8220;in&#8221;. Then that bigram should have a high percentage, whether that is because it appears in many different words or because those words are common.</p> ]]></content:encoded> </item> <item><title>By: Jesse Skinner</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-654</link> <dc:creator>Jesse Skinner</dc:creator> <pubDate>Mon, 15 May 2006 15:25:33 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-654</guid> <description>Thanks for the algorithm.. I would never have known where to begin to implement something like this. I think, maybe dictionary files would be the perfect place to build up a bigram statistic list. Those are usually (somewhat) easy to find.</description> <content:encoded><![CDATA[<p>Thanks for the algorithm.. I would never have known where to begin to implement something like this.</p><p>I think, maybe dictionary files would be the perfect place to build up a bigram statistic list. Those are usually (somewhat) easy to find.</p> ]]></content:encoded> </item> <item><title>By: Steve Tucker</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-647</link> <dc:creator>Steve Tucker</dc:creator> <pubDate>Sun, 14 May 2006 23:28:45 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-647</guid> <description>I try to always put some reference to the language in my markup documents - it makes logical sense. At worst it cannot do any harm!</description> <content:encoded><![CDATA[<p>I try to always put some reference to the language in my markup documents &#8211; it makes logical sense. At worst it cannot do any harm!</p> ]]></content:encoded> </item> <item><title>By: Sarven Capadisli</title><link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-638</link> <dc:creator>Sarven Capadisli</dc:creator> <pubDate>Sun, 14 May 2006 00:23:50 +0000</pubDate> <guid
isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-638</guid> <description>Concerning:
# Assisting search engines I am not sure how effective the lang attribute is. A while back I had a problem with this, in fact.
I had placed lang=&quot;en&quot; on my pages, but Google chose the wrong language for one of my articles. The only explanation I could come up with was that the article (at the time) was more popular in the German (de) community. So I had to write this:
&lt;a href=&quot;http://www.csarven.ca/google-deutschland-stole-my-article&quot; rel=&quot;nofollow&quot;&gt;Google Deutschland stole my article&lt;/a&gt; Although this is no longer an issue, I am still skeptical about whether search engines (or anything else, in fact) really acknowledge the language of a given document.</description> <content:encoded><![CDATA[<p>Concerning:<br
/> # Assisting search engines</p><p>I am not sure how effective the lang attribute is.</p><p>A while back I had a problem with this, in fact.<br
/> I had placed lang="en" on my pages, but Google chose the wrong language for one of my articles.</p><p>The only explanation I could come up with was that the article (at the time) was more popular in the German (de) community.</p><p>So I had to write this:<br
/> <a
href="http://www.csarven.ca/google-deutschland-stole-my-article">Google Deutschland stole my article</a></p><p>Although this is no longer an issue, I am still skeptical about whether search engines (or anything else, in fact) really acknowledge the language of a given document.</p> ]]></content:encoded> </item> </channel> </rss>