<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Language detection, a usability enhancer?</title>
	<atom:link href="http://friendlybit.com/other/language-detection-a-usability-enhancer/feed/" rel="self" type="application/rss+xml" />
	<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/</link>
	<description>You have found Friendly Bit, a web development blog. I focus on client side technologies like CSS, HTML and Javascript. You find my articles below and categories to the right.</description>
	<pubDate>Fri, 04 Jul 2008 13:20:19 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Emil Stenström</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-22388</link>
		<dc:creator>Emil Stenström</dc:creator>
		<pubDate>Thu, 15 Mar 2007 22:14:12 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-22388</guid>
		<description>@Peter Vigren: I'll see what I can do, I'm moving atm so not too much extra time available.</description>
		<content:encoded><![CDATA[<p>@Peter Vigren: I&#8217;ll see what I can do, I&#8217;m moving atm so not too much extra time available.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Vigren</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-22298</link>
		<dc:creator>Peter Vigren</dc:creator>
		<pubDate>Sun, 11 Mar 2007 16:33:12 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-22298</guid>
		<description>I am also interested in example code for PHP. I find this quite intriguing, maybe because I find programming AND languages so fascinating. :-) But one thing: if you remove or translate characters like é, won't that make the result a bit weird? After all, if one language uses é a lot and others don't, shouldn't that be significant? In any case, I really enjoyed this article. It's the first time I've heard of this, so it tickles my brain like crazy! Thank you. :-)</description>
		<content:encoded><![CDATA[<p>I am also interested in example code for PHP. I find this quite intriguing, maybe because I find programming AND languages so fascinating. :-) But one thing: if you remove or translate characters like é, won&#8217;t that make the result a bit weird? After all, if one language uses é a lot and others don&#8217;t, shouldn&#8217;t that be significant? In any case, I really enjoyed this article. It&#8217;s the first time I&#8217;ve heard of this, so it tickles my brain like crazy! Thank you. :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Stenström</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-4010</link>
		<dc:creator>Emil Stenström</dc:creator>
		<pubDate>Sun, 27 Aug 2006 09:56:57 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-4010</guid>
		<description>@Dan Pettersson: so your algorithm gives different numbers for the exact same texts? That sounds like a bug in your code. If you post a link to it (&lt;a href="http://se2.php.net/highlight_file" rel="nofollow"&gt;syntax highlight it&lt;/a&gt;), I'll have a look.</description>
		<content:encoded><![CDATA[<p>@Dan Pettersson: so your algorithm gives different numbers for the exact same texts? That sounds like a bug in your code. If you post a link to it (<a href="http://se2.php.net/highlight_file" rel="nofollow">syntax highlight it</a>), I&#8217;ll have a look.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan Pettersson</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-3994</link>
		<dc:creator>Dan Pettersson</dc:creator>
		<pubDate>Sat, 26 Aug 2006 22:02:59 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-3994</guid>
		<description>Does anyone have an example of code (e.g. in PHP) that would compare two texts and give you a number...

In my own tests squaring and square rooting messes things up. E.g. if I duplicate a text it won't show the same result.

I've solved it by making my own version without the squares, and I think that works fine...</description>
		<content:encoded><![CDATA[<p>Does anyone have an example of code (e.g. in PHP) that would compare two texts and give you a number&#8230;</p>
<p>In my own tests squaring and square rooting messes things up. E.g. if I duplicate a text it won&#8217;t show the same result.</p>
<p>I&#8217;ve solved it by making my own version without the squares, and I think that works fine&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Stenström</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-1653</link>
		<dc:creator>Emil Stenström</dc:creator>
		<pubDate>Mon, 19 Jun 2006 10:36:20 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-1653</guid>
		<description>@Adam Zakreski: The link does indeed cover only blog content that Technorati indexes. &lt;a href="http://ausweb.scu.edu.au/aw03/papers/edwards2/paper.html" rel="nofollow"&gt;Other sources&lt;/a&gt; have estimated the number at 68%, which means you are right there. Well done :)

&lt;a href="http://global-reach.biz/globstats/index.php3" rel="nofollow"&gt;Global Reach&lt;/a&gt; measures the number of people online by language and finds 35%, which says a lot about the future of the web.</description>
		<content:encoded><![CDATA[<p>@Adam Zakreski: The link does indeed cover only blog content that Technorati indexes. <a href="http://ausweb.scu.edu.au/aw03/papers/edwards2/paper.html" rel="nofollow">Other sources</a> have estimated the number at 68%, which means you are right there. Well done :)</p>
<p><a href="http://global-reach.biz/globstats/index.php3" rel="nofollow">Global Reach</a> measures the number of people online by language and finds 35%, which says a lot about the future of the web.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adam Zakreski</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-1490</link>
		<dc:creator>Adam Zakreski</dc:creator>
		<pubDate>Fri, 16 Jun 2006 21:54:32 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-1490</guid>
		<description>I find your statement, "In fact, most of the content online is not in English," a little hard to swallow. Even with the citation, the original source seems to be referring only to a certain type of blog post. I believe saying, "most blog content online is not in English," would be more accurate (though still a stretch). I do find it a lot easier to believe that the Japanese are more avid bloggers than English speakers, though.</description>
		<content:encoded><![CDATA[<p>I find your statement, &#8220;In fact, most of the content online is not in English,&#8221; a little hard to swallow. Even with the citation, the original source seems to be referring only to a certain type of blog post. I believe saying, &#8220;most blog content online is not in English,&#8221; would be more accurate (though still a stretch). I do find it a lot easier to believe that the Japanese are more avid bloggers than English speakers, though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Stenström</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-667</link>
		<dc:creator>Emil Stenström</dc:creator>
		<pubDate>Tue, 16 May 2006 08:46:04 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-667</guid>
		<description>@Jesse Skinner: I wouldn't use a dictionary. Since the texts you will be testing are ordinary texts, you should base your reference data on that kind of text. Say English contains a lot of "in". Then that bigram should have a high percentage, whether that is because it appears in many different words or because the words it appears in are common.</description>
		<content:encoded><![CDATA[<p>@Jesse Skinner: I wouldn&#8217;t use a dictionary. Since the texts you will be testing are ordinary texts, you should base your reference data on that kind of text. Say English contains a lot of &#8220;in&#8221;. Then that bigram should have a high percentage, whether that is because it appears in many different words or because the words it appears in are common.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jesse Skinner</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-654</link>
		<dc:creator>Jesse Skinner</dc:creator>
		<pubDate>Mon, 15 May 2006 15:25:33 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-654</guid>
		<description>Thanks for the algorithm.. I would never have known where to begin to implement something like this.

I think, maybe dictionary files would be the perfect place to build up a bigram statistic list. Those are usually (somewhat) easy to find.</description>
		<content:encoded><![CDATA[<p>Thanks for the algorithm.. I would never have known where to begin to implement something like this.</p>
<p>I think, maybe dictionary files would be the perfect place to build up a bigram statistic list. Those are usually (somewhat) easy to find.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Tucker</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-647</link>
		<dc:creator>Steve Tucker</dc:creator>
		<pubDate>Sun, 14 May 2006 23:28:45 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-647</guid>
		<description>I always try to put some reference to the language in my markup documents - it makes logical sense. At worst it cannot do any harm!</description>
		<content:encoded><![CDATA[<p>I always try to put some reference to the language in my markup documents - it makes logical sense. At worst it cannot do any harm!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sarven Capadisli</title>
		<link>http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-638</link>
		<dc:creator>Sarven Capadisli</dc:creator>
		<pubDate>Sun, 14 May 2006 00:23:50 +0000</pubDate>
		<guid isPermaLink="false">http://friendlybit.com/other/language-detection-a-usability-enhancer/#comment-638</guid>
		<description>Concerning:
# Assisting search engines

I am not sure how effective the lang attribute is. 

A while back I had a problem with this in fact. 
I had placed lang="en" on my pages; however, Google chose the wrong language for one of my articles.

The only explanation I could come up with was that the article was (at the time) more popular in the German (de) community.

So I had to write this:
&lt;a href="http://www.csarven.ca/google-deutschland-stole-my-article" rel="nofollow"&gt;Google Deutschland stole my article&lt;/a&gt;

Although this is no longer an issue, I am still skeptical about whether search engines (or anything else, in fact) really acknowledge the language of a given document.</description>
		<content:encoded><![CDATA[<p>Concerning:<br />
# Assisting search engines</p>
<p>I am not sure how effective the lang attribute is. </p>
<p>A while back I had a problem with this in fact.<br />
I had placed lang="en" on my pages; however, Google chose the wrong language for one of my articles.</p>
<p>The only explanation I could come up with was that the article was (at the time) more popular in the German (de) community.</p>
<p>So I had to write this:<br />
<a href="http://www.csarven.ca/google-deutschland-stole-my-article" rel="nofollow">Google Deutschland stole my article</a></p>
<p>Although this is no longer an issue, I am still skeptical about whether search engines (or anything else, in fact) really acknowledge the language of a given document.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
