IE6 bug: Encode and ignore

I’ve previously talked about encodings and tried to explain how they work. This time I’ll show you an IE6 bug that is caused by encoding problems: if you don’t watch out, IE6 might silently ignore parts of the rules in your stylesheet. If you want, you can see the example right away: Ignore encoding example. Open it in IE6 and compare with the rendering in a modern browser.

I’ve never seen this bug mentioned before, so I took the liberty of naming it the “Encode and Ignore bug”. If you find it described somewhere else, please tell me, and I’ll use that name instead.

Now. Stylesheets unfortunately have no way of specifying encoding. So you type away in your favourite text editor set to some obscure Greek charset and of course expect everything to work. It often does. CSS works with very few characters; mostly you just use the letters A-Z, some brackets, and colons. Since most charsets put those characters at the same positions, there’s no problem. But there are exceptions where you want to use other letters too: inside comments and, in the future, in generated content.

[Update: Niels Leenheer points out that there are two ways to specify encoding on stylesheets: either send a proper HTTP header, using the method in the encodings article, or use the @charset "utf-8"; rule. The latter is just a rule you put on the first line of the CSS. It even seems to have decent browser support. Thanks Niels!]

So this Swedish friend of mine was learning CSS and I was helping him out when he noticed a strange error. When he set his HTML document to be encoded in UTF-8, IE6 started to display the page differently. I had never seen anything like it and started digging through the code. After about half an hour I found the culprit: an “å” in a comment!

What he had done was add a comment after one of the colors he used, /* ljusblå */ (“light blue” in Swedish). When IE switches the HTML to UTF-8 the CSS seems to be switched with it. In UTF-8 mode the incorrectly encoded “å”-letter means something else, and IE not only ignores the comment or the line, but everything following it (still inside the current rule). So about half of a rule was ignored. I researched further and found that it was only triggered when the strange character was at the end of a comment.
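To make the mechanism concrete, here is a minimal Python sketch. It is a model of the behaviour described above, not IE6's actual code: it assumes a decoder that trusts the UTF-8 lead byte and never checks the bytes that follow. The function names and the colour value #ace are made up for illustration. In ISO-8859-1 the letter “å” is the single byte E5, and in UTF-8 a lead byte between E0 and EF announces a three-byte character.

```python
def sequence_length(lead: int) -> int:
    """Length a UTF-8 sequence claims to have, judged by its lead byte alone."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 1  # stray continuation byte or invalid lead: consume one byte


def lenient_decode(data: bytes) -> str:
    """Hypothetical decoder that never validates continuation bytes.

    Every multibyte sequence is shown as a single "?".
    """
    out = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        out.append(chr(data[i]) if data[i] < 0x80 else "?")
        i += n  # skip however many bytes the lead byte announced
    return "".join(out)


# The stylesheet was saved as ISO-8859-1 but is read as UTF-8.
css = "color: #ace; /* ljusblå */ margin: 0;"
raw = css.encode("iso-8859-1")

print(hex(raw[css.index("å")]))                 # 0xe5
print(lenient_decode(raw))                      # color: #ace; /* ljusbl?/ margin: 0;
print(raw.decode("utf-8", errors="replace"))    # comment still ends with */
```

In the lenient model the “å”, the space, and the asterisk are swallowed as one character, so the comment is never closed and the rest of the rule disappears into it. Python's own strict decoder replaces only the bad byte and leaves the closing marker intact, which matches what the other browsers appear to do.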

Interesting, and easy to miss. The solution is of course to encode your CSS in the same charset as your HTML, or if you’re lazy put some characters after the culprit. A very simple (and kinda rare) problem, but I thought it might save you an hour of debugging sometime.
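If you want to check a stylesheet before it bites you, something like the following Python sketch could help. It is only an illustration under my own assumptions: the helper names are hypothetical, and it simply reports which lines contain non-ASCII bytes and whether the file as a whole is valid UTF-8.

```python
from typing import List


def non_ascii_lines(css_bytes: bytes) -> List[int]:
    """Return the 1-based numbers of lines that contain any byte >= 0x80."""
    return [
        number
        for number, line in enumerate(css_bytes.splitlines(), start=1)
        if any(byte >= 0x80 for byte in line)
    ]


def is_valid_utf8(css_bytes: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        css_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical stylesheet content, saved in two different encodings.
sheet = "a { color: #ace; /* ljusblå */ }\nb { margin: 0; }\n"
as_latin1 = sheet.encode("iso-8859-1")
as_utf8 = sheet.encode("utf-8")

print(non_ascii_lines(as_latin1))  # [1]
print(is_valid_utf8(as_latin1))    # False
print(is_valid_utf8(as_utf8))      # True
```

A file that has non-ASCII lines and fails the UTF-8 check is a candidate for this problem as soon as the HTML that links to it is served as UTF-8.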

5 responses to “IE6 bug: Encode and ignore”

  1. Hi Emil,

    I believe there are two fundamental mistakes in your article. First of all, this is not a bug, and secondly, there are two ways to specify the encoding of a CSS stylesheet.

    Let me explain. Western character encodings use a single byte for each character. UTF-8 uses a variable number of bytes: some characters use a single byte, others two or three. The first byte of such a doublet or triplet indicates how many bytes that particular character uses.

    When you use the å character in one of the ISO-8859 encodings it will use the same byte value as the first byte in one of those UTF-8 multibyte characters.

    So IE treats the å as the start of a multibyte character. The next bytes are part of the same character. Now, if that next byte is the asterisk of the comment closing marker, it will break the marker and the comment will simply never end. For example, /*å*/ will become something like this: /*?/. The question mark represents one multibyte UTF-8 character.

    So, this is normal behaviour and certainly not a bug. IE just parses the stylesheet as instructed.

    Secondly, there are two ways to specify the charset encoding of the stylesheet. You can configure the server to tell the browser by using HTTP headers, or you can use the @charset “at-rule”. For example, start your CSS file with: @charset "UTF-8";

  2. @Niels: I could understand the bug if it was å*/, but as you see in the example there’s a space in between the å and the *. That’s a bug then isn’t it?

    I had heard of the @charset rule but didn’t think there was any browser support for it. Is there?

    This blog is more about being practical than adhering to specifications :) Good comment, and thanks for pointing out my errors.

  3. IE6 treats the iso-8859-1 string “å *” as a three-byte character when interpreted as utf-8. Other browsers don’t. (I’m not sure which is correct.)

    @charset is supported by at least Gecko, Opera and recent WebKit builds, and it seems IE6 supports it too, actually.

  4. @emil: In the ISO 8859 character set the character å is represented by the hexadecimal value of E5.

    If the first byte of a character in the UTF-8 charset starts with a hexadecimal value between E0 and EF, then the character consists of three bytes.

    So, both the å, the space and the asterisk would be part of a single character…

  5. @Niels: Interesting. Then it’s not an IE bug, just a difference in handling between IE and Firefox. Interesting to see Firefox handling code less strictly than IE. I’m blessed with skilled readers!
