Current issues with Microformats

A couple of weeks ago I attended Geek Meet in Stockholm. It’s a small group of people gathering to talk about semantics and share their knowledge of front-end webdev. It was the second meet and, just like the last one, I had a great time. There’s something special about meeting up with people who share your interest in the small niche that front-end web development is.

At that meeting Peter Krantz gave a presentation about microformats, a way to standardize small snippets of HTML you use often. What set the presentation apart was that it wasn’t just praise, which is what many new techniques get. It showed both the good and the bad sides, and didn’t close with a final verdict on whether microformats should be used or not. This article is my view of microformats (and includes some of the issues from Peter’s presentation).

What are microformats?

Microformats are small standardized snippets of HTML. They are standardized to make it easier for crawlers/robots to find certain types of information. One idea could be to make contact information easier for a robot to find and let anyone build a directory out of this information. Browsers could support this and display the information in some special way. Sounds like a pretty good idea to start with, doesn’t it?

Microformats are not restricted to contact information (hCard). There’s also hCalendar for describing events, hReview for reviews of things, and so on. You’ll find the full list on the microformats wiki.

An example of an hCard:

Tantek Çelik

Technorati

Serious problems with the format

Microformats are not perfect. The first problem is that they don’t use namespaces. A namespace is a way of describing what you mean by a word, and in this case it applies to the class names of the formats. Check the example above. If you wrote a robot, how would you recognise an hCard? The only information available is the class name in the markup above. This means you would need to search each page for that class name. This is quite inefficient from a parser’s point of view.
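To make the parsing problem concrete, here is a minimal sketch in Python of the only strategy such a robot has: walk every element on the page and look for the class name. The sample page, the class `VcardFinder` and the function `find_vcards` are made up for illustration; real microformat parsers do more than this.

```python
from html.parser import HTMLParser

# Hypothetical page: one element meant as an hCard and one unrelated element
# that merely reuses the class name "vcard".
PAGE = """
<div class="vcard"><span class="fn">Tantek Çelik</span> <span class="org">Technorati</span></div>
<p class="vcard">Not contact data at all, just a styling hook.</p>
"""


class VcardFinder(HTMLParser):
    """Collects the tag name of every element whose class list contains 'vcard'."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # The class attribute is a space-separated list of names.
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.hits.append(tag)


def find_vcards(html):
    """Return the tag names of all elements that claim to be a vcard."""
    finder = VcardFinder()
    finder.feed(html)
    return finder.hits


print(find_vcards(PAGE))
```

Both elements are reported, because the class name alone cannot tell the robot which one was meant as contact information.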

What’s worse, this means no one can use the class name “vcard” on their pages for anything else, since that would feed the robots faulty information. Even this post probably breaks a few parsers because the code example above looks pretty much like an hCard. I hear the counter-argument: “but no one uses vcard as a class name if they don’t use it for that”. I can only tell you I just did.

Multiplying the damage of not using namespaces is versioning. Compare with HTML, where a doctype was not needed in the early versions. Then the authors decided that the format needed an update, and with the update new elements were added. Without a doctype the browsers couldn’t tell the difference between the versions, so the doctype was added. This is exactly what will happen with hCard (or any other microformat that doesn’t use namespaces) if it needs to be updated.

Is versioning a problem? Not if the format is perfect from the start, without any kind of problem that needs to be fixed in future releases. You see where this is going, right?

Besides the issue with namespaces and parsing of class names, we also have dates. In the hCalendar format, dates that include a time are marked up with the abbr element. It looks something like this:


21 june 10:30

Is the title attribute used correctly here? I think not. The abbr element is meant to give users who do not know what a certain term or expression stands for a short explanation. In the example above it’s not easy to guess that “T” stands for time and that the number afterwards is a notation of the time zone (GMT-7 or PDT-7?). All of this can of course be read from the ISO specification, but people hovering over the element will not do that. The title attribute on abbr elements should be human-readable!
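To show who the value really is written for, here is a small Python sketch that decodes a compact ISO 8601 date-time of the kind hCalendar puts in the title attribute. The value `20060621T1030-0700` is an assumption for the example above (the visible text only says 21 june 10:30), and `parse_basic_iso` is a made-up helper name.

```python
from datetime import datetime, timedelta


def parse_basic_iso(value):
    """Parse the compact ISO 8601 form YYYYMMDDTHHMM followed by a +hhmm or -hhmm offset."""
    return datetime.strptime(value, "%Y%m%dT%H%M%z")


start = parse_basic_iso("20060621T1030-0700")

# The offset is relative to UTC: -0700 means seven hours behind UTC.
assert start.utcoffset() == timedelta(hours=-7)

print(start.isoformat())
```

One call is enough for a program to get the date, time and offset out of the string. A person hovering over the text gets no such help.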

If you still don’t agree with that, you can check out ISO’s own text about ISO 8601. Under “Advantages of ISO 8601” you can see:

  • Easily readable and writeable by systems
  • Easily comparable and sortable
  • Language independent
  • Larger units are written in front of smaller units
  • For most representations the notation is short and of constant length

While all of those are good points, none of them have anything to do with being easy to read for a human. This is the very point of the title attribute of the abbr element. For the sake of discussion Tantek wrote about the choice of ISO 8601 when the use of title was included.

Improving the format

Making it easier for robots to parse pages is a good idea, but I’m not sure embedding content for robots into the human content is the best solution. What about putting this information into your web feed instead? Feeds are made to be read by robots, and since they are XML they can easily be extended with new types of data.

If I were to consider microformats I would like the two main issues above to be fixed first. Add some kind of namespace to the formats and make it required. One idea would be to use a meta element for each format you want to use inside the document. The name is the format you use, the content is the version you are using, and the scheme is a link to the microformat profile that corresponds to that format. Note that the use of this element should be required, not optional.

It would look something like this (you can of course use several formats by repeating the meta element):
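As a rough sketch of how a robot could use such a declaration, here is a small Python parser that reads the meta elements and stops at the end of the head. Everything specific in it is an assumption made for illustration: the format name `hcard`, the version `1.0`, the profile URL on example.org, and the names `FormatDeclarations` and `declared_formats`. None of it comes from a microformats specification.

```python
from html.parser import HTMLParser

# Hypothetical document; format name, version and profile URL are invented.
PAGE = """
<html><head>
<title>Contact</title>
<meta name="hcard" content="1.0" scheme="http://example.org/profiles/hcard">
</head><body>
<div class="vcard"><span class="fn">Tantek Çelik</span></div>
</body></html>
"""


class _HeadDone(Exception):
    """Raised to stop parsing as soon as </head> is reached."""


class FormatDeclarations(HTMLParser):
    """Collects meta elements that carry both a name and a scheme."""

    def __init__(self):
        super().__init__()
        self.formats = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            # Ordinary meta elements (description, keywords) have no scheme.
            if a.get("name") and a.get("scheme"):
                self.formats[a["name"]] = (a.get("content"), a["scheme"])

    def handle_endtag(self, tag):
        if tag == "head":
            raise _HeadDone


def declared_formats(html):
    """Return {format name: (version, profile URL)} declared in the head."""
    parser = FormatDeclarations()
    try:
        parser.feed(html)
    except _HeadDone:
        pass  # The body is never looked at.
    return parser.formats


print(declared_formats(PAGE))
```

If the result is empty, the robot can drop the page without scanning the body for class names at all.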



The second improvement would be to use something other than the title attribute for the dates. Titles should be used for human-readable content, not machine-readable content, so that needs to change. Switching to another format for dates seems like a bad idea since ISO 8601 is so widely used. So, what to do? I’m not sure. My best bet would be to use the class attribute, which most of the microformats are built on. The dates do not have a space in them, so they can be parsed just like other class names.
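A minimal sketch of what that could mean for a parser, in Python: split the class attribute on whitespace and take the first name that parses as a compact ISO 8601 date-time. The class value and the helper name `date_from_classes` are assumptions made for illustration, not an existing convention.

```python
from datetime import datetime


def date_from_classes(class_attr):
    """Return the first class name that parses as YYYYMMDDTHHMM plus offset, or None."""
    for name in class_attr.split():
        try:
            return datetime.strptime(name, "%Y%m%dT%H%M%z")
        except ValueError:
            continue  # An ordinary class name, keep looking.
    return None


# Hypothetical class attribute carrying the date as one of its names.
found = date_from_classes("dtstart 20060621T1030-0700")
print(found.isoformat())
```

Names that are not dates are simply skipped, so the date can sit next to the existing class names without a new attribute.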

Conclusion

Even though microformats are basically trying to make it easier to parse the web (an idea I like) I’m not sure this is the right way of doing it. Human and robot content are most often different things and mixing those in the same document could mean you end up with a mess. I have not touched upon all the extra divs and spans microformats add to the code but that’s also a sign that we’re moving the wrong way.

If the authors of microformats want their formats to spread they need to fix the two issues above. The two suggestions of improvement I made above would make me consider using them. Right now I don’t feel it’s worth it.

15 responses to “Current issues with Microformats”

  1. Of the parts you mention I don’t think your issues with Microformats are likely to be addressed. At present they are semantics for the rest of us; for proper semantics RDF and OWL would appear to be better options.

  2. @Richard Conyard: Why not fix them? The fixes I propose are one line each and they make it easier for everyone involved. Except the spec writers…

  3. The approach microformats has used in some of their specs seems to me less semantic and a step back from where people are heading.

    With respect to dates, would using ins tags to mark up the time be suitable? They already have a datetime attribute, and define what you are saying – “this has been inserted at/for”.
    I can’t really see using the class attr for yet more data.

    For the meta proposal and parsers/bots, I presume sites without the meta tag should ignore classes as a special case?

  4. @John Drinkwater: using the ins element is a great idea! It does not hit 100% on target but it’s very close.

    My proposal is that the meta element gets required, yes. If it doesn’t exist on a site the crawlers will not have to continue past the <head> block.

  5. Namespaces are not needed, because microformats work to converge meaning, instead of diverging it. Adding a profile link is helpful, but doing it with meta instead of link is an odd suggestion. In practice not all publishers can change the head of a page, so microformats are deliberately not strict about this. This does make a little more work for people writing parsers, but a lot less for those publishing the formats. This is intentional, as successful formats have far more publishers than parser writers.
    Regarding ISO 8601 and abbr, that is a legitimate semantic use – the title is more specific. If you don’t like the form of 8601 Tantek used there, ‘20060621T1030-0700’ can be expressed as ‘2006-06-21 10:30-0700’ which is broadly human readable too.
    So, these don’t need fixing, they are by design. Microformats do not add extra elements to HTML, they re-use existing semantic elements to converge and add meaning.

  6. @Kevin Marks: Thanks for your comment. I’ll try to address the issues you bring up:

    1) In what way is specifying what you mean by the class “vcard” diverging? I’d say that’s specifying what you mean.

    2) I wasn’t aware that you could link to the profile too, but the way you do it is not really important; the point is that it will lead to hell if a namespace is not there.

    3) No, that’s not readable. The time and timezone look like a strange interval of some kind. Also, UTC or GMT? They are there by design, but by faulty design; title is not there for computers.

    4) My fault at “extra elements”. I meant “extra divs and spans”, also known as “div hell” by some. I updated the paragraph to reflect that.

    I’m well aware that microformats are like they are by design, it’s the design I’m commenting on.

  7. Thanks for an interesting article. You’re quite right about new techniques related to web development and the lack of a critical look at them, and so as a newish user of microformats, your article came at just the right time.

    While the ideas behind microformats appeal to me no end, I found myself agreeing with many of your points, particularly the awkward use of the ABBR element and its title attribute.

    Another specific limitation of the hCalendar format that I think is interesting to mention is its difficulty in handling recurring events – say, a radio show that happens at the same time every Monday evening. Not much progress, it seems, has been made in terms of establishing an agreed way of handling events like this, which are not restricted to a specific date (like Wednesday 26 July 2006 at 20:00), but rather need only a day and time to be specified. The documentation at the microformats site admits this much.

    So far, the ‘best’ I have come up with in terms of marking up recurring events in a schedule is to use the specific date of the first day the event occurs after Summer/Winter Daylight Saving changes kick in, specifying the appropriate number of hours after GMT for the start time, specifying a duration for that event (rather than a definite finish time), and declaring the frequency with which the event recurs using the RRULE class. For example, for an event that happens every Wednesday at 20:00, lasting for 3 hours, I specify the first Wednesday after the time changes to Summertime (so, 20060329), GMT plus 2 hours (T2000+0200), and a duration of 3 hours (PT3H) to give me a dtstart value of “20060329T2000+0200/PT3H”. Then I declare the RRULE class on an element with the FREQ class inside a child element, giving that child element the value ‘weekly’.

    I have no idea if this is sufficient. Worse, does anybody? ;)

  8. Emil,

    That’s OK, I didn’t post with the intention of getting help. I just wanted to point out for everyone an acknowledged shortcoming of the design of the format, in addition to the ones you mentioned.

  9. Hi, I’d just like to comment that (strictly speaking) GMT is not a time zone really. When people use “GMT” as a time zone, they actually refer to UTC+0. Time zone offsets always refer to UTC, so there’s no ambiguity.

    That’s not to say I don’t agree with the article, I just felt like sharing some knowledge. =)

  10. @Rotem: Thanks, reading up on timezones was good for me :) I changed the two examples to be GMT-7 and PDT-7 (instead of UTC-7 which would be the same zone ;). Anyway, I’ll keep GMT because people know what I mean when I use it, UTC is not as widely spread from what I know (in Sweden). Good comment.
