Use formats instead of microformats

The Semantic Web continues to break new ground, and Web 3.0 seems to be a term that people associate with it. In the backwaters of semantics, the microformats community aims to develop standards for embedding semantic information in XHTML. I can’t help thinking that’s strange.

One of the principles of microformats is to “design for humans first, machines second”. Yet almost all of the formats are about adding span tags and class or rel attributes to existing XHTML. Humans will never see those, or benefit from them, unless some kind of machine parsing is done on top. Microformats were first built by people working with the blog search engine Technorati, one of the reasons being to make it easier for Technorati to aggregate data from those blogs. So machines it is.

Thing is, if you’re going to give information to machines, why not use vCard instead of the equivalent microformat, hCard? hCard is just a translation of vCard into XHTML. vCards open in your e-mail program, letting you save the contact information there; hCards don’t open anywhere. vCards are also just as easy (probably easier) to crawl and parse as microformats.
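To make the “easy to crawl and parse” point concrete, here is a minimal Python sketch (not taken from any real site) that renders a contact record as a vCard 3.0 file and reads it back. The record fields `given`, `family`, `email` and `tel` are hypothetical, and the parser assumes short single-line values with no escaping or line folding, both of which real vCards can contain.

```python
# Sketch: render a contact record as vCard 3.0 text and parse it back.
# Assumes short, single-line values (no line folding, no escaped ',' or ';').

def to_vcard(contact: dict) -> str:
    """Serialize a hypothetical contact record as a vCard 3.0 string."""
    given, family = contact["given"], contact["family"]
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{family};{given};;;",  # structured name: family;given;additional;prefix;suffix
        f"FN:{given} {family}",    # formatted name, shown by address books
        f"EMAIL;TYPE=INTERNET:{contact['email']}",
        f"TEL;TYPE=WORK:{contact['tel']}",
        "END:VCARD",
    ]
    return "\r\n".join(lines) + "\r\n"  # vCard lines end in CRLF


def parse_vcard(text: str) -> dict:
    """Map each property name (parameters dropped) to its value."""
    props = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name_and_params, value = line.split(":", 1)
        name = name_and_params.split(";", 1)[0].upper()
        if name not in ("BEGIN", "END", "VERSION"):
            props[name] = value
    return props


if __name__ == "__main__":
    card = to_vcard({"given": "Jane", "family": "Doe",
                     "email": "jane@example.com", "tel": "+1-555-0100"})
    print(card)
    print(parse_vcard(card))
```

Linked from a contact page (say as `contact.vcf`, a hypothetical file name), the same text opens in an address book when clicked, and a crawler needs little more than the loop in `parse_vcard` to read it.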

So what I’m saying is, could we please use real formats instead of microformats?

Update: This article was too fuzzy, so let me clarify: this discussion is about embedded formats vs. standalone formats. The “vs.” comes from the fact that lots of sites that implement microformats choose not to implement the corresponding format, which in some cases means people can’t use the extra information.
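Because an hCard carries the same properties as its vCard counterpart, a site that already has the hCard markup could generate the standalone file from it. As a rough sketch (not how Operator, Tails or Technorati’s converter actually work), the Python below uses the standard library’s `html.parser` to pull the `fn`, `email` and `tel` classes out of well-formed XHTML and emit a vCard. It handles only that subset of hCard, reads text content rather than `mailto:` links, assumes every element is properly closed, and derives `N` from a two-word `fn` in the spirit of hCard’s implied-n rule.

```python
from html.parser import HTMLParser

# Subset of hCard class names and the vCard properties they mirror.
HCARD_TO_VCARD = {"fn": "FN", "email": "EMAIL", "tel": "TEL"}


class HCardExtractor(HTMLParser):
    """Collect the text of elements whose class list names an hCard property.

    Assumes well-formed XHTML: every start tag has a matching end tag
    (or is self-closing), so the element stack stays balanced.
    """

    def __init__(self):
        super().__init__()
        self.props = {}
        self._stack = []  # vCard property (or None) for each open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((HCARD_TO_VCARD[c] for c in classes if c in HCARD_TO_VCARD), None)
        self._stack.append(prop)
        if prop:
            self.props.setdefault(prop, "")

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing hCard property, if any.
        for prop in reversed(self._stack):
            if prop:
                self.props[prop] += data
                break


def hcard_to_vcard(xhtml: str) -> str:
    parser = HCardExtractor()
    parser.feed(xhtml)
    parser.close()

    def value(prop):
        # Collapse the whitespace that page layout leaves in text content.
        return " ".join(parser.props.get(prop, "").split())

    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    fn = value("FN")
    if fn:
        lines.append(f"FN:{fn}")
        words = fn.split()
        if len(words) == 2:  # simplified implied-n: "Given Family"
            lines.append(f"N:{words[1]};{words[0]};;;")
    for prop in ("EMAIL", "TEL"):
        if value(prop):
            lines.append(f"{prop}:{value(prop)}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    page = ('<div class="vcard"><span class="fn">Jane Doe</span>, '
            '<a class="email" href="mailto:jane@example.com">jane@example.com</a>, '
            'tel. <span class="tel">+1-555-0100</span></div>')
    print(hcard_to_vcard(page))
```

The point of the sketch is only that the conversion is mechanical; a production converter would also need to handle the rest of the hCard vocabulary, value escaping and line folding.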

27 responses to “Use formats instead of microformats”

  1. I see your point, but I think you’re overlooking one of the fundamental advantages of using microformats. On the vast majority of occasions, the data you want to expose to machines is already available in the HTML.

    Take hCard for example. If you want to provide a basic vCard, then it’s likely your contact page already contains your name, telephone number and e-mail address. Why copy this information into a separate file just to expose it to machines? Don’t Repeat Yourself, surely.

    Similarly with hAtom (marking up blog-type content), this saves having to build separate functionality to generate syndication feeds for RSS 1.0, RSS 2.0, Atom, plain XML and so on. You have your data marked up once, and it can be parsed and converted as appropriate. This is where some microformat XSLT parsing would be very useful.

    Technically speaking, with my contacts example, if the details were stored in a database, then generating a vCard file from this data would alleviate the data-duplication issue. However, not all sites run from a CMS, and very often this information is buried in the HTML.

    That’s the simplified beauty of microformats.

  2. @Nick Dunn: Thanks again, great comment. I agree that if your contacts are buried inside an HTML chunk, it’s somewhat easier to make them an hCard than a vCard. But it’s not that much easier. Keeping important information like contacts out of HTML of course brings plenty of other benefits too.

    vCards are very rarely made by hand, so it wouldn’t be repeating oneself. Rendering machine data from a machine makes more sense to me.

  3. @Siegfried: If you throw out that the whole article is “nonsense”, I at least expect some arguments. My article is full of them, your comment only talks about RDFa. Better arguments please.

  4. No! I just said that “real formats versus microformats” is nonsense. There is no “versus”. And I already gave arguments for exactly that.

    To repeat: the correct way to embed vCard in XHTML is via RDFa. But that is limited to XHTML; it is not possible in HTML. So embedding vCard the RDFa way is for the future, and embedding hCard in HTML is for now. Both methods have their use and their place, so there is no “versus”. As long as you stick to HTML, you have to stick to hCard. As soon as you switch to XHTML, you have the option of switching to RDFa/vCard.

    There are two nonsense points in the title of this article. The first is claiming a “versus” between microformats and vCard. The second is implying that microformats are not a “real” format. They are a real format, in the same way RDFa is.

    The rest of the article is mostly correct. And I agree that vCard is the way to go, in the (near?) future.

    The reason to use microformats is that, without any additional effort, you can make the computer understand existing content on a page.

    Sure, it needs spans or other tags, but the content would usually be printed on the page anyway, so adding a tag that makes the computer understand it doesn’t take much effort.

    And by the way, hCards open as vCards very easily. Check out Operator, Tails and the Microformats bookmarklets; they all export vCards.

    Real formats are good; they are better than microformats. But microformats are better than nothing and don’t exclude real formats.

  6. Personally I’ve never understood the statement that microformats “design for humans first, machines second”. In fact, I feel it’s the opposite. Taking hCalendar as an example, you put a human-readable event date on your page, and in the background you add the microformat to the HTML. But I like it. And tools like the Operator extension for Firefox, or Optimus, make it perfectly usable.

  7. @Siegfried: I don’t agree that it’s a nonsense point. It’s really about whether to embed formats in HTML or not: embedding formats (RDFa or microformats), or linking to formats (vCard or whatever).

    Development is about priorities, and I agree that adding both would be a good idea. Real-world examples contradict that, though: people who use microformats tend to refrain from making the data available in non-embedded formats.

    @Gert & Pelle: Firefox extensions make them usable, but far too few people use them to motivate a developer to implement them. A big search engine like Technorati could of course crawl and search for them, but even then, crawling for non-embedded formats would be easier.

  8. By far the most widespread “browser” is IE. IE does not understand XHTML (if you properly deliver it as application/xhtml+xml), so the majority still sticks to HTML. Embedding microformats in HTML is straightforward and has at least some rudimentary acceptance, so microformats can already be used today. RDFa is limited to real XHTML (because of the namespaces). As explained at the start of this comment, it does not make much sense to deliver production pages as XHTML. So for such sites you have microformats, nothing else. XHTML/RDFa is for early adopters who are already planning for the future today. These are very few people. They should indeed switch from microformats to RDFa, and someday these few early adopters will have an advantage.

    But a lot of water will flow down the Rhine before we reach that day. Just consider that the majority of today’s web designers still stick to the concepts and methods of HTML 3.2. Even if they use the HTML 4 doctype, they still code and design like back in the days of HTML 3.2. So don’t expect semantic markup, in whatever standard, to be adopted too soon. And if there is some minor adoption of the bare idea that web pages have not only a “look” but also semantics, the first step will be small. Microformats allow for that smaller first step.

  9. @Siegfried: I too think RDFa is a better choice for the future. This article isn’t about that though. It’s about embedded vs. standalone formats.

    @Sarven Capadisli: Instead of mindless self-promotion, tell me what I’ve misunderstood.

    @Jesse Skinner: Well, perhaps DRY is a reason, could be. But wouldn’t it make sense to store contact information some other way than as an HTML chunk in your database? And if you generate a vCard from that, there’s very little reason to do an hCard too.

  10. Mindless promotion? I beg to differ. If you feel that it is so “mindless”, please remove the links. Believe me, I’m not looking for promotion, nor would I need your site for that if I chose to do it. I was merely trying to help you.

    If all the comments above are telling you that your views are incorrect because of preconceptions, perhaps it is time to learn what microformats are really about, or what they are really trying to solve at the end of the day. In any case, the fact that you don’t like microformats (or perhaps feel that you are left behind and are trying to make a case out of it) shouldn’t get in the way of educating yourself about them. Don’t expect people to give you all the answers if you are not willing to put in the time to learn.

    But for the sake of argument, I can point out a few more things in your post that are wrong, on top of the comments above:

    Microformats is not trying to replace existing formats. It is a way to use existing formats in (X)HTML. Whether you want to use a separate file to keep track of a vCard or not is totally up to you. Note that, since you are already writing HTML, you might as well use names in your HTML that match those in widely accepted standards, because that way you can pull out a vCard while keeping a single instance of the data. Are you huffing about names like “entry_title” being useless, and would you rather go for “my_foo_title” instead? What’s the problem? If you are going to use a name, you might as well use a standard name and gain the advantage of parsers being able to understand your document, among others. Maybe you don’t want to do that. That’s your call.

    Are you thinking of code bloat? You need to learn about writing HTML in different environments and how they differ at the end of the day: cases where you need to move code around, keep things consistent and define things at a granular level. Above all, maintenance. Microformats can lead you in that direction if, of course, you choose to understand how to make use of them in your own work.

    Technorati is by no means in charge of microformats. Many formats have been developed since XFN and are still being researched and developed by an open community. I take slight offense at this because I contribute my share. Believe me, the discussions are a lot more complex than anything under this URL. They involve analysis and constructive feedback from the community.

    You’ve clearly missed the point of “design for humans first, machines second”. The idea is to mark up “visible” data that we are already providing to humans so that machines can also understand it. The tags and attributes you speak of are a way to do that. It is like focusing on “social tagging” instead of meta keywords. Can you guess why meta keywords were dropped in favour of social tagging? Microformats favor visible data over metadata; keep that in mind.

    Microformats is not a new language, nor is it trying to revolutionise the way we work. It is a step in the right direction. It will not solve all our problems, but it will get us 80% of the way there because it is pretty reasonable right now. Microformats is not competing against RDF(a); it is meant to solve a similar (but quite different) problem in a different way. If you want to cover your “Semantic Web”ness, perhaps microformats is not for you. If you want a way for machines to understand the existing Web, then microformats is for you. You also need to understand the state of the Web, though. Don’t expect to go from zero to “Semantic Web” overnight. Microformats can help you bootstrap it, though.

    Consider this a freebie. Now go read about it instead of making uneducated claims because you are adding noise to the Web without doing proper research.

    (This comment form is not very user-friendly: the container is too small for comfortable writing. I am actually typing this in a text editor and will paste it back.)

  11. O.k., if this article is not about embedded formats but about standalone formats, then at least the title is not nonsense (though I still do not agree). Your article was fuzzy about that.

    O.k., HTML, or, in the (hopefully near) future, XHTML, is _the_ web format. Instead of tons of proprietary and very different formats in tons of files, it is indeed better to have them embedded in a single file. The main advantage is that humans can then use it simply by reading it, even humans who do not know about vCards and the like. You could just sit down in front of the monitor, take a pen and paper and write down that telephone number to call that person. You could try that with a vCard, too, and if you know enough about computer data you will probably succeed. But the average noob is far better off with a visually appealing web page.

    On the other hand, embedded data can be extracted by programs to automatically enhance the usability and usefulness of that information. You could still write the address down on paper and then add it to your e-mail client manually, but it adds to usability if the computer does it for you with a simple click.

    Additionally, this information, say an address, is within a context when it is embedded in a web page. The address is part of the complete information on that page. If you extract it, you get a naked address. What is this address for? Why do you have it? The vCard format provides no information about that; a web page does. This contextual information is completely lost when you extract this piece of data.

    Of course you may _link_ to that standalone data from your web page. This has advantages and disadvantages. The context is not immediately available as it is when embedded, but you get immediate access to a ready-to-use format, without any extra program involved. To combine both, I have done both on my impressum page: I have the address information on the page marked up with hCard, and a link to a pre-made vCard file. Often it’s not a question of this _or_ that; best practice would be to offer both.

    BTW: Since I already offer my pages in HTML and XHTML, I’m currently working on switching the XHTML files to RDFa while still using microformats in the HTML versions :)

  12. @Sarven Capadisli, Siegfried: Thanks for your replies, I’ll reply later tonight (and also fix the narrow comment area, I agree that it sucks).

  13. @Sarven Capadisli: Sorry for the harsh tone, my guess is that you get as annoyed by me not understanding you, as I get when people don’t understand me.

    1. I know what microformats are for, and I know what they are trying to accomplish (I don’t need a FAQ or RTFM). Thanks anyway for typing it down here.
    2. Replacing. They are not meant to replace formats, but they do. Very few sites that use microformats do also implement their corresponding “real” format. That’s what this whole article is about, and that’s also what everyone is misinterpreting. I’ll update the article.
    3. No one said Technorati was in charge of anything.
    4. Thanks for the ending rudeness, we’re even.

  14. @Siegfried: Yes, it was fuzzy about that, I’m sorry. I will add an update at the end explaining it.

    I see what you mean, and what you’re getting at, but I still don’t agree. One thing you’re saying is that the future is one format. But today’s web pages already consist of many different formats: HTML, CSS, JS, PNG, JPG, SWF and so on. Each format accomplishes a specific thing, and does it well. For other file types, the user gets to decide what to do with them. If you click a vCard file, you get a friendly Outlook “add contact” window. It works, for real users, today.

    Good point about context, I didn’t think about that. As you say, it’s a bit harder for machines to resolve things based on links, but they probably still will have to do that across different pages.

    I agree that doing both, as you have on your impressum page, is the best way to do things right now. But if one of my clients gives me a couple of hours and asks me to make their contact information page more usable by ordinary people, I’d still pick vCard. First.

  15. Emil, I overreacted; I apologise.

    (Implementing all those corresponding “real” formats is not easy; otherwise we’d see it more often. Implementing microformats is simple, and we let scripts do the extra work for us. This is trying to solve a real-world problem, and it is not meant to solve all problems either.)

  16. I don’t have time to read the entire discussion here, but I’m confused. I just started learning how to use microformats. I thought the purpose of microformats was to display information on a web page (so humans have instant access to it) and then provide the option to download a vCard (so humans can manipulate this useful information with a machine).

    Am I misunderstanding this? Is there a way to display a vCard directly on the page? I thought the point of microformats was to get the best of both worlds: microformats can be manipulated by machines/programming while not requiring extra interaction from the user.

  17. Hmmm, well… :)
    Just to be precise: I do not think that there will only be one format. This one format, HTML, will just be the main container, just as it is a container for formats like JPG, GIF and PNG (and others). Embedding these formats is simple, well known and proven.

    Now, about embedding other formats, or not embedding them but linking to them. These are the two options you have. If you embed one, you have to adapt it to what is possible in your container, and the container is HTML. If you do that, the information is immediately embedded in the web page’s whole context. No human needs to do anything additional to see this information and recognize it in its context. The drawback is indeed that you need some extra computer functionality to get the information in its standalone format. This is why microformats per se are not very useful; they become useful when there are functions to extract and convert them.

    If you do not embed them but link to them, the advantage is that you get the information directly (well, mostly directly; you have to make an extra click) in a format usable by your programs. That is nice, and you do not need any script snippet to extract it for you. But the information is out of context, and an extra action is needed from the human in front of the computer.

    So both have their advantages and disadvantages. And I think we both agree that the best way would be to offer both, combining the advantages of each while getting rid of the disadvantages. But then there is still no “versus” between the two methods.

    And a last point: I personally think that in the future we will have one basic format for all kinds of metadata: RDF. It can very well be embedded in XHTML, and it can just as well be a perfect standalone format. So with RDF you have a format combining the advantages of everything we have today.

  18. @Siegfried: We both agree that the first priority is to supply both. I still think that if you have to choose one, you should pick the external format instead of the embedded one. Good summary of the two options you have.

  19. “The problem is that very few people have the browser extensions needed to convert microformats to ‘real’ formats.”

    True enough. However, they don’t need to. Technorati provides a service through which HTML authors can generate vCards from hCards by adding a few parameters to a link. It’s relatively simple (especially with an example to work from) for semi-trained authors to use.

    Rather than an end to microformats, which are so easy to create, we need more format conversion tools like Technorati’s service — preferably open source scripts that can be run locally.

  20. @Stephanie: Or they could render a vCard themselves, if that’s what they’re after. vCards are clickable today, without the need for a third-party service. I don’t think usability is the main argument for microformats; the main argument might be authoring (you can easily make an hCard in your existing CMS!).

  21. For people who aren’t running a CMS — and you wouldn’t believe how many university departments still use plain old HTML files, often authored in an ancient copy of Dreamweaver or even FrontPage — creating a vCard is an intimidating process. They understand how to write HTML.
