microformats2 parsing specification

Jump to: navigation, search

microformats2 is a simple, open format for marking up data in HTML. The microformats2 parsing specification describes how to implement a microformats2 parser, independent of any specific vocabularies.

Status
This is a Living Specification with several interoperable implementations. This specification is stable, subject to editorial changes only for improving clarity of existing meaning. While substantive changes are unexpected, it is a living specification subject to substantive change by issues and errata filed in response to implementation experience, requiring consensus among participating implementers (since 2015-01-21) as part of an explicit change control process. There are currently no draft or proposed new features in this specification, and if any were to be added, they would be explicitly labeled as such.
Note: This specification is only marked as a "Draft Specification" because of pending edits from resolved issues before 2016-06-20. Once those edits have been completed, the link to [[Category:Draft Specifications]] at the bottom of this document should be changed to [[Category:Specifications]].
Participate
Open Issues
Resolved issues before 2016-06-20
IRC: #microformats on Freenode
Editor
Tantek Çelik
License
Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work. In addition, as of 2018-06-28, the editors have made this specification available under the Open Web Foundation Agreement Version 1.0.

Contents

[hide]

algorithm

parse a document for microformats

To parse a document for microformats, follow the HTML parsing rules and do the following:

{
 "items": [],
 "rels": {},
 "rel-urls": {}
}

Parsers may simultaneously parse the document for both class and rel microformats (e.g. in a single tree traversal).

parse an element for class microformats

To parse an element for class microformats:

The "*" for root (and property) class names consists of an optional vendor prefix (series of 1+ number or lowercase a-z characters i.e. [0-9a-z]+, followed by '-'), then one or more '-' separated lowercase a-z words.

parse an element for properties

parsing a p- property

To parse an element for a p-x property value (whether explicit p-* or backcompat equivalent):

parsing a u- property

To parse an element for a u-x property value (whether explicit u-* or backcompat equivalent):

parsing a dt- property

To parse an element for a dt-x property value (whether explicit dt-* or backcompat equivalent):

parsing an e- property

To parse an element for a e-x property value (whether explicit "e-*" or backcompat equivalent):


parsing for implied properties

Imply properties only on explicit h-x class name root microformat element (no backcompat roots):

Note: The same markup for a property should not be causing that property to occur in both a microformat and one embedded inside - such a property should only be showing up on one of them. The parsing algorithm has details to prevent that, such as the :not[.h-*] tests above.

parse a hyperlink element for rel microformats

To parse a hyperlink element (e.g. a or link) for rel microformats: use the following algorithm or an algorithm that produces equivalent results:

rel parse examples

Here are some examples to show how parsed rels may be reflected into the JSON (empty items key).

E.g. parsing this markup:

<a rel="author" href="http://example.com/a">author a</a>
<a rel="author" href="http://example.com/b">author b</a>
<a rel="in-reply-to" href="http://example.com/1">post 1</a>
<a rel="in-reply-to" href="http://example.com/2">post 2</a>
<a rel="alternate home"
   href="http://example.com/fr"
   media="handheld"
   hreflang="fr">French mobile homepage</a>

Would generate this JSON:

{
  "items": [],
  "rels": { 
    "author": [ "http://example.com/a", "http://example.com/b" ],
    "in-reply-to": [ "http://example.com/1", "http://example.com/2" ],
    "alternate": [ "http://example.com/fr" ], 
    "home": [ "http://example.com/fr" ] 
  },
  "rel-urls": {
    "http://example.com/a": {
      "rels": ["author"], 
      "text": "author a"
    },
    "http://example.com/b": {
      "rels": ["author"], 
      "text": "author b"
    },
    "http://example.com/1": {
      "rels": ["in-reply-to"], 
      "text": "post 1"
    },
    "http://example.com/2": {
      "rels": ["in-reply-to"], 
      "text": "post 2"
    },
    "http://example.com/fr": {
      "rels": ["alternate", "home"],
      "media": "handheld", 
      "hreflang": "fr", 
      "text": "French mobile homepage"
    }
  }
}


what do the CSS selector expressions mean

This section is non-normative.

Use SelectORacle to expand any of the above CSS selector expressions into longform English prose.

Exception:

note HTML parsing rules

This section is non-normative.

microformats2 parsers are expected to follow HTML parsing rules, which includes for example:

note backward compatibility details

The parsing algorithm and details refer to "backcompat root classes" (backcompat roots for short) and "backcompat properties". These conditions and steps in the algorithm document how to parse pre-microformats2 microformats which all defined their own specific root class names and explicit sets of properties.

Some details to be aware of (which are explicitly in the algorithm, this is just an informal summary)

backward compatibility mappings

Note: several parser implementations have encoded backward compatible mappings into source and data files. Implementers of parsers may find these useful:

questions

See the FAQ:

issues

See the issues page:

implementations

Main article: microformats2#Implementations

There are open source microformats2 parsers available for Javascript, node.js, PHP, Ruby and Python.

test suite

See:

Ports to/for other languages encouraged.

change control

Minor editorial changes (e.g. fixing minor typos or punctuation) that do not change and preferably clarify the structure and existing intended meaning may be done by anyone without filing issues, requiring only a sufficient "Summary" description field entry for the edit. More than minor but still purely editorial changes may be made by an editor. Anyone may question such editorial changes by undoing corresponding edits without filing an issue. Any further reversion or iteration on such an editorial change must be done by filing an issue.

Per the stable status of this document, substantive issue filing, resolution, and edits are done with the following change control steps, which may nearly all be done asynchronously once an issue is filed to reach the required state of "Resolve by implementation verified rough consensus". All steps should be openly documented (e.g. on this wiki or GitHub issues) such that others may later verify the history of an issue, and all steps are encouraged to be announced on #microformats IRC with a link to the issue.

These change control steps are inspired by the tradition of "Rough consensus and running code" as exhibited by example by IETF and W3C processes, and in that regard, seek to be a philosophically compatible approach to specification iteration. They have been in rough practice since 2015-01-21, increasingly strictly applied since then with consensus of issue discussion participants, and explicitly documented based on issue resolving and spec editing experience.

see also

Categories

microformats2 parsing specification was last modified: Tuesday, April 17th, 2018

Views