Extracting mf2 data from DOM trees

This document attempts to formalise the algorithms used to extract microformats2 (mf2) data.

Microformats are usually extracted from HTML. Follow the HTML spec to parse an HTML document into a DOM tree before applying any of these algorithms.

Get value for p- property

  1. Let input be any Element.
  2. Let output be the result of running the value class pattern algorithm on input.
  3. If output is not null, then return output.
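
The steps above can be sketched in Python as a thin wrapper. This is only an illustration: the name get_p_value and the choice to pass the value class pattern algorithm in as a callable are assumptions of the sketch. The steps do not say what to return when output is null, so the sketch returns None as a placeholder.

```python
from typing import Callable, Optional, TypeVar

E = TypeVar("E")  # stands in for whatever element type the DOM library provides


def get_p_value(
    element: E,
    value_class_pattern: Callable[[E], Optional[str]],
) -> Optional[str]:
    """Return the value class pattern result for element, if there is one."""
    output = value_class_pattern(element)
    if output is not None:
        return output
    # The steps stop here; the fallback is unspecified, so None is a placeholder.
    return None
```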

Get value through Value Class Pattern

Note: this algorithm omits the special rules for date and time parsing that are part of the original algorithm.

Note: this algorithm has a bug where null might be appended to output, for example when getAttribute is called for an "alt" attribute that is not present. The bug is carried over from the original algorithm.

  1. Let input be any Element.
  2. Let output be an empty list.
  3. Let walker be a new TreeWalker where root is input and whatToShow is ELEMENT_NODE.
  4. Let descendant be the return value of calling firstChild on walker.
  5. While descendant is not null:
    1. Let classes be the descendant’s classList.
    2. If the return value of calling contains on classes with the argument "value" is true, then:
      1. Let value be descendant’s textContent.
      2. To find better options for value, switch on descendant’s tagName:
        IMG
        AREA
        Set value to the return value of calling getAttribute on descendant with the argument "alt".
        DATA
        If the return value of calling hasAttribute on descendant with the argument "value" is true, then set value to the return value of calling getAttribute on descendant with the argument "value".
        ABBR
        If the return value of calling hasAttribute on descendant with the argument "title" is true, then set value to the return value of calling getAttribute on descendant with the argument "title".
      3. Append value to output.
      4. Set descendant to the return value of calling nextSibling on walker.
    3. Otherwise, if the length of classes is greater than 0, then:
      1. Let skipElement be false.
      2. For each item in classes:
        1. If item starts with "p-", "u-", "e-", or "dt-", then:
          1. Set skipElement to true.
          2. Break.
      3. If skipElement is true, then set descendant to the return value of calling nextSibling on walker.
      4. Otherwise, set descendant to the return value of calling nextNode on walker.
    4. Otherwise, set descendant to the return value of calling nextNode on walker.
  6. If output has any items, then return the concatenation of output.
  7. Return null.
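
The following Python sketch approximates the algorithm above using the standard library's xml.dom.minidom. It is not a faithful transcription, and it rests on several assumptions: minidom parses well-formed XML rather than HTML; it has no TreeWalker, classList, or textContent, so the sketch substitutes a recursive walk, a split of the class attribute, and a helper named text_content; and minidom's getAttribute returns an empty string for a missing attribute where the DOM returns null. Because the walk is recursive, it also keeps going after a "value" element that is the last child of a nested element, whereas nextSibling on a TreeWalker would return null at that point. All function names here are hypothetical.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

PROPERTY_PREFIXES = ("p-", "u-", "e-", "dt-")


def text_content(node):
    """Concatenate all descendant text, approximating DOM textContent."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_content(child) for child in node.childNodes)


def best_value(element):
    """Pick the value for a 'value' element based on its tag name."""
    tag = element.tagName.upper()
    if tag in ("IMG", "AREA"):
        # minidom yields "" for a missing alt; the DOM would yield null.
        return element.getAttribute("alt")
    if tag == "DATA" and element.hasAttribute("value"):
        return element.getAttribute("value")
    if tag == "ABBR" and element.hasAttribute("title"):
        return element.getAttribute("title")
    return text_content(element)


def collect_values(element, output):
    """Walk child elements in document order, appending values to output."""
    for child in element.childNodes:
        if child.nodeType != Node.ELEMENT_NODE:
            continue
        classes = child.getAttribute("class").split()
        if "value" in classes:
            # A value element contributes its value; its subtree is not searched.
            output.append(best_value(child))
        elif any(name.startswith(PROPERTY_PREFIXES) for name in classes):
            # A nested property is skipped together with its subtree.
            continue
        else:
            collect_values(child, output)


def value_class_pattern(element):
    """Return the concatenated values, or None if no value element is found."""
    output = []
    collect_values(element, output)
    return "".join(output) if output else None


doc = minidom.parseString(
    '<span class="p-tel">'
    '<span class="type">Home</span> '
    '<span class="value">+1</span>'
    '<abbr class="value" title="555">five</abbr>'
    '<span class="p-x"><span class="value">ignored</span></span>'
    '</span>'
)
result = value_class_pattern(doc.documentElement)
```

In this example the two direct "value" children contribute "+1" and the title "555", while the "value" element nested inside the "p-x" property is skipped, so result is "+1555".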