Text content from HTML

Test the algorithm!

<div class="p-name"> Text content of the DIV:
</div> E.g. the microformats p-* value.

Plain text of element.

To get the plain text for an Element input:

  1. Let output be the result of running element to string on input.
  2. Remove any sequence of one or more consecutive U+0020 SPACE code points directly before and after a U+000A LF code point from output.
  3. Strip leading and trailing ASCII whitespace from output.
  4. Replace any sequence of one or more consecutive U+0020 SPACE code points in output with a single U+0020 SPACE code point.
  5. Return output.
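
Steps 2 to 4 above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name normalize_plain_text is hypothetical, and it assumes step 1 (element to string) has already been run, so it takes that string as its argument. ASCII whitespace is taken to mean TAB, LF, FF, CR and SPACE.

```python
import re

# Assumption: "ASCII whitespace" means U+0009 TAB, U+000A LF, U+000C FF,
# U+000D CR and U+0020 SPACE.
ASCII_WHITESPACE = "\t\n\x0c\r "


def normalize_plain_text(output: str) -> str:
    """Apply steps 2-4 to a string already produced by step 1.

    The name and signature are hypothetical; step 1 (element to string)
    is not implemented here.
    """
    # Step 2: remove runs of spaces directly before and after each LF.
    output = re.sub(r" *\n *", "\n", output)
    # Step 3: strip leading and trailing ASCII whitespace.
    output = output.strip(ASCII_WHITESPACE)
    # Step 4: collapse each remaining run of spaces into a single space.
    output = re.sub(r" +", " ", output)
    return output


if __name__ == "__main__":
    print(repr(normalize_plain_text("  Hello   world \n  next  line  ")))
```

Run as a script, this prints 'Hello world\nnext line': the spaces around the line feed are dropped, the outer whitespace is stripped, and the inner runs of spaces collapse to one. Only U+0020 SPACE is collapsed, so tabs inside the string are left untouched, as in the steps above.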

Element to string.

To get the string value for an Element input:

  1. Let output be an empty list.
  2. Let children be the children of input in tree order.
  3. For each child in children:
  4. Return the concatenation of output.