Eric Niquette

Introduction

It's surprisingly difficult to find information on the use of PDF tags outside the deeply technical PDF ISO standards. This list serves as a quick reference for common elements, with some examples of how they should be nested to maximize accessibility.

What are PDF tags?

In a PDF document, tags provide information on the structure and content to assistive devices. Each element is assigned a tag that describes the type of content found within. Examples of semantic tags include headings, paragraphs, tables, and lists.

Tags are invisible to the end user but are a critical component of an accessible PDF document. The tag tree is used by many screen readers to not only provide structural information but also define the document's reading order.
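
To make the idea of a tag tree concrete, here is a minimal sketch of how a reading order falls out of the tree when it is walked from top to bottom. The `Tag` class and `reading_order` function are hypothetical names of my own; a real tag tree points to marked content on the page rather than holding text directly, so treat this as a simplified model only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tag:
    """One node of a simplified, hypothetical tag tree."""
    name: str                                   # e.g. "P", "H1", "Sect"
    text: str = ""                              # content attached to this tag
    children: list[Tag] = field(default_factory=list)


def reading_order(node: Tag) -> list[str]:
    """Collect text depth-first, parents before children, top to bottom."""
    found = [node.text] if node.text else []
    for child in node.children:
        found.extend(reading_order(child))
    return found


document = Tag("Document", children=[
    Tag("H1", "Annual report"),
    Tag("P", "Opening paragraph."),
    Tag("Sect", children=[
        Tag("H2", "Results"),
        Tag("P", "Details of the results."),
    ]),
])

for line in reading_order(document):
    print(line)
```

Moving a tag to a different position in the tree changes where its text appears in that output, even though the page looks the same.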

List of common tags

The following table contains a curated list of the PDF tags you're likely to encounter in your documents. I've attempted to provide a short description of each element, though many are self-explanatory.

Some tags have been omitted because, even though they're common, they're either no longer supported or their usage is discouraged based on the PDF and PDF/UA Best Practices syntax guide.

Tag Description
<Art> Separates individual articles within the same document. Often found in the <Part> element, but not required to be.
<Aside> Content that is indirectly related to the current topic, like a sidenote or a tip.
<BlockQuote> Block-level quotation. Can contain several paragraphs and a caption.
<Caption> Used to title an element. Should ideally be the first child of its parent but can also be used as the last child or placed outside the parent.
<Code> Inline text of programming code. Found in a block-level element.
<Div> Semantically-empty container. Typically used to apply styles to grouped elements.
<Document> The container of a complete document. A PDF may contain multiple documents. Can be left empty to indicate a blank page.
<Figure> Images, charts, and other graphical elements. May contain various elements but will be interpreted as a single image by screen readers.
<Form> Form elements. Can contain text when multiple fields are grouped but typically only contains an attributed object.
<Formula> Mathematical or scientific formulas. Can be used inline or at block level.
<H(X)> Section, document, or page titles. Should appear hierarchically, without skipping a level.
<Index> Container for a subject index, usually found at the end of a publication.
<Lbl> Labels for list markers such as bullets and numbers found in the <LI> element. Unlike the HTML label element, this tag is not used to label a form element.
<Link> A link to a web page or another location in the document.
<L> Parent list container. Contains <LI> children.
<LI> Individual list items found in a parent <L>. Parent to the <LBody>. Also parent to a child <L> element when using nested lists.
<LBody> The contents of a list item, found in the <LI> element.
<Note> An explanatory note like a footnote or endnote. Typically found under <Reference>.
<P> An ordinary paragraph.
<Part> Used to divide larger documents into parts.
<Quote> Inline quote in a block-level parent.
<Reference> A citation to text or data found elsewhere in the document. Can include a <Link> element.
<Sect> Used to divide a document into small sections. Often found in <Part> or <Art> elements.
<Span> Semantically-empty inline container, often wrapped around styled text.
<Table> Table parent container. Contains <TR> rows, either directly or grouped in <THead>, <TBody>, and <TFoot>; the rows hold the <TD> and <TH> cells.
<TBody> Designates a section of the table as the content area. Optional.
<TD> Table data cell. Found in the <TR> rows of <Table>, <TBody>, or <TFoot> elements.
<TFoot> Designates a table section as the footer, typically a total row. Optional.
<TH> Table header cell. Found in <Table> elements. Can be assigned a scope (row, column, or both).
<THead> Designates a table section as the header. Typically contains the table's <TH> cells. Optional.
<TOC> Parent container for a table of contents that can be found at the root or in a <TOCI> for multi-level tables.
<TOCI> Individual table of contents items. Can contain another <TOC> for nested tables.
<TR> Table row used to group <TD> or <TH> cells in a row.
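
The <H(X)> rule in the table above (headings appear hierarchically, without skipping a level) is easy to check mechanically. The sketch below assumes the tag names have already been extracted in reading order as plain strings; `skipped_headings` is a hypothetical helper, and treating a first heading other than H1 as a skip is my own assumption rather than something the table states.

```python
import re


def skipped_headings(tags):
    """List headings that jump more than one level deeper than the previous one."""
    problems = []
    previous = 0  # assumption: no heading seen yet, so the first should be H1
    for tag in tags:
        match = re.fullmatch(r"H(\d+)", tag)
        if match is None:
            continue  # not a numbered heading tag
        level = int(match.group(1))
        if level > previous + 1:
            if previous:
                problems.append(f"H{level} follows H{previous}")
            else:
                problems.append(f"first heading is H{level}")
        previous = level
    return problems


print(skipped_headings(["H1", "P", "H2", "H3", "H2"]))  # []
print(skipped_headings(["H1", "P", "H3"]))              # ['H3 follows H1']
```

Going back up the hierarchy (an H2 after an H3, for example) is not reported, since only jumps to a deeper level can skip one.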

Syntax and hierarchy

The following are sample tag trees for commonly nested elements. Note that some structures can be tagged in more than one way, as it's a rather flexible format in that regard; the layouts presented below are the ones I like to use.

Captions

The <Caption> element is used to provide a title to an element, commonly used with <Figure> and <Table>.

It should ideally be placed as the first child of its parent and can contain other tags. It can also be provided as the last child element, or outside the parent element if required.

<Figure>
  <P>
  <Caption>
    <P>Table 1. Example of a caption

Data tables

A three-column table with a caption, a row of column header cells, a single row of data cells, and a footer row. Note that the <THead>, <TBody>, and <TFoot> elements are optional.

<Table>
  <Caption>
  <THead>
    <TR>
      <TH>
        <P>Row 1, column 1
      <TH>
        <P>Row 1, column 2
      <TH>
  <TBody>
    <TR>
      <TD>
        <P>Row 2, column 1
      <TD>
        <P>Row 2, column 2
      <TD>
  <TFoot>
    <TR>
      <TD>
        <P>Row 3, column 1
      <TD>
        <P>Row 3, column 2
      <TD>
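
As a rough illustration of the nesting shown above, here is a small sketch that walks a table tree and reports tags found in an unexpected parent. The `(name, children)` tuple representation, the `ALLOWED` mapping, and `table_problems` are all hypothetical; the mapping is derived from this example tree only and is not a full statement of what the PDF standard permits.

```python
# Expected children per table tag, taken from the sample tree above
# (an assumption for this sketch, not the complete rules of the standard).
ALLOWED = {
    "Table": {"Caption", "THead", "TBody", "TFoot", "TR"},
    "THead": {"TR"},
    "TBody": {"TR"},
    "TFoot": {"TR"},
    "TR": {"TH", "TD"},
}


def table_problems(node, path=""):
    """Walk a (name, children) tuple tree and list misplaced table tags."""
    name, children = node
    here = f"{path}/{name}"
    problems = []
    for child in children:
        child_name = child[0]
        if name in ALLOWED and child_name not in ALLOWED[name]:
            problems.append(f"<{child_name}> is not expected inside {here}")
        problems.extend(table_problems(child, here))
    return problems


good = ("Table", [
    ("Caption", []),
    ("THead", [("TR", [("TH", [("P", [])])])]),
    ("TBody", [("TR", [("TD", [("P", [])])])]),
])
bad = ("Table", [("TD", [])])  # a cell placed directly in the table, with no row

print(table_problems(good))  # []
print(table_problems(bad))   # ['<TD> is not expected inside /Table']
```

Tags that are not keys in `ALLOWED`, such as <TH> and <TD>, accept any children in this sketch, which is why the <P> inside each cell passes.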

Forms

Every input must have its own <Form> element unless it is part of a group, such as a checkbox or radio set. The <Form> element should appear at the same level as the primary label, with both placed inside a common parent element. The OBJR notation stands for Object Reference, which means the tag represents the actual field element.

Note that there is no mechanism to assign an input to a particular label. As such, the association has to be conveyed through the reading order and the field tooltips.
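
Since tooltips carry so much weight here, a quick audit for fields that lack one can be useful. The sketch below assumes the field names and tooltips have already been pulled out of the document into plain dictionaries; the `name` and `tooltip` keys and the `fields_missing_tooltips` helper are hypothetical, and how you extract that data depends on the PDF library you use.

```python
def fields_missing_tooltips(fields):
    """Return the names of fields whose tooltip is absent or only whitespace."""
    return [
        field["name"]
        for field in fields
        if not (field.get("tooltip") or "").strip()
    ]


# Hypothetical field data; a real document would be read with a PDF library.
fields = [
    {"name": "first_name", "tooltip": "First name"},
    {"name": "newsletter", "tooltip": ""},
    {"name": "phone"},
]

print(fields_missing_tooltips(fields))  # ['newsletter', 'phone']
```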

Text inputs

A single parent element can contain multiple inputs.

<P>
  Label text
  <Form>
    Field Name - OBJR
  Label text 2
  <Form>
    Field 2 Name - OBJR

Checkboxes and radio buttons

Individual form labels should be found directly before or after their object.

<P>
  Label text:
  <Form>
    Checkbox 1 Name - OBJR
    Checkbox 1 label text
    Checkbox 2 Name - OBJR
    Checkbox 2 label text
    Checkbox 3 Name - OBJR
    Checkbox 3 label text

Lists

A simple nested list. Another way to build a list tree is to include the bullet character in the <LBody> and wrap the text in a non-semantic element like a <Span>.

<L>
  <LI>
    <Lbl>•
    <LBody>List item text
    <L>
      <LI>
        <Lbl>•
        <LBody>List item text
  <LI>
    <Lbl>•
    <LBody>List item text
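
The nesting pattern above is regular enough to generate. As a sketch, the hypothetical `list_tree` function below turns nested `(text, sub_items)` pairs into the same indented notation used in this article; it only prints an outline and does not write tags into a PDF.

```python
def list_tree(items, bullet="•", depth=0):
    """Render (text, sub_items) pairs as an indented <L>/<LI> tag outline."""
    pad = "  " * depth
    lines = [f"{pad}<L>"]
    for text, sub_items in items:
        lines.append(f"{pad}  <LI>")
        lines.append(f"{pad}    <Lbl>{bullet}")
        lines.append(f"{pad}    <LBody>{text}")
        if sub_items:
            # A nested <L> sits inside the parent <LI>, after its <LBody>.
            lines.extend(list_tree(sub_items, bullet, depth + 2))
    return lines


items = [
    ("List item text", [("List item text", [])]),
    ("List item text", []),
]

print("\n".join(list_tree(items)))
```

With these sample items, the output reproduces the tree shown above, with the nested <L> sitting inside the first <LI>.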

Table of Contents

A table of contents can be nested or presented in a single, flat level. Both approaches are acceptable.

Nested

The <TOC> element can be nested as a child of another <TOC> or in a <TOCI>.

<TOC>
  <TOCI>
    <TOC>
      <TOCI>
      <TOCI>
      <TOCI>
  <TOCI>

Flat

Tables of contents can also be flattened and displayed linearly.

<TOC>
  <TOCI>
  <TOCI>
  <TOCI>
  <TOCI>
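
Because both layouts are acceptable, converting a nested table of contents to a flat one is a simple depth-first walk. The sketch below assumes the entries are available as `(title, children)` pairs; the `flatten_toc` name and the sample titles are hypothetical. It keeps each entry's depth so the original hierarchy can still be hinted at, for example through indentation.

```python
def flatten_toc(entries, depth=0):
    """Turn nested (title, children) entries into a flat list of (depth, title)."""
    flat = []
    for title, children in entries:
        flat.append((depth, title))
        flat.extend(flatten_toc(children, depth + 1))
    return flat


nested = [
    ("Chapter 1", [("Section 1.1", []), ("Section 1.2", []), ("Section 1.3", [])]),
    ("Chapter 2", []),
]

for depth, title in flatten_toc(nested):
    print("  " * depth + title)
```

Each `(depth, title)` pair would map to one <TOCI> under a single <TOC> in the flat layout.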