Table Areas

Creating table areas improves recognition

All tabular matter should be identified as a table, not as text. This includes material that has no rules around it but whose text is aligned in two or more columns.

At this stage, tables should be identified as such, but they do not need to be corrected during the OCR process. Let ABBYY FineReader recognize the material automatically as best it can. We will return to tables later in the production process.

  • For tables, use the Draw Table Area tool. Complex pages can contain multiple tables and may require a combination of several text and table areas. This takes time, and managing the numbering of your areas becomes critical.
    Figure 1. Page with both text and table areas.

  • All tabular matter should be identified as a table, even if it does not have rules around it. This is critical to success in the OCR process.
    When tabular data is treated as text instead of as a table, the output may look acceptable in ABBYY FineReader (AFR) and even in the .docx file. However, Word formats it as a list, and it will not convert properly into TEI, so extra work is needed to rebuild it as a table. The work goes faster if such material is formatted as a table at the initial OCR stage.
    Figure 2. Tabular matter without rules around it.

    In Figure 2, the first instance of tabular matter is correctly identified as a table, but the second is not; it needs to be corrected by creating a table with four columns and six rows. See Edit tables.