abbyy_to_epub3 package

Submodules

abbyy_to_epub3.constants module

abbyy_to_epub3.create_epub module

abbyy_to_epub3.parse_abbyy module

class abbyy_to_epub3.parse_abbyy.AbbyyParser(document, metadata_file, metadata, paragraphs, blocks, debug=False)

Bases: object

The ABBYY parser object. Parses ABBYY metadata in preparation for import into an EPUB 3 document.

Here are the components of the ABBYY schema we use:

<page>
    <block>types Picture, Separator, Table, or Text</block>

Text:

<page>
        <region>
        <text> contains a '\n' as a text element
        <par> The paragraph, repeatable
            <line> The line, repeatable
                <formatting>
                <charParams>: The individual character

Image:

Separator:

Table:

<row>
  <cell>
    <text>
      <par>
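
The hierarchy above can be walked with any ElementTree-style API. The following is a minimal sketch, not the package's own parser: it uses the standard library's xml.etree.ElementTree in place of lxml, a hypothetical namespace-free sample, and it assumes that the block type is carried in a blockType attribute. The function name paragraphs_by_page is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample that follows the hierarchy described above.
SAMPLE = """
<document>
  <page>
    <block blockType="Text">
      <text>
        <par>
          <line>
            <formatting>
              <charParams>H</charParams>
              <charParams>i</charParams>
            </formatting>
          </line>
          <line>
            <formatting>
              <charParams>y</charParams>
              <charParams>o</charParams>
              <charParams>u</charParams>
            </formatting>
          </line>
        </par>
      </text>
    </block>
    <block blockType="Picture"/>
  </page>
</document>
"""


def paragraphs_by_page(root):
    """Return one list of paragraph strings per <page>, using Text blocks only."""
    pages = []
    for page in root.iter("page"):
        paragraphs = []
        for block in page.iter("block"):
            # Assumption: the block type lives in a blockType attribute.
            if block.get("blockType") != "Text":
                continue
            for par in block.iter("par"):
                lines = []
                for line in par.iter("line"):
                    # Each <charParams> element holds one character.
                    chars = [c.text or "" for c in line.iter("charParams")]
                    lines.append("".join(chars))
                # Join the lines of a paragraph with single spaces.
                paragraphs.append(" ".join(lines))
        pages.append(paragraphs)
    return pages


pages = paragraphs_by_page(ET.fromstring(SAMPLE))
```

Real ABBYY files are namespaced, so the tag names in a working version would need the namespace prefix.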

Each paragraph carries a style identifier that points to a unique paragraph style, which includes the paragraph’s role, e.g.:

 <paragraphStyle
     id="{000000DD-016F-0A36-032F-EEBBD9B8571E}"
     name="Heading #1|1"
     mainFontStyleId="{000000DE-016F-0A37-032F-176E5F6405F5}"
     role="heading"
     roleLevel="1"
     align="Right"
     startIndent="0" leftIndent="0"
     rightIndent="0" lineSpacing="1790" fixedLineSpacing="1">
<par align="Right" lineSpacing="1790"
     style="{000000DD-016F-0A36-032F-EEBBD9B8571E}">

The roles map as follows:

Role name	role attribute value
Body text	text
Footnote	footnote
Header or footer	rt
Heading	heading
Other	other
Table caption	tableCaption
Table of contents	contents
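
As an illustration of how a paragraph's role could be resolved from its style identifier, here is a minimal sketch. It is an assumption about usage, not the package's implementation: the names ROLE_NAMES, index_styles, and role_of are hypothetical, the wrapping sample element is invented, and the standard library's xml.etree.ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET

# role attribute value -> role name, copied from the table above.
ROLE_NAMES = {
    "text": "Body text",
    "footnote": "Footnote",
    "rt": "Header or footer",
    "heading": "Heading",
    "other": "Other",
    "tableCaption": "Table caption",
    "contents": "Table of contents",
}

# Hypothetical wrapper element; the two children are abridged from the example above.
SAMPLE = """
<sample>
  <paragraphStyle id="{000000DD-016F-0A36-032F-EEBBD9B8571E}"
                  name="Heading #1|1" role="heading" roleLevel="1"/>
  <par align="Right" lineSpacing="1790"
       style="{000000DD-016F-0A36-032F-EEBBD9B8571E}"/>
</sample>
"""


def index_styles(root):
    """Map each paragraphStyle id to its role and roleLevel attributes."""
    return {
        style.get("id"): (style.get("role"), style.get("roleLevel"))
        for style in root.iter("paragraphStyle")
    }


def role_of(par, styles):
    """Return (role name, role level) for a <par>, or (None, None) if unknown."""
    role, level = styles.get(par.get("style"), (None, None))
    return ROLE_NAMES.get(role), level


root = ET.fromstring(SAMPLE)
styles = index_styles(root)
heading = role_of(root.find("par"), styles)
```

Indexing the styles once and looking each paragraph up by its style attribute avoids rescanning the style definitions for every paragraph.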

etree = ''

is_block_type(elem, blocktype): Identifies whether an XML element is a block of the given type.
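
A check of this kind might look like the sketch below. This is a guess at the behavior from the signature alone, not the method's actual code: block_has_type is a hypothetical standalone function, and the blockType attribute name is an assumption.

```python
import xml.etree.ElementTree as ET


def block_has_type(elem, blocktype):
    """Return True if elem is a <block> whose blockType attribute equals blocktype."""
    # Drop a leading '{namespace}' so the check also works on namespaced tags.
    local_name = elem.tag.rsplit("}", 1)[-1]
    # Assumption: the block type is stored in a blockType attribute.
    return local_name == "block" and elem.get("blockType") == blocktype


table_block = ET.fromstring('<block blockType="Table"/>')
namespaced_block = ET.fromstring('<block xmlns="urn:example" blockType="Text"/>')
paragraph = ET.fromstring('<par blockType="Text"/>')
```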

ns = ''

nsm = ''

parse_abbyy(): Reads the ABBYY file into an lxml etree.
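
For readers unfamiliar with namespaced XML, the sketch below shows one way to load a document and recover its namespace from the root tag. It is only an illustration: the standard library's xml.etree.ElementTree stands in for lxml, the sample namespace URI and version value are made up, and the helper split_tag is hypothetical. The ns, nsm, and version attributes are not described on this page, so any link between them and this sketch is a guess.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the namespace URI and version value are invented.
SAMPLE = (
    '<document xmlns="http://example.com/abbyy-schema" version="1.0">'
    "<page/>"
    "</document>"
)


def split_tag(tag):
    """Split an ElementTree tag of the form '{namespace}name' into its two parts."""
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
        return namespace, local_name
    return "", tag


root = ET.fromstring(SAMPLE)
namespace, local_name = split_tag(root.tag)

# A prefix-to-URI map lets find() and findall() address namespaced children.
nsmap = {"a": namespace}
page_count = len(root.findall("a:page", nsmap))
schema_version = root.get("version")
```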

parse_content(): Parse each page of the book.

parse_metadata(): Parse out the metadata from the _meta.xml file.

parse_paragraph_styles(): Paragraph styles are on their own at the start of the ABBYY

version = ''

abbyy_to_epub3.parse_abbyy.add_last_text(blocks, page): Given a list of blocks and the page number of the last page in the list, mark up the last text block for that page in the list, if it exists.

abbyy_to_epub3.parse_abbyy.gettext(elem)

abbyy_to_epub3.utils module

abbyy_to_epub3.utils.dirtify_xml(text): Re-adds forbidden entities to any XML string. This could cause problems in the unlikely event that the string should literally contain ‘&amp’.

abbyy_to_epub3.utils.is_increasing(l): Given a list, return True if the list elements are monotonically increasing, and False otherwise.
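
"Monotonically increasing" can be read as either allowing or forbidding equal neighbors, and this page does not say which reading the function uses. The sketch below shows both readings under hypothetical names; it is not the package's code.

```python
def is_non_decreasing(values):
    """True if every element is greater than or equal to the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_strictly_increasing(values):
    """True if every element is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


with_repeat = [1, 2, 2, 3]
loose = is_non_decreasing(with_repeat)
strict = is_strictly_increasing(with_repeat)
```

Both helpers return True for empty and single-element lists, because there are no neighboring pairs to compare.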

abbyy_to_epub3.utils.sanitize_xml(text): Removes forbidden entities from any XML string.
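
The round trip between a sanitized and a "dirtified" string, and the caveat noted for dirtify_xml, can be illustrated with the standard library. This sketch assumes that sanitizing means escaping the characters &, <, and > as entities. It uses xml.sax.saxutils as a stand-in, and escape_markup and restore_markup are hypothetical names, not the package's functions.

```python
from xml.sax.saxutils import escape, unescape


def escape_markup(text):
    """Replace &, < and > with the entities &amp;, &lt; and &gt;."""
    return escape(text)


def restore_markup(text):
    """Turn &amp;, &lt; and &gt; back into the raw characters."""
    return unescape(text)


raw = "Fish & <chips>"
safe = escape_markup(raw)
round_trip = restore_markup(safe)

# The caveat: text that was never escaped but is meant to contain a literal
# entity is altered when it is restored.
literal = "type &amp; to get an ampersand"
altered = restore_markup(literal)
```

Restoring is therefore only safe on text that is known to have been escaped first.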

Module contents
