abbyy_to_epub3 package
Submodules
abbyy_to_epub3.constants module
abbyy_to_epub3.create_epub module
abbyy_to_epub3.parse_abbyy module
class abbyy_to_epub3.parse_abbyy.AbbyyParser(document, metadata_file, metadata, paragraphs, blocks, debug=False)
Bases: object
The ABBYY parser object. Parses ABBYY metadata in preparation for import into an EPUB 3 document.
Here are the components of the ABBYY schema we use:
<page>
    <block> types Picture, Separator, Table, or Text </block>

Text:
    <page>
        <region>
        <text> contains a '\n' as a text element
            <par> The paragraph, repeatable
                <line> The line, repeatable
                    <formatting>
                        <charParams>: The individual character

Image:
Separator:
Table:
    <row>
        <cell>
            <text>
                <par>
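The Text branch of the schema above can be walked with the standard library. The following is a minimal sketch, not the parser's own code: the sample document, the `blockType` attribute name, the absence of an XML namespace, and the function name `paragraphs_by_block` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample that mirrors the hierarchy above.
SAMPLE = """<document>
  <page>
    <block blockType="Text">
      <text>
        <par>
          <line>
            <formatting>
              <charParams>H</charParams>
              <charParams>i</charParams>
            </formatting>
          </line>
          <line>
            <formatting>
              <charParams>y</charParams>
              <charParams>o</charParams>
            </formatting>
          </line>
        </par>
      </text>
    </block>
    <block blockType="Picture"/>
  </page>
</document>"""


def paragraphs_by_block(xml_text):
    """Return one (block type, [paragraph strings]) tuple per <block>."""
    root = ET.fromstring(xml_text)
    result = []
    for block in root.iter("block"):
        pars = []
        for par in block.iter("par"):
            lines = []
            for line in par.iter("line"):
                # Each <charParams> element holds a single character.
                chars = [c.text or "" for c in line.iter("charParams")]
                lines.append("".join(chars))
            # Join the lines of one paragraph with a space.
            pars.append(" ".join(lines))
        result.append((block.get("blockType"), pars))
    return result


blocks = paragraphs_by_block(SAMPLE)
```

Real ABBYY output normally declares an XML namespace, so the tag names in a real file would need to be qualified accordingly.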
Each paragraph refers to a paragraph style by its identifier, and that style records the paragraph’s role, e.g.:
<paragraphStyle id="{000000DD-016F-0A36-032F-EEBBD9B8571E}" name="Heading #1|1" mainFontStyleId="{000000DE-016F-0A37-032F-176E5F6405F5}" role="heading" roleLevel="1" align="Right" startIndent="0" leftIndent="0" rightIndent="0" lineSpacing="1790" fixedLineSpacing="1">
<par align="Right" lineSpacing="1790" style="{000000DD-016F-0A36-032F-EEBBD9B8571E}">
The roles map as follows:
Role name            role
Body text            text
Footnote             footnote
Header or footer     rt
Heading              heading
Other                other
Table caption        tableCaption
Table of contents    contents
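As an illustration of how a paragraph's role could be resolved from its style identifier, here is a small sketch. It is not taken from the parser: the sample XML, the `<paragraphStyles>` wrapper element, and the names `ROLE_NAMES` and `paragraph_roles` are assumptions; only the `id`, `role`, and `style` attributes and the role values come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one style definition and two paragraphs.
SAMPLE = """<document>
  <paragraphStyles>
    <paragraphStyle id="{000000DD-016F-0A36-032F-EEBBD9B8571E}"
                    name="Heading #1|1" role="heading" roleLevel="1"/>
  </paragraphStyles>
  <page>
    <par align="Right" style="{000000DD-016F-0A36-032F-EEBBD9B8571E}"/>
    <par style="{unknown}"/>
  </page>
</document>"""

# Role values and their human-readable names, as listed in the table above.
ROLE_NAMES = {
    "text": "Body text",
    "footnote": "Footnote",
    "rt": "Header or footer",
    "heading": "Heading",
    "other": "Other",
    "tableCaption": "Table caption",
    "contents": "Table of contents",
}


def paragraph_roles(xml_text):
    """Return (role, role name) for each <par>, or (None, None) if unknown."""
    root = ET.fromstring(xml_text)
    # Index every style's role by the style identifier.
    styles = {s.get("id"): s.get("role") for s in root.iter("paragraphStyle")}
    roles = []
    for par in root.iter("par"):
        role = styles.get(par.get("style"))
        roles.append((role, ROLE_NAMES.get(role)))
    return roles


roles = paragraph_roles(SAMPLE)
```

Building the style index once and looking each paragraph up by its `style` attribute avoids rescanning the style list for every paragraph.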
etree = ''
ns = ''
nsm = ''
version = ''
abbyy_to_epub3.utils module
abbyy_to_epub3.utils.dirtify_xml(text)
Re-adds forbidden entities to any XML string. This could cause problems in the unlikely event that the string should literally contain ‘&amp;’.
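The caveat is easier to see with a concrete case. The sketch below is not the library's implementation of dirtify_xml; it is a hypothetical helper, `restore_xml_characters`, that assumes the forbidden characters are `&`, `<`, and `>` and turns their escaped forms back into the raw characters.

```python
def restore_xml_characters(text):
    """Hypothetical helper: turn &lt;, &gt;, and &amp; back into raw characters."""
    # &amp; is handled last so that "&amp;lt;" becomes "&lt;", not "<".
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        text = text.replace(entity, char)
    return text


restored = restore_xml_characters("a &lt; b &amp;&amp; c &gt; d")
# restored == "a < b && c > d"

# The problem case: text that is meant to show an escaped ampersand
# loses one level of escaping.
literal = restore_xml_characters("write &amp;amp; for an ampersand")
# literal == "write &amp; for an ampersand"
```

In the second call the input is meant to display the text `&amp;` after XML parsing, but after restoration it would display a bare `&` instead.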