Welcome to abbyy_to_epub3’s documentation!
Features
- Unicode-compliant
- Can handle left-to-right and right-to-left text.
- Attempts to recognize running headers, footers, and decimal or page numbers. The level of confidence in fuzzy matching can be fine-tuned in config.ini. Matching errs on the side of minimizing false positives.
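As a rough illustration of threshold-based fuzzy matching, the sketch below scores a candidate line against a known header text with difflib.SequenceMatcher and reads the cutoff from INI-formatted text. It is not the package's own code: the [Main] section, the header_similarity key, and both helper functions are hypothetical, and the real matcher may use a different similarity measure. Raising the threshold flags fewer lines as headers, which is the trade-off toward fewer false positives described above.

```python
import configparser
from difflib import SequenceMatcher

# Hypothetical config.ini contents; the real file's section and key names may differ.
SAMPLE_CONFIG = """
[Main]
header_similarity = 0.8
"""


def load_threshold(ini_text: str) -> float:
    """Read the (hypothetical) header-matching threshold from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Fall back to 0.8 if the section or key is absent.
    return parser.getfloat("Main", "header_similarity", fallback=0.8)


def is_running_header(line: str, known_header: str, threshold: float) -> bool:
    """Return True if `line` is similar enough to `known_header`.

    OCR output often garbles a running header slightly from page to page,
    so an exact comparison would miss it; SequenceMatcher.ratio() returns a
    similarity score between 0.0 and 1.0 that tolerates small differences.
    """
    ratio = SequenceMatcher(
        None, line.strip().lower(), known_header.strip().lower()
    ).ratio()
    return ratio >= threshold


threshold = load_threshold(SAMPLE_CONFIG)
print(is_running_header("The History of Englnd", "THE HISTORY OF ENGLAND", threshold))  # True
print(is_running_header("It was a dark and stormy night.", "THE HISTORY OF ENGLAND", threshold))  # False
```

The OCR slip "Englnd" still scores about 0.98 and is treated as a header, while ordinary body text falls far below the cutoff.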
Limitations
- Accessibility is inherently limited by the input ABBYY FineReader documents. If they are marked up with headings and other semantic markup, that structure will be incorporated into the ePub.
- There is currently no functionality for image description.
- The module can also transform ABBYY XML documents generated by ABBYY FineReader 6. However, those documents are not marked up with headings, so there is no structural navigation for accessibility.
Requirements
- Python 3
- If running EpubCheck, a Java Runtime Environment (JRE)
- If running DAISY Ace, Node.js
Usage
From within a Python program:
from abbyy_to_epub3 import create_epub
book = create_epub.Ebook('docname') # See *Assumptions* below.
book.craft_epub()
From the shell:
abbyy2epub docname # See *Assumptions* below.
The available command line arguments are:
usage: abbyy2epub [-h] [-d] [--epubcheck] [--ace] docname

Process an ABBYY file into an EPUB

positional arguments:
  docname      A directory containing all the necessary files. See the README
               for details.

optional arguments:
  -h, --help   show this help message and exit
  -d, --debug  Show debugging information
  --epubcheck  Run EpubCheck on the newly created EPUB
  --ace        Run DAISY Ace on the newly created EPUB
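The options above map naturally onto Python's argparse module. The sketch below is an assumption about how such a parser could be declared, not the package's actual entry point; build_parser is a hypothetical name, and the parser simply reproduces the documented usage line and help strings.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented abbyy2epub interface."""
    parser = argparse.ArgumentParser(
        prog="abbyy2epub",
        description="Process an ABBYY file into an EPUB",
    )
    parser.add_argument(
        "docname",
        help="A directory containing all the necessary files. "
        "See the README for details.",
    )
    # Each optional flag is a simple on/off switch.
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Show debugging information")
    parser.add_argument("--epubcheck", action="store_true",
                        help="Run EpubCheck on the newly created EPUB")
    parser.add_argument("--ace", action="store_true",
                        help="Run DAISY Ace on the newly created EPUB")
    return parser


args = build_parser().parse_args(["--epubcheck", "mybook"])
print(args.docname, args.epubcheck, args.ace, args.debug)  # mybook True False False
```

Because every optional flag uses action="store_true", omitting it leaves the value False, so EpubCheck and Ace would only run when explicitly requested.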
System dependencies
If you’d like to run EpubCheck, there are certain system dependencies. Depending on your environment, these may need to be installed manually. On Ubuntu, I installed these with:
sudo apt-get install default-jre libpython3-dev
If you’d like to run the DAISY Ace accessibility checker, you’ll also need Node.js and Ace. On Ubuntu, I installed these with:
sudo apt-get install nodejs
sudo npm install ace-core -g
If Ace installed successfully, you should be able to run:
ace --help
at the command line, which should display usage information. For more information, see the Ace Getting Started Guide <http://inclusivepublishing.org/toolbox/accessibility-checker/getting-started/>.
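Before passing --epubcheck or --ace, it can help to confirm that the required programs are on the PATH. This is a minimal sketch using the standard library's shutil.which; the OPTIONAL_TOOLS mapping and the missing_tools helper are hypothetical and not part of abbyy_to_epub3. On some older Ubuntu releases the Node.js binary may be named nodejs rather than node, so adjust the names to match your system.

```python
import shutil
from typing import List

# Hypothetical mapping from each optional check to the executables it relies on:
# EpubCheck runs on Java; DAISY Ace runs on Node.js and provides an `ace` command.
OPTIONAL_TOOLS = {
    "epubcheck": ["java"],
    "ace": ["node", "ace"],
}


def missing_tools(check: str) -> List[str]:
    """Return the executables required by `check` that are not found on PATH."""
    return [tool for tool in OPTIONAL_TOOLS[check] if shutil.which(tool) is None]


for check in OPTIONAL_TOOLS:
    missing = missing_tools(check)
    if missing:
        print(f"--{check}: missing {', '.join(missing)}")
    else:
        print(f"--{check}: required tools found")
```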
Installation
This package can be installed on your local system. From the directory containing setup.py:
pip install -r requirements.txt
python setup.py develop
pip install .
The documentation is generated with Sphinx. To rebuild it:
cd docs
make html
Testing
Run py.test from the top-level app directory. Create new tests in the tests subdirectory.
Assumptions
This application assumes you are working in a directory which contains a
subdirectory for the document and a specific set of files. If the document is
named docname, the directory structure assumed is:
docname/
    docname_abbyy.gz
    docname_meta.xml
    docname_jp2.zip
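A quick way to confirm that a working directory matches this layout before calling create_epub.Ebook is sketched below. missing_inputs is a hypothetical helper, not part of the package; it only checks that the three expected files exist, not that their contents are valid.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_inputs(workdir: Path, docname: str) -> List[Path]:
    """Return the expected input files that are absent from workdir/docname/."""
    docdir = workdir / docname
    expected = [
        docdir / f"{docname}_abbyy.gz",
        docdir / f"{docname}_meta.xml",
        docdir / f"{docname}_jp2.zip",
    ]
    return [path for path in expected if not path.is_file()]


# Demonstration in a throwaway directory that contains only the metadata file.
with tempfile.TemporaryDirectory() as tmp:
    docdir = Path(tmp) / "docname"
    docdir.mkdir()
    (docdir / "docname_meta.xml").write_text("<metadata/>")
    print([path.name for path in missing_inputs(Path(tmp), "docname")])
    # ['docname_abbyy.gz', 'docname_jp2.zip']
```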
- docname_abbyy.gz decompresses to docname_abbyy, an XML file generated by ABBYY.
- docname_jp2.zip unzips to a directory called docname_jp2, which contains a number of images named in the format docname_####.jp2.
- docname_0000.jp2 is a scanner calibration image.
- docname_0001.jp2 is the cover image and the first image referenced in the ABBYY XML.
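The unpacking described above can be sketched with the standard library. This is an illustration under stated assumptions, not the package's implementation: extract_abbyy and page_images are hypothetical helpers, and page_images assumes the page number is the four-digit suffix and that the zip may store the images inside a docname_jp2/ folder.

```python
import gzip
import re
import shutil
import zipfile
from pathlib import Path
from typing import List


def extract_abbyy(gz_path: Path) -> Path:
    """Decompress docname_abbyy.gz into docname_abbyy in the same directory."""
    out_path = gz_path.with_suffix("")  # strips the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def page_images(zip_path: Path, docname: str) -> List[str]:
    """Return docname_####.jp2 entries in numeric order, minus calibration image 0000."""
    # Match the file name at the start of the entry or right after a folder separator.
    pattern = re.compile(rf"(?:^|/){re.escape(docname)}_(\d{{4}})\.jp2$")
    numbered = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            match = pattern.search(name)
            if match:
                numbered.append((int(match.group(1)), name))
    numbered.sort()
    return [name for number, name in numbered if number != 0]
```

Under these assumptions, the first entry returned by page_images is docname_0001.jp2, the cover image, and the calibration scan never reaches the book.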