How to get Tess4j running on Ubuntu 10.04

In order to get a working Tess4j on an Ubuntu 10.04 machine, the following steps might help:

First, install Ghostscript with apt-get to get the PDF-to-image conversion functionality:

sudo apt-get install ghostscript
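You can try the conversion by hand to check that Ghostscript works before wiring it into Tess4j. The shell function below is a minimal sketch that renders every page of a PDF into one multi-page TIFF. The function name pdf_to_tiff and the 300 dpi default are my own assumptions. It only exercises the same Ghostscript installation that Tess4j relies on; it is not how Tess4j calls Ghostscript internally.

```shell
#!/bin/sh
# Sketch: convert a PDF into a multi-page TIFF with Ghostscript.
# pdf_to_tiff and the 300 dpi default are illustrative assumptions.
pdf_to_tiff() {
    input_pdf="$1"
    output_tiff="$2"
    resolution="${3:-300}"   # dots per inch; OCR usually benefits from >= 300

    # Fail early with a clear message if Ghostscript is not on the PATH.
    if ! command -v gs >/dev/null 2>&1; then
        echo "gs not found; install it with: sudo apt-get install ghostscript" >&2
        return 1
    fi

    # -dNOPAUSE -dBATCH: process all pages without prompting, then exit.
    # -sDEVICE=tiffg4: bilevel TIFF with CCITT Group 4 compression.
    # Without a %d in the output name, all pages go into one multi-page TIFF.
    gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r"$resolution" \
       -sOutputFile="$output_tiff" "$input_pdf"
}
```

A call would look something like pdf_to_tiff scan.pdf scan.tif. You can then feed the resulting TIFF to Tesseract.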

For building from source in the next steps, some additional packages are required. If they are not present already, install them with:

sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev
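If you are unsure which of these packages are already present, a small helper can ask dpkg first. This is a sketch; the function name missing_packages is hypothetical. It treats any package that dpkg-query does not report as "install ok installed" as missing, including names it does not know at all.

```shell
#!/bin/sh
# Sketch: print the packages from the argument list that are not installed.
# The function name missing_packages is hypothetical.
missing_packages() {
    for pkg in "$@"; do
        # dpkg-query prints "install ok installed" for an installed package;
        # anything else (removed, unknown, dpkg-query missing) counts as missing.
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)
        if [ "$status" != "install ok installed" ]; then
            echo "$pkg"
        fi
    done
}

# Package names taken from the Ubuntu 10.04 list above.
missing=$(missing_packages autoconf automake libtool libpng12-dev \
    libjpeg62-dev libtiff4-dev zlib1g-dev)
if [ -n "$missing" ]; then
    echo "Missing packages:" $missing
fi
```

You can then pass the printed names to sudo apt-get install in a single call.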

Because the Leptonica package shipped with Ubuntu 10.04 is too old, download the current Leptonica sources (version 1.67 or later), then build and install them:

tar xzf leptonica-1.69.tar.gz
cd leptonica-1.69
./configure
make
sudo make install
sudo ldconfig
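The version requirement matters here, so a quick check before building can save a wasted compile. The sketch below reads the version from the tarball name and compares it with 1.67 using GNU sort -V. The helper names are hypothetical, and the sketch assumes the tarball follows the leptonica-X.YY.tar.gz naming used above.

```shell
#!/bin/sh
# Sketch: check a Leptonica tarball against the minimum version (1.67).
# Helper names are hypothetical; sort -V is a GNU coreutils option.
MIN_LEPTONICA=1.67

# Extract "1.69" from "leptonica-1.69.tar.gz".
tarball_version() {
    basename "$1" .tar.gz | sed 's/^leptonica-//'
}

# Succeeds if version $1 >= version $2: after a version sort, the smaller
# value comes first, so $2 must be the first line.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

tarball=leptonica-1.69.tar.gz
version=$(tarball_version "$tarball")
if version_ge "$version" "$MIN_LEPTONICA"; then
    echo "Leptonica $version is new enough"
else
    echo "Leptonica $version is older than $MIN_LEPTONICA" >&2
fi
```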

There is a package for Tesseract-OCR too, but at the time of this writing it does not contain the shared object library equivalent to the DLL provided by Tess4j. So in the last step, get the sources from the Subversion repository, apply the C-API patch, then build and install them. A fresh checkout has no configure script yet, so generate it first with ./autogen.sh (older revisions ship ./runautoconf instead):

svn checkout tesseract-ocr
cd tesseract-ocr
patch -p0 < ../001-tesseract-capi.patch
./autogen.sh
./configure
make
sudo make install
sudo make install-langs
sudo ldconfig
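To confirm that both libraries are now visible to the dynamic linker, you can search the linker cache after ldconfig has run. This sketch assumes the libraries are named liblept and libtesseract and that ldconfig lives in /sbin, /usr/sbin or on the PATH. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: verify that the Leptonica and Tesseract shared libraries are in the
# dynamic linker cache. Library names and helper names are assumptions.

# Locate ldconfig; it usually lives in /sbin, which may not be on a normal
# user's PATH.
find_ldconfig() {
    for candidate in /sbin/ldconfig /usr/sbin/ldconfig; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    command -v ldconfig
}

# Succeeds if a library whose name contains $1 is listed by "ldconfig -p",
# which prints the contents of /etc/ld.so.cache.
lib_in_cache() {
    ldc=$(find_ldconfig) || return 1
    "$ldc" -p | grep -q "$1"
}

for lib in liblept libtesseract; do
    if lib_in_cache "$lib"; then
        echo "$lib: found in the linker cache"
    else
        echo "$lib: not found; check the install prefix and rerun sudo ldconfig" >&2
    fi
done
```

Tess4j loads the native library through JNA. If the libraries sit in a directory the JVM does not search, passing -Djna.library.path=/usr/local/lib to java may help. /usr/local/lib is assumed here because it is the default install location of both autotools builds.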

