git clone https://github.com/fin-officer/invocr.git
cd invocr
# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Install Tesseract on Ubuntu/Debian
sudo apt update
sudo apt install tesseract-ocr
# Install additional language packs as needed
sudo apt install tesseract-ocr-eng tesseract-ocr-pol tesseract-ocr-deu tesseract-ocr-fra
# Install Tesseract on macOS (Homebrew)
brew install tesseract
# Install additional language packs
brew install tesseract-lang
# Check if Tesseract is installed correctly
tesseract --version
# Verify InvOCR installation
poetry run invocr --version
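If either check above fails, a quick way to narrow it down is to see which required tools are actually on your PATH. The following is a minimal sketch, not part of InvOCR; the function name `missing_commands` is hypothetical.

```shell
# Hypothetical helper: print each given command that is NOT found on PATH,
# one per line. Returns 1 if any command is missing, 0 otherwise.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage:
#   missing_commands git poetry tesseract
```

An empty result means all listed tools were found.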
You can customize InvOCR behavior by creating a configuration file:
# Create default configuration
poetry run invocr config init --output ./config/invocr.yaml
# Use custom configuration
poetry run invocr --config ./config/invocr.yaml convert invoice.pdf invoice.json
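To apply the same configuration to a whole directory of invoices, the convert command above can be wrapped in a loop. This is a sketch under assumptions: the function name `convert_all` and the `INVOCR` override variable are hypothetical, and only the `--config ... convert <input> <output>` form shown above is used.

```shell
# Hypothetical helper: convert every PDF in a directory to JSON.
# Usage: convert_all <config-file> <input-dir> <output-dir>
# INVOCR (assumed variable) overrides the command used to launch InvOCR.
convert_all() {
    config=$1
    in_dir=$2
    out_dir=$3
    mkdir -p "$out_dir"
    for pdf in "$in_dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDFs.
        [ -e "$pdf" ] || continue
        name=$(basename "$pdf" .pdf)
        # Left unquoted on purpose so "poetry run invocr" splits into words.
        ${INVOCR:-poetry run invocr} --config "$config" convert "$pdf" "$out_dir/$name.json"
    done
}

# Example usage:
#   convert_all ./config/invocr.yaml ./invoices ./output
```

Setting `INVOCR=echo` prints the commands instead of running them, which is a cheap way to preview what the loop would do.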
For development purposes, install with development dependencies:
poetry install --with dev
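If you encounter “Tesseract not found” errors:
# Check that Tesseract is installed and on your PATH
tesseract --version
# Point Tesseract at its language data directory
export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata/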
(adjust path as needed)
If you see “Warning: Invalid language” errors:
tesseract --list-langs
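Both troubleshooting steps above can be partly automated. The sketch below is not part of InvOCR or Tesseract: the function names `find_tessdata` and `missing_langs` are hypothetical, and the candidate directories in the usage comment are assumptions that vary by system. It treats `TESSDATA_PREFIX` as the directory that holds the `.traineddata` files, matching the path used above.

```shell
# Hypothetical helper: print the first directory under the given roots
# that contains a *.traineddata file. Returns 1 if none is found.
find_tessdata() {
    for root in "$@"; do
        [ -d "$root" ] || continue
        match=$(find "$root" -type f -name '*.traineddata' 2>/dev/null | head -n 1)
        if [ -n "$match" ]; then
            dirname "$match"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: read `tesseract --list-langs` output on stdin and
# print each requested language code that is not listed, one per line.
missing_langs() {
    # Drop the "List of available languages ..." header line.
    installed=$(sed '/^List of/d')
    for lang in "$@"; do
        printf '%s\n' "$installed" | grep -qxF -- "$lang" || printf '%s\n' "$lang"
    done
}

# Example usage (directories are assumptions; adjust for your system):
#   export TESSDATA_PREFIX="$(find_tessdata /usr/share/tesseract-ocr /usr/share/tessdata /usr/local/share/tessdata)"
#   tesseract --list-langs 2>&1 | missing_langs eng pol deu fra
```

Any language code printed by `missing_langs` needs its language pack installed, for example `tesseract-ocr-pol` on Ubuntu/Debian.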