
ERROR:root: No template for Invoice.pdf #512

Open
nozprod opened this issue May 12, 2023 · 12 comments

Comments


nozprod commented May 12, 2023

Hey,

I always get the same error, with both custom and predefined templates (using a French Google bill as an example).

Any idea?

My template:

issuer: Google Commerce Limited
fields:
  date: Récapitulatif pour la période suivante\s+:\s+(\d{1,2}\s+\w+\.\s+\d{4})
  ttc: Total en EUR\s+([\d,]+) €
  ht: Sous-total en EUR\s+([\d,]+) €
  tva_rate: TVA \((\d{1,3})%\)
  tva_amount: TVA \(\d{1,3}%\)\s+([\d,]+) €
keywords:
  - Google Commerce Limited
  - IE9825613N
options:
  currency: EUR
  date_formats:
    - '%d .%b .%Y'
  languages:
    - fr
  decimal_separator: ','
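
One way to sanity-check the field patterns outside invoice2data is to run them through Python's `re` module. This is only a minimal sketch: the sample strings below are made up for illustration and are not taken from the attached invoice, and `first_group` is a hypothetical helper. It also shows that the `date` pattern requires a dot after the month name, so an unabbreviated month such as `mai` would not match.

```python
import re

# Patterns copied from the template above.
DATE_RE = r"Récapitulatif pour la période suivante\s+:\s+(\d{1,2}\s+\w+\.\s+\d{4})"
TTC_RE = r"Total en EUR\s+([\d,]+) €"


def first_group(pattern, text):
    """Return the first capture group of the first match, or None if no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical sample lines (assumptions, not extracted from Invoice.pdf).
abbreviated_month = "Récapitulatif pour la période suivante : 1 avr. 2023"
full_month = "Récapitulatif pour la période suivante : 1 mai 2023"
total_line = "Total en EUR    12,00 €"

print(first_group(DATE_RE, abbreviated_month))  # month followed by a dot
print(first_group(DATE_RE, full_month))         # month without a dot
print(first_group(TTC_RE, total_line))
```

If the text produced by the PDF reader differs from what the patterns expect (spacing, missing dot, different currency symbol), the corresponding field will not be captured.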

An example bill is attached
Invoice.pdf

Collaborator

bosd commented May 12, 2023

I don't see the error. Which message do you get?
Which OS are you using?

Did you run it with the --debug flag to get more detailed feedback?

Author

nozprod commented May 23, 2023

Hi, here is the full message:
Capture d’écran 2023-05-23 à 20 50 34
And I'm running macOS 12.6.5

The --debug flag doesn't help, or I may be doing something wrong...
In case it helps, here is my script:

import os
import pytesseract
import argparse
import logging.config
import logging
from pdf2image import convert_from_path
from invoice2data import extract_data
from invoice2data.extract.loader import read_templates
import google.auth
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Parse the command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true",
                    help="activer le mode de débogage")
args = parser.parse_args()

# Configure the logger
if args.debug:
    logging.basicConfig(level=logging.DEBUG)
else:
    logging.basicConfig(level=logging.INFO)

def read_templates_from_folder(folder):
    result = []
    for path, subdirs, files in os.walk(folder):
        for name in files:
            if name.endswith(('.yml', '.yaml')):
                file_path = os.path.join(path, name)
                result.extend(read_templates(file_path))
                print(f"Template loaded from {file_path}")
    return result

def extract_data_from_invoice(pdf_path):
    print("Extraction du texte à partir du fichier PDF...")
    templates = read_templates_from_folder('Templates/')
    try:
        data = extract_data(pdf_path, templates=templates)  # Add this line
        print(f"Data extracted: {data}")  # Add this line
        return data
    except Exception as e:
        print(f"Error during extraction: {e}")
        return None

def extract_invoice_data(data):
    if not data:
        return None

    invoice_data = {}

    date = data.get('date')
    if date:
        invoice_data['date'] = date.strftime("%d/%m/%Y")

    invoice_data['ht'] = data.get('ht')
    invoice_data['tva_rate'] = data.get('tva_rate')
    invoice_data['tva_amount'] = data.get('tva_amount')
    invoice_data['ttc'] = data.get('ttc')

    return invoice_data

def authenticate_google_sheets():
    print("Authentification et création du service Google Sheets...")
    creds = None
    SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
    token_path = 'token.json'
    credentials_path = 'credentials.json'

    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
        creds = flow.run_local_server(port=0)

    return build('sheets', 'v4', credentials=creds)

def update_google_sheet(service, sheet_id, data):
    print("Mise à jour de la feuille de calcul Google Sheets...")
    range_name = 'Sheet1!A1:E1'
    values = [[data['date'], data['ht'], data['tva_rate'], data['tva_amount'], data['ttc']]]
    body = {'values': values}

    try:
        result = service.spreadsheets().values().append(
            spreadsheetId=sheet_id, range=range_name,
            valueInputOption='USER_ENTERED', insertDataOption='INSERT_ROWS', body=body).execute()
        print('{0} cells appended.'.format(result.get('updates').get('updatedCells')))
    except HttpError as error:
        print('An error occurred: {0}'.format(error))
        return None

if __name__ == '__main__':
    pdf_path = 'Bills/Invoice.pdf'
    data = extract_data_from_invoice(pdf_path)
    invoice_data = extract_invoice_data(data)
    print("Données de facture extraites:", invoice_data)

    if invoice_data:
        sheets_service = authenticate_google_sheets()
        sheet_id = '1A9UPxQ7uR6znZmZ96xJ7Klg2ycXKdFtGrSHmkVND2hs'
        update_google_sheet(sheets_service, sheet_id, invoice_data)
    else:
        print("Aucune donnée de facture extraite.")

Author

nozprod commented May 30, 2023

Hey @bosd
Any idea?

Collaborator

bosd commented May 30, 2023

> Hey @bosd
> Any idea ?

Not yet.
At first sight the code looks good.
Maybe add a try/except block around the template reading, to see whether any errors are encountered while loading the templates.

Did you verify your installation is working by using one of the supplied examples and running it from the command line?

Author

nozprod commented Jun 2, 2023

It doesn't work with the supplied templates either. So it seems it's my installation...
I'll go with the try/except block.
Capture d’écran 2023-06-02 à 16 51 40

Collaborator

bosd commented Jun 8, 2023

I actually meant: try to check whether your template / input file is correct
by first testing it from the command line, thus bypassing your custom code.

invoice2data Invoice.pdf --input-reader=pdftotext --template-folder=/home/templates --debug

Also, kindly make sure your template includes an exclude_keywords tag and a priority tag.

Author

nozprod commented Jun 9, 2023

I'm sorry, I'm not very familiar with all of this, but I'm trying hard, and this is the new issue I get...

Capture d’écran 2023-06-09 à 17 07 15

I tried to find any .DS_Store file and remove it, without success.
I also uninstalled / reinstalled invoice2data, same issue happening.

Collaborator

bosd commented Jun 9, 2023

Now we are getting somewhere 😁
While adding the JSON support I assumed people were only storing .yml or .json files.
Having any other file in your template directory results in an error.

This is actually fixed by #509 in the source code of this repo.
Yet it has not been released on PyPI.

So to resolve this you can download the source code straight from the master branch and use that. It should work for you.

Or delete all the .DS_Store files.
The .DS_Store files can be annoying to get rid of and keep popping up.
Once all those files are gone, you should not get this error anymore.
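
To see which files might be making the template directory dirty, something like the following could help. It is a rough sketch using only the standard library: `find_stray_files`, `remove_ds_store` and the extension list are hypothetical names and assumptions for illustration, not part of invoice2data, and `Templates/` is the folder name taken from the script earlier in this thread.

```python
import os

# Assumed set of extensions that count as template files.
TEMPLATE_EXTENSIONS = (".yml", ".yaml", ".json")


def find_stray_files(folder):
    """Return paths under `folder` that do not look like template files."""
    stray = []
    for path, _subdirs, files in os.walk(folder):
        for name in sorted(files):
            if not name.lower().endswith(TEMPLATE_EXTENSIONS):
                stray.append(os.path.join(path, name))
    return stray


def remove_ds_store(folder):
    """Delete only the .DS_Store files under `folder`; return what was removed."""
    removed = []
    for file_path in find_stray_files(folder):
        if os.path.basename(file_path) == ".DS_Store":
            os.remove(file_path)
            removed.append(file_path)
    return removed


if __name__ == "__main__":
    # Only lists the stray files; nothing is deleted here.
    print(find_stray_files("Templates/"))
```

Listing first and deleting separately keeps the destructive step explicit; `remove_ds_store` deliberately leaves any other unexpected file alone.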

Author

nozprod commented Jun 13, 2023

Thanks a lot, I'll try today 😉

Author

nozprod commented Jun 13, 2023

So it worked: my templates are working and I'm able to extract data by using invoice2data directly \o/
But for some reason, I can't make it work through my custom script 😢
I still get the "No template" error...

Collaborator

bosd commented Jun 19, 2023

> But for any reason, I can't make it work through my custom script 😢

It might still be related to the .DS_Store files or any other file that makes the template directory dirty.

Did you update your installed code from the source in this repo?
You can download the zip from this repo and copy the src/invoice2data contents to the location where the library is installed.
For example, on my machine it is:
/home/user/.local/lib/python3.11/site-packages/invoice2data/

Try to debug your custom template reading by printing the results:

def read_templates_from_folder(folder):
    result = []
    for path, subdirs, files in os.walk(folder):
        for name in files:
            if name.endswith(('.yml', '.yaml')):
                file_path = os.path.join(path, name)
                result.extend(read_templates(file_path))
                print(f"Template loaded from {file_path}")
    print(result)
    return result
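
One possible cause worth checking when the printed list is empty, assuming `read_templates` treats its argument as a folder and walks it with `os.walk` (the README usage passes a folder path): the helper above hands it a single file path, and `os.walk` started on a file yields nothing. The sketch below only demonstrates that `os.walk` behaviour with the standard library; `count_walked_files` is a hypothetical helper and the temporary template file is made up for the demonstration.

```python
import os
import tempfile


def count_walked_files(path):
    """Count the files os.walk finds when started at `path`."""
    return sum(len(files) for _path, _subdirs, files in os.walk(path))


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical template file, created only for this demonstration.
    template_file = os.path.join(folder, "google.yml")
    with open(template_file, "w", encoding="utf-8") as f:
        f.write("issuer: Google Commerce Limited\n")

    walked_from_folder = count_walked_files(folder)      # start at the folder
    walked_from_file = count_walked_files(template_file)  # start at the file itself

print(walked_from_folder, walked_from_file)
```

If that assumption holds for the installed version, calling `read_templates('Templates/')` once on the folder, rather than once per file, would be the thing to try.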

@changtraisitinh

> So it worked, my templates are working and I'm able to extract datas by using Invoice2Data directly \o/ But for any reason, I can't make it work through my custom script 😢 I still get the "No template" error...

I customized the source and installed it with python setup.py install. Hope this helps you.
