This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

re.error: nothing to repeat at position 2 #568

Closed
kj21 opened this issue Oct 2, 2019 · 3 comments

Comments


kj21 commented Oct 2, 2019

Great work with Paperless; it's making my paperwork less time-consuming.
Recently I have been getting a regex error while running the consumer. The process quits with the line "re.error: nothing to repeat at position 2" in the traceback. It seems to come from the system's Python file "sre_parse.py", line 651:

raise source.error("nothing to repeat", source.tell() - here + len(this))

It seems to interpret the "+" as a repetition operator in the regexp. I'm wondering how to fix this, since I don't feel comfortable editing the system's Python files.
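The failure can be reproduced outside Paperless with a minimal sketch. It assumes the matching word starts with "+"; the value "+49" is made up for illustration and is not taken from the report. The pattern is built the same way as in the traceback's models.py line, so the "+" lands directly after the \b anchor, where there is nothing for it to repeat.

```python
import re

# Hypothetical tag match word that begins with a regex metacharacter.
word = "+49"

# Same construction as in models.py: the word is inserted unescaped.
pattern = r"\b{}\b".format(word)

error_message = None
error_pos = None
try:
    re.compile(pattern)
except re.error as exc:
    # "+" follows the \b anchor, so the parser finds nothing to repeat.
    error_message = str(exc)
    error_pos = exc.pos

print(error_message)
```

The reported position points at the "+" itself, since the two characters of \b occupy positions 0 and 1.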

Author

kj21 commented Oct 2, 2019

Traceback (most recent call last):
  File "/usr/src/paperless/src/manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/lib/python3.7/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/usr/lib/python3.7/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib/python3.7/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib/python3.7/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/usr/src/paperless/src/documents/management/commands/document_consumer.py", line 96, in handle
    self.loop_inotify(mail_delta)
  File "/usr/src/paperless/src/documents/management/commands/document_consumer.py", line 129, in loop_inotify
    self.loop_step(mail_delta)
  File "/usr/src/paperless/src/documents/management/commands/document_consumer.py", line 121, in loop_step
    self.file_consumer.consume_new_files()
  File "/usr/src/paperless/src/documents/consumer.py", line 112, in consume_new_files
    if not self.try_consume_file(file):
  File "/usr/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/usr/src/paperless/src/documents/consumer.py", line 158, in try_consume_file
    date
  File "/usr/src/paperless/src/documents/consumer.py", line 228, in _store
    relevant_tags = set(list(Tag.match_all(text)) + list(file_info.tags))
  File "/usr/src/paperless/src/documents/models.py", line 84, in match_all
    if tag.matches(text):
  File "/usr/src/paperless/src/documents/models.py", line 108, in matches
    if re.search(r"\b{}\b".format(word), text, **search_kwargs):
  File "/usr/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python3.7/re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.7/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.7/sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.7/sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.7/sre_parse.py", line 651, in _parse
    source.tell() - here + len(this))
re.error: nothing to repeat at position 2

Member

pitkley commented Oct 6, 2019

Thanks for reporting this @kj21. I think I found a fix and opened PR #571 for this. We'll let you know once it is fixed!

Author

kj21 commented Oct 6, 2019

Beautiful! As you outlined, I replaced r"\b{}\b".format(word), text, **search_kwargs) on lines 101 and 108 of models.py with r"\b{}\b".format(re.escape(word)), text, **search_kwargs), and r"\b{}\b".format(self.match), text, **search_kwargs)) on line 114 with r"\b{}\b".format(re.escape(self.match)), text, **search_kwargs)). Works great. Thanks a lot!
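For readers who want to see the effect of the change in isolation, here is a small sketch of the escaped form. The helper name contains_word and the sample strings are hypothetical and not part of Paperless; only the r"\b{}\b".format(re.escape(...)) construction comes from the fix described above.

```python
import re


def contains_word(word: str, text: str) -> bool:
    """Hypothetical helper: look for `word` as literal text between word boundaries."""
    # re.escape() neutralises metacharacters such as "+" and ".",
    # so the user-provided word is matched verbatim.
    pattern = r"\b{}\b".format(re.escape(word))
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# A word starting with "+" no longer raises re.error.
plus_result = contains_word("+49", "call +49 123")

# Metacharacters inside the word are now literal.
dot_literal = contains_word("1.5", "version 1.5 released")
dot_wildcard = contains_word("1.5", "build 125 released")

print(plus_result, dot_literal, dot_wildcard)
```

One caveat with this sketch: escaping only stops the compile error. Because \b needs a word character on one side, a word that starts or ends with a symbol may compile fine yet still not match when it is surrounded by whitespace, which is why the "+49" lookup above returns False.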

kj21 closed this as completed Oct 6, 2019
MasterofJOKers pushed a commit that referenced this issue Nov 2, 2019
Rather than using the user/document-provided values directly, we instead
escape them to use them verbatim.

This fixes issue #568.
pitkley added a commit to pitkley/paperless that referenced this issue Feb 23, 2020
Rather than using the user/document-provided values directly, we instead
escape them to use them verbatim.

This fixes issue the-paperless-project#568.
pitkley added a commit that referenced this issue Feb 23, 2020
Rather than using the user/document-provided values directly, we instead
escape them to use them verbatim.

This fixes issue #568.