[git] Use "backslashreplace" instead of "surrogateescape". #22

Open: wants to merge 1 commit into base: master


jgbarah
Contributor

@jgbarah jgbarah commented Mar 18, 2016

When decoding as utf8, if a character cannot be decoded,
use the backslashreplace error handler instead of the
surrogateescape error handler.

Fixes #18 for the git backend; other backends may need the same fix.
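
For readers comparing the two error handlers, here is a minimal sketch of the difference. It is an illustration only, not the code from the patch; the sample bytes are adapted from the commit message used in the test suite, and decoding with backslashreplace assumes Python 3.5 or later.

```python
# Bytes that are not valid UTF-8 (0x93 and 0x94 are stray continuation bytes).
raw = b'Calling \x93Open Type\x94 (CTRL+SHIFT+T)'

# surrogateescape maps each undecodable byte to a lone surrogate (U+DC93, U+DC94).
escaped = raw.decode('utf8', errors='surrogateescape')

# Lone surrogates cannot be encoded back as strict UTF-8.
try:
    escaped.encode('utf8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# backslashreplace writes each undecodable byte as literal \xNN text instead
# (for decoding, this needs Python 3.5+).
replaced = raw.decode('utf8', errors='backslashreplace')

# The result is plain text, so it encodes as UTF-8 without errors.
reencoded = replaced.encode('utf8')

print(ascii(escaped))
print(ascii(replaced))
print(strict_ok)
```

The trade-off is that backslashreplace is lossy in one sense: the original bytes can no longer be told apart from a literal `\x93` typed in the message, whereas surrogateescape can be reversed by encoding with the same handler.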

@sduenas
Member

sduenas commented Mar 19, 2016

When I run the tests, I get the following errors:

~/devel/grimoire/perceval/tests$ python3 test_git.py 
....E..E....................
======================================================================
ERROR: test_git_encoding_error (__main__.TestGitBackend)
Test if encoding errors are escaped when a git log is parsed
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_git.py", line 199, in test_git_encoding_error
    result = [commit for commit in commits]
  File "test_git.py", line 199, in <listcomp>
    result = [commit for commit in commits]
  File "../perceval/backends/git.py", line 160, in parse_git_log_from_file
    for commit in parser.parse():
  File "../perceval/backends/git.py", line 375, in parse
    for line in self.stream:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
TypeError: don't know how to handle UnicodeDecodeError in error callback

======================================================================
ERROR: test_git_utf8_error (__main__.TestGitBackend)
Characters that cannot decoded as utf8 can be later encoded as utf8.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_git.py", line 220, in test_git_utf8_error
    commit = [commit for commit in commits][0]
  File "test_git.py", line 220, in <listcomp>
    commit = [commit for commit in commits][0]
  File "../perceval/backends/git.py", line 160, in parse_git_log_from_file
    for commit in parser.parse():
  File "../perceval/backends/git.py", line 375, in parse
    for line in self.stream:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
TypeError: don't know how to handle UnicodeDecodeError in error callback

----------------------------------------------------------------------
Ran 28 tests in 0.176s

FAILED (errors=2)

@jgbarah
Contributor Author

jgbarah commented Mar 19, 2016

Sorry, I had forgotten to run all the tests. When I ran them just now, curiously enough, I got a different error:

$ python3 test_git.py 
....F.......................
======================================================================
FAIL: test_git_encoding_error (__main__.TestGitBackend)
Test if encoding errors are escaped when a git log is parsed
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_git.py", line 205, in test_git_encoding_error
    self.assertEqual(commit['message'], 'Calling \udc93Open Type\udc94 (CTRL+SHIFT+T) after startup - performance improvement.')
AssertionError: 'Calling \\x93Open Type\\x94 (CTRL+SHIFT+T) after s[29 chars]ent.' != 'Calling \udc93Open Type\udc94 (CTRL+SHIFT+T) after[31 chars]ent.'
- Calling \x93Open Type\x94 (CTRL+SHIFT+T) after startup - performance improvement.
?         ^^^^         ^^^^
+ Calling \udc93Open Type\udc94 (CTRL+SHIFT+T) after startup - performance improvement.
?         ^         ^


----------------------------------------------------------------------
Ran 28 tests in 0.394s

FAILED (failures=1)

I'm going to fix this one (it is caused by the change in how undecodable bytes are escaped) by changing the expected result. Then I will have a look at the errors that are raised for you...

@jgbarah
Contributor Author

jgbarah commented Mar 19, 2016

I just amended the commit and force-pushed it; on my side it passes all tests:

$ python3 test_git.py 
............................
----------------------------------------------------------------------
Ran 28 tests in 0.388s

OK

Would you mind checking once again? Maybe I missed something, but I cannot reproduce the problem you see...

@sduenas
Member

sduenas commented Mar 21, 2016

It's still failing, but I found out why. It looks like backslashreplace was not supported for decoding before Python 3.5, although the documentation says otherwise.

Accepting this change would mean Perceval works only with Python 3.5 or later. I don't see any problem with that right now, but in Ubuntu 15.10, for instance, the default version is 3.4.
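
A quick way to check whether a given interpreter supports backslashreplace when decoding is sketched below. The function name `supports_backslashreplace_decoding` is hypothetical and not part of Perceval; the check relies on the `TypeError` shown in the tracebacks above.

```python
def supports_backslashreplace_decoding():
    """Return True if 'backslashreplace' can be used when decoding bytes.

    Hypothetical helper, for illustration only.
    """
    try:
        # 0x93 alone is not valid UTF-8, so the error handler is invoked.
        b'\x93'.decode('utf8', errors='backslashreplace')
    except TypeError:
        # Older interpreters fail with: "don't know how to handle
        # UnicodeDecodeError in error callback"
        return False
    return True


print(supports_backslashreplace_decoding())
```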

@sduenas sduenas added the git label Mar 21, 2016
@jgbarah
Contributor Author

jgbarah commented Mar 22, 2016

Uhhhm. I hadn't noticed either :-( Now that I sort of understand the problem, I can find another option and produce some code that depends on Python being < 3.5. But maybe we could do that in a separate PR, to get this one through and let it work with git repos that need it, such as the Linux kernel's...
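
One possible shape for such an alternative is a custom error handler registered with `codecs.register_error`, which also works on interpreters where the built-in backslashreplace cannot decode. This is only a sketch under stated assumptions: the handler name `backslashreplace_compat` and the functions `backslashreplace_decode` and `decode_line` are made up for illustration and are not what Perceval does.

```python
import codecs


def backslashreplace_decode(exc):
    """Mimic backslashreplace for decoding errors (hypothetical helper)."""
    if isinstance(exc, UnicodeDecodeError):
        # exc.object holds the input bytes; start/end delimit the bad span.
        bad = exc.object[exc.start:exc.end]
        text = ''.join('\\x{:02x}'.format(byte) for byte in bad)
        # Return the replacement text and the position to resume decoding at.
        return text, exc.end
    # Encoding errors are delegated to the built-in handler.
    return codecs.backslashreplace_errors(exc)


# The handler name below is an assumption chosen for this example.
codecs.register_error('backslashreplace_compat', backslashreplace_decode)


def decode_line(raw):
    """Decode bytes as UTF-8, escaping undecodable bytes as \\xNN."""
    return raw.decode('utf8', errors='backslashreplace_compat')


print(decode_line(b'Calling \x93Open Type\x94'))
```

Because the handler is looked up by name, it could be passed wherever an `errors` argument is accepted, so the same code path would serve both older and newer interpreters.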

@jgbarah
Contributor Author

jgbarah commented Aug 5, 2017

@sduenas do you think we could make this change, or something similar? If so, I can update the patch. Otherwise, we had better close the PR.

valeriocos pushed a commit to valeriocos/perceval that referenced this pull request Dec 6, 2017
@abitrolly

Here is a way to do the same operation in a Python 2 and Python 3 compatible way.

Successfully merging this pull request may close these issues.

Unicode error due to using surrogates