Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract: Wrong lineno when arguments are spread over multiple lines and contain a function call #1123

Open
oomsveta opened this issue Sep 6, 2024 · 4 comments · May be fixed by #1126 or #1127
Open

Comments

@oomsveta
Copy link

oomsveta commented Sep 6, 2024

Overview

Consider this piece of code:

_(
    "hello",
    dummy_fn_call()
)

Extracting it yields the following result:

(3, 'hello', [], None)

As you may have noticed, the lineno (the first component of the tuple) is wrong. It should be either 1 or 2 (depending on whether you expect the lineno of gettext or the lineno of the value), but not 3.

This is because the code setting message_lineno is executed every time an opening parenthesis is encountered and funcname is set:

elif tok == OP and value == '(':
if in_def:
# Avoid false positives for declarations such as:
# def gettext(arg='message'):
in_def = False
continue
if funcname:
message_lineno = lineno

while funcname is only changed if the name is in keywords (i.e., if the name is one of the gettext names):
elif tok == NAME and value in keywords:
funcname = value

So, in the case where you have a gettext call -setting funcname- with another function call inside its arguments -resulting in an opening parenthesis-, you meet the two conditions to change the value of lineno for something that might be wrong, for example, if you spread the arguments to gettext over multiple lines

Steps to Reproduce

Copy-paste this to your python REPL:

from io import BytesIO
from babel.messages.extract import extract

file = b"""_(
    "hello",
    dummy_fn_call()
)"""

list(extract("python", BytesIO(file)))

Actual Results

[(3, 'hello', [], None)]

Expected Results

[(1, 'hello', [], None)]

or maybe

[(2, 'hello', [], None)]

depending on whether you expect the lineno of gettext or the lineno of the value

Additional Information

The first and last test cases of test_comments_with_calls_that_spawn_multiple_lines should fail, assuming you decide to consider the lineno to be the lineno of the gettext call. It currently fails if you replace the call to len in the test cases with something that isn't a function call

@oomsveta
Copy link
Author

oomsveta commented Sep 6, 2024

I also wrote a quick-and-dirty fix:

diff --git a/babel/messages/extract.py b/babel/messages/extract.py
index 48573ee..182a74b 100644
--- a/babel/messages/extract.py
+++ b/babel/messages/extract.py
@@ -502,6 +502,7 @@ def extract_python(
     :param options: a dictionary of additional options (optional)
     :rtype: ``iterator``
     """
+    last_name_token = ""
     funcname = lineno = message_lineno = None
     stack_depth = -1
     buf = []
@@ -521,6 +522,8 @@ def extract_python(
     current_fstring_start = None
 
     for tok, value, (lineno, _), _, _ in tokens:
+        if tok == NAME:
+            last_name_token = value
         if stack_depth == -1 and tok == NAME and value in ('def', 'class'):
             in_def = True
         elif tok == OP and value == '(':
@@ -530,7 +533,8 @@ def extract_python(
                 in_def = False
                 continue
             if funcname:
-                message_lineno = lineno
+                if last_name_token in keywords:
+                    message_lineno = lineno
                 stack_depth += 1
         elif in_def and tok == OP and value == ':':
             # End of a class definition without parens
diff --git a/tests/messages/test_extract.py b/tests/messages/test_extract.py
index 7d3a05a..852cdea 100644
--- a/tests/messages/test_extract.py
+++ b/tests/messages/test_extract.py
@@ -97,10 +97,10 @@ add_notice(req, ngettext("Bar deleted.",
         messages = list(extract.extract_python(buf, ('ngettext', '_'), ['NOTE:'],
 
                                                {'strip_comment_tags': False}))
-        assert messages[0] == (3, 'ngettext', ('Catalog deleted.', 'Catalogs deleted.', None), ['NOTE: This Comment SHOULD Be Extracted'])
+        assert messages[0] == (2, 'ngettext', ('Catalog deleted.', 'Catalogs deleted.', None), ['NOTE: This Comment SHOULD Be Extracted'])
         assert messages[1] == (6, '_', 'Locale deleted.', ['NOTE: This Comment SHOULD Be Extracted'])
         assert messages[2] == (10, 'ngettext', ('Foo deleted.', 'Foos deleted.', None), ['NOTE: This Comment SHOULD Be Extracted'])
-        assert messages[3] == (15, 'ngettext', ('Bar deleted.', 'Bars deleted.', None), ['NOTE: This Comment SHOULD Be Extracted', 'NOTE: And This One Too'])
+        assert messages[3] == (14, 'ngettext', ('Bar deleted.', 'Bars deleted.', None), ['NOTE: This Comment SHOULD Be Extracted', 'NOTE: And This One Too'])
 
     def test_declarations(self):
         buf = BytesIO(b"""\

@tomasr8
Copy link
Member

tomasr8 commented Sep 6, 2024

Thanks for the investigation!

Just for reference, GNU's gettext sets the line number to 2:

> xgettext hello.py
> cat messages.po 
...

#: line.py:2
msgid "hello"
msgstr ""

I think it makes sense to try to be consistent with xgettext here, i.e. use lineno=2. Would you like to open a PR?

@oomsveta
Copy link
Author

I think it makes sense to try to be consistent with xgettext here, i.e. use lineno=2.
@tomasr8

Agreed, and in addition to being consistent with xgettext, it makes more sense to use the number of the line with the actual message.
It might be a bit tricky to implement, though: I imagine it'll require pairing each message with its lineno, which would change the current API and break the custom extract methods. What do you think?

Would you like to open a PR?

I don't mind giving it a try, but I'm not sure I'll be able to free up enough time to look into it at the moment

@tomasr8
Copy link
Member

tomasr8 commented Sep 13, 2024

I imagine it'll require pairing each message with its lineno, which would change the current API and break the custom extract methods.

How would this break custom extractors?

dylankiss added a commit to dylankiss/babel that referenced this issue Sep 18, 2024
When a nested function call was inside a gettext call, but on a separate
line, the message got extracted with the line number of that function
call. We updated the line number whenever we encountered an opening
parenthesis after a gettext function name.

This commit fixes that by only updating the line number on the first
argument inside a gettext call.

Compared with `xgettext`, some test cases were thus wrong and corrected
in this commit.

Fixes python-babel#1123
dylankiss added a commit to dylankiss/babel that referenced this issue Sep 18, 2024
When a gettext call had a nested function call on a new line, the
extract function would use that nested call's line number when
extracting the terms for the gettext call.

The reason is that we set the line number on any encounter of an opening
parenthesis after a gettext keyword. This does not work if either we
have a nested call, or our first term starts on a new line.

This commit fixes that by only setting the line number when we encounter
the first argument inside a gettext call.

Existing tests were adapted to work according to `xgettext` with regards
to the line numbers.

Fixes python-babel#1123
@dylankiss dylankiss linked a pull request Sep 18, 2024 that will close this issue
dylankiss added a commit to dylankiss/babel that referenced this issue Sep 19, 2024
Currently the Python extractor does not support deeply nested gettext
calls (deeper than as a direct argument to the top-level gettext call).

e.g.
```py
_("Hello %s", _("Person"))
_("Hello %s",
  random_function(", ".join([_("Person 1"), _("Person 2")])))
```

The extraction code was refactored quite a bit to simplify the flow and
support this use-case.

Fixes python-babel#1125
(meanwhile also fixes python-babel#1123)
dylankiss added a commit to dylankiss/babel that referenced this issue Oct 10, 2024
Currently the Python extractor does not support deeply nested gettext
calls (deeper than as a direct argument to the top-level gettext call).

e.g.
```py
_("Hello %s", _("Person"))
_("Hello %s",
  random_function(", ".join([_("Person 1"), _("Person 2")])))
```

The extraction code was refactored quite a bit to simplify the flow and
support this use-case.

Fixes python-babel#1125
(meanwhile also fixes python-babel#1123)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants