
Allow for user controlled or raw regular expression #45

Open
emanlove opened this issue Oct 31, 2023 · 4 comments

Comments

@emanlove
Member

emanlove commented Oct 31, 2023

The current implementation of the REGEXP: matcher automatically prepends ^ and appends $ so that the pattern must match the entire message. This default is good because expected messages read naturally without these symbols. But when we want to use an inline modifier or flag, this throws an exception under Python 3.11 or newer. (It might be an earlier version, but my search shows this changed from a warning to an error in 3.11.) The reason is that global inline flags must be at the beginning of the expression, and the behind-the-scenes addition of ^ prevents that with REGEXP:.
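A minimal sketch that reproduces the failure, assuming the matcher wraps the user's pattern as `f"^{...}$"` the way the `REGEXP:` branch of `robotstatuschecker.py` does. The helper `compiles` is hypothetical and exists only for this illustration.

```python
import re
import sys
import warnings


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if `pattern` compiles, False if `re` rejects it.

    DeprecationWarnings (issued for misplaced global flags on Python 3.6-3.10)
    are silenced so that only hard errors count as failures.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            re.compile(pattern)
        except re.error:
            return False
    return True


user_pattern = r"(?i)Any.*now\.\.\."
# Mimics the REGEXP: matcher, which wraps the user's pattern in ^...$,
# pushing the (?i) flag away from the start of the expression.
wrapped = f"^{user_pattern}$"

print(sys.version_info[:2], compiles(wrapped))  # False on Python 3.11+
print(compiles(user_pattern))  # the unwrapped pattern compiles on every version
```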

To resolve this while keeping the existing usage, I propose a new matcher called RAWRE (or RAWREGX, RAWREGEX, or the like), for "raw regular expression", which would not add the leading ^ and trailing $.

Attached are a sample test demonstrating the problem and a proposed solution. In addition to removing the prepended ^ and appended $, I also removed the re.DOTALL flag, so that users of RAWRE: can apply whichever flags they choose.

leading_global_flag_error.patch

diff --git a/test/tests.robot b/test/tests.robot
index 35b3fa1..3587a03 100644
--- a/test/tests.robot
+++ b/test/tests.robot
@@ -271,6 +271,7 @@ Expected PASS and log messages with COUNT
     ...    PASS Told ya!!
     ...    LOG 4 COUNT: 2
     ...    LOG 4:2 NONE
+    ...    LOG 3 INFO REGEXP: (?i)Any.*now\.\.\.
     Status    PASS    Told ya!!
     Log    Passing soon!
     Log    Any time now...

raw_re_solution.patch

diff --git a/robotstatuschecker.py b/robotstatuschecker.py
index 2f6acde..5091f9b 100755
--- a/robotstatuschecker.py
+++ b/robotstatuschecker.py
@@ -167,6 +167,10 @@ class BaseChecker:
             pattern = f"^{expected.replace('REGEXP:', '', 1).strip()}$"
             if re.match(pattern, actual, re.DOTALL):
                 return True
+        if expected.startswith("RAWRE:"):
+            pattern = f"{expected.replace('RAWRE:', '', 1).strip()}"
+            if re.match(pattern, actual):
+                return True
         if expected.startswith("GLOB:"):
             pattern = expected.replace("GLOB:", "", 1).strip()
             matcher = Matcher(pattern, caseless=False, spaceless=False)

test_solution.patch

diff --git a/test/tests.robot b/test/tests.robot
index 35b3fa1..21ff448 100644
--- a/test/tests.robot
+++ b/test/tests.robot
@@ -271,6 +271,7 @@ Expected PASS and log messages with COUNT
     ...    PASS Told ya!!
     ...    LOG 4 COUNT: 2
     ...    LOG 4:2 NONE
+    ...    LOG 3 INFO RAWRE: (?si)^Any.*now\.\.\.$
     Status    PASS    Told ya!!
     Log    Passing soon!
     Log    Any time now...
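A standalone sketch of how the proposed `RAWRE:` branch would behave, mirroring the patch: no implicit anchors and no implicit `re.DOTALL`. The function name `rawre_matches` is hypothetical; in the patch this logic lives inside `BaseChecker`.

```python
import re


def rawre_matches(expected: str, actual: str) -> bool:
    """Hypothetical standalone version of the proposed RAWRE: matcher.

    The user supplies anchors and flags inside the pattern itself.
    """
    pattern = expected.replace("RAWRE:", "", 1).strip()
    # re.match anchors at the start of the string but NOT at the end.
    return re.match(pattern, actual) is not None


message = "Any time now..."
print(rawre_matches(r"RAWRE: (?si)^Any.*now\.\.\.$", message))  # True
print(rawre_matches(r"RAWRE: (?si)^any.*NOW\.\.\.$", message))  # True (case-insensitive)
print(rawre_matches("RAWRE: Any", message))  # True: only a prefix has to match
```

The last call shows a trade-off of dropping both anchors with `re.match`: unless the user writes a trailing `$`, the pattern only has to match a prefix of the message.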
@pekkaklarck
Member

I'd prefer flags to be supported by default. I believe it would be enough to just use the newish re.fullmatch and omit ^ and $ altogether.
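A minimal sketch of that suggestion, assuming the `REGEXP:` prefix is stripped the same way as in the existing code. The name `regexp_matches` is hypothetical; `re.DOTALL` is kept as in the current implementation.

```python
import re


def regexp_matches(expected: str, actual: str) -> bool:
    """Hypothetical REGEXP: matcher built on re.fullmatch.

    fullmatch requires the whole message to match, so no ^/$ is added and
    inline flags such as (?i) stay at the start of the pattern.
    """
    pattern = expected.replace("REGEXP:", "", 1).strip()
    return re.fullmatch(pattern, actual, re.DOTALL) is not None


print(regexp_matches(r"REGEXP: (?i)any.*NOW\.\.\.", "Any time now..."))  # True
print(regexp_matches(r"REGEXP: Any.*now", "Any time now..."))  # False: trailing "..." unmatched
```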

@pekkaklarck
Member

The offending ^ could actually be removed with re.match as well, because re.match already anchors at the start of the string, but re.fullmatch would be more explicit.
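The two options are not quite identical. A small sketch with made-up patterns showing where `re.match` plus a trailing `$` and `re.fullmatch` disagree:

```python
import re

pattern = r"(?i)abc"

# Option A: drop the leading ^ and rely on re.match anchoring at the start.
print(bool(re.match(f"{pattern}$", "ABC")))    # True
print(bool(re.fullmatch(pattern, "ABC")))      # True

# $ also matches just before a trailing newline; fullmatch does not.
print(bool(re.match(f"{pattern}$", "ABC\n")))  # True
print(bool(re.fullmatch(pattern, "ABC\n")))    # False

# A bare $ binds only to the last alternative; fullmatch covers the whole pattern.
print(bool(re.match("a|b$", "ab")))            # True
print(bool(re.fullmatch("a|b", "ab")))         # False
```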

I don't see cases where DOTALL mode would cause issues, so I'd keep it. I wouldn't add a separate mode for disabling it either.
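Why DOTALL is convenient: log messages can span several lines, and without it `.` does not match a newline. A short illustration with a made-up two-line message:

```python
import re

message = "First line\nSecond line"

# Without DOTALL, . does not match \n, so .* cannot cross the line break.
print(re.fullmatch(r"First.*line", message) is not None)             # False
# With DOTALL (as the checker passes it), .* spans both lines.
print(re.fullmatch(r"First.*line", message, re.DOTALL) is not None)  # True
# The inline (?s) flag has the same effect.
print(re.fullmatch(r"(?s)First.*line", message) is not None)         # True
```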

@pekkaklarck
Member

Notice that although inline flags were supposed to be used at the beginning of the pattern, they were actually supported elsewhere as well until Python 3.11. For example, ^(?i)xxx$ worked earlier, but with 3.11+ it needs to be written as (?i)^xxx$. In most cases, ours included, this is easiest to fix by removing ^ and using re.match or re.fullmatch.
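A sketch of the 3.11-safe ways to write the `xxx` example, plus a version-guarded check that the old `^(?i)xxx$` form is rejected on 3.11+:

```python
import re
import sys

text = "XXX"

# Global flag first, anchors after it: valid on all supported versions.
print(bool(re.match(r"(?i)^xxx$", text)))    # True
# re.match already anchors at the start, so ^ can simply be dropped.
print(bool(re.match(r"(?i)xxx$", text)))     # True
# re.fullmatch needs neither anchor.
print(bool(re.fullmatch(r"(?i)xxx", text)))  # True

if sys.version_info >= (3, 11):
    try:
        re.compile(r"^(?i)xxx$")
    except re.error as err:
        # The flag is no longer allowed after ^.
        print("re.error:", err)
```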

@emanlove
Member Author

emanlove commented Oct 31, 2023

I'm a bit outside my comfort zone and expertise when it comes to regular expression design and architecture. The one thing I really liked and wanted to keep was that, with regular expressions, one could just write the full expected message without writing out the start and end anchors. If fullmatch works like that, which I assumed it does, then I am good with your suggestions.
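A quick check of that assumption: for ordinary messages, `re.fullmatch` without anchors behaves like the current `^...$` wrapper. Both helper names below are hypothetical; the only known edge case where they differ is a message ending in a newline, since `$` also matches just before one.

```python
import re

expected = r"Any time now\.\.\."  # the full expected message, no anchors


def matches_wrapped(pattern: str, actual: str) -> bool:
    # Current style: wrap in ^...$ and use re.match.
    return re.match(f"^{pattern}$", actual, re.DOTALL) is not None


def matches_full(pattern: str, actual: str) -> bool:
    # Suggested style: re.fullmatch without anchors.
    return re.fullmatch(pattern, actual, re.DOTALL) is not None


for actual in ["Any time now...", "Any time now... really", "Not any time now..."]:
    # Both columns agree: only the exact message matches.
    print(repr(actual), matches_wrapped(expected, actual), matches_full(expected, actual))
```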

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants