Merge pull request #5 from andychase/develop

Reparse version 3
andychase · Nov 22, 2015 · 6afb310
2 parents 47ee095 + 31ec4e9

Showing 28 changed files with 271 additions and 146 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,9 @@
 *.pyc
+.coverage
 .idea/*
+.tox/
+TAGS
+dist/*
 doc/build/*
 doc/html/*
-MANIFEST
-dist/*
+reparse.egg-info/
diff --git a/.landscape.yml b/.landscape.yml
@@ -0,0 +1,6 @@
+strictness: veryhigh
+pep8:
+  full: true
+doc-warnings: false
+test-warnings: true
+max-line-length: 80
diff --git a/.travis.yml b/.travis.yml
@@ -2,11 +2,12 @@ language: python
 sudo: false
 python:
   - 2.7
+  - 3.2
   - 3.3
   - 3.4
+  - 3.5
 install:
-  - pip install --use-mirrors pyyaml
-  - pip install --use-mirrors -r requirements.txt
-script: nosetests --with-doctest
+  - pip install -r requirements-dev.txt
+script: nosetests
 notifications:
   email: false
diff --git a/doc/source/about.rst b/doc/source/about.rst
@@ -1,35 +1,35 @@
 About: Why another tool for parsing?
 ====================================
 
-RE|PARSE is simply a tool for combining regular expressions together
+Reparse is simply a tool for combining regular expressions together
 and using a regular expression engine to scan/search/parse/process input for certain tasks.
 
 Larger parsing tools like YACC/Bison, ANTLR, and others are really
 good for structured input like computer code or xml. They aren't specifically
 designed for scanning and parsing semi-structured data from unstructured
 text (like books, or internet documents, or diaries).
 
-RE|PARSE is designed to work with exactly that kind of stuff, (and is completely
+Reparse is designed to work with exactly that kind of stuff, (and is completely
 useless for the kinds of tasks any of the above is often used for).
 
 Parsing Spectrum
 ----------------
 
-RE|PARSE isn't the first parser of it's kind. A hypothetical spectrum
+Reparse isn't the first parser of it's kind. A hypothetical spectrum
 of parsers from pattern-finding only
 all the way to highly-featured, structured grammars might look something like this::
 
-                    v- RE|PARSE            v- YACC/Bison
+                    v- Reparse            v- YACC/Bison
     UNSTRUCTURED |-------------------------| STRUCTURED
                  ^- Regex     ^- Parboiled/PyParsing
 
-RE|PARSE is in fact very featureless. It's only a little better
+Reparse is in fact very featureless. It's only a little better
 than plain regular expressions. Still, you might find it ideal
 for the kinds of tasks it was designed to deal with (like dates and addresses).
 
 
-What kind of things might RE|PARSE be useful for parsing?
----------------------------------------------------------
+What kind of things might Reparse be useful for parsing?
+--------------------------------------------------------
 
 Any kind of semi-structured formats:
 
@@ -41,38 +41,38 @@ Any kind of semi-structured formats:
 - Addresses
 - Phone numbers
 
-Or in other words, anything you might consider parsing with Regex, might consider RE|PARSE,
+Or in other words, anything you might consider parsing with Regex, might consider Reparse,
 especially if you are considering combining multiple regular expressions together.
 
 Why Regular Expressions
---------------------------------
+-----------------------
 
 PyParsing (Python) and Parboiled (JVM) also have use-cases very similar
-to RE|PARSE, and they are much more feature-filled. They have their own (much more powerful)
+to Reparse, and they are much more feature-filled. They have their own (much more powerful)
 DSL for parsing text.
 
-RE|PARSE uses Regular Expressions which has some advantages:
+Reparse uses Regular Expressions which has some advantages:
 
 - Short, minimal Syntax
 - Universal (with some minor differences between different engines)
 - Standard
 - Moderately Easy-to-learn (Though this is highly subjective)
     - Many programmers already know the basics
     - Skills can be carried else where
-- **Regular Expressions can be harvested elsewhere and used within RE|PARSE**
+- **Regular Expressions can be harvested elsewhere and used within Reparse**
 - Decent performance over large inputs
 - Ability to use fuzzy matching regex engines
 
 
-Limitations of RE|PARSE
--------------------------
+Limitations of Reparse
+----------------------
 
 Regular Expressions have been known to catch input that was unexpected,
 or miss input that was expected due to unforeseen edge cases.
-RE|PARSE provides tools to help alleviate this by checking the expressions against expected matching
+Reparse provides tools to help alleviate this by checking the expressions against expected matching
 inputs, and against expected non-matching inputs.
 
 This library is very limited in what it can parse, if you realize
 you need something like a recursive grammar, you might want to try PyParsing or something greater
-(though RE|PARSE might be helpful as a 'first step' matching and transforming the parse-able data before it is properly
+(though Reparse might be helpful as a 'first step' matching and transforming the parse-able data before it is properly
 parsed by a different library).
diff --git a/doc/source/best_practices.rst b/doc/source/best_practices.rst
@@ -13,7 +13,7 @@ they can have a long productive life without getting out of control:
 - Never let a regex become too big to be easily understood. Split up big regex
   into smaller expressions. (Sensible splits won't hurt them).
 - Maintain a Matches and Non-Matches
-    - RE|PARSE can use this to test your Regex to make sure they are matching properly
+    - Reparse can use this to test your Regex to make sure they are matching properly
     - It helps maintainers see which regular expressions match what quickly
     - It helps show your intention with each expression, so that others can confidently improve or modify them
 - Maintain a description which talks about what you are trying to match with each regex,

diff --git a/doc/source/howto.rst b/doc/source/howto.rst
@@ -1,5 +1,5 @@
-Howto: How to use RE|PARSE
-==========================
+Howto: How to use Reparse
+=========================
 
 
 You will need
@@ -10,15 +10,15 @@ You will need
 #. Some example texts that you will want to parse and their solutions.
    This will be useful to check your parser and will help you put together the expressions and patterns.
 
-1. Setup Python & RE|PARSE
---------------------------
+1. Setup Python & Reparse
+-------------------------
 
-See :ref:`installation-howto` for instructions on how to install RE|PARSE
+See :ref:`installation-howto` for instructions on how to install Reparse
 
-2. Layout of an example RE|PARSE parser
--------------------------------------
+2. Layout of an example Reparse parser
+--------------------------------------
 
-RE|PARSE needs 3 things in its operation:
+Reparse needs 3 things in its operation:
 
 1. Functions: A dictionary with String Key -> Function Value mapping.
 
@@ -113,7 +113,7 @@ in expressions and merely *combined* in patterns.
         Order: 2
         # I could have used <Basic Phone> instead to use a pattern inside a pattern but it wouldn't have made a difference really (just an extra function call).
 
-The order field tells RE|PARSE which pattern to pick if multiple patterns match.
+The order field tells Reparse which pattern to pick if multiple patterns match.
 Generally speaking, the more specific patterns should be ordered higher than the lower ones
 (you wouldn't want someone to try and call a fax machine!).
 
@@ -129,9 +129,9 @@ Done this way, I could have had 3 different formats for Area Code and the patter
 on any of them. I didn't here because that'd be overkill for phone numbers.
 
 5. Writing your functions.py file
-----------------------------------
+---------------------------------
 
-RE|PARSE matches text and also does some parsing using functions.
+Reparse matches text and also does some parsing using functions.
 
 The order in which the functions are run and results passed are as follows:
 
@@ -179,7 +179,7 @@ I used namedtuples here, but you can parse your output anyway you want to.
 6. Combining it all together!
 -----------------------------
 
-The builder.py module contains some functions to build a RE|PARSE system together.
+The builder.py module contains some functions to build a Reparse system together.
 Here's how I'd put together my phone number parser:
 
 .. code-block:: python

diff --git a/doc/source/modules.rst b/doc/source/modules.rst
@@ -1,4 +1,4 @@
-Here lies the embedded docblock documentation for the various parts of RE|PARSE.
+Here lies the embedded docblock documentation for the various parts of Reparse.
 
 expression
 =========

diff --git a/examples/colortime/colortime.py b/examples/colortime/colortime.py
@@ -1,16 +1,17 @@
+from __future__ import unicode_literals
 """ Example from docs:
 
->>> colortime_parser("~ ~ ~ go to the store ~ buy green at 11pm! ~ ~")
+>>> colortime_parser("~ ~ ~ go to the store ~ buy green at 11pm! ~ ~")  # doctest: +IGNORE_UNICODE
 [('green', datetime.time(23, 0))]
 
 In this case the processing functions weren't specified but you
 still get a useful result as a default.
->>> colortime_parser("~ ~ ~ Crazy 2pm green ~ ~")
+>>> colortime_parser("~ ~ ~ Crazy 2pm green ~ ~")  # doctest: +IGNORE_UNICODE
 [['green']]
 """
 # Example stuff -----------------------------------------------------
 # Have to add the parent directory just in case you
-# run this file in the demo directory without installing RE|PARSE
+# run this file in the demo directory without installing Reparse
 import sys
 sys.path.append('../..')
 
@@ -23,7 +24,7 @@
         path += "/"
 
 
-# RE|PARSE ----------------------------------------------------------
+# Reparse ----------------------------------------------------------
 from examples.colortime.functions import functions
 import reparse
 

diff --git a/examples/colortime/functions.py b/examples/colortime/functions.py
@@ -18,7 +18,7 @@ def color_time(Color=None, Time=None):
     return Color, Time
 
 # --------------- Function list ------------------
-# This is the dictionary that is used by the RE|PARSE
+# This is the dictionary that is used by the Reparse
 # expression builder. The key is the same value used in the patterns.yaml
 # file under ``Function: ``. The value is a reference to function.
 

diff --git a/examples/phone/functions.py b/examples/phone/functions.py
@@ -25,7 +25,7 @@ def fax_phone(p):
     return p._replace(fax=True)
 
 # --------------- Function list ------------------
-# This is the dictionary that is used by the RE|PARSE
+# This is the dictionary that is used by the Reparse
 # expression builder. The key is the same value used in the patterns.yaml
 # file under ``Function: ``. The value is a reference to function.
 

diff --git a/examples/phone/phone.py b/examples/phone/phone.py
@@ -1,12 +1,13 @@
+from __future__ import unicode_literals
 """ Example of a phone number parser
->>> phone_parser('+974-584-5656')
+>>> phone_parser('+974-584-5656')  # doctest: +IGNORE_UNICODE
 [phone(area_code='974', prefix='584', body='5656', fax=False)]
->>> phone_parser('Fax: +974-584-5656')
+>>> phone_parser('Fax: +974-584-5656')  # doctest: +IGNORE_UNICODE
 [phone(area_code='974', prefix='584', body='5656', fax=True)]
 """
 # Example stuff -----------------------------------------------------
 # Have to add the parent directory just in case you
-# run this file in the demo directory without installing RE|PARSE
+# run this file in the demo directory without installing Reparse
 import sys
 sys.path.append('../..')
 
@@ -19,7 +20,7 @@
         path += "/"
 
 
-# RE|PARSE ----------------------------------------------------------
+# Reparse ----------------------------------------------------------
 from examples.phone.functions import functions
 import reparse
 

diff --git a/examples/readme.rst b/examples/readme.rst
@@ -1,4 +1,4 @@
-These examples shows a very basic RE|PARSE setup to help you get started.
+These examples shows a very basic Reparse setup to help you get started.
 Under each directory there are files like this::
 
     expressions.yaml -- Contains the regular expression building blocks

diff --git a/nose.cfg b/nose.cfg
@@ -0,0 +1,3 @@
+[nosetests]
+with-doctest=1
+with-doctest-ignore-unicode=1
diff --git a/readme.rst b/readme.rst
@@ -1,5 +1,5 @@
-RE|PARSE
-========
+Reparse
+=======
 
 *Python library/tools for combining and parsing using Regular Expressions in a maintainable way*
 
@@ -28,7 +28,7 @@ So you want to get (color and time) or ``[('green', datetime.time(23, 0))]`` out
      blah blah blah go to the store to buy green at 11pm! blah blah
 
 If you need scan/search/parse/transform some unstructured input and get some semi-structured data
-out of it RE|PARSE might be able to help.
+out of it Reparse might be able to help.
 
 First structure some Regular Expressions (Here, in Yaml)
 --------------------------------------------------------
@@ -105,9 +105,9 @@ Result
 
 Cool!
 
-Intrigued? Learn more how to make the magic happen in `Howto: How to use RE|PARSE`_.
+Intrigued? Learn more how to make the magic happen in `Howto: How to use Reparse`_.
 
-Want to read more about what RE|PARSE is and what it can do? More info in `About: Why another tool for parsing?`_
+Want to read more about what Reparse is and what it can do? More info in `About: Why another tool for parsing?`_
 
 Info
 ====
@@ -127,7 +127,7 @@ manually
 ~~~~~~~~
 
 1. If you don't have them already,
-   RE|PARSE depends on REGEX_, and PyYaml_.
+   Reparse depends on REGEX_, and PyYaml_.
    Download those and ``python setup.py install`` in their directories.
    If you are on windows, you may have to find binary installers for these, since they
    contain modules that have to be compiled.
@@ -146,7 +146,7 @@ manually
 Support
 -------
 
-Need some help? Send me an email at asperous2@gmail.com and I'll do my best to help you.
+Need some help? Send me an email at theandychase@gmail.com and I'll do my best to help you.
 
 Contribution
 ------------
@@ -157,6 +157,7 @@ Send me suggestions, issues, and pull requests and I'll gladly review them!
 Versions
 --------
 
+- *3.0* InvalidPattern Exception, Allow monkey patching regex arguments. RE|PARSE -> Reparse.
 - *2.1* Change `yaml.load` to `yaml.safe_load` for security
 - *2.0* Major Refactor, Python 3, Better Parser builders
 - *1.1* Fix setup.py
@@ -176,7 +177,7 @@ MIT Licensed! See LICENSE file for the full text.
 
 .. _Docs at Readthedocs: https://reparse.readthedocs.org/en/latest/
 
-.. _`Howto: How to use RE|PARSE`: https://reparse.readthedocs.org/en/latest/howto.html
+.. _`Howto: How to use Reparse`: https://reparse.readthedocs.org/en/latest/howto.html
 
 .. _`About: Why another tool for parsing?`: https://reparse.readthedocs.org/en/latest/about.html
 

diff --git a/reparse/__init__.py b/reparse/__init__.py
@@ -1,4 +1,4 @@
-""" RE|PARSE
+""" Reparse
 """
 
 from reparse.parsers import *
diff --git a/reparse/builders.py b/reparse/builders.py
@@ -1,3 +1,4 @@
+from __future__ import unicode_literals
 from reparse.config import pattern_max_recursion_depth
 from reparse.expression import Group, AlternatesGroup, Expression
 from reparse.util import separate_string
@@ -55,7 +56,7 @@ def func(_):
             def func(_):
                 if any(_):
                     return _
-        func.__name__ = name
+        func.__name__ = str(name)
         return func
 
     def add_function(self, name, function):
@@ -76,13 +77,13 @@ class Expression_Builder(object):
     >>> function_builder.get_function = get_function
     >>> expression = {'greeting':{'greeting':{'Expression': '(hey)|(cool)', 'Groups' : ['greeting', 'cooly']}}}
     >>> eb = Expression_Builder(expression, function_builder)
-    >>> eb.get_type("greeting").findall("hey, cool!")
+    >>> eb.get_type("greeting").findall("hey, cool!")  # doctest: +IGNORE_UNICODE
     [[('hey',), ('',)], [('',), ('cool',)]]
     """
 
     def __init__(self, expressions_dict, function_builder):
         self.type_db = {}
-        
+
         for expression_type, expressions in expressions_dict.items():
             type_expressions = []
             for name, expression in expressions.items():
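
Note on the `func.__name__ = str(name)` change in this file: with `unicode_literals` enabled, names built from string literals are `unicode` objects on Python 2, and Python 2 only accepts a byte `str` for a function's `__name__`. Wrapping the name in `str()` is presumably what keeps the assignment working on both major versions. A minimal sketch of the pattern, where `make_named` is a hypothetical helper written for illustration and is not part of Reparse:

```python
def make_named(name):
    """Build a default group function and label it with ``name``.

    Hypothetical helper mirroring the pattern in the diff; not Reparse API.
    """
    def func(groups):
        # Return the groups only if at least one of them matched something.
        if any(groups):
            return groups

    # str() is a no-op on Python 3; on Python 2 it turns a unicode name
    # into the byte string that __name__ requires.
    func.__name__ = str(name)
    return func


greeting = make_named("greeting")
name_seen = greeting.__name__
empty_result = greeting(["", ""])
hit_result = greeting(["hey", ""])
```

Here `empty_result` is `None` because no group matched, while `hit_result` is the list passed in.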

diff --git a/reparse/config.py b/reparse/config.py
@@ -6,5 +6,9 @@
 
 # The regex engine and settings
 regex_flags = regex.VERBOSE | regex.IGNORECASE
-expression_compiler = lambda expression: regex.compile(expression, flags=regex_flags)
-expression_sub = lambda expression, sub, string: regex.sub(expression, sub, string, flags=regex_flags)
+
+def get_expression_compiler():
+    return lambda expression: regex.compile(expression, flags=regex_flags)
+
+def get_expression_sub(): 
+    return lambda expression, sub, string: regex.sub(expression, sub, string, flags=regex_flags)
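
The versions entry for 3.0 mentions allowing regex arguments to be monkey patched, and the two getters above appear to serve that goal: each call builds a fresh lambda, and the lambda reads `regex_flags` when it runs. A minimal sketch of that behavior, assuming the stdlib `re` module as a stand-in for the third-party `regex` package and a flat script in place of `reparse/config.py`:

```python
import re

# Stand-in for reparse/config.py. Assumption: stdlib ``re`` replaces the
# third-party ``regex`` package so the sketch has no dependencies.
regex_flags = re.VERBOSE | re.IGNORECASE


def get_expression_compiler():
    # The lambda looks up ``regex_flags`` each time it is called,
    # so later changes to the module-level value take effect.
    return lambda expression: re.compile(expression, flags=regex_flags)


# With the default flags, matching ignores case.
default_match = get_expression_compiler()("hey").match("HEY")

# "Monkey patch" the flags, as a caller might do on the config module.
regex_flags = re.VERBOSE
patched_match = get_expression_compiler()("hey").match("HEY")
```

After the flags are rebound without `IGNORECASE`, the same pattern no longer matches the upper-case input, so `patched_match` is `None` while `default_match` is a match object.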