Merge pull request #5 from andychase/develop
Reparse version 3
andychase committed Nov 22, 2015
2 parents 47ee095 + 31ec4e9 commit 6afb310
Showing 28 changed files with 271 additions and 146 deletions.
7 changes: 5 additions & 2 deletions .gitignore
@@ -1,6 +1,9 @@
*.pyc
.coverage
.idea/*
.tox/
TAGS
dist/*
doc/build/*
doc/html/*
MANIFEST
dist/*
reparse.egg-info/
6 changes: 6 additions & 0 deletions .landscape.yml
@@ -0,0 +1,6 @@
strictness: veryhigh
pep8:
full: true
doc-warnings: false
test-warnings: true
max-line-length: 80
7 changes: 4 additions & 3 deletions .travis.yml
@@ -2,11 +2,12 @@ language: python
sudo: false
python:
- 2.7
- 3.2
- 3.3
- 3.4
- 3.5
install:
- pip install --use-mirrors pyyaml
- pip install --use-mirrors -r requirements.txt
script: nosetests --with-doctest
- pip install -r requirements-dev.txt
script: nosetests
notifications:
email: false
32 changes: 16 additions & 16 deletions doc/source/about.rst
@@ -1,35 +1,35 @@
About: Why another tool for parsing?
====================================

RE|PARSE is simply a tool for combining regular expressions together
Reparse is simply a tool for combining regular expressions together
and using a regular expression engine to scan/search/parse/process input for certain tasks.

Larger parsing tools like YACC/Bison, ANTLR, and others are really
good for structured input like computer code or xml. They aren't specifically
designed for scanning and parsing semi-structured data from unstructured
text (like books, or internet documents, or diaries).

RE|PARSE is designed to work with exactly that kind of stuff, (and is completely
Reparse is designed to work with exactly that kind of stuff, (and is completely
useless for the kinds of tasks any of the above is often used for).

Parsing Spectrum
----------------

RE|PARSE isn't the first parser of it's kind. A hypothetical spectrum
Reparse isn't the first parser of it's kind. A hypothetical spectrum
of parsers from pattern-finding only
all the way to highly-featured, structured grammars might look something like this::

v- RE|PARSE v- YACC/Bison
v- Reparse v- YACC/Bison
UNSTRUCTURED |-------------------------| STRUCTURED
^- Regex ^- Parboiled/PyParsing

RE|PARSE is in fact very featureless. It's only a little better
Reparse is in fact very featureless. It's only a little better
than plain regular expressions. Still, you might find it ideal
for the kinds of tasks it was designed to deal with (like dates and addresses).


What kind of things might RE|PARSE be useful for parsing?
---------------------------------------------------------
What kind of things might Reparse be useful for parsing?
--------------------------------------------------------

Any kind of semi-structured formats:

@@ -41,38 +41,38 @@ Any kind of semi-structured formats:
- Addresses
- Phone numbers

Or in other words, anything you might consider parsing with Regex, might consider RE|PARSE,
Or in other words, anything you might consider parsing with Regex, might consider Reparse,
especially if you are considering combining multiple regular expressions together.

Why Regular Expressions
--------------------------------
-----------------------

PyParsing (Python) and Parboiled (JVM) also have use-cases very similar
to RE|PARSE, and they are much more feature-filled. They have their own (much more powerful)
to Reparse, and they are much more feature-filled. They have their own (much more powerful)
DSL for parsing text.

RE|PARSE uses Regular Expressions which has some advantages:
Reparse uses Regular Expressions which has some advantages:

- Short, minimal Syntax
- Universal (with some minor differences between different engines)
- Standard
- Moderately Easy-to-learn (Though this is highly subjective)
- Many programmers already know the basics
- Skills can be carried else where
- **Regular Expressions can be harvested elsewhere and used within RE|PARSE**
- **Regular Expressions can be harvested elsewhere and used within Reparse**
- Decent performance over large inputs
- Ability to use fuzzy matching regex engines


Limitations of RE|PARSE
-------------------------
Limitations of Reparse
----------------------

Regular Expressions have been known to catch input that was unexpected,
or miss input that was expected due to unforeseen edge cases.
RE|PARSE provides tools to help alleviate this by checking the expressions against expected matching
Reparse provides tools to help alleviate this by checking the expressions against expected matching
inputs, and against expected non-matching inputs.

This library is very limited in what it can parse, if you realize
you need something like a recursive grammar, you might want to try PyParsing or something greater
(though RE|PARSE might be helpful as a 'first step' matching and transforming the parse-able data before it is properly
(though Reparse might be helpful as a 'first step' matching and transforming the parse-able data before it is properly
parsed by a different library).
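The about page says Reparse can check each expression against expected matching and expected non-matching inputs. As a rough illustration of that idea only (not Reparse's own code or API), here is a sketch using the standard-library `re` module; the `check_expression` name, the list-based inputs, and the rule that an example must match in full are all assumptions.

```python
import re


def check_expression(pattern, matches, non_matches, flags=re.VERBOSE | re.IGNORECASE):
    """Hypothetical checker: return a list of problems (empty if all examples pass).

    Uses the stdlib ``re`` module; Reparse itself builds on the third-party
    ``regex`` package, and its real checks may work differently.
    """
    compiled = re.compile(pattern, flags)
    problems = []
    for text in matches:
        # Every expected match must be matched in full, not just partly
        if compiled.fullmatch(text) is None:
            problems.append("should match: %r" % text)
    for text in non_matches:
        # No expected non-match may be matched in full
        if compiled.fullmatch(text) is not None:
            problems.append("should not match: %r" % text)
    return problems


# A phone-number fragment like the one in the examples: prefix-body
print(check_expression(r"\d{3}-\d{4}", ["584-5656", "555-1234"], ["5845656", "58-45656"]))  # prints []
# A wrong example is reported instead of silently passing
print(check_expression(r"\d{3}-\d{4}", ["584 5656"], []))  # prints ["should match: '584 5656'"]
```

Keeping the examples next to each expression means a maintainer who edits a regex can rerun the check and see immediately which documented inputs changed behavior.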
2 changes: 1 addition & 1 deletion doc/source/best_practices.rst
@@ -13,7 +13,7 @@ they can have a long productive life without getting out of control:
- Never let a regex become too big to be easily understood. Split up big regex
into smaller expressions. (Sensible splits won't hurt them).
- Maintain a Matches and Non-Matches
- RE|PARSE can use this to test your Regex to make sure they are matching properly
- Reparse can use this to test your Regex to make sure they are matching properly
- It helps maintainers see which regular expressions match what quickly
- It helps show your intention with each expression, so that others can confidently improve or modify them
- Maintain a description which talks about what you are trying to match with each regex,
24 changes: 12 additions & 12 deletions doc/source/howto.rst
@@ -1,5 +1,5 @@
Howto: How to use RE|PARSE
==========================
Howto: How to use Reparse
=========================


You will need
@@ -10,15 +10,15 @@ You will need
#. Some example texts that you will want to parse and their solutions.
This will be useful to check your parser and will help you put together the expressions and patterns.

1. Setup Python & RE|PARSE
--------------------------
1. Setup Python & Reparse
-------------------------

See :ref:`installation-howto` for instructions on how to install RE|PARSE
See :ref:`installation-howto` for instructions on how to install Reparse

2. Layout of an example RE|PARSE parser
-------------------------------------
2. Layout of an example Reparse parser
--------------------------------------

RE|PARSE needs 3 things in its operation:
Reparse needs 3 things in its operation:

1. Functions: A dictionary with String Key -> Function Value mapping.

@@ -113,7 +113,7 @@ in expressions and merely *combined* in patterns.
Order: 2
# I could have used <Basic Phone> instead to use a pattern inside a pattern but it wouldn't have made a difference really (just an extra function call).
The order field tells RE|PARSE which pattern to pick if multiple patterns match.
The order field tells Reparse which pattern to pick if multiple patterns match.
Generally speaking, the more specific patterns should be ordered higher than the lower ones
(you wouldn't want someone to try and call a fax machine!).

@@ -129,9 +129,9 @@ Done this way, I could have had 3 different formats for Area Code and the patter
on any of them. I didn't here because that'd be overkill for phone numbers.

5. Writing your functions.py file
----------------------------------
---------------------------------

RE|PARSE matches text and also does some parsing using functions.
Reparse matches text and also does some parsing using functions.

The order in which the functions are run and results passed are as follows:

@@ -179,7 +179,7 @@ I used namedtuples here, but you can parse your output anyway you want to.
6. Combining it all together!
-----------------------------

The builder.py module contains some functions to build a RE|PARSE system together.
The builder.py module contains some functions to build a Reparse system together.
Here's how I'd put together my phone number parser:

.. code-block:: python
2 changes: 1 addition & 1 deletion doc/source/modules.rst
@@ -1,4 +1,4 @@
Here lies the embedded docblock documentation for the various parts of RE|PARSE.
Here lies the embedded docblock documentation for the various parts of Reparse.

expression
=========
9 changes: 5 additions & 4 deletions examples/colortime/colortime.py
@@ -1,16 +1,17 @@
from __future__ import unicode_literals
""" Example from docs:
>>> colortime_parser("~ ~ ~ go to the store ~ buy green at 11pm! ~ ~")
>>> colortime_parser("~ ~ ~ go to the store ~ buy green at 11pm! ~ ~") # doctest: +IGNORE_UNICODE
[('green', datetime.time(23, 0))]
In this case the processing functions weren't specified but you
still get a useful result as a default.
>>> colortime_parser("~ ~ ~ Crazy 2pm green ~ ~")
>>> colortime_parser("~ ~ ~ Crazy 2pm green ~ ~") # doctest: +IGNORE_UNICODE
[['green']]
"""
# Example stuff -----------------------------------------------------
# Have to add the parent directory just in case you
# run this file in the demo directory without installing RE|PARSE
# run this file in the demo directory without installing Reparse
import sys
sys.path.append('../..')

@@ -23,7 +24,7 @@
path += "/"


# RE|PARSE ----------------------------------------------------------
# Reparse ----------------------------------------------------------
from examples.colortime.functions import functions
import reparse

2 changes: 1 addition & 1 deletion examples/colortime/functions.py
@@ -18,7 +18,7 @@ def color_time(Color=None, Time=None):
return Color, Time

# --------------- Function list ------------------
# This is the dictionary that is used by the RE|PARSE
# This is the dictionary that is used by the Reparse
# expression builder. The key is the same value used in the patterns.yaml
# file under ``Function: ``. The value is a reference to function.

2 changes: 1 addition & 1 deletion examples/phone/functions.py
@@ -25,7 +25,7 @@ def fax_phone(p):
return p._replace(fax=True)

# --------------- Function list ------------------
# This is the dictionary that is used by the RE|PARSE
# This is the dictionary that is used by the Reparse
# expression builder. The key is the same value used in the patterns.yaml
# file under ``Function: ``. The value is a reference to function.

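The comment above describes the function dictionary in the example's functions.py. A minimal sketch of what such a file might contain, assuming the `phone` namedtuple fields shown in the phone.py doctests; `basic_phone`, its parameter names, and both dictionary keys are hypothetical, and only `fax_phone` mirrors the line visible in this diff.

```python
from collections import namedtuple

# Field names taken from the doctest output in examples/phone/phone.py
phone = namedtuple("phone", ["area_code", "prefix", "body", "fax"])


def basic_phone(AreaCode=None, Prefix=None, Body=None):
    # Hypothetical: build a phone tuple from the captured group values
    return phone(AreaCode, Prefix, Body, fax=False)


def fax_phone(p):
    # As in examples/phone/functions.py: mark an already-parsed number as a fax
    return p._replace(fax=True)


# Hypothetical keys; in the real example they must equal the ``Function:``
# values written in patterns.yaml.
functions = {
    "Basic Phone": basic_phone,
    "Fax Phone": fax_phone,
}

print(functions["Fax Phone"](functions["Basic Phone"]("974", "584", "5656")))
# prints phone(area_code='974', prefix='584', body='5656', fax=True)
```

Because `_replace` returns a new tuple, a more specific function such as `fax_phone` can post-process the result of a simpler one without mutating it.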
9 changes: 5 additions & 4 deletions examples/phone/phone.py
@@ -1,12 +1,13 @@
from __future__ import unicode_literals
""" Example of a phone number parser
>>> phone_parser('+974-584-5656')
>>> phone_parser('+974-584-5656') # doctest: +IGNORE_UNICODE
[phone(area_code='974', prefix='584', body='5656', fax=False)]
>>> phone_parser('Fax: +974-584-5656')
>>> phone_parser('Fax: +974-584-5656') # doctest: +IGNORE_UNICODE
[phone(area_code='974', prefix='584', body='5656', fax=True)]
"""
# Example stuff -----------------------------------------------------
# Have to add the parent directory just in case you
# run this file in the demo directory without installing RE|PARSE
# run this file in the demo directory without installing Reparse
import sys
sys.path.append('../..')

@@ -19,7 +20,7 @@
path += "/"


# RE|PARSE ----------------------------------------------------------
# Reparse ----------------------------------------------------------
from examples.phone.functions import functions
import reparse

2 changes: 1 addition & 1 deletion examples/readme.rst
@@ -1,4 +1,4 @@
These examples shows a very basic RE|PARSE setup to help you get started.
These examples shows a very basic Reparse setup to help you get started.
Under each directory there are files like this::

expressions.yaml -- Contains the regular expression building blocks
3 changes: 3 additions & 0 deletions nose.cfg
@@ -0,0 +1,3 @@
[nosetests]
with-doctest=1
with-doctest-ignore-unicode=1
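Several doctests in this commit gain `# doctest: +IGNORE_UNICODE`, and nose.cfg enables a nose plugin for that flag through `with-doctest-ignore-unicode`. The reason is that, with `unicode_literals`, Python 2 prints text as `u'green'` while Python 3 prints `'green'`, so a single expected output cannot match both. The sketch below shows how such a flag can be built with the standard `doctest` module; it illustrates the mechanism and is not the plugin's source, and `IgnoreUnicodeChecker` and the prefix regex are assumptions.

```python
import doctest
import re

# Register the option name so "# doctest: +IGNORE_UNICODE" is accepted
IGNORE_UNICODE = doctest.register_optionflag("IGNORE_UNICODE")

# A u/U prefix that starts a new word and sits right before a quote
_UNICODE_PREFIX = re.compile(r"\b[uU](?=['\"])")


class IgnoreUnicodeChecker(doctest.OutputChecker):
    """Hypothetical checker: drop u'' prefixes before comparing outputs."""

    def check_output(self, want, got, optionflags):
        if optionflags & IGNORE_UNICODE:
            want = _UNICODE_PREFIX.sub("", want)
            got = _UNICODE_PREFIX.sub("", got)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


checker = IgnoreUnicodeChecker()
print(checker.check_output("[u'green']\n", "['green']\n", IGNORE_UNICODE))  # prints True
print(checker.check_output("[u'green']\n", "['green']\n", 0))  # prints False
```

To use it outside nose, pass it to a runner with `doctest.DocTestRunner(checker=IgnoreUnicodeChecker())`.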
17 changes: 9 additions & 8 deletions readme.rst
@@ -1,5 +1,5 @@
RE|PARSE
========
Reparse
=======

*Python library/tools for combining and parsing using Regular Expressions in a maintainable way*

@@ -28,7 +28,7 @@ So you want to get (color and time) or ``[('green', datetime.time(23, 0))]`` out
blah blah blah go to the store to buy green at 11pm! blah blah

If you need scan/search/parse/transform some unstructured input and get some semi-structured data
out of it RE|PARSE might be able to help.
out of it Reparse might be able to help.

First structure some Regular Expressions (Here, in Yaml)
--------------------------------------------------------
@@ -105,9 +105,9 @@ Result
Cool!

Intrigued? Learn more how to make the magic happen in `Howto: How to use RE|PARSE`_.
Intrigued? Learn more how to make the magic happen in `Howto: How to use Reparse`_.

Want to read more about what RE|PARSE is and what it can do? More info in `About: Why another tool for parsing?`_
Want to read more about what Reparse is and what it can do? More info in `About: Why another tool for parsing?`_

Info
====
@@ -127,7 +127,7 @@ manually
~~~~~~~~

1. If you don't have them already,
RE|PARSE depends on REGEX_, and PyYaml_.
Reparse depends on REGEX_, and PyYaml_.
Download those and ``python setup.py install`` in their directories.
If you are on windows, you may have to find binary installers for these, since they
contain modules that have to be compiled.
@@ -146,7 +146,7 @@ manually
Support
-------

Need some help? Send me an email at asperous2@gmail.com and I'll do my best to help you.
Need some help? Send me an email at theandychase@gmail.com and I'll do my best to help you.

Contribution
------------
@@ -157,6 +157,7 @@ Send me suggestions, issues, and pull requests and I'll gladly review them!
Versions
--------

- *3.0* InvalidPattern Exception, Allow monkey patching regex arguments. RE|PARSE -> Reparse.
- *2.1* Change `yaml.load` to `yaml.safe_load` for security
- *2.0* Major Refactor, Python 3, Better Parser builders
- *1.1* Fix setup.py
@@ -176,7 +177,7 @@ MIT Licensed! See LICENSE file for the full text.

.. _Docs at Readthedocs: https://reparse.readthedocs.org/en/latest/

.. _`Howto: How to use RE|PARSE`: https://reparse.readthedocs.org/en/latest/howto.html
.. _`Howto: How to use Reparse`: https://reparse.readthedocs.org/en/latest/howto.html

.. _`About: Why another tool for parsing?`: https://reparse.readthedocs.org/en/latest/about.html

2 changes: 1 addition & 1 deletion reparse/__init__.py
@@ -1,4 +1,4 @@
""" RE|PARSE
""" Reparse
"""

from reparse.parsers import *
7 changes: 4 additions & 3 deletions reparse/builders.py
@@ -1,3 +1,4 @@
from __future__ import unicode_literals
from reparse.config import pattern_max_recursion_depth
from reparse.expression import Group, AlternatesGroup, Expression
from reparse.util import separate_string
@@ -55,7 +56,7 @@ def func(_):
def func(_):
if any(_):
return _
func.__name__ = name
func.__name__ = str(name)
return func

def add_function(self, name, function):
@@ -76,13 +77,13 @@ class Expression_Builder(object):
>>> function_builder.get_function = get_function
>>> expression = {'greeting':{'greeting':{'Expression': '(hey)|(cool)', 'Groups' : ['greeting', 'cooly']}}}
>>> eb = Expression_Builder(expression, function_builder)
>>> eb.get_type("greeting").findall("hey, cool!")
>>> eb.get_type("greeting").findall("hey, cool!") # doctest: +IGNORE_UNICODE
[[('hey',), ('',)], [('',), ('cool',)]]
"""

def __init__(self, expressions_dict, function_builder):
self.type_db = {}

for expression_type, expressions in expressions_dict.items():
type_expressions = []
for name, expression in expressions.items():
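The one-line change in builders.py (`func.__name__ = str(name)`) pairs with the new `unicode_literals` import: on Python 2, `name` is then presumably a `unicode` object, and a function's `__name__` can only be set to a native `str`. A runnable sketch of the default function shown in that hunk; the enclosing helper's name, `make_default_function`, is hypothetical because the diff does not show it.

```python
def make_default_function(name):
    """Hypothetical wrapper around the default function shown in reparse/builders.py.

    The returned function passes its captured groups through unchanged when any
    of them matched, and implicitly returns None otherwise.
    """
    def func(_):
        if any(_):
            return _
    # Python 2 only accepts a native str for __name__ (a unicode object raises
    # TypeError); str() is harmless on Python 3, where name is already a str.
    func.__name__ = str(name)
    return func


default = make_default_function("Color")
print(default.__name__)     # prints Color
print(default(["green"]))   # prints ['green']
print(default(["", None]))  # prints None
```

This pass-through default is what lets a pattern without a processing function still produce a usable result, as the colortime doctest notes.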
8 changes: 6 additions & 2 deletions reparse/config.py
@@ -6,5 +6,9 @@

# The regex engine and settings
regex_flags = regex.VERBOSE | regex.IGNORECASE
expression_compiler = lambda expression: regex.compile(expression, flags=regex_flags)
expression_sub = lambda expression, sub, string: regex.sub(expression, sub, string, flags=regex_flags)

def get_expression_compiler():
return lambda expression: regex.compile(expression, flags=regex_flags)

def get_expression_sub():
return lambda expression, sub, string: regex.sub(expression, sub, string, flags=regex_flags)
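Version 3.0 is described as allowing monkey patching of regex arguments, and config.py now hands out its compiler and substitution helpers through `get_expression_compiler()` and `get_expression_sub()`. The sketch below demonstrates only one property of that code: the returned lambda reads the module-level `regex_flags` each time it runs, so patching the flags after import affects later compilations. It uses a `SimpleNamespace` as a stand-in for the `reparse.config` module and the standard-library `re` in place of the third-party `regex` package; how Reparse's own modules call the getters is not shown in this diff.

```python
import re
import types

# Stand-in for reparse/config.py (assumption: stdlib ``re`` replaces ``regex``)
config = types.SimpleNamespace(regex_flags=re.VERBOSE | re.IGNORECASE)


def get_expression_compiler():
    # config.regex_flags is looked up each time the lambda runs, not when it is made
    return lambda expression: re.compile(expression, flags=config.regex_flags)


compile_expression = get_expression_compiler()
print(bool(compile_expression("green").search("GREEN")))  # prints True

# "Monkey patch" the flags, e.g. to make matching case-sensitive
config.regex_flags = re.VERBOSE
print(bool(compile_expression("green").search("GREEN")))  # prints False
```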