PyRefine


OpenRefine is a great tool for exploring and cleaning datasets before analysing them. It also records an undo history of every action, which you can export as a script in JSON format. However, to run that script on a new dataset you have to either import it manually through the graphical interface or set up a BatchRefine server, and neither is quick.

PyRefine lets you execute OpenRefine JSON scripts against datasets without starting a full Java/OpenRefine server. It provides a command-line tool for quick use, and it can also be used as a library inside a pandas-based data analysis pipeline.

More details in this blog post.

Please note: PyRefine is still very much alpha-quality, so it may not behave exactly as you expect yet. That said, please try it out, and consider :doc:`contributing`!

Features

  • Execute OpenRefine JSON against a dataset from the command line
  • Execute OpenRefine JSON from a Python script
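
To give a feel for what executing OpenRefine JSON outside of OpenRefine involves, here is a minimal sketch of the general idea. It is not PyRefine's API or implementation: the function names (``apply_history``, ``rename_column``, ``remove_column``) are hypothetical, the operation fields (``op``, ``oldColumnName``, ``newColumnName``, ``columnName``) are assumed to match OpenRefine's exported history, and plain lists of dictionaries stand in for a pandas DataFrame to keep the example dependency-free.

```python
import json


# Hypothetical handlers for two OpenRefine operation types.
# Each takes the current rows plus one operation dict and returns new rows.
def rename_column(rows, op):
    old, new = op["oldColumnName"], op["newColumnName"]
    return [
        {(new if key == old else key): value for key, value in row.items()}
        for row in rows
    ]


def remove_column(rows, op):
    name = op["columnName"]
    return [{k: v for k, v in row.items() if k != name} for row in rows]


# Dispatch table keyed by the "op" field of each history entry (assumed names).
HANDLERS = {
    "core/column-rename": rename_column,
    "core/column-removal": remove_column,
}


def apply_history(rows, history_json):
    """Replay an exported operation history, in order, over a list of row dicts."""
    for op in json.loads(history_json):
        handler = HANDLERS.get(op["op"])
        if handler is None:
            # Fail loudly instead of silently skipping an unknown step.
            raise NotImplementedError(f"unsupported operation: {op['op']}")
        rows = handler(rows, op)
    return rows


# Illustrative data only: a tiny history and dataset made up for this sketch.
history = json.dumps([
    {"op": "core/column-rename", "oldColumnName": "nmae", "newColumnName": "name"},
    {"op": "core/column-removal", "columnName": "tmp"},
])
rows = [{"nmae": "Ada", "tmp": "x"}, {"nmae": "Grace", "tmp": "y"}]

cleaned = apply_history(rows, history)
print(cleaned)  # [{'name': 'Ada'}, {'name': 'Grace'}]
```

Raising on an unrecognised operation is a deliberate choice in this sketch: a cleaning script that silently skips a step would produce data that looks finished but is not.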

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.