Update README.md
aborruso authored Nov 29, 2018
1 parent 85b2aed commit ca91998
Showing 1 changed file with 47 additions and 1 deletion.
48 changes: 47 additions & 1 deletion README.md
@@ -1 +1,47 @@
# scrape
# scrape cli

It's the command-line version of the [great scraping tool](https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/tools/scrape) written by [Jeroen Janssens](http://jeroenjanssens.com).

It extracts HTML elements using an XPath query or CSS3 selector.

Example usage:

```
$ curl -L 'http://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -be 'table.wikitable > tbody > tr > td > b > a'
```

It returns the matching elements, wrapped in `<html>` and `<body>` tags:

```html
<html>
<head>
</head>
<body>
<a href="/wiki/Afghanistan" title="Afghanistan">
Afghanistan
</a>
<a href="/wiki/Albania" title="Albania">
Albania
</a>
<a href="/wiki/Algeria" title="Algeria">
Algeria
</a>
<a href="/wiki/Andorra" title="Andorra">
Andorra
</a>
<a href="/wiki/Angola" title="Angola">
Angola
</a>
<a href="/wiki/Antigua_and_Barbuda" title="Antigua and Barbuda">
Antigua and Barbuda
</a>
<a href="/wiki/Argentina" title="Argentina">
Argentina
</a>
<a href="/wiki/Armenia" title="Armenia">
Armenia
</a>
</body>
</html>
```
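
To make the selector in the example easier to read, here is a minimal Python sketch of the same kind of extraction. It is not the tool's actual implementation: it uses only the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset and requires well-formed markup. The `SAMPLE` document and the `extract_links` helper are hypothetical names introduced for illustration, and the XPath matches the `class` attribute exactly, which is stricter than the CSS `.wikitable` selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the Wikipedia page (an assumption,
# not real page content beyond the two links shown in the output above).
SAMPLE = """<html><body>
<table class="wikitable"><tbody>
<tr><td><b><a href="/wiki/Afghanistan" title="Afghanistan">Afghanistan</a></b></td></tr>
<tr><td><b><a href="/wiki/Albania" title="Albania">Albania</a></b></td></tr>
<tr><td><a href="/wiki/Other" title="Other">Other</a></td></tr>
</tbody></table>
</body></html>"""


def extract_links(markup):
    """Return (href, text) pairs for links matching the example selector."""
    root = ET.fromstring(markup)
    # Rough XPath counterpart of the CSS selector
    # 'table.wikitable > tbody > tr > td > b > a'.
    # Each '/' step mirrors a '>' (direct child) combinator.
    path = ".//table[@class='wikitable']/tbody/tr/td/b/a"
    return [(a.get("href"), a.text) for a in root.findall(path)]


links = extract_links(SAMPLE)
for href, text in links:
    print(href, text)
```

The third row of the sample is skipped because its link is not inside a `<b>` element, which is how the selector in the example narrows the match to the bolded country names.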
