Data Science Intro and Brainstorm
Our DS strategy is still in the brainstorming phase, so we are open to new ideas and leadership. We have been collecting, curating, and publishing our data for a couple of years, and we would love to put that data to work for our mission. Here are a few of our projects:
http://heyduwamish.org/
http://gtvote.heyduwamish.org/
http://watershed.heyduwamish.org/
http://snoqualmie.heyduwamish.org/
http://heywillamette.org/
Feel free to explore those sites to get a general feel for what we are doing and the kind of data we are collecting and publishing. If you have any questions or feedback, or want to see some more examples, just let us know!
--
Our data falls into two categories: person-generated and machine-generated. In the spirit of FOSS, all of our data is publicly accessible via our APIs (except for private info like emails and account info). The project started as a way to address needs around the Duwamish River watershed in Seattle, so that site has the most data and the best examples. But we are definitely branching out to other watersheds!
Here is a brief description of our data, along with some early ideas for applying DS to it:
- Person-generated data:
This data is submitted to us by residents, stakeholders, and community groups. Here is an example of community reports around the Duwamish watershed: https://api.heyduwamish.org/api/v2/smartercleanup/datasets/duwamish/places?format=json
And here are reports administered by community organizations (these come from our dev API because this is an experimental feature): https://dev-api.heyduwamish.org/api/v2/smartercleanup/datasets/restoration/places?format=json https://dev-api.heyduwamish.org/api/v2/smartercleanup/datasets/vision/places?format=json
We have a lot more datasets like this, which you can see here:
https://api.heyduwamish.org/api/v2/smartercleanup/datasets
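If you want to poke at one of these datasets from Python, here is a minimal sketch using only the standard library. It assumes (we have not spelled out the schema here) that a places endpoint returns a GeoJSON-style FeatureCollection, where each feature has a "properties" object and an optional Point "geometry". The helper names fetch_json, extract_places, and count_by are made up for this example, and the sketch reads a single response only, so it does not follow any pagination.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Community reports endpoint quoted above.
PLACES_URL = ("https://api.heyduwamish.org/api/v2/smartercleanup/"
              "datasets/duwamish/places?format=json")


def fetch_json(url, timeout=30):
    """Download a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_places(payload):
    """Return one flat dict per place.

    ASSUMPTION: payload looks like a GeoJSON FeatureCollection, i.e.
    {"features": [{"geometry": {...}, "properties": {...}}, ...]}.
    """
    places = []
    for feature in payload.get("features", []):
        row = dict(feature.get("properties") or {})
        geometry = feature.get("geometry") or {}
        coords = geometry.get("coordinates")
        # Only Point geometries are flattened into lon/lat columns.
        if geometry.get("type") == "Point" and coords and len(coords) >= 2:
            row["lon"], row["lat"] = coords[0], coords[1]
        places.append(row)
    return places


def count_by(places, key):
    """Count places by the value of one property (missing -> 'unknown')."""
    return Counter(str(place.get(key, "unknown")) for place in places)
```

Typical use would be places = extract_places(fetch_json(PLACES_URL)) followed by count_by(places, "location_type"), where "location_type" is a guess at a property name; print one place first to see which keys the dataset really has.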
DS Applications
Again, this is in the brainstorming phase, and I'm no expert, so any help or new ideas around this are very welcome! I've been thinking about building an NLP model for our person-generated data, specifically one that can rate which submissions/articles are most helpful. For example, if we can detect biased language, then we can either show a score next to a post, or adjust the emphasis of a post based on that score.
Here is a quick summary of the bias-detection resources we've been looking at:
Deep dive on the subject: https://web.stanford.edu/~jurafsky/pubs/neutrality.pdf
Shallow dive: https://en.wikipedia.org/wiki/Grammatical_mood
Off-the-shelf library for detecting verb mood (it doesn't require training): http://www.clips.ua.ac.be/pages/pattern-en#modality
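To make the "score next to a post" idea concrete, here is a toy sketch. It is not the model from the Stanford paper and it is not Pattern's modality function; it simply counts how many words in a post appear in two tiny hand-written word lists. The word lists, bias_score, and rank_posts are all invented for illustration, and a real version would need a vetted lexicon or a trained model.

```python
import re

# ASSUMPTION: tiny illustrative word lists, not a validated lexicon.
HEDGES = {"may", "might", "could", "perhaps", "possibly", "seems",
          "apparently"}
LOADED = {"very", "extremely", "totally", "obviously", "clearly",
          "terrible", "disgusting", "worst"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bias_score(text):
    """Fraction of tokens that are hedges or loaded words (0.0 to 1.0)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    flagged = sum(1 for token in tokens if token in HEDGES or token in LOADED)
    return flagged / len(tokens)


def rank_posts(posts):
    """Sort posts from lowest score (most neutral wording) to highest."""
    return sorted(posts, key=bias_score)
```

A site could show bias_score(post) beside each report, or use rank_posts to decide display order. Whether a word-list ratio like this tracks what readers actually find helpful is an open question we would need to test.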
- Machine-generated data:
We are using CartoDB to serve most of this data, which you can explore here:
https://smartercleanup.carto.com/maps?page=1
If you have an interest in sensors, then this might be more relevant. For example, you can look at our Riparian Sun Map, collected from King County here in Washington:
https://smartercleanup.carto.com/viz/c5a3fa12-f769-11e5-b95a-0ea31932ec1d/public_map
That dataset is also on our watershed.heyduwamish.org map, linked above. It's of special interest because increasing temperatures are a major contributor to the recent uptick in salmon pre-spawn deaths. We might even be able to compare this machine-generated dataset to salmon pre-spawn mortality surveys, which are often person-generated. Here's an example of a salmon pre-spawn mortality report: http://heyduwamish.org/report/329 There are definitely many more of these surveys that are not on our site, but we should be able to get access to them.
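One simple way to start that comparison is a Pearson correlation between a sun-exposure value and a mortality rate measured at the same locations. The sketch below assumes the two datasets have already been matched up site by site (for example, per stream reach), which is the hard part and is not shown. The function name pearson is just for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    ASSUMPTION: xs[i] and ys[i] describe the same site, e.g. sun exposure
    and pre-spawn mortality rate for one stream reach.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 or -1 would suggest the two measurements move together, but with a handful of survey sites it would only be a hint worth investigating, not evidence of cause.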
Another idea is to work with groups like Public Lab (https://publiclab.org/) to empower communities to build their own environmental monitoring tools. But I don't believe any of our sites have done this yet.
New ideas are welcome!
If you have any ideas or suggestions, I would love to hear them! We are a small group trying to do a lot with very little, but as long as your ideas match up with our mission, I think there is room for collaboration :)
Here is our mission statement:
To optimize community knowledge and resources to address environmental health issues over the next century.
And how: Build a free and open source community mapping platform to bring together residents and stakeholders working towards healthier watersheds.