Datasets #1

Open
Nowosad opened this issue May 16, 2020 · 8 comments

Comments

@Nowosad
Member

Nowosad commented May 16, 2020

At least three different scales:

  • Global/continental
  • Regional/country level
  • Local

Each level should have a complete set of possible spatial object types with interesting attributes:

  • Points
  • Lines
  • Polygons
  • Categorical raster(s)
  • Continuous raster(s)

At least one of the scales should also have some temporal variables to showcase tmap's animation capabilities.

@mtennekes
Member

Yes. We have to reduce the number of datasets in a smart way, since 3x5=15 is too much in my opinion.

I think we should aim for 3 topics/applications, one for each scale. Each topic is then covered with as few datasets as possible (i.e. just enough to cover our needs).

Global

Regional / country
Have to look for suitable data. The only option I currently have is Dutch commuting data. It contains numbers of commuters between municipalities (400 in total), by mode of transport.
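
As a rough illustration of what such origin-destination (OD) data could look like, here is a minimal base R sketch. The municipality pairs, modes, counts, and column names are all invented for the example and are not the structure of the actual Dutch dataset.

```r
# Hypothetical OD table in long format: one row per origin, destination, and mode.
# All values are invented for illustration.
od <- data.frame(
  origin      = c("Utrecht", "Utrecht", "Amersfoort", "Amersfoort", "Zeist"),
  destination = c("Amersfoort", "Amersfoort", "Utrecht", "Utrecht", "Utrecht"),
  mode        = c("car", "train", "car", "bicycle", "bicycle"),
  commuters   = c(1200, 900, 1500, 100, 800),
  stringsAsFactors = FALSE
)

# Total commuters per origin-destination pair, summed over modes
od_total <- aggregate(commuters ~ origin + destination, data = od, FUN = sum)

# Share of each mode within its origin-destination pair
od$share <- od$commuters / ave(od$commuters, od$origin, od$destination, FUN = sum)
```

A table like `od_total` would be the input for flow lines between municipalities, while `od$share` could drive a per-mode comparison.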

Local
We can analyze a satellite image of air pollution, and use OSM vector data as reference. For instance, plot main (rail)roads and important buildings like schools. Satellite images from different moments in time would also be awesome (e.g. pre, during, and post COVID).

Although it is not the focus of the book, I think it's nice to have three different hot topics, like e.g. health (global), transport (country), and climate (local).

@Nowosad
Member Author

Nowosad commented May 24, 2020

@mtennekes 15 datasets sound like a lot, but I tried to count (in memory) datasets used in geocompr, and there we used more than 20 datasets in the first eight chapters. However, I also think that adding datasets and modifying them (e.g. adding/removing variables, changing projections, etc.) is an incremental process. We will see what is missing while writing the book and then we can add it. We just need a starting point for now.

I like the idea of three different topics a lot. It is great!

A few remarks:

  1. Feel free to start downloading the data (especially the ones on global and regional levels)
  2. Do you have any suggestions for the location of local data?
  3. For the local level, we can also add some categorical rasters (land cover/land use).

@zross what do you think?

@zross

zross commented May 25, 2020

A couple of thoughts:

  1. In my experience, coming up with an "analysis" to do makes things a bit more interesting and real world. Simply putting bubble points on a global map, I don't think, will be as compelling.

  2. I think starting with a topic would be the way I would prefer to do it, but practically speaking, I think we may need to pick at least one dataset by location -- picking a location with pretty much any kind of dataset we can envision. This way, if we decide we need to include a land use layer, a tree layer, a hospital layer -- whatever -- we can be confident that data would be available. NYC, London etc.

  3. I wonder if we could come up with an unexpected place/topic. Like if we did something with Africa, instead of looking at climate or poverty or something like that we pick UNESCO heritage sites or beautiful parks or first archaeological find. I don't know. For the workshop I did at the RStudio conference, partly, I used data on burrito restaurants in San Francisco from {yelpr}. Road density near the restaurants, number of restaurants per neighborhood. That kind of thing and people enjoyed that.

Most of my own work and experience is with the US and we absolutely need to pick a less covered area also but in terms of what I know:

  1. My own expertise is air quality. I could easily come up with air quality-related data for any place, any resolution. I'm currently working on a project on the global burden of air quality and have tons of useful global data from the Institute for Health Metrics and Evaluation.

  2. My own expertise is also NYC. As you might guess, NYC has a ton of amazing and interesting data. For my Datacamp course I used a census of trees which is a nice dataset.

  3. My wife works at a famous bird laboratory and they have amazing data. This person is someone I know at that lab and he could probably help us get some interesting data for anywhere in the world.

@Nowosad
Member Author

Nowosad commented May 25, 2020

Hi @zross, great points.
How about we split the work here?

  • @mtennekes could prepare regional level data - we have already discussed having Dutch commuting data (and I think it is a very good idea) + it could be a good example for interactive maps
  • @zross could think about some local data (as a city dweller, I would welcome some air quality visualizations). We could have some spatiotemporal examples (facets), animations, and we could also present the points with tiles in the background (static maps) and some rasters (elevation, land use)
  • I could work on global data preparation. It could be used to present different projections plus it could be good as an application of using the shiny package

What do you think about that?

@mtennekes
Member

Agree with both of you.

I imagine that the bird datasets that Zev mentioned will be very interesting. And also something completely different (for most people at least). And it is still relevant (I mean the burrito dataset would be fun for sure, but I like topics that have impact).

I will prepare the Dutch commuting data. Not sure if it will work though, since it needs a lot of data processing to turn data into a useful map. For this purpose, I've started a new (small) package to handle this kind of OD data. Maybe I can use an already processed version of the data.

Air quality data is good to have. @zross I don't have a preference for a location for local scale: NYC is fine with me!

@Nowosad
Member Author

Nowosad commented Jun 2, 2020

Hi @mtennekes and @zross,

I have started working on preparing global data using world borders from NaturalEarth and additional attributes from Gapminder. You can see it at https://github.com/r-tmap/tmap-data.

Please take a look at the code at
https://github.com/r-tmap/tmap-data/blob/master/R/01-prepare-world.R.

My comments and questions:

  1. I have slightly modified the NaturalEarth data to be more consistent with Gapminder.
    Let me know what you think about it.
  2. I added several attributes, including (a) World Bank regions, (b) World Bank income groups, (c) Total population, (d) CO2 emissions, (e) GDP per capita, (f) Life expectancy, (g) Corruption Perception Index, (h) Democracy score, (i) HDI, (j) Energy use, and (k) Literacy rate. What do you think about this list? Should I add or remove something?
  3. We could also create some spatiotemporal variables (one of the above attributes for a few years) to present some tmap capabilities, such as making animations.
  4. What should be the map projection used for the global dataset?
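
To make the name harmonization in point 1 concrete, here is a minimal base R sketch of joining border names to an attribute table. The tables, the `recode` lookup, and all values are hypothetical and are not taken from `01-prepare-world.R`; with real sf objects, a join such as `dplyr::left_join()` would play the same role.

```r
# Hypothetical attribute tables; names and values are invented for illustration.
borders <- data.frame(
  name = c("Netherlands", "Poland", "United States of America"),
  stringsAsFactors = FALSE
)
gap <- data.frame(
  country  = c("Netherlands", "Poland", "United States"),
  life_exp = c(80, 78, 79),
  stringsAsFactors = FALSE
)

# Lookup of border names that differ from the attribute table (assumed mapping)
recode <- c("United States of America" = "United States")
borders$join_name <- ifelse(borders$name %in% names(recode),
                            recode[borders$name], borders$name)

# Left join keeps every border row, even when no attribute matches
world <- merge(borders, gap, by.x = "join_name", by.y = "country", all.x = TRUE)

# Rows without a match point to names that still need harmonizing
unmatched <- world$name[is.na(world$life_exp)]
```

Checking `unmatched` after each change to the lookup is a simple way to see which countries still fall through the join.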

Overall, I also think that we can (and will) modify and improve datasets while writing the book, but it will be nice to have an agreed alpha version.

Best,
J.

@mtennekes
Member

Great work!

  1. Do you mean the assignment of subcountries, #Puerto Rico -> USA etc.? Good idea. We can fine-tune it later.
  2. I find two of the added variables very interesting: corruption and democracy (see also below). Generally speaking, the other variables don't add much that is new in comparison to tmap::World and spdata::world. And for energy use and CO2 emissions, I wouldn't use country borders, but a more detailed spatial resolution that also shows metropolitan areas.
  3. Yes, that would be awesome. If I have time, I can also take a look.
  4. Good question. A projection that has the equal-area property is almost a must, especially for choropleths. I looked around, and the relatively new "Equal Earth" projection seems the way to go. However, I got some warnings when applying st_transform. I noticed that there is little difference with my old favourite, Eckert IV, which I used for tmap::World:

I played around with this dataset, and created a composite indicator:

library(dplyr) # for %>% and mutate()
library(tmap)

# Reproject to Eckert IV (equal-area), repair geometries, add the composite indicator
world_all2 = world_all %>%
  sf::st_transform(crs = "+proj=eck4") %>%
  sf::st_make_valid() %>%
  mutate(demo_corr = democracy_score * 2.5 + 25 + corruption_perception_index / 2,
         demo_corr_rank = rank(-demo_corr, ties.method = "min"))

tmap_options(projection = 0, basemaps = NULL) # github version of tmap needed

tm_shape(world_all2) +
  tm_polygons("demo_corr", style = "cont",
    popup.vars = c("democracy_score", "corruption_perception_index",
                   "demo_corr", "demo_corr_rank"), id = "name")

[Image: Screenshot from 2020-06-03 19-25-37]
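
The scaling in `demo_corr` reads as if each component is mapped to a 0 to 50 range, so that the sum runs from 0 to 100. That holds only under the assumption, not stated in the thread, that the democracy score ranges from -10 to 10 and the Corruption Perception Index from 0 to 100. A minimal base R sketch with invented values:

```r
# Invented values; the assumed ranges are democracy_score in [-10, 10]
# and corruption_perception_index in [0, 100].
scores <- data.frame(
  name = c("A", "B", "C", "D"),
  democracy_score = c(10, -10, 0, 10),
  corruption_perception_index = c(100, 0, 50, 100)
)

# Same formula as in the comment above: each part contributes 0 to 50
scores$demo_corr <- scores$democracy_score * 2.5 + 25 +
  scores$corruption_perception_index / 2

# Rank 1 is the highest score; tied countries share the smallest rank
scores$demo_corr_rank <- rank(-scores$demo_corr, ties.method = "min")
```

With `ties.method = "min"`, countries with equal scores get the same rank and the next rank is skipped, which is the usual convention for league tables.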

@Nowosad
Member Author

Nowosad commented Jun 7, 2020

Great. I have updated the code a little bit yesterday. I think it is a good starting point for the world data.
