add names-current-term.csv #3

Open
davewhiteland opened this issue Jul 18, 2016 · 8 comments

Comments

@davewhiteland
Contributor

davewhiteland commented Jul 18, 2016

The big list of all the names of all politicians ever is impressive, but it's possible that a similar file that only contains politicians from all legislatures' current terms would be more useful for some more timely applications.

There's no specific use case for this, just a speculative idea that it might make the data more useful (that big file is cumbersome). Note that for specific legislatures, EveryPolitician already publishes a handy names.csv file (URL included in the countries.json index file).

It seems likely that the overhead to creating this as part of the creation of the megainclusive names.csv might be relatively low, so worth trying.

@tmtmtmtm
Contributor

Should this be all people in the current term? Or just all current members?

@struan

struan commented Jul 18, 2016

My hunch is that a naive user will expect the latter.

@davewhiteland
Contributor Author

It's hard to know without a use case.

My feeling is it's always going to be a little fuzzy because of latency between real life and data anyway, so just the current term is a start. (shrugs)

@tmtmtmtm
Contributor

Well, we could of course create both versions. But that seems more likely to just be confusing.

Perhaps @pudo or @jpmckinney might have an opinion on this?

@pudo

pudo commented Jul 18, 2016

Ok, first off: this sounds like a tremendously useful feature for EveryPolitician, I'd really like to use it. For my use case -- which is finding mentions of politicians in documents and databases -- I'd actually prefer the term members over the current members. That gives me a bit of extra coverage. Perhaps the former member had to resign over a scandal -- in that case I want to track them a bit longer :)

@davewhiteland
Contributor Author

Further to my throwaway comment:

"It seems likely that the overhead to creating this as part of the creation of the megainclusive names.csv might be relatively low, so worth trying."

...I now notice that in fact everypolitician-names really is just doing very little beyond concatenating the existing names.csvs with some extra columns; which is to say currently it's wholly unaware of what term any name is from, and really is just shuffling CSV lines around. Heh.
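To illustrate the "shuffling CSV lines around" point, here is a minimal Python sketch of that kind of collation. It is not the repo's actual code: the `collate_names` helper, the `id` and `name` columns, and the added `country` and `legislature` columns are all assumptions made for the example.

```python
import csv
import io


def collate_names(sources):
    """Concatenate per-legislature name rows into one CSV string.

    `sources` maps a (country, legislature) pair to the text of that
    legislature's names.csv. The `id` and `name` column names are
    assumptions for this sketch, not the confirmed file layout.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["id", "name", "country", "legislature"],
        lineterminator="\n",
    )
    writer.writeheader()
    for (country, legislature), text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            # Copy the row through unchanged and tag it with its origin.
            # Nothing here knows which term a name belongs to.
            writer.writerow({
                "id": row["id"],
                "name": row["name"],
                "country": country,
                "legislature": legislature,
            })
    return out.getvalue()
```

Because the rows are passed through untouched, any term filtering would have to happen upstream, before the per-legislature files reach a step like this.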

@jpmckinney

If you wanted to perform analysis over only current members (e.g. gender analysis), then you'd need the current members version. In lots of journalism use cases, however, the current term version is more relevant, as it matters if a member doesn't make it to the end of the term. So, yeah, I think both are useful.
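The difference between the two versions can be sketched in Python as follows. This is only an illustration under assumed data: the membership records, their `person_id`, `term_id` and `end_date` fields, and both helper functions are hypothetical, not EveryPolitician's actual schema or code.

```python
from datetime import date


def term_members(memberships, term_id):
    """Everyone who held a seat at any point during the given term."""
    return {m["person_id"] for m in memberships if m["term_id"] == term_id}


def current_members(memberships, term_id, today):
    """Only those whose membership in the given term has not ended yet."""
    current = set()
    for m in memberships:
        if m["term_id"] != term_id:
            continue
        end = m.get("end_date")
        # A missing end date is treated as "still in office".
        if end is None or date.fromisoformat(end) >= today:
            current.add(m["person_id"])
    return current


# Made-up records: "b" resigned partway through term "t2".
EXAMPLE = [
    {"person_id": "a", "term_id": "t2", "end_date": None},
    {"person_id": "b", "term_id": "t2", "end_date": "2016-03-01"},
    {"person_id": "c", "term_id": "t1", "end_date": "2012-05-01"},
]
```

With these records, `term_members` keeps the member who resigned while `current_members` drops them, which is the extra coverage the current-term version gives.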

@davewhiteland
Contributor Author

The corollary to that is perhaps that we've already made this decision insomuch as the names.csv in each legislature's directory within everypolitician/everypolitician-data is created with no regard for terms... yet. But maybe there's a case for doing it there rather than in this repo; then this repo's remit would be just to collate the current-term and current-term-currently-in-office CSV files into "global" ones. Which is what it's doing with the names.csv files already, i.e., collating files from EveryPolitician into a single global one. Perhaps.


5 participants