This repository has been archived by the owner on Mar 3, 2022. It is now read-only.

classify ads by whether they're persuasive, mobilization, listbuilding or fundraising #78

Open
jeremybmerrill opened this issue Jul 23, 2018 · 3 comments

Comments

@jeremybmerrill
Contributor

Political ads can have many different purposes, including:

  • listbuilding: finding potential supporters and getting their contact info, so you can claim them as supporters and also so you can ask them for money
  • fundraising: asking people -- probably people who you already know are your supporters or else people you think are reasonably likely to support you -- for money
  • mobilization: asking people -- probably people who you already know are your supporters or else people you think are reasonably likely to support you -- to do stuff, like vote early or volunteer
  • persuasion: communicating to people -- who are not your supporters but who probably aren't your opponent's supporters either -- about specifically-chosen issues/messages to persuade them to vote for you (or at least to not vote for your opponent)

(I realize this is a somewhat simplified ontology. Ideas on how to come up with -- and operationalize -- a different ontology are totally welcome.)

It'd be amazing to build a machine learning model that could make a decent guess as to which category a given political ad falls into. You might be able to figure this out just from the text of the ad. (In a perfect world, we could also extract interesting features from the ad images/video, but that's out of scope.)
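As a rough illustration of the text-only approach, here is a minimal sketch of a bag-of-words Naive Bayes classifier over the four categories, written with only the Python standard library. Everything in it is an assumption made for illustration: the class name `NaiveBayesAdClassifier`, the tokenizer, and especially the `TOY_ADS` examples, which are invented and do not come from the ad dataset. A real attempt would need hand-labeled ads and probably a more capable model.

```python
import math
import re
from collections import Counter, defaultdict

# The four categories from the ontology above.
CATEGORIES = ("listbuilding", "fundraising", "mobilization", "persuasion")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesAdClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented examples, NOT taken from the real dataset.
TOY_ADS = [
    ("Sign our petition and add your name today", "listbuilding"),
    ("Take our survey and tell us your email", "listbuilding"),
    ("Chip in five dollars before the deadline", "fundraising"),
    ("Donate now to help us reach our goal", "fundraising"),
    ("Find your polling place and vote early", "mobilization"),
    ("Volunteer this weekend to knock on doors", "mobilization"),
    ("Our plan protects health care for working families", "persuasion"),
    ("My opponent voted to cut school funding", "persuasion"),
]

clf = NaiveBayesAdClassifier().fit(*zip(*TOY_ADS))
print(clf.predict("Please donate five dollars now"))
print(clf.predict("Vote early at your polling place"))
```

With eight toy examples this only shows the mechanics; it says nothing about how well the idea would work on real ads.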

I can talk endlessly about this idea. Let me know if you're interested. Reply here or email me at jeremy dot merrill at propublica dot org.

@yinleon

yinleon commented Aug 1, 2018

This sounds cool. What kinds of data and metadata do you have?
We do ML for social science at my lab (it's hard!)

@jeremybmerrill
Contributor Author

Hi @yinleon, thanks for your interest! We have about 54,000 ads; you can download them here. That page has the schema too. The text content of the ads (message) is probably the most predictive, but the targeting methods (parsed into targets; raw from Facebook in targetings) and any links in the raw HTML content of the ad (body) might also be predictive.
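To make the feature idea concrete, here is a hedged sketch of pulling link hosts out of an ad's `body` HTML and bundling them with `message` and `targets`. The field names come from the description above; the helper names (`LinkCollector`, `link_hosts`, `ad_features`), the shape of the `targets` entries, and the `sample_ad` record are assumptions invented for this example, not the actual schema or real data.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_hosts(body_html):
    """Return the distinct link hostnames in the HTML, in first-seen order."""
    parser = LinkCollector()
    parser.feed(body_html)
    parser.close()
    hosts = []
    for href in parser.hrefs:
        host = urlparse(href).netloc.lower()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def ad_features(ad):
    """Bundle the possibly predictive fields of one ad record (a dict)."""
    return {
        "message": ad.get("message") or "",
        "link_hosts": link_hosts(ad.get("body") or ""),
        "targets": ad.get("targets") or [],
    }


# Hypothetical record; the values and the targets layout are made up.
sample_ad = {
    "message": "Chip in before midnight",
    "body": '<div><p>Chip in before midnight</p>'
            '<a href="https://donate.example.org/give?amount=5">Donate</a></div>',
    "targets": [{"target": "Age", "segment": "18 and older"}],
}
print(ad_features(sample_ad)["link_hosts"])  # ['donate.example.org']
```

A link host such as a donation platform or a petition site could then be fed to a classifier as a categorical feature alongside the message text.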

We have an image from each ad (either the main image or a still from the video). We don't have any data extracted from the images, whether by image recognition, text OCR or anything like that. There's likely-predictive data in here: often listbuilding ads contain a "survey" (e.g. this one) that's not actually collecting any data other than email addresses.

The biggest problem is that we don't have a labeled subset for training. The dataset is unbalanced; it's mostly fundraising and listbuilding ads, with fewer persuasive and mobilization ads.
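Two common ways to work around these problems can be sketched as follows: keyword rules that assign rough "weak" labels to pick candidates for hand-labeling, and inverse-frequency class weights so that rare categories count for more during training. The keyword patterns in `RULES` are guesses made for this example, not derived from the dataset, and the label counts at the bottom are invented. The weight formula `n_samples / (n_classes * class_count)` is the usual "balanced" heuristic.

```python
import re
from collections import Counter

# Hypothetical keyword rules; the first matching rule wins.
# Persuasion has no rule because it is hard to spot by keyword alone.
RULES = [
    ("fundraising", re.compile(r"\b(donate|chip in|contribute)\b", re.I)),
    ("listbuilding", re.compile(r"\b(petition|survey|add your name|sign up)\b", re.I)),
    ("mobilization", re.compile(r"\b(vote early|volunteer|polling place|precincts?)\b", re.I)),
]


def weak_label(message):
    """Return a rough category guess, or None if no rule fires."""
    for label, pattern in RULES:
        if pattern.search(message):
            return label
    return None


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Invented label distribution, skewed the way the dataset is described.
toy_labels = (["fundraising"] * 6 + ["listbuilding"] * 4
              + ["mobilization"] + ["persuasion"])
print(weak_label("Take our quick survey"))
print(balanced_class_weights(toy_labels))
```

Weak labels like these are noisy, so they are better suited to choosing which ads a person should label first than to serving as ground truth.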

Would love to hear your thoughts! I'm always looking to hear from folks with more experience doing ML... Let me know if you have more questions about the dataset or about my ontology.

@jeremybmerrill
Contributor Author

Just for recordkeeping, here's an example of a mobilization ad: https://projects.propublica.org/facebook-ads/ad/23842873784130638. Danny O'Connor, a Dem special election candidate for US House in OH-12, is asking a custom audience to check his list of changed precincts for the election.


2 participants