classify ads by whether they're persuasive, mobilization, listbuilding or fundraising #78

jeremybmerrill · 2018-07-23T18:53:10Z

Political ads can have many different purposes, including:

listbuilding: finding potential supporters and getting their contact info, so you can claim them as supporters and also so you can ask them for money
fundraising: asking people -- probably people who you already know are your supporters or else people you think are reasonably likely to support you -- for money
mobilization: asking people -- probably people who you already know are your supporters or else people you think are reasonably likely to support you -- to do stuff, like vote early or volunteer
persuasion: communicating to people -- who are not your supporters but who probably aren't your opponent's supporters either -- about specifically chosen issues/messages to persuade them to vote for you (or at least not to vote for your opponent)

(I realize this is a somewhat simplified ontology. Ideas on how to come up with -- and operationalize -- a different ontology are totally welcome.)

It'd be amazing to build a machine learning model that could make a decent guess as to which category a given political ad falls into. You might be able to figure this out just from the text of the ad. (In a perfect world, we could also extract interesting features from the ad images/video, but that's out of scope.)
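To make the "just from the text" idea concrete, here's a rough keyword baseline, not a trained model. The cue phrases are hypothetical guesses of mine rather than anything derived from the dataset, and the function returns `None` when nothing matches, since persuasion ads probably don't have reliable cue phrases. Something like this could at least give a first pass to compare a real model against.

```python
import re
from collections import Counter
from enum import Enum
from typing import Optional


class AdPurpose(Enum):
    LISTBUILDING = "listbuilding"
    FUNDRAISING = "fundraising"
    MOBILIZATION = "mobilization"
    PERSUASION = "persuasion"


# Hypothetical cue phrases, chosen for illustration only (not learned from data).
# Persuasion has no cues here, so this baseline never predicts it.
CUES = {
    AdPurpose.LISTBUILDING: ["sign the petition", "take the survey", "add your name"],
    AdPurpose.FUNDRAISING: ["donate", "chip in", "contribute"],
    AdPurpose.MOBILIZATION: ["vote early", "volunteer", "polling place"],
}


def guess_purpose(message: str) -> Optional[AdPurpose]:
    """Return the purpose with the most cue-phrase hits, or None if no cue matches."""
    text = message.lower()
    scores = Counter()
    for purpose, phrases in CUES.items():
        for phrase in phrases:
            # \b keeps "donate" from matching inside an unrelated longer word.
            if re.search(r"\b" + re.escape(phrase), text):
                scores[purpose] += 1
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

For example, `guess_purpose("Chip in $5 before midnight")` returns `AdPurpose.FUNDRAISING`, while an issue-focused message with no cue phrase returns `None`.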

I can talk endlessly about this idea. Let me know if you're interested. Reply here or email me at jeremy dot merrill at propublica dot org.

yinleon · 2018-08-01T02:35:33Z

This sounds cool. What kinds of data and metadata do you have?
We do ML for social science at my lab (it's hard!)

jeremybmerrill · 2018-08-01T03:28:22Z

Hi @yinleon, thanks for your interest! We have about 54,000 ads; you can download them here. That page has the schema too. The text content of the ads (message) is probably the most predictive, but the targeting methods (parsed into targets; raw from Facebook in targetings) and any links in the raw HTML content of the ad (body) might also be predictive.
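As a sketch of how the links in body could become a feature, the snippet below pulls the domain of every `<a href>` out of an HTML string using only the standard library. The function name `link_domains` and the sample HTML in the usage note are made up for illustration; the only thing assumed about the real data is that body holds the ad's raw HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_domains(body_html: str) -> list:
    """Return the hostnames of all absolute links in an ad's HTML, in order."""
    parser = LinkCollector()
    parser.feed(body_html)
    parser.close()
    domains = []
    for href in parser.hrefs:
        host = urlparse(href).hostname  # None for relative links such as "#"
        if host:
            domains.append(host)
    return domains
```

Calling `link_domains('<a href="https://example.org/donate">Give</a>')` yields a list containing `example.org`. Domain counts like these could then sit alongside the message text as inputs to a classifier.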

We have an image from each ad (either the main image or a still from the video). We don't have any data extracted from the images, whether by image recognition, text OCR or anything like that. There's likely-predictive data in here: often listbuilding ads contain a "survey" (e.g. this one) that's not actually collecting any data other than email addresses.

The biggest problem is that we don't have a labeled subset for training. The dataset is unbalanced; it's mostly fundraising and listbuilding ads, with fewer persuasive and mobilization ads.
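Once a hand-labeled subset exists, one common way to handle the imbalance is to weight each class inversely to its frequency, so that rare categories count for more during training. Here's a minimal sketch of that weighting (the same heuristic scikit-learn uses for `class_weight="balanced"`), written with the standard library only. The label counts in the usage note are invented for illustration and say nothing about the real proportions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }
```

With a toy list of six "fundraising" labels and two "mobilization" labels, the mobilization weight comes out at 2.0 and the fundraising weight below 1, so each mobilization example would count three times as much as a fundraising one.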

Would love to hear your thoughts! I'm always looking to hear from folks with more experience doing ML... Let me know if you have more questions about the dataset or about my ontology.

jeremybmerrill · 2018-08-02T16:44:08Z

Just for recordkeeping, here's an example of a mobilization ad: https://projects.propublica.org/facebook-ads/ad/23842873784130638. Danny O'Connor, a Dem special election candidate for US House in OH-12, is asking a custom audience to check his list of changed precincts for the election.

jeremybmerrill added the help wanted label Jul 25, 2018
