MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks

[Job Market Paper]

Transfer learning allows researchers to use state-of-the-art deep learning methods with smaller datasets, which are common in political science. However, most current transfer learning approaches work with only one modality: that is, they can be applied only to images or only to text. Existing multimodal models are designed to make predictions over image-text pairs, but they require that every observation have both modalities (that is, they cannot make a prediction on an observation that has only text). I develop a multimodal model called Multimodal Representations Using Modality Translation, or MARMOT, which provides a framework for constructing joint image-text representations. One key advantage of MARMOT over other current multimodal approaches is that it does not require every observation to have both an image and text. I use a pretrained transformer decoder to first “translate” the image; then, I jointly input all modalities into a pretrained BERT transformer encoder. This model shows dramatic improvements over the ensemble classifier methods used in previous political science work on classifying tweets, particularly in multilabel classification problems. It also shows improvements over state-of-the-art models on image-text problems.

This is joint work with Walter Mebane.

Partisan Associations from Word Embeddings of Twitter Users’ Bios

[APSA 2019]

One of the principal problems facing political research on social media is the lack of measures of the user attributes that political scientists often care about. Most notably, partisanship is not well-defined for most users. This project proposes using Twitter user bios to measure partisan associations. The method is simple and intuitive: we map user bios to document embeddings using doc2vec and we map individual words to word embeddings using word2vec. We then define partisan subspaces using partisan keywords that refer to presidential campaigns, candidates, parties, and slogans, and we calculate partisan associations as the cosine similarity between the document embeddings and these subspaces. The idea of this approach is to learn the non-partisan words that are in the contextual neighborhoods of explicitly partisan words. Even if users do not use explicitly partisan expressions in their bios, they may describe themselves with words that tend to appear in bios that do feature such expressions. This idea resonates with research that studies the associations between partisan sentiments and seemingly non-partisan identities, activities, hobbies, spending habits, and interests. Our project shows that these measures capture partisan engagement and sentiment in intuitive ways, such as which partisan users they retweet, favorite, and follow, and which hashtags they use.
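
The scoring step described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the project itself: the bio and keyword embeddings are assumed to live in the same vector space, each partisan subspace is simplified to the centroid (mean) of its keyword vectors, and the three-dimensional vectors and the names `cosine`, `centroid`, and `partisan_association` are hypothetical and chosen only for illustration. In practice the embeddings would come from trained doc2vec and word2vec models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def partisan_association(bio_vec, keyword_vecs):
    """Assumed scoring rule: cosine similarity between a bio embedding and
    the centroid of one party's keyword embeddings. The centroid is a
    simplification standing in for the partisan subspace."""
    return cosine(bio_vec, centroid(keyword_vecs))


# Toy embeddings, invented for illustration only.
dem_keywords = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
rep_keywords = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0]]
bio = [0.9, 0.1, 0.3]  # a bio with no explicit partisan keyword

dem_score = partisan_association(bio, dem_keywords)
rep_score = partisan_association(bio, rep_keywords)
print(f"Democratic association: {dem_score:.3f}")
print(f"Republican association: {rep_score:.3f}")
```

In this toy setup the bio scores higher on the first keyword set because its embedding points mostly along the same direction as those keywords, which mirrors the intuition that bios without explicit partisan terms can still sit close to partisan vocabulary.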

This is joint work with Walter Mebane, Logan Woods, Joseph Klaver, and Preston Due.

Identifying Hostile Usages of Partisan Words on Twitter

The keywords chosen for the partisan associations project are assumed to be used in a non-hostile fashion. But it is quite common to see negative usages of partisan words, which may express dismay at or condemnation of the other party or parties. This project develops a model to classify whether tweets contain hostile or sarcastic usages of political words. Such a classifier is useful not only for the partisan associations project but also for any project that needs to detect hostile or sarcastic usages of partisan words, such as work on affective polarization. We currently use bidirectional encoder representations from transformers (BERT), which outperforms existing word embedding-based methods at identifying hostile or sarcastic usages of words.

This is joint work with Walter Mebane and Logan Woods.