MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks


Political activity on social media presents a data-rich window into political behavior, but the vast amount of data means that almost all content analyses of social media require a data labeling step. Most automated machine labeling methods, however, ignore the multimodality of posted content, focusing on either text or images. State-of-the-art vision-and-language models are unusable for most political science research: they require all observations to have both image and text, and they require computationally expensive pretraining. This paper proposes a novel vision-and-language framework called multimodal representations using modality translation (MARMOT). MARMOT makes two methodological contributions: it can construct representations for observations missing image or text, and it replaces the computationally expensive pretraining with modality translation. We use a pretrained BERT transformer decoder to first “translate” the image; then, we jointly input all modalities into a pretrained BERT transformer encoder. This model shows dramatic improvements over the ensemble classifier methods used in previous political science work on classifying tweets, particularly in multilabel classification problems. It also shows improvements over benchmark state-of-the-art models on image-text problems.

This is joint work with Walter Mebane.

Partisan Associations from Word Embeddings of Twitter Users’ Bios

[APSA 2019]

One of the principal problems facing political research on social media is the lack of measures of the user attributes that political scientists often care about. Most notably, partisanship is not well-defined for most users. This project proposes using Twitter user bios to measure partisan associations. The method is simple and intuitive: we map user bios to document embeddings using doc2vec and individual words to word embeddings using word2vec. We then calculate partisan associations by taking the cosine similarity between these document embeddings and specific partisan subspaces, which are defined using partisan keywords that refer to presidential campaigns, candidates, parties, and slogans. The idea of this approach is to learn the non-partisan words that are in the contextual neighborhoods of explicitly partisan words. Even if users do not use explicitly partisan expressions in their bios, they may describe themselves with words that tend to appear in bios that do contain such expressions. This idea resonates with research on the associations between partisan sentiments and seemingly non-partisan identities, activities, hobbies, spending habits, and interests. Our project shows that these measures capture partisan engagement and sentiment in intuitive ways, such as which partisan users a user retweets, favorites, and follows, and which hashtags the user employs.
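
The similarity step described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the project itself: the embeddings are hand-written toy vectors rather than doc2vec or word2vec output, a partisan subspace is taken to be the span of its keyword vectors, and the "cosine similarity" between a bio and a subspace is taken to be the cosine of the angle between the bio vector and its orthogonal projection onto that subspace. The function names `subspace_basis` and `subspace_cosine` are hypothetical.

```python
import numpy as np


def subspace_basis(keyword_vectors):
    """Return an orthonormal basis (as rows) for the span of the keyword vectors."""
    K = np.asarray(keyword_vectors, dtype=float)
    # The right singular vectors with non-negligible singular values span
    # the same space as the rows of K.
    _, s, vt = np.linalg.svd(K, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s.max()))
    return vt[:rank]


def subspace_cosine(doc_vector, basis):
    """Cosine of the angle between doc_vector and its projection onto the subspace.

    Under this (assumed) definition the value lies in [0, 1]:
    1 means the vector lies inside the subspace, 0 means it is orthogonal to it.
    """
    v = np.asarray(doc_vector, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return 0.0
    # Coordinates of the projection in the orthonormal basis; their norm
    # equals the norm of the projected vector.
    coords = basis @ v
    return float(np.linalg.norm(coords) / norm)


# Toy 4-dimensional embeddings (hypothetical values, for illustration only).
party_a_keywords = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
party_b_keywords = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
basis_a = subspace_basis(party_a_keywords)
basis_b = subspace_basis(party_b_keywords)

bio_in_a = [3.0, 4.0, 0.0, 0.0]  # lies entirely in party A's subspace
bio_mixed = [1.0, 0.0, 1.0, 0.0]  # split evenly between the two subspaces

scores_in_a = (subspace_cosine(bio_in_a, basis_a), subspace_cosine(bio_in_a, basis_b))
scores_mixed = (subspace_cosine(bio_mixed, basis_a), subspace_cosine(bio_mixed, basis_b))
```

In this toy setup, a bio whose vector lies inside one party's subspace scores close to 1 for that party and close to 0 for the other, while the mixed bio receives equal scores for both. With real embeddings, the bio and keyword vectors would need to come from a shared embedding space for the comparison to be meaningful.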

This is joint work with Walter Mebane, Logan Woods, Joseph Klaver, and Preston Due.

Identifying Hostile Usages of Partisan Words on Twitter

The keywords chosen for the partisan associations project are assumed to be used in a non-hostile fashion. But negative usages of partisan words, which may express dismay at or condemnation of the other party or parties, are quite common. This project develops a model to classify whether a tweet contains hostile or sarcastic usages of political words. Such a classifier is useful not only for the partisan associations project but also for any project that needs to detect hostile or sarcastic usages of partisan words, such as work on affective polarization. We currently use bidirectional encoder representations from transformers (BERT), which outperforms the existing word embedding-based methods at identifying hostile or sarcastic usages of words.

This is joint work with Walter Mebane and Logan Woods.