Classifying the Ideological Orientation of User-Submitted Texts in Social Media
With the long-term goal of understanding how language is used and evolves within online communities, this work explores the application of natural language processing techniques to classify text articles according to their ideological orientation (i.e., conservative or liberal). We first collect a balanced corpus of text articles posted to the online communities r/Liberal and r/Conservative on the social media website Reddit. Using the corpus, we develop and apply three classifiers. The baseline classifier is a Bayes model that uses only each text article's web domain; as such, its classification is independent of content. Next, we develop a support vector machine (SVM) model with term frequency-inverse document frequency (TF-IDF) features; this approach highlights differences in language by using a count-based feature space to differentiate text articles. Last, we evaluate a context-based transformer model (RoBERTa) and discuss its underperformance relative to the baseline and SVM models.
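To make the content-independent baseline concrete, the following is a minimal sketch of a single-feature Bayes classifier over web domains. The abstract does not specify the form of the model, so the Laplace smoothing, the class name `DomainBayes`, the helper `domain_of`, and the placeholder `.example` URLs are all assumptions for illustration, not the implementation used in this work.

```python
from collections import Counter, defaultdict
from math import log
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


class DomainBayes:
    """Hypothetical baseline: score(label) = log P(label) + log P(domain | label)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, urls, labels):
        self.label_counts = Counter(labels)
        self.domain_counts = defaultdict(Counter)
        for url, label in zip(urls, labels):
            self.domain_counts[label][domain_of(url)] += 1
        self.vocab = {d for c in self.domain_counts.values() for d in c}
        return self

    def predict(self, url):
        domain = domain_of(url)
        total = sum(self.label_counts.values())
        # The +1 reserves probability mass for domains unseen in training.
        v = len(self.vocab) + 1
        scores = {}
        for label, n in self.label_counts.items():
            prior = log(n / total)
            seen = self.domain_counts[label][domain]
            likelihood = log((seen + self.alpha) / (n + self.alpha * v))
            scores[label] = prior + likelihood
        return max(scores, key=scores.get)


# Placeholder training data; the URLs are invented, not drawn from the corpus.
train_urls = [
    "https://www.left.example/a",
    "https://left.example/b",
    "https://right.example/c",
    "https://www.right.example/d",
]
train_labels = ["liberal", "liberal", "conservative", "conservative"]

model = DomainBayes().fit(train_urls, train_labels)
prediction = model.predict("https://left.example/new-article")
```

Because the only feature is the linked domain, two articles from the same site always receive the same label regardless of their text, which is what makes this a content-independent point of comparison for the SVM and RoBERTa models.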