It is well established that good categorization results depend heavily on representation techniques. Text representation is a prerequisite for any text analysis task, since it sets a baseline that even advanced machine learning models cannot compensate for. This paper comprehensively analyzes and quantitatively evaluates various models for representing text in order to perform Subjectivity Analysis. We implement a diverse array of models on the Cornell Subjectivity Dataset. The BERT Language Model gives much better results than any other model, but it is significantly more computationally expensive than the other approaches. We obtained state-of-the-art results on the subjectivity task by fine-tuning the BERT Language Model. This opens up new avenues and could lead to a specialized model, inspired by BERT, dedicated to subjectivity analysis.
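To make the idea of a text representation concrete, the following is a minimal sketch of a bag-of-words encoding in plain Python. It is purely illustrative: the abstract does not say which baseline representations the paper evaluates, and the helper names (`build_vocabulary`, `bag_of_words`) and the two toy sentences are assumptions introduced here, not taken from the paper or the dataset.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = {tok for s in sentences for tok in s.lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def bag_of_words(sentence, vocab):
    """Return a token-count vector; out-of-vocabulary tokens are ignored."""
    counts = Counter(sentence.lower().split())
    vec = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


# Hypothetical toy sentences (assumption: not drawn from the actual dataset).
sentences = ["the film is a moving triumph", "the film is set in paris"]
vocab = build_vocabulary(sentences)
vectors = [bag_of_words(s, vocab) for s in sentences]
```

Each sentence becomes a fixed-length count vector that a downstream classifier could consume. A representation like this discards word order and context, which is one reason a contextual model such as BERT may yield a stronger starting point, at a higher computational cost.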

Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Subjectivity Analysis | SUBJ | BERT-Base + CLR + LSTM | Accuracy | 97.30 | #2
Subjectivity Analysis | SUBJ | BERT-Base + LSTM | Accuracy | 96.60 | #4