ABSTRACT:
Online reviews play a significant role in the decisions users make in day-to-day life. Reviewers who deliberately post fake reviews for financial or other gain, however, harm both users and businesses. Automatically detecting such reviewers is challenging because fake reviews do not seem out of place next to genuine ones. In this paper, we present a fully unsupervised approach to detecting anomalous behavior in online reviewers. We propose a novel hierarchical approach in which we (1) derive distributions for key features that define reviewer behavior, and (2) combine these distributions into a finite mixture model. The approach is highly generalizable and allows us to seamlessly combine univariate and multivariate distributions into a unified anomaly detection system. Most importantly, it requires no explicit labeling (spam/not spam) of the data. Our approach outperforms prior state-of-the-art unsupervised anomaly detection methods.
Key words and phrases: online reviews, fake reviews, opinion spam, unsupervised learning, anomaly detection, mixture models, deception detection
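
As a rough illustration of the two steps summarized in the abstract, the sketch below scores reviewers under a small finite mixture whose components are products of per-feature densities. It is not the model developed in the paper. The two features (rating deviation and reviews per day), the distribution families (Gaussian and exponential), the component parameters, and the example reviewers are all hypothetical values chosen for illustration. In practice the parameters would be estimated from unlabeled data, for example with expectation-maximization.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def exponential_pdf(x, rate):
    """Density of an exponential distribution (zero for negative x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


# Hypothetical mixture components. Each one pairs a Gaussian over rating
# deviation with an exponential over reviews per day. All numbers are made up.
COMPONENTS = [
    {"weight": 0.7, "dev_mean": 0.5, "dev_std": 0.3, "rate": 2.0},
    {"weight": 0.3, "dev_mean": 1.2, "dev_std": 0.4, "rate": 1.0},
]


def mixture_density(rating_dev, reviews_per_day):
    """Finite mixture density: weighted sum of per-component densities."""
    total = 0.0
    for c in COMPONENTS:
        feature_density = (
            gaussian_pdf(rating_dev, c["dev_mean"], c["dev_std"])
            * exponential_pdf(reviews_per_day, c["rate"])
        )
        total += c["weight"] * feature_density
    return total


def anomaly_score(rating_dev, reviews_per_day):
    """Negative log-density: larger values mean less typical behavior."""
    density = mixture_density(rating_dev, reviews_per_day)
    return -math.log(max(density, 1e-300))  # floor avoids log(0)


# Hypothetical reviewers: (rating deviation, reviews per day).
reviewers = {
    "typical": (0.6, 0.4),
    "heavy": (1.1, 1.5),
    "suspicious": (3.5, 9.0),
}

scores = {name: anomaly_score(*features) for name, features in reviewers.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # most anomalous reviewer first
```

No labels are used anywhere: reviewers are ranked only by how poorly the mixture explains their behavior, which is what makes a scheme of this kind unsupervised. Multiplying the per-feature densities within a component assumes the features are independent given the component. A multivariate density could replace that product where features are correlated.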