Big social data analytics in journalism and mass communication: Comparing dictionary-based text analysis and unsupervised topic modeling

L. Guo; C. Vargo; Z. Pan; W. Ding

doi:10.1177/1077699016639231


Authors: L. Guo; C. Vargo; Z. Pan; W. Ding

Publication: Journalism & Mass Communication Quarterly, 93(2):332-359, 2016



Abstract

This paper presents an empirical study that investigated and compared two “big data” text analysis methods: dictionary-based analysis, perhaps the most popular automated analysis approach in social science research; and unsupervised topic modeling (i.e., LDA analysis), one of the most widely used algorithms in the field of computer science and engineering. By applying the two “big data” methods to make sense of the same dataset (77 million tweets about the 2012 U.S. presidential election), the study provides a starting point for scholars to evaluate the efficacy and validity of different computer-assisted methods for conducting journalism and mass communication research, especially in the area of political communication.

Keywords: computer-assisted content analysis, unsupervised machine learning, topic modeling, political communication, Twitter

McQuail notes that “the entire study of mass communication is based on the premise that the media have significant effects” (1994, p. 327). However, whether the “premise” still holds true in this transforming media environment remains a question. The latest Gallup polls show that Americans’ confidence in major news media platforms continued to decline in the past few years (Morales, 2012). Instead, people now turn to a wide variety of alternative media outlets such as blogs and social networking sites for news and in

How to Cite

Guo, L., Vargo, C., Pan, Z., & Ding, W. (2016). Big social data analytics in journalism and mass communication: Comparing dictionary-based text analysis and unsupervised topic modeling. Journalism & Mass Communication Quarterly, 93(2), 332–359. https://doi.org/10.1177/1077699016639231


Version and Rights

This is the author preprint. For the final published version, see the DOI above.
