Media Slant is Contagious: Evaluating the Accuracy of Human Guesses

Abstract and 1. Introduction

2. Data

3. Measuring Media Slant and 3.1. Text pre-processing and featurization

3.2. Classifying transcripts by TV source

3.3. Text similarity between newspapers and TV stations and 3.4. Topic model

4. Econometric Framework

4.1. Instrumental variables specification

4.2. Instrument first stage and validity

5. Results

5.1. Main results

5.2. Robustness checks

6. Mechanisms and Heterogeneity

6.1. Local vs. national or international news content

6.2. Cable news media slant polarizes local newspapers

7. Conclusion and References

Online Appendices

A. Data Appendix

A.1. Newspaper articles

A.2. Alternative county matching of newspapers and A.3. Filtering of the article snippets

A.4. Included prime-time TV shows and A.5. Summary statistics

B. Methods Appendix, B.1. Text pre-processing and B.2. Bigrams most predictive for FNC or CNN/MSNBC

B.3. Human validation of NLP model

B.4. Distribution of Fox News similarity in newspapers and B.5. Example articles by Fox News similarity

B.6. Topics from the newspaper-based LDA model

C. Results Appendix

C.1. First stage results and C.2. Instrument exogeneity

C.3. Placebo: Content similarity in 1995/96

C.4. OLS results

C.5. Reduced form results

C.6. Sub-samples: Newspaper headquarters and other counties and C.7. Robustness: Alternative county matching

C.8. Robustness: Historical circulation weights and C.9. Robustness: Relative circulation weights

C.10. Robustness: Absolute and relative FNC viewership and C.11. Robustness: Dropping observations and clustering

C.12. Mechanisms: Language features and topics

C.13. Mechanisms: Descriptive evidence on demand side

C.14. Mechanisms: Slant contagion and polarization

B.3. Human validation of NLP model

We evaluate the accuracy of human guesses on whether an 80-word TV transcript snippet is from FNC or CNN/MSNBC. This section provides further detail on this validation step. We extract a random sample of 1,000 TV transcript snippets and ask three individual freelancers to guess whether each snippet is from FNC or CNN/MSNBC.
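
As a purely illustrative sketch (not the paper's actual pipeline), the sampling step could look as follows. The input file, column names, and output file names are assumptions made for illustration.

```python
# Minimal sketch of drawing the validation sample, assuming the full set of
# 80-word transcript snippets sits in a DataFrame with a "text" column and a
# "source" column ("FNC" or "CNN/MSNBC"); file names are hypothetical.
import pandas as pd

snippets = pd.read_parquet("tv_snippets.parquet")   # hypothetical input file
sample = snippets.sample(n=1000, random_state=42)   # fixed seed for reproducibility

# File sent to the freelancers: text only, no channel label.
sample[["text"]].to_csv("validation_sample_blind.csv", index_label="snippet_id")
# File kept internally: ground-truth labels used later to score the guesses.
sample[["source"]].to_csv("validation_sample_truth.csv", index_label="snippet_id")
```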

The individuals were recruited from the freelancing platform Upwork. When selecting them, we imposed three filtering criteria: a freelancer must (i) live in the United States, (ii) have been socialized in the U.S. (e.g., born and raised in the U.S.), and (iii) show good literacy (defined as properly reading our instructions, i.e., returning a valid working sample; see below). The initial job post read as follows: “We have a file with 1,000 very short excerpts of news reports. You will read them and spontaneously (based on your intuition) decide if you think a given excerpt was published by Fox News or by CNN. In the process of labelling, do not google or engage in any other form of research. Just give us your spontaneous impression based on how you perceive news reporting by the two channels in your everyday life.”

All freelancers who replied to this post within a day were asked to submit a working sample of 10 snippets. We recruited the first three individuals who submitted the requested working sample. The hired freelancers received the file with the reminder: “Please indicate whether you think the text is from Fox News or CNN. We would like to remind you that you mustn’t do research of any kind when assessing the excerpts. Your labels should be based on your spontaneous guess and nothing else.” All three individuals held or were in the process of acquiring a college degree. They were based in Point Pleasant (WV), Malvern (PA), and Houston (TX).

The accuracy scores of the freelancers’ guesses are 0.73, 0.78, and 0.78, respectively. The average false-positive rate (a freelancer guessing a CNN/MSNBC snippet to be from FNC) is slightly higher, at 0.14, than the average false-negative rate (0.08). The three freelancers agree on whether a snippet appears to be from FNC or from CNN/MSNBC in 58% of cases; under independent random guessing, all three would agree in only 25% of cases (2 × 0.5³). We draw two conclusions from this exercise. First, even when cut into 80-word snippets, TV transcripts still contain enough information for a reader to infer the channel. Second, our classifier approximates human performance.
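
The sketch below illustrates how these validation statistics (per-rater accuracy, average false-positive and false-negative rates, and the three-way agreement share) can be computed. It is not the paper's actual code; the DataFrame layout, column names, and toy data are assumptions made for illustration.

```python
# Minimal sketch of the validation metrics described above, assuming a
# DataFrame with a ground-truth column "source" ("FNC" or "CNN/MSNBC") and
# one guess column per freelancer; column names are hypothetical.
import pandas as pd

def validation_metrics(df: pd.DataFrame, guess_cols: list[str]) -> dict:
    truth_is_fnc = df["source"].eq("FNC")
    results = {}
    fprs, fnrs = [], []
    for col in guess_cols:
        guess_is_fnc = df[col].eq("FNC")
        results[f"accuracy_{col}"] = (guess_is_fnc == truth_is_fnc).mean()
        # False positive: a CNN/MSNBC snippet guessed to be from FNC.
        fprs.append(guess_is_fnc[~truth_is_fnc].mean())
        # False negative: an FNC snippet guessed to be from CNN/MSNBC.
        fnrs.append((~guess_is_fnc[truth_is_fnc]).mean())
    results["avg_false_positive_rate"] = sum(fprs) / len(fprs)
    results["avg_false_negative_rate"] = sum(fnrs) / len(fnrs)
    # Share of snippets on which all raters give the same label; with three
    # independent random guessers this baseline is 2 * 0.5**3 = 0.25.
    results["all_agree"] = df[guess_cols].nunique(axis=1).eq(1).mean()
    return results

# Example usage with toy data:
toy = pd.DataFrame({
    "source":  ["FNC", "CNN/MSNBC", "FNC",       "CNN/MSNBC"],
    "rater_1": ["FNC", "FNC",       "FNC",       "CNN/MSNBC"],
    "rater_2": ["FNC", "CNN/MSNBC", "FNC",       "CNN/MSNBC"],
    "rater_3": ["FNC", "CNN/MSNBC", "CNN/MSNBC", "CNN/MSNBC"],
})
print(validation_metrics(toy, ["rater_1", "rater_2", "rater_3"]))
```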
