Media Slant is Contagious: Evaluating the Accuracy of Human Guesses
We evaluate how accurately humans can guess whether an 80-word TV transcript snippet is from FNC or CNN/MSNBC. This section provides further detail on this validation step. We extracted a random sample of 1,000 TV transcript snippets and asked three freelancers to each guess whether a given snippet is from FNC or CNN/MSNBC.
The individuals were recruited from the freelancing platform Upwork. When selecting them, we imposed three filtering criteria: each freelancer must (i) live in the United States, (ii) have been socialized in the U.S. (e.g., born and raised in the U.S.), and (iii) demonstrate good literacy (defined as having read our instructions properly, i.e., having returned a valid working sample; see below). The initial job post read as follows: “We have a file with 1,000 very short excerpts of news reports. You will read them and spontaneously (based on your intuition) decide if you think a given excerpt was published by Fox News or by CNN. In the process of labelling, do not google or engage in any other form of research. Just give us your spontaneous impression based on how you perceive news reporting by the two channels in your everyday life.” All freelancers who replied to this post within a day were asked to submit a working sample of 10 snippets. We recruited the first three individuals who submitted the requested working sample. The hired freelancers received the file with the reminder: “Please indicate whether you think the text is from Fox News or CNN. We would like to remind you that you mustn’t do research of any kind when assessing the excerpts. Your labels should be based on your spontaneous guess and nothing else.” All individuals had, or were in the process of acquiring, a college degree. They were based in Point Pleasant (WV), Malvern (PA), and Houston (TX).
The accuracy scores of the three freelancers’ guesses are 0.73, 0.78, and 0.78, respectively. The average false-positive rate (a freelancer guesses that a CNN/MSNBC snippet is from FNC), at 0.14, is slightly higher than the average false-negative rate (0.08). All three freelancers agree on whether a snippet appears to be from FNC or from CNN/MSNBC in 58% of cases; if each guessed uniformly at random, all three would agree in 2 × (1/2)^3 = 25% of cases. We derive two conclusions from this exercise. First, even when cut into 80-word snippets, TV transcripts still contain information that allows a reader to infer the channel. Second, our classifier approximates the performance of humans.
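The statistics reported above can be computed from the true channel labels and the three sets of guesses. The following is a minimal sketch on made-up toy data, not the code used for the paper. All function names and labels are hypothetical, and it assumes that false-positive and false-negative rates are expressed as shares of all snippets, which the text does not state explicitly.

```python
from typing import Sequence, Tuple

# Hypothetical label names; "positive" means a guess of FNC.
FNC, OTHER = "FNC", "CNN/MSNBC"


def accuracy(truth: Sequence[str], guesses: Sequence[str]) -> float:
    """Share of snippets whose guessed channel matches the true channel."""
    return sum(t == g for t, g in zip(truth, guesses)) / len(truth)


def error_shares(truth: Sequence[str], guesses: Sequence[str]) -> Tuple[float, float]:
    """Return (false-positive share, false-negative share).

    Assumption: both are shares of ALL snippets, not rates conditional
    on the true channel.
    """
    n = len(truth)
    fp = sum(t == OTHER and g == FNC for t, g in zip(truth, guesses)) / n
    fn = sum(t == FNC and g == OTHER for t, g in zip(truth, guesses)) / n
    return fp, fn


def unanimous_agreement(*raters: Sequence[str]) -> float:
    """Share of snippets on which all raters give the same label."""
    return sum(len(set(labels)) == 1 for labels in zip(*raters)) / len(raters[0])


def chance_agreement(n_raters: int, n_labels: int = 2) -> float:
    """Probability that independent, uniformly random raters all agree."""
    return n_labels * (1 / n_labels) ** n_raters


# Toy data for illustration only (8 snippets, not the 1,000 from the study).
truth = [FNC, FNC, OTHER, OTHER, FNC, OTHER, FNC, OTHER]
rater_a = [FNC, OTHER, OTHER, FNC, FNC, OTHER, FNC, OTHER]
rater_b = [FNC, FNC, OTHER, FNC, FNC, OTHER, OTHER, OTHER]
rater_c = [FNC, OTHER, OTHER, FNC, FNC, FNC, FNC, OTHER]

if __name__ == "__main__":
    for name, guesses in [("A", rater_a), ("B", rater_b), ("C", rater_c)]:
        fp, fn = error_shares(truth, guesses)
        print(name, accuracy(truth, guesses), fp, fn)
    print("unanimous:", unanimous_agreement(rater_a, rater_b, rater_c))
    print("chance:", chance_agreement(3))
```

With three raters and two labels, `chance_agreement(3)` reproduces the 25% benchmark for random guessing. A chance-corrected statistic such as Fleiss' kappa would be a natural alternative to raw unanimous agreement, but it is not what the text reports.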