How Toxic Are Tankies Compared to Other Far-Left Groups?
Authors:
(1) UTKUCAN BALCI, Binghamton University, United States;
(2) MICHAEL SIRIVIANOS, Cyprus University of Technology, Cyprus;
(3) JEREMY BLACKBURN, Binghamton University, United States.
Table of Links
Abstract and 1 Introduction
2 Background & Related Work
3 Data
3.1 Identifying Tankie Subreddits
3.2 Identifying Ideology Subreddits and 3.3 Post Collection
4 User-Base Analysis and 4.1 Graph Construction & Community Detection
4.2 Community Growth
4.3 User Migrations Over Time
5 Content Analysis and 5.1 What do tankies talk about?
5.2 Who are tankies talking about?
5.3 Misalignment Analysis
5.4 Toxicity Analysis
5.5 Domain Analysis
5.6 Lemmygrad Analysis
6 Discussion & Conclusion and 6.1 Limitations
6.2 Implications & future work, and References
A DATA
B NAMED ENTITIES
C MISALIGNMENT ANALYSIS
D DOMAIN ANALYSIS
5.4 Toxicity Analysis
In this section, we use Perspective API models to compare the online behavior of tankies and other far-left communities.
Perspective API. The Perspective API [92] is a widely used [9, 12, 26] tool for measuring toxicity. It has known limitations, e.g., issues of bias and uncertain performance on conversation patterns that it was not trained on, but at scale it provides a reasonable measure for comparing online communities. The API provides six production models: 1) TOXICITY, 2) SEVERE_TOXICITY, 3) INSULT, 4) IDENTITY_ATTACK, 5) THREAT, and 6) PROFANITY (see [91] for full details on the models). We treat scores ≥ 0.8 as “high,” following the threshold that Hoseini et al. [57] define for SEVERE_TOXICITY scores. To have a baseline for the comparisons, we sample 0.5% of the Reddit posts made during the dataset’s timeline, which amounts to more than 36M posts.
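As an illustration of the scoring step, the following minimal sketch builds a request body for the six models and reads the summary scores from a response. It makes no network call. The helper names (build_request, parse_scores, high_flags) are hypothetical, and the request and response layout is an assumption based on the public Perspective API documentation rather than the authors' pipeline.

```python
# The six production models named in the text.
ATTRIBUTES = [
    "TOXICITY", "SEVERE_TOXICITY", "INSULT",
    "IDENTITY_ATTACK", "THREAT", "PROFANITY",
]
HIGH_THRESHOLD = 0.8  # "high" score threshold used in this section


def build_request(text):
    """Build the JSON body for one analysis request (assumed layout)."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in ATTRIBUTES},
    }


def parse_scores(response):
    """Extract one summary score per model from a response dict."""
    return {
        name: payload["summaryScore"]["value"]
        for name, payload in response["attributeScores"].items()
    }


def high_flags(scores, threshold=HIGH_THRESHOLD):
    """Mark which models score at or above the 'high' threshold."""
    return {name: value >= threshold for name, value in scores.items()}


# Usage with a hand-written (not real) response for two models.
fake_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91}},
        "THREAT": {"summaryScore": {"value": 0.12}},
    }
}
flags = high_flags(parse_scores(fake_response))
```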
Results. Figure 5 shows the cumulative distribution functions (CDFs) for each model within the far-left cluster. Our analysis reveals that tankies tend to have higher scores than other far-left communities (excluding r/alltheleft) for all Perspective API models. Additionally, all far-left communities have higher Perspective API scores than the baseline Reddit sample.
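For readers unfamiliar with CDF comparisons, here is a minimal sketch of an empirical CDF over a list of scores. The function name ecdf and the toy scores are illustrative assumptions, not the paper's data or code.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return F, where F(x) is the fraction of scores less than or equal to x."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one score")

    def cdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Toy scores for two hypothetical communities.
community_a = ecdf([0.1, 0.2, 0.9, 0.95])
community_b = ecdf([0.05, 0.1, 0.2, 0.3])
```

A curve that sits lower at a given score (here, community_a at 0.5) has more of its posts above that score, which is how a CDF plot indicates a community with higher scores.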
Specifically, tankies have the highest proportion of both scores ≥ 0.5 and high scores (i.e., scores ≥ 0.8) for IDENTITY_ATTACK and THREAT, and the second highest for the remaining models. Table 5 shows that tankies have nearly twice as many high scores as the mean of other far-left communities.
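The proportions discussed here can be computed with a few lines. The sketch below is an assumption about the bookkeeping, using hypothetical names (fraction_at_or_above, summarize) and toy scores rather than the paper's data.

```python
def fraction_at_or_above(scores, threshold):
    """Fraction of scores that are greater than or equal to the threshold."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(1 for s in scores if s >= threshold) / len(scores)


def summarize(scores_by_community, thresholds=(0.5, 0.8)):
    """Map each community to its fraction of scores at each threshold."""
    return {
        community: {t: fraction_at_or_above(scores, t) for t in thresholds}
        for community, scores in scores_by_community.items()
    }


# Toy scores for one model and two hypothetical communities.
toy = {
    "community_a": [0.1, 0.4, 0.6, 0.9],
    "community_b": [0.1, 0.2, 0.3, 0.55],
}
summary = summarize(toy)
```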
We confirm that the score distributions for each model are significantly different between tankies and other far-left communities using a 2-sample KS test (𝑝 < 0.01 for all after adjustment for multiple testing using the Benjamini-Hochberg method). These results indicate that tankies tend to make posts with higher levels of toxicity, insult and profanity compared to other far-left communities, excluding r/alltheleft. Tankies also tend to make posts with more identity attacks and threats than all other far-left communities.
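The testing procedure can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the p-value uses the standard asymptotic Kolmogorov approximation, which is only reasonable for large samples and may differ from the implementation used in the paper.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value for the KS statistic (approximation)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0  # the series is unstable here and the p-value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, one would compute one KS p-value per Perspective model (six in total), pass the list to benjamini_hochberg, and compare each adjusted value against 0.01.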
Next, we examine the named entities in tankies’ posts that have high Perspective API scores, after removing any named entities that appear fewer than 100 times. In Table 6, we present, for each Perspective model, the top 10 named entities ranked by the fraction of posts mentioning the entity that score high. For all models except IDENTITY_ATTACK, the most commonly mentioned named entities are primarily related to the US (e.g., Amerikkkans, Yankee, Charlotte, Qanon), public figures and politicians (e.g., John Oliver, Elon Musk, Kyle Rittenhouse, Anthony Blinken, Erdogan, Alex Jones, Joe Rogan), or countries/nationalities (e.g., Chechens, Brasil, Iraqis). For the IDENTITY_ATTACK model, the most frequently mentioned named entities are typically religious or ethnic groups. Tankies appear to primarily target Muslims and Jews, with 41.80% and 34.03% of the posts mentioning these groups, respectively, having high IDENTITY_ATTACK scores. These are the highest proportions of high IDENTITY_ATTACK scores for Muslims and Jews of any far-left community we analyze. Furthermore, we observe that tankies attack the identities of Asians, Arabs, Hindus, Mexicans, Africans, and Whites in more than 20% of posts mentioning these identities.
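The ranking described above can be sketched as follows for a single Perspective model. The function name rank_entities, the input layout, and the choice to count an entity once per post are assumptions made for illustration. The entity extraction step itself is not shown.

```python
from collections import defaultdict


def rank_entities(posts, min_mentions=100, threshold=0.8, top_k=10):
    """Rank entities by the fraction of mentioning posts that score high.

    posts: iterable of (entities, score) pairs for one Perspective model.
    """
    mentions = defaultdict(int)
    high = defaultdict(int)
    for entities, score in posts:
        for entity in set(entities):  # assumption: count once per post
            mentions[entity] += 1
            if score >= threshold:
                high[entity] += 1
    # Drop rare entities, then sort by fraction (ties broken by name).
    ranked = [
        (entity, high[entity] / count)
        for entity, count in mentions.items()
        if count >= min_mentions
    ]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_k]


# Toy posts with placeholder entity names and a lowered mention cutoff.
toy_posts = [
    (["EntityA", "EntityB"], 0.9),
    (["EntityA"], 0.1),
    (["EntityB"], 0.85),
    (["EntityC"], 0.95),
]
toy_ranking = rank_entities(toy_posts, min_mentions=2)
```

The minimum-mention filter matters: without it, an entity that appears in a single high-scoring post (EntityC in the toy data) would rank first with a fraction of 1.0.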
Takeaways. Our analysis shows that tankies have the highest proportion of high scores for IDENTITY_ATTACK and THREAT among far-left communities, and they have the second highest proportion of high scores for TOXICITY, SEVERE_TOXICITY, PROFANITY, and INSULT, behind r/alltheleft. Although our findings indicate that US politics are not the primary focus for tankies, they still express strong opinions about US-related events, conspiracy theories, politicians, and public figures. Finally, we observe that tankies frequently target Muslims and Jews in their posts.