
GPT-2 Study Shows How Language Models Can Amplify Political Bias

  1. Abstract and Introduction

  2. Background and Related Work

  3. Theoretical Framework

  4. Experiment Design

  5. Results

  6. Discussion

  7. Limitations

  8. Ethical Considerations and References

A. Mathematical Formulation of WMLE

B. Fine-tuning Setup

C. Qualitative Bias Analysis Framework and Example of Bias Amplification Across Generations

D. Distribution of Text Quality Index Across Generations

E. Average Perplexity Across Generations

F. Example of Quality Deterioration Across Generations

G. Pearson Correlation Between Neuron Weight and Bias Performance

H. Pearson Correlation Between Neuron Weight and Generation Quality

I. Pearson Correlation Between Neuron Activation and Bias Performance

J. Pearson Correlation Between Neuron Activation and Generation Quality

K. Mathematical Details for the Statistical Tests

L. Literature Review of Model Collapse

7 Limitations

While this work introduces a comprehensive framework for understanding bias amplification in large language models and provides empirical evidence using GPT-2, several limitations must be acknowledged. First, the scope of our experiments is restricted to political bias in the context of U.S. media. Second, our experiments were conducted on GPT-2, a relatively small model compared to state-of-the-art architectures such as GPT-4 or LLaMA 2. Future research should extend our empirical approach to other contexts and to larger LLMs.

Another limitation lies in our choice of mitigation strategies. While Preservation and Accumulation show promise in reducing bias amplification, their computational cost and scalability must be considered. Moreover, the mitigation strategies were tested primarily in the context of synthetic data generation, and their efficacy in real-world deployments requires further investigation.
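
To make the two strategies concrete, the sketch below contrasts them as simple data-mixing rules: Preservation keeps a fixed fraction of the original human-written corpus in every generation's fine-tuning set, while Accumulation pools all corpora produced so far. The function names and the 20% keep fraction are illustrative assumptions for exposition, not the exact configuration used in our experiments.

```python
import random

def preservation_mix(original_data, synthetic_data, keep_fraction=0.2, seed=0):
    """Preservation (illustrative reading): fine-tune each generation on the new
    synthetic corpus plus a fixed fraction of the original human-written data."""
    rng = random.Random(seed)
    n_keep = int(len(original_data) * keep_fraction)
    kept = rng.sample(original_data, n_keep)
    return kept + synthetic_data

def accumulation_mix(corpora_by_generation):
    """Accumulation (illustrative reading): fine-tune each generation on the
    union of every corpus produced so far, so earlier data is never discarded."""
    mixed = []
    for corpus in corpora_by_generation:
        mixed.extend(corpus)
    return mixed

# Illustrative usage: the generation-3 training set under each strategy.
original = [f"human article {i}" for i in range(1000)]
gen1 = [f"synthetic article (gen 1) {i}" for i in range(1000)]
gen2 = [f"synthetic article (gen 2) {i}" for i in range(1000)]

train_preservation = preservation_mix(original, gen2, keep_fraction=0.2)
train_accumulation = accumulation_mix([original, gen1, gen2])
print(len(train_preservation), len(train_accumulation))  # 1200 vs. 3000 examples
```

The contrast also makes the cost concern visible: Accumulation's training set grows with every generation, whereas Preservation keeps it roughly bounded at the price of discarding most earlier data.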

8 Ethical Considerations

This study addresses bias amplification in LLMs, a technical phenomenon with profound ethical implications, particularly regarding fairness and the integrity of AI systems. The risk of bias amplification is especially concerning in systems that are iteratively trained on synthetic data, as it can lead to unintended distortions in model outputs. These distortions may propagate harmful biases, influencing downstream tasks in areas such as automated content generation, decision-making, and user interactions with AI.
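
For concreteness, the following sketch outlines the kind of self-consuming fine-tuning loop this concern refers to, in which each generation of GPT-2 is trained only on text sampled from its predecessor. The prompts, hyperparameters, and helper functions are illustrative assumptions, not the exact experimental configuration used in this study.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def generate_corpus(model, tokenizer, prompts, max_new_tokens=200):
    """Sample a synthetic corpus from the current-generation model."""
    model.eval()
    texts = []
    for prompt in prompts:
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, do_sample=True, top_p=0.95,
                             max_new_tokens=max_new_tokens,
                             pad_token_id=tokenizer.eos_token_id)
        texts.append(tokenizer.decode(out[0], skip_special_tokens=True))
    return texts

def fine_tune(model, tokenizer, corpus, epochs=1, lr=5e-5):
    """Minimal causal-LM fine-tuning pass on the synthetic corpus."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for text in corpus:
            batch = tokenizer(text, return_tensors="pt",
                              truncation=True, max_length=512)
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
prompts = ["The senator said", "According to the report,"]  # illustrative seeds

# Each generation is fine-tuned only on text sampled from its predecessor,
# so any subtle slant in the sampled text can compound over iterations.
for generation in range(3):
    corpus = generate_corpus(model, tokenizer, prompts)
    model = fine_tune(model, tokenizer, corpus)
```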

From an ethical standpoint, this work underlines the need for transparency in the training and deployment of LLMs. Our findings demonstrate that even without biased initial datasets, iterative training can amplify subtle biases embedded within a model’s architecture, thus raising concerns about accountability in models that are widely deployed in public-facing applications. This amplification can mislead users or result in models perpetuating one-sided perspectives, which could be especially problematic in sensitive domains like news summarization, policy generation, or social media content moderation.

Moreover, the identification of distinct neural mechanisms for bias amplification and model collapse highlights the challenge of ensuring equitable performance across all dimensions of model behavior. It also raises important ethical questions about the adequacy of current mitigation techniques, particularly in high-stakes scenarios where the cost of algorithmic bias is substantial.
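
As a rough illustration of the neuron-level analysis reported in Appendices G-J, the sketch below correlates each neuron's activation trajectory across generations with a generation-level bias score. The arrays are random placeholders and the score definition is an assumption, so the snippet shows only the shape of such an analysis, not our actual measurements.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder shapes: one bias score per generation and one mean activation
# per MLP neuron per generation (random values stand in for real logs).
n_generations, n_neurons = 10, 3072
rng = np.random.default_rng(0)
bias_score = rng.normal(size=n_generations)              # assumed generation-level bias metric
neuron_activation = rng.normal(size=(n_generations, n_neurons))

# Pearson correlation between each neuron's activation trajectory and the
# bias trajectory, in the spirit of the appendix analyses.
correlations = np.array([
    pearsonr(neuron_activation[:, j], bias_score)[0] for j in range(n_neurons)
])
top = np.argsort(-np.abs(correlations))[:10]
print("neurons most associated with bias drift:", top)
```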

Future research should prioritize the development of more comprehensive and domain-specific bias mitigation techniques, with a clear focus on minimizing ethical risks. Additionally, rigorous testing and validation across diverse datasets and real-world applications will be critical to ensuring that models trained using these methods do not exacerbate existing inequalities or produce harmful outcomes.


Authors:

(1) Ze Wang, Holistic AI and University College London;

(2) Zekun Wu, Holistic AI and University College London;

(3) Jeremy Zhang, Emory University;

(4) Navya Jain, University College London;

(5) Xin Guan, Holistic AI;

(6) Adriano Koshiyama.

