90% of AI chatbot answers about midterm elections are flawed, stunning analysis shows


If you ask an advanced AI chatbot about the upcoming midterm elections, there is a high chance that the information it provides will be inaccurate, biased, or sourced from a state-run outlet, according to a recent analysis.

A study conducted by researchers at Forum AI, a startup dedicated to evaluating and enhancing the precision of AI models, examined four popular chatbots: OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok.

The analysis revealed that these chatbots struggle to differentiate between reputable news sources and propaganda, such as China’s Global Times, with 15% of responses citing at least one state-run media outlet.

For instance, Anthropic’s Claude referenced the Global Times when asked about the form of government in the United States, as highlighted in a blog post by Katie Harbath, a former Facebook executive and an expert at Forum.

The issue becomes more pronounced when the questions pertain to foreign policy, with ChatGPT citing state-run media outlets in 51% of its responses and Grok in 44%.

Overall, the chatbots referenced state-run media in 35% of responses related to foreign policy.

The information often originated from outlets associated with countries hostile to the US.

According to a blog post by Forum’s Andy Hall and Robby Goldfarb, “Chinese-controlled outlets like Xinhua, Global Times, CGTN, China Daily, as well as Russian and Iranian outlets, were frequently cited.”

The study involved posing 3,136 questions to the chatbots covering a wide range of topics including US politics, foreign affairs, healthcare, education, and the economy. A panel of experts evaluated 12,542 responses for accuracy, making it the largest independent assessment of AI in news and current events.

Approximately 30% of responses contained factual errors, including incorrect dates, policy details, and attributions.

OpenAI’s ChatGPT was the most accurate chatbot with an error rate of 9%, followed by Gemini at 25%, Claude at 41%, and Grok at 43%.

For example, Gemini inaccurately stated that Arkansas ACA premiums would rise by 65% to 67% in 2026, while Grok wrongly claimed that no effective Iranian navy, air force, or advanced air defenses remained operational.

The chatbots also struggled to maintain political neutrality, with nearly a quarter of responses failing the neutrality check. The study found directional failures in election-related responses, with the degree of bias varying among the chatbots.

Anthropic’s spokesperson emphasized that Claude is designed to provide politically balanced responses, present credible information on current events, and highlight disputed claims or sources.

Forum AI, led by Campbell Brown, a former CNN anchor and former head of news partnerships at Meta, stressed the importance of addressing the risks of misinformation in AI models.

The study underscores the need for greater accuracy and neutrality in AI chatbots, especially when they deliver information on critical topics like elections and foreign policy.