AI SCIENCE
When your sources are as real as Monopoly money
A new study shows that generative AI tools and research agents often make claims that aren’t backed up by proper sources.
About one-third of their answers contained claims that weren't actually supported by the sources they cited, and for OpenAI's GPT-4.5, almost half (47%) of responses went unsupported.
Salesforce AI researchers tested several AI search engines, including OpenAI's GPT-4.5 and GPT-5, You.com, Perplexity, and Microsoft's Bing Chat, along with five deep-research tools from OpenAI, Bing, You.com, Google Gemini, and Perplexity.
The audit, called DeepTRACE, ran 303 queries and scored the answers against eight criteria, including one-sidedness, overconfidence, relevance, and how well claims were backed by their cited sources.
The results were weak across the board: Bing Chat produced unsupported claims in about 23% of its answers, You.com and Perplexity in about 31%, and GPT-4.5 in 47%.
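To make those headline percentages concrete, here's a minimal, hypothetical sketch of how an unsupported-claim rate could be tallied once individual claims have been labeled as supported or not. This is not the DeepTRACE code, and the tool names and labels below are invented purely for illustration.

```python
# Hypothetical sketch: tallying an unsupported-claim rate per tool.
# NOT the DeepTRACE implementation; data and names are invented.
from collections import defaultdict

# Each record: (tool, claim_is_supported_by_its_cited_source)
claim_labels = [
    ("bing_chat", True), ("bing_chat", False), ("bing_chat", True),
    ("gpt-4.5", False), ("gpt-4.5", True),
    ("perplexity_deep_research", False), ("perplexity_deep_research", False),
]

totals = defaultdict(lambda: [0, 0])  # tool -> [unsupported, total]
for tool, supported in claim_labels:
    totals[tool][0] += (not supported)  # count unsupported claims
    totals[tool][1] += 1                # count all claims

for tool, (unsupported, total) in sorted(totals.items()):
    print(f"{tool}: {unsupported / total:.1%} unsupported ({unsupported}/{total})")
```

The real study's scoring is more involved (eight criteria, automated judges checked against humans), but the reported percentages are ratios of this basic kind: unsupported claims over total claims.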
Here’s what you should know:
About one-third of AI answers lacked reliable sources
GPT-4.5 had 47% unsupported claims
Perplexity’s deep research tool had the worst result at 97.5% unsupported claims
Suspicious sources alert
Perplexity’s deep research tool did worst, with 97.5% of claims unsupported. Many tools also gave one-sided answers, especially on debated topics like energy.
Reactions to the study are mixed.
Some say it confirms concerns about bias and bad sourcing, while others argue the methods used to check reliability weren’t strong enough.
There were also doubts about how well the study's automated, AI-based judgments lined up with human ones.
Still, experts agree that AI systems need to improve accuracy, sourcing, and balance, especially as they move into higher-stakes areas.
One-sided answers are basically AI subtweeting.