“AI responses may include mistakes,” Google says at the bottom of its AI Overviews. How many? According to one startup’s testing, roughly 1 in 10 summaries is problematic. That’s “millions of erroneous answers every hour.” (The New York Times)
- The NYT asked Oumi to run Google's AI Overviews against the SimpleQA benchmark, testing 4,326 queries in October 2025 and again in February 2026. The second round of testing followed Google's upgrade from Gemini 2 to Gemini 3.
- Accuracy improved from 85 to 91 percent between the two testing periods. However, the share of correct answers that were "ungrounded" rose from 37 to 56 percent. "Ungrounded" answers are those linked to sources that don't actually support the claim being made.
- This means that even when AI Overviews’ answers are correct, they are increasingly difficult for users to verify.
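The reported percentages imply a rough count of correct-but-ungrounded answers in each testing round. A quick back-of-envelope sketch (the function and variable names here are illustrative, not from the study):

```python
def ungrounded_correct(total_queries, accuracy, ungrounded_share):
    """Estimate how many answers were correct but cited sources
    that don't actually support the claim ("ungrounded")."""
    correct = total_queries * accuracy
    return round(correct * ungrounded_share)

# October 2025 run: ~85% accurate, 37% of correct answers ungrounded
oct_2025 = ungrounded_correct(4326, 0.85, 0.37)

# February 2026 run: ~91% accurate, 56% of correct answers ungrounded
feb_2026 = ungrounded_correct(4326, 0.91, 0.56)
```

On these figures, the number of correct answers whose citations don't back them up grew substantially between the two rounds, even as overall accuracy improved.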