Tony Wright • January 15, 2026

The Chunking Myth: Why Mike King's RAG Optimization Theory Doesn't Survive Contact with Reality

Mike King recently published a rebuttal to Danny Sullivan's comments on content chunking for AI search. King argues that breaking content into carefully optimized "atomic passages" is essential for RAG (Retrieval-Augmented Generation) systems. He cites a 19.24% improvement in cosine similarity with his BubbaChunk tool as evidence.

The problem? Actual AI citation data tells a completely different story. I reviewed research analyzing 680+ million citations across ChatGPT, Google AI Overviews, and Perplexity, and the factors that actually drive AI visibility have almost nothing to do with content chunking.

1. What the Citation Data Actually Shows

Let's start with what AI platforms actually cite. According to Surfer's AI Tracker analysis of 36 million AI Overviews and 46 million citations (March-August 2025), the top-cited sources paint a clear picture:

YouTube dominates at 23.3% of all AI citations. Wikipedia follows at 18.4%. Reddit accounts for 21% of citations in Google AI Overviews specifically. These platforms aren't winning because of optimized chunk sizes. They're winning because they have massive brand authority, user-generated authenticity, and content freshness.

The data gets even more interesting when you look at citation overlap. Only 11% of domains are cited by both ChatGPT and Perplexity. Only 12% of URLs cited by ChatGPT, Perplexity, and Copilot rank in Google's top 10 results. 80% of LLM citations don't even rank in Google's top 100 for the original query.

Roughly 90% of the pages ChatGPT cites rank at position 21 or lower. This isn't a system rewarding carefully chunked content; it's a system operating on entirely different principles than traditional SEO.

2. Brand Authority Beats Content Engineering

The strongest predictor of AI visibility isn't content structure; it's brand strength. Research from Ahrefs analyzing 75,000 brands found that brand web mentions show a 0.664 correlation with AI Overview visibility, while brand search volume shows a 0.334 correlation with LLM citations.

Compare that to backlinks, which show a weak 0.218 correlation. Kevin Indig's research confirmed it directly: "Brand search volume is the biggest predictor for visibility in ChatGPT."

Brands in the top 25% for web mentions earn up to 10x more AI visibility than others. That's not a marginal improvement from chunking optimization—that's a 10x multiplier from brand authority.

The Princeton University GEO research found that optimization techniques (including structure and formatting) can increase LLM visibility by 30-40%. But here's the catch: that research also found that sites cited across 4+ platforms are 2.8x more likely to appear in ChatGPT responses. Cross-platform presence beats single-page optimization.

3. Page Speed Matters More Than You Think

Want a real competitive advantage? Focus on page speed. Research shows that pages with fast First Contentful Paint (under 0.4 seconds) average 6.7 citations, while slow pages (over 1.13 seconds) average only 2.1 citations. That's a 3x difference—far more impactful than the 19.24% cosine similarity improvement King cites for his chunking tool.

Perplexity specifically gives preferential treatment to mobile sites that load in under 2 seconds. TTFB (Time to First Byte) matters most for AI bots: they're making plain HTTP requests, and if your server doesn't respond quickly, they may time out and move on.
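If you want a rough read on your own TTFB, a minimal sketch in Python might look like the following. It assumes the requests library is installed, and the URL is a placeholder; real AI crawlers measure from their own infrastructure, so treat this as a sanity check rather than ground truth.

```python
import time
import requests  # pip install requests

def measure_ttfb(url: str, timeout: float = 5.0) -> float:
    """Seconds from request start until the first response byte arrives."""
    start = time.perf_counter()
    with requests.get(url, stream=True, timeout=timeout) as resp:
        # stream=True defers the body; pulling one byte blocks until
        # the server actually starts responding.
        next(resp.iter_content(chunk_size=1), b"")
        return time.perf_counter() - start

print(f"TTFB: {measure_ttfb('https://example.com'):.3f}s")  # placeholder URL
```

If that number is consistently high, fix the server before worrying about content structure.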

This isn't about tweaking content structure. It's about basic web performance that most sites already ignore.

4. Content Freshness Dominates

AI platforms cite content that's 25.7% fresher than content cited in traditional organic results. ChatGPT shows the strongest recency bias—76.4% of its most-cited pages were updated within the last 30 days.

Content updated within 30 days gets 3.2x more AI citations. 65% of AI bot traffic targets content published or updated within the last year. AI Overview content changes 70% of the time for the same query, and when it generates a new answer, 45.5% of citations get replaced.

This is a moving target. Static optimization strategies—including careful chunking—decay quickly when the citation landscape shifts 40-60% monthly.

5. The Context Window Problem

King's entire thesis rests on the idea that RAG systems need content chunked for optimal retrieval. But context windows are expanding faster than optimization strategies can keep up.

Llama 4 now supports 10 million tokens. Gemini 2.5 Pro handles 2 million tokens with over 99% retrieval accuracy. Magic.dev's LTM-2-Mini pushes 100 million tokens—equivalent to 10 million lines of code. Google's own documentation states that the "default place to start is now just putting all tokens into context window."

The "lost in the middle" problem that justified chunking strategies is being solved at the architecture level. Research from Stanford and Berkeley (Liu et al., 2024) documented the U-shaped performance curve where LLMs struggle with information in the middle of long contexts. But newer models like Gemini 2.5 Flash show near-perfect accuracy regardless of document position.

Context windows have been expanding roughly 30x per year since mid-2023, and the ability to use that input effectively is improving even faster: the input length at which top models reach 80% accuracy has risen more than 250x in the past nine months.

6. The Conflict of Interest Problem

King sells BubbaChunk as a commercial tool. His rebuttal to Sullivan is essentially marketing for that tool. The 19.24% improvement figure comes from his own testing, on his own tool, measuring his own success metric (cosine similarity in isolation).
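For context, cosine similarity measures only the angle between two embedding vectors: how directionally similar a query embedding and a chunk embedding are. Here is a minimal sketch, with random vectors standing in for real embeddings (which would come from an embedding model in practice):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
query = rng.normal(size=768)                       # stand-in query embedding
chunk_v1 = rng.normal(size=768)                    # original chunking
chunk_v2 = chunk_v1 + 0.05 * rng.normal(size=768)  # "optimized" re-chunking

print(f"query vs v1: {cosine_similarity(query, chunk_v1):+.4f}")
print(f"query vs v2: {cosine_similarity(query, chunk_v2):+.4f}")
```

A re-chunking can nudge this number upward without changing whether an AI platform ever crawls, trusts, or cites the page. The score is a retrieval proxy, not a citation guarantee.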

Cosine similarity improvements in a lab environment don't necessarily translate to production wins. Google's RAG research output has exploded, with over 1,200 papers in 2024 alone versus fewer than 100 in 2023, and the path from research to production involves what Google calls "massive pruning."

A 2024 study found RAG healthcare tools reduced diagnostic errors by only 15% versus traditional AI. A January 2025 study noted that RAG component influence on final outputs "remains underexplored." The gap between research claims and production reality is significant.

7. What Sullivan Actually Said

Sullivan's original point wasn't that content structure doesn't matter at all. He warned against an optimization arms race where publishers spend resources on incremental improvements while missing the bigger picture.

The data supports Sullivan's position. Brands cited in AI Overviews earn 35% more organic clicks. Organic CTR dropped 61% for queries where AI Overviews appear. Getting cited matters enormously—but the path to citations isn't through chunking optimization.

It's through brand building, content freshness, cross-platform presence, and technical performance. The top three correlations with AI visibility are all off-site factors: brand web mentions (0.664), brand anchors (0.527), and brand search volume (0.392).

8. The Practical Takeaway

If you have limited resources (and who doesn't), here's where the data says you should invest:

- Build brand authority across multiple platforms. Wikipedia, Reddit, LinkedIn, and YouTube mentions drive visibility far more than on-page optimization.
- Make your site fast. Sub-2-second load times and sub-0.4-second FCP give you a 3x citation advantage.
- Keep content fresh. Update high-value content monthly; AI platforms heavily favor recent information. (A quick staleness check is sketched below.)
- Establish entity presence on Wikidata and Wikipedia if you're notable enough. Presence across 4+ third-party platforms increases citation likelihood 2.8x.
- Create content with original data, statistics, and quotations. Princeton research found these elements boost visibility by 22-40%.
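For the freshness item, one quick way to surface stale pages is to scan your sitemap's lastmod dates. This is a minimal sketch, assuming a standard XML sitemap with <lastmod> entries; the sitemap URL is a placeholder.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from urllib.request import urlopen

SITEMAP = "https://example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ET.parse(urlopen(SITEMAP))
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for url in tree.getroot().findall("sm:url", NS):
    loc = url.findtext("sm:loc", namespaces=NS)
    lastmod = url.findtext("sm:lastmod", namespaces=NS)
    if not lastmod:
        continue
    # lastmod is usually ISO 8601: date-only, or with time and offset
    dt = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    if dt < cutoff:
        print("stale:", loc, lastmod)
```

Anything that prints is a candidate for a refresh pass.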

Notice what's not on that list? Obsessing over chunk sizes for RAG retrieval.

The Bottom Line

King's chunking thesis represents the kind of technical optimization that feels satisfying but misses where the actual leverage is. A 19.24% improvement in cosine similarity sounds impressive until you compare it to 10x visibility gains from brand authority, 3x gains from page speed, or 3.2x gains from content freshness.

The platforms winning AI visibility aren't winning because they optimized for RAG retrieval. Reddit threads, YouTube videos, and Wikipedia articles dominate citations despite being "poorly structured" by traditional SEO standards. They win because they have authority, authenticity, and freshness.

Sullivan was right to warn against an optimization arms race. The data shows the race King is running isn't the one that matters.
