The Chunk Spec: How to Structure Content So AI Cites It

AI answer engines lift the one passage that answers the question. Structure your page as self-contained, liftable chunks to get cited far more often; a before-and-after example shows the difference.

GetCited · 8 min read · Updated 21 June 2026

AI answer engines don't cite pages — they cite passages. When someone asks ChatGPT or Perplexity a question, the engine retrieves a handful of pages and lifts the specific span that answers it. So the unit that gets cited isn't your article; it's a 40–75 word chunk inside it. Structure your page as a series of self-contained, liftable chunks and you get cited far more often. This guide is the spec — and it's written to its own spec, so you can see the shape as you read it.

What is a content chunk?

A content chunk is a short, self-contained passage — roughly 40–75 words — that fully answers one question without needing the surrounding text. It names its subject, states the claim, and stands alone if lifted out of the page. Self-contained 40–75 word answer chunks are cited about 3.1× more often than longer passages (Kime.ai / Kopp, 2025), because the engine can quote the chunk verbatim instead of paraphrasing a 300-word block.
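The 40–75 word range can be checked mechanically. The sketch below is a minimal illustration, assuming a "word" is any whitespace-separated token; the function names `word_count` and `is_chunk_sized` are hypothetical, and the cited study may have counted words differently.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simple definition of a word)."""
    return len(text.split())


def is_chunk_sized(text: str, low: int = 40, high: int = 75) -> bool:
    """Return True if the passage falls inside the target range, bounds included."""
    return low <= word_count(text) <= high
```

Run it over the first paragraph under each heading; anything that returns False is a candidate for trimming or splitting.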

Why does chunking beat long prose?

Answer engines extract; they don't read for gist. Given a question, the model scores candidate passages on how directly and completely each answers it, then quotes the best one. A claim buried mid-paragraph, dependent on three sentences of preceding context, scores poorly because it can't be lifted cleanly. The same claim, stated as a self-contained chunk under a matching heading, is trivially liftable. Definition: liftability is the property of answering a question completely in isolation, with no pronoun or context dependency.
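One rough proxy for context dependency is the opening words of a chunk. The sketch below is a heuristic only, not how any answer engine scores passages: the opener list and the name `has_dependent_opener` are assumptions, and a chunk can pass this check and still depend on earlier text.

```python
import re

# Assumed list of openers that usually point back at earlier text.
DEPENDENT_OPENERS = (
    "this", "that", "it", "these", "those", "they",
    "as mentioned", "as above",
)


def has_dependent_opener(chunk: str) -> bool:
    """Return True if the chunk starts with a word or phrase that needs prior context."""
    start = chunk.strip().lower()
    # \b keeps "it" from matching "item" and "that" from matching "thatch".
    return any(
        re.match(rf"{re.escape(opener)}\b", start) for opener in DEPENDENT_OPENERS
    )
```

A flagged chunk is not automatically wrong; it is a prompt to check whether the subject is named somewhere in the first sentence.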

Where on the page should the answer go?

Put the answer first — the BLUF (Bottom Line Up Front) pattern. About 44% of AI citations come from the first third of the page (ALM Corp, 2025), so a buried answer is a wasted one. Open each page with a chunk that answers the primary question in the first 40–60 words, then expand. Do the same inside every section: lead with the direct answer, follow with the nuance. The engine reads top-down and rewards pages that resolve the query early.

How should I write the headings?

Shape every H2 as the question your reader would actually type, or the noun phrase that mirrors it. "How should I write the headings?" beats "Heading Strategy", because answer engines match the user's prompt against your headings before they ever read the body. A question-shaped heading tells the model exactly which prompt this chunk answers; the 40–75 word chunk beneath it then delivers the lift. Heading and chunk work as a matched pair: one declares the question, the other resolves it.
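A quick way to audit headings is to flag the ones that are not phrased as questions. This is a minimal sketch: the word list and the name `is_question_shaped` are assumptions, and it deliberately ignores the noun-phrase variant, which needs a human judgment call.

```python
# Assumed set of words that typically open a typed question.
QUESTION_WORDS = {
    "what", "why", "how", "where", "when", "which", "who",
    "should", "can", "does", "do", "is", "are",
}


def is_question_shaped(heading: str) -> bool:
    """Return True if the heading opens with a question word and ends with '?'."""
    text = heading.strip()
    words = text.lower().split()
    return bool(words) and text.endswith("?") and words[0] in QUESTION_WORDS
```

Headings that fail are worth a second look rather than an automatic rewrite, since a noun phrase that mirrors the query is also acceptable.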

What goes in each section?

Aim for one hard fact and one named concept per section. A specific statistic — ideally original or precisely sourced — gives the engine a concrete, quotable claim it can attribute to you. A named definition or framework gives it a concept anchor it can cite when explaining the topic. Quick rule: every section should contain at least one number a reader could repeat and one term a reader could look up. Sections that are all assertion and no anchor rarely get lifted.

Which schema markup helps citation?

Add three structured-data types. Article (with a real, recent dateModified) tells engines what the page is and when it changed. FAQPage marks your question-and-answer pairs as discrete Q&A units. It correlates with roughly a 2× citation lift (GetCited mechanic research, 2025), though the evidence is debated and markup on its own is not sufficient. Organization (with sameAs links to your profiles) resolves your brand as an entity. Schema doesn't replace good chunks; it labels them so machines parse them correctly.
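The three types can be generated as JSON-LD from data you already have. The sketch below uses standard Schema.org types and properties (`Article`, `FAQPage`, `Question`, `Answer`, `Organization`, `dateModified`, `mainEntity`, `acceptedAnswer`, `sameAs`); the function name `build_json_ld` and its parameters are hypothetical, and the URLs in the usage lines are placeholders.

```python
import json


def build_json_ld(headline, date_modified, faqs, org_name, org_url, same_as):
    """Build Article, FAQPage and Organization JSON-LD objects as plain dicts.

    faqs is a list of (question, answer) string pairs.
    """
    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "dateModified": date_modified,  # ISO 8601 date, e.g. "2026-06-21"
    }
    faq_page = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }
    organization = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": org_name,
        "url": org_url,
        "sameAs": list(same_as),  # profile URLs that identify the same entity
    }
    return [article, faq_page, organization]


# Placeholder values for illustration only.
blocks = build_json_ld(
    headline="The Chunk Spec: How to Structure Content So AI Cites It",
    date_modified="2026-06-21",
    faqs=[("What is a content chunk?", "A short, self-contained passage.")],
    org_name="Example Org",
    org_url="https://example.com",
    same_as=["https://example.com/profile"],
)
serialized = [json.dumps(block, indent=2) for block in blocks]
```

Each serialized string would go in its own `<script type="application/ld+json">` element in the page head. Validate the result in a structured-data tester before shipping.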

How many entities should a page mention?

Mention and link recognised entities generously — people, organisations, products, places, and standards that exist in knowledge graphs. Pages with 15+ recognised entities are about 4.8× more likely to be cited in AI Overviews (Wellows / iPullRank, 2025). Entities give the model grounding: each one is a node it can resolve and trust. A chunk that names "Perplexity, Wikidata, and Schema.org" is more citable than the same chunk written with vague nouns like "the tools" or "the standards".

How fresh does content need to be?

Fresh, and visibly so. Content updated within roughly the last 30 days gets about 3.2× more citations (Profound, 2025), so freshness is a structural signal, not a vanity field. Show a "last updated" date on the page and match it in your Article schema's dateModified. Revisit cited pages on a cadence — refresh the stat, re-confirm the claim, bump the date when the content genuinely changes. Stale pages decay out of the citation set even when the answer is still correct.
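Both freshness checks are simple date arithmetic. This is a minimal sketch assuming dates are ISO 8601 strings and that "fresh" means modified within the last 30 days; `is_fresh` and `dates_match` are hypothetical names, and the 30-day default is taken from the figure quoted above rather than from any engine's documented threshold.

```python
from datetime import date


def is_fresh(date_modified: str, today: date, max_age_days: int = 30) -> bool:
    """Return True if the page was modified within max_age_days of today."""
    modified = date.fromisoformat(date_modified[:10])
    age_days = (today - modified).days
    return 0 <= age_days <= max_age_days


def dates_match(visible_date: str, schema_date: str) -> bool:
    """Return True if the on-page date and the schema dateModified agree.

    Only the calendar date is compared, so a schema value that includes a
    time component (e.g. "2026-06-21T09:00:00+00:00") still matches.
    """
    return date.fromisoformat(visible_date[:10]) == date.fromisoformat(schema_date[:10])
```

Passing `today` in explicitly keeps the check deterministic, which makes it easy to run in a scheduled content audit.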

Before and after: a paragraph rewritten into a chunk

Here is the whole spec in one example. The "before" is how most pages are written — accurate, but unliftable.

Before (sprawling, 86 words, unliftable):

When we started looking into this, we realised there were a lot of moving parts. There's the question of how the content is laid out, and then separately there's the matter of how search and answer tools actually go about pulling things in, which turns out to be quite different from how a person reads. After a fair bit of testing across different setups, we came to the view that shorter, focused sections seemed to do better for us than the longer ones we'd used before.

That passage can't be lifted: it never names its subject, leans on vague references like "this" and "things", and hides the claim in the last clause. Here is the same point as a chunk.

After (self-contained, 52 words excluding the citation, liftable):

Self-contained passages of 40–75 words are cited by AI answer engines roughly 3.1× more often than longer paragraphs (Kime.ai / Kopp, 2025). The reason is mechanical: engines lift the exact span that answers a question, so a focused chunk that names its subject and states its claim outright is far easier to quote than buried prose.

The "after" names its subject, states the claim first, carries a sourced number, and stands alone — exactly what an answer engine needs to lift it.

The chunk spec checklist

| Element | Why it earns citations | Quick test |
|---|---|---|
| 40–75 word chunks | Liftable spans are cited ~3.1× more (Kime.ai / Kopp, 2025) | Can this passage be copied out and still make sense alone? |
| BLUF / answer-first | ~44% of citations come from the first third (ALM Corp, 2025) | Is the answer in the first 40–60 words of the page and each section? |
| Question-shaped H2s | Headings are matched against the user's prompt | Does each heading read like something a user would type? |
| One stat + one concept per section | Gives the engine a quotable number and a citable term | Is there a repeatable number and a lookup-able term in this section? |
| Article + FAQ + Organization schema | Labels content as Q&A and resolves the brand entity | Does the page validate in a structured-data tester? |
| 15+ recognised entities | 4.8× more likely to be cited in AI Overviews (Wellows / iPullRank, 2025) | Could you link 15+ named entities to a knowledge graph? |
| Visible freshness | ~30-day-fresh content cited ~3.2× more (Profound, 2025) | Does an updated date show on-page and match the schema? |
| No context dependency | Pronouns and "as above" break liftability | Does any chunk start with "this", "it", or "as mentioned"? |

The honest part

Chunking is necessary, not sufficient. Perfectly structured content still needs to live on a retrievable, trusted page, and the biggest citation lever — being placed on authority listicles that already rank — sits off your own site. But the spec above is the part fully in your control, most brands haven't done it, and it compounds with everything else. If you'd rather have the rewrite and the placement done for you and proven, that's what GetCited is.

Sources

Kime.ai / Kopp — content chunk-size citation analysis (40–75 words, 3.1×) (2025)
ALM Corp — citation position analysis (~44% from first third of page) (2025)
Wellows / iPullRank — entity density and AI Overview citation study (15+ entities, 4.8×) (2025)
Profound — content freshness citation lift study (~30 days, 3.2×) (2025)
