Build notes
How this was built.
A small AI-assisted experiment: turn Wikipedia’s own citation and source-maintenance tags into a searchable list of publishing opportunities for founders, marketers, researchers, and agents.
Credit
This was inspired by an Enso experiment. I used that spark to build a focused version around Wikipedia citation and source-maintenance signals.
The short version
I used the March 2026 English Wikipedia Kiwix snapshot, scanned it for existing citation, verifiability, source, original-research, and link-rot maintenance templates, then turned the matches into a small SQLite-backed search app.
The product does not invent gaps. It only shows places where Wikipedia already marks a page or passage as needing better sourcing.
What the pipeline did
- Downloaded the English Wikipedia no-image ZIM snapshot from March 2026.
- Scanned candidate pages for official maintenance templates such as citation needed, more reliable sources, original research, verifiability, and link-rot cleanup.
- Extracted the article title, Wikipedia URL, template type, nearby passage, section context, and supporting evidence.
- Filtered the output into a high-signal CSV and SQLite database.
- Built a simple site: a static front end served by a Node server, with public preview search, authenticated exports, API endpoints, and an MCP endpoint for agents.
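The scan-and-extract steps above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: it assumes each article has already been reduced to plain text in which inline maintenance tags appear as bracketed markers such as `[citation needed]`. The marker list, the `Opportunity` record, and `find_opportunities` are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass
from urllib.parse import quote

# Assumed subset of inline maintenance markers, mapped to a template type.
# The real pipeline's template list is not reproduced here.
MARKERS = {
    "citation needed": "citation_needed",
    "better source needed": "better_source_needed",
    "original research?": "original_research",
    "failed verification": "failed_verification",
    "dead link": "link_rot",
}

# Matches any known marker wrapped in square brackets, case-insensitively.
MARKER_RE = re.compile(
    r"\[(" + "|".join(re.escape(m) for m in MARKERS) + r")\]",
    re.IGNORECASE,
)


@dataclass
class Opportunity:
    title: str
    url: str
    template_type: str
    passage: str


def find_opportunities(title: str, text: str, context: int = 80) -> list[Opportunity]:
    """Return one record per marker, with up to `context` characters of preceding text."""
    url = "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))
    results = []
    for match in MARKER_RE.finditer(text):
        start = max(0, match.start() - context)
        passage = text[start:match.start()].strip()
        template_type = MARKERS[match.group(1).lower()]
        results.append(Opportunity(title, url, template_type, passage))
    return results


sample = "The bridge opened in 1931.[citation needed] It is the longest in the region.[dead link]"
for opp in find_opportunities("Example bridge", sample):
    print(opp.template_type, "|", opp.passage)
```

Because the function only reports markers that are literally present in the text, it cannot produce an opportunity that the article itself does not already flag.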
What came out
- 913,965 candidate Wikipedia pages checked.
- 6,487 citation/source opportunities extracted.
- 5,136 unique articles with a usable maintenance signal.
- 0 extraction errors in the final run.
- All figures come from the March 2026 English Wikipedia snapshot.
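For a sense of scale, the counts above work out to about 1.26 opportunities per flagged article, and about 0.56% of the candidate pages ended up with a usable signal. The snippet below only does arithmetic on the published figures; the variable names are illustrative.

```python
pages_checked = 913_965
opportunities = 6_487
articles = 5_136

# Average number of extracted opportunities per article that had any.
per_article = opportunities / articles

# Share of candidate pages that yielded at least one usable signal.
article_hit_rate = articles / pages_checked

print(f"{per_article:.2f} opportunities per flagged article")
print(f"{article_hit_rate:.2%} of candidate pages had a usable signal")
```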
How to read a result
A result is not a promise that Wikipedia will cite you. It is a signal that an existing article has a sourcing problem. The useful move is to publish something genuinely helpful: a dataset, benchmark, primary research page, expert explainer, source comparison, or documentation page that makes the article easier to verify.
Why there is a free cap
The public site shows two examples per query, enough for people to understand the idea without the preview doubling as a bulk export. Full access unlocks the complete CSV/JSON dataset, higher API search limits, and MCP tools.
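One way to picture the cap: the same search runs for everyone, and only the row limit depends on whether the caller is authenticated. Below is a minimal sketch using Python's built-in `sqlite3` module. The table name, columns, `FULL_LIMIT`, and the `search` function are assumptions for illustration; the actual app runs on a Node server, and its schema and paid limits are not documented here. Only the two-example preview comes from the text above.

```python
import sqlite3

PREVIEW_LIMIT = 2   # public preview: two examples per query
FULL_LIMIT = 100    # hypothetical limit for authenticated callers


def search(conn, query, authenticated=False):
    """Search titles and passages; cap the rows for unauthenticated callers."""
    limit = FULL_LIMIT if authenticated else PREVIEW_LIMIT
    pattern = f"%{query}%"  # note: LIKE wildcards inside `query` are not escaped
    return conn.execute(
        "SELECT title, template_type, passage FROM opportunities "
        "WHERE title LIKE ? OR passage LIKE ? "
        "ORDER BY title LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


# Tiny in-memory database with made-up rows, just to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities (title TEXT, template_type TEXT, passage TEXT)"
)
conn.executemany(
    "INSERT INTO opportunities VALUES (?, ?, ?)",
    [
        ("Example bridge A", "citation_needed", "Opened in 1931."),
        ("Example bridge B", "link_rot", "Official site is offline."),
        ("Example bridge C", "original_research", "Said to be the longest."),
        ("Example tunnel", "citation_needed", "Carries two lanes."),
    ],
)

print(len(search(conn, "bridge")))                      # preview call
print(len(search(conn, "bridge", authenticated=True)))  # full-access call
```

Keeping the limit in one place means the preview and the full-access paths cannot drift apart in what they match, only in how much they return.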
Built with AI, but grounded in source data
AI helped with coding, interface copy, deployment, and verification. The underlying opportunities come from the Wikipedia snapshot and the extracted maintenance tags, not from AI guessing what Wikipedia might need.