
How We Built Four AI Agents to Convert Legacy HTML Into a Strapi CMS

By Claus Villumsen
01 May, 2025
We had a mountain of legacy HTML that needed to become structured Strapi CMS content. Manual conversion would have taken months. We built four AI agents to do it instead. Here is how they work.
Every legacy modernization project eventually hits the same wall. The code is being replaced. The architecture is improving. And then someone points at the content and asks what happens to all of that.
In our case, "all of that" was a mountain of legacy HTML. Thousands of pages, inconsistently structured, carrying years of accumulated content in a format that no modern CMS wanted to ingest directly. Manual conversion was not realistic. It would have taken a team of developers months, introduced thousands of errors, and cost more than the rest of the project combined.
So we built something different. A crew of four specialized AI agents, each with a single job, working together to take raw HTML and produce fully structured Strapi CMS content. Here is how each one works and why the architecture is the way it is.
Why one agent is not enough
The first instinct is to build one agent and ask it to do everything. Understand the HTML, extract the content, map it to the CMS schema, handle navigation, manage the process. One prompt, one model, one output.
This approach fails on large volumes for the same reason that asking a single person to do every job in a project fails. The cognitive load becomes too high. The context window fills up with irrelevant information. The output quality drops as the task complexity increases.
The better approach is specialization. Give each agent one job, constrained inputs, and a well-defined output format. Build an orchestrator to manage the flow. The system becomes more reliable, easier to debug, and easier to improve incrementally.
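To make "constrained inputs and a well-defined output format" concrete, here is a minimal sketch in Python. It is an illustration, not the production code: the names `PageJob`, `PageResult`, and `orchestrate` are hypothetical, and each agent is reduced to a plain callable so the contracts between stages are easy to see.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PageJob:
    """Input contract for the page-building stage (hypothetical shape)."""
    url: str
    html: str


@dataclass(frozen=True)
class PageResult:
    """Output contract for the page-building stage (hypothetical shape)."""
    url: str
    entry: dict


def orchestrate(
    collect: Callable[[], list[PageJob]],
    build_page: Callable[[PageJob], PageResult],
    build_nav: Callable[[list[PageResult]], dict],
) -> dict:
    """Wire the stages together; each stage sees only its own input type."""
    jobs = collect()
    results = [build_page(job) for job in jobs]
    return {
        "pages": [result.entry for result in results],
        "navigation": build_nav(results),
    }
```

Because every stage is swappable behind its contract, one agent can be replaced or tested with a stub without touching the others.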
Agent A — The Orchestrator
Agent A runs the process. It maintains a reusable URL registry that tracks which pages have been processed, which are in progress, and which have failed. It dispatches work to the other agents, monitors their outputs, and handles retries when something goes wrong.
Without Agent A, the rest of the system would have no coherence. Pages would be processed in arbitrary order. Failures would be lost. The same URL might be processed multiple times. The Orchestrator is the reason the system can run reliably at scale.
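The registry idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the actual Orchestrator: `UrlRegistry`, `Status`, `run`, and the `max_attempts` retry budget are hypothetical names, and the real system presumably persists its state instead of keeping it in memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"


@dataclass
class UrlRegistry:
    """Hypothetical in-memory registry: one status and attempt count per URL."""
    max_attempts: int = 3
    status: dict = field(default_factory=dict)
    attempts: dict = field(default_factory=dict)

    def add(self, url: str) -> bool:
        """Register a URL once; return False if it is already known."""
        if url in self.status:
            return False
        self.status[url] = Status.PENDING
        self.attempts[url] = 0
        return True

    def claim(self) -> Optional[str]:
        """Hand out the next pending URL and mark it in progress."""
        for url, state in self.status.items():
            if state is Status.PENDING:
                self.status[url] = Status.IN_PROGRESS
                self.attempts[url] += 1
                return url
        return None

    def complete(self, url: str) -> None:
        self.status[url] = Status.DONE

    def fail(self, url: str) -> None:
        """Requeue the URL until its attempt budget is spent, then mark it failed."""
        if self.attempts[url] < self.max_attempts:
            self.status[url] = Status.PENDING
        else:
            self.status[url] = Status.FAILED


def run(registry: UrlRegistry, process: Callable[[str], None]) -> None:
    """Drain the registry, retrying failures up to the registry's budget."""
    while (url := registry.claim()) is not None:
        try:
            process(url)
        except Exception:
            registry.fail(url)
        else:
            registry.complete(url)
```

The `add` check is what prevents the same URL from being processed twice, and the explicit `FAILED` state is what keeps failures from being lost.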
Agent B — The Collector
Agent B finds pages. It is the scout, the crawler, the one responsible for discovering everything that needs to be processed and feeding it upstream to Agent A.
Its job sounds simple. In practice it is not. Legacy HTML sites are rarely consistent. Navigation structures vary. Some pages are linked from menus. Others exist only in sitemaps. Some are orphaned entirely. Agent B handles all of this, building a complete inventory before any transformation begins.
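As a rough sketch of how an inventory can combine menu links with sitemap entries, the Python below uses only the standard library. The function names and the returned dictionary are hypothetical, and it assumes the pages and the sitemap have already been fetched. Truly orphaned pages appear in neither source, so finding them would need a further input, such as a server file listing, which this sketch does not cover.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class LinkExtractor(HTMLParser):
    """Collects absolute, fragment-free targets of every <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.add(absolute)


def links_in_html(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def urls_in_sitemap(xml_text):
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


def build_inventory(pages, sitemap_xml, site_root):
    """pages maps an already-fetched URL to its HTML; external links are dropped."""
    linked = set()
    for url, html in pages.items():
        linked |= {u for u in links_in_html(html, url) if u.startswith(site_root)}
    listed = urls_in_sitemap(sitemap_xml)
    return {
        "linked": linked,
        "sitemap_only": listed - linked,  # reachable only through the sitemap
        "all": linked | listed,
    }
```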
Agent C — The Page Builder
Agent C does the heaviest work. It takes raw HTML from a single page and reshapes it into Strapi-friendly JSON. Pages, SEO data, components, APIs, the whole structure. It turns a mess of legacy markup into something clean, normalized, and ready to drop into a CMS.
The quality of Agent C's output determines the quality of everything downstream. We spent most of our engineering time here, training the agent on edge cases, handling inconsistent HTML structures, and building validation into the output format so that errors surface early rather than at import time.
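To illustrate what "validation built into the output format" can look like, here is a deliberately small, deterministic sketch in Python. It is not the agent itself, which handles far messier markup. The entry shape is an assumption: `slug`, `seo`, `blocks`, and the component name `content.rich-text` are hypothetical, since Strapi content types are defined per project. Only the `__component` key follows Strapi's dynamic-zone convention.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the title, meta description, first h1, and paragraph text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.heading = ""
        self.paragraphs = []
        self._current = None  # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag in ("title", "h1", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        elif tag == "h1" and not self.heading:
            self.heading = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._current = None


def build_page(html, slug):
    """Turn one page of HTML into a hypothetical Strapi-style entry."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return {
        "slug": slug,
        "title": parser.heading or parser.title,
        "seo": {"metaTitle": parser.title, "metaDescription": parser.description},
        "blocks": [
            {"__component": "content.rich-text", "body": text}
            for text in parser.paragraphs
        ],
    }


def validate_page(page):
    """Return a list of problems; an empty list means the entry can be imported."""
    errors = []
    if not page["slug"]:
        errors.append("slug is empty")
    if not page["title"]:
        errors.append("title is empty")
    if not page["seo"]["metaDescription"]:
        errors.append("seo.metaDescription is empty")
    if not page["blocks"]:
        errors.append("page has no content blocks")
    return errors
```

Running `validate_page` on every entry before import is one way to make errors surface early: a page with a missing title or description is flagged at build time instead of failing inside the CMS.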
Agent D — The Navigation Architect
Agent D handles the structural connective tissue that makes a website function. Navigation menus, footer structures, internal linking patterns. It builds the scaffolding that ties pages together into a coherent experience rather than an unconnected pile of content.
This is the piece most people forget when they think about content migration. Getting the pages right is necessary but not sufficient. A site where no page knows how to reach any other page is not a functioning website. Agent D is the reason the output is a site, not just a collection of documents.
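Two of the checks this implies can be sketched briefly in Python: deriving a menu tree from page paths, and confirming that every internal link points at a page that exists. Both functions are hypothetical illustrations, not the agent's actual logic, and they assume paths are normalized without trailing slashes.

```python
def build_nav_tree(pages):
    """pages maps a path such as "/products/widgets" to its page title."""
    root = {"title": pages.get("/", "Home"), "path": "/", "children": []}
    nodes = {"/": root}
    # Shallow paths first, so a parent always exists before its children.
    ordered = sorted((p for p in pages if p != "/"), key=lambda p: (p.count("/"), p))
    for path in ordered:
        node = {"title": pages[path], "path": path, "children": []}
        parent = path.rsplit("/", 1)[0] or "/"
        # If an intermediate level has no page, attach to the nearest ancestor.
        while parent not in nodes:
            parent = parent.rsplit("/", 1)[0] or "/"
        nodes[parent]["children"].append(node)
        nodes[path] = node
    return root


def find_broken_links(links, pages):
    """links maps a source path to its target paths; returns (source, target) pairs."""
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if target not in pages
    )
```

A page that ends up attached directly to the root despite a deep path, or any pair returned by `find_broken_links`, is a signal that the migrated structure needs a closer look before import.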
What the system produces
The four agents together take a legacy HTML site and produce a fully structured Strapi CMS import, ready for a developer to run. The output is consistent, validated, and documented. The pages are structured. The navigation is intact. The SEO data is preserved. The content is normalized.
What would have taken months of manual work happens in hours. Not because AI is magic. Because the problem was broken down into four well-defined jobs, each assigned to an agent with the right context and the right constraints to do that job reliably.
That is the principle behind everything we build at Kodebaze. Not one model trying to do everything. Many constrained agents, each doing one thing well, coordinated by a system that keeps the whole process moving forward.
The same multi-agent approach powers how Kodebaze analyzes and modernizes legacy codebases. One agent per job. One constraint per agent.
© 2026 Kodebaze. All Rights Reserved.