WordPress migration (F03) — webhouse.app Docs

Probe a live WordPress site, review its theme/builder/content, and import posts, pages, media and taxonomies into a fresh @webhouse/cms site.

What it does

Point at a public WordPress site, click through a 4-step wizard, and end up with a brand-new @webhouse/cms site containing the WP content as JSON documents plus downloaded media. Phase 1 handles the content + media side cleanly; theme/design extraction is Phase 2.

The migration lives at /admin/sites/new → WordPress tab.

What gets imported

Posts, pages, and custom post types
Media files (images, PDFs, downloads) — downloaded and stored under /uploads/
Featured images, wired to the imported post
Categories and tags
Excerpts, publish status, publish date
Authors (as text — not as a relation collection in Phase 1)

Not imported (Phase 2+):

Comments
Menus / navigation
ACF / custom fields
Gutenberg-only blocks that don't transform to clean HTML (kept as HTML but not structured)
Page-builder shortcodes (Divi, WPBakery) — they leak as raw [et_pb_...] text until Phase 2's HTML-scraping fallback lands

The wizard (4 steps)

1. Probe

Enter the WordPress site URL. The wizard calls the WP REST API (/wp-json/wp/v2/...) to detect:

Theme name and version
Page builder in use (Elementor, Divi, WPBakery, Gutenberg, Classic)
Content inventory: post counts per post type, media counts, taxonomy counts

The probe takes roughly 3–5 seconds. It works on any self-hosted WordPress site with the REST API enabled, which has been the default since WP 4.7 (about 90% of sites). Sites hosted on wordpress.com are not supported directly; the wizard can only probe a site whose REST API is reachable.
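As a rough illustration of the content-inventory part of the probe, the sketch below counts the items in one REST collection by requesting a single item and reading the X-WP-Total response header that WordPress sets on collection responses. The helper names (headerInt, countUrl, readTotals, countItems) are hypothetical, and the wizard's actual implementation may differ.

```typescript
// Sketch only: helper names are hypothetical, not wizard internals.

/** Read an integer response header, falling back to 0 when missing or malformed. */
function headerInt(headers: Headers, name: string): number {
  const value = Number.parseInt(headers.get(name) ?? "", 10);
  return Number.isNaN(value) ? 0 : value;
}

/** Build a URL that asks for a single item so the response body stays small. */
function countUrl(siteUrl: string, restBase: string): string {
  const base = siteUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/wp-json/wp/v2/${encodeURIComponent(restBase)}?per_page=1`;
}

/** WordPress reports collection size in X-WP-Total and X-WP-TotalPages. */
function readTotals(headers: Headers): { total: number; totalPages: number } {
  return {
    total: headerInt(headers, "X-WP-Total"),
    totalPages: headerInt(headers, "X-WP-TotalPages"),
  };
}

/** Network call: count the items in one collection, e.g. restBase = "posts". */
async function countItems(siteUrl: string, restBase: string): Promise<number> {
  const res = await fetch(countUrl(siteUrl, restBase));
  if (!res.ok) {
    throw new Error(`REST API returned ${res.status} for ${restBase}`);
  }
  return readTotals(res.headers).total;
}
```

Calling countItems once per collection (posts, pages, media, categories, tags) would yield the kind of inventory numbers shown in the Review step.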

2. Review

Review the detected metadata. No content preview yet — that's a Phase 2 addition. Decide whether to continue based on the inventory numbers.

3. Name

Give the new site an ID and a display name, and pick the organization to add it under. The wizard auto-generates cms.config.ts from the discovered post types: a WP custom post type called exhibitions becomes a CMS collection with the same name.
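The mapping from post types to collections could look something like the sketch below. It takes the object returned by the standard /wp-json/wp/v2/types endpoint (keyed by post type slug) and produces one draft entry per content type. The CollectionDraft shape, the draftCollections name, and the list of skipped internal types are all assumptions for illustration; this is not the actual cms.config.ts schema.

```typescript
// Sketch only: CollectionDraft is a hypothetical shape, not the @webhouse/cms schema.

/** Subset of the fields WordPress returns per post type from /wp-json/wp/v2/types. */
interface WpPostType {
  slug: string;
  name: string;
  rest_base: string;
}

/** Hypothetical intermediate record a config generator could work from. */
interface CollectionDraft {
  name: string; // collection name, same as the WP post type slug
  label: string; // human-readable name from WordPress
  restBase: string; // REST route segment used to fetch the items
}

// Assumption: internal post types that are not site content are skipped.
const SKIPPED_TYPES = new Set([
  "attachment",
  "nav_menu_item",
  "wp_block",
  "wp_template",
  "wp_template_part",
  "wp_navigation",
]);

function draftCollections(types: Record<string, WpPostType>): CollectionDraft[] {
  return Object.values(types)
    .filter((t) => !SKIPPED_TYPES.has(t.slug))
    .map((t) => ({ name: t.slug, label: t.name, restBase: t.rest_base }));
}
```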

4. Migrate

Spinner screen. The wizard:

Paginates the WP REST API (100 items per page, no delay between pages)
Downloads each media file, slugifies the filename (e.g. photo-a1b2.jpg), writes to public/uploads/
Rewrites <img> URLs in post content from wp-content/uploads/... to the new /uploads/... paths
Creates one JSON document per post/page in content/<collection>/<slug>.json
Writes the generated cms.config.ts with urlPrefix matching the original WP paths (so any redirects you set up can keep working 1:1)
Registers the site in the CMS registry under the chosen org
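The filename and URL-rewriting steps in the list above can be sketched as two small pure functions. This is a minimal illustration under stated assumptions: the function names are hypothetical, the slug rules are a guess (the source's example photo-a1b2.jpg suggests a suffix scheme that is not reproduced here), and only src attributes on <img> tags are rewritten.

```typescript
// Sketch only: names and slug rules are assumptions, not the wizard's code.

/** Turn "My Photo (1).JPG" into a URL-safe name, keeping the extension. */
function slugifyFilename(filename: string): string {
  const dot = filename.lastIndexOf(".");
  const stem = dot > 0 ? filename.slice(0, dot) : filename;
  const ext = dot > 0 ? filename.slice(dot).toLowerCase() : "";
  const slug = stem
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into single dashes
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return (slug || "file") + ext;
}

/**
 * Replace the src of every <img> whose URL appears in urlMap.
 * urlMap maps the original WordPress URL to the new /uploads/... path.
 * Images that are not in the map are left untouched.
 */
function rewriteImageUrls(html: string, urlMap: Map<string, string>): string {
  const imgSrc = /(<img\b[^>]*?\ssrc=)(["'])(.*?)\2/gi;
  return html.replace(
    imgSrc,
    (match: string, prefix: string, quote: string, src: string) => {
      const replacement = urlMap.get(src);
      return replacement ? `${prefix}${quote}${replacement}${quote}` : match;
    },
  );
}
```

A regex keeps the sketch dependency-free, but it does not touch srcset attributes or CSS background images; a production importer would more likely use an HTML parser.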

Duration: ~30 seconds for a small blog, up to 5 minutes for a site with hundreds of media files. No progress bar in Phase 1 — just the final "Open site in CMS" button.
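Most of that time goes into walking the REST API page by page. A minimal sketch of the pagination loop follows, assuming the page fetcher is injected so the loop can be exercised without a network. PageFetcher, fetchAllPages, and wpFetcher are hypothetical names; the per_page and page query parameters and the X-WP-TotalPages header are standard WordPress REST API features.

```typescript
// Sketch only: type and function names are hypothetical.

type PageFetcher<T> = (
  page: number,
  perPage: number,
) => Promise<{ items: T[]; totalPages: number }>;

/** Fetch pages 1..totalPages in order, with no delay between requests. */
async function fetchAllPages<T>(
  fetchPage: PageFetcher<T>,
  perPage = 100,
): Promise<T[]> {
  const all: T[] = [];
  let page = 1;
  let totalPages = 1;
  do {
    const result = await fetchPage(page, perPage);
    all.push(...result.items);
    totalPages = result.totalPages; // trust the most recent value from the server
    page += 1;
  } while (page <= totalPages);
  return all;
}

/** Fetcher for a real WordPress collection, e.g. restBase = "posts". */
function wpFetcher(siteUrl: string, restBase: string): PageFetcher<unknown> {
  const base = siteUrl.replace(/\/+$/, "");
  return async (page, perPage) => {
    const url = `${base}/wp-json/wp/v2/${restBase}?per_page=${perPage}&page=${page}`;
    const res = await fetch(url);
    if (!res.ok) {
      throw new Error(`WP REST API returned ${res.status} for ${url}`);
    }
    const pages = Number.parseInt(res.headers.get("X-WP-TotalPages") ?? "1", 10);
    return { items: (await res.json()) as unknown[], totalPages: pages || 1 };
  };
}
```

Usage would be along the lines of fetchAllPages(wpFetcher("https://example.com", "posts")), repeated for each collection found during the probe.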

Authentication

Public WP sites need no credentials. For private sites (e.g. wp-admin-protected), the wizard supports WordPress application passwords, sent as username:app-password via HTTP Basic Auth.
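For reference, an HTTP Basic Auth header is just the base64 encoding of username:app-password with a "Basic " prefix. The sketch below shows one way to build it; basicAuthHeader and fetchWithAppPassword are hypothetical helper names, not part of the wizard or of WordPress.

```typescript
// Sketch only: helper names are hypothetical.

/** Build the Authorization header value for a WordPress application password. */
function basicAuthHeader(username: string, appPassword: string): string {
  // btoa only accepts Latin-1, so encode to UTF-8 bytes first
  // to keep non-ASCII usernames working.
  const bytes = new TextEncoder().encode(`${username}:${appPassword}`);
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return `Basic ${btoa(binary)}`;
}

/** Network call: request a REST URL with the credentials attached. */
async function fetchWithAppPassword(
  url: string,
  username: string,
  appPassword: string,
): Promise<Response> {
  return fetch(url, {
    headers: { Authorization: basicAuthHeader(username, appPassword) },
  });
}
```

Basic Auth sends the credentials in a reversible encoding, so it should only be used against https:// URLs.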

What to check after migration

Broken shortcodes — if the source used Divi/WPBakery/Elementor, you'll see raw shortcode text in imported content. Either manually clean up or wait for Phase 2 HTML scraping.
Author links — authors come in as text. If you want relational authors, add a team collection and rewrite the author field as a relation.
Custom fields — ACF fields are dropped. Check the WP admin source for fields you need and re-add them as @webhouse/cms fields in cms.config.ts (re-exporting webhouse-schema.json if the site has non-TS consumers).
URL prefix — verify urlPrefix matches the original structure so your old URLs still resolve.
Images with text in them — the downloaded images are byte-identical copies, no alt text inferred. Run the media AI analysis to generate alt text in bulk.
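The broken-shortcode check in the list above can be partly automated. The sketch below scans imported HTML for leftover Divi (et_pb_) and WPBakery (vc_) shortcodes and returns the distinct shortcode names it finds. The function name and the idea of running it over each imported document's content are assumptions for illustration.

```typescript
// Sketch only: findBuilderShortcodes is a hypothetical helper.

/** Return the distinct Divi / WPBakery shortcode names found in the HTML, sorted. */
function findBuilderShortcodes(html: string): string[] {
  // Matches opening and closing tags such as [et_pb_row ...] or [/vc_row].
  const pattern = /\[\/?(et_pb_[a-z0-9_]+|vc_[a-z0-9_]+)\b[^\]]*\]/gi;
  const names = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    const name = m[1];
    if (name) names.add(name.toLowerCase());
  }
  return Array.from(names).sort();
}
```

Running this over the content of each imported JSON document gives a quick list of pages that still need manual cleanup.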

Phase 2+ roadmap

Design token extraction via Dembrandt (colors, fonts, spacing scale)
Tailwind config auto-generation from extracted tokens
HTML scraping fallback for page-builder sites (Divi, WPBakery)
WXR XML import (WP export file) as an offline alternative
Custom field mapping UI
Content preview before commit

Phase 1 is the safe baseline: it won't do anything unexpected, and apart from the rewritten image URLs, everything it does import is lossless against the WP REST API response.

Tags: Migration, Media, Schema
