Probe a live WordPress site, review its theme/builder/content, and import posts, pages, media and taxonomies into a fresh @webhouse/cms site.
What it does
Point the wizard at a public WordPress site, click through its four steps, and you end up with a brand-new @webhouse/cms site containing the WP content as JSON documents plus downloaded media. Phase 1 handles the content and media side cleanly; theme/design extraction is Phase 2.
The migration lives at /admin/sites/new → WordPress tab.
What gets imported
- Posts, pages, and custom post types
- Media files (images, PDFs, downloads), downloaded and stored under /uploads/
- Featured images, wired to the imported post
- Categories and tags
- Excerpts, publish status, publish date
- Authors (as text — not as a relation collection in Phase 1)
Not imported (Phase 2+):
- Comments
- Menus / navigation
- ACF / custom fields
- Gutenberg-only blocks that don't transform to clean HTML (kept as HTML but not structured)
- Page-builder shortcodes (Divi, WPBakery): they leak as raw [et_pb_...] text until Phase 2's HTML-scraping fallback lands
The wizard (4 steps)
1. Probe
Enter the WordPress site URL. The wizard calls the WP REST API (/wp-json/wp/v2/...) to detect:
- Theme name and version
- Page builder in use (Elementor, Divi, WPBakery, Gutenberg, Classic)
- Content inventory: post counts per post type, media counts, taxonomy counts
Takes ~3–5 seconds. Works on any self-hosted WordPress site with the REST API enabled (the default since WP 4.7, which covers about 90% of sites). Sites hosted on wordpress.com are not supported directly; the wizard needs the site's REST API to be accessible.
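As an illustration of the content-inventory part of the probe, the sketch below counts items per post type using two standard WordPress REST features: the /wp-json/wp/v2/types route and the X-WP-Total response header. The names countUrl, countItems, probeInventory and the injectable fetchFn parameter are assumptions made for this example, not the wizard's actual code. Theme and page-builder detection are not shown.

```typescript
// Minimal fetch shape so the sketch can be exercised with a stub (assumption).
type InventoryFetch = (url: string) => Promise<{
  ok: boolean;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}>;

// Subset of the fields returned by /wp-json/wp/v2/types for each post type.
interface WpTypeInfo {
  slug: string;
  rest_base: string;
}

// URL that asks for a single item, so only the count header matters.
function countUrl(siteUrl: string, restBase: string): string {
  const base = siteUrl.replace(/\/+$/, "");
  return `${base}/wp-json/wp/v2/${restBase}?per_page=1&_fields=id`;
}

// Reads the total item count from the X-WP-Total header.
async function countItems(
  siteUrl: string,
  restBase: string,
  fetchFn: InventoryFetch = fetch
): Promise<number> {
  const res = await fetchFn(countUrl(siteUrl, restBase));
  if (!res.ok) return 0; // e.g. a route that requires authentication
  return Number(res.headers.get("X-WP-Total") ?? 0);
}

// Lists the post types exposed over REST and counts the items in each.
async function probeInventory(
  siteUrl: string,
  fetchFn: InventoryFetch = fetch
): Promise<Record<string, number>> {
  const base = siteUrl.replace(/\/+$/, "");
  const res = await fetchFn(`${base}/wp-json/wp/v2/types`);
  if (!res.ok) throw new Error(`REST API not reachable at ${base}/wp-json/`);
  const types = (await res.json()) as Record<string, WpTypeInfo>;
  const inventory: Record<string, number> = {};
  for (const type of Object.values(types)) {
    inventory[type.slug] = await countItems(base, type.rest_base, fetchFn);
  }
  return inventory;
}
```

Requesting per_page=1 keeps each count request small, which fits the few-second probe time described above.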
2. Review
Review the detected metadata. No content preview yet — that's a Phase 2 addition. Decide whether to continue based on the inventory numbers.
3. Name
Give the new site an ID and a display name, and pick the organization to add it under. The wizard auto-generates cms.config.ts from the discovered post types: a WP custom post type called exhibitions becomes a CMS collection with the same name.
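To make the post-type-to-collection mapping concrete, here is a rough sketch. The CollectionSketch shape and both function names are hypothetical and are not the real @webhouse/cms config schema; only the collection name (taken from the post type) and the urlPrefix field come from this page. The prefix is derived from a sample item's link and slug fields, which WordPress REST responses include.

```typescript
// Fields used from a WP post type and from one sample item of that type.
interface WpPostType {
  slug: string; // e.g. "exhibitions"
  name: string; // e.g. "Exhibitions"
}
interface WpSampleItem {
  link: string; // full permalink
  slug: string; // last path segment of the permalink
}

// Hypothetical collection shape, for illustration only.
interface CollectionSketch {
  name: string;
  label: string;
  urlPrefix: string;
}

// Derives the path prefix by dropping the item's own slug from its permalink.
// Date-based permalinks (/2024/05/hello/) vary per item and would need
// separate handling; this sketch does not cover them.
function urlPrefixFromLink(link: string, slug: string): string {
  const segments = new URL(link).pathname.split("/").filter(Boolean);
  if (segments[segments.length - 1] === slug) segments.pop();
  return "/" + segments.join("/");
}

// Maps one WP post type to one collection with the same name.
function toCollection(type: WpPostType, sample: WpSampleItem): CollectionSketch {
  return {
    name: type.slug,
    label: type.name,
    urlPrefix: urlPrefixFromLink(sample.link, sample.slug),
  };
}

const exhibitions = toCollection(
  { slug: "exhibitions", name: "Exhibitions" },
  { link: "https://example.com/exhibitions/spring-show/", slug: "spring-show" }
);
console.log(exhibitions.urlPrefix); // "/exhibitions"
```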
4. Migrate
Spinner screen. The wizard:
- Paginates the WP REST API (100 items per page, no delay between pages)
- Downloads each media file, slugifies the filename (e.g. photo-a1b2.jpg), writes to public/uploads/
- Rewrites <img> URLs in post content from wp-content/uploads/... to the new /uploads/... paths
- Creates one JSON document per post/page in content/<collection>/<slug>.json
- Writes the generated cms.config.ts with urlPrefix matching the original WP paths (so any redirects you set up can keep working 1:1)
- Registers the site in the CMS registry under the chosen org
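The pagination step in the list above can be sketched as follows. It relies on the standard per_page and page query parameters and the X-WP-TotalPages response header of the WordPress REST API. The fetchAll name and the injectable fetchFn parameter are assumptions for this example, and the loop mirrors the "no delay between pages" behaviour described above.

```typescript
// Minimal fetch shape so the loop can be exercised with a stub (assumption).
type PageFetcher = (url: string) => Promise<{
  ok: boolean;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}>;

// Fetches every page of a collection endpoint, 100 items at a time.
async function fetchAll<T>(
  endpoint: string,
  fetchFn: PageFetcher = fetch
): Promise<T[]> {
  const items: T[] = [];
  const separator = endpoint.includes("?") ? "&" : "?";
  let page = 1;
  let totalPages = 1;
  do {
    const res = await fetchFn(`${endpoint}${separator}per_page=100&page=${page}`);
    if (!res.ok) throw new Error(`Request failed on page ${page}`);
    // WordPress reports the page count on every response.
    totalPages = Number(res.headers.get("X-WP-TotalPages") ?? 1);
    items.push(...((await res.json()) as T[]));
    page += 1;
  } while (page <= totalPages);
  return items;
}
```

A call such as fetchAll("https://example.com/wp-json/wp/v2/posts") would return all posts in one array. 100 is the largest per_page value the WordPress REST API accepts.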
Duration: ~30 seconds for a small blog, up to 5 minutes for a site with hundreds of media files. No progress bar in Phase 1 — just the final "Open site in CMS" button.
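The media steps of the migration (slugified filenames and rewritten <img> URLs) could look roughly like the sketch below. Both function names are hypothetical. How the wizard derives the short suffix seen in photo-a1b2.jpg is not documented here, so the sketch takes the suffix as a parameter. The rewrite uses a regular expression for brevity and only handles double-quoted src attributes, not srcset.

```typescript
// Lowercases, strips accents, and replaces unsafe characters with hyphens.
function slugifyFilename(filename: string, suffix = ""): string {
  const dot = filename.lastIndexOf(".");
  const stem = dot > 0 ? filename.slice(0, dot) : filename;
  const ext = dot > 0 ? filename.slice(dot).toLowerCase() : "";
  const slug = stem
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${slug || "file"}${suffix ? `-${suffix}` : ""}${ext}`;
}

// Replaces <img src="..."> values that appear in urlMap (old URL -> new path).
// URLs missing from the map are left untouched.
function rewriteImageUrls(html: string, urlMap: Map<string, string>): string {
  return html.replace(
    /(<img\b[^>]*?\ssrc=")([^"]+)(")/gi,
    (match: string, before: string, src: string, after: string) => {
      const replacement = urlMap.get(src);
      return replacement ? `${before}${replacement}${after}` : match;
    }
  );
}

const oldUrl = "https://old.example/wp-content/uploads/2024/05/My-Photo.jpg";
const newPath = `/uploads/${slugifyFilename("My-Photo.jpg", "a1b2")}`;
const rewritten = rewriteImageUrls(
  `<p><img class="wp-image-7" src="${oldUrl}" alt=""></p>`,
  new Map([[oldUrl, newPath]])
);
console.log(rewritten);
// <p><img class="wp-image-7" src="/uploads/my-photo-a1b2.jpg" alt=""></p>
```

Keeping an explicit old-to-new map means only URLs of files that were actually downloaded get rewritten.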
Authentication
Public WP sites need nothing. For private sites (e.g. wp-admin-protected), the wizard supports WordPress application passwords: username:app-password passed via HTTP Basic Auth.
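For reference, HTTP Basic Auth means sending an Authorization header whose value is "Basic " followed by the Base64 encoding of username:app-password. The sketch below shows one way to build that header. The basicAuthHeader and authedGet names are assumptions for illustration, not part of the wizard.

```typescript
// Builds the Authorization header value for HTTP Basic Auth.
// The credentials are UTF-8 encoded first so non-ASCII usernames survive btoa.
function basicAuthHeader(username: string, appPassword: string): string {
  const bytes = new TextEncoder().encode(`${username}:${appPassword}`);
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return `Basic ${btoa(binary)}`;
}

// Hypothetical helper: an authenticated GET against a protected REST route.
async function authedGet(url: string, username: string, appPassword: string): Promise<unknown> {
  const res = await fetch(url, {
    headers: { Authorization: basicAuthHeader(username, appPassword) },
  });
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return res.json();
}
```

Because Basic Auth credentials are only encoded, not encrypted, this is only safe when the WordPress site is served over HTTPS.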
What to check after migration
- Broken shortcodes: if the source used Divi/WPBakery/Elementor, you'll see raw shortcode text in imported content. Either manually clean up or wait for Phase 2 HTML scraping.
- Author links: authors come in as text. If you want relational authors, add a team collection and rewrite the author field as a relation.
- Custom fields: ACF fields are dropped. Check the WP admin source for fields you need and re-add them as @webhouse/cms fields in cms.config.ts (re-exporting webhouse-schema.json if the site has non-TS consumers).
- URL prefix: verify urlPrefix matches the original structure so your old URLs still resolve.
- Images with text in them: the downloaded images are byte-identical copies, no alt text inferred. Run the media AI analysis to generate alt text in bulk.
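The broken-shortcodes check in the list above can be partly automated. The sketch below scans a content string for leftover page-builder shortcodes. The function name is hypothetical, and the prefix list is an assumption: et_pb_ is the Divi prefix shown earlier on this page, and vc_ is the prefix WPBakery shortcodes commonly use. Extend the pattern for other builders as needed.

```typescript
// Returns the distinct shortcode names found in a piece of imported content.
// Matches opening and closing tags such as [et_pb_text ...] and [/vc_row].
function findLeftoverShortcodes(content: string): string[] {
  const pattern = /\[\/?((?:et_pb|vc)_[a-z0-9_]+)[^\]]*\]/gi;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    found.add(match[1].toLowerCase());
  }
  return Array.from(found).sort();
}

const sample =
  '[et_pb_section fb_built="1"][et_pb_text]Hello[/et_pb_text][/et_pb_section]' +
  "<p>[vc_row]Legacy row[/vc_row]</p>";
console.log(findLeftoverShortcodes(sample));
// [ 'et_pb_section', 'et_pb_text', 'vc_row' ]
```

Running a check like this over the body field of each imported JSON document gives a quick list of pages that need manual cleanup.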
Phase 2+ roadmap
- Design token extraction via Dembrandt (colors, fonts, spacing scale)
- Tailwind config auto-generation from extracted tokens
- HTML scraping fallback for page-builder sites (Divi, WPBakery)
- WXR XML import (WP export file) as an offline alternative
- Custom field mapping UI
- Content preview before commit
Phase 1 is the safe baseline — it won't do anything unexpected, and everything it does import is lossless against the WP REST API response.