Ideas for importing 700k HTML pages as "Pages"?

Hi All -

I have a large static HTML website, and I want to import the HTML files into a WordPress site.

I already have a theme, so I do not want to make a new theme.

I want the content imported as Pages.

Note: A plugin would be nice. However, I am a decent coder. I was considering using the API but am still unsure.

Any ideas?

Thank you!!!

I’d just parse the content and add it with wp_insert_post inside of an Acorn command.

wp acorn make:command PostImportCommand

Is the only source of content static HTML files? Ideally you’d be able to format it into JSON and use something like lazy-json to loop and insert.
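
To illustrate the "loop and insert" idea, here is a rough Python sketch of streaming records from a JSON Lines file in batches, so 700k pages never have to sit in memory at once. This is not lazy-json's API (that is a PHP package); it only shows the same streaming approach. The file name `pages.jsonl` and the batch size are assumptions for illustration.

```python
import json
from itertools import islice


def iter_records(lines):
    """Yield one decoded record per non-empty JSON Lines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched(records, size):
    """Group a record iterator into lists of at most `size` items."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical usage: "pages.jsonl" is an assumed file with one page per line.
# with open("pages.jsonl", encoding="utf-8") as fh:
#     for batch in batched(iter_records(fh), 500):
#         ...  # hand each batch to whatever performs the insert
```

Because both functions are generators, only one batch is held in memory at a time, which matters more than raw speed at this volume.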

If all you have is HTML, I’d scrape it with something like Scrapy (there are PHP options too, but Scrapy is very fast) to get the data properly formatted for importing.
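
As a rough idea of what "properly formatted" could look like, here is a minimal sketch that turns one static HTML file into a JSON record. It uses Python's standard-library `html.parser` as a stand-in rather than Scrapy, and the record fields (`source`, `title`, `content`) are assumptions, not a format WordPress requires.

```python
import json
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title> text and the visible text inside <body>."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_body and not self._skip_depth:
            text = data.strip()
            if text:
                self.body_parts.append(text)


def html_to_record(html, source_path):
    """Build one import record (hypothetical field names) from an HTML string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "source": source_path,
        "title": "".join(parser.title_parts).strip(),
        "content": "\n".join(parser.body_parts),
    }


def export_jsonl(html_paths, out_path):
    """Write one JSON record per input file to a JSON Lines file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in html_paths:
            with open(path, encoding="utf-8", errors="replace") as fh:
                record = html_to_record(fh.read(), str(path))
            out.write(json.dumps(record) + "\n")
```

Note that this keeps only the text and drops the markup. If the pages need their inner HTML preserved, or have a specific content wrapper, a selector-based tool like Scrapy is likely the better fit.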

ChatGPT would probably have no issue writing the Scrapy spider for you with some guidance.

Nice! Definitely going to start making use of Acorn commands! I always end up with some migration script that triggers via a single-use admin URL. WP-CLI/Acorn commands look like a much cleaner, saner way of handling utility scripts :pray:

You, my friend, are a man among the gods! Huge, massive thanks!!!
I’m speechless.

As a follow-up: it’s working great. I made a script in Vim to automate most of the process. It’s still a bit manual, simply because we have Topics, Sections, and Books+Authors.

Again, Thank you!