Configure database character set to utf-8 (utf8mb4)


Is it possible to change the selected the character set and collation for the database provisioning?

I searched in the config files to see if there was a parameter that was setting it but I didn’t found anything.

I guest I will be suppose to add a settings in the mariadb/defaults/main.yml but not sure which naming convention is ok for this playbook. I’m new in DevOps stack and with Ansible, I found this type of setting:
- ‘default-character-set = utf8’

But I don’t find the right way to add this kind of configuration.
Someone knows this?

Thank you

I’m not a pro on the topic, but I think that if the project you’re deploying (repo) is Bedrock-based, it should default to utf8mb4 for a WP site’s new DB tables, so you maybe wouldn’t need the setting in the mariadb defaults.

If you’re importing an existing DB after deploy, a DB that isn’t already utf8mb4, I think you’d need to manually update the DB, e.g.,

Thank you for your answer @fullyint, yes I use Bedrock, I was so focus on provision that I didn’t think to look at the WP db config file.

But that makes me think this differences between the 2 configurations do not respect the principle of keeping same exact environment all the way to the production phase.

This could cause problem if the dev environment use special chars that could be output in the wrong way on the remote part, am I wrong?

You mention “differences between the 2 configurations” but I think this DB_CHARSET would typically be the same for dev, staging, and production.

When you set up your project’s files locally, your site is presumably the same Bedrock project (repo) as you will use for dev, staging, and production, thus all having the same DB_CHARSET.

Is there some other config you’re thinking of that does differ between environments?

Note: The above assumes you are using the Trellis Vagrant VM for local dev, not something else like Valet or MAMP.

Yes I’m using Trellis with Vagrant, I said that because when I take a look in the database with MySQL WorkBench, it indicates latin1 set with swedish collation, the default of MySQL and MariaDB.

But I didn’t really search deeper to understand how it impact on development to have this kind of setup. I just experienced problem in the past with characters encoding so I try to avoid it more as possible.

Again, I’m not a pro on the topic, but if you see latin1 with swedish collation, I suspect the table or column was created in some non-Bedrock context (without the define('DB_CHARSET', 'utf8mb4');) or before Bedrock had switched to utf8mb4 (over 2 years ago, I think). Otherwise, perhaps a certain WP plugin created a table or column explicitly as latin1 with swedish collation.

If somehow the latin1 with swedish collation is a default for the DB generally, I suggest verifying individual table and column structures for their encoding and collation, which perhaps may be utf8mb4 where you want them to be.

Do you see the latin1 with swedish collation on a fresh test DB? To answer this, you could test letting WP create an empty DB instead of importing and examining a preexisting DB.

As for the default-character-set = utf8 config you mentioned, I assume it would belong in the Trellis my.cnf.j2 (must reprovision after adding). Adding that is adjusting Trellis core files, normally to be avoided. If you test and it proves useful, you could propose that Trellis build in a related option enabling users to simply define mysql_default_charset: utf8mb4 in group_vars/all/main.yml.

1 Like

There it is! I didn’t look at the tables individually, because as far I know it is better to set up the chars over all the DB but hey, I’m not an expert too, so maybe I missed something here about best practices, I will dig it.

You’re right, it’s the correct set on the tables, thank you to enlighted me @fullyint :raised_hands:

By the way it’s a recent installation from last week, I’m up to date.

Thank you for your time to help me. :+1:

1 Like