How to force certificate renewal?

I am used to running the certbot command from Let's Encrypt manually.

How do I handle that in the Trellis context? I think I may have got a wrong cert because of a misconfiguration earlier, and now Trellis uses it over and over again or something. Can I manually call the Trellis Ansible task for this and force it to renew?

I have an issue where https://nico.onl says it has the cert for https://stage.nico.onl. I have been messing around with this for hours now. I am about to file this as a bug, but I have a feeling I did something stupid again.

staging

wordpress_sites:
  nico.onl:
    site_hosts:
      - canonical: stage.nico.onl
    local_path: ../nico.onl
    repo: git@gitlab.com:nnico/nico.onl.git
    #repo_subtree_path: site
    branch: master
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: letsencrypt
    cache:
      enabled: true

production

wordpress_sites:
  nico.onl:
    site_hosts:
      - canonical: nico.onl
        redirects:
          - www.nico.onl
    local_path: ../nico.onl
    repo: git@gitlab.com:nnico/nico.onl.git
    #repo_subtree_path: site
    branch: master
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: letsencrypt
    cache:
      enabled: true

I have rebuilt my droplet several times and had it working at one point, when I went straight for production first with

ansible-playbook server.yml -e env=production
and then
ansible-playbook deploy.yml -e "site=nico.onl env=production"

But now I also wanted to deploy stage.nico.onl on the same server. I have everything set up and it's messed up again.

I looked in /etc/nginx/sites-available and also /var/www, and there are no separate configs or folders created for stage.nico.onl. Isn't there supposed to be a separate folder for the staging site?

There is a nico.onl.conf and it starts like this, so it looks like this is the config for the staging site.

server {
  listen [::]:443 ssl http2;
  listen 443 ssl http2;
  server_name stage.nico.onl;

  access_log   /srv/www/nico.onl/logs/access.log main;
  error_log    /srv/www/nico.onl/logs/error.log;

I do not even get how nico.onl gets served if the one and only config in there is for the stage subdomain.

Note to self: You fucking idiot, why do you not go with a normal host and stop wasting time with a VPS?

As far as I can tell, you are correct that attempting to have staging and production on the same server is creating the challenges. My recommendation would be to use a different server for each environment, and the issues you’ve mentioned should resolve. You’ll find various discussions on this Roots Discourse about whether to put staging and production on the same server vs. separate servers. In my opinion the intention of Trellis, and best practice generally, is separate servers.

hostname alias in hosts file. If you must use a single server for both staging and production, use an alias in your hosts file (see option 1). Discussion in that thread will demonstrate how doing this will clarify for Ansible which vars are for staging vs. for production.

I suspect that your SSL cert wasn’t recreated because – with both envs on the same server and with the same hostname in your hosts file – Ansible just always used the staging wordpress_sites even when you indicated -e env=production.

site key. If you must use a single server for both staging and production, use a different “site key” in your staging wordpress_sites dictionary:

 wordpress_sites:
-  nico.onl:
+  stage.nico.onl:
     site_hosts:
     ...

Trellis uses this site key in some directory paths and file names. If on a single server you duplicate this key for different sites or envs, the duplicate will overwrite the original. I think this accounts for why your Nginx conf nico.onl.conf was only for staging.
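To illustrate the overwrite, here is a minimal shell sketch. It assumes the conf file name is simply the site key plus `.conf` under /etc/nginx/sites-available. The `conf_path` helper is hypothetical and not part of Trellis.

```shell
#!/bin/sh
# Hypothetical helper (not part of Trellis): build the Nginx conf path
# for a given site key, assuming the pattern <site key>.conf.
conf_path() {
  printf '/etc/nginx/sites-available/%s.conf\n' "$1"
}

# Same key in staging and production: both envs resolve to one file,
# so whichever env is provisioned last overwrites the other.
conf_path "nico.onl"        # staging with the duplicated key
conf_path "nico.onl"        # production

# Distinct keys give each env its own file.
conf_path "stage.nico.onl"  # staging after renaming the key
```

The same reasoning would apply to any other path or file name that is built from the site key.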

Try making the change above, then run

# fix up staging
ansible-playbook server.yml -e env=staging --tags letsencrypt

# fix up production
ansible-playbook server.yml -e env=production --tags letsencrypt

After the change and the reprovision, you should see separate files nico.onl.conf and stage.nico.onl.conf, among other improvements.
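One way to check the result is to look at which cert each hostname actually serves. This is a rough sketch using the standard `openssl` CLI. The helper names `cert_subject` and `served_cert` are hypothetical, and it assumes `openssl` is installed on the machine you run it from.

```shell
#!/bin/sh
# Hypothetical helper: print the subject and expiry date of a PEM cert
# read from stdin.
cert_subject() {
  openssl x509 -noout -subject -enddate
}

# Hypothetical helper: fetch the cert a host serves on port 443.
# -servername sends SNI so Nginx picks the server block for that hostname.
served_cert() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null
}

# Usage (needs network access):
#   served_cert nico.onl | cert_subject
#   served_cert stage.nico.onl | cert_subject
```

If both hostnames still report the same subject, the production cert has probably not been recreated yet.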

force cert renewal. During provisioning (i.e., server.yml) Trellis should automatically renew the cert if the cert is more than 60 days old or if there is a change in relevant parameters like the site_hosts etc. I guess there isn’t a “force” renewal option (except maybe setting letsencrypt_min_renewal_age: 0), but if you SSH in to the server and mv the cert in /etc/nginx/ssl/letsencrypt to a different name, then run server.yml again, Trellis will see that the cert file appears to be missing and attempt to recreate it.
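A rough sketch of that rename step, to be run on the server. The `move_aside` helper is hypothetical, and the cert file name is left as a placeholder because the actual file names inside /etc/nginx/ssl/letsencrypt may differ, so list the directory first.

```shell
#!/bin/sh
# Hypothetical helper: rename a file out of the way, keeping a timestamped
# copy so it can be restored if the re-provision fails.
move_aside() {
  src="$1"
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  mv "$src" "$src.bak-$(date +%Y%m%d%H%M%S)"
}

# Usage on the server, as a user with write access to the directory
# (the file name is a placeholder, check with ls first):
#   ls /etc/nginx/ssl/letsencrypt
#   move_aside "/etc/nginx/ssl/letsencrypt/<cert file for the site>"
# Then, from the local Trellis directory:
#   ansible-playbook server.yml -e env=production --tags letsencrypt
```

Renaming rather than deleting keeps the old cert around in case you need to put it back.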

managed hosting vs. self-managed VPS. I’ve gained a little experience managing sites on my own VPS, but it has felt like a long road. When issues came up and I was under pressure, it was a road through hell. Other times this investment has paid off to where I have much more access and ability to control stuff than I would have had with a basic managed hosting plan. For me it’s interesting and worth it but I think there are many for whom the time and effort wouldn’t be worth it, given their context. From the little I’ve seen of your posts, it seems you certainly have the aptitude to make the VPS route work well. You’d have to decide whether the investment is worth it.

Thanks. What happened to the guy in the other thread is exactly what happened to me, I think: Ansible took configs for another environment and I wondered why. That raises the question: why is it not fixed, documented, or warned about? Why is it not in the default config files this way?

Funny enough, I actually had it set up this way (with the yml site keys), but then, due to some error and the fact that the examples do not use this, I thought the very top site key in the yml files needs to be the same across envs for Trellis to work correctly.

The docs really suck! Seriously, sorry, but there is so little info in there. I have a million questions popping up in my head when reading them, and this is not explained, like so many other things. Well, I know I should improve them. At first I thought Trellis was a dream come true, but now after 2 days of trial and error …

If I am not supposed to have staging and production on the same server, it should be mentioned there, and the examples should reflect that. Even the example GitHub repo for a site completely omits the staging config; it is not filled out at all. https://github.com/roots/roots-example-project.com/blob/master/trellis/group_vars/staging/wordpress_sites.yml

This would also include that I would need another vault entry for the site.

Renaming the site keys for staging seems really logical to me now.

As for VPS: I like tweaking. I use Linux as my desktop OS, I am a Linux guy, and I know how to configure a server to some extent, but sometimes I ask myself whether it is really worth it. I am a WP dev, not a server admin. And this is one of those times where it drives me crazy. I actually run a production server with the beta of the h2o webserver nobody uses. I am that crazy. But it's a half-baked server without half the stuff and pro knowledge behind it that seems to be in Trellis.

Now I have so much invested in this that I cannot stop. I really should sign up for Fastcomet or Siteground and be done with it and focus on my WP dev work.

I will now make these changes, rebuild my droplet once again and see what error comes up next …

I think the current status of the Trellis docs – and the presence or lack of validations generally – reflects the fact that so far Trellis is exclusively the product of volunteer work. Volunteers have donated hundreds, maybe thousands, of person hours to offer a tool that is free to use. That said, I wish it were more polished and complete.

Regarding the status of Trellis and its docs, I think its current status is a tool that offers great ideas and spares people a lot of labor. However, it doesn’t yet spare people having to understand some Ansible basics (like Ansible host groups) or just other dev tool basics. Some have posted threads indignant that the docs didn’t dedicate prime docs real estate at the beginning to offer OS-specific instruction on basic concepts that in my opinion should already be familiar to devs who have even minimal involvement with servers. (Edit: I removed unnecessary example topic.) It’s ok for anyone to not know those concepts. I didn’t know them before getting involved with Trellis. But at this point Trellis doesn’t try to provide documentation for all the basics of setting up a development environment.

It would be great if we were to create or link to materials for all such knowledge prerequisites, but it would take a lot of time to create an effective uncluttered presentation that serves novice and expert alike. At present, volunteers are just short on the time required to coordinate such an effort. I’d love to try it, but I’ve had to discipline myself to focus on more global structural issues and improvements that could have more far-reaching benefits, and which could necessitate a different set of docs instructions anyway. So, I personally have been taking the attitude of “I’ve gotta do the restructuring first, then I’ll know what docs to create.”

The issue you encountered of conflict between staging vs. production vars is a bigger issue than others, however, and shouldn’t be put off. I’m sad you encountered it. Nonetheless, to illustrate the volunteer situation, note that I proposed a validation for this specific issue 1.5 years ago but no volunteers have reviewed it (see one_env_group_only stuff in roots/trellis#562). Nor have any volunteers added anything to the docs on the issue (docs are open for PRs). I don’t fault anyone, and to be fair, I let that #562 validations PR go stale because it’s an example of something I expect will be revamped anyway in those “structural changes” I mentioned I’m exploring.

I don’t say any of this with bitterness, nor with criticism of anyone (e.g., volunteers). Every contribution from volunteers to the project and this forum is a gift. I don’t say any of this as an excuse. I’m only giving my best account of the situation. I agree the docs need help and some examples grow embarrassingly out-of-date. I think that’s just the status of our all-volunteer situation.

Right now the docs are fairly bare, suitable for simple examples and setup, but when issues or needed customizations arise, people will have to familiarize themselves with Ansible and the Trellis implementation. Trellis is pretty ideal for someone with knowledge of server issues, for someone who mostly already knows what needs to happen (server setup/configs) but just needs to learn how Ansible/Trellis automates it.

Figuring out what Trellis is doing is usually pretty straightforward given that the roles and tasks are just YAML, each with a straightforward name parameter like Create WordPress configuration for Nginx. In that example you can see how the Nginx conf file is created and its dest which uses the item.key from the Ansible-basic with_dict loop. At present, a user wishing to resolve some issues or make customizations to Trellis will need to examine the “code”, not just the docs. I think people don’t try doing so often enough. I think they don’t realize that Trellis tasks are easily examined; just a list of tasks that Trellis steps through in linear fashion.

You indicate that so far you’ve invested two days getting familiar with the tools (I realize it’s probably more). As it seems you will seek more than just the most basic use, I expect you will end up investing at least a few more days initially, then again on various occasions in the future. Given that expectation, you may decide that’s not an investment you want to make. Honestly, to me, it would be worthwhile to invest two weeks – or even two months – to get a handle on tools that could be a powerful and central part of your dev toolset.

Again, my impression is that you’re already proficient with the dev/server issues involved and thus are likely to profit more than most from investing in making Ansible part of your toolbox. You already know that learning tools takes time. But time is limited and I see it as perfectly respectable to prefer paid services to handle these particular matters so that you can “be done with it and focus on [your] WP dev work.”

Best wishes, and thanks for the candid feedback. This can serve as a reference for improvements. It’s valuable perspective that easily fades in memory of those who have been familiar with the tools and their quirks for some time.

roots/trellis#909 ensures vars from a user’s indicated env group take priority, preventing the possibility of getting staging vars when -e env=production.

roots/trellis#910 proposes to prevent the problem of duplicate site keys when multiple environments are loaded on a single server.
