Trellis provisioning breaks DNS in Docker

If anyone here is familiar with Trellis’s Ansible config and also with Docker networking…

Can you think of why running trellis up (and hence the vagrant/ansible provisioning) would break the way a Docker container does DNS name resolution?

(I’ve updated my Vagrantfile to use Docker for local development rather than VirtualBox, so that this works on Apple Silicon (M1), which VirtualBox doesn’t run on.)

Below you can see that after I run trellis up, I can SSH into the Docker container and show that name resolution isn’t working; then I add a DNS server, and it works:

trellis [main●] vagrant ssh
Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 5.10.104-linuxkit aarch64)
...
vagrant@webnext:~$ cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

vagrant@webnext:~$ curl -I http://ports.ubuntu.com/ubuntu-ports
curl: (6) Could not resolve host: ports.ubuntu.com

vagrant@webnext:~$ sudo vi /etc/resolv.conf #adding a public nameserver like 1.1.1.1
vagrant@webnext:~$ cat /etc/resolv.conf
nameserver 127.0.0.11
nameserver 1.1.1.1
options ndots:0

vagrant@webnext:~$ curl -I http://ports.ubuntu.com/ubuntu-ports
HTTP/1.1 301 Moved Permanently
Location: http://ports.ubuntu.com/ubuntu-ports/

For context, nameserver 127.0.0.11 is supposed to make the Docker container use Docker’s own internal DNS system, which just uses the host’s DNS resolver. When I checked, resolution seems to work if I comment out the firewall bits in dev.yml:

roles:
 ...
    - { role: ferm, tags: [ferm] }

So I think maybe it’s something to do with that (but that also seemed to leave other things not working, so I don’t think that’s a solution).
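
One way to test the firewall theory, as a rough sketch: as far as I understand it, Docker’s internal DNS at 127.0.0.11 depends on NAT rules that Docker adds inside the container’s network namespace, so if a firewall tool flushes those rules, lookups to 127.0.0.11 have nowhere to go. The helper below (has_docker_dns_nat is a name I made up) scans iptables -t nat -S output for a DNAT rule aimed at 127.0.0.11. It assumes iptables is installed in the container and that you are allowed to list the rules.

```shell
# has_docker_dns_nat: read nat-table rules (iptables -t nat -S format) on stdin
# and succeed if any DNAT rule mentions Docker's resolver address 127.0.0.11.
# Hypothetical helper name; the rule format is an assumption about iptables output.
has_docker_dns_nat() {
  grep -Eq '127\.0\.0\.11.*-j DNAT'
}

# Intended use inside the container, before and after provisioning:
#   sudo iptables -t nat -S | has_docker_dns_nat \
#     && echo "DNS NAT rules present" || echo "DNS NAT rules missing"
```

If the rules are present on a fresh container and gone after trellis up, that would point at the firewall step.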

I think the ferm firewall system must be the culprit here. But for now, one workaround could be to update the resolv.conf file. Here’s how I’m doing that:

  1. In trellis/dev.yml, add a new ‘dns’ role before the ‘common’ role:

  roles:
    - { role: dns, tags: [dns] }
    - { role: common, tags: [common] }

  2. Set up that role with a task that appends a nameserver to the resolv.conf file:

mkdir -p trellis/roles/dns/tasks
cat > trellis/roles/dns/tasks/main.yml <<'EOF'
---
- name: Add public nameserver to resolv.conf
  shell: echo "nameserver 1.1.1.1" >> /etc/resolv.conf
EOF

Now if we run vagrant up --provision, then vagrant ssh, then curl -I https://github.com, it should be able to resolve that name. Since I was doing this on a new box with lots of attempts, I used vagrant destroy and then trellis up to test it from a clean slate.
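
One caveat with the plain echo ... >> approach is that it appends another copy of the line on every provision. A minimal sketch of a guard, assuming a POSIX shell and grep (add_nameserver is a hypothetical helper name, not part of Trellis):

```shell
# add_nameserver FILE IP: append "nameserver IP" to FILE unless that exact
# line is already there. It appends in place (>>) rather than replacing the
# file, which is the same approach as the shell task above.
add_nameserver() {
  file=$1
  ip=$2
  # -x: match the whole line, -F: treat the dots literally, -q: no output
  grep -qxF "nameserver $ip" "$file" 2>/dev/null || echo "nameserver $ip" >> "$file"
}

# Intended use (as root inside the container):
#   add_nameserver /etc/resolv.conf 1.1.1.1
```

The same grep-or-append one-liner could go in the shell task itself, so that repeated provisioning leaves resolv.conf unchanged.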

Note: Docker apparently guards against changes to resolv.conf in the container (it should instead be done in the image), so Ansible’s lineinfile module, which internally copies a file over another file, does not work. Docker throws an error if you try something like this instead of using the shell command as I did above:

 - name: Add public nameserver to resolv.conf
   lineinfile:
     path: /etc/resolv.conf
     line: "nameserver 1.1.1.1"

^ does not work

I also tried this, but it didn’t change the file either. Ansible’s command module does not run its arguments through a shell, so the >> redirection is never interpreted:

- name: Add public nameserver to resolv.conf
  command: echo "nameserver 1.1.1.1" >> /etc/resolv.conf
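
To illustrate what I believe is going on there: when a command is executed directly instead of through a shell, >> is just another argument handed to echo. A small sketch of the same effect in plain shell (run_without_shell and the temp path are made up for the demonstration):

```shell
# run_without_shell: execute the given words directly as a command plus
# arguments. Nothing re-parses them, so ">>" stays an ordinary argument.
# Hypothetical helper, for illustration only.
run_without_shell() {
  "$@"
}

dir=$(mktemp -d)
target="$dir/resolv.conf"   # stand-in for /etc/resolv.conf; does not exist yet

# echo receives four arguments and prints them all, including the literal
# ">>" and the path. No file is created at $target.
run_without_shell echo "nameserver 1.1.1.1" ">>" "$target"
```

That matches what I saw: the task reports success, but the file is never touched.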

Note also that 1.1.1.1 should work universally; it is Cloudflare’s DNS:
https://www.cloudflare.com/en-gb/learning/dns/what-is-1.1.1.1/

Have you tried running trellis up with the ferm role commented out or removed? That should confirm or rule out this theory. Personally I don’t think it’s ferm itself.

Vagrant does a few things for networking purposes itself (though my memory is fuzzy on specifics), so this might just be the equivalent behaviour missing from the manual Docker setup.

Would you mind writing up a short description of your Trellis docker setup?

I have the exact same problem in a production server that has:

  • Trellis + Bedrock + Sage
  • and runs Docker with a different service

Yes, if I set ferm_enabled to false it works properly; otherwise it breaks DNS resolution.

You can easily replicate this by installing Docker on a machine and then running this command:

$ docker run busybox nslookup google.com

If ferm is enabled, the output should be:

;; connection timed out; no servers could be reached

With ferm disabled it should be something like:

Server:           X.X.X.X
Address:        X.X.X.X

Non-authoritative answer:
Name:   google.com
Address: 142.250.187.238

Non-authoritative answer:
Name:   google.com
Address: 2a00:1450:4009:822::200e

Unfortunately, I literally have to disable ferm. The fix from @techieshark, applied to server.yml, doesn’t work in my case.

Any clues? Is there a way to check the dropped packets? (I have been searching but can’t find anything.)