If anyone here is familiar with both Trellis’s Ansible config and Docker networking: can you think of why running `trellis up` (and hence the Vagrant/Ansible provisioning) would break the way a Docker container does DNS name resolution?
(I’ve updated my Vagrantfile to use Docker instead of VirtualBox for local development, so that this works on Apple Silicon M1 chips, which VirtualBox doesn’t run on.)
Below you can see that after running `trellis up`, I can SSH into the Docker container and show that DNS resolution isn’t working; then I add a DNS server, and it works:
```
trellis [main●] vagrant ssh
Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 5.10.104-linuxkit aarch64)
...
vagrant@webnext:~$ cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
vagrant@webnext:~$ curl -I http://ports.ubuntu.com/ubuntu-ports
curl: (6) Could not resolve host: ports.ubuntu.com
vagrant@webnext:~$ sudo vi /etc/resolv.conf #adding a public nameserver like 1.1.1.1
vagrant@webnext:~$ cat /etc/resolv.conf
nameserver 127.0.0.11
nameserver 1.1.1.1
options ndots:0
vagrant@webnext:~$ curl -I http://ports.ubuntu.com/ubuntu-ports
HTTP/1.1 301 Moved Permanently
Location: http://ports.ubuntu.com/ubuntu-ports/
```
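The session above comes down to which nameservers `/etc/resolv.conf` lists. As a minimal sketch (the function names are hypothetical, not part of Trellis or Docker), that check can be scripted so it is easy to rerun after each provision. It assumes `127.0.0.11` is Docker’s embedded DNS address and follows resolv.conf’s convention that comment lines start with `#` or `;`.

```python
from typing import Dict, List

# Address Docker's embedded DNS server listens on inside containers.
DOCKER_EMBEDDED_DNS = "127.0.0.11"


def parse_resolv_conf(text: str) -> Dict[str, List[str]]:
    """Collect nameserver addresses and options from resolv.conf-style text."""
    nameservers: List[str] = []
    options: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines (starting with '#' or ';') are ignored.
        if not line or line[0] in "#;":
            continue
        key, *values = line.split()
        if key == "nameserver" and values:
            nameservers.append(values[0])
        elif key == "options":
            options.extend(values)
    return {"nameservers": nameservers, "options": options}


def only_embedded_dns(text: str) -> bool:
    """True when Docker's embedded resolver is the sole configured nameserver."""
    return parse_resolv_conf(text)["nameservers"] == [DOCKER_EMBEDDED_DNS]
```

For example, `only_embedded_dns(open("/etc/resolv.conf").read())` returns `True` for the file as it was before the edit and `False` once `nameserver 1.1.1.1` has been added.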
For context, `nameserver 127.0.0.11` is supposed to make the container use Docker’s embedded DNS server, which forwards external lookups to the host’s DNS resolver. When I checked, this seems to work if I comment out the firewall bits in `dev.yml`:
```yaml
roles:
  ...
  - { role: ferm, tags: [ferm] }
```
So I suspect it has something to do with that (but removing it also seemed to leave other things not working, so I don’t think that’s a solution).
I think the `ferm` firewall must be the culprit here. For now, though, one workaround is to update the `resolv.conf` file. Here’s how I’m doing that:
- In `trellis/dev.yml`, add a new `dns` role before the `common` role:
```yaml
roles:
  - { role: dns, tags: [dns] }
  - { role: common, tags: [common] }
```
- Then set up that role with a task that appends a nameserver to `resolv.conf`:
```
mkdir -p trellis/roles/dns/tasks
cat > trellis/roles/dns/tasks/main.yml <<'EOF'
---
- name: Add public nameserver to resolv.conf
  shell: echo "nameserver 1.1.1.1" >> /etc/resolv.conf
EOF
```
Now if we run `vagrant up --provision` (or …), then `vagrant ssh`, then `curl -I https://github.com`, it should be able to resolve that name. Because I was trying this on a new box over many attempts, I used `vagrant destroy` and then `trellis up` to test it from a clean slate.
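curl fails at the name-lookup step (`curl: (6) Could not resolve host`) before it sends any HTTP request. A minimal sketch of just that step, using the system resolver through `socket.getaddrinfo`, which on a typical glibc system consults `/etc/hosts` and `/etc/resolv.conf` much as a standard curl build does; `can_resolve` is a hypothetical helper name.

```python
import socket


def can_resolve(host: str) -> bool:
    """Return True if the system resolver can turn host into an address.

    This is the step curl reports as "(6) Could not resolve host".
    """
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return True
```

Inside the container, `can_resolve("github.com")` should flip from `False` to `True` once the extra nameserver is in place.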
Note: Docker apparently guards against changes to `resolv.conf` inside the container (they’re supposed to be made in the image instead), so Ansible’s `lineinfile` module, which internally copies a file over the original, does not work. Docker throws an error if you try something like this instead of using the `shell` command as I did above:
```yaml
- name: Add public nameserver to resolv.conf
  lineinfile:
    path: /etc/resolv.conf
    line: "nameserver 1.1.1.1"
```

^ This does not work.
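Why `lineinfile` fails here, as far as I understand it (worth double-checking): Docker bind-mounts `/etc/resolv.conf` into the container, and Ansible writes files atomically by creating a temporary file and renaming it over the target, which a bind-mounted file rejects. Shell `>>` works because it writes into the existing file. Ansible’s file-writing modules, `lineinfile` included, document an `unsafe_writes` option aimed at this kind of situation, which may be worth trying. Separately, the `echo >>` task can append a duplicate line each time provisioning runs. The Python sketch here (a hypothetical helper, not an Ansible API) shows the behaviour you would want from either approach: write in place, and skip the write when the line already exists.

```python
import os


def ensure_line_in_place(path: str, line: str) -> bool:
    """Append `line` to `path` unless it is already present.

    Returns True if the file was changed. The existing file is opened and
    written in place (same inode), rather than writing a temporary file and
    renaming it over the original, which fails on a bind-mounted file.
    """
    with open(path, "r+", encoding="utf-8") as f:
        content = f.read()
        if line in content.splitlines():
            return False  # already present: re-running is a no-op
        # Start on a new line if the file doesn't end with a newline.
        prefix = "" if not content or content.endswith("\n") else "\n"
        f.seek(0, os.SEEK_END)
        f.write(prefix + line + "\n")
    return True
```

Calling it twice with `"nameserver 1.1.1.1"` changes the file once and leaves it alone the second time.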
I also tried this, but it didn’t change the file either, probably because Ansible’s `command` module doesn’t run through a shell, so the `>>` redirection is never interpreted:

```yaml
- name: Add public nameserver to resolv.conf
  command: echo "nameserver 1.1.1.1" >> /etc/resolv.conf
```
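The difference between the `command` and `shell` tasks can be reproduced outside Ansible. Ansible’s documentation states that `command` does not process its arguments through a shell, so operators like `>` and `|` don’t work, whereas `shell` runs the command through `/bin/sh`. Python’s `subprocess.run` shows the same split: with an argument list, `>>` reaches `echo` as literal text; with `shell=True`, the shell performs the redirection. This is an analogy for illustration, not how Ansible is implemented internally, and `run_without_shell` / `run_with_shell` are hypothetical names.

```python
import shlex
import subprocess

LINE = "nameserver 1.1.1.1"


def run_without_shell(path: str) -> str:
    """Like Ansible's `command` module: no shell, so '>>' is just an argument.

    echo prints '>>' and the path to stdout; the file is not touched.
    """
    result = subprocess.run(
        ["echo", LINE, ">>", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def run_with_shell(path: str) -> None:
    """Like Ansible's `shell` module: /bin/sh performs the '>>' redirection."""
    cmd = f"echo {shlex.quote(LINE)} >> {shlex.quote(path)}"
    subprocess.run(cmd, shell=True, check=True)
```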
Note also that 1.1.1.1 should work universally: it is Cloudflare’s public DNS resolver.
https://www.cloudflare.com/en-gb/learning/dns/what-is-1.1.1.1/
Have you tried running `trellis up` with the `ferm` role commented out or removed? That should confirm or rule out this theory. Personally, I don’t think it’s ferm itself.
Vagrant does a few networking-related things itself (though my memory is fuzzy on the specifics), so this might just be equivalent behaviour that’s missing from the manual Docker setup.
Would you mind writing up a short description of your Trellis docker setup?
I have the exact same problem on a production server that has:
- Trellis + Bedrock + Sage
- Docker, running a different service
Yes: if I set `ferm_enabled` to `false` it works properly; otherwise DNS resolution breaks.
You can easily replicate this by installing Docker on a machine and then running this command:

```
$ docker run busybox nslookup google.com
```
If ferm is enabled, the output should be:

```
;; connection timed out; no servers could be reached
```

With ferm disabled, it should be something like:

```
Server: X.X.X.X
Address: X.X.X.X

Non-authoritative answer:
Name: google.com
Address: 142.250.187.238

Non-authoritative answer:
Name: google.com
Address: 2a00:1450:4009:822::200e
```
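`nslookup` sends a UDP packet to port 53 of a nameserver and waits; `connection timed out; no servers could be reached` means no reply arrived at all, which points at packets being filtered rather than a name failing to resolve. A minimal Python sketch of that exchange (hypothetical helper names; an A-record query over IPv4 UDP only, with no TCP fallback or retries) lets you aim at a specific server, for example `127.0.0.11` inside a container or the host’s upstream resolver, and compare results with ferm enabled and disabled. The packet layout follows RFC 1035.

```python
import random
import socket
import struct


def build_query(name: str, txid: int, qtype: int = 1) -> bytes:
    """Encode a standard recursive DNS query (RFC 1035) for `name`.

    qtype 1 is an A record; the query class is IN (1).
    """
    # Header: ID, flags (only RD, "recursion desired"), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)


def query(server: str, name: str, timeout: float = 2.0) -> str:
    """Send one UDP query to server:53 and summarize what came back."""
    txid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, txid), (server, 53))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timed out (roughly nslookup's 'no servers could be reached')"
        except OSError as exc:
            return f"network error: {exc}"
    if len(data) < 12:
        return "malformed reply"
    rid, flags, _qdcount, ancount = struct.unpack(">HHHH", data[:8])
    if rid != txid:
        return "reply ID does not match query"
    return f"reply: rcode={flags & 0x000F}, answers={ancount}"
```

For example, `print(query("127.0.0.11", "google.com"))` from inside a container, run once with ferm enabled and once with it disabled.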
Unfortunately, I have to disable ferm completely. The fix from @techieshark doesn’t work in my case (I applied it to `server.yml`).
Any clues? Is there a way to check which packets are being dropped? (I’ve been searching but can’t find anything.)
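On the dropped-packets question: iptables keeps packet and byte counters for every rule and for each built-in chain’s default policy, shown by `iptables -L -v -n` or `iptables-save -c`. A minimal Python sketch (the `summarize` helper is hypothetical; it assumes the standard `iptables-save -c` text format and won’t see rules managed only through nftables) lists which chains default to DROP, which DROP/REJECT rules have matched packets, and whether Docker’s `DOCKER*` chains are present. Saving the output with ferm enabled and disabled, then comparing the two summaries, may show where the DNS traffic goes.

```python
import re
from typing import Dict, List

# ":CHAIN POLICY [packets:bytes]"; user-defined chains show "-" as the policy.
CHAIN_RE = re.compile(r"^:(\S+)\s+(\S+)\s+\[(\d+):(\d+)\]$")
# "[packets:bytes] -A CHAIN <rest of rule>", as printed by `iptables-save -c`.
RULE_RE = re.compile(r"^\[(\d+):(\d+)\]\s+-A\s+(\S+)\s*(.*)$")
# A rule whose jump target drops or rejects the packet.
TARGET_RE = re.compile(r"(?:^|\s)-j\s+(DROP|REJECT)\b")


def summarize(dump: str) -> Dict[str, list]:
    """Summarize dropping policies, dropping rules and Docker chains."""
    table = ""
    dropping_policies = []  # (table, chain, packets hit by the policy)
    dropping_rules = []  # (table, chain, packets matched, rule text)
    chains: List[str] = []
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("*"):  # "*filter", "*nat", ... starts a table
            table = line[1:]
            continue
        chain_match = CHAIN_RE.match(line)
        if chain_match:
            chain, policy, packets, _ = chain_match.groups()
            chains.append(f"{table}:{chain}")
            if policy == "DROP":
                dropping_policies.append((table, chain, int(packets)))
            continue
        rule_match = RULE_RE.match(line)
        if rule_match and TARGET_RE.search(rule_match.group(4)):
            packets, _, chain, rule = rule_match.groups()
            dropping_rules.append((table, chain, int(packets), rule))
    return {
        "dropping_policies": dropping_policies,
        "dropping_rules": dropping_rules,
        "docker_chains": [c for c in chains if c.split(":", 1)[1].startswith("DOCKER")],
    }
```

Usage: `sudo iptables-save -c > rules.txt`, then `summarize(open("rules.txt").read())`. Rules or policies whose packet counts climb while `nslookup` times out are the likeliest place the queries are being dropped.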