Troubleshooting www.cloudflare.com resolving to 127.0.0.1 on my machine
Recently, whenever I visited www.cloudflare.com, the domain resolved to 127.0.0.1. My first thought was that it had been hijacked:
    $ ping www.cloudflare.com
    PING www.cloudflare.com (127.0.0.1) 56(84) bytes of data.
    64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.090 ms
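I checked /etc/hosts and found no manually added entry. I then captured DNS traffic with Wireshark and, surprisingly, saw no query for www.cloudflare.com at all. Since no query left the machine, the answer had to come from a local DNS cache.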
To verify this, trace the system calls with strace:
    $ strace -f -o /tmp/cloudflare curl www.cloudflare.com
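The important part here is the -f option, which makes strace follow child processes as well. Some problems only show up in a child process, and without -f the key information would be lost.
Since this is a network connection, the only system call worth looking at in the trace is connect, so grep for it: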
    $ fgrep connect /tmp/cloudflare
    932422 connect(7, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
    932422 connect(7, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
    932422 connect(7, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 30) = 0
    932422 connect(7, {sa_family=AF_INET6, sin6_port=htons(80), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2606:4700::6810:7c60", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
    932422 connect(7, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
    932422 connect(7, {sa_family=AF_INET6, sin6_port=htons(80), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2606:4700::6810:7b60", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
    932422 connect(7, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
    932422 connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
    932421 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
    932421 connect(5, {sa_family=AF_INET6, sin6_port=htons(80), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2606:4700::6810:7c60", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
    932421 connect(5, {sa_family=AF_INET6, sin6_port=htons(80), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2606:4700::6810:7b60", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
After dropping the IPv6 entries and the first two lines that fail with a missing file, only these calls matter:
    932422 connect(7, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 30) = 0
    932422 connect(7, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
    932422 connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
    932421 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
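The manual filtering above can also be scripted. The following is a minimal sketch, not part of the original investigation: the function name ipv4_connect_targets is hypothetical, and it assumes the log has the strace line format shown in this post.

```shell
# Hypothetical helper: print the distinct IPv4 addresses that appear in
# connect() calls of an strace log read from stdin.
# Matching on "sa_family=AF_INET," (with the trailing comma) skips the
# AF_INET6, AF_UNIX and AF_UNSPEC entries.
ipv4_connect_targets() {
    sed -n '/connect(.*sa_family=AF_INET,/ s/.*inet_addr("\([0-9.]*\)").*/\1/p' \
        | sort -u
}
```

It would be used as `ipv4_connect_targets < /tmp/cloudflare`. For the trace in this post it should print only 127.0.0.1.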
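In other words, 127.0.0.1 comes back after the process talks to /run/dbus/system_bus_socket over a UNIX socket, which points at a systemd service. While at it, I also checked the DNS server configured in /etc/resolv.conf:
    nameserver 127.0.0.53
DNS points to the local address 127.0.0.53, and the comment in the file spells it out: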
    # This is a dynamic resolv.conf file for connecting local clients to the
    # internal DNS stub resolver of systemd-resolved. This file lists all
    # configured search domains.
    #
    # Run "resolvectl status" to see details about the uplink DNS servers
    # currently in use.
The comment makes it clear that local DNS resolution depends on the systemd-resolved service. Checking which process holds port 53 with lsof confirms it:
    $ sudo lsof -i :53
    COMMAND     PID USER            FD  TYPE DEVICE SIZE/OFF NODE NAME
    systemd-r 10549 systemd-resolve 16u IPv4  94859      0t0  UDP localhost:domain
    systemd-r 10549 systemd-resolve 17u IPv4  94860      0t0  TCP localhost:domain (LISTEN)
Next, use resolvectl to see what www.cloudflare.com resolves to:
    $ resolvectl query www.cloudflare.com
    www.cloudflare.com: 127.0.0.1 -- link: enp0s31f6
                        ::1 -- link: enp0s31f6
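At this point it is clear that some upstream DNS server resolved www.cloudflare.com to 127.0.0.1, and that answer was then cached locally. To find out whether this happens at the office or at home, I captured packets in both places. The problem turned out to be the default DNS server that China Telecom assigns to my home router. Here is the capture: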
    15 1.821116103 192.168.1.7 61.139.2.69 DNS 101 Standard query 0x6fdd A www.cloudflare.com OPT
    19 2.216707191 61.139.2.69 192.168.1.7 DNS 94  Standard query response 0x6fdd A www.cloudflare.com A 127.0.0.1
To double-check, query that DNS server directly with dig:
    $ dig www.cloudflare.com @61.139.2.69

    ; <<>> DiG 9.11.33-RedHat-9.11.33-1.fc33 <<>> www.cloudflare.com @61.139.2.69
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36311
    ;; flags: qr; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;www.cloudflare.com.        IN  A

    ;; ANSWER SECTION:
    www.cloudflare.com. 3600    IN  A   127.0.0.1
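The same dig check can be repeated against several resolvers to see which one hands out the loopback address. This is only a sketch under assumptions: the function names is_loopback_answer and compare_resolvers are hypothetical, and the extra resolvers in the usage note (1.1.1.1 and 8.8.8.8, two well-known public resolvers) were not part of the original investigation.

```shell
# Hypothetical helper: succeed if the given answer is a loopback address
# (anything in 127.0.0.0/8, or the IPv6 loopback ::1).
is_loopback_answer() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Hypothetical helper: query the A record of a name on each given resolver
# and flag loopback answers.
# Usage: compare_resolvers NAME SERVER [SERVER...]
compare_resolvers() {
    name=$1
    shift
    for server in "$@"; do
        # "dig +short" prints one answer per line (addresses or CNAME targets)
        dig +short "$name" A "@$server" | while read -r answer; do
            if is_loopback_answer "$answer"; then
                echo "$server -> $answer (loopback, suspicious)"
            else
                echo "$server -> $answer"
            fi
        done
    done
}
```

For example, `compare_resolvers www.cloudflare.com 61.139.2.69 1.1.1.1 8.8.8.8` would put the ISP resolver's answer next to the others. A public name resolving to a loopback address on only one resolver suggests that resolver is the one to look at.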
So the problem really is in China Telecom's DNS server. Changing the DNS server on the router fixed it.
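One caveat worth keeping in mind: the bad record carried a TTL of 3600 seconds, so systemd-resolved may keep serving the cached 127.0.0.1 for a while after the router is reconfigured. The sketch below flushes the cache and re-checks the name through the normal NSS lookup path. It was not part of the original fix; the function name recheck_after_dns_change is hypothetical, and it assumes sudo, resolvectl and getent are available.

```shell
# Hypothetical helper: flush the systemd-resolved cache, then print the
# distinct IPv4 addresses the system resolver now returns for NAME.
# Returns 1 if any of them is still in 127.0.0.0/8, 2 if the flush failed.
recheck_after_dns_change() {
    name=$1
    sudo resolvectl flush-caches || return 2
    # "getent ahostsv4" prints one line per address and socket type;
    # the first column is the address.
    addrs=$(getent ahostsv4 "$name" | awk '{print $1}' | sort -u)
    printf '%s\n' "$addrs"
    if printf '%s\n' "$addrs" | grep -q '^127\.'; then
        return 1
    fi
    return 0
}
```

Running `recheck_after_dns_change www.cloudflare.com` after the router change should then list public addresses rather than 127.0.0.1.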