VMware ModConf_NewKernelInfo Segfault Troubleshooting
Environment:
Software | Version |
---|---|
VMware Workstation 15 Pro | 15.5.6 build-16341506 |
OS | Fedora 32 |
Kernel | Linux version 5.8.4-200.fc32.x86_64 |
After updating the system and rebooting, starting VMware made vmware-modconfig crash with a segmentation fault. The `coredumpctl` command shows the core dump details:
```
$ coredumpctl info
           PID: 7968 (vmware-modconfi)
           UID: 1000 (lu4nx)
           GID: 1000 (lu4nx)
        Signal: 11 (SEGV)
     Timestamp: Tue 2020-09-01 13:26:00 CST (1h 13min ago)
  Command Line: /usr/lib/vmware/bin/vmware-modconfig --launcher=/usr/bin/vmware-modconfig --launcher=/usr/bin/vmware-modconfig --appname=VMware Workstation --icon=vmware-workstation
    Executable: /usr/lib/vmware/bin/appLoader
 Control Group: /user.slice/user-1000.slice/user@1000.service/apps.slice/apps-org.gnome.Terminal.slice/vte-spawn-8768fa70-1003-4804-9806-72000c9f6d6c.scope
          Unit: user@1000.service
     User Unit: vte-spawn-8768fa70-1003-4804-9806-72000c9f6d6c.scope
         Slice: user-1000.slice
     Owner UID: 1000 (lu4nx)
       Boot ID: c22aa9eaff8b42e38ee6bccd794922d5
    Machine ID: ba44c3fb040e40a490d6387a4e7e0bc5
      Hostname: lx-pc
       Storage: /var/lib/systemd/coredump/core.vmware-modconfi.1000.c22aa9eaff8b42e38ee6bccd794922d5.7968.1598937960000000000000.lz4
       Message: Process 7968 (vmware-modconfi) of user 1000 dumped core.

                Stack trace of thread 7968:
                #0  0x00007fac1bfededc __strchr_avx2 (libc.so.6 + 0x161edc)
                #1  0x00007fac1b38911b ModConf_NewKernelInfo (libvmware-modconfig.so + 0x2611b)
                #2  0x00007fac1b3853b6 main (libvmware-modconfig.so + 0x223b6)
                #3  0x000055ef06c1b930 n/a (appLoader + 0x18930)
                #4  0x000055ef06c1790c n/a (appLoader + 0x1490c)
                #5  0x00007fac1beb3042 __libc_start_main (libc.so.6 + 0x27042)
                #6  0x000055ef06c17df9 n/a (appLoader + 0x14df9)
```
The segfault happens while /usr/lib/vmware/bin/appLoader is running. The DNF log (/var/log/dnf.log) showed that the kernel had just been updated, so I suspected the bug had something to do with the kernel.
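Back to the stack trace: ModConf_NewKernelInfo passes an invalid pointer to a glibc function (`strchr`, frame #0). Judging by its name, the function collects kernel information, presumably so VMware can reconfigure itself after detecting a kernel version change. Tracing system calls with `strace` shows that VMware reads /proc/version, which contains the kernel version and compiler information: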
```
$ strace -f -o /tmp/out vmware
$ less /tmp/out
...
3530  openat(AT_FDCWD, "/proc/version", O_RDONLY) = 17
3530  fstat(17, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
3530  fcntl(17, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
3530  fstat(17, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
3530  read(17, "Linux version 5.8.4-200.fc32.x86"..., 4096) = 192
3530  read(17, "", 3072) = 0
...
```
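The strace output shows `fstat()` reporting `st_size=0` for /proc/version, followed by `read()` calls until one returns 0. Procfs files have no meaningful size, so a reader has to keep reading until EOF instead of sizing its buffer from `st_size`. Below is a minimal sketch of that pattern in plain POSIX C. `read_proc_file` is a hypothetical helper, not VMware's code; the decompiled code below shows VMware using GLib's `g_file_get_contents` instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole /proc file into buf and NUL-terminate it.
 * fstat() reports st_size == 0 for /proc files, so the size cannot be trusted;
 * keep calling read() until it returns 0 (EOF) or the buffer is full.
 * Returns the number of bytes stored, or -1 on error (errno is set).
 */
ssize_t read_proc_file(const char *path, char *buf, size_t cap)
{
    if (cap == 0) {
        errno = EINVAL;
        return -1;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t len = 0;
    for (;;) {
        /* Reserve one byte for the terminating NUL. */
        ssize_t n = read(fd, buf + len, cap - 1 - len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;                 /* EOF, like the final read(17, "", 3072) = 0 */
        len += (size_t)n;
        if (len == cap - 1)
            break;                 /* buffer full: truncate instead of overflowing */
    }

    close(fd);
    buf[len] = '\0';
    return (ssize_t)len;
}
```

Truncating on a full buffer keeps the sketch simple. A real reader would more likely grow the buffer, which is what `g_file_get_contents` does internally.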
Because VMware is proprietary, the quickest route was to load libvmware-modconfig.so into Ghidra and locate ModConf_NewKernelInfo. This stretch of decompiled code stands out:
```c
kernel_version = g_file_get_contents("/proc/version",&local_1d8,0,0);
if (kernel_version == 0) {
  Warning("Failed to get a kernel gcc version. Unable to read \"%s\".\n","/proc/version");
}
else {
  uVar3 = g_malloc0(0x18);
  *(undefined8 *)(puVar5 + 0x30) = uVar3;
  gcc_version_pos = strstr(local_1d8,"gcc version ");   /* look for the "gcc version" string */
  bVar9 = false;
  bVar10 = gcc_version_pos == (char *)0x0;
  if (bVar10) {
    __s = (byte *)0x0;                                   /* not found: __s becomes NULL */
  }
  else {
    lVar6 = 5;
    __s = (byte *)(gcc_version_pos + 0xc);               /* start of the GCC version number */
    pbVar8 = (byte *)"egcs-";
    do {                                                 /* compare the next 5 bytes with "egcs-" */
      if (lVar6 == 0) break;
      lVar6 = lVar6 + -1;
      bVar9 = *__s < *pbVar8;
      bVar10 = *__s == *pbVar8;
      __s = __s + (ulong)bVar11 * -2 + 1;
      pbVar8 = pbVar8 + (ulong)bVar11 * -2 + 1;
    } while (bVar10);
    __s = (byte *)(gcc_version_pos + 0xc);
    if ((!bVar9 && !bVar10) == bVar9) {                  /* true only if the bytes matched "egcs-" */
      __s = (byte *)(gcc_version_pos + 0x11);            /* skip the "egcs-" prefix */
    }
  }
  gcc_version_pos = strchr((char *)__s,0x20);            /* called even when __s is NULL -> SIGSEGV */
}
```
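The function searches the contents of /proc/version for the string "gcc version " and parses the GCC version number starting at that position. It never handles the case where the string is missing, though. In that case `strstr()` returns NULL, `__s` is set to NULL, and `strchr(__s, ' ')` is still called unconditionally. That NULL dereference is the `__strchr_avx2` frame at the top of the stack trace.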
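For illustration, here is a simplified, hypothetical re-creation of that parsing step with the missing NULL check added. The name `parse_gcc_version` is made up, and so is the assumption that the version token ends at the next space (inferred from the `strchr(__s, 0x20)` call). Neither is taken from VMware's binary, whose function does more than this.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified version of the decompiled logic above.
 * Copies the token after "gcc version " (skipping an "egcs-" prefix, as the
 * decompiled code does) into out. Returns 0 on success, or -1 when the
 * "gcc version " marker is absent, which is the 5.8 case that crashed VMware.
 */
int parse_gcc_version(const char *proc_version, char *out, size_t out_len)
{
    if (out_len == 0)
        return -1;

    const char *pos = strstr(proc_version, "gcc version ");
    if (pos == NULL)                      /* the check the original code lacks */
        return -1;

    const char *s = pos + strlen("gcc version ");    /* == pos + 0xc */
    if (strncmp(s, "egcs-", 5) == 0)
        s += 5;                                       /* == pos + 0x11 */

    /* Assumption: the version token ends at the next space. */
    const char *end = strchr(s, ' ');
    size_t n = end ? (size_t)(end - s) : strlen(s);
    if (n >= out_len)
        n = out_len - 1;                  /* truncate to fit the caller's buffer */
    memcpy(out, s, n);
    out[n] = '\0';
    return 0;
}
```

With the two /proc/version lines shown below, the 5.7.17 line yields "10.2.1", and the 5.8.4 line returns -1 instead of crashing.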
Here is /proc/version for the current kernel:
```
$ cat /proc/version
Linux version 5.8.4-200.fc32.x86_64 ([email protected]) (gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), GNU ld version 2.34-4.fc32) #1 SMP Wed Aug 26 22:28:08 UTC 2020
```
The "gcc version" string is not there. So I rebooted into the kernel from before the update and checked /proc/version again:
```
$ cat /proc/version
Linux version 5.7.17-200.fc32.x86_64 ([email protected]) (gcc version 10.2.1 20200723 (Red Hat 10.2.1-1) (GCC), GNU ld version 2.34-4.fc32) #1 SMP Fri Aug 21 15:23:46 UTC 2020
```
This time "gcc version" is present, and VMware starts normally.
1. Solution
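Any system running a Linux 5.8 kernel hits this problem. Starting with 5.8, the compiler string in /proc/version (the LINUX_COMPILER macro) is built from CONFIG_CC_VERSION_TEXT, and for GCC that text no longer contains "gcc version". The change comes from this commit: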
```
commit 9a950154668729a472d17b8e307d92e7c60f45f7
Author: Masahiro Yamada <[email protected]>
Date:   Thu Apr 23 23:23:54 2020 +0900

    kbuild: use CONFIG_CC_VERSION_TEXT to construct LINUX_COMPILER macro

    scripts/mkcompile_h runs $(CC) just for getting the version string.
    Reuse CONFIG_CC_VERSION_TEXT for optimization.

    For GCC, this slightly changes the version string. I do not think
    it is a big deal as we do not have the defined format for
    LINUX_COMPILER. In fact, the recent commit 4dcc9a88448a ("kbuild:
    mkcompile_h: Include $LD version in /proc/version") added the
    linker version.

    Signed-off-by: Masahiro Yamada <[email protected]>
```
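A tool that needs the compiler version on both kernel generations could fall back to the new layout when "gcc version " is missing. The sketch below is hypothetical (`parse_gcc_version_any` is not part of VMware or the kernel). It assumes the `gcc (<vendor>) X.Y.Z` layout visible in the 5.8.4 /proc/version above. Because CONFIG_CC_VERSION_TEXT is simply the compiler's own version banner, other distributions or compilers may format it differently.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical parser accepting both layouts seen in this article:
 *   pre-5.8: "(gcc version 10.2.1 20200723 ..."
 *   5.8+:    "(gcc (GCC) 10.2.1 20200723 ..."   (CONFIG_CC_VERSION_TEXT)
 * Returns 0 and copies e.g. "10.2.1" into out, or -1 if neither layout
 * is found or the token does not start with a digit.
 */
int parse_gcc_version_any(const char *proc_version, char *out, size_t out_len)
{
    if (out_len == 0)
        return -1;

    const char *s;
    const char *pos = strstr(proc_version, "gcc version ");
    if (pos != NULL) {
        s = pos + strlen("gcc version ");          /* pre-5.8 layout */
    } else if ((pos = strstr(proc_version, "(gcc (")) != NULL) {
        /* 5.8+ layout: skip the parenthesised vendor tag, e.g. "(GCC)" */
        const char *close = strchr(pos + strlen("(gcc ("), ')');
        if (close == NULL)
            return -1;
        s = close + 1;
        while (*s == ' ')
            s++;
    } else {
        return -1;                                 /* e.g. a clang-built kernel */
    }

    if (!isdigit((unsigned char)*s))
        return -1;
    size_t n = strcspn(s, " ,)");                  /* token ends at space, comma or ')' */
    if (n >= out_len)
        n = out_len - 1;
    memcpy(out, s, n);
    out[n] = '\0';
    return 0;
}
```

On the 5.7.17 and 5.8.4 /proc/version lines shown earlier, both layouts yield "10.2.1".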
A complete fix will have to come from a VMware update. Until then, there are two workarounds:
- Roll back to the 5.7 kernel;
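- Follow the approach discussed in https://github.com/mkubecek/vmware-host-modules/issues/70 and make vmware-modconfig exit successfully by replacing it with a symlink to /bin/true (requires root):

  ```
  mv /usr/bin/vmware-modconfig /usr/bin/vmware-modconfig_backup
  ln -s /bin/true /usr/bin/vmware-modconfig
  ```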