glibc trickery - part 2
- Yarn
- Changing argv0
- Executing yarn
- How yarn determines the node used to run itself
- How does node determine process.execPath?
- Changing /proc/self/exe
- Final wrapper script
- Summary
- References
It turned out that the approach from the last post, Using glibc trickery to run nodejs, isn’t sufficient in one special case: if you are using yarn…
Yarn
Yarn is another package manager for JavaScript projects. It can replace npm,
which is bundled with node nowadays. Yarn has a special command: node.
It lets you run node again, but not just any node: Yarn tries to make sure that the exact same node version is used as the one
that is running Yarn itself.
One could think that yarn achieves this by looking at “argv0”, the zeroth entry of the process’s argument list. We can run a small node script to check which value we get.
Given the same setup from the previous blog post, we can run this command now:
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
docker.io/opensuse/leap:42 \
node -e 'console.log(process.argv0);'
/root/node-test/node-v24.6.0-linux-x64/bin/node
Hm, ok, this looks fine at first. However, the value points at the real node binary: if something used it to start a new process, we would run into the same GLIBC version issues again. It would be better if this were the wrapper script.
Changing argv0
process.argv0 is documented here: https://nodejs.org/docs/latest/api/process.html#processargv0.
There is also process.argv, which contains the whole
argument array that has been passed to the nodejs process. Let’s see what we get:
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
docker.io/opensuse/leap:42 \
node -e 'console.log(`argv0=${process.argv0}\nargv=${process.argv}\n`);'
argv0=/root/node-test/node-v24.6.0-linux-x64/bin/node
argv=/root/node-test/glibc-2.38/lib64/ld-linux-x86-64.so.2
Let’s try to change it to the wrapper script. The wrapper script is located at /root/node-test/node and invokes node
through the dynamic loader (ld.so). The loader has an option called --argv0, which lets you override exactly this piece
of information. This option was added in glibc 2.33, see bug 16124
and man ld.so.8.
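The underlying idea, that argv0 is just a string chosen by whoever starts the process, can also be observed from node itself. The following is a minimal sketch, assuming a node version whose child_process.spawnSync supports the argv0 option; the name "my-wrapper" and the helper runWithArgv0 are made up for this illustration and have nothing to do with the loader's --argv0 implementation.

```javascript
// Sketch: argv0 is chosen by the parent process, independent of the executable path.
const { spawnSync } = require('node:child_process');

// runWithArgv0 is a hypothetical helper: it starts a child node process
// and passes an arbitrary string as the child's argv[0].
function runWithArgv0(argv0) {
  const result = spawnSync(
    process.execPath,
    ['-p', 'JSON.stringify({ argv0: process.argv0, execPath: process.execPath })'],
    { argv0, encoding: 'utf8' }
  );
  return JSON.parse(result.stdout);
}

const child = runWithArgv0('my-wrapper');
console.log(child.argv0);    // the value chosen by the parent
console.log(child.execPath); // determined separately by node, not taken from argv0
```

The child reports the made-up argv0, while its execPath is derived independently of that string.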
The new wrapper script looks like this:
#!/bin/bash
NODE_HOME=/root/node-test/node-v24.6.0-linux-x64
GLIBC_HOME=/root/node-test/glibc-2.38
exec ${GLIBC_HOME}/lib64/ld-linux-x86-64.so.2 \
--argv0 /root/node-test/node \
--library-path ${GLIBC_HOME}/lib64 \
${NODE_HOME}/bin/node "$@"
Running node again, we now get this:
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
docker.io/opensuse/leap:42 \
node -e 'console.log(`argv0=${process.argv0}\nargv=${process.argv}\n`);'
argv0=/root/node-test/node
argv=/root/node-test/glibc-2.38/lib64/ld-linux-x86-64.so.2
Much better. “argv0” looks correct. Hopefully, argv doesn’t matter. But wait, where are node’s actual arguments like “-e” and “console.log(….)”? It turns out that options consumed by node itself are kept in another variable: process.execArgv. process.argv only shows the loader here because node puts process.execPath into its first element. We’ll need process.execPath later. Let’s have a look at all of these variables now:
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
docker.io/opensuse/leap:42 \
node -e 'console.log(`argv0=${process.argv0}\nargv=${process.argv}\nexecPath=${process.execPath}\nexecArgv=${process.execArgv}\n`);'
argv0=/root/node-test/node
argv=/root/node-test/glibc-2.38/lib64/ld-linux-x86-64.so.2
execPath=/root/node-test/glibc-2.38/lib64/ld-linux-x86-64.so.2
execArgv=-e,console.log(`argv0=${process.argv0}\nargv=${process.argv}\nexecPath=${process.execPath}\nexecArgv=${process.execArgv}\n`);
Ok, so far so good.
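To make the split between these variables easier to see, here is a small sketch that starts a child node process with two extra arguments and prints the properties as JSON. The arguments "first" and "second" are placeholders chosen for this example only.

```javascript
// Sketch: how node distributes its command line across process properties.
const { spawnSync } = require('node:child_process');

// The child prints the relevant properties as one JSON object.
const script =
  'console.log(JSON.stringify({ argv: process.argv, execArgv: process.execArgv, execPath: process.execPath }))';

// "first" and "second" are hypothetical arguments for the evaluated script.
const out = spawnSync(process.execPath, ['-e', script, 'first', 'second'], { encoding: 'utf8' });
const info = JSON.parse(out.stdout);

console.log(info.argv[0] === info.execPath); // argv[0] is replaced by execPath
console.log(info.argv.slice(1));             // remaining arguments for the script
console.log(info.execArgv[0]);               // options consumed by node itself
```

This is why only the loader showed up in argv above: with -e there are no further script arguments, and the first element always mirrors execPath.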
Executing yarn
Let’s see whether we can run “yarn node --version”. First of all, we need to get yarn.
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
docker.io/opensuse/leap:42 \
npm install -g corepack
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
docker.io/opensuse/leap:42 \
yarn init -2
The first command works, the second fails with executable file `yarn` not found in $PATH. Usually, the symlinks should be
installed into the same directory as node/corepack, which would have been /root/node-test/node-v24.6.0-linux-x64/bin,
but these symlinks actually ended up in /root/node-test/glibc-2.38/bin/… Let’s ignore this for now
and install yarn manually by downloading the single js file:
$ wget -O yarn-4.10.3.js https://repo.yarnpkg.com/4.10.3/packages/yarnpkg-cli/bin/yarn.js
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
-w /root/node-test \
docker.io/opensuse/leap:42 \
node yarn-4.10.3.js --version
4.10.3
This works - so we can at least start “yarn” now.
Note: I’ve added -w /root/node-test to set the current working directory, so that “yarn-4.10.3.js” can be referenced
by a relative path.
What is the result of yarn node --version?
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
-w /root/node-test \
docker.io/opensuse/leap:42 \
node yarn-4.10.3.js node --version
Usage Error: No project found in /root/node-test
Ok, we first need to initialize a project via yarn init and yarn install:
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
-w /root/node-test \
docker.io/opensuse/leap:42 \
node yarn-4.10.3.js init
...
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
-w /root/node-test \
docker.io/opensuse/leap:42 \
node yarn-4.10.3.js install
...
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
-w /root/node-test \
docker.io/opensuse/leap:42 \
node yarn-4.10.3.js node --version
ld.so (GNU libc) stable release version 2.38.
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
That’s worrying: instead of node’s version, we see the version banner of “ld.so”. This means yarn did not start a new node process; it started the loader itself and passed --version to it.
How yarn determines the node used to run itself
This took some digging into yarn’s source code, which is available on GitHub under yarnpkg/berry.
When you execute “yarn node”, yarn internally executes “yarn exec node” (see node.ts).
This is implemented in exec.ts and calls the method executePackageShellcode in scriptUtils.ts, which sets up an environment where executing just “node” will
execute the correct node process. This is done through wrapper scripts created by the method makeScriptEnv and
several calls to makePathWrapper. The “argv0” argument used for the node wrapper is taken from process.execPath.
So, yarn is actually going to execute process.execPath in order to start a new node process… But we have seen above
that this value is the loader… and not our node wrapper script.
This explains why “yarn node” doesn’t work:
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
-w /root/node-test \
docker.io/opensuse/leap:42 \
node yarn-4.10.3.js node
/root/node-test/glibc-2.38/lib64/ld-linux-x86-64.so.2: missing program name
Try '/root/node-test/glibc-2.38/lib64/ld-linux-x86-64.so.2 --help' for more information.
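The wrapper mechanism described above can be approximated in a few lines. This is only a sketch of the general idea, not yarn's actual code: the helper name makeNodeWrapper, the temporary directory prefix and the shell wrapper format are assumptions made for this example, and it expects a POSIX shell at /bin/sh.

```javascript
// Sketch (not yarn's implementation): put a "node" wrapper on PATH that runs process.execPath.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { spawnSync } = require('node:child_process');

// makeNodeWrapper is a hypothetical helper name.
function makeNodeWrapper(execPath) {
  const binDir = fs.mkdtempSync(path.join(os.tmpdir(), 'wrapper-'));
  const wrapper = path.join(binDir, 'node');
  // The wrapper re-executes whatever path was recorded as execPath.
  fs.writeFileSync(wrapper, `#!/bin/sh\nexec "${execPath}" "$@"\n`, { mode: 0o755 });
  return binDir;
}

const binDir = makeNodeWrapper(process.execPath);
// Prepend the wrapper directory so that a plain "node" lookup finds the wrapper first.
const env = { ...process.env, PATH: `${binDir}${path.delimiter}${process.env.PATH}` };
const result = spawnSync('node', ['--version'], { env, encoding: 'utf8' });
console.log(result.stdout.trim());
```

With a regular node installation, execPath is the node binary and the wrapper works. In the setup from this post, execPath is the loader, so a wrapper built this way ends up starting ld.so instead of node.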
So, the question now became: “How to change process.execPath?”
How does node determine process.execPath?
Let’s have a look at the implementation of node. In node_process_object.cc there is some code for handling execPath. It reads the value from “env”,
which is of type Environment and is implemented in env.cc. This file
contains a lot of “exec_path” references. In the end, the value seems to be initialized in the constructor
by calling the method Environment::GetExecPath.
This method calls the platform-dependent uv_exepath, which is implemented for Linux in procfs-exepath.c and just reads the symlink /proc/self/exe.
By the way, if uv_exepath returned an error, Environment::GetExecPath would fall back to argv0…
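The lookup order described here can be written down as a short sketch. It is JavaScript for illustration only: node does this in C++ via uv_exepath, and the function name resolveExecPath and its procLink parameter are invented for this example.

```javascript
// Sketch of the lookup order described above; node's real code is C++.
const fs = require('node:fs');

// resolveExecPath is a hypothetical name. procLink is a parameter only so
// that the fallback branch can be exercised with a non-existing link.
function resolveExecPath(argv0, procLink = '/proc/self/exe') {
  try {
    // On Linux this symlink points at the executable of the current process.
    return fs.readlinkSync(procLink);
  } catch (err) {
    // If the link cannot be read, fall back to argv0.
    return argv0;
  }
}

console.log(resolveExecPath(process.argv0));
console.log(resolveExecPath('fallback-argv0', '/nonexistent/self/exe'));
```

The second call shows the fallback branch: since the link cannot be read, the given argv0 is returned unchanged.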
Ok, can we somehow change what the symlink /proc/self/exe is pointing at?
Changing /proc/self/exe
Apparently this is very difficult, for good reasons. See e.g. Hiding in plain sight - Abusing the dynamic linker.
However, there is the LD_PRELOAD mechanism, also known as the ld preload trick.
With glibc you can load a shared library early on (“preload”), and this library can intercept other library functions. Since node is using readlink,
we can intercept this call and return whatever path we want for /proc/self/exe… that should do the trick.
So, how does this work? We need a shared library. It is based on How could I intercept linux syscalls?
Let’s call it change_exe.c:
// Based on https://stackoverflow.com/questions/69859/how-could-i-intercept-linux-sys-calls
#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

// Pointer to the real readlink(), resolved lazily on first use.
static ssize_t (*real_readlink)(const char *restrict, char *restrict, size_t) = NULL;

ssize_t readlink(const char *restrict path, char *restrict buf, size_t bufsiz)
{
    if (real_readlink == NULL) {
        real_readlink = (ssize_t (*)(const char *restrict, char *restrict, size_t)) dlsym(RTLD_NEXT, "readlink");
    }
    if (strcmp(path, "/proc/self/exe") == 0) {
        const char *env_path = getenv("CHANGE_EXE_PATH");
        if (env_path) {
            // Behave like the real readlink(): copy at most bufsiz bytes, do not
            // append a terminating null byte, return the number of bytes written.
            size_t len = strlen(env_path);
            if (len > bufsiz) {
                len = bufsiz;
            }
            memcpy(buf, env_path, len);
            return (ssize_t) len;
        }
    }
    // Every other path (or no override configured): call the real function.
    return real_readlink(path, buf, bufsiz);
}
// gcc -fPIC -shared -o change_exe.so change_exe.c -ldl
// Test 1: LD_PRELOAD="./change_exe.so" readlink /proc/self/exe
// Test 2: CHANGE_EXE_PATH="/foo" LD_PRELOAD="./change_exe.so" readlink /proc/self/exe
This library checks whether the environment variable CHANGE_EXE_PATH exists and returns its value
when readlink is called for /proc/self/exe. This way, we can configure the reported path.
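The decision logic of the hook can also be modelled outside of C, which makes the readlink contract easier to see: the result is truncated to the buffer size, no terminating null byte is added, and the return value is the number of bytes written. The following is just such a model, not the preloaded library itself; fakeReadlink and its parameters are hypothetical names for this sketch.

```javascript
// Model of the interception logic; the real hook is the C library above.
// fakeReadlink, env and realReadlink are hypothetical names for this sketch.
function fakeReadlink(linkPath, buf, env, realReadlink) {
  const override = env.CHANGE_EXE_PATH;
  if (linkPath === '/proc/self/exe' && override) {
    const bytes = Buffer.from(override, 'utf8');
    // Copy at most buf.length bytes and report how many were written.
    const n = Math.min(bytes.length, buf.length);
    bytes.copy(buf, 0, 0, n);
    return n;
  }
  // Any other path, or no override configured: delegate to the real function.
  return realReadlink(linkPath, buf);
}

const buf = Buffer.alloc(64);
const n = fakeReadlink('/proc/self/exe', buf, { CHANGE_EXE_PATH: '/foo' }, () => -1);
console.log(buf.toString('utf8', 0, n));
```

Callers have to use the returned length to find the end of the path, which is why returning the wrong count would corrupt the reported value.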
Final wrapper script
We can now change our wrapper script to include not only “argv0” but also a preloaded library:
#!/bin/bash
NODE_HOME=/root/node-test/node-v24.6.0-linux-x64
GLIBC_HOME=/root/node-test/glibc-2.38
export CHANGE_EXE_PATH=/root/node-test/node
exec ${GLIBC_HOME}/lib64/ld-linux-x86-64.so.2 \
--argv0 ${CHANGE_EXE_PATH} \
--preload /root/node-test/change_exe.so \
--library-path ${GLIBC_HOME}/lib64 \
${NODE_HOME}/bin/node "$@"
Does it work?
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
-w /root/node-test \
docker.io/opensuse/leap:42 \
node yarn-4.10.3.js node --version
v24.6.0
Yes! The whole chain node -> yarn -> node is working now. Let’s see what the values of process.execPath, process.argv and
so on are now:
$ podman run -it --rm --pull newer -v $(pwd):/root/node-test \
-e PATH=/root/node-test:/root/node-test/node-v24.6.0-linux-x64/bin:/bin:/usr/bin \
docker.io/opensuse/leap:42 \
node -e 'console.log(`argv0=${process.argv0}\nargv=${process.argv}\nexecPath=${process.execPath}\nexecArgv=${process.execArgv}\n`);'
argv0=/root/node-test/node
argv=/root/node-test/node
execPath=/root/node-test/node
execArgv=-e,console.log(`argv0=${process.argv0}\nargv=${process.argv}\nexecPath=${process.execPath}\nexecArgv=${process.execArgv}\n`);
Summary
While researching this topic, I remembered proot, which I occasionally use for backups (see Backups with restic and proot). proot also intercepts system calls and returns “adjusted” values; in particular, it intercepts all calls that deal with path names. But does it do anything about /proc/self/exe? Let’s see:
$ ./proot node-v24.6.0-linux-x64/bin/node \
-e 'console.log(`argv0=${process.argv0}\nargv=${process.argv}\nexecPath=${process.execPath}\nexecArgv=${process.execArgv}\n`);'
argv0=node-v24.6.0-linux-x64/bin/node
argv=/root/node-test/node-v24.6.0-linux-x64/bin/node
execPath=/root/node-test/node-v24.6.0-linux-x64/bin/node
execArgv=-e,console.log(`argv0=${process.argv0}\nargv=${process.argv}\nexecPath=${process.execPath}\nexecArgv=${process.execArgv}\n`);
So, they obviously thought about this… However, proot works a bit differently: while “change_exe.c” only intercepts the glibc library function “readlink”, which itself is a wrapper function around the actual syscall, proot uses ptrace to intercept the actual syscalls. And it seems to follow processes: if you run a shell with proot and start another process from there, this child process also has its syscalls intercepted.
Also note that all this wrapper script business is only necessary if a) you cannot upgrade the system (hey, why not use a modern operating system?) and b) you cannot change the node binaries.
Because if you can change the node binaries, you can use patchelf to set the loader and library path directly in the node binary. Then you don’t need to call node explicitly via the loader anymore; it will just work. This is explained in Multiple glibc libraries on a single host.
References
Here are some pages I used during research, in no particular order:
- https://sourceware.org/bugzilla/show_bug.cgi?id=16124
- https://stackoverflow.com/questions/847179/multiple-glibc-libraries-on-a-single-host
- https://en.wikipedia.org/wiki/Dynamic_linker
- https://haxrob.net/process-name-stomping/
- https://haxrob.net/hiding-in-plain-sight-part-2/
- https://superuser.com/questions/1144758/overwrite-default-lib64-ld-linux-x86-64-so-2-to-call-executables
- https://lwn.net/Articles/631631/
- https://stackoverflow.com/questions/69859/how-could-i-intercept-linux-sys-calls
- https://stackoverflow.com/questions/426230/what-is-the-ld-preload-trick
- https://www.goldsborough.me/c/low-level/kernel/2016/08/29/16-48-53-the_-ld_preload-_trick/
- https://stackoverflow.com/questions/61668718/how-does-the-dynamic-linker-executes-proc-self-exe
- https://github.com/NixOS/patchelf
- https://github.com/proot-me/proot
- https://github.com/skirsten/proot-portable-android-binaries?tab=readme-ov-file
- https://stackoverflow.com/questions/49085907/is-it-possible-to-change-the-value-of-proc-self-exe-for-an-execed-child-proc
- https://man7.org/linux/man-pages/man2/PR_SET_MM_EXE_FILE.2const.html
- https://github.com/upx/upx/issues/249 - /proc/self/exe can be unlinked by a different process…
- https://lwn.net/Articles/920384/
- https://stackoverflow.com/questions/68265901/can-i-change-proc-pid-exe-to-reflect-a-different-binary