C sysctl in rtld x86 under Valgrind

I'm getting closer to reviving Valgrind on FreeBSD. One of the last blocking issues that I have is with a sysctl in rtld.

The scenario is that the Valgrind host code has completed phase 1 and it has started to execute client code. The first thing that it does is load and execute rtld. The problem code (in rtld) is

Code:
if (sysctlnametomib("hw.pagesizes", mib, &len) == 0)
    size = sizeof(psa);
else
    ; // snipped

if (sysctl(mib, len, psa, &size, NULL, 0) == -1) {
    _rtld_error("sysctl for hw.pagesize(s) failed");
    rtld_die();
}

Remember, this code isn't being executed natively, it's being JIT interpreted by Valgrind.

sysctlnametomib() should be returning 6,7 in the mib, but I'm seeing 6,2147482929. This is causing the call to sysctl() to fail and the client exits.

Actually I see the same thing on amd64, but it doesn't seem to pose any problem.

2147482929 is 0x7FFFFD31, somewhere towards the upper end of positive values for an int.
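
To make "towards the upper end" concrete, here is a minimal sketch of the arithmetic, assuming a 32-bit int. The helper names distance_below_int_max and format_hex are made up for this illustration and are not from rtld or Valgrind.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* How far a non-negative value sits below INT_MAX. */
int distance_below_int_max(int value)
{
    assert(value >= 0);
    return INT_MAX - value;
}

/* Write the value as 0x-prefixed upper-case hex; returns the length. */
int format_hex(int value, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "0x%X", (unsigned int)value);
}
```

With a 32-bit int, distance_below_int_max(2147482929) comes out as 718, so the value is only a few hundred below INT_MAX.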

I've stepped through the Valgrind code down to the assembly level and I see the same values there, so it isn't a problem of propagating the values within the Valgrind client. This is occurring on the very first client syscall (the host has already done plenty). Subsequent syscalls don't seem to have a problem.

Does anyone have any idea why this syscall is misbehaving under Valgrind?
 
No takers?

Does anyone know how I could debug this, either
  1. in a standalone executable, stepping through the rtld machine code, or
  2. from the kernel side? I have tried truss/dtruss, but they just print the pointer to the mib rather than the mib contents, which is useless for my purposes.
I'm also seeing the same behaviour with FreeBSD 11.3.
 
I tried dtruss. I'll try again with the dtraceall kmod loaded, but this will probably need some DTrace scripting.
 
Something like this:
Code:
#!/usr/sbin/dtrace -s

/* int sysctl(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen) */
/* 202 AUE_SYSCTL STD { int freebsd32___sysctl(int *name, u_int namelen, void *old, uint32_t *oldlenp, const void *new, uint32_t newlen); */

syscall:freebsd:__sysctl:entry,
syscall:freebsd32:freebsd32___sysctl:entry {
  this->namelen = (unsigned int)arg1;
  this->name    = (int*)copyin(arg0, sizeof(int) * this->namelen);
  printf("name = {%d, %d, %d, %d, %d, %d}, namelen = %d [%s]",
    this->name[0],
    this->namelen >= 2 ? this->name[1] : 0,
    this->namelen >= 3 ? this->name[2] : 0,
    this->namelen >= 4 ? this->name[3] : 0,
    this->namelen >= 5 ? this->name[4] : 0,
    this->namelen >= 6 ? this->name[5] : 0,
    this->namelen,
    execname);
  /*ustack();*/
}
 
Thanks, that's very interesting. When I run the guest app standalone, I don't see the rtld sysctls.

Under Valgrind I see

Code:
  2  75280                   __sysctl:entry name = {6, 7, 0, 0, 0, 0}, namelen = 2 [valgrind]
  2  75280                   __sysctl:entry name = {2, 12, 0, 0, 0, 0}, namelen = 2 [valgrind]
  2  75280                   __sysctl:entry name = {1, 14, 12, 29983, 0, 0}, namelen = 4 [valgrind]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 32, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {0, 3, 0, 0, 0, 0}, namelen = 2 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {6, 2147481815, 0, 0, 0, 0}, namelen = 2 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {0, 3, 0, 0, 0, 0}, namelen = 2 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {6, 7, 0, 0, 0, 0}, namelen = 2 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 24, 0, 0, 0, 0}, namelen = 2 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 14, 33, 29983, 0, 0}, namelen = 4 [none-x86-freebsd]
  2  74162         freebsd32___sysctl:entry name = {1, 37, 0, 0, 0, 0}, namelen = 2 [none-x86-freebsd]

VG
6,7 hw.pagesizes
2,12 vm
1,14,12 kern.proc.gid

guest
1,14,32 kern.proc.vmmap
0,3 sysctlnametomib
6,garbage should be hw.pagesizes
1,14,33 kern.proc.filedesc x7
0,3 sysctlnametomib
6,7 hw.pagesizes
1,24 kern.osreldate
1,14,33 kern.proc.filedesc x lots
1,37 kern.arnd
2,12 vm

The first hw.pagesize I was expecting, it's in the VG code.

The second and third I don't fully understand.

The third one looks like the one that I hacked within VG to change the 2147481815 to 7. But where is the second one coming from?
 
What is the difference between hw.pagesize and hw.pagesizes?

The dtrace script seems to capture the inputs, e.g. "entry". For sysctlnametomib you likely want to see the outputs ("return", I think).
 
There's no difference. I was just doing manual decoding of the values, looking at the sysctl.h header.

For the decoding of outputs, this is something I'm going to need.
 
I'm beginning to understand a bit more of what is happening. First of all, I had mixed up "hw.pagesize" and "hw.pagesizes". Is there any accessible documentation or something like syscall.master? Up to now I've been looking at sys/sysctl.h, but that only gives a tiny subset of the possible sysctls.

The first sysctl is coming from Valgrind:

Code:
error = VG_(sysctlbyname)("hw.instruction_sse", &sc, &scl, 0, 0);

This gives 6,0x7ffff8d7 (that is, 6,2147481815).

The second is from rtld.so for "hw.pagesizes", and that gives 6,0x7ffffd31 (that is, 6,2147482929).

I wrote a little application to print these values.
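
A minimal sketch of what such a helper could look like. This is not the actual program from the thread; format_mib and print_mib_for are names invented here. The FreeBSD-only part is guarded so the formatting helper still builds elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Format a MIB as comma-separated decimals, e.g. "6,2147482929".
 * Returns the length of the string left in buf. */
int format_mib(const int *mib, size_t len, char *buf, size_t buflen)
{
    size_t used = 0;

    assert(mib != NULL && buf != NULL && buflen > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        int n = snprintf(buf + used, buflen - used, "%s%d",
                         i ? "," : "", mib[i]);
        if (n < 0 || (size_t)n >= buflen - used)
            break;              /* out of room: output is truncated */
        used += (size_t)n;
    }
    return (int)strlen(buf);
}

#ifdef __FreeBSD__
/* Resolve a sysctl name and print its numeric MIB; 0 on success. */
int print_mib_for(const char *name)
{
    int mib[CTL_MAXNAME];
    size_t len = CTL_MAXNAME;   /* in: capacity, out: components used */
    char buf[256];

    if (sysctlnametomib(name, mib, &len) != 0) {
        perror(name);
        return -1;
    }
    format_mib(mib, len, buf, sizeof(buf));
    printf("%s -> %s\n", name, buf);
    return 0;
}
#endif
```

On FreeBSD, calling print_mib_for("hw.pagesizes") from a small main() would print the numeric MIB that sysctlnametomib resolves for that name on the running kernel.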

So that brings me to the question, why is the sysctl call to read hw.pagesizes failing?
 
Going back to the traces
Code:
SYSCALL[2575,1](202) sys_sysctl ( 0xfb01d1d4, 2, 0x441f938, 0xfb01ccb0, 0x0, 0 )
mib[0]: hw mib[1]: 2147482929
[sync] --> Failure(0xc)

The failure code is 0xc. That looks like ENOMEM. The man page gives 3 possibilities:

[ENOMEM] The length pointed to by oldlenp is too short to hold the requested value.

[ENOMEM] The smaller of either the length pointed to by oldlenp or the estimated size of the returned data exceeds the system limit on locked memory.

[ENOMEM] Locking the buffer oldp, or a portion of the buffer if the estimated size of the data to be returned is smaller, would cause the process to exceed its per-process locked memory limit.
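
For reading raw failure codes out of a syscall trace, a small lookup along these lines can help. It is only a sketch: errno_name and code_is are hypothetical helpers, and the table covers just the handful of errors that sysctl(3) documents.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Map a raw failure code to its symbolic errno name.
 * Only codes documented for sysctl(3) are covered. */
const char *errno_name(int code)
{
    switch (code) {
    case EPERM:   return "EPERM";
    case ENOENT:  return "ENOENT";
    case ENOMEM:  return "ENOMEM";
    case EFAULT:  return "EFAULT";
    case ENOTDIR: return "ENOTDIR";
    case EISDIR:  return "EISDIR";
    case EINVAL:  return "EINVAL";
    default:      return "unknown";
    }
}

/* Convenience check: does this code correspond to the given name? */
int code_is(int code, const char *name)
{
    assert(name != NULL);
    return strcmp(errno_name(code), name) == 0;
}
```

The numeric values come from the platform's errno.h; on FreeBSD, ENOMEM is 12, which is the 0xc in the trace.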

To be continued ...
 
Haven't given up yet.

With some more DTrace work I see

Code:
  3  74045         freebsd32___sysctl:entry sysctl entry name = {0, 3, 0, 0, 0, 0}, namelen = 2 oldlen = 8 [memcheck-x86-freebs]
  3  74045         freebsd32___sysctl:entry nametomib hw.pagesizes
  3  74046        freebsd32___sysctl:return sysctl return oldlen 8
  3  74046        freebsd32___sysctl:return  oid fffffe00d42015e0 {6 2147482929 0}
  3  74045         freebsd32___sysctl:entry sysctl entry name = {6, 2147482929, 0, 0, 0, 0}, namelen = 2 oldlen = 8 [memcheck-x86-freebs]
  3  74046        freebsd32___sysctl:return sysctl return oldlen 8
  3  74046        freebsd32___sysctl:return  oid fffffe00d4201750 {4096 2097152 0}

So, sysctlnametomib is getting 6,2147482929 with a length of 8. Looks good.

Then the sysctl to get the pagesizes is getting 4096,2097152 and a length of 8. That's 4k and 2M. Also looks good.
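
As a quick check of the 4k and 2M reading, here is a small sketch that turns a byte count into a short size string. format_size is a name made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Reduce an exact multiple of 1024 to the largest whole unit,
 * e.g. 4096 -> "4K". Returns the length written by snprintf. */
int format_size(unsigned long bytes, char *buf, size_t buflen)
{
    static const char units[] = { 'B', 'K', 'M', 'G' };
    size_t u = 0;

    assert(buf != NULL && buflen > 0);
    while (u + 1 < sizeof(units) && bytes >= 1024 && bytes % 1024 == 0) {
        bytes /= 1024;
        u++;
    }
    return snprintf(buf, buflen, "%lu%c", bytes, units[u]);
}
```

Feeding it the two values from the trace, 4096 and 2097152, gives "4K" and "2M".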

On amd64, is there a way to see the 32-bit translation of the addresses? I can run my tests on i386, but that will be a pain (I only have a basic VirtualBox setup for i386).
 
Well, after a serious amount of debugging I now have an explanation, though not yet a solution.

When an executable starts, it gets passed several things:
  1. argc the arg count
  2. argv the arg vector
  3. envp the environment vector
  4. auxv the auxiliary vector.
With Valgrind there is just one real executable, the host. It receives all of these and has to synthesize equivalents for the guest, usually just by copying.

The rtld code that executes on startup first checks the auxv vector. This should contain the page sizes (AT_PAGESIZES). If it doesn't find the values in auxv, it falls back to the sysctl.
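
The lookup-then-fallback decision can be sketched roughly as below. This is an illustration only, not rtld's code: struct aux_entry, the EX_AT_* tags and both function names are stand-ins invented for the example, and the tag numbers are placeholders rather than values to rely on.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for one auxiliary vector entry. */
struct aux_entry {
    long type;
    long value;
};

enum {
    EX_AT_NULL = 0,           /* terminator entry */
    EX_AT_PAGESIZES = 20,     /* placeholder tag for this sketch */
    EX_AT_PAGESIZESLEN = 21   /* placeholder tag for this sketch */
};

/* Walk the vector up to the terminator; NULL if the tag is absent. */
const struct aux_entry *find_aux(const struct aux_entry *auxv, long type)
{
    assert(auxv != NULL);
    for (; auxv->type != EX_AT_NULL; auxv++)
        if (auxv->type == type)
            return auxv;
    return NULL;
}

/* 1 if the page sizes can be taken from auxv,
 * 0 if the caller has to fall back to the sysctl path. */
int pagesizes_from_auxv(const struct aux_entry *auxv)
{
    return find_aux(auxv, EX_AT_PAGESIZES) != NULL;
}
```

A host that copies only part of the vector to the guest ends up in the 0 case, which is the situation that pushes the guest onto the sysctl path.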

This leads to the following sequence of events:

  1. Valgrind wasn't propagating auxv fully, specifically not the page sizes.
  2. rtld was then falling back to the hw.pagesizes sysctl.
  3. There is a bug in this fallback when running an i386 executable on an amd64 kernel. The amd64 kernel has a value of 3 for MAXPAGESIZES, but i386 userland only has a value of 2. The copy-out routine for sysctl sees this discrepancy (the 8-byte buffer supplied by rtld is too small) and flags ENOMEM.
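
The size mismatch can be modelled in a few lines. This is a simplified sketch, not the kernel's copy-out code: copy_old_value and demo_mismatch are hypothetical names, and it assumes 32-bit entries, a source of three entries and a caller buffer of two (8 bytes, matching the oldlen = 8 seen in the trace).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy as much of src as fits into the caller's buffer. Returns 0 on
 * success or ENOMEM if the buffer was too small for all of the data. */
int copy_old_value(void *oldp, size_t *oldlenp, const void *src, size_t srclen)
{
    size_t n;

    assert(oldp != NULL && oldlenp != NULL && src != NULL);
    n = srclen < *oldlenp ? srclen : *oldlenp;
    memcpy(oldp, src, n);
    *oldlenp = n;               /* report how much was actually copied */
    return srclen > n ? ENOMEM : 0;
}

/* Source side always has 3 entries; the caller offers caller_entries. */
int demo_mismatch(size_t caller_entries)
{
    const uint32_t source[3] = { 4096, 2097152, 0 };
    uint32_t psa[3] = { 0, 0, 0 };
    size_t size;

    assert(caller_entries <= 3);
    size = caller_entries * sizeof(psa[0]);
    return copy_old_value(psa, &size, source, sizeof(source));
}
```

In this model demo_mismatch(2) returns ENOMEM even though both real page sizes fit in the buffer, while demo_mismatch(3) succeeds.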
 