Clisp under Maemo

I wanted to get clisp running on my Nokia N810. Unfortunately it seems that no-one had yet ported it to the new ARM EABI (known as "armel" in Debian). Several things changed, such as the alignment of doubles and structure packing and so on, and clisp knows enough about the way its own stack works that this causes trouble.

So I decided to have a go at porting it myself, just to see. I don't know much about assembler or ARM processors or anything, but what the hell. I did actually make some progress, but ultimately failed to get the last piece to work. So I just put my notes up on the web to see if they would be useful.

To my astonishment, they were. Max Lapan carried on where I left off, and soon polished the project off. His code has now been committed to the clisp trunk, and it's just a matter of letting this filter down through the distributions. Hooray!

To commemorate this triumph, I here leave my notes for posterity...

ffcall

As far as I can tell, simply by finding all files with "arm" in the name or "__arm__" in the code, clisp's architecture-specific code is confined to the ffcall library. This provides a variety of mechanisms for calling C functions outside the usual C idiom. Get ffcall working, and you'll probably get clisp working too. The patches appear to work without modification under both source trees. ffcall is available as a separate library in Debian as libffcall1. It has far fewer build dependencies, so it's best to work with that tarball instead of clisp directly.

There are three basic lumps of code in the library. vacall implements a kind of varargs syntax, so you can define a function that takes a struct argument containing all the real arguments, and then call it as "f(arg1, arg2, arg3);". avcall is similar but in reverse: the caller builds up an argument list and then uses it to call a function with multiple arguments. Both of these bits work with my patch below.
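To make the avcall direction concrete, here is a toy, portable sketch of the idea: build up an argument list at run time, then call a function with it. The names toy_alist, toy_start, toy_int and toy_call are made up for illustration and are not avcall's API. The real library does this in assembly for arbitrary signatures. This version can only call int functions of up to three int arguments, by way of a switch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ARGS 3

/* Generic function pointer type; cast back to the real type before calling */
typedef void (*toy_fn)(void);

typedef struct {
    toy_fn func;
    int args[TOY_MAX_ARGS];
    size_t count;
} toy_alist;

static void toy_start(toy_alist *l, toy_fn func) {
    l->func = func;
    l->count = 0;
}

/* Append an int argument. Returns 0, or -1 if the list is full
   (avcall's macros also signal overflow by yielding -1) */
static int toy_int(toy_alist *l, int value) {
    if (l->count >= TOY_MAX_ARGS)
        return -1;
    l->args[l->count++] = value;
    return 0;
}

/* Call the stored function with however many ints were pushed */
static int toy_call(const toy_alist *l) {
    const int *a = l->args;
    assert(l->count <= TOY_MAX_ARGS);
    switch (l->count) {
    case 0:  return ((int (*)(void))l->func)();
    case 1:  return ((int (*)(int))l->func)(a[0]);
    case 2:  return ((int (*)(int, int))l->func)(a[0], a[1]);
    default: return ((int (*)(int, int, int))l->func)(a[0], a[1], a[2]);
    }
}

/* Example target function */
static int sum3(int a, int b, int c) { return a + b + c; }
```

Calling toy_start(&l, (toy_fn)sum3) followed by three toy_int calls mirrors the minitests line "Int f(Int,Int,Int):({1},{2},{3})->{6}". The switch is precisely the part that can't be written portably for arbitrary signatures, which is why ffcall needs per-architecture assembly.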

The third part is the trampoline, which implements a kind of closure. That is, it allocates a piece of memory that wraps up an existing, statically defined function with some data. So say you have a function "int add_constant(int x);" defined somewhere, which adds a constant value stored in a global variable to the argument. You can use the trampoline to define at runtime a function "add_3", callable like any other function, which wraps up add_constant together with the value "3". Simultaneously you can define another function "add_7" which wraps up add_constant with the value "7". Same code, different behaviour. This is the bit that I failed to get working.
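Here's a portable emulation of the non-reentrant flavour, assuming add_constant reads a global as described. It can't produce a real function pointer, because that needs freshly written machine code, which is what ffcall's trampoline actually does. So the caller has to go through call_fake_trampoline. The names fake_trampoline, constant_slot and call_fake_trampoline are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The global that add_constant reads; a non-reentrant trampoline
   stores its data into a global like this before jumping */
static int constant_slot;

static int add_constant(int x) { return x + constant_slot; }

/* Stand-in for a trampoline: the (function, data) pair that the
   generated machine code would have baked into itself */
typedef struct {
    int (*target)(int);
    int data;
} fake_trampoline;

static fake_trampoline make_fake_trampoline(int (*target)(int), int data) {
    assert(target != NULL);
    fake_trampoline t = { target, data };
    return t;
}

/* What the generated code does when called: store the data, then jump */
static int call_fake_trampoline(const fake_trampoline *t, int x) {
    constant_slot = t->data;
    return t->target(x);
}
```

Because the data travels through a single global, two add_N calls in flight at the same time would clobber each other's constant. That is why ffcall also ships a reentrant variant that keeps the data in a register instead.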

Under the callback directory there are slightly different versions of vacall and trampoline. These are re-entrant, achieved by using registers to supply the arguments rather than global variables. It was fairly straightforward to get vacall_r working by copying the changes I made to vacall.

Building

My patches so far are available here. To build, you first have to generate the assembly code from the C files. So building under Debian looks like this:

apt-get source libffcall1
wget http://mat.exon.name/logs/ffcall-arm.diff
patch -p0 < ffcall-arm.diff
make -C ffcall-1.10+2.41/ffcall/avcall -f Makefile.devel avcall-arm.S
make -C ffcall-1.10+2.41/ffcall/callback/vacall_r -f Makefile.devel vacall-arm.S
make -C ffcall-1.10+2.41/ffcall/vacall -f Makefile.devel vacall-arm.S
cd ffcall-1.10+2.41/ffcall
./configure
make extracheck

Of course, this will fail at the trampoline. But it should pass the other tests.

The tricky part is the .S files. These are assembly routines generated by gcc. As supplied, they don't compile, probably because they were generated by gcc 2.6.3, circa 1995. So they need to be regenerated. This is done with the supplied (but not automatically run) Makefile.devel, plus some kind of Rube Goldberg machine of sed scripts and macros to take away underscores and put them back again later, for some reason. I don't understand it, and so my patch hacks things up to bypass most of it.
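My best guess at what the underscore business is for: some object formats prefix every C symbol with an underscore, so a C function __vacall is known to the assembler as ___vacall. The .S files wrap each symbol as C(__vacall), and the build runs them through gcc -E with or without -DASM_UNDERSCORE, as the build log further down shows. Here's a sketch of such a wrapper, modelled on that pattern. The real definitions in ffcall's asm headers may differ in detail, and vacall_symbol is a made-up helper.

```c
#include <assert.h>
#include <string.h>

/* Prepend an underscore to C symbol names only where the platform needs it */
#ifdef ASM_UNDERSCORE
#define C(entrypoint) _##entrypoint
#else
#define C(entrypoint) entrypoint
#endif

/* Two-level stringification so C() is expanded before # is applied */
#define STR_(x) #x
#define STR(x) STR_(x)

/* The assembler-level name the preprocessor produces for __vacall */
static const char *vacall_symbol(void) { return STR(C(__vacall)); }
```

Compiled without -DASM_UNDERSCORE this yields "__vacall"; with it, "___vacall".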

Trampoline

There are actually two trampolines, one in the trampoline directory and one in callback/trampoline_r. The former relies on a global variable to hold the data, and so it'll break if the same trampoline is used more than once at the same time. The latter, reentrant one corrects this by storing the data in the scratch register "ip", aka "r12".

As far as I can tell, the non-reentrant version can't have worked for a long time, because the interface seems to have changed. Instead of a pointer to a function pointer, the macro now takes just a function pointer, and instead of a pointer to the data, the macro takes the data itself. However, the reentrant version appears to be correct and up-to-date. I just can't get it to work.

That is to say, the non-reentrant version seems to work under scratchbox. But as soon as I move it to the actual device, it segfaults. The reentrant version fails in any case. The problem is, it works fine if I step through it instruction-by-instruction in gdb, which is pretty maddening. I've tried debugging it by adding instructions to shift temporary values into spare registers such as r2 and r3, but can't get it to produce anything comprehensible.

Unfortunately, I'm out of time (and patience), so I have to give up. But the problem that remains is fairly well constrained, so maybe someone with more experience of debugging ARM processors will hear my plea for help and finish this off.

Notes

The rest of this file is the raw notes I made while trying to get it to work. They probably make very little sense.

For clisp, I'm doing what I probably should have done first, which is replace "-DSAFETY=3" with "-DNO_GENERATIONAL_GC" in makemake.in. And now I can try ./configure. This is 2.41, by the way.

Same error as always.

./lisp.run -B . -N locale -E 1:1 -Efile UTF-8 -Eterminal UTF-8 -norc -m 1800KW -x "(and (load \"init.lisp\") (sys::%saveinitmem) (ext::exit)) (ext::exit t)"
qemu: uncaught target signal 4 (Illegal instruction) - exiting
make: *** [interpreted.mem] Error 252
[sbox-CHINOOK_ARMEL: ~/clisp/clisp/clisp-2.41.orig/src] > mv .gdbinit disabled.gdbinit
[sbox-CHINOOK_ARMEL: ~/clisp/clisp/clisp-2.41.orig/src] > gdb ./lisp.run
(gdb) run -B . -N locale -E 1:1 -Efile UTF-8 -Eterminal UTF-8 -norc -m 1800KW -x "(and (load \"init.lisp\") (sys::%saveinitmem) (ext::exit)) (ext::exit t)"
Starting program: /home/mexon/clisp/clisp/clisp-2.41.orig/src/lisp.run -B . -N locale -E 1:1 -Efile UTF-8 -Eterminal UTF-8 -norc -m 1800KW -x "(and (load \"init.lisp\") (sys::%saveinitmem) (ext::exit)) (ext::exit t)"
Don't know how to run.  Try "help target".

Note that officially, clisp depends on gcc-4.1. That could well be significant. So maybe I should have another go at that. That won't work, because installing it depends on a newer version of libc. Hmm, or not. What does it depend on? autogen dejagnu (>= 1.4.3) expect-tcl8.3 gperf (>= 3.0.1) bison (>= 1:2.3) libmpfr-dev realpath (>= 1.9.12) chrpath make (>= 3.81) graphviz (>= 2.2) gsfonts-x11. I reckon I'll have a go at compiling it locally. It wants to compile java, and I don't want it to. According to this:

[sbox-CHINOOK_ARMEL: ~/clisp/gcc-4.1/local/gcc-4.1-4.1.1ds2.orig/gcc-4.1.1/gcc] > grep language= */config-lang.in
ada/config-lang.in:language="ada"
ada/config-lang.in:boot_language=yes
cp/config-lang.in:language="c++"
fortran/config-lang.in:language="fortran"
java/config-lang.in:language="java"
objc/config-lang.in:language="objc"
objcp/config-lang.in:language="obj-c++"
treelang/config-lang.in:language="treelang"

I don't want any of those. Oh, apart from c. "./configure --enable-languages=c".

After a long time it failed like this:

ar  rc ./libgcc.a libgcc/./_udivsi3.o libgcc/./_divsi3.o libgcc/./_umodsi3.o libgcc/./_modsi3.o libgcc/./_dvmd_lnx.o libgcc/./_muldi3.o libgcc/./_negdi2.o libgcc/./_lshrdi3.o libgcc/./_ashldi3.o libgcc/./_ashrdi3.o libgcc/./_cmpdi2.o libgcc/./_ucmpdi2.o libgcc/./_floatdidf.o libgcc/./_floatdisf.o libgcc/./_fixunsdfsi.o libgcc/./_fixunssfsi.o libgcc/./_fixunsdfdi.o libgcc/./_fixdfdi.o libgcc/./_fixunssfdi.o libgcc/./_fixsfdi.o libgcc/./_fixxfdi.o libgcc/./_fixunsxfdi.o libgcc/./_floatdixf.o libgcc/./_fixunsxfsi.o libgcc/./_fixtfdi.o libgcc/./_fixunstfdi.o libgcc/./_floatditf.o libgcc/./_clear_cache.o libgcc/./_enable_execute_stack.o libgcc/./_trampoline.o libgcc/./__main.o libgcc/./_absvsi2.o libgcc/./_absvdi2.o libgcc/./_addvsi3.o libgcc/./_addvdi3.o libgcc/./_subvsi3.o libgcc/./_subvdi3.o libgcc/./_mulvsi3.o libgcc/./_mulvdi3.o libgcc/./_negvsi2.o libgcc/./_negvdi2.o libgcc/./_ctors.o libgcc/./_ffssi2.o libgcc/./_ffsdi2.o libgcc/./_clz.o libgcc/./_clzsi2.o libgcc/./_clzdi2.o libgcc/./_ctzsi2.o libgcc/./_ctzdi2.o libgcc/./_popcount_tab.o libgcc/./_popcountsi2.o libgcc/./_popcountdi2.o libgcc/./_paritysi2.o libgcc/./_paritydi2.o libgcc/./_powisf2.o libgcc/./_powidf2.o libgcc/./_powixf2.o libgcc/./_powitf2.o libgcc/./_mulsc3.o libgcc/./_muldc3.o libgcc/./_mulxc3.o libgcc/./_multc3.o libgcc/./_divsc3.o libgcc/./_divdc3.o libgcc/./_divxc3.o libgcc/./_divtc3.o libgcc/./_eprintf.o libgcc/./__gcc_bcmp.o libgcc/./_divdi3.o libgcc/./_moddi3.o libgcc/./_udivdi3.o libgcc/./_umoddi3.o libgcc/./_udiv_w_sdiv.o libgcc/./_udivmoddi4.o
/scratchbox/compilers/cs2005q3.2-glibc2.5-arm/bin/sbox-arm-linux-ar: libgcc/./_udivsi3.o: No such file or directory
make[3]: *** [libgcc.a] Error 1
make[3]: Leaving directory `/home/mexon/clisp/gcc-4.1/local/gcc-4.1-4.1.1ds2.orig/gcc-4.1.1/host-arm-unknown-linux-gnu/gcc'
make[2]: *** [libgcc.a] Error 2
make[2]: Leaving directory `/home/mexon/clisp/gcc-4.1/local/gcc-4.1-4.1.1ds2.orig/gcc-4.1.1/host-arm-unknown-linux-gnu/gcc'
make[1]: *** [all-gcc] Error 2
make[1]: Leaving directory `/home/mexon/clisp/gcc-4.1/local/gcc-4.1-4.1.1ds2.orig/gcc-4.1.1'
make: *** [all] Error 2

I'm quite disappointed by that. Oh, it's possible that I ran out of space. Right, I won't try to compile xfree86 at the same time as gcc4.1. xfree86 is definitely an overnight job. Just gcc4.1 this time.

Note that the floating point problems are noticed here, which links to a more recent report of clisp problems here.

Interesting. vacall-arm.S was generated by gcc 2.6.3 from vacall-arm.c. ldfeqs is only mentioned in the .S file, not the .c file. So presumably if I regenerate the .S file, it will work.

[sbox-CHINOOK_ARMEL: ~/clisp/clisp/clisp-2.41/ffcall/vacall] > make -f Makefile.devel vacall-arm.S
gcc -V 2.6.3 -b arm-acorn-riscix -O2 -fomit-frame-pointer -DHAVE_LONG_LONG -D__arm__ -S vacall-arm.c -o vacall-arm.s
sbox-arm-linux-gcc: `-V' must come at the start of the command line
make: *** [vacall-arm.S] Error 1

GCC failed exactly the same way as before. So it's not a disk space thing.

So why use -V or -b at all?

[sbox-CHINOOK_ARMEL: ~/clisp/clisp/clisp-2.41/ffcall/vacall] > gcc -O2 -fomit-frame-pointer -DHAVE_LONG_LONG -D__arm__ -S vacall-arm.c -o vacall-arm.s
vacall-arm.c:23: warning: call-clobbered register used for global register variable
vacall-arm.c:24: warning: call-clobbered register used for global register variable
vacall-arm.c:26: warning: register used for two global register variables
vacall-arm.c: In function `__vacall':
vacall-arm.c:124: error: insn does not satisfy its constraints:
(insn 437 205 206 22 (set (reg/v:SF 16 f0 [ fret ])
        (reg:SF 3 r3)) 155 {*arm_movsf_soft_insn} (nil)
    (nil))
vacall-arm.c:124: internal compiler error: in reload_cse_simplify_operands, at postreload.c:391
Please submit a full bug report,
with preprocessed source if appropriate.
Send email to arm-gnu@codesourcery.com for instructions.

The warning lines are (line numbers added):

23: register __vaword       iret    __asm__("r0");
24: register __vaword       iret2   __asm__("r1");
25: register float          fret    __asm__("f0");
26: register double         dret    __asm__("f0");

The error line is the last closing brace of the only function in this file, __vacall. Hmm:

  /* MAGIC ALERT!
   * This is the last struct on the stack, so that
   * &args + 1 == &return_address == &firstword - 1.
   * Look at the assembly code to convince yourself.
   */

I don't think that's the problem though. Clearly, it's something to do with this "f0". Read this for more information.

Note that this code is all from libffcall. So I could start by tackling that. But I think they're exactly the same version. Yes, same problem.

So my theory is that if I remove all the register stuff around that, the compiler will try to do something sensible for the architecture. I've done that, and generated a .s file. Let's see if it compiles. This line in the generated file should help:

        .fpu softvfp

Well, that helped configure get over its hurdle. But there are other files with the same problem. vacall_r/vacall-arm.s appears to be hand-written, not auto-generated.

Ach, I think I was looking at the wrong file before. It's the one in vacall_r that's actually the problem. Nevertheless. If ffcall manages to build, then I may have achieved something. And I haven't: it fails exactly the same way. Ah, I think I have to "mv vacall-arm.s vacall-arm.S". No, that doesn't help either. You know what? That .s file is being auto-generated each time. And of course it's being done with that -b arm-acorn-riscix flag. So let's just remove those flags and see what happens. By the way, I'm building with dpkg-buildpackage here, so it's possible that this is deliberately asking for them to be rebuilt. Even that didn't help. It builds the file like this:

make[2]: Entering directory `/home/mexon/clisp/libffcall/ffcall-1.10+2.41/ffcall/vacall'
gcc -E `if test false = true; then echo '-DASM_UNDERSCORE'; fi` ./vacall-arm.S | grep -v '^ *#line' | grep -v '^#' | sed -e 's,% ,%,g' -e 's,//,@,g' -e 's,\
$,#,g' > vacall-arm.s
gcc -x none -c vacall-arm.s

Ah. That might mean that I trashed my own file. Yes, they're completely different. Lucky I saved a copy! No, still the same problems. I don't believe it, I still have a file generated by gcc 2.6.3. This is very strange.

I think configure creates src/callback/vacall-arm.s by copying it from somewhere, and therefore I have to fix things before running configure for the first time. There are two vacall-arm.c files, and I need to do them both.

[sbox-CHINOOK_ARMEL: ~/clisp/clisp/local/clisp-2.41.orig] > find . -name \*vacall-arm\*
./ffcall/callback/vacall_r/vacall-arm.c
./ffcall/callback/vacall_r/vacall-arm.S
./ffcall/vacall/vacall-arm.c
./ffcall/vacall/vacall-arm.S

Aw man. The c files are identical. The S files are just a little bit different. Well, I reckon I'll edit the last two, and then copy the result of compiling to .S into the first one.

OK. Now configure fails like this:

gcc -g -O2 -I. -I../../ffcall/avcall -c ../../ffcall/avcall/minitests.c
/bin/sh ./libtool --mode=link gcc -g -O2 -x none minitests.o libavcall.la -o minitests
gcc -g -O2 -x none minitests.o -o minitests  ./.libs/libavcall.a
./minitests > minitests.out
qemu: uncaught target signal 4 (Illegal instruction) - exiting

That's a different library, avcall. Same deal I think, but with avcall-arm.S. Except until configure, it only exists in one place, ffcall/avcall.

No, I can't just copy over those files for vacall and vacall_r; they have different function names. They really are different. Oh, no they're not, same input c file remember? But they're otherwise so similar that I reckon one is a branch of the other with a different function name. So I think what I do is change the name of the function in ./ffcall/callback/vacall_r/vacall-arm.c and recompile both files.

diff ./ffcall/vacall/vacall-arm.S ./ffcall/callback/vacall_r/vacall-arm.S | less
12,17c12,14
< LC0:
<       .word   C(vacall_function)
<       .align  0
<       .global C(__vacall)
<       DECLARE_FUNCTION(__vacall)
< C(__vacall:)
---
>       .global C(__vacall_r)
>       DECLARE_FUNCTION(__vacall_r)
> C(__vacall_r:)

Hmm, it seems that in fact it's the other way round. Except that in vacall_r it doesn't compile. So what I do is compile it in the other directory, copy it over, and change the name.

Still failed in exactly the same place. I thought at first I'd forgotten to do avcall, but no, it's there.

Ah, interesting, running minitests by hand generates heaps of correct output, before this:

long long f(float,long long,int):(1.4,0x35c6f707fffffffa,0xe)->0x35c6f70800000009
long long f(float,long long,int):(1.4,0xe35c6f707,0x9999999a)->0xdcf6090a2
void* f(void*,double*,char*,Int*):(0x14d2e,0x14da0,0xc5dc,0x14c98)->0x14da1
void* f(void*,double*,char*,Int*):(0x14d2e,0x14da0,0xc5dc,0x14c98)->0x14da1
Int f(Int,Int,Int):({1},{2},{3})->{6}
Int f(Int,Int,Int):({1},{2},{3})->{6}
J f(J,int,J):({47,11},2,{73,55})->{120,68}
J f(J,int,J):({11,2},73,{55,-1243263332})qemu: uncaught target signal 11 (Segmentation fault) - exiting

Man, this was really written by the kind of person my parents warned me about. I'm going to have to figure this out slowly and carefully. tests.c is the file, by the way.

OK, reading avcall-arm.c, I realise that this is completely broken for armel.

If I want to fix this, here's what I have to do. Read up on the differences in the new ABI. Chase up as much documentation as I can find. Read the various comments in the existing functions which explain how function calls work and in particular how structs are packed. Then write a whole new avcall-armel.c file to do the job. It's a fairly constrained problem: those files aren't long. But it will be very hard to get my head around.

With the new ABI, default structure packing changes, as do some default data sizes and alignment (which also have a knock-on effect on structure packing). In particular the minimum size and alignment of a structure was 4 bytes. Under the EABI there is no minimum and the alignment is determined by the types of the components it contains. This will break programs that know too much about the way structures are packed and can break code that writes binary files by dumping and reading structures.
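A quick way to see the layout change for yourself is to print a few sizes and offsets. This is a sketch and not part of ffcall. If I've understood the change correctly, armel (EABI) should print 1, 8 and 8. The old ABI would give 4 for the one-char struct (the 4-byte minimum) and 4 for both offsets, since doubles and long longs were only word-aligned there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct one_char    { char c; };
struct char_double { char c; double d; };
struct char_llong  { char c; long long ll; };

/* Print the layout facts that changed between the old ARM ABI and EABI */
static void print_layout(void) {
    printf("sizeof(struct one_char)         = %zu\n", sizeof(struct one_char));
    printf("offsetof(struct char_double, d) = %zu\n", offsetof(struct char_double, d));
    printf("offsetof(struct char_llong, ll) = %zu\n", offsetof(struct char_llong, ll));
}
```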

You know, I'm pretty confident I can get at least structs working.

I should definitely do all of this in ffcall before moving on to clisp itself.

There's some chatter on internet tablet talk.

Note that ffcall, unlike clisp, has build depends of only debhelper and autotools-dev. So I might even be able to fix this on the road.

After a night of hacking, I have solved one alignment problem in avcall. It looks like this:

#if defined(__arm__)

#define av_float(LIST,VAL)						\
  (++(LIST).aptr > __av_eptr(LIST)					\
   ? -1 : (((float*)(LIST).aptr)[-1] = (float)(VAL), 0))

/* Skip a padding word if aptr isn't 8-byte aligned, so that the
   double lands on an 8-byte boundary as the EABI requires */
#define av_double(LIST,VAL)						\
  (((LIST).aptr += 2 + (((long)((LIST).aptr) % 8) ? 1 : 0)) > __av_eptr(LIST)	\
    ? -1 :								\
    ((LIST).tmp._double = (double)(VAL),				\
     (LIST).aptr[-2] = (LIST).tmp.words[0],				\
     (LIST).aptr[-1] = (LIST).tmp.words[1],				\
     0))

#endif

That's with my debugging prints taken out. Then the avcall-arm.s file has to be generated, and I'm still not clear on that point, although I managed it earlier. No changes to avcall-arm.c, just generate the file. Once that's done, build. But you can't build, because the same thing has to be done to the vacall-arm.s file and probably the _r directory as well. Nevertheless, this represents progress.
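The same padding rule may be easier to check when written as a plain function over word indices rather than a macro over pointers. push_double is a hypothetical helper, not part of avcall. It assumes the word buffer itself starts on an 8-byte boundary, so that an even index means an 8-byte-aligned address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a double to a buffer of 32-bit argument words, first skipping
   (and zeroing) one word if the next free slot is at an odd index.
   Returns 0, or -1 on overflow, like av_double. */
static int push_double(uint32_t *words, size_t *used, size_t capacity, double value) {
    assert(sizeof value == 2 * sizeof words[0]);
    size_t start = *used + (*used % 2);    /* round up to an even index */
    if (start + 2 > capacity)
        return -1;
    if (start != *used)
        words[*used] = 0;                  /* the padding word */
    memcpy(&words[start], &value, sizeof value);
    *used = start + 2;
    return 0;
}
```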

Here are the commands I run to get things working:

cp -r ffcall-1.10+2.41.orig ffcall-1.10+2.41
patch -p0 < my-third-patch
make -C ffcall-1.10+2.41/ffcall/avcall -f Makefile.devel avcall-arm.S
make -C ffcall-1.10+2.41/ffcall/callback/vacall_r -f Makefile.devel vacall-arm.S
make -C ffcall-1.10+2.41/ffcall/vacall -f Makefile.devel vacall-arm.S
cd ffcall-1.10+2.41/ffcall
./configure

I have figured out that the problem lies with struct return values, while struct args are fine. That's probably got something to do with __av_start_struct3 and friends.

Beyond that, I have little understanding of how it actually works. In particular, aptr only seems to be used for the fourth and fifth elements. The first three must be being sent in registers somehow. Oh, OK, let's say it's like this. Up to four words can be sent in registers. There's a two-word struct, then an int, and then the third argument which is also a two-word struct doesn't fit in registers, so it's sent on the stack.
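For comparison, here's my reading of the base AAPCS rules for core registers, written as a sketch rather than a reference implementation. It is soft-float, so there are no VFP registers, as on armel. Each argument is described by its size in words and whether it needs 8-byte alignment. The names arg_desc, arg_slot and assign_args are made up.

```c
#include <assert.h>

/* One argument: size in 32-bit words, and whether it needs 8-byte
   alignment (double, long long, or a struct containing one) */
typedef struct { int words; int align8; } arg_desc;

/* Where one argument ended up */
typedef struct {
    int first_reg;     /* first core register used, or -1 if none */
    int reg_words;     /* words passed in registers */
    int stack_offset;  /* word offset on the stack, or -1 if none */
} arg_slot;

/* hidden_ret: nonzero if r0 carries the address for a struct result */
static void assign_args(const arg_desc *args, int n, int hidden_ret, arg_slot *out) {
    int ncrn = hidden_ret ? 1 : 0;  /* next core register number */
    int nsaa = 0;                   /* next stacked argument word */
    for (int i = 0; i < n; i++) {
        int words = args[i].words;
        assert(words > 0);
        out[i].first_reg = -1;
        out[i].reg_words = 0;
        out[i].stack_offset = -1;
        if (args[i].align8 && ncrn % 2)
            ncrn++;                            /* start at an even register */
        if (words <= 4 - ncrn) {               /* fits entirely in r0-r3 */
            out[i].first_reg = ncrn;
            out[i].reg_words = words;
            ncrn += words;
            continue;
        }
        if (ncrn < 4 && nsaa == 0) {           /* split between registers and stack */
            out[i].first_reg = ncrn;
            out[i].reg_words = 4 - ncrn;
            out[i].stack_offset = 0;
            nsaa = words - (4 - ncrn);
            ncrn = 4;
            continue;
        }
        ncrn = 4;                              /* no more registers for anything */
        if (args[i].align8 && nsaa % 2)
            nsaa++;                            /* 8-byte align the stack slot */
        out[i].stack_offset = nsaa;
        nsaa += words;
    }
}
```

Without a hidden return pointer, these rules would actually split the third J of J f(J,int,J) between r3 and the stack. f returns an 8-byte J, and if that comes back via memory as AAPCS specifies, its address occupies r0. Everything then shifts along one register and the third J lands entirely on the stack, which would match the description above.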

I also think that we actually have up to six registers that can be used for arguments, but we only appear to be using four. But it works, so I guess it works.

It doesn't help that ARM's "information center" appears to be completely broken under Firefox.

This is the comment from the existing implementation:

To return a structure, the called function copies the return value to the address supplied in register "%r0".

I do not appear to have __AV_PCC_STRUCT_RETURN set, but I do appear to have the other two (small and gcc).

It is possible that it is precisely this which is wrong. It shouldn't be PCC, but "normal".

I don't know what the provenance of this is, but:

When compiling functions that return structures or unions, GCC output code normally uses a method different from that used on most versions of Unix. As a result, code compiled with GCC cannot call a structure-returning function compiled with PCC, and vice versa.

The method used by GCC is as follows: a structure or union which is 1, 2, 4 or 8 bytes long is returned like a scalar. A structure or union with any other size is stored into an address supplied by the caller (usually in a special, fixed register, but on some machines it is passed on the stack). The target hook `TARGET_STRUCT_VALUE_RTX' tells GCC where to pass this address.

By contrast, PCC on most target machines returns structures and unions of any size by copying the data into an area of static storage, and then returning the address of that storage as if it were a pointer value. The caller must copy the data from that memory area to the place where the value is wanted. GCC does not use this method because it is slower and nonreentrant.

On some newer machines, PCC uses a reentrant convention for all structure and union returning. GCC on most of these machines uses a compatible convention when returning structures and unions in memory, but still returns small structures and unions in registers.

You can tell GCC to use a compatible convention for all structure and union returning with the option `-fpcc-struct-return'.

Two pages from Microsoft. Including: "ARM's compiler returns some 4-byte structures in R0. CLARM and CLTHUMB always return structures in the calling function's stack space to which the first argument points."

The segfault I'm getting certainly makes it look like the calling function is trying to store the return value in a pointer supplied in a register, but I'm not doing that.

See, even though I only have an 8 byte return value here, I can't see anywhere in the code where it... hold on!

        if (l->rsize == 2*sizeof(__avword)) {
          ((__avword*)l->raddr)[0] = i;
          ((__avword*)l->raddr)[1] = iret2;

As I was saying, I can't see anywhere in the code where it does the "address in a register" thing. But there it is right there! It's setting a return address instead of just returning.

No, wait, that's irrelevant. This part of the code isn't the problem. The problem is that the called function is expecting raddr to be stored in some register, and that isn't being done. So the only question is: which register?

I just compiled a trivial example to assembler and figured out what it's doing. I'm pretty sure the address for the return value goes in r0. That might explain why the arguments appear to be one word to the left of where they should be. In fact, it would totally explain it. So all I have to do is add that to the args list as effectively an extra arg.

And it already has infrastructure for that. The only thing to do is make it always do that for a struct, regardless of whether it fits in the registers or not. In fact, this seems to be controlled by __AV_GCC_STRUCT_RETURN. Hmm, no, a four-byte struct doesn't get returned that way. A six-byte struct does. So let's try removing the "|| ((TYPE_SIZE) == 8" part and see what happens. No difference. Then again, that might be because I was editing the i386 section. Ah, yes exactly:

avcall.h.in:  __AV_GCC_STRUCT_RETURN    = 1<<2, /* consider 8 byte structs as small */

So how do I convince that to, like, not? Well, I just added a "!defined(__arm__)". And all my tests now pass. How nice.
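To summarise the difference as I understand it: the base AAPCS returns a struct of up to 4 bytes in r0, and anything bigger through a caller-supplied address passed in r0. The generic GCC rule quoted above treats 1, 2, 4 and 8 byte structs as scalars. Here's a sketch of the two predicates, with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Base AAPCS: composites of up to 4 bytes come back in r0; larger ones
   are written to memory at an address the caller passes in r0 */
static int aapcs_struct_returned_in_r0(size_t size) {
    return size <= 4;
}

/* Generic GCC convention from the quoted documentation: 1, 2, 4 or 8
   byte structs are returned like scalars, others via an address */
static int gcc_generic_returned_like_scalar(size_t size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}
```

The 8-byte case is where they disagree. That fits with the __AV_GCC_STRUCT_RETURN flag ("consider 8 byte structs as small") being the thing that had to go for ARM.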

If I turn on the extra tests, a couple seem wrong:

Char f(Char,double,Char):({'A'},0.2,{'C'})->{'B'}
Char f(Char,double,Char):({''},0.2,{''})->{''}
T f(T,char,T):({"the"},' ',{"fox"})->{"box"}
T f(T,char,T):({"��"},'/',{" "})->{""}
X f(B,char,double,B):({0.1,{1,2,3}},'',0.3,{0.2,{5,4,3}})->{"return val",''}
X f(B,char,double,B):({2.65022e-314,{2,3,0}},'3',0.2,{8.48798e-314,{3,0,902231815}})->{"return val",'3'}

The common factor: chars. So it's a bit early to declare victory yet. Just to clarify: the fix was in avcall.h, and probably needs to really be applied to avcall.h.in:

#if defined(__GNUC__) && !defined(__arm__)
				  __AV_GCC_STRUCT_RETURN |
#endif

I managed to knock off one more bug. The problem was that after the last one-byte-long struct has been pushed onto the argument list, the list is left with a length that isn't a whole number of words. This gets rounded down by "arglen = l->aptr - l->args", which means that the last argument never gets written to the argframe. To solve it, round up when calculating arglen:

  /* Make the argument count round up, in case just one byte got added
     to the argument list or something. Count in bytes, so there's no
     arithmetic on void* (a GNU extension) or on a misaligned __avword* */
  int arglen = ((char*)l->aptr - (char*)l->args + sizeof(__avword) - 1)
               / sizeof(__avword);
  __avword i;

But I'm still not done. The remaining two tests both fail as well.

Oh, nearly forgot to mention, the whole reason for that second bug popping up was what I did to fix the first one. Here it is, complete with debugging statements and context:

#if defined(__m88k__)
#define __av_struct(LIST,TYPE,TYPE_SIZE,TYPE_ALIGN,ASSIGN,VAL)		\
  (((LIST).aptr =							\
    (__avword*)(((((__avword)(LIST).aptr+(TYPE_SIZE)+(TYPE_ALIGN)-1) & -(long)(TYPE_ALIGN))\
		 +sizeof(__avword)-1) & -(long)sizeof(__avword)))	\
   > __av_eptr(LIST)							\
   ? -1 : (ASSIGN(TYPE,TYPE_SIZE,TYPE_ALIGN,(void*)((__avword)(LIST).aptr-(TYPE_SIZE)),VAL),\
	   0))
#endif
#endif
#if defined(__arm__)
#define __av_struct(LIST,TYPE,TYPE_SIZE,TYPE_ALIGN,ASSIGN,VAL)		\
  fprintf( out, "Yes, we're really here.  args %x aptr %x\n", (LIST).args, (LIST).aptr ); \
  (((LIST).aptr = (__avword*)(((long)(LIST).aptr+(TYPE_SIZE)+TYPE_ALIGN-1) & -(long)TYPE_ALIGN)) \
    > __av_eptr(LIST)							\
    ? -1 : (ASSIGN(TYPE,TYPE_SIZE,TYPE_ALIGN,(void*)((__avword)(LIST).aptr-(TYPE_SIZE)),VAL),\
	    0)); \
  fprintf( out, "After, aptr %x\n", (LIST).aptr)
#endif
#if defined(__m68k__) || defined(__convex__)
/* Structures are passed as embedded copies on the arg stack.
 */
#define __av_struct(LIST,TYPE,TYPE_SIZE,TYPE_ALIGN,ASSIGN,VAL)		\
  (((LIST).aptr = (__avword*)(((long)(LIST).aptr+(TYPE_SIZE)+sizeof(__avword)-1) & -(long)sizeof(__avword))) \
    > __av_eptr(LIST)							\
    ? -1 : (ASSIGN(TYPE,TYPE_SIZE,TYPE_ALIGN,(void*)((__avword)(LIST).aptr-(TYPE_SIZE)),VAL),\
	    0))
#endif
#if (defined(__sparc__) && !defined(__sparc64__)) || (defined(__powerpc__) && !defined(__powerpc64__) && !(defined(_AIX) || (defined(__MACH__) && defined(__APPLE__))))

The explanation is basically that it is the same as the m68k and convex ones below, but it uses the specified alignment as the offset instead of assuming that everything is aligned according to __avword. Everything still demands to be aligned on four-byte boundaries, but this strategy puts the char into the first byte instead of the last, which seems to be what the ABI demands.
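The two strategies are easier to compare as offsets. Here's a sketch that mirrors the address arithmetic of the m68k/convex macro and of my ARM one, using byte offsets from the start of the argument list instead of pointers. AV_WORD_SIZE stands in for sizeof(__avword), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define AV_WORD_SIZE 4  /* sizeof(__avword) on 32-bit ARM */

/* Round x up to a multiple of a, where a is a power of two */
static size_t round_up(size_t x, size_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* m68k/convex variant: round the end up to a whole word and put the
   struct at the end of that space. Returns the struct's byte offset;
   *next receives the new aptr offset. */
static size_t place_m68k_style(size_t aptr, size_t size, size_t *next) {
    *next = round_up(aptr + size, AV_WORD_SIZE);
    return *next - size;
}

/* ARM variant: round the end up to the struct's own alignment only */
static size_t place_arm_style(size_t aptr, size_t size, size_t align, size_t *next) {
    *next = round_up(aptr + size, align);
    return *next - size;
}
```

For a one-char struct at offset 0, the m68k version puts the char at byte 3 and moves on to offset 4. The ARM version puts it at byte 0 and leaves the list at offset 1, which is not a whole number of words. That's the odd-length list behind the arglen rounding bug.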

I think I marginally improved things. The three-byte struct return value does get passed in a register. I changed it so that, rather than checking for specific sizes, anything less than eight bytes goes on the stack. But I don't know whether the cutoff is less than 8 or less than 4, so I added a test to check for that.

That previous case turned out to be unnecessary - I can use the existing one for i386 and other little-endian friends.

OK, good, avcall passes its tests. vacall segfaults, so I'm about one-third of the way through. I'd better check in my patch.

vacall fails when trying to return a value. I expect this is due to the following:

  /* MAGIC ALERT!
   * This is the last struct on the stack, so that
   * &args + 1 == &return_address == &firstword - 1.
   * Look at the assembly code to convince yourself.
   */

I may well do that, thanks.

Yesterday I spent ages staring at the assembly code for my test trying to figure out how it's trying to get hold of the return value address, and failed. But hopefully something is gelling in my mind.

One interesting thing is the difference between compiling with -fomit-frame-pointer and without. With that flag, it returns with "bx lr". Without it, the return sequence is more complicated and there's no bx instruction at all. In fact, tests.c is compiled without the flag, but vacall-arm.S is compiled with it.

Reading this hint, this is probably deliberate. That is, it's only the __vacall function which should have no frame pointer, because it's the called function that has to understand __vacall's layout.

I'm also referencing some Microsoft documentation about the ARM stack layout and the Wikipedia page about stack layout. Also this summary of the ARM instruction set, which unfortunately is a bit cryptic for me.

Right, I finally managed to do something useful with vacall. The secret seems to be that it deliberately screws with its stack. It knows that anything past the fourth argument will get passed on the stack. So it simply goes back four places, scribbles its four argument registers over whatever is there, and calls the function. It saves away the return address, but not the other three, even though it has a structure there to hold the new values. So I simply changed things so that it stores all four stack positions before scribbling over them, and restores them all afterwards. Note that the function which gets called only gets an opaque pointer to some area of memory, which in this case happens to be this hacked up region of stack. It won't do any damage outside this region though.
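Here's a simulation of that trick in plain C, using an array to stand in for the stack. It's only a model of what the assembly does with the real sp. simulate_vacall, handler_fn and sum_handler are hypothetical, and the four "register" values are passed in as an array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t word;

/* Handler gets a pointer to all arguments laid out contiguously,
   like the opaque area vacall hands its callee */
typedef word (*handler_fn)(const word *args, int nargs);

/* stack_args points at the first stacked argument (the 5th); the four
   words just below it are borrowed to hold the register arguments */
static word simulate_vacall(word *stack_args, const word regs[4], int nargs,
                            handler_fn handler) {
    word saved[4];
    word *area = stack_args - 4;
    memcpy(saved, area, sizeof saved);       /* save all four words first */
    memcpy(area, regs, 4 * sizeof(word));    /* scribble r0-r3 over them */
    word result = handler(area, nargs);
    memcpy(area, saved, sizeof saved);       /* put everything back */
    return result;
}

/* Example handler: sums its arguments */
static word sum_handler(const word *args, int nargs) {
    word total = 0;
    for (int i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```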

Now I'm left with the usual problems, similar to all of the ones for avcall. Hopefully I can get through them fairly quickly.

I've done two, both just by moving __arm__ to a more appropriate #if branch. The first was that the __va_arg_adjusted thing is little endian instead of big endian. That solves the char problems. The second is that __arm__ moves from the null version to the "__VA_alignof(double) > sizeof(__vaword)" version, which happens to be true here.

And the third is that __va_arg_longlong moves from the "(at most) word-aligned" bit to "have alignment 8" bit.
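The little-endian point is easy to check in isolation. When a char or short is widened to a full argument word, it sits at the start of the word on a little-endian machine like armel and at the end on a big-endian one. Here's a sketch with hypothetical function names, not the actual __va_arg_adjusted macro:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset within a 4-byte argument word at which a value of
   `size` bytes lives once it has been widened to a word */
static size_t small_arg_offset(size_t size, int little_endian) {
    assert(size == 1 || size == 2 || size == 4);
    return little_endian ? 0 : 4 - size;
}

/* Detect the host's byte order at run time */
static int host_is_little_endian(void) {
    uint32_t probe = 1;
    return *(unsigned char *)&probe == 1;
}
```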

So now it's just that structures are broken.

I fixed one set of those problems by adding the __va_start_struct_return and __va_start_struct1 macros for ARM. Now my only problem is small structs that get returned. Large structs work fine.

Right, vacall is done and checked in. Next thing to fail is trampoline. Conveniently, there's a step-by-step guide to porting it. After that I'll have to do trampoline and vacall in callback. So I've done two directories and I have three to go. So call that 90% done I guess.

The thing to do with the trampoline is start small. Write some machine language that moves data from one address to another via a register and then returns. See if I can execute it. That is all.

OK, that's one thing explained: configure does test for the existence of mprotect, but it never checks that it can allow execution. That's no good.
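Here's a check that would have caught this: map an anonymous page read/write, then ask mprotect to make it executable. This is a sketch using standard POSIX/Linux calls, and can_make_page_executable is a made-up name. A thorough check would also write an instruction into the page and call it, which is architecture-specific, as the experiment below shows.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if an anonymous page can be made executable, 0 otherwise */
static int can_make_page_executable(void) {
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);
    void *page = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 0;
    /* The step configure's existence check never exercises */
    int ok = mprotect(page, (size_t)pagesize, PROT_READ | PROT_EXEC) == 0;
    munmap(page, (size_t)pagesize);
    return ok;
}
```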

I've managed to load and execute some code from memory by mmaping it. Here's the code:

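#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

char* buffer;
int (*funptr)(int);

/* Never called: it's only here so that its compiled code can be
   disassembled and copied into the "memory" file */
int fun(int x)
{
    return x + 2;
}

int main()
{
    int pagesize = getpagesize();

    /* O_CREAT requires a permissions argument */
    int fd = open("memory", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
    {
        fprintf(stderr, "Couldn't open memory: %d\n", errno);
        return 0;
    }

    /* Map the file's first page readable, writable and executable */
    buffer = (char*)mmap(NULL, pagesize, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_SHARED, fd, 0);
    if (buffer == MAP_FAILED)
    {
        fprintf(stderr, "Couldn't map memory: %d\n", errno);
        return 0;
    }

    fflush(stdout);
    /* Treat the start of the mapping as a function and call it;
       the result becomes the exit status */
    funptr = (int (*)(int))buffer;

    return (*funptr)(3);
}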

Compile it. Disassemble fun. It produces this:

0x000084f4 <fun+0>:     mov     r12, sp
0x000084f8 <fun+4>:     stmdb   sp!, {r11, r12, lr, pc}
0x000084fc <fun+8>:     sub     r11, r12, #4    ; 0x4
0x00008500 <fun+12>:    sub     sp, sp, #8      ; 0x8
0x00008504 <fun+16>:    str     r0, [r11, #-16]
0x00008508 <fun+20>:    ldr     r3, [r11, #-16]
0x0000850c <fun+24>:    add     r3, r3, #2      ; 0x2
0x00008510 <fun+28>:    mov     r0, r3
0x00008514 <fun+32>:    sub     sp, r11, #12    ; 0xc
0x00008518 <fun+36>:    ldmia   sp, {r11, sp, pc}

I have no idea what those addresses on the left are, but they're certainly not positions in the file. But there are a bunch of constants there. "od -tx1 -Ax -w4 < a.out | less". What we're aiming for is a file that starts with this:

000000 0d c0 a0 e1
000004 00 d8 2d e9
000008 04 b0 4c e2
00000c 08 d0 4d e2
000010 10 00 0b e5
000014 10 30 1b e5
000018 02 30 83 e2
00001c 03 00 a0 e1
000020 0c d0 4b e2
000024 00 a8 9d e8

You can see the constants in the left-hand data column. Searching through the hex dump, I find this code starts at 0004f4. Doing the hex dump with the addresses in decimal (-Ad) shows that it's 1268. So I do "tail -c +1269 a.out > memory" (tail counts bytes from 1, so +1269 starts at offset 1268). Now I can execute a.out, and the returned value is 5. Now, if I make test.c write a different constant into offset 0x18, I get a different return value. So that's good.
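Why offset 0x18 works: it holds `add r3, r3, #2` (bytes 02 30 83 e2), and for ARM data-processing immediates below 256 the constant is just the low byte of the instruction word. A small encoder, with `arm_add_imm` as a hypothetical helper, makes that visible:

```c
#include <stdint.h>

/* ARM data-processing immediate form: cond = AL (0xE), I = 1,
   opcode ADD (0100), S = 0, then Rn, Rd, rotate = 0 and an 8-bit
   immediate in the low byte. */
uint32_t arm_add_imm(unsigned rd, unsigned rn, uint8_t imm)
{
    return 0xE2800000u | ((uint32_t)rn << 16) | ((uint32_t)rd << 12) | imm;
}
```

`arm_add_imm(3, 3, 2)` gives 0xE2833002, matching the dump; changing the immediate only changes the first byte of the little-endian file.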

So I tried copying the values out of the trampoline.c file for arm and doing that, but I just get more segmentation faults. For a moment there I thought it might be as simple as the endianness having changed, but using long means that I'm automatically using the correct endianness for the architecture. I verified that if you compile the code in the comment in the arm section you do get exactly the same binary data as I'm writing to my trampoline. So that's not the problem. Instead, I think there must be something about the instruction set that doesn't work. Or possibly there's a different offset to the data now.
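To make the endianness argument concrete: a store through a 32-bit integer writes the host's native byte order, and on armel that is the same order the CPU fetches instructions in. The sketch below assumes `uint32_t` in place of the 32-bit `long` used on the device and spells out the little-endian layout explicitly; both helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Serialise an instruction word in little-endian order, as it must sit
   in memory on armel. */
void arm_word_to_le_bytes(uint32_t insn, unsigned char out[4])
{
    out[0] = (unsigned char)(insn & 0xFF);
    out[1] = (unsigned char)((insn >> 8) & 0xFF);
    out[2] = (unsigned char)((insn >> 16) & 0xFF);
    out[3] = (unsigned char)((insn >> 24) & 0xFF);
}

/* What a plain store like buffer[i] = 0x... writes: native byte order.
   On a little-endian host this matches arm_word_to_le_bytes. */
void native_word_bytes(uint32_t insn, unsigned char out[4])
{
    memcpy(out, &insn, 4);
}
```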

It would be nice to see exactly which instruction is producing the segfault. But this doesn't seem to be possible under scratchbox.

I think I should concentrate on just getting the jump to somewhere sane working. The problem is I have to do this anyway in order to keep the program running. Input assembler looks like this:

blah:   
        ldr     ip,[pc,#_function-.-8]
        ldr     pc,[ip,#0]
_function:
        .word   buffer

Compiled and disassembled, it looks like this:

0x00000000 <blah+0>:    ldr     r12, [pc, #0]   ; 0x8 <_function>
0x00000004 <blah+4>:    ldr     pc, [r12]

Which looks wrong to me. It should be 8, for the next word after the last instruction. Anyway, the equivalent instructions are:

0000056 00 c0 9f e5
0000060 00 f0 9c e5

If I change it to point to 8 bytes after the second instruction, it looks like this:

0000056 08 c0 9f e5
0000060 00 f0 9c e5

Comparing with what I've been telling it to do before, it seems that I was telling it to jump to the constant "0x3", which is clearly bad.

I found out how to use GDB. Pity it won't let me dump the registers really, isn't it? In fact, pity that qemu basically won't let me do anything at all.

Interesting. I couldn't get the real device to use mmap. But I changed it to use MAP_ANONYMOUS, and now it works. That's nice. And although it still segfaults, I can at least see the registers now. This shows me that the registers are badly screwed up somehow.

OK. When it segfaults, the program counter has 0xe1a0c00c. Obviously this is bad. But also, this is an instruction. It's a mov of some kind. I suspect that I'm not supposed to load the exact address itself, but that because there's some kind of post-decrement going on I'm supposed to load the address shortly before. Or something. I wish I could track down where that instruction is defined, but I can't install grep or od on this thing.

Nope, it's not there. I have a bunch of e1a0c00d, but no c00c. Maybe it comes from some shared library somewhere.

Oh, hang on. Those instructions are completely fucked, aren't they? Not a load. A move. What I'm doing there is loading the first instruction of the function as the program counter. That would explain it.
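A small decoder confirms the reading. 0xE1A0C00D is `mov r12, sp`, the first instruction of every function in the earlier disassembly (bytes 0d c0 a0 e1), so loading it into pc means a function's code was used as an address. One plausible explanation for seeing ...c00c rather than ...c00d is that on ARMv5T and later an LDR into pc treats bit 0 as a request to switch to Thumb state and clears it from the address. `decode_mov_reg` is a hypothetical helper that only handles the unshifted register form of MOV.

```c
#include <stdint.h>

/* Register form of ARM MOV with no shift: cond = AL, opcode 1101, I = 0,
   S = 0, Rn = 0, operand2 = Rm. Returns 1 and fills rd/rm on a match. */
int decode_mov_reg(uint32_t insn, unsigned *rd, unsigned *rm)
{
    if ((insn & 0xFFFF0FF0u) != 0xE1A00000u)
        return 0;
    *rd = (insn >> 12) & 0xF;   /* destination register */
    *rm = insn & 0xF;           /* source register (13 = sp, 15 = pc) */
    return 1;
}
```

The same decoder reads E1A0F00C as `mov pc, r12` and E1A0C000 as `mov r12, r0`.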

So what I really want to do is not go via an intermediate register, but just load the pc directly from memory with a single instruction:

        ldr     pc,[pc,#-4]

I don't really get the -4 part, but it seems to be necessary. Maybe that's the frame pointer or something, who knows. Could also be -g. That comes out to:

04 f0 1f e5

And that works perfectly! Hooray! OK, I'm getting close to knocking this one off now.

OK, let the record state: the following works, on the device:

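#include <sys/mman.h>
#include <unistd.h>

int data = 0x42;
int variable = 0;
int rv = 0;
void fun()
{
    rv = variable + 3;
}
void (*funptr)() = &fun;

int main()
{
    int pagesize = getpagesize();
    long* buffer = (long*)mmap(NULL, pagesize, PROT_READ | PROT_WRITE | PROT_EXEC,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buffer == MAP_FAILED)
    {
        return -1;
    }

    /* Trampoline: copy data into variable, then jump to *funptr. */
    buffer[0] = 0xE92D0001;   /* stmdb sp!, {r0}      save r0          */
    buffer[1] = 0xE59F0014;   /* ldr   r0, [pc, #20]  r0 = &data       */
    buffer[2] = 0xE590C000;   /* ldr   ip, [r0]       ip = data        */
    buffer[3] = 0xE59F0010;   /* ldr   r0, [pc, #16]  r0 = &variable   */
    buffer[4] = 0xE580C000;   /* str   ip, [r0]       variable = data  */
    buffer[5] = 0xE8FD0001;   /* ldmia sp!, {r0}^     restore r0       */
    buffer[6] = 0xE59FC008;   /* ldr   ip, [pc, #8]   ip = &funptr     */
    buffer[7] = 0xE59CF000;   /* ldr   pc, [ip]       jump to *funptr  */
    buffer[8] = (long)&data;
    buffer[9] = (long)&variable;
    buffer[10] = (long)&funptr;

    /* Newer GCC releases provide __builtin___clear_cache(), which should be
       called on this range before jumping to freshly written code on ARM. */
    void (*tramp)() = (void (*)())buffer;
    (*tramp)();

    munmap(buffer, pagesize);
    return rv;
}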

And just in case the flags matter or something:

~ $ gcc-3.4  -Wall -g test.c
~ $ ./a.out
~ $ echo $?
69

However, it does not work at all on scratchbox. So that makes things interesting. It seems to me that no changes are required to the trampoline code at all. However, it will be necessary to compile clisp on the device itself. That should be a fascinating experience, given that it's 15MB before we even get started. Still, I do have 55MB free. Might just squeeze in.

So, I'm already running into busybox trouble, such as the lack of md5sum. So I'm trying the extended busybox. To install it, I have to remove wget. It seems to conflict with binutils. Removing that removes dpkg-dev and gcc-3.4, so I'll have to put them back afterwards. Oh, and less. Man, and cpio. But now we're back. And we have diff, which is nice. Hmm, and it's trying to install cpio and binutils. This won't work.

For reference, here's what happens inside scratchbox:

[sbox-CHINOOK_ARMEL: ~/clisp/libffcall/ffcall-1.10+2.41/ffcall/trampoline] > ./a.out
qemu: uncaught target signal 4 (Illegal instruction) - exiting

So I think there are a couple of ways to proceed here. One is to try to get ffcall compiling on the machine itself. Another would be to figure out which instruction it doesn't like and work around it. It might be that caret and the "PSR bits", in which case I may well be able to get away with just leaving them off.

And in fact, that is precisely the case. Instead of 0xE8FD0001, use 0xE8BD0001, and the test goes.
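The two words differ only in bit 22, the S bit that the assembler writes as a caret. The ARM Architecture Reference Manual describes LDM with S set and no pc in the list as a transfer of user-mode registers and leaves it unpredictable in user mode, which may be why qemu and the real CPU disagreed about it. A decoder sketch for the block-transfer fields (the struct and function are hypothetical names):

```c
#include <stdint.h>

/* Fields of an ARM block data transfer (LDM/STM) instruction. */
struct block_xfer {
    unsigned p, u, s, w, l;  /* pre-index, up, S (caret), writeback (!), load */
    unsigned rn;             /* base register, 13 = sp */
    unsigned reglist;        /* bit n set => rn is in the register list */
};

struct block_xfer decode_block_xfer(uint32_t insn)
{
    struct block_xfer x;
    x.p = (insn >> 24) & 1;
    x.u = (insn >> 23) & 1;
    x.s = (insn >> 22) & 1;
    x.w = (insn >> 21) & 1;
    x.l = (insn >> 20) & 1;
    x.rn = (insn >> 16) & 0xF;
    x.reglist = insn & 0xFFFF;
    return x;
}
```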

Or at least, my test goes. The real test segfaults. Probably the mprotect/mmap thing. I think I can make that work by just bypassing EXECUTABLE_VIA_MPROTECT by making HAVE_WORKING_MPROTECT false. Hmm, still segfaults.

So I just tried building the full libffcall on the actual device, and got exactly the same error. This is good: it means I can debug it on Aeon.

Or rather, to use gdb I have to use Joy. And it's pretty clear. The test expects to supply an actual function as the function argument. But the machine code is written assuming that a variable that holds a function pointer will be the function argument. So the machine language is wrong and needs to be fixed. That means replacing E59CF000 with E1A0F00C. No, still segfaults. Wait, I just got the wrong code (and I've edited it to be right in this log). Still segfaults.

Ah, and the same thing applies to the data. Not a pointer to where the data is kept, the literal data itself. Another mov. This time "mov ip,r0", which is changing E590C000 to E1A0C000. Note that I could remove two whole instructions, but it's not worth it.

Still segfaults. Fuck it. And it's because I got the wrong code again. Dammit. Still segfaults though.

And the function data as well! Three movs. Count them, three!

No, wait, I've got it completely wrong. That original ldr was never supposed to change to a mov. God, this is confusing.

No, it was supposed to change to a mov. And in fact it works as far as printing "Works, test1 passed.". And then it calls exit, and exit segfaults. Weird, huh? Maybe I should unmap that memory.

Uh-oh, Heisenbug! When I step through things, it works. When I don't, function_data gets assigned to the value instead of the pointer to the value. Bad.

OK, two things. One, I might be breaking the stack. It seems to have an exclamation mark when it saves away its registers, and again when it loads them. That seems wrong. Two, what about those weird flags? I load them, but do I ever save them? That could explain it: it would be random when run normally, but when stepping through in the debugger it would always have the debugger's flags. Or something. Anyway, that's worth looking at too. Just try playing with the instructions to get something that works.

Well, the stack does seem to get reset correctly. So that looks right. In the debugger, my segfault happens at address 0x4100d900 in /lib/ld-linux.so.3. That's another mmap address. My trampoline is at 0x41022db8, and is 4096 (0x1000) bytes big. So no, they don't overlap.

As for the PSR bits, they are apparently stored whenever you use stm to store the pc (r15). But they aren't loaded unless you use the caret. So only having the caret on the load does make sense. What doesn't make sense is that it isn't actually storing or loading the pc, only r0. Do I want to restore the PSR bits?

I'm going to try using the libc6-dbg library. That means using "LD_LIBRARY_PATH=/usr/lib/debug". It tells me that I'm segfaulting in _dl_fini(). The instruction is "ldrb r3,[r2,#397]". r2 has NULL, so that's not gonna work. Register 2 came from register 8. Register 8 was set by adding #660 to register 2. And about here I have to give up.

It should be rather easier to debug this from the non-stepping method though. It's doing something pretty straightforward: assigning the data rather than the data address to function_data. It should be reasonably easy to break at various specific points and check what the registers are doing and what the code says to do.

Hmm. Actually, I guess I should make it look as much like a real function call as possible. That is, it should blx or whatever and the last step should be an ldmfd, including the program counter. And of course, that's precisely why this proto.c was supplied in the first place. So it should look something like this:

        str     lr, [sp, #-4]!
        ldr     r1, .L2
        ldr     r3, .L2+4
        sub     sp, sp, #4
        str     r1, [r3, #0]
        ldr     r2, .L2+8
        blx     r2
        add     sp, sp, #4
        ldmfd   sp!, {pc}

Hmm, how about this:

Any veneer inserted must preserve the contents of all registers except IP (r12) and the condition code flags; a conforming program must assume that a veneer that alters IP may be inserted at any branch instruction that is exposed to a relocation that supports interworking or long branches.

Can I get away with only modifying ip? I think so. No, no I can't. I need to keep the address to write to in one register and the value to write in another.

This page may explain why there's -8 all over the place. There's a "PC bias".
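The PC bias also explains the earlier puzzles: in ARM state, reading pc gives the current instruction's address plus 8. So `ldr r12, [pc, #0]` at address 0 really does read address 8, and `ldr pc, [pc, #-4]` reads the word immediately after itself. A sketch that computes the address, with `ldr_literal_address` as a hypothetical helper covering only immediate-offset LDR from pc:

```c
#include <stdint.h>

/* Effective address of "ldr Rd, [pc, #+/-imm12]" located at insn_addr.
   Returns 0 (leaving *ea untouched) if insn is not that form. */
int ldr_literal_address(uint32_t insn, uint32_t insn_addr, uint32_t *ea)
{
    /* cond = AL, class 01, I = 0, P = 1, B = 0, W = 0, L = 1, Rn = pc;
       bit 23 (U, add or subtract) is masked out of the comparison. */
    if ((insn & 0xFF7F0000u) != 0xE51F0000u)
        return 0;
    uint32_t imm = insn & 0xFFF;
    uint32_t base = insn_addr + 8;          /* the PC bias */
    *ea = (insn & (1u << 23)) ? base + imm : base - imm;
    return 1;
}
```

Applied to the working trampoline's second word, `ldr r0, [pc, #20]` at offset 4, it gives offset 32, which is buffer[8].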

So I think my solution is just to replace "mov pc,ip" with "blx ip". Hmm, no, then I have to return somehow. Maybe the goto approach is better. And that really is just a mov from one register to another. I need to debug how the hell that function_data variable is being set.

I think I may have a solution. It must be the PC bias. I'm jumping to slightly the wrong spot, which means the stmfd statement never gets run and the stack never gets updated to the correct spot.

It doesn't seem to be as simple as that. However, when I break at f(), it actually puts the breakpoint at 12 bytes after the start of the function. How weird. When it gets there, the parameter x (r0) has the address of the trampoline in it.

No, that doesn't appear to be the case.

If I set a breakpoint in the trampoline just before it returns, at address 0x41022dd4, then when it gets there r0 is correctly set. But if I set the breakpoint instead in f, then it's wrong. Hmm, actually if I break just before jumping to the trampoline, then it's correct, but if I don't, it's wrong.

Disassembling main(), it seems that I should have the argument in r0 when I branch to the trampoline at 0x8478. And if I break there, it works. Unfortunately, I can't break inside the trampoline until it's allocated. That's address 0x8470. And if I break there, the test passes. So I have some real problems here. I think I'll have to use printf statements somehow.

OK, I think I'm getting somewhere. After calling f, the stack pointer is decreased by 8. After leaving f, it's increased by 12, leaving it four more than it was before.

In the debugger. Starts as 0xbef60520. After the first stmfd, 51c. After the ldmfd, 520 again. Hmm. Still there after the jump. Then the function reduces it to 514. Ah, interesting. Somewhere during the course of f, the stack pointer got changed to 510. But when I break at f, it's already at 510 when I start. Oh, wait, there it is, it just subtracts 4 from the stack pointer as the second instruction. Then it adds it again just before the end, so that's your actual stack usage there.

If I set no breakpoints, the stack pointer is 514 by the time it gets to the printf()s in f(). That's wrong.

I kinda feel like removing the exclamation marks from the stmfd and ldmfd instructions. They seem unnecessary and dangerous.

I just tried removing the caret, to bring things into line with scratchbox. Doesn't help.

Anyway, "ldmfd sp,{r0}" is E89D0001 (rather than "ldmfd sp!,{r0}" which is E8BD0001), and "stmfd sp,{r0}" is E90D0001 (rather than "stmfd sp!,{r0}" which is E92D0001). And that seems to have brought the stack back under control. But it still doesn't work.
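Those pairs differ only in bit 21, the W (writeback) bit written as "!". Without it, `stmdb sp, {r0}` stores into the word just below sp without moving sp, so the next push or call can overwrite it. Note also that `stmdb sp, {r0}` writes to sp-4 while `ldmia sp, {r0}` reads from sp, so without writeback the pair no longer refers to the same word. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Bit 21 of an LDM/STM: W (writeback), shown as "!" in assembler. */
#define ARM_LDM_STM_WRITEBACK (1u << 21)

/* Same transfer, but the base register is left unchanged afterwards. */
uint32_t without_writeback(uint32_t insn)
{
    return insn & ~ARM_LDM_STM_WRITEBACK;
}

/* Same transfer, with the base register updated past the registers moved. */
uint32_t with_writeback(uint32_t insn)
{
    return insn | ARM_LDM_STM_WRITEBACK;
}
```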

So I guess my next step is to try to figure out what's happening to r0 through all of this.

It's hard, because I can't break there. So instead I'm investigating the theory that I'm scribbling over main's stack in my trampoline. Before main starts, the sp is 0xbedb8530, and lr is 0x4103c10c. I suspect that lr might be the one that gets scribbled over. After the first instruction, the sp is 0xbedb8520, so 16 bytes less. It stores away r4, r5, r6 and lr, so that makes sense. It doesn't seem to modify the stack pointer in any other way.

The stack from 0xbedb8520 to 0xbedb8530 is completely zero. That's odd. Oh, wait, the sp changes each time. I have to be careful. So, after the registers have been saved away by main, 0x20 is 0x41022db8, 0x24 is 0xbe9e8524, 0x28 is 0x8418, 0x2c is 0x4103c10c, and just for completeness 0x30 is 0x82cd. So from 0x20, they are r4, possibly sp, r6 and lr. So something weird is happening here with r5, but I'm not sure I care. So I step to just before it goes to the trampoline, and the lr is who knows what, but the old value is safely stored in 0x2c. Upon returning to main, the old value in 0x2c is still intact at 0x4103c10c, but the current value in the lr is 0x4107ab7c. But at the end of main it jumps to a fixed spot, 0x8948. So maybe it doesn't care about the lr anyway. Oh, wait, it doesn't branch to it, it just loads it into r0 before going to puts. So that's OK. In fact it does a branch to exit, with 0 in r0 as an argument. So it's hard to see what can go wrong.

Exit starts at 0x83f4. Jumps to 0x0398. Jumps to _dl_runtime_resolve for some reason. The argument registers are 1, 10, 1 and 0 respectively. So I hope it's not trying to free something. The exit() function loads the pc into r12, adds 32768, and then jumps to whatever's stored at that address plus 1672. I guess this is a veneer around a shared library. Oh, in fact that ends up jumping back to _init+20. So then it carries on flushing buffers and stuff. No it doesn't. It loads its own lr from its stack and branches to it. Which is _dl_runtime_resolve. So exit() has returned 1 to _dl_runtime_resolve, which I guess is sensible, since I'm trying to return 1 here. Maybe the segfault is due to not unmapping the memory? This is in libc, so maybe I can see what the source code looks like. The segfault is in _dl_fini.

I'm trying to put my problem into an email, and I've noticed that I've got different versions of this code lying around, some of which appear to be doing the wrong thing.

I'm having trouble contacting Aeon right now, but on Joy two things are clear. First, I'm not successfully loading r0 back from the stack. It's getting the wrong value. This is probably because I do indeed need to move the sp. Second, I really should be decrementing the stack by 4 first, to make sure I'm not overwriting the calling function's space. Basically, I should put the exclamation marks back.

For reference, because this keeps confusing me, at 0x41022dc8, just before the store, I should have &magic in r12, and &function_data in r0. Just after the ldmia in 9cc, I should have r0 with the first argument, 0x614a13c9.

Right, I'm back where I was - with my sp getting corrupted somehow. But at least r0 is correct now.

To make any headway here, I'm going to have to start properly compiling my assembly:

gcc -Wall -g -c arm-code.s
od -tx4 -Ax -w4 < arm-code.o | less

I finally get the LDM description. I am using a full, descending stack, which means both my store and load operations should use "fd" as a suffix. This translates to LDMIA and STMDB, for "increment after" and "decrement before", which is what I want. The disassembler will show me the latter versions, but my code should be the former.
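A sketch of that mapping, with hypothetical helper names: the P and U bits select DA, IA, DB or IB, and for a full descending stack a push is STMDB and a pop is LDMIA, both of which the assembler also accepts with the FD suffix.

```c
#include <stdint.h>
#include <string.h>

/* Addressing-mode suffix of an LDM/STM from its P (bit 24, before/after)
   and U (bit 23, increment/decrement) bits. */
const char *block_mode(uint32_t insn)
{
    static const char *const modes[2][2] = {
        { "DA", "IA" },   /* P = 0: adjust after the transfer  */
        { "DB", "IB" },   /* P = 1: adjust before the transfer */
    };
    unsigned p = (insn >> 24) & 1;
    unsigned u = (insn >> 23) & 1;
    return modes[p][u];
}

/* Full descending stack: push = STMDB (STMFD), pop = LDMIA (LDMFD).
   Bit 20 (L) distinguishes loads from stores. */
int is_full_descending(uint32_t insn)
{
    unsigned l = (insn >> 20) & 1;
    return strcmp(block_mode(insn), l ? "IA" : "DB") == 0;
}
```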

Interestingly, main does a stmdb but doesn't do any ldmia. Presumably this is because it never reaches that point, due to the exit.

Unlike the "real" test, my hacked up version of main does apparently use the stack, subtracting 8 from it and never recovering that space. Now when I look at f(), things get even more interesting:

0x000084b4 <f+0>:       mov     r12, sp
0x000084b8 <f+4>:       stmdb   sp!, {r11, r12, lr, pc}
0x000084bc <f+8>:       sub     r11, r12, #4    ; 0x4
0x000084c0 <f+12>:      sub     sp, sp, #8      ; 0x8
...
0x000084e0 <f+44>:      sub     sp, r11, #12    ; 0xc
0x000084e4 <f+48>:      ldmia   sp, {r11, sp, pc}
0x000084e8 <f+52>:      andeq   r0, r1, r0, asr #21
...

Complex! Things are a lot more sane on Joy, however, so I must just have no idea where that a.out on Aeon came from.

OK, I was really just fooling around, trying to move some registers into some spare other registers for debugging. And the thing works. How incredibly irritating. I have no idea why it works, but apparently it does. It still has two residual no-ops which I have no intention of cleaning up, but it seems to work. I have to produce a clean patch, but it's worth emphasising that it appears to work!

Yes, there you go. I cleaned out the whole thing, started again from my old patch and the new patch, and make extracheck got all the way to trampoline_r. Nice.

I cleaned up my patch. Looking in trampoline_r, it looks to me that it should work. I compiled the assembly and got exactly the same hex as is in the code. So I was disappointed when it segfaulted. But I've just realised that I'd forgotten to do this:

#include "config.h"
#undef HAVE_WORKING_MPROTECT
#include "trampoline_r.h"

Which makes it not segfault, but instead it has an illegal instruction. On Joy, I weirdly get this:

gcc -g -O2 -I. -I. -c ./trampoline.c
./trampoline.c: In function `alloc_trampoline':
./trampoline.c:362: error: `MAP_VARIABLE' undeclared (first use in this function)
./trampoline.c:362: error: (Each undeclared identifier is reported only once
./trampoline.c:362: error: for each function it appears in.)
make[1]: *** [trampoline.o] Error 1
make[1]: Leaving directory `/home/user/clisp/libffcall/ffcall-1.10+2.41/ffcall/trampoline'

That's weird, I can't find it anywhere on Aeon. So I just defined it to zero. And now I get:

gcc -g -O2 -x none test2.o trampoline.o  -o test2
./test1
trampoline: Out of virtual memory!

Copying Aeon's executables to Joy:

(gdb) run
Starting program: /home/user/test1
BFD: /usr/lib/debug/lib/ld-2.5.so: warning: sh_link not set for section `.ARM.exidx'
BFD: /usr/lib/debug/lib/libc-2.5.so: warning: sh_link not set for section `.ARM.exidx'

Program received signal SIGSEGV, Segmentation fault.
0x41022e30 in _rtld_local_ro () from /lib/ld-linux.so.3

Bear in mind that these are the normal trampolines, which I thought were working. So clearly we have a problem here. Oh, no, sorry, those were the callback trampolines. But test1 from the normal trampoline also segfaults. function_data is NULL. So I'm no further advanced at all, I was just lucky that it worked on Aeon. Damn.

Matthew Exon

Last modified: Sun May 10 21:08:09 CST 2009