segfaults on several non-x86 architectures

C++ symbolic computation library

Moderator: xcasadmin

infinity0
Posts: 36
Joined: Sun Feb 05, 2017 5:46 pm

Re: segfaults on several non-x86 architectures

Post by infinity0 » Sun Jun 25, 2017 7:41 pm

All 30 tests pass for me on a ppc64el machine, without needing the flag I just mentioned. Thanks for the fix!

parisse
Posts: 5734
Joined: Tue Dec 20, 2005 4:02 pm

Re: segfaults on several non-x86 architectures

Post by parisse » Sun Jun 25, 2017 7:57 pm

Great!

infinity0
Posts: 36
Joined: Sun Feb 05, 2017 5:46 pm

Re: segfaults on several non-x86 architectures

Post by infinity0 » Tue Jul 04, 2017 5:15 pm

Unfortunately, it seems that something changed between 1.2.3-47 and 1.2.3-53 so that it segfaults on arm64 again. (ppc64el still works.)

All tests fail in this build log; however, I can reproduce the segfault just by running src/icas (edited to run under gdb) after the build:

Code: Select all

(sid_arm64-dchroot)infinity0@asachi:~/giac$ src/icas
Reading symbols from /home/infinity0/giac/src/.libs/lt-icas...done.
(gdb) run
Starting program: /home/infinity0/giac/src/.libs/lt-icas
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
giac::gen::~gen (this=0xffffaaaaaac51550, __in_chrg=<optimized out>) at gen.h:659
659           if ( type>_DOUBLE_ && type!=_FLOAT_
(gdb) bt
#0  giac::gen::~gen (this=0xffffaaaaaac51550, __in_chrg=<optimized out>) at gen.h:659
#1  0x0000ffffb7d50394 in giac::symbolic::~symbolic (this=0xffffaaaaaac51548, __in_chrg=<optimized out>) at gen.h:1419
#2  giac::ref_symbolic::~ref_symbolic (this=0xffffaaaaaac51540, __in_chrg=<optimized out>) at gen.h:1448
#3  giac::gen::delete_gen (this=<optimized out>) at gen.cc:1417
#4  0x0000ffffb7571650 in __static_initialization_and_destruction_0 (__priority=65535, __initialize_p=1) at prog.cc:9240
#5  0x0000ffffb7fdfb20 in ?? () from /lib/ld-linux-aarch64.so.1
#6  0x0000ffffb8000198 in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0xffaaaaaac51550
(gdb) info inferiors
  Num  Description       Executable
* 1    process 21921     /home/infinity0/giac/src/.libs/lt-icas
(gdb) shell grep stack /proc/21921/maps
fffffffdf000-1000000000000 rw-p 00000000 00:00 0                         [stack]
(gdb) shell cat /proc/21921/maps                                                                                           
aaaaaaaaa000-aaaaaabf7000 r-xp 00000000 fe:00 16787964                   /home/infinity0/giac/src/.libs/lt-icas            
aaaaaac07000-aaaaaac0a000 r--p 0014d000 fe:00 16787964                   /home/infinity0/giac/src/.libs/lt-icas           
aaaaaac0a000-aaaaaac12000 rw-p 00150000 fe:00 16787964                   /home/infinity0/giac/src/.libs/lt-icas           
aaaaaac12000-aaaaaac6b000 rw-p 00000000 00:00 0                          [heap]                                           
ffffb5b6c000-ffffb5b73000 rw-p 00000000 00:00 0                                                                           
ffffb5b73000-ffffb5b84000 r-xp 00000000 fe:00 17569526                   /lib/aarch64-linux-gnu/libbsd.so.0.8.5         
[.. etc ..]
ffffb73fb000-ffffb7f6f000 r-xp 00000000 fe:00 16787949                   /home/infinity0/giac/src/.libs/libgiac.so.0.0.0
ffffb7f6f000-ffffb7f7e000 ---p 00b74000 fe:00 16787949                   /home/infinity0/giac/src/.libs/libgiac.so.0.0.0
ffffb7f7e000-ffffb7fa5000 r--p 00b73000 fe:00 16787949                   /home/infinity0/giac/src/.libs/libgiac.so.0.0.0
ffffb7fa5000-ffffb7fb0000 rw-p 00b9a000 fe:00 16787949                   /home/infinity0/giac/src/.libs/libgiac.so.0.0.0
ffffb7fb0000-ffffb7fd2000 rw-p 00000000 00:00 0
ffffb7fd2000-ffffb7fee000 r-xp 00000000 fe:00 17568572                   /lib/aarch64-linux-gnu/ld-2.24.so
ffffb7fef000-ffffb7ff1000 rw-p 00000000 00:00 0
ffffb7ff9000-ffffb7ffc000 rw-p 00000000 00:00 0
ffffb7ffc000-ffffb7ffd000 r--p 00000000 00:00 0                          [vvar]
ffffb7ffd000-ffffb7ffe000 r-xp 00000000 00:00 0                          [vdso]
ffffb7ffe000-ffffb7fff000 r--p 0001c000 fe:00 17568572                   /lib/aarch64-linux-gnu/ld-2.24.so
ffffb7fff000-ffffb8001000 rw-p 0001d000 fe:00 17568572                   /lib/aarch64-linux-gnu/ld-2.24.so
fffffffdf000-1000000000000 rw-p 00000000 00:00 0                         [stack]
I am also running this on a very recent Linux kernel with the stack-clash security patches. The main difference is that the stack guard is now 1 MB instead of 4 KB as before. This has caused problems elsewhere. I don't think it is causing giac's problems here, but I thought I'd mention it in case it sounds familiar to you.

infinity0
Posts: 36
Joined: Sun Feb 05, 2017 5:46 pm

Re: segfaults on several non-x86 architectures

Post by infinity0 » Tue Jul 04, 2017 7:38 pm

Unfortunately, I also get segfaults with 1.2.3-47 today. However, 10 days ago I had a clean test run for 1.2.3-51 on the sagemath trac.

During that time, none of xcas' dependencies were updated in Debian. It may be that there is an issue with giac in this newer kernel, with the stack-clash security fixes. Are you doing anything special with the stack?

infinity0
Posts: 36
Joined: Sun Feb 05, 2017 5:46 pm

Re: segfaults on several non-x86 architectures

Post by infinity0 » Tue Jul 04, 2017 7:50 pm

Ah, also the kernel was upgraded from 3.16.43 to 4.9.30. (This is the new Debian stable, "stretch", which was released a few weeks ago.) Perhaps you could reproduce this yourself in QEMU with Debian stretch.

parisse
Posts: 5734
Joined: Tue Dec 20, 2005 4:02 pm

Re: segfaults on several non-x86 architectures

Post by parisse » Wed Jul 05, 2017 4:43 am

I don't play with the stack. The only recent change I made that might be significant was:

Code: Select all

27c27
< #define x86_64 1
---
> #define x86_64
However, I don't believe it's related.
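
For reference, the two forms of the define only behave differently when the macro is evaluated inside an `#if` expression; `#ifdef` treats them alike. The sketch below shows this with two hypothetical macro names, `FLAG_WITH_VALUE` and `FLAG_EMPTY`, standing in for the real `x86_64` define. It is an illustration, not a claim about how giac tests the macro.

```cpp
// Hypothetical stand-ins for "#define x86_64 1" and "#define x86_64".
#define FLAG_WITH_VALUE 1
#define FLAG_EMPTY

// #ifdef only asks whether the name is defined, so both forms pass.
#ifdef FLAG_WITH_VALUE
constexpr bool ifdef_sees_value = true;
#else
constexpr bool ifdef_sees_value = false;
#endif

#ifdef FLAG_EMPTY
constexpr bool ifdef_sees_empty = true;
#else
constexpr bool ifdef_sees_empty = false;
#endif

// #if evaluates the expansion. "1 + 0" is true, while an empty expansion
// leaves "+ 0", which is false. (A bare "#if FLAG_EMPTY" would not even
// compile, because the expression would be empty.)
#if FLAG_WITH_VALUE + 0
constexpr bool if_sees_value = true;
#else
constexpr bool if_sees_value = false;
#endif

#if FLAG_EMPTY + 0
constexpr bool if_sees_empty = true;
#else
constexpr bool if_sees_empty = false;
#endif
```

So a change between the two forms would only matter for code that tests the macro with `#if` rather than `#ifdef`.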

infinity0
Posts: 36
Joined: Sun Feb 05, 2017 5:46 pm

Re: segfaults on several non-x86 architectures

Post by infinity0 » Thu Jul 20, 2017 12:50 pm

Someone on the Debian bug tracker has a fix for this issue. I've confirmed that it works. However, there is one more problem that I need to figure out before I can upload it.

On Debian we have to patch the expected output of chk_fhan16 like this:

Code: Select all

--- a/check/TP16-sol.cas.out1
+++ b/check/TP16-sol.cas.out1
@@ -48,7 +48,7 @@
 "Done",
 [-sqrt(13)-1,sqrt(13)-1,4],
 y^2+6*sqrt(13)+18,y^2-6*sqrt(13)+18,y^2,
--sqrt(6)*I*sqrt(sqrt(13)+3),sqrt(6)*I*sqrt(sqrt(13)+3),-sqrt(6)*sqrt(sqrt(13)-3),sqrt(6)*sqrt(sqrt(13)-3),0,
+sqrt(6)*I*sqrt(sqrt(13)+3),-sqrt(6)*I*sqrt(sqrt(13)+3),-sqrt(6)*sqrt(sqrt(13)-3),sqrt(6)*sqrt(sqrt(13)-3),0,
 "No such variable u",
 x^2+1/4*y^2+1/9*z^2-1,
 x^2+y^2+z^2-u,
This is necessary on all platforms where giac builds successfully (see here): none of the failures there involve this unexpected "sqrt" test output, so the patch works (and is needed) on all those platforms.

However, if I apply Edmund's patch to fix build errors on arm64 (aka armv8, aka aarch64), I have to unpatch my chk_fhan16 test patch. But this patch has to be retained on amd64 for the test to pass. (I had assumed it was because Debian was using a different version of PARI from giac, but the fact that it's dependent on the machine architecture now makes me believe otherwise.)

Do you have any ideas why the test is giving a very-slightly different output on the two architectures? If so, we could fix the expectation "properly". If not, then I would have to add some logic like "succeed if there is no diff with EITHER TP16-sol.cas.out1 or TP16-sol.cas.out2", where this second file is the same as the first one but differs by the patch I just gave.

The other tests are all OK.

infinity0
Posts: 36
Joined: Sun Feb 05, 2017 5:46 pm

Re: segfaults on several non-x86 architectures

Post by infinity0 » Thu Jul 20, 2017 1:08 pm

Another strange fact is that previously (presumably when the arm64 build ran on a kernel with 39-bit virtual addresses) I still had to retain that patch, and this gave me a clean test run. So this output difference might somehow be related to 39-bit vs 48-bit pointers.

parisse
Posts: 5734
Joined: Tue Dec 20, 2005 4:02 pm

Re: segfaults on several non-x86 architectures

Post by parisse » Thu Jul 20, 2017 3:43 pm

Great! I have incorporated your fix longlong -> ulonglong by hand in alg_ext.cc, gen.cc, vecteur.cc, modpoly.cc, global.cc and gen.h.
https://dev.geogebra.org/trac/changeset/55367/
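
The addresses in the earlier backtrace fit this kind of fix: the heap is mapped around 0xaaaaaac51000, yet the crashing `this` pointer is 0xffffaaaaaac51550, which is what sign extension of a 48-bit value with bit 47 set would produce. Below is a minimal sketch of that effect, assuming the pointer is packed into the upper 48 bits of a 64-bit word next to a 16-bit tag. The layout and the helper names `pack`, `unpack_signed` and `unpack_unsigned` are hypothetical and are not giac's actual code.

```cpp
#include <cstdint>

// Hypothetical layout for illustration only (not giac's real definitions):
// a 16-bit tag in the low bits, a 48-bit address in the high bits.
inline std::uint64_t pack(std::uint64_t address, std::uint16_t tag) {
    return (address << 16) | tag;
}

// Unpacking through a signed 64-bit type: the right shift copies the sign
// bit, so an address with bit 47 set comes back with 0xffff on top.
// (This arithmetic shift is guaranteed since C++20 and is what GCC and
// Clang do under earlier standards.)
inline std::uint64_t unpack_signed(std::uint64_t word) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(word) >> 16);
}

// Unpacking through an unsigned type: zeros are shifted in, so the
// original address is recovered.
inline std::uint64_t unpack_unsigned(std::uint64_t word) {
    return word >> 16;
}
```

With the heap address from the log, `unpack_signed(pack(0x0000aaaaaac51550, 7))` yields 0xffffaaaaaac51550 while `unpack_unsigned` returns the original value. An address below 2^47, as on a kernel with 39-bit virtual addresses, survives either variant, which would explain why the problem only appeared with 48-bit addresses.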
The difference in fhan16 is just a sorting issue, and it might be related to the way objects are loaded by ld (sorting for some kinds of objects uses the pointer address).
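
To illustrate how address-based ordering can vary between builds, here is a minimal sketch. The `Node` type and both helper functions are hypothetical and are not taken from giac; they only contrast ordering by address with ordering by content.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical object type for illustration only.
struct Node {
    std::string text;
};

// Orders by where each object lives in memory. The result depends on the
// allocator and on how the loader lays out the process, so it can differ
// between architectures or kernels. std::less gives a well-defined total
// order on pointers, unlike a raw "<" on unrelated pointers.
inline void sort_by_address(std::vector<const Node*>& nodes) {
    std::sort(nodes.begin(), nodes.end(), std::less<const Node*>());
}

// Orders by content, so the result is the same on every platform.
inline void sort_by_text(std::vector<const Node*>& nodes) {
    std::sort(nodes.begin(), nodes.end(),
              [](const Node* a, const Node* b) { return a->text < b->text; });
}
```

Two mathematically equivalent roots can therefore be printed in either order if their relative position is decided by something like `sort_by_address`, whereas a content-based comparison would pin the order down.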

infinity0
Posts: 36
Joined: Sun Feb 05, 2017 5:46 pm

Re: segfaults on several non-x86 architectures

Post by infinity0 » Fri Jul 21, 2017 2:18 pm

Thanks! OK, in that case I will patch the test to expect two possible outputs, something like:

Code: Select all

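#! /bin/sh
# Run the TP16 check with a neutral locale.
unset LANG
../src/icas TP16-sol.cas > TP16.tst
# Pass if the output matches either of the two expected orderings.
diff TP16.tst TP16-sol.cas.out1 || diff TP16.tst TP16-sol.cas.out2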
It is still a bit odd that the difference only occurs on arm64 now: I have always had to reverse the first two elements myself, on my own machines and on Debian's machines. However, if you are happy to loosen the test, I'm happy too and will stop trying to chase this issue down further.

infinity0
Posts: 36
Joined: Sun Feb 05, 2017 5:46 pm

Re: segfaults on several non-x86 architectures

Post by infinity0 » Fri Jul 21, 2017 3:04 pm

By the way, I think you missed patching some files in your changeset. Equation.cc is not in trac, but the other two files, sym2poly.cc and usual.cc, are.

parisse
Posts: 5734
Joined: Tue Dec 20, 2005 4:02 pm

Re: segfaults on several non-x86 architectures

Post by parisse » Fri Jul 21, 2017 4:47 pm

Indeed, thanks!
