Linking against cln fails when built with link-time-optimization
Hello, When the shared library for CLN is built with link-time optimization [1] (LTO) enabled, linking against libcln.so fails (for example GiNaC fails to build). Turning off link-time optimization fixes the linking issue, but I wonder if this is a bug in CLN that needs fixing or a compiler issue and if we can build and use LTO enabled CLN in the future. This issue was first reported [2] against openSUSE Tumbleweed which builds its packages with LTO flags by default; we are currently in the process of working around this by disabling the LTO flags specifically for CLN [3]. Thanks in advance for any suggestion as to how to fix TLO-enabled CLN builds or if indeed disabling LTO is the only way to go (at least for now). I maintain openSUSE's CLN and GiNaC packages. Cheers, [1] https://gcc.gnu.org/wiki/LinkTimeOptimization [2] https://bugzilla.opensuse.org/show_bug.cgi?id=1176710 [3] https://build.opensuse.org/request/show/835441 -- Atri Bhattacharya
Hi! 20.09.2020, 15:11, "Atri Bhattacharya" <badshah400@opensuse.org>:
Hello, When the shared library for CLN is built with link-time optimization [1] (LTO) enabled, linking against libcln.so fails (for example GiNaC fails to build).
Can't reproduce it here with gcc 7.5 and binutils 2.30. git clone git://ginac.de/ginac.git cd ginac git clone -b cmake git://github.com/asheplyakov/cln.git mkdir _build cd _build cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_SHARED_LINKER_FLAGS='-flto' -DCMAKE_CXX_FLAGS='-std=c++11 -O2 -g -Wall -flto' -DCMAKE_C_FLAGS='-O2 -g -Wall -flto' .. ninja test_suite ninja test | tail -n 3 100% tests passed, 0 tests failed out of 62 Total Test time (real) = 341.58 sec This builds both CLN and GiNaC and runs test suites (both GiNaC and CLN's ones). I've verified that '-flto' switch has been passed to the compiler and the linker, and noticed those memory hungry 'lto1' processes.
This issue was first reported [2] against openSUSE Tumbleweed which builds its packages with LTO flags by default;
$ g++ example.cpp -lginac
This looks wrong, one should link with both GiNaC and CLN: g++ example.cpp -lginac -lcln (I'm not sure this is the root cause, though) Best regards, Alexey
Hi Alexey, On Sun, 2020-09-20 at 20:35 +0400, Alexey Sheplyakov wrote:
Hi!
20.09.2020, 15:11, "Atri Bhattacharya" <badshah400@opensuse.org>:
Hello, When the shared library for CLN is built with link-time optimization [1] (LTO) enabled, linking against libcln.so fails (for example GiNaC fails to build).
Can't reproduce it here with gcc 7.5 and binutils 2.30.
Thanks for your response. I have set up a test project to reproduce the problem here (and try any suggestions you may have): <https://build.opensuse.org/project/show/home:badshah400:boo1176710> where both cln and ginac are being built with LTO enabled against GCC 10.1 and binutils 2.34. Here is the relevant build failure for ginac (cln tests do pass here as well): --- [ 137s] [ 81%] Linking CXX shared library libginac.so [ 137s] cd /home/abuild/rpmbuild/BUILD/ginac-1.7.11.git20200829/build/ginac && /usr/bin/cmake -E cmake_link_script CMakeFiles/ginac.dir/link.txt --verbose=1 [ 137s] /var/lib/build/ccache/bin/c++ -fPIC -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -DNDEBUG -O2 -g -DNDEBUG -flto=auto -Wl,--as-needed -Wl,--no-undefined -Wl,-z,now -shared -Wl,-soname,libginac.so.6 -o libginac.so.6.4.5 CMakeFiles/ginac.dir/add.cpp.o CMakeFiles/ginac.dir/archive.cpp.o CMakeFiles/ginac.dir/basic.cpp.o CMakeFiles/ginac.dir/clifford.cpp.o CMakeFiles/ginac.dir/color.cpp.o CMakeFiles/ginac.dir/constant.cpp.o CMakeFiles/ginac.dir/excompiler.cpp.o CMakeFiles/ginac.dir/ex.cpp.o CMakeFiles/ginac.dir/expair.cpp.o CMakeFiles/ginac.dir/expairseq.cpp.o CMakeFiles/ginac.dir/exprseq.cpp.o CMakeFiles/ginac.dir/factor.cpp.o CMakeFiles/ginac.dir/fail.cpp.o CMakeFiles/ginac.dir/fderivative.cpp.o CMakeFiles/ginac.dir/function.cpp.o CMakeFiles/ginac.dir/idx.cpp.o CMakeFiles/ginac.dir/indexed.cpp.o CMakeFiles/ginac.dir/inifcns.cpp.o CMakeFiles/ginac.dir/inifcns_gamma.cpp.o CMakeFiles/ginac.dir/inifcns_nstdsums.cpp.o CMakeFiles/ginac.dir/inifcns_trans.cpp.o CMakeFiles/ginac.dir/integral.cpp.o CMakeFiles/ginac.dir/lst.cpp.o CMakeFiles/ginac.dir/matrix.cpp.o CMakeFiles/ginac.dir/mul.cpp.o CMakeFiles/ginac.dir/ncmul.cpp.o CMakeFiles/ginac.dir/normal.cpp.o CMakeFiles/ginac.dir/numeric.cpp.o CMakeFiles/ginac.dir/operators.cpp.o CMakeFiles/ginac.dir/parser/default_reader.cpp.o CMakeFiles/ginac.dir/parser/lexer.cpp.o CMakeFiles/ginac.dir/parser/parse_binop_rhs.cpp.o CMakeFiles/ginac.dir/parser/parse_context.cpp.o CMakeFiles/ginac.dir/parser/parser_compat.cpp.o CMakeFiles/ginac.dir/parser/parser.cpp.o CMakeFiles/ginac.dir/polynomial/chinrem_gcd.cpp.o CMakeFiles/ginac.dir/polynomial/collect_vargs.cpp.o CMakeFiles/ginac.dir/polynomial/cra_garner.cpp.o CMakeFiles/ginac.dir/polynomial/divide_in_z_p.cpp.o CMakeFiles/ginac.dir/polynomial/gcd_uvar.cpp.o CMakeFiles/ginac.dir/polynomial/mgcd.cpp.o CMakeFiles/ginac.dir/polynomial/mod_gcd.cpp.o CMakeFiles/ginac.dir/polynomial/normalize.cpp.o CMakeFiles/ginac.dir/polynomial/optimal_vars_finder.cpp.o CMakeFiles/ginac.dir/polynomial/pgcd.cpp.o CMakeFiles/ginac.dir/polynomial/primpart_content.cpp.o CMakeFiles/ginac.dir/polynomial/remainder.cpp.o CMakeFiles/ginac.dir/polynomial/upoly_io.cpp.o CMakeFiles/ginac.dir/power.cpp.o CMakeFiles/ginac.dir/print.cpp.o CMakeFiles/ginac.dir/pseries.cpp.o CMakeFiles/ginac.dir/registrar.cpp.o CMakeFiles/ginac.dir/relational.cpp.o CMakeFiles/ginac.dir/remember.cpp.o CMakeFiles/ginac.dir/symbol.cpp.o CMakeFiles/ginac.dir/symmetry.cpp.o CMakeFiles/ginac.dir/tensor.cpp.o CMakeFiles/ginac.dir/utils.cpp.o CMakeFiles/ginac.dir/wildcard.cpp.o /usr/lib64/libcln.so -ldl [ 137s] /usr/lib64/gcc/x86_64-suse-linux/10/../../../../x86_64-suse-linux/bin/ld: /tmp/libginac.so.6.4.5.5CSKjJ.ltrans27.ltrans.o: in function `GiNaC::print_real_number(GiNaC::print_context const&, cln::cl_R const&)': [ 137s] /home/abuild/rpmbuild/BUILD/ginac-1.7.11.git20200829/ginac/numeric.cpp:411: undefined reference to `cln::denominator(cln::cl_RA const&)' [ 137s] /usr/lib64/gcc/x86_64-suse-linux/10/../../../../x86_64-suse-linux/bin/ld: /tmp/libginac.so.6.4.5.5CSKjJ.ltrans27.ltrans.o: in function `GiNaC::print_real_csrc(GiNaC::print_context const&, cln::cl_R const&)': [ 137s] /home/abuild/rpmbuild/BUILD/ginac-1.7.11.git20200829/ginac/numeric.cpp:451: undefined reference to `cln::denominator(cln::cl_RA const&)' [ 137s] /usr/lib64/gcc/x86_64-suse-linux/10/../../../../x86_64-suse-linux/bin/ld: /tmp/libginac.so.6.4.5.5CSKjJ.ltrans27.ltrans.o: in function `GiNaC::numeric::numer() const': [ 137s] /home/abuild/rpmbuild/BUILD/ginac-1.7.11.git20200829/ginac/numeric.cpp:1370: undefined reference to `cln::denominator(cln::cl_RA const&)' [ 137s] /usr/lib64/gcc/x86_64-suse-linux/10/../../../../x86_64-suse-linux/bin/ld: /home/abuild/rpmbuild/BUILD/ginac-1.7.11.git20200829/ginac/numeric.cpp:1374: undefined reference to `cln::denominator(cln::cl_RA const&)' [ 137s] /usr/lib64/gcc/x86_64-suse-linux/10/../../../../x86_64-suse-linux/bin/ld: /home/abuild/rpmbuild/BUILD/ginac-1.7.11.git20200829/ginac/numeric.cpp:1374: undefined reference to `cln::denominator(cln::cl_RA const&)' [ 137s] /usr/lib64/gcc/x86_64-suse-linux/10/../../../../x86_64-suse-linux/bin/ld: /tmp/libginac.so.6.4.5.5CSKjJ.ltrans27.ltrans.o:/home/abuild/rpmbuild/BUILD/ginac-1.7.11.git20200829/ginac/numeric.cpp:1376: more undefined references to `cln::denominator(cln::cl_RA const&)' follow [ 137s] collect2: error: ld returned 1 exit status [ 137s] make[2]: *** [ginac/CMakeFiles/ginac.dir/build.make:989: ginac/libginac.so.6.4.5] Error 1 [ 137s] make[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/ginac-1.7.11.git20200829/build' [ 137s] make[1]: *** [CMakeFiles/Makefile2:598: ginac/CMakeFiles/ginac.dir/all] Error 2 [ 137s] make: *** [Makefile:163: all] Error 2 --- Turning off LTO for cln fixes this linking issue (as noted in the original bug report). Perhaps it is a GCC regression or something, but I am merely guessing as opposed to really understanding the issue. Full build log (for ginac) here: <https://build.opensuse.org/package/live_build_log/home:badshah400:boo1176710/ginac/openSUSE_Tumbleweed/x86_64> Thanks again for looking into this and best wishes. -- Atri Bhattacharya
On Sun, 2020-09-20 at 13:11 +0200, Atri Bhattacharya wrote:
Hello, When the shared library for CLN is built with link-time optimization [1] (LTO) enabled, linking against libcln.so fails (for example GiNaC fails to build). Turning off link-time optimization fixes the linking issue, but I wonder if this is a bug in CLN that needs fixing or a compiler issue and if we can build and use LTO enabled CLN in the future.
This issue was first reported [2] against openSUSE Tumbleweed which builds its packages with LTO flags by default; we are currently in the process of working around this by disabling the LTO flags specifically for CLN [3].
Thanks in advance for any suggestion as to how to fix TLO-enabled CLN builds or if indeed disabling LTO is the only way to go (at least for now). I maintain openSUSE's CLN and GiNaC packages.
To follow up on this, we found that the problem was us building the cln library with "-fvisibility-inlines-hidden" in combination with LTO. Turning off the "-fvisibility-inlines-hidden" flag has fixed this issue (even with LTO enabled). Please see the original bug report for a discussion: <https://bugzilla.opensuse.org/show_bug.cgi?id=1176710> if interested. Thanks in particular to Alexei for testing and commenting on my initial post. Cheers, -- Atri Bhattacharya
Hello, 22.09.2020, 03:53, "Atri Bhattacharya" <badshah400@opensuse.org>:
To follow up on this, we found that the problem was us building the cln library with "-fvisibility-inlines-hidden" in combination with LTO. Turning off the "-fvisibility-inlines-hidden" flag has fixed this issue (even with LTO enabled).
Not exactly. The problem is *us* breaking the standard (ISO C++98) for performance reasons. Consider include/cln/rational.h 19 20 extern const cl_I denominator (const cl_RA& r); 21 and src/rational/cl_RA.h 149 inline const cl_I denominator (const cl_RA& r) 150 { 151 if (integerp(r)) 152 return 1; 153 else 154 return TheRatio(r)->denominator; 155 } Notice that the same function is declared non-inline in the public header, however it's implemented as inline. That's because we don't want to expose too many implementation details, and yet we want to inline that function in CLN itself. With '-fvisibility-inlines-hidden' the compiler can assume that those inline versions are not used outside of the CLN. Thus if LTO is enabled the compiler can find out all call sites, inline the function eliminate the out-of-the-line copy. As a result `cln::denominator` won't be exported from libcln.so ISO C++ 98 standard demands (in 7.1.2.4) that "If a function with external linkage is declared inline in one translation unit, it shall be declared inline in all translation units in which it appears; no diagnostic is required." So our approach (having an inline and non-inline function with the same name and argument(s)) violates the standard and has caused quite a number of linking failures in the past. The issue has been partially addressed by commit ce250e91fb8d16bc6b35be0add1896fc64f31ec1. However there are still lots of non-conforming code, in particular cln::denominator. In general the need to inline a function in some places and not in others came from three situations: 1) Some functions, like cl_SF_zerop or cl_FF_zerop, are just a single machine instruction when inlined. Putting their definitions into a public header would expose too many internals of the library, leading violation of abstraction and increased compilation times. Still, it would be nice to use the inline version of these functions in the library itself. 2) Some functions, like cl_{SF,FF,DF,LF}_idecode, are usually only invoked through a dispatcher cl_F_idecode that does nothing but dispatch the call to the right function. Here inlining is used, regardless of the size of the inlined functions, because it removes one function call from the chain of function calls. A compiler cannot know that this caller is the main caller for the 4 inlined functions (unless LTO is used) 3) Similarly, cl_I_from_NDS would be a bottleneck if not inlined: every creation of a new cl_I goes through this function. A compiler cannot know a priori the bottlenecks (unless LTO is used) To solve the problem properly we should either 1) Stop playing tricks and use the LTO. The bug report being discussed kind of proves that LTO actually works these days. 2) Introduce `denominator_inline` and `denominator` and use the former in CLN itself (similarly to ce250e91fb8d16bc6b35be0add1896fc64f31ec1). Best regards, Alexey
Hi Alexey,
The problem is *us* breaking the standard (ISO C++98) for performance reasons. ... To solve the problem properly we should either
1) Stop playing tricks and use the LTO. The bug report being discussed kind of proves that LTO actually works these days. 2) Introduce `denominator_inline` and `denominator` and use the former in CLN itself (similarly to ce250e91fb8d16bc6b35be0add1896fc64f31ec1).
Thank you for this analysis; it's very clear. Yes, when possible, we want to avoid a function call for something that is just one or two instructions. I vote for 2), using the technique that we already use for zerop_inline and minusp_inline. Why? Because LTO is an undocumented implementation technique. 5 or 10 years from now, LTO may be out and JIT compilation may be the standard technique.(*) CLN relied another undocumented implementation technique in the past: the per-file static constructor functions (for CL_REQUIRE, CL_PROVIDE). It was a big mistake and required major efforts to fix. (Thank you for these efforts!!) Relying on LTO would likely bring similar problems a couple of years from now. Bruno
On 27.09.20 00:09, Bruno Haible wrote:
The problem is *us* breaking the standard (ISO C++98) for performance reasons. ... To solve the problem properly we should either
1) Stop playing tricks and use the LTO. The bug report being discussed kind of proves that LTO actually works these days. 2) Introduce `denominator_inline` and `denominator` and use the former in CLN itself (similarly to ce250e91fb8d16bc6b35be0add1896fc64f31ec1).
Thank you for this analysis; it's very clear. Yes, when possible, we want to avoid a function call for something that is just one or two instructions.
I vote for 2), using the technique that we already use for zerop_inline and minusp_inline.
Do we? Header file include/cln/rational.h declares namespace cln { extern bool zerop(const cl_RA&x); } and then src/rational/cl_RA.h defines that function as inline. Then, src/rational/cl_RA_zerop.cc tries to force compilation of a text symbol for this function. And likewise for minusp(const cl_RA &). How does this satisfy the standard? -richy. -- Richard B. Kreckel <https://in.terlu.de/~kreckel/>
Hi Bruno, Alexey, On 27.09.20 00:09, Bruno Haible wrote:
The problem is *us* breaking the standard (ISO C++98) for performance reasons. ... To solve the problem properly we should either
1) Stop playing tricks and use the LTO. The bug report being discussed kind of proves that LTO actually works these days. 2) Introduce `denominator_inline` and `denominator` and use the former in CLN itself (similarly to ce250e91fb8d16bc6b35be0add1896fc64f31ec1).
Thank you for this analysis; it's very clear. Yes, when possible, we want to avoid a function call for something that is just one or two instructions.
I vote for 2), using the technique that we already use for zerop_inline and minusp_inline.
Why? Because LTO is an undocumented implementation technique. 5 or 10 years from now, LTO may be out and JIT compilation may be the standard technique.(*)
CLN relied another undocumented implementation technique in the past: the per-file static constructor functions (for CL_REQUIRE, CL_PROVIDE). It was a big mistake and required major efforts to fix. (Thank you for these efforts!!)
Relying on LTO would likely bring similar problems a couple of years from now. Okay, okay, we're all on the same page.
The case of numerator(cl_RA) and denominator(cl_RA) may not have been an ideal example: These functions aren't used at all internally. Instead, we always use the inline and private functions numerator(cl_RT) and denominator(cl_RT) because we handle the integer case first and do a DeclareType(cl_RT, x) then. So, I'm now uploading a patch to remove the inline versions of these two functions. But that's just the beginning. -richy. -- Richard B. Kreckel <https://in.terlu.de/~kreckel/>
Hi Bruno, 27.09.2020, 02:09, "Bruno Haible" <bruno@clisp.org>:
The problem is *us* breaking the standard (ISO C++98) for performance reasons. ... To solve the problem properly we should either
1) Stop playing tricks and use the LTO. The bug report being discussed kind of proves that LTO actually works these days. 2) Introduce `denominator_inline` and `denominator` and use the former in CLN itself (similarly to ce250e91fb8d16bc6b35be0add1896fc64f31ec1).
Thank you for this analysis; it's very clear. Yes, when possible, we want to avoid a function call for something that is just one or two instructions.
I vote for 2), using the technique that we already use for zerop_inline and minusp_inline.
Why? Because LTO is an undocumented implementation technique. 5 or 10 years from now, LTO may be out and JIT compilation may be the standard technique.(*)
I strongly disagree. 1. Cross-module/whole program analysis (which is a proper name for LTO) is a well established optimization method. And it's exactly the right way for the compiler to know that a) cl_{SF,FF,DF,LF}_idecode are called mostly by cl_F_idecode (so it's a good idea to inline those 4 functions) b) every cl_I is created through cl_I_from_NDS (so it's a good idea to inline cl_I_from_NDS) etc. 2. Most (all?) major C++ compilers (GCC, clang, msvc) provide cross-module/whole program analysis. In fact it has been available for (almost) two decades. LTO branch has been merged into GCC trunk back in 2009. Link time code generation (which is msvc name for cross-module analysis) has been available since visual studio 7.0 (released in 2002). Therefore I prefer the "Stop playing tricks and use the LTO" solution.
5 or 10 years from now, LTO may be out and JIT compilation may be the standard technique. Relying on LTO would likely bring similar problems a couple of years from now.
Firstly any reasonably efficient piece of the software should be adapted for the new hardware and compilers. As long as the software is useful there will be someone to update/fix it (cf. CL_REQUIRE/CL_PROVIDE story). Secondly I don't think cross-module analysis is going to disappear. These days there are quite a number of large (C++) codebases (think of Chrome, clang, etc), and without the cross-module analysis such a code quickly gets either too messy or too slow. I think it's more likely for C++ itself to disappear in 10..20 years. Best regards, Alexey
Hi Alexey, On 21.10.20 10:45, Alexey Sheplyakov wrote:
1. Cross-module/whole program analysis (which is a proper name for LTO) is a well established optimization method. And it's exactly the right way for the compiler to know that
a) cl_{SF,FF,DF,LF}_idecode are called mostly by cl_F_idecode (so it's a good idea to inline those 4 functions) b) every cl_I is created through cl_I_from_NDS (so it's a good idea to inline cl_I_from_NDS)
etc.
2. Most (all?) major C++ compilers (GCC, clang, msvc) provide cross-module/whole program analysis. In fact it has been available for (almost) two decades. LTO branch has been merged into GCC trunk back in 2009. Link time code generation (which is msvc name for cross-module analysis) has been available since visual studio 7.0 (released in 2002).
Therefore I prefer the "Stop playing tricks and use the LTO" solution.
We need evidence, please, and benchmarks! I tried compiling CLN with -flto added to CXXFLAGS and it turned out my machine has too little memory (it has 32GiB). It appears to me this isn't quite ready for showtime. -richy. -- Richard B. Kreckel <https://in.terlu.de/~kreckel/>
participants (4)
-
Alexey Sheplyakov
-
Atri Bhattacharya
-
Bruno Haible
-
Richard B. Kreckel