Hi all,

This is gonna be long. If you just want to know what you will soon need to do to your source files and are not interested in why, please jump to the end of this email.

The usual way of implementing polymorphic methods accepting ex arguments in GiNaC involves checking the arguments for their types. This results in switch-like statements of the form

    ex mul::somemethod(const ex & other)
    {
        if (is_ex_of_type(other, add)) {
            // do this...
        } else if (is_ex_of_type(other, mul)) {
            // do that...
        } else {
            // do something else...
        }
    }

Okay, language lawyers (*) usually construe those switch statements as bad. But since C++ does not have generic multiple dispatch, the usual answer is that they offer some variation of home-grown double dispatch, maybe by bloating the add and mul classes with overloaded methods like this:

    class basic {
        // ...
        virtual ex somemethod(const add & other);
        virtual ex somemethod(const mul & other);
        ex somemethod(const ex & other);
    };
    // same for add and mul

    ex mul::somemethod(const ex & other)
    {
        return other.bp->somemethod(*this);
    }

So the proper implementation is called by two subsequent lookups in the virtual function table. Nothing prohibits us from doing it this way in GiNaC, but for the objects we are dealing with in a CAS we have so far always chosen the switch way of implementing and found it much more accessible and readable. Just consider the various print functions that were recently changed to accept an object of type print_context (or a class derived from it) to format the output properly. The different ways need to be implemented somewhere, so why not deal with all of them in foo::print()?

What is this guy ranting about?!? Who cares? Cool, if you don't care. The only thing that really bothered me so far is the use of macros at this place.
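For readers who have not seen the double-dispatch idiom mentioned above, here is a minimal self-contained sketch. It does not use GiNaC: the classes Basic, Add and Mul and the methods combine() and combine_with() are hypothetical stand-ins chosen for illustration only, and the string return values just record which overload was reached.

```cpp
#include <string>

struct Add;
struct Mul;

// Hypothetical stand-in for a polymorphic base class (not GiNaC's basic).
struct Basic {
    virtual ~Basic() {}
    // First virtual lookup: selects on the dynamic type of the left operand.
    virtual std::string combine(const Basic & other) const = 0;
    // Second virtual lookup: selects on the dynamic type of the right operand;
    // the overload is chosen by the (now known) type of the left operand.
    virtual std::string combine_with(const Add & lhs) const = 0;
    virtual std::string combine_with(const Mul & lhs) const = 0;
};

struct Add : Basic {
    std::string combine(const Basic & other) const { return other.combine_with(*this); }
    std::string combine_with(const Add &) const { return "add,add"; }
    std::string combine_with(const Mul &) const { return "mul,add"; }
};

struct Mul : Basic {
    std::string combine(const Basic & other) const { return other.combine_with(*this); }
    std::string combine_with(const Add &) const { return "add,mul"; }
    std::string combine_with(const Mul &) const { return "mul,mul"; }
};
```

Calling x.combine(y) through two Basic references reaches the overload for the pair of dynamic types without any explicit type test. The price is that every class must list an overload for every other class, which is the bloat referred to above.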
The definition of is_of_type(obj, type) by a macro

  1) lives outside the namespace and may collide some time,
  2) accepts all kinds of funny arguments with barely any possibility for compile-time checking,
  3) does not allow overloading, so there is an is_of_type and another is_ex_of_type with the same semantics, and
  4) is generally crap with respect to readability and makes steam come out of the ears of language lawyers.

A better approach would be to use a template here. We could write is_a<numeric>(foo) where foo is either something derived from basic or an ex and it will produce the expected outcome. There will of course also be an is_exactly_a<tensor>(bar) matching only tensors and not classes derived from tensor. We can implement it in exactly the same way as the macros were implemented, for instance like this:

    template <class T>
    inline bool is_a(const basic & obj)
    {
        return dynamic_cast<const T *>(&obj)!=0;
    }

    template <class T>
    inline bool is_a(const ex & obj)
    {
        return is_a<T>(*obj.bp);
    }

The only cause for concern is

    template <class T>
    inline bool is_exactly_a(const class basic & obj)
    {
        const T foo;
        return foo.tinfo()==obj.tinfo();
    }

because it has to allocate a temporary. But this is not a big deal, since we are allowed to specify template specializations. For instance, in file add.h we would write down

    template<>
    inline bool is_exactly_a<add>(const basic & obj)
    {
        return obj.tinfo()==TINFO_add;
    }

giving us all the performance back. We all know that "An Inline Function is As Fast As a Macro" (an actual section title in GCC's info page). So we should do it. Now.

Surprise! The inliner in GCC-2.95 does a very poor job at flow analysis inside if statements when inlined functions return some boolean (or integer, no matter).
Consider this code:

    struct ABC {
        virtual ~ABC() {}
    };

    struct A : public ABC {};

    template <class T>
    inline bool is_a(const ABC & obj)
    {
        return (dynamic_cast<const T *>(&obj)!=0);
    }

    #define is_of_type(OBJ,TYPE) \
        (dynamic_cast<const TYPE *>(&OBJ)!=0)

    #ifdef USE_MACRO
    int test(const ABC & e)
    {
        if (is_of_type(e,A))
            return 1;
        return 0;
    }
    #else
    int test(const ABC & e)
    {
        if (is_a<A>(e))
            return 1;
        return 0;
    }
    #endif

When USE_MACRO is defined at the preprocessing level, the compiler generates:

    00000000 <test(ABC const &)>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 08                sub    $0x8,%esp
       6:   8b 45 08                mov    0x8(%ebp),%eax
       9:   85 c0                   test   %eax,%eax
       b:   74 28                   je     35 <test(ABC const &)+0x35>
       d:   83 c4 f8                add    $0xfffffff8,%esp
      10:   50                      push   %eax
      11:   68 00 00 00 00          push   $0x0
                            12: R_386_32    ABC type_info function
      16:   8b 10                   mov    (%eax),%edx
      18:   03 02                   add    (%edx),%eax
      1a:   50                      push   %eax
      1b:   6a 01                   push   $0x1
      1d:   68 00 00 00 00          push   $0x0
                            1e: R_386_32    A type_info function
      22:   ff 72 04                pushl  0x4(%edx)
      25:   e8 fc ff ff ff          call   26 <test(ABC const &)+0x26>
                            26: R_386_PC32  __dynamic_cast
      2a:   85 c0                   test   %eax,%eax
      2c:   74 07                   je     35 <test(ABC const &)+0x35>
      2e:   b8 01 00 00 00          mov    $0x1,%eax
      33:   eb 02                   jmp    37 <test(ABC const &)+0x37>
      35:   31 c0                   xor    %eax,%eax
      37:   c9                      leave
      38:   c3                      ret

while in the inline case it generates this entirely contorted code:

    00000000 <test(ABC const &)>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 08                sub    $0x8,%esp
       6:   8b 45 08                mov    0x8(%ebp),%eax
       9:   85 c0                   test   %eax,%eax
       b:   74 24                   je     31 <test(ABC const &)+0x31>
       d:   83 c4 f8                add    $0xfffffff8,%esp
      10:   50                      push   %eax
      11:   68 00 00 00 00          push   $0x0
                            12: R_386_32    ABC type_info function
      16:   8b 10                   mov    (%eax),%edx
      18:   03 02                   add    (%edx),%eax
      1a:   50                      push   %eax
      1b:   6a 01                   push   $0x1
      1d:   68 00 00 00 00          push   $0x0
                            1e: R_386_32    A type_info function
      22:   ff 72 04                pushl  0x4(%edx)
      25:   e8 fc ff ff ff          call   26 <test(ABC const &)+0x26>
                            26: R_386_PC32  __dynamic_cast
      2a:   85 c0                   test   %eax,%eax
      2c:   0f 95 c0                setne  %al
      2f:   eb 02                   jmp    33 <test(ABC const &)+0x33>
      31:   b0 00                   mov    $0x0,%al
      33:   84 c0                   test   %al,%al
      35:   75 09                   jne    40 <test(ABC const &)+0x40>
      37:   31 c0                   xor    %eax,%eax
      39:   eb 0a                   jmp    45 <test(ABC const &)+0x45>
      3b:   90                      nop
      3c:   8d 74 26 00             lea    0x0(%esi,1),%esi
      40:   b8 01 00 00 00          mov    $0x1,%eax
      45:   c9                      leave
      46:   c3                      ret

The inlined version materializes the boolean in %al (setne) and then tests it again before branching, instead of branching directly on the result of __dynamic_cast. This turned out to be the performance hammer of about 25% that I saw when I first tried substituting templates for the macros. It also turns out that the new GCC-3.0 produces better and roughly equivalent code in both cases, the templated one being even slightly superior as far as I can see.

So, this is what we'll do (remember that GCC-3.0 is going to be released today): In all cases the old macros will be supplemented by the template functions, with specializations of is_exactly_a<>() for all critical cases. Inside the library we'll stick with the macros for some while in the time-critical parts until GCC-3.0 catches on. (BTW: GCC-3.0 produces code that is roughly 10%-30% faster at the GiNaC benchmarks. Rejoice and upgrade!) Eventually these macros will be entirely phased out.

I just finished checking the changes in to CVS. In my applications I used a Perl script to convert from the macros to the new inline template functions, which is supplied herewith WITHOUT ANY WARRANTY WHATSOEVER. It actually works for converting the GiNaC library, but you should of course make a backup first. Once GiNaC 0.9.1 rolls out (or if you are running from CVS) please apply this converter to your source files.

----------8<------------------8<---------------------8<-----------------
#! /usr/bin/perl -w

my $file;
my $tmpfile;

if ($file = $ARGV[0]) {
    print STDERR "replacing in file $file\n";
    $tmpfile = "tmp${file}tmp";
} else {
    print STDERR "*** no file given\n";
    exit;
}

open (CPPFILE, $file) or die "Can't open source file: $!\n";
open (TMPFILE, "> $tmpfile") or die "Can't open temporary file: $!\n";

while ($_ = <CPPFILE>) {
    # is_exactly_of_type(foo,bar) -> is_exactly_a<bar>(foo)
    s/is_exactly_of_type\(([\*\.a-zA-Z_0-9\(\)\[\]\+\-\>]+)[, ]+([a-zA-Z_0-9]+)\)/is_exactly_a\<$2\>\($1\)/g;
    # is_ex_exactly_of_type(foo,bar) -> is_exactly_a<bar>(foo)
    s/is_ex_exactly_of_type\(([\*\.a-zA-Z_0-9\(\)\[\]\+\-\>]+)[, ]+([a-zA-Z_0-9]+)\)/is_exactly_a\<$2\>\($1\)/g;
    # is_of_type(foo,bar) -> is_a<bar>(foo)
    s/is_of_type\(([\*\.a-zA-Z_0-9\(\)\[\]\+\-\>]+)[, ]+([a-zA-Z_0-9]+)\)/is_a\<$2\>\($1\)/g;
    # is_ex_of_type(foo,bar) -> is_a<bar>(foo)
    s/is_ex_of_type\(([\*\.a-zA-Z_0-9\(\)\[\]\+\-\>]+)[, ]+([a-zA-Z_0-9]+)\)/is_a\<$2\>\($1\)/g;
    print TMPFILE "$_";
}

close CPPFILE;
close TMPFILE;

`mv $tmpfile $file`;
---------->8------------------>8--------------------->8-----------------

Regards
    -richy.

(*) "The first thing we do, let's kill all the language lawyers."
    Henry VI, part II, taken from TC++PL-3, chapt 2.
-- 
Richard Kreckel
<Richard.Kreckel@Uni-Mainz.DE>
<http://wwwthep.physik.uni-mainz.de/~kreckel/>
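To try out the is_a<>() and is_exactly_a<>() templates described in this message without GiNaC at hand, the following is a minimal self-contained sketch. The classes basic, add, mul and special_add and the TINFO_* values are stand-ins assumed for this example (special_add is purely hypothetical); they are not GiNaC's real definitions. The generic is_exactly_a<>() default-constructs a T, as in the message, and the specialization for add avoids that temporary.

```cpp
// Stand-in type tags; the real TINFO_* values are not reproduced here.
const unsigned TINFO_basic = 1;
const unsigned TINFO_add = 2;
const unsigned TINFO_mul = 3;
const unsigned TINFO_special_add = 4;

struct basic {
    virtual ~basic() {}
    virtual unsigned tinfo() const { return TINFO_basic; }
};
struct add : basic {
    unsigned tinfo() const { return TINFO_add; }
};
struct mul : basic {
    unsigned tinfo() const { return TINFO_mul; }
};
// Hypothetical class derived from add, to show is_a vs. is_exactly_a.
struct special_add : add {
    unsigned tinfo() const { return TINFO_special_add; }
};

// True if obj is a T or derived from T.
template <class T>
inline bool is_a(const basic & obj)
{
    return dynamic_cast<const T *>(&obj) != 0;
}

// Generic version: true only if obj's tag equals that of a fresh T.
// Requires T to be default-constructible and costs a temporary.
template <class T>
inline bool is_exactly_a(const basic & obj)
{
    T probe;
    return probe.tinfo() == obj.tinfo();
}

// Specialization for add: compares against the tag directly, no temporary.
template <>
inline bool is_exactly_a<add>(const basic & obj)
{
    return obj.tinfo() == TINFO_add;
}
```

With these definitions, is_a<add>() accepts a special_add object while is_exactly_a<add>() rejects it, which is the distinction between the two families of macros.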