Hello,

recently I was generating C-style output from GiNaC expressions and ran into the following problems:
- the compiled code was inefficient, as common subexpressions are calculated over and over again,
- the compiler didn't really appreciate single lines with about one million or more characters.

On the mailing list I found that the issue of long lines was already discussed in July 2003, but it seems that this discussion did not lead to additional functionality of GiNaC. To overcome these problems, I wrote a routine which generates optimized C code. This routine
- substitutes every subexpression which occurs more than once with a temporary variable,
- splits long expressions into several lines.

As this could be of use to other people as well, it could be included into GiNaC. The source files are attached to this mail.

An example for this routine would be the following code fragment

  symbol x("x"), y("y");
  ex f = pow(sin(x+y),2)+sin(x+y)+log(sin(x+y));
  optimized_C_code(std::cout, std::string("double"), std::string("tmp"), f);

which prints

  double tmp1 = sin(y+x);
  double tmp0 = tmp1+(tmp1*tmp1)+log(tmp1);

to standard output.

Best wishes,

Stefan
Hello,

On Tue, Mar 20, 2007 at 01:40:22PM +0100, Stefan Weinzierl wrote:
recently I was generating C-style output from GiNaC expressions and ran into the following problems:
- the compiled code was inefficient, as common subexpressions are calculated over and over again,
Fetching something from memory is quite an expensive operation on modern (and not so modern) hardware, so calculating simple subexpressions over and over again *can* be much faster. For instance, consider common subexpressions of the form (x+y), (x+2*y), ... (x+1000000*y). Evaluation of such a subexpression takes just a few cycles, so it would be really inefficient (in terms of both calculation time and memory) to use 10^6 variables to store their values. On the other hand, there are definitely a lot of cases when it is more efficient to calculate the subexpression(s) only once. Deciding whether a subexpression is complicated or not is itself a very complicated problem. GiNaC does not even try to solve it (GiNaC is Not A C compiler).
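For concreteness, the trade-off might look like the following two shapes of generated code; this is only an illustration, f() and the coefficients are placeholders and neither variant is the output of any actual GiNaC printer:

/* Illustration of the trade-off: recompute vs. store in temporaries. */
#include <cmath>

static double f(double t) { return std::sin(t); }  /* stand-in for some function */

double variant_a(double x, double y)
{
	/* recompute the cheap subexpressions inline */
	return f(x + 1.0*y) + f(x + 2.0*y) + f(x + 3.0*y);
}

double variant_b(double x, double y)
{
	/* store every subexpression in a temporary first; with ~10^6 such
	   terms this means a lot of memory traffic for very cheap arithmetic */
	double t1 = x + 1.0*y;
	double t2 = x + 2.0*y;
	double t3 = x + 3.0*y;
	return f(t1) + f(t2) + f(t3);
}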
- the compiler didn't really appreciate single lines with about one million or more characters.
If the generated code size is _that_ big, direct calculation with .evalf() might be much faster. In fact, I often do the inverse thing: use GiNaC for numeric evaluation of lengthy (>= 100Mb) expressions. No C/C++/whatever compiler is able to process them...
On the mailing list I found that the issue of long lines was already discussed in July 2003, but it seems that this discussion did not lead to additional functionality of GiNaC. To overcome these problems, I wrote a routine which generates optimized C code. This routine
- substitutes every subexpression, which occurs more than once with a temporary variable.
First of all, scanning for common subexpressions is _very_ expensive, as any pattern matching is. Secondly, the generated code is not really optimal (as I explained before).
- splits long expressions into several lines.
IMHO this is a good idea. However, I don't like your implementation at all. (That said, I used to pipe GiNaC output through `fold -w 78'.)
As this could be of use to other people as well, it could be included into GiNaC. The source files are attached to this mail.
An example for this routine would be the following code fragment
symbol x("x"), y("y");
ex f = pow(sin(x+y),2)+sin(x+y)+log(sin(x+y));
optimized_C_code(std::cout, std::string("double"), std::string("tmp"), f);
which prints
double tmp1 = sin(y+x);
double tmp0 = tmp1+(tmp1*tmp1)+log(tmp1);
to standard output.
Could you please give less trivial examples? I don't like the idea at all (that is just IMHO). But the implementation itself is far from perfect.
/** @file code_generation.h
 *
 * Interface to code_generation
 *
 */

#ifndef __GINAC_CODE_GENERATION_H__
#define __GINAC_CODE_GENERATION_H__

#include <iostream>
#include <string>

#include "ex.h"

namespace GiNaC {
It would be nice to have several functions, one for each task -- e.g. one which transforms the expression to the "optimal" form, one which converts it into an exmap (e.g. splits it), one which prints it... And the function for "optimizing" should probably take a hint from the user (who might know about some sort of subexpressions in advance). E.g.

const ex heuristic_collect(const ex& e, exmap& subexprs);
void optimized_C_code(std::ostream & os, const std::string & T, const std::string & temp_var_base_name, ex expr);
const ex& expr?
} // namespace GiNaC
#endif // ndef __GINAC_CODE_GENERATION_H__
/** @file code_generation.cc
 *
 * Implementation of code_generation
 *
 */

#include <cstddef>
#include <vector>
#include <sstream>

#include "ginac.h"
#include "code_generation.h"

namespace GiNaC {

// anonymous namespace for helper functions
namespace {

/**
 *
 * Helper function, which converts an integer into a string
 *
 */
std::string itos(int arg)
{
	std::ostringstream buffer;
	buffer << arg;
	return buffer.str();
}

/**
 *
 * Checks if an expression can potentially be substituted.
 *
 * At present, expressions which are considered for substitution are of the
 * following types:
 * add, mul, power or function.
 *
 */
bool check_substitution_type(ex expr)
^^ You'd better use const ex& here
{
	if ( is_a<add>(expr) || is_a<mul>(expr) ||
	     is_a<power>(expr) || is_a<function>(expr) ) return true;

	return false;
}

/**
 *
 * This function writes a C statement of the form
 *   T base_namexxx = expr
 * to the stream.
 *
 * If the expression contains a subexpression with a large number of operands, the output is split
 * into smaller subexpressions.
The number of operands is not really a good code complexity metric.
 *
 */
void write_line(std::ostream & os, const std::string & T, const std::string & base_name,
                std::vector<symbol> & tv, int tv_i, ex expr)
I think this should be split into two separate functions -- one which transforms the expression and one which prints it.
{
	size_t MAX_NOPS = 1024;

	bool flag_long_expression = false;

	// -------------------------------------------
	// check for long add and mul
	lst subexpr_lst;
	for (const_preorder_iterator it = expr.preorder_begin(); it != expr.preorder_end(); it++) {
		if ( is_a<add>(*it) || is_a<mul>(*it) ) {
			if ( (*it).nops() > MAX_NOPS ) {
				subexpr_lst.append(*it);
				flag_long_expression = true;
			}
		}
	}

	// -------------------------------------------
	if ( flag_long_expression ) {
		exmap subs_map;
		for ( lst::const_reverse_iterator k = subexpr_lst.rbegin(); k != subexpr_lst.rend(); k++ ) {
			// two new symbols
			int i1 = tv.size();
^^^^^^ This ought to be std::size_t ...
			tv.push_back( symbol( base_name+itos(i1) ) );
			int i2 = tv.size();
^^^^^^ and this one too.
			tv.push_back( symbol( base_name+itos(i2) ) );

			ex expr_1, expr_2;

			if ( is_a<add>(*k) ) {
				expr_1 = 0;
				const_iterator j_it = k->begin();
				for (size_t j=0; j<MAX_NOPS; j++) {
					expr_1 += *j_it;
					j_it++;
				}
				expr_2 = (*k) - expr_1;
				subs_map[*k] = tv[i1]+tv[i2];
			} else if ( is_a<mul>(*k) ) {
				expr_1 = 1;
				const_iterator j_it = k->begin();
				for (size_t j=0; j<MAX_NOPS; j++) {
					expr_1 *= *j_it;
					j_it++;
				}
				expr_2 = (*k) / expr_1;
				subs_map[*k] = tv[i1]*tv[i2];
			}

			// print expr_2 and expr_1
			write_line(os, T, base_name, tv, i2, expr_2);
			write_line(os, T, base_name, tv, i1, expr_1);

		} // lst::const_iterator k

		expr = expr.subs( subs_map, subs_options::no_pattern );

	} // flag_long_expression

	// -------------------------------------------
	// print one line
	os << " " << T << " " << tv[tv_i] << " = ";
	if ( T == std::string("double") ) {
		expr.print(print_csrc_double(os));
	} else if ( T == std::string("float") ) {
		expr.print(print_csrc_float(os));
	} else if ( T == std::string("cl_N") ) {
		expr.print(print_csrc_cl_N(os));
	} else { // default
		expr.print(print_csrc(os));
	}
	os << ";" << std::endl;
}

/**
 *
 * This function substitutes all subexpressions which occur more than once in the
 * expression tree.
 *
 * The function proceeds iteratively and substitutes first subexpressions, which are
 * "closer" to the root of the expression tree.
 * If these subexpressions in turn contain subsubexpressions, which occur multiple times,
 * the latter are substituted in subsequent iterations.
 *
 */
I doubt this will result in better code. And I think this function should be split into two separate functions -- one which transforms the expression and one which prints it.
void substitute_subexpressions(std::ostream & os, const std::string & T,
                               const std::string & base_name, ex expr)
{
	bool flag_try_substitution = true;
	std::vector<symbol> tv;

	lst expr_lst;
	ex ex_expr_lst;
	lst replies;

	// initialize
	expr_lst.append(expr);
	ex_expr_lst = expr_lst;
	tv.push_back( symbol( base_name+itos(tv.size()) ) );
... so you don't need to loop here, you can call (recursively) this function.
	while ( flag_try_substitution ) {
		// expand tree into a long list
		lst subexpr_lst;
You'd better use exset (e.g., std::set<ex, ex_is_less>)...
		for (const_preorder_iterator it = ex_expr_lst.preorder_begin(); it != ex_expr_lst.preorder_end(); it++) {
			if ( check_substitution_type(*it) ) subexpr_lst.append(*it);
		}
		subexpr_lst.sort();

		// find duplicate entries
		replies.remove_all();
... so you won't get any duplicates at all.
		ex subexpr = subexpr_lst.op(0);
		int counter = 1;

		lst::const_iterator it_start = subexpr_lst.begin();
		it_start++;
		for ( lst::const_iterator it = it_start; it != subexpr_lst.end(); it++ ) {
			if ( subexpr.is_equal(*it) ) {
				counter++;

				if ( counter == 2 ) {
					bool flag_subs = true;

					size_t k = 0;
					for ( lst::const_iterator k_it = replies.begin(); k_it != replies.end(); k_it++ ) {
						// subexpression of something already in replies
						if ( (*k_it).has(subexpr) ) {
							flag_subs = false;
						}
						// replies contains a subsubexpression of subexpr
						// note that subexpr can be the mother of more than one child
						else if ( subexpr.has(*k_it) ) {
							replies.let_op(k) = subexpr;
							flag_subs = false;
						}

						// update k
						k++;
					} // const_iterator k_it

					// new substitution
					if ( flag_subs ) {
						replies.append(subexpr);
					}
				} // counter == 2

			} // equal
			else // not equal
			{
				// new subexpression
				subexpr = *it;
				counter = 1;
			} // subexpr not equal
		} // lst::const_iterator it
		// sort and remove double entries
		replies.sort();
		replies.unique();

		// substitute
		if ( replies.nops() > 0 ) {
			exmap subs_map;

			for ( lst::const_iterator k = replies.begin(); k != replies.end(); k++ ) {
				int i = tv.size();
				tv.push_back( symbol( base_name+itos(i) ) );
				subs_map[*k] = tv[i];
			}
			ex_expr_lst = ex_expr_lst.subs( subs_map, subs_options::no_pattern );

			expr_lst = ex_to<lst>(ex_expr_lst);
			for ( lst::const_iterator k = replies.begin(); k != replies.end(); k++ ) {
				expr_lst.append(*k);
			}
			ex_expr_lst = expr_lst;
		} else {
			flag_try_substitution = false;
		}

	} // flag_try_substitution

	// -----------------------------------------------------------
	// output

	int tv_i = tv.size();
	for ( lst::const_reverse_iterator k = expr_lst.rbegin(); k != expr_lst.rend(); k++ ) {
		tv_i--;
		write_line(os, T, base_name, tv, tv_i, *k);
	}
}
} // end of anonymous namespace

/**
 *
 * A routine which generates optimized C code.
 *
 * expr is the expression which should be translated to C code.
 * os is the stream to which the output is written.
 * T is a string giving the data type in C of the expression, usually something like "double" or "float".
 * Intermediate variables obtain their names from base_name, to which a serial number is attached, e.g.
 * if base_name=string("t"), the intermediate variables are labelled t0, t1, t2, etc..
 * The routine generates a C code fragment, such that after execution of this code, t0 contains the value
 * of the original expression expr.
 *
 * The code is generated in two steps:
 * - First all sub-expressions, which occur more than once are replaced by intermediate variables.
No, it does not... Example: x+y+cos(x+y) does not get modified at all.
 * - In a second step, expressions with a large number of operands are split into smaller parts.
 *   This step does not lead to optimizations, but allows large expressions to be processed by the
 *   compiler.
I don't think 1K is a "large number". And evalf() is much faster for numeric evaluation of large expressions anyway...
 *   The second part is done by the function "write_line".
 *
 */
void optimized_C_code(std::ostream & os, const std::string & T,
                      const std::string & base_name, ex expr)
{
	substitute_subexpressions(os, T, base_name, expr);
}
} // namespace GiNaC
Best regards, Alexei -- All science is either physics or stamp collecting.
Dear Alexei,

I think you miss the point. A typical situation where the generation of C code is useful is the following: You use GiNaC to compute symbolically a rather large expression which depends on a handful of symbols. You then want to evaluate this expression numerically a large number of times for different values of the variables/symbols (a typical example is Monte Carlo integration). There are two large quantities in the game: the size of the expression and the number of evaluations.

I believe an efficient way to do this is to write the expression as C code into a file, compile and reload the compiled function into memory. There is certainly an overhead involved for the time needed to generate the C code and the compilation, but this occurs only once and is usually outweighed by the speed improvement of a single evaluation.

To give an example I took an expression which I want to evaluate 10^6 times. A single evaluation with evalf() takes 0.55 s. If I use the plain print_csrc routine built into GiNaC, the code is generated in less than a second, compilation takes 74 s, but the time for a single evaluation improves to 3.8 10^-3 s. If I take the routine for the generation of optimized code, code generation takes 49 s (yes, code optimization has a price, but you pay only once), compilation takes 7 s, and the time for a single evaluation is now 3.2 10^-4 s.

So the total cost for one million evaluations is
  evalf()       : 550000 s
  standard csrc :   3875 s
  optimized     :    376 s

Therefore I don't understand why you propose evalf(), the differences in cost are orders of magnitude.

A few more comments: I agree that GiNaC is not a compiler, but a nice feature is that it is based on C++ and should therefore allow the combination of symbolic and numerical code. Generating optimized C code is a feature I miss in GiNaC. Finally, one should not aim at generating "the" optimal code: anything which performs better than the standard csrc output is already an improvement.

Best wishes,

Stefan
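A rough sketch of this write-compile-reload cycle is given below; the file names, compiler invocation and the function name "genfunc" are illustrative assumptions only (GiNaC's excompiler, mentioned later in this thread, packages the same idea), and error handling is omitted:

// Sketch only: dump an expression in two symbols as C, compile it into a
// shared object, and load the resulting function with dlopen().
#include <cstdlib>
#include <fstream>
#include <dlfcn.h>
#include <ginac/ginac.h>
using namespace GiNaC;

typedef double (*func_t)(double, double);

func_t compile_expression(const ex& e, const symbol& x, const symbol& y)
{
	{
		std::ofstream src("/tmp/genfunc.c");
		src << "#include <math.h>\n"
		    << "double genfunc(double " << x.get_name()
		    << ", double " << y.get_name() << ")\n{\n  return ";
		e.print(print_csrc_double(src));   // GiNaC's C-source printer
		src << ";\n}\n";
	}
	std::system("cc -O2 -fPIC -shared -o /tmp/genfunc.so /tmp/genfunc.c -lm");
	void* handle = dlopen("/tmp/genfunc.so", RTLD_NOW);
	// no error checking here; converting void* to a function pointer is POSIX-specific
	return reinterpret_cast<func_t>(dlsym(handle, "genfunc"));
}

A Monte Carlo loop can then call the returned function pointer directly instead of going through subs()/evalf() for every point.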
On Wed, Mar 21, 2007 at 11:25:21AM +0100, Stefan Weinzierl wrote:
I think you miss the point. A typical situation where the generation of C code is useful is the following: You use GiNaC to compute symbolically a rather large expression which depends on a handful of symbols. You then want to evaluate this expression numerically a large number of times for different values of the variables/symbols (a typical example is Monte Carlo integration).
There are two large quantities in the game: The size of the expression and the number of evaluations.
Yes, if one is about to calculate the expression ~ 10^N times, compiling it might give a considerable speedup.
I believe an efficient way to do this is to write the expression as C code into a file, compile and reload the compiled function into memory.
There is certainly an overhead involved for the time needed to generate the C code and the compilation, but this occurs only once and is usually outweighed by the speed improvement of a single evaluation.
That depends.
To give an example I took an expression,
If it is not top secret, could you please post it here? Otherwise it is difficult to draw any conclusions. It is easy to construct expressions such that the optimizer 1) won't change them at all; 2) will die due to exhausted memory (or my patience, or both); 3) will make them even worse.
which I want to evaluate 10^6 times. A single evaluation with evalf() takes 0.55 s.
Well, things are not that simple. One can do partial evaluation with GiNaC (which is difficult in plain C/C++). For example, simple-minded code would do

double x[N];
double y[N];
ex e = // some huge expression depending on x and y.
double ed[N*N];
for (size_t i=0; i<=N-1; i++)
	for (size_t j=0; j<=N-1; j++)
		ed[i+j*(N-1)] = e.subs(lst(x==x[i], y==y[j])).evalf().to_double();

Smarter code[r] would do

for (size_t i=0; i<=N-1; i++) {
	ex tmp = e.subs(x==x[i]).evalf();
	for (size_t j=0; j<=N-1; j++)
		ed[i+j*(N-1)] = tmp.subs(y==y[j]).evalf().to_double();
}

Such a technique is a viable alternative to dumping the expression as C code and compiling it. I admit it is almost useless when the evaluation points are random (as in Monte Carlo).
If I use the plain print_csrc routine built into GiNaC, the code is generated in less than a second, compilation takes 74 s, but the time for a single evaluation improves to 3.8 10^-3 s.
If I take the routine for the generation of optimized code, code generation takes 49s (yes, code optimization has a price, but you pay only once),
And the price grows faster-than-exponentially with the size of the expression...
compilation takes 7 s, and the time for a single evaluation is now 3.2 10^-4 s.
So the total cost for one million evaluations is
  evalf()       : 550000 s
  standard csrc :   3875 s
  optimized     :    376 s
Therefore I don't understand why you propose evalf(), the differences in cost are orders of magnitude.
Because the problem was (is?) not stated accurately enough. Anyway, I do agree that "optimizer" improves _some_ expressions.
A few more comments: I agree that GiNaC is not a compiler, but a nice feature is that it is based on C++ and should therefore allow the combination of symbolic and numerical code.
Yes, it allows that. One can use any C/C++ (and with some labour F*****N) libraries _without_ dumping expressions into C/C++/whatever code.
Generating optimized C code is a feature I miss in GiNaC.
To some extent GiNaC was designed to _avoid_ code generation: "This border line between analytical and numerical methods, however, has quite often remained a painful obstacle in the successful application of computer algebra systems (CASs). Usually, some results are obtained with the help of a CAS and these results are integrated later into some other program. This is not only restricted to numerical results, where one frequently translates the output of some CAS to C or even FORTRAN. It is questionable whether the distinction between one language for writing down symbolical algebra, one for obtaining numerical results and maybe a third one for integrating everything in a graphical user interface has any reason other than an historical one." [cs.sc/0004015] However, GiNaC does support code generation, and even got that evil GiNaC::excompiler thing...
Finally, one should not aim at generating "the" optimal code: anything which performs better than the standard csrc output is already an improvement.
It performs better for the _particular kind of expressions you happen to work with_. I doubt it will improve things in the general case. That was the point of my previous post (mostly, I don't like the particular implementation you've suggested, but that's another story). That said, I've got a similar "optimizer" too. :) Basically, it collect()s certain types of (sub)expressions and runs normal() on the coefficients. (BTW your code would look simpler if you used collect().) It works very well for some kinds of expressions, but in the general case it just sucks horribly. Best regards, Alexei -- All science is either physics or stamp collecting.
Dear Stefan, Stefan Weinzierl schrieb:
I believe an efficient way to do this is to write the expression as C code into a file, compile and reload the compiled function into memory.
Sheplyakov Alexei schrieb:
However, GiNaC does support code generation, and even got that evil GiNaC::excompiler thing...
The evil thing Alexei mentioned is part of CVS HEAD. Maybe you can have a look at it (it is in the files excompiler.* mostly). Most parts of it are unscrupulously stolen from your code in gtybalt ... Maybe we can combine/improve that code with your patch. What do you think? Regards, Jens
Hi,

I have the impression that I hit on an internal religious war on the evilness of certain files. This was not my intention.

To avoid other misunderstandings, I did not propose to replace the standard print_csrc output by something else. This should remain as it is. But I would like to see that GiNaC offers the opportunity to generate C code in an additional, different way with temporary variables. (I carefully try to avoid the words "optimized code" here, but that's what I mean.) The decision which output format to use should be put into the hands of the user, he or she will know best what suits her needs.

I will have a look at the excompiler.cpp file and see how it fits together.

There are also several good points which Alexei mentioned: Whether expressions of type add and mul should be substituted or rather calculated on the fly is certainly worth a discussion. My experience would lead me towards substitution, but I admit that in different circumstances the conclusion could be the opposite. Maybe flags or a visitor object could fine-tune this behaviour on an individual basis. Collecting and running normal() on the coefficients is also something I do privately on expressions where I expect improvements. As normal() involves a gcd computation I did not include it in the code generation routine.

Best wishes,

Stefan

On Wed, 21 Mar 2007, Jens Vollinga wrote:
Dear Stefan,
Stefan Weinzierl schrieb:
I believe an efficient way to do this is to write the expression as C code into a file, compile and reload the compiled function into memory.
Sheplyakov Alexei schrieb:
However, GiNaC does support code generation, and even got that evil GiNaC::excompiler thing...
The evil thing Alexei mentioned is part of CVS HEAD. Maybe you can have a look at it (it is in the files excompiler.* mostly). Most parts of it are unscrupulously stolen from your code in gtybalt ... Maybe we can combine/improve that code with your patch. What do you think?
Regards, Jens
Hello! On Wed, Mar 21, 2007 at 08:55:07PM +0100, Stefan Weinzierl wrote:
To avoid other misunderstandings, I did not propose to replace standard print_csrc output by something else. This should remain as it is.
Fine. Yet another printing format would not hurt, I guess...
But I would like to see that GiNaC offers the opportunity that C code is generated in an additional different way with temporary variables.
...however it should be properly documented. "Produce optimized C code" sounds like "do everything very good", Simplify[], etc. The actual effect is mostly unpredictable, because there is no definition of simplicity. So, *please*, be accurate. E.g.

"const ex find_common_subex(const ex& e, exmap& repls)

Attempts to recursively find common subexpressions of e (in the sense of .is_equal()) and replace them with symbols. Replacement rules will be appended to repls. For instance,

ex e = log(sin(x+y)) + sin(x+y) + pow(sin(x+y), 2);
exmap repls;
e = find_common_subex(e, repls);
cout << e << endl;
// will print
// log(symbol10) + symbol10 + symbol10^2
for (exmap::const_iterator i = repls.begin(); i != repls.end(); ++i)
	cout << i->first << " = " << i->second << endl;
// will print
// symbol10 = sin(x+y)

Doing this kind of transformation can take a long time. Note that the matching is purely syntactic, so some kinds of common subexpressions are not detected, for example:

ex e = x+y+log(x+y);
exmap repls;
e = find_common_subex(e, repls);
cout << e << endl;
// expression will be returned unchanged, the output will be
// x+y+log(x+y)

This function can in some situations substantially improve C code generated by print_csrc."

BTW there are somewhat similar functions in GiNaC -- to_polynomial(), to_rational(). I think a function for finding common subexpressions is useful (not only for code generation).
(I carefully try to avoid the word "optimized code" here, but that's what I mean.)
Don't tell this to compiler folks :)
The decision which output format to use should be put into the hands of the user, he or she will know best what suits here needs.
To make the correct decision one should be properly informed. Unfortunately the documentation you've supplied is not good enough for this purpose.
I will have a look at the excompiler.cpp file and see how it fits together.
There are also several good points which Alexei mentioned:
Whether expressions of type add and mul should be substituted or rather calculated on the fly is certainly worth a discussion. My experience would lead me towards substitution, but I admit that in different circumstances the conclusion could be the opposite.
Getting rid of temporaries often gives a substantial performance boost, so there is a special kind of C++ (template) libraries which are designed exactly for this purpose -- expression template libraries (blitz++, boost::ublas, tvmet, etc.). It is very difficult to algorithmically decide which variant is better (making the decision might even take longer than the calculation itself). Actually, this is what optimizing compilers do, and that is why compilation takes a long time. BTW, sometimes turning optimization off (as in -O0) makes the calculation faster.
Maybe flags or a visitor object could fine-tune this behaviour on an individual basis.
E.g. instead of hard-coding MAX_NOPS make it an argument of the function (with some default value). While we are at it, the name is dangerous -- there are a lot of MAX_* macros in the kernel/glibc/whatever headers, so let's type it in all lowercase to be on the safe side.
Collecting and running normal() on the coefficients is also something I do privately on expressions where I expect improvements. As normal() involves a gcd computation I did not include it in the code generation routine.
Yes, gcd() is expensive. has() and other pattern matching functions are expensive too. There is no universal way... Best regards, Alexei -- All science is either physics or stamp collecting.
Hello,

thinking about it, we probably want the following: a data structure consisting of two lists,
  lhs = lst(t0,t1,...)
  rhs = lst(expr0,expr1,...)
with the understanding that this should represent something like
  ...
  t1 = expr1;
  t0 = expr0;

There is a consistency condition which needs to be fulfilled: The right-hand side of the (j+1)-th entry cannot refer to the symbols t0,...,tj, which would be defined later.

One can consider two such structures equivalent if the values of t0,...,tn would be the same after execution of the corresponding code fragments. In most cases we are just interested in n=0. This is actually an equivalence relation in the mathematical sense.

For this structure we can have methods which transform the structure into an equivalent one. These methods can add or remove entries from the two lists, or change the order. Methods which just transform the rhs are in some sense trivial and not considered here. Examples of such methods could be:
- replace common subexpressions of type T1, T2, ... in the rhs,
- split containers with more than n_max arguments,
- (eliminate redundant variables).

This structure then has a print method, which simply prints
  ...
  double t1 = expr1;
  double t0 = expr0;

Everybody happy with this concept?

Stefan
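A minimal sketch of such a structure (the names and the exact container choice are only an illustration of the proposal, not an agreed-upon interface) might look like this:

#include <cstddef>
#include <iostream>
#include <string>
#include <ginac/ginac.h>
using namespace GiNaC;

// Two parallel lists: lhs.op(i) is a temporary symbol, rhs.op(i) its defining
// expression.  Consistency condition: rhs.op(j) may only refer to lhs.op(k)
// with k > j, i.e. entries further down the lists are defined first.
struct assignment_seq {
	lst lhs;
	lst rhs;

	// print "double t1 = expr1;" lines, last entry first, so every
	// temporary is defined before it is used
	void print(std::ostream& os, const std::string& type = "double") const
	{
		for (std::size_t i = lhs.nops(); i-- > 0; ) {
			os << type << " " << lhs.op(i) << " = ";
			rhs.op(i).print(print_csrc_double(os));
			os << ";" << std::endl;
		}
	}
};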
Dear Stefan, On Thu, Mar 22, 2007 at 03:26:40PM +0100, Stefan Weinzierl wrote:
thinking about it, we probably want the following: a data structure consisting of two lists, lhs = lst(t0,t1,...) rhs = lst(expr0,expr1,...) with the understanding that this should represent something like ... t1 = expr1; t0 = expr0;
There is a type which represents such objects, it is called exmap (std::map<ex, ex, ex_is_less>). Is a new type really necessary?
There is a consistency condition which needs to be fulfilled: The right hand side of the (j+1)-th entry cannot refer to symbols t0,...,tj, which would be defined later.
One can consider two such structures equivalent, if the values of t0,...,tn would be the same after execution of the corresponding code fragments. In most cases we are just interested in n=0. This is actually an equivalence relation in the mathematical sense.
For this structure we can have methods, which transform the structure into an equivalent one. These methods can add or remove entries from the two lists, or change the order. Methods, which just transform the rhs are in some sense trivial and not considered here.
Examples of such methods could be: - replace common subexpressions of type T1, T2, ... in the rhs. - split containers with more than n_max arguments - (eliminate redundant variables)
This structure has then a print method, which simply prints ... double t1 = expr1; double t0 = expr0;
Everybody happy with this concept ?
No. There are 3 different _independent_ transformations:
1) finding common subexpressions,
2) splitting an expression into smaller parts,
3) generating code.
Each of them is useful on its own, so it would be nice to keep them separate. E.g.

/**
 * Attempt to find common subexpressions and replace them with symbols
 */
const ex find_common_subex(const ex& e, exmap& repls);

I've already described the suggested semantics in the previous mail.

/**
 * Split large sums and products in @e with a number of operands bigger
 * than @threshold, replace the obtained subexpressions with symbols
 */
const ex split_large_subex(const ex& e, exmap& repls, const size_t threshold);

Example:

ex e = x1+...+x1000;
exmap repls;
e = split_large_subex(e, repls, 500);
cout << e << endl;
// Should print, for instance:
// symbol100 + symbol101
for (exmap::const_iterator i=repls.begin(); i!=repls.end(); ++i)
	cout << i->first << " => " << i->second << endl;
// should be something like
// symbol100 => x1+...+x500
// symbol101 => x501+...+x1000

void generate_code(const ex& e, const exmap& repls, std::ostream& os);

Example:

generate_code(e, repls, std::cout);

should print

double f(const double x[])
{
	double symbol100 = x[0]+...+x[499];
	double symbol101 = x[500]+...+x[999];
	double ret = symbol100 + symbol101;
	return ret;
}

N.B. I don't insist on this particular kind of interface, I just give examples to clarify what I mean. You are welcome to suggest something better.

Best regards, Alexei -- All science is either physics or stamp collecting.
Dear Alexei, On Mon, 26 Mar 2007, Sheplyakov Alexei wrote:
There is a type which represents such objects, it is called exmap (std::map<ex, ex, ex_is_less>). Is a new type really necessary?
exmap is hard-wired to ex_is_less and uses the internal order of GiNaC. For substitutions which are supposed to be printed out in the end, one has to ensure that they occur in the right order, e.g. no symbol is used on a right-hand side before it has been defined on a left-hand side.
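One way to keep that ordering is a plain sequence of (symbol, expression) pairs instead of a map; the following is only a sketch of that idea (find_common_subex is the name Alexei proposed, the sequence-based signature is merely a possible variant):

#include <iostream>
#include <vector>
#include <utility>
#include <ginac/ginac.h>
using namespace GiNaC;

// Replacement list kept in definition order: repls[k].second may refer to
// repls[j].first only for j < k, so printing front to back is always valid.
typedef std::vector<std::pair<ex, ex> > replacement_seq;

// possible order-preserving variant of the proposed find_common_subex()
// (declaration only -- the implementation is what this thread is about)
const ex find_common_subex(const ex& e, replacement_seq& repls);

// printing the assignments then needs no extra sorting
void print_assignments(std::ostream& os, const replacement_seq& repls)
{
	for (replacement_seq::const_iterator i = repls.begin(); i != repls.end(); ++i) {
		os << "double " << i->first << " = ";
		i->second.print(print_csrc_double(os));
		os << ";" << std::endl;
	}
}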
Examples of such methods could be: - replace common subexpressions of type T1, T2, ... in the rhs. - split containers with more than n_max arguments - (eliminate redundant variables)
This structure has then a print method, which simply prints ... double t1 = expr1; double t0 = expr0;
Everybody happy with this concept ?
No. There are 3 different _independent_ transformations: 1) finding common subexpressions, 2) splitting an expression into smaller parts, 3) generating code. Each of them is useful on its own, so it would be nice to keep them separate.
That's what I wrote above: One has three different methods. Stefan
Dear Stefan, Stefan Weinzierl wrote:
thinking about it, we probably want the following: a data structure consisting of two lists, lhs = lst(t0,t1,...) rhs = lst(expr0,expr1,...) with the understanding that this should represent something like ... t1 = expr1; t0 = expr0;
There is a consistency condition which needs to be fulfilled: The right hand side of the (j+1)-th entry cannot refer to symbols t0,...,tj, which would be defined later.
One can consider two such structures equivalent, if the values of t0,...,tn would be the same after execution of the corresponding code fragments. In most cases we are just interested in n=0. This is actually an equivalence relation in the mathematical sense.
For this structure we can have methods, which transform the structure into an equivalent one. These methods can add or remove entries from the two lists, or change the order. Methods, which just transform the rhs are in some sense trivial and not considered here.
Examples of such methods could be: - replace common subexpressions of type T1, T2, ... in the rhs. - split containers with more than n_max arguments - (eliminate redundant variables)
This structure has then a print method, which simply prints ... double t1 = expr1; double t0 = expr0;
Everybody happy with this concept ?
It seems to make a lot of sense but I am not entirely sure if I understand the proposal correctly. Could you, please, add a small mock-up snippet of user code where at least two nontrivial expressions are "optimized" in this sense in order to make things clear? TIA -richy. -- Richard B. Kreckel <http://www.ginac.de/~kreckel/>
Dear Stefan,
thinking about it, we probably want the following: a data structure consisting of two lists, lhs = lst(t0,t1,...) rhs = lst(expr0,expr1,...) with the understanding that this should represent something like ... t1 = expr1; t0 = expr0;
[...]
Everybody happy with this concept ?
I think that would be rather useful. Best wishes, Chris
Hi,

after the discussions of the previous weeks comes now a new version for the code generation. There is now a new class "assign_lst", which has different methods for different things:

replace_common_subex(flag): replaces common subexpressions. The flag can be used to select any combination from the set {add,mul,power,function} for which one wants this substitution.

split_large_subex(nmax): splits subexpressions of type add or mul with more than nmax operands.

get_temporary_variables(): returns the list of all new temporary variables.

print(ostream) and operator<<: print it out.

I attach an example program to show how it is used. The example program first prints for each example the standard csrc output and the modified (or "optimized") version. Examples 5 and 6 show some limitations. When running this program one gets:

Example 1: replace common subexpressions
double res = log(sin(y+x))+sin(y+x)+pow(sin(y+x),2.0)+log(y+x);

double tmp1 = y+x;
double tmp0 = sin(tmp1);
double res = log(tmp0)+(tmp0*tmp0)+log(tmp1)+tmp0;

Example 2: as above, but don't replace add and mul
double res = log(sin(y+x))+sin(y+x)+pow(sin(y+x),2.0)+log(y+x);

double tmp0 = sin(y+x);
double res = (tmp0*tmp0)+tmp0+log(tmp0)+log(y+x);

Example 3: works also on several assignments
double y1 = x2*sin(phi)+cos(phi)*x1;
double y2 = cos(phi)*x2-x1*sin(phi);

double tmp1 = sin(phi);
double tmp0 = cos(phi);
double y1 = x1*tmp0+tmp1*x2;
double y2 = -x1*tmp1+tmp0*x2;

Example 4: split subexpressions with more than 6 operands
double res = x1+x0+y6*y1*y0*y5*y4*y9*y8*y3*y2*y7+x5+x4+x9+x8+x3+x2+x7+x6;

double tmp1 = x8+x3+x2+x7+x6;
double tmp3 = y8*y3*y2*y7;
double tmp2 = y6*y1*y0*y5*y4*y9;
double tmp0 = x1+x0+x5+x4+x9+tmp3*tmp2;
double res = tmp1+tmp0;

List of temporary variables: {tmp0,tmp1,tmp2,tmp3}

Example 5: pow(x+y,2) is not a subexpr of pow(x+y,4)
double res = 2.0*pow(y+x,2.0)+log(pow(y+x,4.0))+3.0*pow(y+x,3.0)+5.0*pow(y+x,5.0)+4.0*pow(y+x,4.0)+log(pow(y+x,2.0));

double tmp2 = y+x;
double tmp1 = ((tmp2*tmp2)*(tmp2*tmp2));
double tmp0 = (tmp2*tmp2);
double res = log(tmp0)+log(tmp1)+5.0*(tmp2*(tmp2*tmp2)*(tmp2*tmp2))+3.0*(tmp2*tmp2*tmp2)+4.0*tmp1+2.0*tmp0;

Example 6: Substitution works only syntactically
double res = y+x+cos(y+x)+sin(y+x);

double tmp0 = y+x;
double res = sin(tmp0)+y+x+cos(tmp0);

Best wishes,

Stefan
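A usage sketch of the class as described above might look as follows; only the method names come from the description, while the constructor arguments and the flag value are assumptions made for illustration (the attached example program is the authoritative reference):

#include <iostream>
#include <ginac/ginac.h>
using namespace GiNaC;

int main()
{
	symbol x("x"), y("y"), res("res");
	ex rhs = log(sin(x+y)) + sin(x+y) + pow(sin(x+y), 2) + log(x+y);

	// assumed constructor: target type, base name for temporaries,
	// left-hand sides and right-hand sides of the assignments
	assign_lst a("double", "tmp", lst(res), lst(rhs));

	// placeholder flag value: the real constants select any combination
	// of {add,mul,power,function}; they are not reproduced here
	unsigned subex_types = 0;
	a.replace_common_subex(subex_types);
	a.split_large_subex(6);   // split add/mul subexpressions with more than 6 operands

	std::cout << a;                                         // the "double tmp0 = ...;" statements
	std::cout << a.get_temporary_variables() << std::endl;  // e.g. {tmp0,tmp1}
	return 0;
}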
Dear Stefan, On Tue, Apr 03, 2007 at 10:09:10AM +0200, Stefan Weinzierl wrote:
after the discussions of the previous weeks comes now a new version for the code generation. There is now a new class "assign_lst", which has different methods for different things:
I don't think it is a good idea. Each of these methods is useful on its own...
replace_common_subex(flag): replaces common subexpressions. The flag can be used to select any combination from the set {add,mul,power,function} for which one wants this substitution.
... this one in particular is generic enough and similar to the already existing collect_common_factors(), to_polynomial(), to_rational(). Please make it a separate function. Pretty *please*.
split_large_subex(nmax) splits subexpressions of type add or mul with more than nmax operands.
The same here.
get_temporary_variables() returns the list of all new temporary variables
print(ostream) and operator<<: Print it out.
I attach an example program to show how it is used. The example program first prints for each example the standard csrc output and the modified (or "optimized") version. Examples 5 and 6 show some limitations. When running this program one gets:
Example 1: replace common subexpressions
double res = log(sin(y+x))+sin(y+x)+pow(sin(y+x),2.0)+log(y+x);

double tmp1 = y+x;
double tmp0 = sin(tmp1);
double res = log(tmp0)+(tmp0*tmp0)+log(tmp1)+tmp0;
First of all, I'd prefer this to do the following:

ex e = log(sin(y+x))+sin(y+x)+pow(sin(y+x),2)+log(y+x);
exmap repls;
e = find_common_subex(e, repls);
// e:
// log(symbol0) + pow(symbol0, 2) + log(symbol1) + symbol0
// repls: symbol1 => y+x
//        symbol0 => sin(symbol1)

Such a transformation is useful even if the user is not going to generate any code at all. Secondly, even for code generation this format is not flexible enough, since it can produce only C/C++ code. In many languages variables have to be declared in advance. That's why I insist on separate functions for each task (finding common subexpressions, splitting large subexpressions, printing the actual code).
Example 2: as above, but don't replace add and mul
double res = log(sin(y+x))+sin(y+x)+pow(sin(y+x),2.0)+log(y+x);

double tmp0 = sin(y+x);
double res = (tmp0*tmp0)+tmp0+log(tmp0)+log(y+x);

Example 3: works also on several assignments
double y1 = x2*sin(phi)+cos(phi)*x1;
double y2 = cos(phi)*x2-x1*sin(phi);

double tmp1 = sin(phi);
double tmp0 = cos(phi);
double y1 = x1*tmp0+tmp1*x2;
double y2 = -x1*tmp1+tmp0*x2;

Example 4: split subexpressions with more than 6 operands
double res = x1+x0+y6*y1*y0*y5*y4*y9*y8*y3*y2*y7+x5+x4+x9+x8+x3+x2+x7+x6;

double tmp1 = x8+x3+x2+x7+x6;
double tmp3 = y8*y3*y2*y7;
double tmp2 = y6*y1*y0*y5*y4*y9;
double tmp0 = x1+x0+x5+x4+x9+tmp3*tmp2;
double res = tmp1+tmp0;
It would be much better in this form:

ex e = x1+x0+y6*y1*y0*y5*y4*y9*y8*y3*y2*y7+x5+x4+x9+x8+x3+x2+x7+x6;
size_t max_size = 6;
exmap repls;
e = split_large_subex(e, repls, max_size);
// e: symbol1 + symbol0
// repls:
// tmp0 => x1+x0+x5+x4+x9+tmp3*tmp2
// tmp1 => x8+x3+x2+x7+x6
// tmp2 => y6*y1*y0*y5*y4*y9
// tmp3 => y8*y3*y2*y7

No need for get_temporary_variables() -- just iterate over repls. But one needs to sort the entries so variables get defined before they are used.
Example 5: pow(x+y,2) is not a subexpr of pow(x+y,4)
double res = 2.0*pow(y+x,2.0)+log(pow(y+x,4.0))+3.0*pow(y+x,3.0)+5.0*pow(y+x,5.0)+4.0*pow(y+x,4.0)+log(pow(y+x,2.0));

double tmp2 = y+x;
double tmp1 = ((tmp2*tmp2)*(tmp2*tmp2));
double tmp0 = (tmp2*tmp2);
double res = log(tmp0)+log(tmp1)+5.0*(tmp2*(tmp2*tmp2)*(tmp2*tmp2))+3.0*(tmp2*tmp2*tmp2)+4.0*tmp1+2.0*tmp0;
So powers are always printed as products? I don't think that is nice; it would be better to allow user-defined formats, for instance, pow(tmp2, 5), or tmp2**5, or ...
Example 6: Substitution works only syntactically
double res = y+x+cos(y+x)+sin(y+x);

double tmp0 = y+x;
double res = sin(tmp0)+y+x+cos(tmp0);
[snipped the code] Best regards, Alexei -- All science is either physics or stamp collecting.
Dear Alexei, On Wed, 4 Apr 2007, Sheplyakov Alexei wrote:
First of all, I'd prefer this to do the following:
ex e = log(sin(y+x))+sin(y+x)+pow(sin(y+x),2)+log(y+x);
exmap repls;
e = find_common_subex(e, repls);
// e:
// log(symbol0) + pow(symbol0, 2) + log(symbol1) + symbol0
// repls: symbol1 => y+x
//        symbol0 => sin(symbol1)
Such transformation is useful even if user is not going to generate any code at all.
In case that I repeat myself: I don't think that an exmap is appropriate here, as it loses the information in which order the substitutions should occur. This information is available when parsing the expression tree; it gets lost when you squeeze it into an exmap and has to be recovered with an overhead of computational cost when one wants to print out the assignments in the right order.

Secondly, there is a more subtle issue: The content of the exmap will depend on implementation details of subs(). In the example above, should repls contain
  symbol1 => y+x
  symbol0 => sin(symbol1)
or
  symbol1 => y+x
  symbol0 => sin(y+x)
The answer depends on whether subs() visits the children of each node in post- or preorder. (It visits the children first.) If for some reason this is changed in the future, it will break the code for find_common_subex(). I don't want to introduce such subtle dependencies.

Best regards,

Stefan