[GiNaC-devel] GiNaC is not a C compiler [Was: Optimized C-style output]

21 Mar 2007

      Hello,

On Tue, Mar 20, 2007 at 01:40:22PM +0100, Stefan Weinzierl wrote:
...
recently I was generating C-style output from GiNaC expressions and ran
into the following problems:
- the compiled code was inefficient, as common subexpressions are
  calculated over and over again,
Fetching something from memory is quite an expensive operation on modern
(and not so modern) hardware, so calculating simple subexpressions over
and over again *can* be much faster. For instance, consider common 
subexpressions of the form (x+y), (x+2*y), ... (x+1000000*y). Evaluation
of such an subexpression takes just few cycles, so it would be really
inefficient (in terms of both calculation time and memory) to use 10^6
variables to store their values. 

On the other hand, there are definitely a lot of cases when it is more
efficient to calculate the subexpression(s) only once. Deciding if
subexpression(s) are complicated or not is a very complicated problem.
GiNaC does not even try to solve it (GiNaC is Not A C compiler).
...
- the compiler didn't really appreciate single lines with about one
  million or more characters.
If the generated code size is _that_ big, direct calculation with .evalf()
might be much faster. In fact, I often do the inverse thing: use GiNaC for
numeric evaluation of lenghthy (>= 100Mb) expressions. No C/C++/whatever
compiler is able to process them...
...
On the mailing list I found that the issue of long lines was already
discussed in July 2003, but it seems that this discussion did not lead to
additional functionality of GiNaC.
To overcome these problems, I wrote a routine which generates optimized
C code. This routine
- substitutes every subexpression, which occurs more than once with a
  temporary variable.
First of all, scanning for common subexpressions is _very_ expansive, as
any pattern matching is. Secondly, generated code is not really optimal
(as I explained before).
...
- splits long expressions into several lines.
IMHO this is good idea. However I don't like your implementation at
all. (That said, I used to pipe GiNaC output through `fold -w 78')
...
As this could be of use to other people as well, it could be included into
GiNaC. The source files are attached to this mail.
An example for this routine would be the following code fragment
symbol x("x"), y("y");
  ex f = pow(sin(x+y),2)+sin(x+y)+log(sin(x+y));
  optimized_C_code(std::cout, std::string("double"), std::string("tmp"), f);
which prints
double tmp1 = sin(y+x);
 double tmp0 = tmp1+(tmp1*tmp1)+log(tmp1);
to standard output.
Could you please give less trivial examples?

I don't like the idea at all (that is just IMHO). But the implementation
itself is far from perfect.
...
/** @file code_generation.h
 *
 * Interface to code_generation
 *
 */
#ifndef __GINAC_CODE_GENERATION_H__
#define __GINAC_CODE_GENERATION_H__
#include <iostream>
#include <string>
#include "ex.h"
namespace GiNaC {
It would be nice to have several functions for each task -- e.g. one
which transforms the expression to the "optimal" form, one which
converts it into exmap (e.g. splits it), one which prints it...
And the function for "optimizing" should probably take a hint from 
user (he might knew about some sort of subexpressions in advance).
E.g.

const ex heuristic_collect(const ex& e, exmap& subexprs);
...
void optimized_C_code(std::ostream & os, const std::string & T, const std::string & temp_var_base_name, ex expr);
const ex& expr?
...
} // namespace GiNaC
#endif // ndef __GINAC_CODE_GENERATION_H__
Content-Description: cpp file
...
/** @file code_generation.cc
 *
 * Implementation of code_generation 
 *
 */
#include <cstddef>
#include <vector>
#include <sstream>
#include "ginac.h"
#include "code_generation.h"
namespace GiNaC {
// anonymous namespace for helper functions
namespace {
/**
   *
   * Helper function, which converts an integer into a string
   *
   */
  std::string itos(int arg)
  {
    std::ostringstream buffer;
    buffer << arg; 
    return buffer.str(); 
  }
/**
   *
   * Checks if an expression can potentially be substituted.
   *
   * At present, expressions which are considered for substitution are of the 
   * following types:
   * add, mul, power or function.
   *
   */
  bool check_substitution_type(ex expr)
^^
You'd better use const ex& here
...
{
    if ( is_a<add>(expr) 
   || is_a<mul>(expr) 
   || is_a<power>(expr) 
   || is_a<function>(expr) 
   ) return true;
return false;
  }
/**
   *
   * This function writes a C statement of the form
   * T base_namexxx = expr
   * to the stream.
   *
   * If the expression contains a subexpression with a large number of operands, the output is split 
   * into smaller subexpressions.
The number of operands is not really good as code complexity metrics.
...
*
   */
  void write_line(std::ostream & os, const std::string & T, const std::string & base_name, 
        std::vector<symbol> & tv, int tv_i, ex expr)
I think this should be splitted into two separate functions -- one
which transforms expression and one which prints it.
...
{
    size_t MAX_NOPS = 1024;
bool flag_long_expression = false;
// -------------------------------------------
    // check for long add and mul
    lst subexpr_lst;
    for (const_preorder_iterator it = expr.preorder_begin(); it != expr.preorder_end(); it++)
      {
  if ( is_a<add>(*it) || is_a<mul>(*it) ) 
    {
      if ( (*it).nops() > MAX_NOPS )
        {
      subexpr_lst.append(*it);
      flag_long_expression = true;
        }
    }
      }
// -------------------------------------------
    if ( flag_long_expression )
      {
  exmap subs_map;
for ( lst::const_reverse_iterator k = subexpr_lst.rbegin(); k != subexpr_lst.rend(); k++ )
    {
      // two new symbols
      int i1 = tv.size();
        ^^^^^^
This ought to be std::size_t ...
...
tv.push_back( symbol( base_name+itos(i1) ) );
      int i2 = tv.size();
^^^^^^
and this one too.
...
tv.push_back( symbol( base_name+itos(i2) ) );
ex expr_1, expr_2;
if ( is_a<add>(*k) )
        {
      expr_1 = 0;
      const_iterator j_it = k->begin();
      for (size_t j=0; j<MAX_NOPS; j++)
        {
          expr_1 += *j_it;
          j_it++;
        }
      expr_2 = (*k) - expr_1;
subs_map[*k] = tv[i1]+tv[i2];
        }
      else if ( is_a<mul>(*k) )
        {
      expr_1 = 1;
      const_iterator j_it = k->begin();
      for (size_t j=0; j<MAX_NOPS; j++)
        {
          expr_1 *= *j_it;
          j_it++;
        }
      expr_2 = (*k) / expr_1;
subs_map[*k] = tv[i1]*tv[i2];
        }
// print expr_2 and expr_1
      write_line(os, T, base_name, tv, i2, expr_2);
      write_line(os, T, base_name, tv, i1, expr_1);
} // lst::const_iterator k
expr = expr.subs( subs_map, subs_options::no_pattern );
} // flag_long_expression
// -------------------------------------------
    // print one line
os << " " << T << " " << tv[tv_i] << " = ";
if ( T == std::string("double") )
      {
  expr.print(print_csrc_double(os));
      }
    else if ( T == std::string("float") )
      {
  expr.print(print_csrc_float(os));
      }
    else if ( T == std::string("cl_N") )
      {
  expr.print(print_csrc_cl_N(os));
      }
    else // default
      {
  expr.print(print_csrc(os));
      }
os << ";" << std::endl;
  }
/**
   *
   * This function substitutes all subexpressions which occur more than once in the
   * expression tree.
   *
   * The function proceeds iteratively and substitutes first subexpressions, which are
   * "closer" to the root of the expression tree.
   * If these subexpressions in turn contain subsubexpressions, which occur multiple times,
   * the latter are substituted in subsequent iterations.
   *
   */
I doubt this will result in better code.

And I think this function should be splitted into two separate functions
-- one which transforms expression and one which prints it.
...
void substitute_subexpressions(std::ostream & os, const std::string & T, const std::string & base_name, ex expr)
  {
    bool flag_try_substitution = true;
    std::vector<symbol> tv;
lst expr_lst;
    ex ex_expr_lst;
    lst replies;
// initialize
    expr_lst.append(expr);
    ex_expr_lst = expr_lst;
    tv.push_back( symbol( base_name+itos(tv.size()) ) );
... so you don't need to loop here, you can call (recursively)
this function.
...
while ( flag_try_substitution )
      {
  // expand tree into a long list
  lst subexpr_lst;
You'd better use exset (e.g., std::set<ex, ex_is_less>)...
...
for (const_preorder_iterator it = ex_expr_lst.preorder_begin(); it != ex_expr_lst.preorder_end(); it++)
    {
      if ( check_substitution_type(*it) ) subexpr_lst.append(*it);
    }
  subexpr_lst.sort();
// find dublicate entries
  replies.remove_all();
... so you won't get any duplicates at all.
...
ex subexpr = subexpr_lst.op(0);
  int counter = 1;
lst::const_iterator it_start = subexpr_lst.begin();
  it_start++;
  for ( lst::const_iterator it = it_start; it != subexpr_lst.end(); it++ )
    {
      if ( subexpr.is_equal(*it) ) 
        {
      counter++;
if ( counter == 2 )
        {
          bool flag_subs = true;
size_t k = 0;
          for ( lst::const_iterator k_it = replies.begin(); k_it != replies.end(); k_it++ )
            {
      	// subexpression of something already in replies
      	if ( (*k_it).has(subexpr) ) 
      	  {
      	    flag_subs = false;
      	  }
      	// replies contains a subsubexpression of subexpr
      	// note that subexpr can be the mother of more than one child
      	else if ( subexpr.has(*k_it) )
      	  {
      	    replies.let_op(k) = subexpr;
      	    flag_subs = false;
      	  }
// update k
      	k++;
            } // const_iterator k_it
// new substitution
          if ( flag_subs )
            {
      	replies.append(subexpr);
            }
        } // counter == 2
} // equal
      else // not equal
        {
      // new subexpression
      subexpr = *it;
      counter = 1;
        } // subexpr not equal
    } // lst::const_iterator it
// sort and remove double entries
  replies.sort();
  replies.unique();
// substitute
  if ( replies.nops() > 0 )
    {
      exmap subs_map;
for ( lst::const_iterator k = replies.begin(); k != replies.end(); k++ )
        {
      int i = tv.size();
      tv.push_back( symbol( base_name+itos(i) ) );
      subs_map[*k] = tv[i];
        }
      ex_expr_lst = ex_expr_lst.subs( subs_map, subs_options::no_pattern );
expr_lst = ex_to<lst>(ex_expr_lst);
      for ( lst::const_iterator k = replies.begin(); k != replies.end(); k++ )
        {
      expr_lst.append(*k);
        }
      ex_expr_lst = expr_lst;
    }
  else
    {
      flag_try_substitution = false;
    }
} // flag_try_substitution
// -----------------------------------------------------------
    // output
int tv_i = tv.size();
    for ( lst::const_reverse_iterator k = expr_lst.rbegin(); k != expr_lst.rend(); k++ )
      {
  tv_i--;
  write_line(os, T, base_name, tv, tv_i, *k);
      }
  }
} // end of anonymous namespace
/**
   *
   * A routine which generates optimized C code.
   *
   * expr is the expression which should be translated to C code.
   * os is the stream to which the output is written.
   * T is a string giving the data type in C of the expression, usually something like "double" or "float".
   * Intermediate variables obtain their names from base_name, to which a serial number is attached, e.g.
   * if base_name=string("t"), the intermediate variables are labelled t0, t1, t2, etc..
   * The routine generates a C code fragment, such that after execution of this code, t0 contains the value
   * of the original expression expr.
   *
   * The code is generated in two steps:
   * - First all sub-expressions, which occur more than once are replaced by intermediate variables.
No, it does not... Example: x+y+cos(x+y) does not get modified at all.
...
* - In a second step, expressions with a large number of operands are split into smaller parts.
   *   This step does not lead to optimizations, but allows large expressions to be processed by the 
   *   compiler.
I don't think 1K is a "large number". And evalf() is much faster for
numeric evaluation of large expressions anyway...
...
The second part is done by the function "write_line".
   *
   */
  void optimized_C_code(std::ostream & os, const std::string & T, const std::string & base_name, ex expr)
  {
    substitute_subexpressions(os, T, base_name, expr);
  }
} // namespace GiNaC
Best regards,
 Alexei

-- 
All science is either physics or stamp collecting.

[GiNaC-devel] GiNaC is not a C compiler [Was: Optimized C-style output]

varg＠theor.jinr.ru