Re: CLN/ginac: factoring announce, questions

17 Jun 2004

      Dear Ralf,

On Wed, 16 Jun 2004, Ralf Stephan wrote:
...
the following is concerned with the numtheory part of CLN. I have
appended a patch to numtheory/cl_IF.h file and three new files for
the same directory that would provide a factoring engine. The code
is a much improved version of what I sent to Richy Kreckel a few days
ago, and it explains itself. I have serious design questions on
the interface, though.
Shortly summed, you can factor any integer now within a minute whose
second largest prime factor is less than ~10^11. I am not an expert,
Oh, c'mon!
...
and I will not even think about implementing one of the more exotic
algorithms (squfof, elliptic curves, all kind of sieves). I did this
because I need ginac with a divisors() function.
The people familiar with ginac/CLN design are asked to please answer
QUESTION 1.
The function cl_factor_rec() is the recursive main engine and will be
called by factor() that is the interface to it. What should factor()
return to the outside? We have
- two lists (prime factors, exponents)
- a list of pairs (p1,e1) (p2,e2) ... pari does return data like this
- a simple list (p1, e1, p2, e2, p3, ...)
QUESTION 2.
How should the pairs/lists be implemented? We have
- vectors provided by CLN (GV/SV where's the difference?)
- STL containers
An alternative would be to implement factor() outside CLN, ie, within
ginac using lst, but I think there is interest having it in CLN. I just
ask because I do not think the STL should be duplicated by GV/SV which,
supposedly, was started when gcc did not have a working STL. I am no
expert there, either, but I think CLN's memory scheme can be adapted to
STL by simply(?) using a tailored allocator object.
QUESTION 3.
Should divisors() go into CLN or ginac?
Many thanks for your time reading through to here. Please help---I am
just starting C++ again after 8 years of hiatus, and design thinking
is a bit unfamiliar at this time. Anyway, here is the beef:
Thank you very much for your keen resubmission.  I was still pondering
over your first patch which did have some, err, rough edges -- like that
concat() function poking on the heap.  There are still issues that we need
to iron out, like const correctness and how to setup tests, but that can
wait.

I generally have a quite restrictive attitude towards extensions to CLN,
admitting only state-of-the-art implementations.  But in the case of
integer factorization I do agree that code that finds all factors of order
10^10 is a very good start and should be included in CLN proper.  (Yes,
that answers your third question.)  It may be helful for many and it may
be improved in the future if people feel inclined.  However, it is
important to get the interface right, as you have noticed already.

So, to question 1, what should factor() return?  I'ld hate to see two
lists.  (I would rather have two vectors, since these are somewhat more
rigidly coupled by nature.)  I also dislike the alternating list of
factors and exponents.  It bears semantics invisible to the
language/compiler.  The list of pairs of factors and exponents is the most
reasonable return value.  And, I think, the most common.  Not only Pari
uses this notation, I think NTL does as well (for polynomial factors, that
is).

Question 2 gives me headaches.  You've already noticed that CLN does not
use std::list<>.  I guess this is because the univariate polynomial code
was written at a time when it was en vogue to write one's own
vector/list/map/...  Since a list of integer factors does not necessarily
need to have anything to do with the coefficients of a univariate
polynomial, I tend to suggest not to focus on what is already present it
CLN.  It is more important to focus on readabiliby and to keep
extensibility in mind and this leads to STL containers.

Now I do have two questions: What do you mean by adapting CLN's memory
scheme to STL?  The user-visible side of the bridge constitute perfectly
copyable objects with value semantics and are hence suited for STL, right?
Just have a look at the Bernoulli number cache in GiNaC.  Second, is
std::list< factor_exponent_pair_or_whatever > really a wise decision?
Why not use a std::vector<> instead?  After all it doesn't make too much
sense removing/inserting elements therein at random locations.  Methinks
NTL returns vectors as well, but I need to check again.

Best wishes
        -richy.

PS: Besides throwing away lots of emails because I can't wade through all
    my fu^Hine SPAM I sometimes don't have enough time to answer email
    instantly.  This is especilly true if the email is quite non-trivial.
    Well, sorry for the late answer.
-- 
Richard B. Kreckel
<http://www.ginac.de/~kreckel/>