On 8/16/06, Richard B. Kreckel <kreckel@ginac.de> wrote:
Thanks for your patch suggestion. I haven't looked closely at it and won't find the time during the next few weeks (traveling). The macros are just one minor issue. Some quick thoughts that come to mind are: Is it really necessary to add the byte-endianness of the sequence to the signature to I_to_BS? Couldn't we leave the string reversal to the user? It only appears to swap bytes within a digit: What's the point of that?
If an integer is written out as a sequence of digits, the ordering of the digits, need not agree with the ordering of the bytes inside each digit. That's why I added code to make sure that the output was a sequence of *bytes* ordered in a particular way. In principle, I think it is best to have several options available for exporting to a byte stream. My main concern is efficient conversion between native representations of big integer representations of different libraries. Naively, this is an O(N^2) problem, where N is the number of libraries, since one would have to have a conversion routine for each pair of libraries. However, this complexity can be avoided if some common storage format is chosen. Using a byte stream is probably one of the best choices. I ran some tests and noticed that using a byte stream as an intermediate format can be 5-10 times faster than converting through a base-16 representation or even 15-20 times faster than another method using exported shift arithmetic operations (as currently used by the clnum[1] python module). Once a common format is chosen, then complexity of the conversion problem goes down to O(N), since each library now needs only to convert to and from the common standard. My main concerns right now are Python and possibly GMP. Here are their capabilities in this respect: Taken from /usr/include/python2.3/longobject.h: /* _PyLong_FromByteArray: View the n unsigned bytes as a binary integer in base 256, and return a Python long with the same numeric value. If n is 0, the integer is 0. Else: If little_endian is 1/true, bytes[n-1] is the MSB and bytes[0] the LSB; else (little_endian is 0/false) bytes[0] is the MSB and bytes[n-1] the LSB. If is_signed is 0/false, view the bytes as a non-negative integer. If is_signed is 1/true, view the bytes as a 2's-complement integer, non-negative if bit 0x80 of the MSB is clear, negative if set. Error returns: + Return NULL with the appropriate exception set if there's not enough memory to create the Python long. */ PyAPI_FUNC(PyObject *) _PyLong_FromByteArray( const unsigned char* bytes, size_t n, int little_endian, int is_signed); /* _PyLong_AsByteArray: Convert the least-significant 8*n bits of long v to a base-256 integer, stored in array bytes. Normally return 0, return -1 on error. If little_endian is 1/true, store the MSB at bytes[n-1] and the LSB at bytes[0]; else (little_endian is 0/false) store the MSB at bytes[0] and the LSB at bytes[n-1]. If is_signed is 0/false, it's an error if v < 0; else (v >= 0) n bytes are filled and there's nothing special about bit 0x80 of the MSB. If is_signed is 1/true, bytes is filled with the 2's-complement representation of v's value. Bit 0x80 of the MSB is the sign bit. Error returns (-1): + is_signed is 0 and v < 0. TypeError is set in this case, and bytes isn't altered. + n isn't big enough to hold the full mathematical value of v. For example, if is_signed is 0 and there are more digits in the v than fit in n; or if is_signed is 1, v < 0, and n is just 1 bit shy of being large enough to hold a sign bit. OverflowError is set in this case, but bytes holds the least-signficant n bytes of the true value. */ PyAPI_FUNC(int) _PyLong_AsByteArray(PyLongObject* v, unsigned char* bytes, size_t n, int little_endian, int is_signed); GMP is significantly more versatile, it does not require the byte order to be homogeneous. That is, the stream could be made up of words, whose internal byte order could differ from the order in which the words themselves are placed. Taken from the GMP info documentation: -- Function: void mpz_import (mpz_t ROP, size_t COUNT, int ORDER, int SIZE, int ENDIAN, size_t NAILS, const void *OP) Set ROP from an array of word data at OP. The parameters specify the format of the data. COUNT many words are read, each SIZE bytes. ORDER can be 1 for most significant word first or -1 for least significant first. Within each word ENDIAN can be 1 for most significant byte first, -1 for least significant first, or 0 for the native endianness of the host CPU. The most significant NAILS bits of each word are skipped, this can be 0 to use the full words. There is no sign taken from the data, ROP will simply be a positive integer. An application can handle any sign itself, and apply it for instance with `mpz_neg'. -- Function: void * mpz_export (void *ROP, size_t *COUNTP, int ORDER, int SIZE, int ENDIAN, size_t NAILS, mpz_t OP) Fill ROP with word data from OP. The parameters specify the format of the data produced. Each word will be SIZE bytes and ORDER can be 1 for most significant word first or -1 for least significant first. Within each word ENDIAN can be 1 for most significant byte first, -1 for least significant first, or 0 for the native endianness of the host CPU. The most significant NAILS bits of each word are unused and set to zero, this can be 0 to produce full words. The number of words produced is written to `*COUNTP', or COUNTP can be `NULL' to discard the count. ROP must have enough space for the data, or if ROP is `NULL' then a result array of the necessary size is allocated using the current GMP allocation function (Note: Custom Allocation). In either case the return value is the destination used, either ROP or the allocated block. If OP is non-zero then the most significant word produced will be non-zero. If OP is zero then the count returned will be zero and nothing written to ROP. If ROP is `NULL' in this case, no block is allocated, just `NULL' is returned. The sign of OP is ignored, just the absolute value is exported. An application can use `mpz_sgn' to get the sign and handle it as desired. (Note: Integer Comparisons)
Also, couldn't the existing read_integer function be enhanced to do what you want?
That sounds like a good place for an exported interface to this functionality. Do you think adding new flag to cl_read_flags is warranted? [1] http://calcrpnpy.sourceforge.net/clnum.html Igor P.S.: I've noticed my previous messages were delayed for moderator approval before being posted to the list. Does that happen to every mail sent to the list or is there anything wrong with how I'm posting?