Hello, On Wed, Oct 20, 2010 at 8:47 PM, Cristobal Navarro <axischire@gmail.com> wrote:
yes, it adds some memoery usage, but doing the math i found that it isnt too much, for the 8000x8000 problem, each string should be in the worst case length 21 (for example string s = "Ld0x1d2x3d4x5d6x7d8x9" ), assuming std::string is 1 byte per length unit: stringMemory <= 21 bytes * 64M <= 1344Mbytes <= 1.3GB
I'm afraid your calculation is wrong. Let's forget about GiNaC for a moment and estimate the memory usage of map<string, map<string, long> >. std::map is a red-black tree. Every node has 3 pointers (2 children and parent), color (int), and the actual data (key and value). So, every node consumes 3*sizeof(void *) + sizeof(int) + padding + sizeof(string) + sizeof(long) + strlen(TypicalString) = 4*sizeof(void *) + sizeof(int) + padding + sizeof(long) + strlen(TypicalString) (std::string is essentialy a char* as far as memory usage is concerned). We got N trees each of N nodes, thus we need (4*sizeof(void *) + sizeof(int) + padding + sizeof(long) + strlen(TypicalString))*N^2 bytes. On x86_64 this will be (48 + strlen(TypicalString))*N^2. If typical string is 16 characters long, this equals 64*N^2 # N kb 1000 62500 2000 250000 3000 562500 4000 1000000 To check this theory consider the following program (it doesn't use GiNaC): // mapmap.cpp #include <map> #include <string> #include <sstream> #include <iostream> #include <cstdlib> #include <unistd.h> #include <sys/resource.h> using namespace std; int main(int argc, char** argv) { cout << "sizeof(string) " << sizeof(string) << endl; const int dflt_size = 800; int size = dflt_size; if (argc >= 2) size = atoi(argv[1]); if (size <= 10) size = dflt_size; map<string, map<string, long> > big_map; for (int i = 0; i < size; ++i) { map<string, long> m; for (int j = 0; j < size; ++j) { ostringstream oss; oss << "Row " << i << " Col" << j; // ~ 16 chars m[oss.str()] = i + j; } ostringstream oss; oss << "Row" << i; big_map[oss.str()] = m; } struct rusage rus; getrusage(RUSAGE_SELF, &rus); cout << size << "\t" << rus.ru_maxrss << std::endl << std::flush; return 0; } $ g++ -O2 -Wall -pipe -march=native -o mapmap mapmap.cpp $ for i in 1000 2000 3000 4000; do ./mapmap $i; done On my system (Athlon 64 X2 running Debian GNU/Linux) it prints # N kb 1000 112096 2000 450804 3000 1019464 4000 1833620 Experimental values are ~ 2x bigger than ones predicted by theory. Perhaps I've missed a pointer to vtbl somewhere. Anyway, it's clear that memory usage is O(N^2), and N = 8000 would consume ~ 2^2 * 2Gb = 8Gb (I'm unable to check this experimentally -- my computer has 2Gb of RAM). Conclusion: the binary tree is not suitable large amounts of data. Perhaps you need to use a different data structure. Also, use ex instead of ex* (ex is actually a wrapped pointer). Hope this helps, Alexei