Allocating a huge boolean array fails

  • Thread starter jk22
  • Start date
  • Tags
    Array
In summary, you are trying to allocate a bool[3^27] array of bits in a Linux system with 1,2TB swap partition but the system says Segmentation Fault. How could you do this ?
  • #1
jk22
729
24
Hello,

I'm trying to allocate a bool[3^27] array of bits in a Linux system with 1,2TB swap partition but the system says Segmentation Fault. How could I do this ? Thanx.
 
Technology news on Phys.org
  • #2
An array needs to be allocated as a contiguous area of memory/disk.
You may have enough memory/disk available in total, but not a contiguous region of that size
 
  • Like
Likes phinds
  • #3
However we do it, it should never result in a segmentation violation. If it can't be allocated, we'll get an error condition instead.
How are you allocating your array exactly?
And when do you get the segmentation fault exactly?
 
  • #4
I think you're asking too much of your computer to allocate such a big array. I would instead look to see how it could be partitioned and perhaps stored in a database or a random access file.
 
  • #5
I have written

Long long int size=pow3 (27);

Printf ("allocating");
Bool f [size];
Printf ("..ok");

Pow3(n); is a subroutine returning a long long int equals to 3^n.

It never arrives to print ok.

Would bool *f=new bool [size]; be more flexible even if the space is not contiguous ?
 
  • #6
What is the problem you are trying to solve that you need such a large array?
 
  • #7
jk22 said:
I have written

Long long int size=pow3 (27);

Printf ("allocating");
Bool f [size];
Printf ("..ok");

Pow3(n); is a subroutine returning a long long int equals to 3^n.

It never arrives to print ok.

Would bool *f=new bool [size]; be more flexible even if the space is not contiguous ?
That declaration puts the array in stack space, which is limited, causing a stack overflow. Instead I suggest to use indeed 'new', which puts it in dynamic memory. Then it will either succeed or it will throw a bad_alloc exception.
 
  • Like
Likes taverner and rbelli1
  • #8
You left some important context out of your question, particularly what programming language you are trying to use.

However if you are trying to allocate large arrays in C or C++ then the following issues are relevant:
  • The size of the bool datatype depends on the compiler but it is most likely one byte, not one bit, so when you try to create an array of ##3^{27}## booleans you are probably asking to create an array nearly 7 terabytes in size rather than the ~900 gigabytes you might have been expecting. If you think storing one boolean per byte is wasteful and want to store them as bits then this requires some additional work (e.g. look up how to use "bit fields").
  • Like others have pointed out, arrays are stored as a contiguous block in memory. If you have e.g. 1000 megabytes of free memory then this is no guarantee that you will be able to allocate a 1000 megabyte array: it could be that there is a block of 400 megabytes free somewhere and another block of 600 megabytes free somewhere else. So if you need close to the total amount of memory you have available you may need to use some other data structure that doesn't store all data in one contiguous block.
  • When you use the bool array[SIZE] syntax inside a function it allocates the array on the stack. This is an area of memory reserved for local function variables. The amount of stack space is usually very limited (e.g. a few megabytes, though the compiler or operating system may offer a way to change this). There are some ways to allocate an array in heap memory which don't have this limitation:
    • Make the array a global variable (i.e., move the declaration outside the body of any function).
    • Declare the array static.
    • Use malloc() or similar or (in C++) the new keyword to allocate memory.
    These won't help with the limitations above, though.

A couple of additional points:
  • When you create an array using TYPE array[SIZE], the size should be a compile-time constant (a literal number or constant expression like 1024*1024 or a #defined constant or const variable). If you want to create an array whose size will be computed at runtime (like by your pow3() function) then you should use malloc() or new in C++.

    C (but not C++) supports variable-length arrays (arrays whose sizes don't need to be known until runtime) since the C99 standard and some C++ compilers (like the GNU compiler) support them as a nonstandard extension. But they're still stack-allocated (with the limitations that follow) and they're risky to use (and nonportable in C++) because there is no way to test if the allocation was successful. (malloc(), by contrast, returns a null pointer if it fails to allocate memory.)
  • jk22 said:
    Printf ("allocating");
    Bool f [size];
    Printf ("..ok");

    [...]It never arrives to print ok.
    In your case probably the array allocation failed, but in general not seeing it print "..ok" doesn't necessarily mean anything. The standard output stream is buffered in C and C++. This means calling printf() won't necessarily print anything immediately but instead save what you want to print into a buffer and wait until there is a certain amount of text in the buffer before actually printing it. You should call fflush(stdout) if you want to force printing to be done immediately, or print to standard error, which is unbuffered, instead of standard output. (Also, what are "Printf" and "Bool"? C and C++ like most languages are case sensitive. I am assuming you meant "printf" and "bool".)
 
Last edited:
  • Like
Likes 3301, QuantumQuest and Vanadium 50
  • #9
jk22 said:
Hello,
I'm trying to allocate a bool[3^27] array of bits in a Linux system with 1,2TB swap partition but the system says Segmentation Fault. How could I do this ? Thanx.
Part of the skill involved in programming is to develop algorithms that fit within the resources available to you. Even it you managed to allocate such an array, the first thing you would probably want to do is clear it to all false values. That would take tens of minute if not hours.
I suggest you find a new attack.
 
  • #10
Also: swap (virtual memory) is really slow. It's hundreds or thousands of times slower than main memory. So even if you can get your code working be prepared for your computer to grind to a slow crawl, including any other applications you are running as the memory they're using is swapped out.
 
  • #11
If I read this thread correctly, and I assume that a boolean data object is 8 bits for your compiler:
327 = 7 625 597 484 987 or on the order of 1013 - bytes - somewhat less than ten trillion bytes. Allowing for the 1024/kB conversion factor you are still in the 8 TB range. You do not have 8TB of virtual memory.

[aside]This also looks like you are trying something with Graham's number.[/aside]

You have choices, assuming this is a rational thing to try in the first place:
Find a bignum library for your compiler and use it, there are several free libraries. Your compiler will thank you.
Find an alternate data representation - maybe use a bit array which gets you down to circa 2TB of virtual memory. Still a lot of disk.
 
  • #12
Thanks for your replies. I assumed a bool was 1 bit indeed hence 7/8 Tbytes.

In fact I fell upon this problem by wanting to check if any function f(x,y,z) can be decomposed in k(g(x,y),h(x,z)),l(y,z))

This decomposition seems okay, it's like all the projections on 2D planes of a higher dimensional space.*

I took x,y,z in 0,1 it's ok, but 0,1,2 is already too huge. In fact the number of functions f to check to have been obtained has complexity v^(m^d), where f(x1,...xd) is in {1,..v} and xi in {1,..m}. Hence this grows much faster than exponential or even factorial.

*Then I also tried for higher number of variables in 0,1 but it seems that dimension 7 is the highest dimension we could represent in this way (which is out of the scope of any computer we could imagine) But I think there are infinite many possibilities to decompose in 2 variables functions. (I somebody had references to this problem, thanks)
 
  • #13
jk22 said:
In fact I fell upon this problem by wanting to check if any function f(x,y,z) can be decomposed in k(g(x,y),h(x,z)),l(y,z))

Is there some constraint on these functions? E.g., why don't you consider $$\begin{eqnarray}
k(x, z, y) &=& f(x, y, z) \,, \nonumber \\
g(x, y) &=& x \,, \nonumber \\
h(x, z) &=& z \,, \nonumber \\
l(y, z) &=& y
\end{eqnarray}$$ a solution to this problem?
 
  • #14
wle said:
If you want to create an array whose size will be computed at runtime (like by your pow3() function) then you should use malloc() or new in C++.
Or use the standard C++ 'vector' data type, which allocates memory as necessary, e.g. 'std::vector<bool> f(size)'.
 
  • #15
wle said:
Is there some constraint on these functions? E.g., why don't you consider $$\begin{eqnarray}
wle said:
k(x, z, y) &=& f(x, y, z) \,, \nonumber \\
g(x, y) &=& x \,, \nonumber \\
h(x, z) &=& z \,, \nonumber \\
l(y, z) &=& y
\end{eqnarray}$$ a solution to this problem?

K shall be a function of 2 variables
I wrote it wrong its m (k ( g (x,y),h (x,y)),l (x,y)). Your solution is basically writing f (x,y,z)=k (g (x,y),z) which is not always possible since in this way you can generate 88 over 256 functions on 0,1
Yes thanks jtbell i tried new but i have to decompose [3^27]=[3^9]^3 which means a slight redecoding in base 3 easy but it takes a time for such a huge number of calls
 
Last edited:
  • #16
jtbell said:
Or use the standard C++ 'vector' data type, which allocates memory as necessary, e.g. 'std::vector<bool> f(size)'.
Fun fact, an std::vector<bool> is actually a specialization that is implemented as an array of bits.
As such it's not a 'true' STL container.
See for instance http://en.cppreference.com/w/cpp/container/vector_bool
 
  • #17
Whatever you are trying to do, setting up 327 bools is almost certainly not what you want. That's saving the result of 7 trillion calculations. If each calculation takes a microsecond, that's still 3 months.
 
  • Like
Likes sysprog
  • #18
I changed to smaller : 2^27 but even there it bugged. in fact the bug came from the usage of bool. I did with char and recomputed to compute the bits and it worked fine.
 
  • #19
Messages like #18 make me sad.

Message #8 was a great message in explaining what the problem is and how to fix it. But your are sticking to your guns and trying the same thing that didn't work last time.
 
  • Like
Likes ChrisVer
  • #20
wle said:
arrays are stored as a contiguous block in memory.
Only the virtual address space for an array is contiguous, the physical allocation can made of up of scattered chunks of 4096 byte physical memory pages.

wle said:
C (but not C++) supports variable-length arrays since the C99 standard and some C++ compilers support them as a nonstandard extension.
Microsoft and Visual Studio compilers do not support variable length arrays. They've somewhat kept up with C++ standards, but not C standards.
 
  • #21
You need to figure out what parts of the memory you need at any given time and only load that. You're trying to just load the whole thing into virtual memory and let the OS figure out what you need in actual RAM. Any software that has to open more than a gig or two of RAM, manages a lot of its own swapping (when an app does it it's usually called caching.) What on Earth are you trying to calculate with all of that data? My guess is that if you require that much memory, you need to figure out how to use parallelization and send it off to a server farm.

Other than things like physics simulations, the only thing I can think of that actually requires terabytes of information is raytracing for things like movies. Neither can be done or a regular person's computers. Simple things like that are done on supercomputers and things that take even more horsepower are farmed to companies like Amazon.
 
  • Like
Likes ChrisVer
  • #22
In fact having a mastercode that will overflow your memory consumption is not only discouraged practice but also wrong (it means that you are not working efficiently).
The only way to go about it is either revisit the way you approach your problem and try to be smart, or figure out a way to use cloud or cluster computing to split your (impossible) master-job into smaller mini-jobs that run on different machines in parallel.
 
  • #24
erratum : I'm trying to write $$f(x,y,z)$$ in the form $$m(g(x,z),h(y,z))$$ more precisely to find the ratio of such generated function over the whole in the case $$m:\{0,1,2\}^2 \rightarrow \{0,1\}$$ and $$h,g : \{0,1,2\}^2\rightarrow\{0,1,2\}$$
 
Last edited:
  • #25
jk22 said:
erratum : I'm trying to write $$f(x,y,z)$$ in the form $$m(g(x,z),h(y,z))$$ more precisely to find the ratio of such generated function over the whole in the case $$m:\{0,1,2\}^2 \rightarrow \{0,1\}$$ and $$h,g : \{0,1,2\}^2\rightarrow\{0,1,2\}$$
This is very different from what you first asked, about an array that contained 327 bits.

Your notation suggests to me that h and g each have one of 9 pairs of input values -- (0, 0), (0, 1), (0, 2), (1, 0), ..., (2, 2), and produce 0, 1, or 2 as the output. m has the same 9 pairs as input values, and produces 0 or 1 as its output. Maybe you could use small lookup tables or small arrays?
 
  • #26
If we count there are ##2^9## different m and ##3^9## different g and h.

Hence the loop is over ##2^9 3^{18}## different function composition.

Somehow i have to check if the obtained function f is different from the others hence an array of ##2^{27}## bits or 16 MB which is manageable.

In fact its the loops that use much time, like a week.

I wonder if its possible to count the functions without exhaustive search on a computer.
 
Last edited by a moderator:
  • #27
jk22 said:
If we count there are ##2^9## different m and ##3^9## different g and h.

Hence the loop is over ##2^9 3^{18}## different function composition.
Wouldn't that be ##2^9 3^9 = 6^9## different possibilities? That's just over 10,000,000 different possibilities.
 
  • #28
I think it's ##3^9## for g and ##3^9## for h then we have to multiply all.
 
Last edited by a moderator:
  • #29
jk22 said:
I think it's ##3^9## for g and ##3^9## for h then we have to multiply all.
You're right. I used m and either g or h instead of g and h. Anyway, that's just ##3^9 \cdot 3^9 = 9^9## or 387,420,489 different input pairs for m. Still a lot less than your original ##3^{27}##.

Since m produces either 0 or 1, I'm not sure you need to double the result above.

LaTeX tip: For plain old numbers, like ##3^9##, use a pair of ## characters rather than a pair of $$. The former are for inline LaTeX and the later for standalone LaTeX. There's no good reason that I can see for ##3^9##, for example, to be formatted on its own line with extra lines before and after.
 
  • #30
jk22 said:
erratum : I'm trying to write $$f(x,y,z)$$ in the form $$m(g(x,z),h(y,z))$$ more precisely to find the ratio of such generated function over the whole in the case $$m:\{0,1,2\}^2 \rightarrow \{0,1\}$$ and $$h,g : \{0,1,2\}^2\rightarrow\{0,1,2\}$$

Have you considered starting with a smaller problem?
Say with ##m:\{0,1\}^2 \rightarrow \{0,1\}## and ##h,g : \{0,1\}^2\rightarrow\{0,1\}##.
It means we have ##2^{2^3}=256## options for f and ##2^{2^2}=16## options for both g and h.
From there we can construct m and validate it, so that with brute force we need to verify ##256\cdot 16 \cdot 16 \cdot 8 = 2^{19} = 524288## values.
Still large but much more feasible.

Btw, I have the sneaking suspicion that it's not possible for all functions f, but I can't prove it yet.
It seems to me that since we reduce the domain of f from ##\{0,1,2\}^3## to the domain of m, which is ##\{0,1,2\}^2##, we cannot distinguish all possibilities.
The same should apply for the smaller domains.
 
  • Like
Likes jk22
  • #31
In fact I already checked for binary domains. It can generate 3/4 of the number of f's.(f image domain is always considered with 2 values only)

But my suspiscion was that since the number of possible compositions over number of f's is ##2^{n^2}n^{2n^2}/2^{n^3}## there is a potential anomaly at values of n being 3,4,5. But I just checked for 3 and there is no surprise.
 
  • Like
Likes I like Serena
  • #32
jk22 said:
I wonder if its possible to count the functions without exhaustive search on a computer.

You could potentially reduce the number of possibilities you need to search through. For instance, if you can write ##f(x, y, z) = m \bigl( g(x, z), h(y, z) \bigr)## then you can express the same function ##f## as ##f(x, y, z) = m' \bigl( g'(x, z), h'(y, z) \bigr)## where $$\begin{eqnarray}
m'(x, y) &=& m \bigl( \pi^{-1}(x), \pi'^{-1}(y) \bigr) \,, \\
g'(x, z) &=& \pi \bigl( g(x, z) \bigr) \,, \\
h'(y, z) &=& \pi' \bigl( h(y, z) \bigr)
\end{eqnarray}$$ and ##\pi## and ##\pi'## are arbitrary permutations ##\{0, 1, 2\} \to \{0, 1, 2\}##. So if you can avoid counting possibilities that are related to each other by permutations like this then this reduces the number of possibilities you need to search through by a factor of around ##(3!)^{2} = 36##. (It won't be exactly 36 because there are some functions that are the same under certain permutations. For instance, if ##g(x, z) = 0## for all ##x, z## then a permutation ##\pi## that swaps 1 and 2 doesn't change anything.) One way to do this is to impose constraints on the form of ##g## and ##h##. For instance, write the different evaluations of ##g## in a certain order, e.g., $$\begin{equation}
g(0, 0),\, g(0, 1),\, g(0, 2),\, g(1, 0),\, g(1, 1),\, g(1, 2),\, g(2, 0),\, g(2, 1),\, g(2, 2) \,.
\end{equation}$$ You could exploit the symmetry under permutations to impose that ##g(0, 0) = 0## and that ##g(x, z) = 1## for the first ##g(x, z)## appearing in the list for which ##g(x, z) \neq 0##, if there is one. I.e., consider only functions ##g## for which the list above starts with one or more 0s, then a 1, then any numbers in ##\{0, 1, 2\}## (and do the same for ##h##).
 
  • Like
Likes sysprog, I like Serena and jk22
  • #33
wle said:
this reduces the number of possibilities you need to search through by a factor of around ##(3!)^{2} = 36##.
Building on this approach, I think we can do a little better by setting f(0,0,0)=g(0,0)=h(0,0)=0, which is good for a reduction of ##3^3=27##.
And additionally only set f(x,y,z)=1, respectively g(x,z)=1, and h(y,z)=1 for the first applicable value that is ##\ne 0##. That should be good for another reduction by a factor of ##2^3=8## for a total of ##216##.
It won't affect the logic, and it won't affect the proportions.

Furthermore, getting back to the original reason for this thread, we don't need to allocate all that much memory do we?
We only need to iterate over all possibilities for f, g, and h, and we only need to keep one of each in memory at a time.
It remains a problem that the number of evaluations to make is so big
 
  • Like
Likes sysprog
  • #34
Yes if ##n=4## it is already out of reach but I think this ratio is decreasing as n increases but I have no proof. In fact it needs an exhaustive search only for ## 3,4,5##

What I noticed is for ##n=2## we can decompose any ## f ## with 5 functions in the form ##g(h(k(x,z),l(y,z)),m(x,z))##

But I don't know if it is always possible to decompose in functions of only ##(x,z)## and ##(y,z)##, which is a locality condition, for higher n.
 
Last edited:
  • #35
An addition: I don't know about ratios/counting, but if the question is whether any ##f## can be written ##f(x, y, z) = m \bigl( g(x, z), h(y, z) \bigr)## as specified above then I think it's not too difficult to engineer examples where this will not work.

Here's an example. Suppose we have a function which, for ##z = 0##, gives $$\begin{equation}
f(x, y, 0) = \begin{cases}
0 &\text{if } x = y \\
1 & \text{if } x \neq y
\end{cases} \,.
\end{equation}$$ Looking at some of the values you can notice that $$\begin{eqnarray}
f(0, 0, 0) \neq f(1, 0, 0) &\Rightarrow& m \bigl( g(0, 0), h(0, 0) \bigr) \neq m \bigl( g(1, 0), h(0, 0) \bigr) \,, \\
f(0, 0, 0) \neq f(2, 0, 0) &\Rightarrow& m \bigl( g(0, 0), h(0, 0) \bigr) \neq m \bigl( g(2, 0), h(0, 0) \bigr) \,, \\
f(1, 1, 0) \neq f(2, 1, 0) &\Rightarrow& m \bigl( g(1, 0), h(1, 0) \bigr) \neq m \bigl( g(2, 0), h(1, 0) \bigr) \,.
\end{eqnarray}$$ These in turn are only possible if (respectively) $$\begin{eqnarray}
g(0, 0) &\neq& g(1, 0) \,, \\
g(0, 0) &\neq& g(2, 0) \,, \\
g(1, 0) &\neq& g(2, 0) \,,
\end{eqnarray}$$ i.e., ##g(0, 0)##, ##g(1, 0)##, and ##g(2, 0)## must all be different from each other. Using the symmetry under permutations that I mentioned in my previous post, we can set ##g(x, 0) = x## without loss of generality. Similarly, we can also choose ##h(y, 0) = y##. This means that, for ##z = 0##, the decomposition ##f(x, y, z) = m \bigl( g(x, z), h(y, z) \bigr)## simplifies to ##f(x, y, 0) = m(x, y)##. In other words, without loss of generality, we can take ##m## to be the function $$\begin{equation}
m(x, y) = \begin{cases}
0 &\text{if } x = y \\
1 & \text{if } x \neq y
\end{cases} \,,
\end{equation}$$ and, for ##z \neq 0##, we can take ##f## to have the form $$\begin{equation}
f(x, y, z) = \begin{cases}
0 &\text{if } g(x, z) = h(y, z) \\
1 & \text{if } g(x, z) \neq h(y, z) \end{cases} \,.
\end{equation}$$ Now suppose we want ##f## to have the following values for ##x, y \in \{0, 1\}## and ##z = 1##: $$\begin{eqnarray}
f(0, 0, 1) &=& 0 \,, \\
f(0, 1, 1) &=& 0 \,, \\
f(1, 0, 1) &=& 0 \,, \\
f(1, 1, 1) &=& 1 \,.
\end{eqnarray}$$ This doesn't work. The first three conditions would require ##g(0, 1) = h(0, 1)##, ##g(0, 1) = h(1, 1)##, and ##g(1, 1) = h(0, 1)##, which together imply ##g(1, 1) = h(1, 1)##, but the last value would require ##g(1, 1) \neq h(1, 1)##.
 
  • Like
Likes sysprog

Related to Allocating a huge boolean array fails

What is a boolean array and how is it used in scientific research?

A boolean array is a data structure used in computer science and scientific research to store and manipulate data that can only have two possible values: true or false. It is commonly used in algorithms and data analysis to make logical decisions based on the values stored in the array.

What does it mean when allocating a boolean array fails?

Allocating a boolean array means reserving a block of memory to store the array. When this process fails, it means that there is not enough available memory to store the array, usually due to the size of the array being too large or the system being low on memory.

What are the potential causes of a failed boolean array allocation?

The most common cause of a failed boolean array allocation is insufficient memory. This can occur if the array is too large for the available memory or if the system is already using a significant amount of memory. Other potential causes include coding errors or conflicts with other programs running on the system.

How can I troubleshoot and fix a failed boolean array allocation?

If you encounter a failed boolean array allocation, you can try troubleshooting by checking the size of the array and the available memory on the system. If the array is too large, you can try breaking it into smaller arrays or using a different data structure. If the system is low on memory, you can try closing other programs or increasing the available memory. If these solutions do not work, you may need to consult with a computer expert for further assistance.

Are there any precautions I can take to avoid a failed boolean array allocation?

To avoid a failed boolean array allocation, it is important to carefully plan the size of the array and the available memory on the system. You can also use error handling techniques in your code to catch and handle any potential allocation errors. Additionally, regularly monitoring and managing the memory usage on your system can help prevent allocation failures.

Similar threads

  • Programming and Computer Science
Replies
2
Views
1K
  • Programming and Computer Science
Replies
4
Views
537
  • Programming and Computer Science
Replies
8
Views
3K
  • Programming and Computer Science
Replies
15
Views
3K
  • Programming and Computer Science
Replies
20
Views
3K
  • Programming and Computer Science
Replies
4
Views
2K
  • Programming and Computer Science
Replies
2
Views
3K
  • Programming and Computer Science
Replies
5
Views
12K
  • Programming and Computer Science
Replies
33
Views
4K
  • Aerospace Engineering
Replies
1
Views
761
Back
Top