This is a new type of article for this site; I hope to write more like it, each describing a programming problem and the practices used to solve it.
Contents
- Unlimited Resources Myth
- Basic Data Structure
- Improved Data Structure
- Optimized Data Structure
- Data Packing Algorithm
- Faster Memory Allocation
- Sporadic Nature of Data Insertion
- Example Program
- Conclusion
Unlimited Resources Myth
The overhead of various frameworks and libraries sometimes (or should I say always) makes memory usage and CPU utilization less efficient. Fast hardware and loads of RAM lead to sloppy coding practices too, luring developers into the mirage of endless resources.
Sure, most developers place a check now and then (e.g. making sure malloc does not return NULL, although most of the time this will not happen). But what happens when memory is fragmented? Large data structures make fragmentation even worse, and allocating more memory becomes slower and slower.
And yes, eventually the available RAM will shrink.
Basic Data Structure
Consider this data structure:
typedef struct {
    int id;               //MAX id <= 10000
    int array1[10000];    //MAX element value <= 3
    int array2[10000];    //MAX element value <= 3
} my_struct_type;

my_struct_type my_array[10000];
On a typical 64-bit machine, where int is 4 bytes, the structure takes 4 (id) + 4 x 10000 x 2 (array1, array2) = 80004 bytes. An array of 10000 such structures occupies about 800MB. And this is a not-so-high-level language; just imagine the size of the equivalent structure in Java or C#.
Improved Data Structure
Let's take a closer look at the struct. Notice that the arrays hold very small values, so it is natural to switch to the smallest datatype that can hold the array elements:
typedef struct {
    int id;                   //MAX id <= 10000
    uint8_t array1[10000];    //MAX element value <= 3
    uint8_t array2[10000];    //MAX element value <= 3
} my_struct_type;
Now the size of the structure is 4 (id) + 1 x 10000 x 2 (array1, array2) = 20004 bytes, and the total size is about 200MB. That is much more manageable.
But wait! What happens if we need more structures, e.g. 100K? That puts us at 2GB: oops, too much again (to be honest, 200MB is too much anyway).
Most developers (I hope) know that a byte consists of 8 bits. From the data structure comments we can see that an array element needs no more than 2 bits. So why not use bitfields instead of a full uint8_t? Well, it's better not to: bitfields are compiler dependent and not that efficient (more on that later).
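For comparison, a bitfield version might look like the sketch below. It is hypothetical (the names are mine): how the fields are packed is implementation-defined, and uint8_t bitfields are themselves only conditionally supported by the C standard, which is exactly the portability problem described above.

```c
#include <stdint.h>

/* Hypothetical bitfield version: the compiler decides how (and whether)
   adjacent fields are packed, so sizeof() and layout vary between
   compilers. This sketch compiles on GCC/Clang but is not portable C. */
typedef struct {
    int id;
    struct {
        uint8_t a : 2;  /* array1 value, 0..3 */
        uint8_t b : 2;  /* array2 value, 0..3 */
    } pair[10000];
} my_bitfield_struct;
```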
Theoretically we can reduce the struct's footprint to a quarter of its size, and it is possible using bitmasks and bitwise operators. Let's do it step by step.
Optimized Data Structure
Start with the observation that array1 and array2 have the same size, so we can pack each element of array2 into the unused bits of the corresponding element of array1:
typedef struct {
    int id;                   //MAX id <= 10000
    uint8_t array12[10000];   //two arrays combined
} my_struct_type;
So now the array element in its binary form looks like array12[0] = 0b0000BBAA, where AA is the array1 element and BB is the array2 element. The size is halved, but there is still room to spare. Why not store one more element of array1 and one more of array2 in the same element of array12, like this: array12[0] = 0bB1A1B0A0?
This way our struct becomes:
typedef struct {
    int id;                  //MAX id <= 10000
    uint8_t array12[5000];   //four values in one element
} my_struct_type;
Now the memory footprint is roughly 50MB. And this is not the only benefit of the new data structure, but before discussing that, let's consider the possible overhead of bit manipulation.
Data Packing Algorithm
To store the value val in the array element el we need to perform the following operations:
1. Put value in correct binary position:
val = val << val_pos;
2. Depending on the index parity, shift the value into the upper or lower nibble:
val = val << (index % 2 ? 4 : 0);
The modulus can be replaced with a bitwise index & 1, and that bit shifted left by 2 (i.e. multiplied by 4) to get rid of the conditional jump:
val = val << ((index & 1 ) << 2);
3. Insert the value into the element:
el |= val;
That is 5 bitwise operations in total. Bitwise operators are among the simplest and least expensive for the CPU, so this takes about the same time as (or only slightly more than) storing a value into a plain array.
Faster Memory Allocation
As I mentioned earlier, the reduced memory footprint is not the only good thing about the new data structure. For the computer's memory management engine it is much easier to allocate lots of small memory chunks than lots of large memory chunks. By reducing the structure's memory size we increase the processing speed (possibly compensating for the additional CPU cycles taken by the bitwise operators).
Sporadic Nature of Data Insertion
Another thing worth noting is that new elements are not added all at once but more or less sporadically. The creation of a new element will be unnoticeable from the CPU's point of view, overshadowed by more CPU-intensive operations. As for the occupied memory, there is no way around it: at some stage the program will have to struggle with subsequent memory allocations due to RAM fragmentation (if using dynamic memory allocation, of course).
Example Program
A working example can be found here: https://github.com/droukin-jobs/packer - a proof of concept showing that bitwise data placement is not much slower than the traditional approach and is much more memory efficient.
Conclusion
Even with "unlimited" memory and CPU resources it is still important to keep track of your resource usage. How many times have you had to sit and wait while some program loads a new screen or processes a new request? (A good example is SolidWorks: the behemoth is slow even on the fastest systems, despite being a well-known and respected brand.) When programming, it is good to create robust and easy-to-understand code, but after all the testing is done, why not optimize some obvious parts, if you know how?