Is there a C++ preprocessor directive to repeat something a number of times?
I am doing some loop unrolling and I would like to do something like this (making up my own preprocessor command :P)
#repeat 16
x += p[a++];
#endrepeat
-
Re: using the preprocessor to do something N times
Sun, July 30, 2006 - 12:51 PM
Not that I know of. You could write an inline function or macro to simplify the code, but you'll still have to call the function or macro 16 times.
I wonder how much of a performance increase loop unrolling of this type would yield, since (I believe) nearly all modern CPU architectures have good branch prediction and multiple execution pipelines for simple loops like this.
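For what it's worth, there is no built-in #repeat directive, but a fixed repeat count can be approximated with nested macros that double the statement at each level. The sketch below is only an illustration: the REPEAT_* macro names and the sum16 function are made up for this example and are not part of any standard header. (Boost.Preprocessor offers a more general BOOST_PP_REPEAT macro, which is not shown here.)

```cpp
#include <cstddef>

// Hypothetical helper macros (names invented for this sketch).
// Each level pastes the statement twice as many times as the previous one.
#define REPEAT_2(stmt)  stmt stmt
#define REPEAT_4(stmt)  REPEAT_2(stmt) REPEAT_2(stmt)
#define REPEAT_8(stmt)  REPEAT_4(stmt) REPEAT_4(stmt)
#define REPEAT_16(stmt) REPEAT_8(stmt) REPEAT_8(stmt)

// Sums the first 16 elements of p; the statement is pasted 16 times.
// Assumes p points at 16 or more readable ints.
inline int sum16(const int* p)
{
    int x = 0;
    std::size_t a = 0;
    REPEAT_16(x += p[a++];)
    return x;
}
```

Note that the statement passed to the macro must not contain a comma outside parentheses, or the preprocessor will treat it as several arguments.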
-
Re: using the preprocessor to do something N times
Sun, July 30, 2006 - 1:26 PM
You might be surprised. Unrolling the following naive loop:
x = 0;
for (int a = 0; a < READ_UNROLL_BLOCKSIZE; a++)
{
    x += p[a];
}
To the following:
int a;
x = 0;
// we will use 16-fold unroll
int blockLimit = READ_UNROLL_BLOCKSIZE & ~15;
for (a = 0; a < blockLimit; a += 16)
{
    x += p[a + 0];
    x += p[a + 1];
    x += p[a + 2];
    x += p[a + 3];
    x += p[a + 4];
    x += p[a + 5];
    x += p[a + 6];
    x += p[a + 7];
    x += p[a + 8];
    x += p[a + 9];
    x += p[a + 10];
    x += p[a + 11];
    x += p[a + 12];
    x += p[a + 13];
    x += p[a + 14];
    x += p[a + 15];
}
// process the remaining reads
for (a = blockLimit; a < READ_UNROLL_BLOCKSIZE;)
{
    x += p[a++];
}
results in a significant performance increase. If we run the routine several million times to get a decent sample, we see numbers like this:
Running : Loop Unroll Read
Running unoptimized version
Elapsed time [6.33] seconds
Running optimized version
Elapsed time [3.17] seconds
During the unrolled loop we do 16 read operations before we hit the conditional. This allows the processor to do a lot more pipelining. Unrolling more than 16-fold appears to yield negligible further improvement, however.
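The benchmark harness itself is not shown in this post, so the following is only a rough sketch of how timings like these might be collected. The names time_seconds and sum_naive are hypothetical, and the results are written to a volatile sink on the assumption that this discourages the compiler from optimizing the repeated calls away. Actual numbers will depend heavily on compiler, optimization flags, and hardware.

```cpp
#include <chrono>
#include <cstddef>

// Naive reference loop: sums the first n elements of p.
inline long long sum_naive(const int* p, std::size_t n)
{
    long long x = 0;
    for (std::size_t a = 0; a < n; ++a)
        x += p[a];
    return x;
}

// Hypothetical helper: calls fn(p, n) `iterations` times and returns the
// elapsed wall-clock time in seconds. Each result is added to `sink` so the
// calls have an observable effect.
template <typename Fn>
double time_seconds(Fn fn, const int* p, std::size_t n,
                    long iterations, volatile long long& sink)
{
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        sink = sink + fn(p, n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

An unrolled version with the same signature as sum_naive can be passed to time_seconds in the same way, so both variants are measured under identical conditions.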
-
Re: using the preprocessor to do something N times
Sun, July 30, 2006 - 1:35 PM
Btw, I was thinking of using a macro like:
#define BODY x += p[a++]
then in the loop body doing:
BODY;
BODY;
BODY;
etc...
It turns out that this kills the pipelining, since the processor must increment a before it can issue the next read to the memory controller. The best speed I get comes from hardcoding the offsets, as in the code in my previous post.
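One way to keep the macro while still using fixed offsets is to give it the offset as a parameter. This is a minimal sketch, not the code that was measured above: it uses a 4-fold unroll for brevity (16-fold follows the same pattern), and the function name sum_blocks_of_4 is made up for the example. Whether it performs differently from the a++ form will depend on the compiler and optimization level.

```cpp
#include <cstddef>

// Hypothetical parameterized macro: reads at a fixed offset from the block
// start `a`, so no read depends on an increment made by the previous one.
#define BODY(i) x += p[a + (i)]

// Sums the first n elements of p using a 4-fold unrolled main loop.
inline int sum_blocks_of_4(const int* p, std::size_t n)
{
    int x = 0;
    std::size_t a = 0;
    // Round n down to a multiple of 4 for the unrolled part.
    const std::size_t blockLimit = n & ~static_cast<std::size_t>(3);
    for (; a < blockLimit; a += 4) {
        BODY(0); BODY(1); BODY(2); BODY(3);
    }
    // Process the remaining reads one at a time.
    for (; a < n; ++a)
        x += p[a];
    return x;
}

#undef BODY
```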
-
Re: using the preprocessor to do something N times
Sun, July 30, 2006 - 2:36 PM
Jason,
Yes, you can do this with templates. It's called metaprogramming, and it has its own slew of weirdnesses. Like any tool, it can be used to do cool things (good), yet it can also be used to write code that is absolute hell to figure out (evil). Basically, with templates, you can "run" code at compile time, without ever "running" the compiled code.
Here is a URL that describes some of what is possible with template metaprogramming:
osl.iu.edu/~tveldhui/pa...meta-art.html
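As a rough illustration of the idea (not taken from the linked article), a recursive template can expand a fixed number of reads at compile time. The Unroll name is invented for this sketch, and it assumes the count is a compile-time constant and that p points at N or more readable ints. Whether the compiler actually inlines the whole chain depends on its optimization settings.

```cpp
#include <cstddef>

// Hypothetical recursive template: Unroll<N>::sum(p) adds p[0] .. p[N-1].
// Each instantiation contributes one read and delegates the rest to
// Unroll<N - 1>, so the recursion is resolved by the compiler.
template <std::size_t N>
struct Unroll {
    static int sum(const int* p) { return p[N - 1] + Unroll<N - 1>::sum(p); }
};

// Base case: stops the recursion and contributes nothing to the sum.
template <>
struct Unroll<0> {
    static int sum(const int*) { return 0; }
};
```

With this in place, a call such as Unroll<16>::sum(p + a) would stand in for the sixteen hand-written reads in the unrolled loop body.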
Regards,
John
Falling You - exploring the beauty of voice and sound
www.fallingyou.com