OpenCL: Vector Add [Updated Sep 04 2011]
Hi,
as you probably know, OpenCL is a new standard for parallel computing. It is still in its infancy (the specification is ready, but implementations are appearing slowly). Unlike CUDA or STREAM, it is not a vendor-specific technology, and at the moment you can run it on NVIDIA, ATI and S3 hardware.
OpenCL allows you to execute C-like code on the GPU, where it runs in parallel across many cores. That means you can get a very large speed boost for the time-critical parts of your code, without needing to go the assembler route.
It seems ideal for vector operations, image processing and other heavy tasks.
If you own a GeForce 8xxx and up or a Radeon HD 4xxx and up with OpenCL-enabled drivers, you can try the attached example code for summing vectors. Older cards do not support this, and probably never will.
I provide an adaptation of the code from the NVIDIA OpenCL Jump Start Guide. The guide is not a bad introduction to OpenCL, except that it contains quite a few typos and mistakes. The code I provide should be a working adaptation of "OpenCL Host Code" and performs vector addition.
The code could be optimized further, but as-is it should give you an idea of how OpenCL coding works.
If you have the hardware, I would appreciate any feedback from you (works/doesn't work, problems with headers, ...)
Important note: The code requires the OpenCL headers.
Petr
Re: OpenCL: First example
it's a pity.
Quote:
If you own a GeForce 8xxx and up or a Radeon HD 4xxx and up with OpenCL-enabled drivers, you can try the attached example code for summing vectors. Older cards do not support this, and probably never will.
My old NVIDIA card (GeForce 440mx..) cannot use this new OpenCL feature. The "opencl.dll" is missing too, or is it delivered with the new OpenCL graphics driver from NVIDIA? One day I will buy a newer system, but why should I? I am not a gamer ;)
best regards, frank
Re: OpenCL: First example
Hi Frank,
sadly, the older cards were built purely as graphics cards, not as general-purpose computing devices.
The listed cards are more mature in design, and programming them can bring you very significant speed boosts.
If you do image processing or other intensive calculations, OpenCL is a very interesting route to better performance while writing high-level code and letting the CPU relax. With OpenCL, a Gaussian blur runs in real time at 1920x1080; try the same in your graphics editor and watch how long it takes.
The key to this speed boost is not a brute-force approach, but massive parallelization.
OpenCL.DLL is installed with the drivers for the mentioned graphics cards.
Petr
Re: OpenCL: First example
Thanks Petr for working on this cutting edge stuff. I looked at OpenCL a little, but to avoid confusing my confused mind any further I stopped.
I was looking at the specs for the new NVIDIA 300 series video cards. I think they will have at least 16 CUDA cores, and I saw a number for 200+ stream processing cores. I am not sure what the difference between the two is, but either way it sure seems like it will make OpenCL run even faster.
Re: OpenCL: First example
Kent,
do you think, once you have time, you could install the latest 195.62 WHQL drivers and try to run this example?
OpenCL is not complicated. Take, for example, the CL code in the attachment above.
So what happens here?
First you create a context for the GPU device, then enumerate the available GPUs and pick the first one.
Then you initialize arrays A and B on the CPU with some data.
Then you create a mapping of those, plus the result array C, to the GPU.
Then you execute the kernel - think of it as a very lightweight thread.
[code=c]
__kernel void
vectorAdd(__global const float * a, __global const float * b, __global float * c)
{
    // Vector element index
    int nIndex = get_global_id(0);
    c[nIndex] = a[nIndex] + b[nIndex];
}
[/code]
You can read it roughly as:
[code=thinbasic]
KERNEL SUB vectorAdd(a AS SINGLE PTR, b AS SINGLE PTR, c AS SINGLE PTR)
' Vector element index
LOCAL nIndex AS LONG = get_global_id(0)
c(nIndex) = a(nIndex) + b(nIndex)
END SUB
[/code]
As you can see, the program currently spreads across as many hardware cores as possible, and each array cell is processed independently. "get_global_id(0)" retrieves the index of the current cell, and "c(nIndex) = a(nIndex) + b(nIndex)" simply stores the sum of A and B in C.
Then you just read back the results. A very simple idea, not very complicated code ... and you have it working :)
Petr
Re: OpenCL: First example
Thanks Petr, will do.
I PM'd you about installing the latest thinBasic on my programming computer, but that one runs the built-in Intel graphics, so ignore that part; I will install it on my gaming PC, which has NVIDIA. I will run the test tonight when I get home and post the results. Thanks for the overview.
Re: OpenCL: First example
Your example ran fine, Petr. There was no benchmark, but it did all the vector math all the way through with no errors, I am happy to report!!
Re: OpenCL: First example
Thank you very much Kent,
this was not a benchmark example yet, I just wanted to know if it runs ... and it seems it does, which is good to hear!
Re: OpenCL: First example
Hi Petr, perhaps I can test your OpenCL example at school. How big is your "opencl.dll"? Can you send me this file by e-mail (as a zip file)? That would be nice. Perhaps I can test your example above on one of the newer machines and graphics cards at school ;)
frank
Re: OpenCL: First example
Hi Frank,
the OpenCL.DLL can only come with the graphics drivers. Its interface is always the same, but the implementation differs for each vendor, so it cannot be copied from PC to PC.
If you have a PC with a GeForce and ForceWare 195.62 at school, it would be nice to try it.
But it is still a very fresh technology, so it is not present on many PCs at the moment.
But thanks for the offer :)