
Thread: OpenCL: Vector Add [Updated Sep 04 2011]

  1. #1
    Super Moderator Petr Schreiber's Avatar
    Join Date
    Aug 2005
    Location
    Brno - Czech Republic
    Posts
    7,128
    Rep Power
    732

    OpenCL: Vector Add [Updated Sep 04 2011]

    Hi,

    as you probably know, OpenCL is a new standard for parallel computing. It is still in its infancy (the specification is ready, but implementations are appearing slowly). Unlike CUDA or Stream, it is not a vendor-specific technology, and at the moment you can run it on NVIDIA, ATi and S3 hardware.

    OpenCL allows you to execute C-like code on the GPU, where it runs in parallel across many cores. That means you can get a very large speed boost for the time-critical parts of your code, without needing to go the assembler route.

    It seems ideal for any vector operations, image processing and other heavy tasks.

    If you own a GeForce 8xxx and up or a Radeon HD 4xxx and up with OpenCL-enabled drivers, you can try the attached example code for summing vectors. Older cards do not support this, and probably never will.

    I provide an adaptation of code from the NVIDIA OpenCL Jump Start Guide. This guide is not a bad introduction to OpenCL, except that it contains quite a few typos and mistakes. The code I provide should be a working adaptation of the "OpenCL Host Code" listing, and it performs vector addition.

    The code could be further optimized, but as-is it should give you an idea of how OpenCL coding works.

    If you have the hardware, I would appreciate any input from your side (works/doesn't work, problems with headers, ...)

    Important note: The code requires the OpenCL headers.


    Petr
    Last edited by Petr Schreiber; 04-09-2011 at 18:04.
    Learn 3D graphics with ThinBASIC, learn TBGL!
    Windows 10 64bit - Intel Core i5-3350P @ 3.1GHz - 16 GB RAM - NVIDIA GeForce GTX 1050 Ti 4GB

  2. #2
    Senior Member Lionheart008's Avatar
    Join Date
    Sep 2008
    Location
    Germany, Bad Sooden-Allendorf
    Age
    51
    Posts
    934
    Rep Power
    109

    Re: OpenCL: First example

    It's a pity.

    "If you own GeForce 8xxx and up or Radeon HD 4xxx and up with OpenCL enabled drivers, you can try attached example code on summing vectors. Older cards do not support this, and probably never will."
    My old NVIDIA card (GeForce 440 MX) cannot use this new OpenCL feature. The "opencl.dll" is missing too; or is it delivered with the new OpenCL graphics driver from NVIDIA? One day I will buy a newer system, but why should I? I am not a gamer.

    best regards, frank

    you can't always get what you want, but if you try sometimes you might find, you get what you need

  3. #3
    Super Moderator Petr Schreiber's Avatar

    Re: OpenCL: First example

    Hi Frank,

    sadly, the older cards were built purely as graphics cards and not as general-purpose computing devices.
    The listed cards are more mature in design, and programming them can bring you very significant speed boosts.

    If you do image processing or other intensive calculations, OpenCL is a very interesting route to better performance while using high-level code and letting the CPU relax. With OpenCL, a Gaussian blur at 1920x1080 is a realtime operation; try the same in your graphics editor and watch how long it takes.

    The key to this speed boost is not a brute-force approach, but massive parallelization.
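
    To make the parallelization point concrete, here is a rough plain C sketch of why a blur suits this model. It is only an illustration under stated assumptions, not OpenCL and not code from the attachment: blur_pixel and blur_row are hypothetical names, the 1-2-1 weights are just a small stand-in for a Gaussian kernel, and the loop plays the role of the GPU running one lightweight thread per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pixel "work-item": 1-2-1 blur of one pixel in a row.
   It reads only src and writes only dst[x]; edge pixels are clamped. */
void blur_pixel(size_t x, const float *src, float *dst, size_t width)
{
    size_t left  = (x == 0) ? 0 : x - 1;
    size_t right = (x + 1 == width) ? x : x + 1;
    dst[x] = (src[left] + 2.0f * src[x] + src[right]) / 4.0f;
}

/* Hypothetical stand-in for the GPU launch: one call per pixel.
   The calls do not depend on each other, so their order does not matter. */
void blur_row(const float *src, float *dst, size_t width)
{
    assert(src != dst); /* in-place blurring would break the independence */
    for (size_t x = 0; x < width; ++x) {
        blur_pixel(x, src, dst, width);
    }
}
```

    Each output pixel reads only the source row and writes only its own cell, so no pixel has to wait for another; that independence is what lets the GPU process them side by side.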

    OpenCL.DLL is installed with the drivers for the mentioned graphics cards.


    Petr

  4. #4
    thinBasic MVPs kryton9's Avatar
    Join Date
    Nov 2006
    Location
    Naples, Florida & Duluth, Georgia
    Age
    67
    Posts
    3,869
    Rep Power
    404

    Re: OpenCL: First example

    Thanks Petr for working on this cutting-edge stuff. I looked at OpenCL a little, but to avoid confusing my confused mind any further I stopped.
    I was looking at the specs for the new NVIDIA 300 series video cards. I think they will have at least 16 CUDA cores, and I saw a number for 200+ stream processing cores. I am not sure what the difference between the two is, but either way it sure seems like it will make OpenCL run even faster.

    Acer Notebook: Win 10 Home 64 Bit, Core i7-4702MQ @ 2.2Ghz, 12 GB RAM, nVidia GTX 760M and Intel HD 4600
    Raspberry Pi 3: Raspbian OS use for Home Samba Server and Test HTTP Server

  5. #5
    Super Moderator Petr Schreiber's Avatar

    Re: OpenCL: First example

    Kent,

    do you think, once you have time, you could try to install the latest 195.62 WHQL drivers and try to run this example?

    OpenCL is not something complicated. Take for example the CL code in the attachment above:

    So what happens here?

    You first create a context for the GPU device, then enumerate the available GPUs and pick the first one.

    Then you initialize arrays A and B on the CPU with some data.

    Then you create buffers on the GPU for those, plus the result array C.

    Then you execute the kernel - think of each instance as a very lightweight thread.
    [code=c]
    __kernel void
    vectorAdd(__global const float * a, __global const float * b, __global float * c)
    {
        // Vector element index
        int nIndex = get_global_id(0);
        c[nIndex] = a[nIndex] + b[nIndex];
    }
    [/code]

    You can read it roughly as:
    [code=thinbasic]
    KERNEL SUB vectorAdd(a AS SINGLE PTR, b AS SINGLE PTR, c AS SINGLE PTR)
      ' Vector element index
      LOCAL nIndex AS LONG = get_global_id(0)
      c(nIndex) = a(nIndex) + b(nIndex)
    END SUB
    [/code]

    As you can see, the work is spread across as many hardware cores as possible, with each array cell processed independently. "get_global_id(0)" retrieves the index of the cell the current kernel instance is responsible for, and "c(nIndex) = a(nIndex) + b(nIndex)" simply stores the sum of A and B in C.

    Then you just read back the results. Very simple idea, not so complicated code ... and you have it working
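
    The whole flow can also be imitated on the CPU in plain C, which may help if you have no OpenCL hardware at hand. This is only a rough sketch, not the attached host code: the names vector_add_item, run_vector_add and demo_vector_add are made up for illustration, and the ordinary loop merely stands in for the OpenCL runtime launching one kernel instance per index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel: one call handles one element.
   In real OpenCL the index would come from get_global_id(0). */
void vector_add_item(size_t global_id,
                     const float *a, const float *b, float *c)
{
    c[global_id] = a[global_id] + b[global_id];
}

/* Hypothetical stand-in for launching the kernel over n work-items.
   A real device may run these calls in any order and in parallel;
   that is safe because each call writes a different cell of c. */
void run_vector_add(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t id = 0; id < n; ++id) {
        vector_add_item(id, a, b, c);
    }
}

/* Mirrors the steps from the post: fill A and B, run the "kernel",
   then read C back and count cells that do not hold A + B. */
size_t demo_vector_add(void)
{
    enum { N = 8 };
    float a[N], b[N], c[N];
    size_t mismatches = 0;

    for (size_t i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = (float)(2 * i);
    }

    run_vector_add(a, b, c, N);

    for (size_t i = 0; i < N; ++i) {
        if (c[i] != (float)(3 * i)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

    Calling demo_vector_add() returns 0 when every cell of C matches. With real OpenCL the loop disappears from your code: the runtime starts the kernel once per index for you, and the arrays live in GPU buffers instead of ordinary memory.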


    Petr

  6. #6
    thinBasic MVPs kryton9's Avatar

    Re: OpenCL: First example

    Thanks Petr, will do.

    I PM'd you about installing the latest thinBasic on my programming computer, but that runs the built-in Intel graphics, so ignore that part and I will install on my gaming PC, which has NVIDIA. I will run the test tonight when I get home and put up the results. Thanks for the overview.

  7. #7
    thinBasic MVPs kryton9's Avatar

    Re: OpenCL: First example

    Your example ran fine Petr. There was no benchmark, but it did do all the vector math all the way through with no errors I am happy to report!!

  8. #8
    Super Moderator Petr Schreiber's Avatar

    Re: OpenCL: First example

    Thank you very much Kent,

    this was not a benchmark example yet, I just wanted to know if it runs ... and it seems it did, which is good to hear!

  9. #9
    Senior Member Lionheart008's Avatar

    Re: OpenCL: First example

    Hi Petr, perhaps I can test your OpenCL example at school. How big is your "opencl.dll"? Can you send this file to me by e-mail (as a zip file)? That would be nice. Perhaps I can test your example above on one of the newer machines and graphics cards at school.

    frank

  10. #10
    Super Moderator Petr Schreiber's Avatar

    Re: OpenCL: First example

    Hi Frank,

    the OpenCL.DLL can only come with the graphics drivers. Its interface is always the same, but the implementation differs for each vendor, so it cannot be copied from PC to PC.

    If you have a PC with a GeForce and ForceWare 195.62 at school, it would be nice to try it.
    But it is still a very fresh technology, so it is not present on many PCs at the moment.

    But thanks for the offer
