
Type: Posts; User: Charles Pegge


  1. Re: Number crunching using Single precision SSE regs

    There may be some overhead in your test loop, Petr. Intel suggests a 2.1x speed-up over the equivalent FPU code. I'm getting 0.1 secs versus 0.17 secs under PB. But in any case it is not a major leap in...
  2. Re: Number crunching using Single precision SSE regs

    Okay, here are two versions of 4x4 matrix multiplication for comparison: SSE (Intel) and FPU (my code).

    ' Floating point vector maths using SIMD instructions
    '...
  3. Re: Number crunching using Single precision SSE regs

    Phew! :)

    I've adapted an Intel example of a 4x4 matrix multiply using SSE2 instructions. I don't understand the way it shuffles data around, but I'll post it here ASAP. Perhaps you will be able to...
  4. Re: Number crunching using Single precision SSE regs

    Hi Petr,

    You could try commenting out instruction lines to see which ones are disruptive. We know that movups and addps work from your previous demo.

    My CPU is an Athlon 64 X2.
  5. Re: Number crunching using Single precision SSE regs

    Thanks, Kent. I will investigate. The CUDA driver, which is downloading now, is quite a lump: 72 Megs!

    If we can devise lightweight support for GPU calculations, it will be well worth the effort.
    ...
  6. Number crunching using Single precision SSE regs

    Intel and AMD CPUs (in 32-bit mode) support single-precision arithmetic on four numbers at the same time using the SSE registers.

    Just a few caveats though:

    This is ideal for vector processing, but beware...
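The four-at-a-time single-precision arithmetic described in post 6, and the movups/addps pair mentioned in post 4, can be sketched in C with the SSE intrinsics from <xmmintrin.h>. This is a minimal sketch, not code from the thread (which uses PowerBASIC with inline assembler). The function name add_arrays_sse is hypothetical, and the remark that _mm_loadu_ps/_mm_storeu_ps and _mm_add_ps become MOVUPS and ADDPS describes what compilers typically emit, not a guarantee.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ... */

/* Hypothetical helper: out[i] = a[i] + b[i] for i < n.
   Four floats are added per SSE operation; any leftover elements
   (when n is not a multiple of 4) are handled one at a time. */
void add_arrays_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* typically MOVUPS (unaligned load) */
        __m128 vb = _mm_loadu_ps(b + i);            /* typically MOVUPS */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* typically ADDPS, then MOVUPS store */
    }
    for (; i < n; i++)   /* scalar tail for the remaining 0 to 3 elements */
        out[i] = a[i] + b[i];
}
```

Calling add_arrays_sse(a, b, out, 6) handles elements 0 to 3 with a single _mm_add_ps call and finishes elements 4 and 5 in the scalar tail loop. The unaligned loads avoid having to guarantee 16-byte alignment of the arrays.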
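For comparison with the two 4x4 matrix multiplication versions in post 2, here is a sketch of a scalar and an SSE 4x4 multiply in C. It is a stand-in under stated assumptions, not Intel's example or the FPU code from the thread: it uses a broadcast approach rather than the shuffles in Intel's version, assumes row-major float[16] storage, and the names matmul4_scalar and matmul4_sse are hypothetical. Each row of C is the sum of the rows of B, each scaled by one element of the matching row of A, so one row costs four broadcasts, four multiplies and three adds.

```c
#include <xmmintrin.h> /* SSE intrinsics */

/* Matrices are row-major: m[row * 4 + col]. Computes c = a * b. */

/* Scalar reference version; depending on compiler and target this may
   compile to x87 (FPU) or scalar SSE instructions. */
void matmul4_scalar(const float *a, const float *b, float *c)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++)
                s += a[i * 4 + k] * b[k * 4 + j];
            c[i * 4 + j] = s;
        }
}

/* SSE version: row i of c = sum over k of a[i][k] * (row k of b). */
void matmul4_sse(const float *a, const float *b, float *c)
{
    __m128 b0 = _mm_loadu_ps(b + 0);   /* row 0 of b */
    __m128 b1 = _mm_loadu_ps(b + 4);   /* row 1 of b */
    __m128 b2 = _mm_loadu_ps(b + 8);   /* row 2 of b */
    __m128 b3 = _mm_loadu_ps(b + 12);  /* row 3 of b */
    for (int i = 0; i < 4; i++) {
        /* _mm_set1_ps copies one element of a into all four lanes */
        __m128 r = _mm_mul_ps(_mm_set1_ps(a[i * 4 + 0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i * 4 + 1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i * 4 + 2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i * 4 + 3]), b3));
        _mm_storeu_ps(c + i * 4, r);   /* write row i of c */
    }
}
```

With A holding 1 to 16 in row-major order, both functions give 90 100 110 120 as the first row of A*A. The broadcast layout avoids transposing B or shuffling partial sums, at the cost of one broadcast per element of A.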