New benchmark proposal



Petr Schreiber
06-11-2008, 09:46
Hi all,

it seems the old TBGL benchmark has its issues, so it does not represent graphics card performance as well as intended.

Why benchmark?
A lot of benchmarks serve just to make somebody feel he has the best graphics card and everybody else's is ridiculous :D.
This is not the case for the planned TBGL Benchmark 2008. It should help TBGL developers find out how the tested graphics cards perform in common tasks and features, like:

Fill rate performance ( how many pixels per second )
Blending performance hit
Alpha test performance hit
Line rendering performance ( for potential CAD developers )
Triangle mesh at 5 001, 10 002, 20 004 and 100 008 polygons ( both model and display listed, textured )
Quad mesh at 5 000, 10 000, 20 000 and 100 000 polygons ( both model and display listed, textured )
Rendering to texture speed, ability to capture NPOT textures


What will be the use
Once completed, we will create a publicly available database of graphics card performance with TBGL, for developers' reference. So anybody writing a game or graphics application will know where the limits of different architectures are.

Some cards for example perform relatively well on high polygon counts, but have big problems with blending ( especially fullscreen one, usually related to poor fillrate performance ) or alpha tests.

Let me know about any ideas you have. Let's keep it simple, but useful :)


Petr

ISAWHIM
06-11-2008, 18:29
Hmmm...

I suggest...

It first goes through the most common internal detection tests. (Detecting the ability of the card, including the potential video sizes, monitors, drivers, video card, etc.)

Models... (Only sphere is relevant, or individual polygons.)
(50 Polys each) {~25 visible/rendered}
- One created with Primitives
- One created by Poly
- One as M15
- All are equal form and mapping

(Quantity Test) [(Objects)Polys]
Prim -> Poly -> M15 (Not mixed)
* Raw and Entity versions.
- Each (1)50, (10)500, (100)5000, (1000)50000 of each.
NOTE: Testing 1 first, removes any lag caused by initial RAM loading.

(Intersect clip test, screen clip test.)
Prim -> Poly -> M15 -> Prim,Poly -> Prim,M15 -> Poly,M15 -> Prim,Poly,M15
* Raw and Entity versions.
- Non-intersected, Non-clipped, Spinning
- Intersected, Non-clipped, Spinning
- Non-intersected, Clipped, Spinning
- Intersected, Clipped, Spinning

(Mapping Using (1)50, (100)5000, (1000)50000 as the test.)
(Non-intersected, Non-clipped, Spinning)
Prim -> Poly -> M15
- Unmapped-OBJ
- Mapped-OBJ
- Transparent-Mapped-OBJ

I just threw the poly-count values in there as a guide. Whatever the Primitive can create, should be re-created for the other two object types.

Additional (Specific tests) should simply use the 1/100/1000 test. The curve from the original test can safely estimate any other line values.

If desired, the calculated or "Tested" values at 60FPS and 30FPS are the ones that hold the most relevance, because they ultimately determine the practical limits: how many things you can see or have turned on. (Where available, the values can estimate the FPS before even running the game, or state something like "This setting will reduce game-rate FPS by 1.25%". If the MAX FPS of "Nothing" is known, that % can be estimated.)

Number of objects at 60 FPS (* 95% of the world's monitors refresh at 60Hz or 60FPS; few have 75Hz)
Number of objects at 30 FPS
FPS @ 1 (xPolys)
FPS @ 10 (xPolys)
FPS @ 100 (xPolys)
FPS @ 1000 (xPolys)
FPS @ 10000 (xPolys)
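
One way to get the "number of objects at 60 FPS" figure from the FPS @ N rows is to interpolate between the two measurements that bracket the target. The sketch below is Python, not thinBasic, and it assumes frame time grows roughly linearly with object count between neighbouring samples. The function name and the sample numbers are made up for illustration and were not measured on any card.

```python
def objects_at_fps(samples, target_fps):
    """samples: list of (object_count, measured_fps), sorted by object_count.

    Interpolates linearly on frame time (1 / fps) between the two samples
    that bracket the target. Returns None if the target is not bracketed.
    """
    target_time = 1.0 / target_fps
    for (n0, f0), (n1, f1) in zip(samples, samples[1:]):
        t0, t1 = 1.0 / f0, 1.0 / f1
        if t0 <= target_time <= t1:
            if t1 == t0:
                return float(n0)
            return n0 + (n1 - n0) * (target_time - t0) / (t1 - t0)
    return None


# Hypothetical measurements (objects, FPS), for illustration only.
samples = [(1, 900.0), (10, 600.0), (100, 150.0), (1000, 20.0), (10000, 2.0)]
print("objects at 60 FPS:", objects_at_fps(samples, 60.0))
print("objects at 30 FPS:", objects_at_fps(samples, 30.0))
```

If the real curve is far from linear between two samples, adding a measurement closer to the target would be more trustworthy than the interpolation.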

No point in making each separate element merged. (Except in the rare instance of full ALPHA.) You can simply determine the losses of each element by comparing the result of 10000 Polys to the fastest speed at 10000 Polys.

EG,
Fastest speed is 10000 unmapped, white, M15 spheres (0.016 seconds, 62.50 FPS)
Mapped, white, M15 spheres (0.166 seconds, 6.02 FPS) Loss of 90.37% (9.63% of max)

You can assume that...
Mapped, white, Primitive will lose 90.37% compared to... unmapped, white, Primitive value.
- Unmapped, white, Primitive (0.020 seconds, 50.00 FPS)
- Mapped, white, Primitive Estimated (0.207 seconds, 4.82 FPS) Loss of 90.37% (9.63% of similar)
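
The arithmetic in the example above can be checked with a few lines of Python (not thinBasic). This is only a sketch of the proportional-loss assumption made in the post; the function names are invented for illustration. Working from the unrounded frame times, the results agree with the figures above to within rounding.

```python
def fps(seconds_per_frame):
    return 1.0 / seconds_per_frame


def loss_fraction(baseline_seconds, feature_seconds):
    """Fraction of the frame rate lost when the feature is switched on."""
    return 1.0 - fps(feature_seconds) / fps(baseline_seconds)


def estimate_seconds(other_baseline_seconds, loss):
    """Predicted frame time for another object type, assuming the same loss."""
    return other_baseline_seconds / (1.0 - loss)


loss = loss_fraction(0.016, 0.166)         # M15 spheres: unmapped vs. mapped
estimated = estimate_seconds(0.020, loss)  # Primitive: unmapped -> mapped
print(f"loss {loss:.2%}, estimated {estimated:.4f} s, {fps(estimated):.2f} FPS")
```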

The end result should actually be recorded as a 50K penalty, based on the user's MAX.
When compared to others, the result of each should be recorded as a penalty of the MAX for that element.

The curves should all be similar; however, certain elements that are not handled by hardware, or are handled by slower hardware, will have a lower percentage compared to the highest user's MAX. (If we all render this "element" at 4FPS, our value is 100% compared to the normal. The actual FPS is of no importance, only the % of the loss from that "element".)

ISAWHIM
06-11-2008, 22:46
The importance of the "Blank" screen, is that this "Speed" is the maximum that the CPU/program can produce. Passing any single dead call, will let you know what the ability is, in time. (A full-screen background-change would be the MAX full-screen value. Which is specific to the card, with consideration to the computer speed.)

CPU call-speed max calls-per-second is also the fastest that the TBGL can render. (The card may be able to render faster, but without calls/instruction, there is nothing to render, except the last frame.)

The Poly count here is what is actually being "thought about", the Real is what TBGL thinks is being done by FPS count.
FPS 7.5, 80000 Poly, 45000 Real (Reached memory push limits)
FPS 15, 40000 Poly, 38000 Real (Reaching memory push limits)
FPS 30, 20000 Poly, 20000 Real
FPS 60, 10000 Poly, 10000 Real
FPS 120, 5000 Poly, 6000 Real (Extra Poly from render-dumping of unseen areas)
FPS 240, 2500 Poly, 5000 Real (Screen redraw area getting smaller, it is still actually drawing 2500, but 2500 not seen.)
FPS 480, 1250 Poly, 4000 Real (Screen redraw area is about 1" tall or 72 pixels, only 1250 poly in that space.)
FPS 960, 625 Poly, 4000 Real (Call limit reached, video card is done drawing and waiting for TBGL calls to be sent.)
FPS 1920, 312 Poly, 4000 Real (Video-card goes into low power mode, and runs cooler, since it is doing nothing 90%)
FPS 3840, 156 Poly, 4000 Real (Computer CPU is in overdrive, and video-card is out to lunch, drawing icons in spare time.)

FPS alone is misleading, because if the monitor refresh rate is 60Hz... you are only rendering one whole screen every 1/60th of a second. At 120FPS you are rendering two half screens. (First the top half... then a redraw begins at the half-way point, to the bottom. EG, the other half. So, at 120FPS you are not rendering 800x600, you are rendering two 800*300 screens. At 180FPS you are rendering three 800*200 screens. At 600FPS you are rendering ten 800*60 screens.)

When FPS falls below 60, or reaches the monitor refresh rate, that is where you get real numbers. That is also why a non-full-screen render (which occupies one of those 800*60 areas) seems lightning fast. It is only rendering that single portion of the screen, 120000 times but drawing it only 1/60th of a second. (Though the call to RENDER is made, the graphic card throws it out, because it expires before the scan-line has reached the area of change. It only keeps the RENDER which is in the area of change.)

This is also why the more important number is the quantity of items/things being done at that 60FPS/60Hz point.

Basically, that is not FPS, but a measure, when compared to 60Hz, of how many wasted frames have been called.

RENDER is only a suggestion to the card, letting it know that the new scene is constructed, and ready for display. The card determines what, and when to draw the display. (V-sync) was a ghetto way to attempt to stop wasted calls, which make the video-card draw incomplete screens. (It is actually better to monitor internally, when calls are wasted, and stop sending as many calls, which puts the CPU into overdrive.)

Michael Clease
07-11-2008, 09:22
Where do you get this information?

ISAWHIM
08-11-2008, 20:32
Most of this is basic knowledge, for those who build video-interfaces. (Not that it is basic to normal people.)

http://en.wikipedia.org/wiki/Frame_rate

Some info there... (Though, they say nothing about LCD which has a forced limitation of 30-60 depending on the age of the LCD itself. NOTE: They may internally refresh at 200Hz, but that is to reduce the LAG you see at 60Hz refresh. The Video-Card only gets the 60 or 30 Hz ticks.)

GL and DX are fast, because a MATRIX output is equal to the resolution of the screen. The refresh/scanline position, which the card knows, is used to move the pointer in the matrix. If it is on scanline 300H, it does not read 0-299X in the matrix... It reads in a loop, until new data comes in. When new data comes in, (Which may be after displaying 100H lines, or reading 100X lines.), the old buffer is dumped, and replaced with the new matrix data.

You send commands that the VID-CARD turns into... This is a full 800*600 screen
A11111111
A22222222
A33333333
A44444444

But the monitor and the VID-CARD talk to one another, and the monitor tells the card... I am at position 2.
A22222222 (This is a 800 * 150 slice)

Now matrix B comes in... while line A2 is drawing...
The Monitor tells the VID-CARD... I am at position 3.
B33333333 (A1, A3, A4 were never drawn, they contained 2000 surfaces that were not drawn/processed)

No new frame comes in... but the Monitor tells the VID-CARD, I am at position 4
B44444444

.... pos 1
B11111111 (Total from B is 800 * 450) (450 = 150 * 3)

.... New Matrix C comes in (B2 is never attempted to be drawn.)
C11111111
C22222222
C33333333
C44444444

... I am at position 2
C22222222

The "Tearing" you see is the top half drawn at 1/60 of a second, while the bottom half was drawn at 2/60 of a second. In that "Time", the item moved 1" on the screen, so the bottom half seems sheared by 1" to the left or right. High FPS will make a vertical line seem to bend as you turn your view.
A00010000 (240 FPS @ 60Hz)
B00002000
C00000300
D00000040

A00010000 (120 FPS @ 60Hz)
A00020000
B00003000
B00004000

A00010000 (60 FPS @ 60Hz)
A00020000
A00030000
A00040000

D00000010 (60 FPS @ 60Hz)
D00000020
D00000030
D00000040

Old video cards did not relay the VSYNC "hold"...
Old video cards actually attempted to render every frame in a buffer... and sent every frame to monitor, not just slices of the frame that were needed. Thus, "Accelerated Video Cards" were born. No software, pure hardware acceleration. (Amiga computers used this BLIT mode, and interlace.)

60Hz is used, because the eye cannot see flicker at 60Hz... our lights flicker at this rate because electricity runs at this rate. (Eyes get fatigued at higher or odd rates.) (50Hz overseas.) (3D displays use 120Hz, 60Hz per eye, for shutter-goggles.)
http://en.wikipedia.org/wiki/Visual_display_unit
http://en.wikipedia.org/wiki/Refresh_rate
(Default refresh rate, gamers who change this are less than 1%)
http://support.microsoft.com/kb/311403
http://en.wikipedia.org/wiki/Vsync
http://en.wikipedia.org/wiki/RAMDAC
http://en.wikipedia.org/wiki/Super_Video_Graphics_Array
http://en.wikipedia.org/wiki/Video_acceleration
http://en.wikipedia.org/wiki/Comparison_of_ATI_Graphics_Processing_Units
http://en.wikipedia.org/wiki/Comparison_of_Nvidia_Graphics_Processing_Units

EG, in that case, it reads 100X from each sent Matrix, only 100H was changed in 1/60 or 1Hz (One full 800X600 screen.)

If 800 * 100 is what is drawn/read every 1/60 second or 1hz, that returns faster with the call. (Giving the FPS of 100*800 area, not 800*600. That is the Vsync purpose, to only return when the full 800*600 area has been drawn.)

However, on the programming side, we still MOVE, ROTATE, and send the full DRAW of 800*600, because that could be the last frame ever rendered. Even if only 1/10th of the data is being read/displayed. Thus, unless you are "Seeing" all 20,000 surfaces in the 800*100, you are not getting the FPS of 20,000 surfaces, you are getting the FPS of the number of surfaces in the 800*100 part of the matrix.

Every test will confirm it. That is why benchmarks use those "Tricks" to force correct values, based on what you are trying to benchmark. For FPS, a real benchmark, not a slip-demo, they force full screen (Native resolution on an LCD), or they limit all drawing to a constant minimum size. (Running QuakeTimeDemo with no limits is a measure of how fast the CPU is processing the game internals. The view is just to let you see how fast that actually is.)

http://en.wikipedia.org/wiki/Benchmark_(computing)
http://donutey.com/hardwaretesting.php
http://shootout.alioth.debian.org/
http://en.wikipedia.org/wiki/3DMark
http://sourceforge.net/projects/glmark

If you comment-out the DRAW... you get a FPS... though it has DRAWN 0 FPS... Thus, you are not measuring the "Frames" per second, you are measuring the "Transforms per second"...

TBGL_M15DrawModel 1
TBGL_CallList 1

Are NOT a FPS element. Both will display 60 FPS if VSYNC is on, and only one call is sent.

Processing 20000 of those, with 1 box will be processed faster than 60FPS, so that will also be 60FPS.
Processing 20000 of those, with 1000 boxes will take longer to process than 60FPS, so the result will be a delay of FPS.

Which makes the important element of "THAT" benchmark... "How many of those can be processed, while maintaining 60FPS?"

If the items are clipped, (Only drawing 500 of the 1000 objects.), that makes the DrawFrame return faster, which seems to alter the results. The RETURN TIME is shorter, so obviously, more calls are being processed.

{ClearScreen Time} + {CameraPos Time} + {SceneRotate Time} + {ModelDraw Time} + {DrawFrame Time} = {Cycle Time}

Cycles per second is limited to the addition of all those times. (Remembering that DrawFrame is NOT a constant, and does not care if the model was drawn with ModelDraw or CallList.)
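
As a rough illustration of that sum, here is a Python sketch (not thinBasic). Only the step names come from the formula above. The per-step times are hypothetical values in seconds, and a real run would also have to account for DrawFrame not being constant.

```python
def cycle_fps(step_times):
    """Cycles per second implied by the per-step times (in seconds)."""
    cycle_time = sum(step_times.values())
    return 1.0 / cycle_time


# Hypothetical per-call times in seconds, for illustration only.
steps = {
    "ClearScreen": 0.0005,
    "CameraPos": 0.000002,
    "SceneRotate": 0.000001,
    "ModelDraw": 0.004,
    "DrawFrame": 0.001,
}
print(f"{cycle_fps(steps):.1f} cycles per second")
```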

If there were a 1000 {CallList} or {ModelDraw} between {SceneRotate} and {DrawFrame}, then the FPS would be a good measure of how fast those functions are.

Measuring frames, even with VSYNC ON... (And 1 box)
I get a formulated ...
1 frame = 85,000,000 FPS {This is actually how long it takes to send a blind command, your CPU speed.}
10 frames = 43,000 FPS
100 frames = 61 FPS {Actual time > 1 second}
1000 frames = 60 FPS
10000 frames = 60 FPS

With (2000 boxes) VSYNC OFF
1 frame = 8,500,000 FPS
10 frames = 680 FPS
100 frames = 48 FPS {Actual time > 1 second}
1000 frames = 47 FPS
10000 frames = 47 FPS

Without rendering anything, just (Clear and Draw frame) VSYNC OFF (Same as draw 1 box)
I get a formulated ...
1 frame = 85,000,000 FPS
10 frames = 1,400,000 FPS
100 frames = 836,000 FPS
1000 frames = 568,000 FPS
10000 frames = 158,000 FPS {Actual time < 1 second}

EG... The demo which lasts less than 1 second (60 FPS) has false info.

Run those same tests until 1 second has passed, and count the frames passed, and you will see that you are not getting 50,000 FPS, or in my case 85,000,000 FPS.
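
A time-boxed test of that kind could look roughly like this. It is a Python sketch rather than thinBasic, and `frames_in` and `fake_frame` are invented stand-ins: a real test would issue the TBGL drawing calls inside the loop. The idea is to fix the duration and count the frames, instead of fixing the frame count and measuring a duration that may be too short to time reliably.

```python
import time


def frames_in(duration_s, render_frame):
    """Call render_frame repeatedly until duration_s has elapsed.

    Returns (frame_count, elapsed_seconds).
    """
    start = time.perf_counter()
    frames = 0
    while True:
        render_frame()
        frames += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            return frames, elapsed


def fake_frame():
    # Stand-in workload; a real test would clear, draw and swap here.
    sum(range(1000))


frames, elapsed = frames_in(0.2, fake_frame)
print(f"{frames / elapsed:.0f} frames per second over {elapsed:.3f} s")
```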

Time is time... If one call has a return of 85,000,000 FPS, then 1000 calls should return the same/similar value. (However, you also have to subtract the time it takes to make the "TickCount", which is what a longer test does.)

GetTickCount is also not accurate enough to measure any demo shorter than 1 second. (It only counts 1000 units per second. One frame at 60FPS is about 16 ticks, and one frame at 85,000,000 FPS is about 0.00001 ticks, which is not a possible INTEGER or GetTickCount value.)
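
To put numbers on the tick problem, here is a small Python sketch (not thinBasic) that models an ideal integer millisecond counter. The helper names are made up for illustration. The real GetTickCount is usually coarser than 1 ms, since it only advances with the system timer, so the actual situation is somewhat worse than shown here.

```python
TICKS_PER_SECOND = 1000  # an ideal millisecond counter


def ticks_per_frame(fps):
    return TICKS_PER_SECOND / fps


def measured_ticks(fps, frames):
    """Whole ticks an integer millisecond counter would report for a run."""
    return int(frames * ticks_per_frame(fps))


for fps, frames in [(60, 1), (60, 60), (85_000_000, 1), (85_000_000, 10_000)]:
    print(fps, "FPS,", frames, "frames ->", measured_ticks(fps, frames), "ticks")
```

Any run that reports 0 ticks cannot be turned into an FPS value at all, which is one reason to keep each test running for a second or more.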

The speed of the TBGL_M15DrawModel can be measured without even drawing a single screen.

But that value is useless, unless you know how fast it will redraw the entire screen, at a resolution you play at. (EG, if the time for that transform is 0.00000000000001 but your redraw is 0.016 or 0.00001... the speed of 1000 (TBGL_M15DrawModel) or (TBGL_CallList) calls will not even be seen. It will not impact FPS.)


LOL... Enough citations for you?

ErosOlmi
08-11-2008, 20:43
Please, do not destroy the original post.
This post was about ideas on how to make next TBGL benchmark.
Possibly stay with the topic or create new one.

Thanks
Eros

ISAWHIM
08-11-2008, 21:12
All this is relevant to creating a new benchmark...

You can't create a benchmark, if there is no compensation for the things mentioned.

Part of a proposal, is talking about the hurdles that have to be handled.

I was also asked, specifically, where I get my information.

I will post the samples once they are complete. (So far, it tests FPS, Calls, Blanks, Objects, Scenes, and Translations. But there is no GUI, which makes it only good to a coder. There is also no cool graph. I could not get the graph thing to work within the value ranges I use. It doesn't like decimals, and attempts to overlap hundreds of values in the line-graph on the bottom, which makes it unreadable.)

It takes into consideration, invalid low-run test results, refresh, vsync, call-times, precision error, setup delays, CPU overheating, thread-status, and user-interference. (Window-clicking/moving)

My focus was on the 60FPS, as 60Hz is the default refresh rate, and Vsync ON is a common default, which few people change, if they even can, or know about it.

ErosOlmi
08-11-2008, 21:47
Ok then.

Michael Clease
08-11-2008, 23:03
LMFAO. I don't know where you dream this stuff up. Did I mention that I happened to spend 10 years repairing and servicing computer monitors/LCDs/terminals? You just don't understand how funny what you posted is.

"Old cards didn't relay Vsync", that explains why I had to keep moving my head up and down the screen really fast to try and read it :D

I think my screen is broken, it's running at 75Hz :(

Interesting idea about the monitor talking to the gfx card. Total rubbish, but interesting. ::)

I remember doing copper timing to trigger interrupt 3 so when I was blitting it didn't happen when it was being updated.

Got to go before I wee myself, thanks for the laugh anyway Jason.

ISAWHIM
09-11-2008, 00:24
Ok, then... Guess you are right, computers/video-cards don't talk to the monitor... (Then I guess you are not running at 75Mhz, if they are not talking. Do you set that from the button on the back of the monitor, like they did in the old days, or do you use a windows setting to tell your monitor how fast to redraw one whole screen? Sorry, but that comment you posted could have been sent in a PM, or another Karma, telling me I am stupid again. Don't hate the messenger, I didn't invent this junk.)

I said the "Default" was 60Hz, set by windows. ??? That is a fact, published by windows, in the link I gave. I didn't make windows. I didn't say it was impossible to get 75Hz.

Graphic cards, the accelerated ones, which is close to 99% now, have a back-buffer and some have triple-buffer, which is frames B and C... SwapBuffers() forces them to advance. We do not control when or where they draw, or even if they draw.

Last post which shows what I said, with less detail, and more words...
(Titled, "VSync and why people loath it.)
http://www.hardforum.com/showthread.php?t=928593

Nvidia also has an issue with the new API and VISTA, where it can not be turned off. (By windows)

Any-Who...

Here is the cut-down code which I am using. (If your video-card is Mach-12 you may want to add a zero to all the values in the time-test. 1,000 would become 10,000. I have not placed the code which determines the actual time needed for the 1 second test, I just used manual settings.)

You will see, if you reduce the numbers to something crazy, like 1 or 10 or 100... You get full useless values.

NOTE: This uses Petr's model from the benchmark. The file should be saved to the same folder as the old benchmark. And, Yes, the screen should be black the whole time, except for the end where it draws one single box.

The output is the Average time it takes to complete the tests. (I have not factored them down into normal time, but it shows relation quite fine.)


USES "TBGL"
HiResTimer_Init

DIM hWnd AS DWORD

TYPE t_Time
Old AS DOUBLE
New AS DOUBLE
ForNext AS DOUBLE
ClearFrame AS DOUBLE
Camera AS DOUBLE
Rotate AS DOUBLE
DrawFrame AS DOUBLE
M15DrawModel AS DOUBLE
CallList AS DOUBLE
Total AS DOUBLE
END TYPE

DIM Time AS t_Time
DIM i, Loop_Count AS LONG
DIM UseVSYNC AS BOOLEAN = %FALSE

hWnd = TBGL_CreateWindowEx("Benchmark", 640, 480, 32, %TBGL_WS_WINDOWED OR %TBGL_WS_DONTSIZE OR %TBGL_WS_CLOSEBOX)
TBGL_ShowWindow

FUNCTION TBMAIN()
SetupScene()
Loop_Count = 60 ' 30 to 30000

Loop_Count = MIN(MAX(Loop_Count, 30), 30000)
TBGL_SetWindowTitle(hWnd, "Benchmark target: " & STR$(Loop_Count) & " LOOPS")

' -- Just a generic call to initialize i with some value, and the first call to the timer.
i = INT(HiResTimer_Get)

SetupTimes()

TBGL_ClearFrame
TBGL_ResetMatrix
TBGL_DrawFrame
DoEvents
Sleep(50)

'DoTesting()

WHILE TBGL_IsWindow(hWnd)
DoEvents
Sleep(100)
WEND
TBGL_DestroyWindow
STOP
END FUNCTION

SUB SetupScene()
TBGL_M15InitModelBuffers 1, 50000
TBGL_M15LoadModel "Models\bm_48000.m15", "Textures\", 1, 0, %TBGL_NORMAL_PRECISE

TBGL_GetAsyncKeyState(-1)

TBGL_SetupFog(0,0,0,1,1)
TBGL_UseFog %FALSE
TBGL_UseVSYNC UseVSYNC
TBGL_BackColor(0,0,0)
TBGL_UseBlend %FALSE
TBGL_UseDepthMask %TRUE
TBGL_UseTexturing %TRUE
TBGL_SetDrawDistance(200)

TBGL_UseLighting %TRUE
TBGL_UseLightsource(%GL_LIGHT0, %TRUE)
TBGL_SetLightParameter(%GL_LIGHT0, %TBGL_LIGHT_AMBIENT, 0, 0, 0, 0)
TBGL_SetLightParameter(%GL_LIGHT0, %TBGL_LIGHT_DIFFUSE, 0.75, 0.75, 0.75, 0)
TBGL_SetLightParameter(%GL_LIGHT0, %TBGL_LIGHT_SPECULAR, 0.75, 0.75, 0.75, 0)
TBGL_SetLightParameter(%GL_LIGHT0, %TBGL_LIGHT_POSITION, 70, 70, 70, 1)
TBGL_UseLightSource(%GL_LIGHT1, %FALSE)
TBGL_UseLightSource(%GL_LIGHT2, %FALSE)
TBGL_UseLightSource(%GL_LIGHT3, %FALSE)
TBGL_UseLightSource(%GL_LIGHT4, %FALSE)
TBGL_UseLightSource(%GL_LIGHT5, %FALSE)
TBGL_UseLightSource(%GL_LIGHT6, %FALSE)
TBGL_UseLightSource(%GL_LIGHT7, %FALSE)
END SUB

SUB SetupTimes()
TBGL_UseVSYNC %FALSE
TBGL_SetWindowTitle(hWnd, "ForNext Test")
DoEvents
Sleep(50)

' -- Dead loop to measure speed of ForNext
Time.Old = HiResTimer_Get
FOR i = 1 TO 10000000
' -- Nothing here
NEXT
Time.New = HiResTimer_Get
Time.ForNext = (Time.New-Time.Old)/10000000 ' -- Average time for 1 iteration, from 10,000,000 iterations

TBGL_SetWindowTitle(hWnd, "DrawFrame Test")
DoEvents
Sleep(50)

' -- Dead loop to measure speed of DrawFrame
Time.Old = HiResTimer_Get
FOR i = 1 TO 2000
TBGL_DrawFrame
NEXT
Time.New = HiResTimer_Get
Time.DrawFrame = (Time.New-Time.Old)/2000

TBGL_SetWindowTitle(hWnd, "ClearFrame Test")
TBGL_ClearFrame
TBGL_ResetMatrix
DoEvents
Sleep(50)

' -- Dead loop to measure speed of ClearFrame and DrawFrame
' -- Clear needs something to erase, so the time of both, minus DrawFrame time is the ClearFrame time.
Time.Old = HiResTimer_Get
FOR i = 1 TO 2000
TBGL_ClearFrame
TBGL_DrawFrame
NEXT
Time.New = HiResTimer_Get
Time.ClearFrame = ((Time.New-Time.Old)/2000) - Time.DrawFrame

TBGL_SetWindowTitle(hWnd, "Camera Test")
TBGL_ResetMatrix
DoEvents
Sleep(50)

' -- Dead loop to measure speed of Camera
Time.Old = HiResTimer_Get
FOR i = 1 TO 1000000
TBGL_Camera (70,70,70,0,-20,-20)
NEXT
Time.New = HiResTimer_Get
Time.Camera = (Time.New-Time.Old)/1000000

TBGL_SetWindowTitle(hWnd, "Rotate Test")
TBGL_ResetMatrix
DoEvents
Sleep(50)

' -- Dead loop to measure speed of Rotate
Time.Old = HiResTimer_Get
FOR i = 1 TO 2000000
TBGL_Rotate (1, 1, 0, 0)
NEXT
Time.New = HiResTimer_Get
Time.Rotate = (Time.New-Time.Old)/2000000

TBGL_SetWindowTitle(hWnd, "M15DrawModel Test")
SetupModel(1) ' -- Set something to draw (1 box) For following tests
TBGL_ResetMatrix
DoEvents
Sleep(50)

' -- Dead loop to measure speed of M15DrawModel
Time.Old = HiResTimer_Get
FOR i = 1 TO 200000
TBGL_M15DrawModel 1
NEXT
Time.New = HiResTimer_Get
Time.M15DrawModel = (Time.New-Time.Old)/200000

TBGL_SetWindowTitle(hWnd, "CallList Test")
TBGL_ResetMatrix
DoEvents
Sleep(50)

' -- Dead loop to measure speed of CallList
Time.Old = HiResTimer_Get
FOR i = 1 TO 200000
TBGL_CallList 1
NEXT
Time.New = HiResTimer_Get
Time.CallList = (Time.New-Time.Old)/200000

TBGL_SetWindowTitle(hWnd, "Total Test")
TBGL_ResetMatrix
DoEvents
Sleep(50)

' -- Total test @ 120 frames
Time.Old = HiResTimer_Get
FOR i = 1 TO 120
TBGL_ClearFrame
TBGL_Camera (70,70,70,0,-20,-20)
TBGL_Rotate (1, 1, 0, 0)
TBGL_M15DrawModel 1
'TBGL_CallList 1
TBGL_DrawFrame
NEXT
Time.New = HiResTimer_Get
Time.Total = (Time.New-Time.Old)/120
' -- Check compared to the total of all used items...
' -- ForNext
' -- ClearFrame
' -- Camera
' -- Rotate
' -- M15DrawModel
' -- DrawFrame
' -- ============
' -- TOTAL TIME

MSGBOX(0, _
"Time.ForNext = " & FORMAT$(Time.ForNext, "#0.0000") & CRLF & _
"Time.ClearFrame = " & FORMAT$(Time.ClearFrame, "#0.0000") & CRLF & _
"Time.Camera = " & FORMAT$(Time.Camera, "#0.0000") & CRLF & _
"Time.Rotate = " & FORMAT$(Time.Rotate, "#0.0000") & CRLF & _
"Time.M15DrawModel = " & FORMAT$(Time.M15DrawModel, "#0.0000") & CRLF & _
"Time.CallList = " & FORMAT$(Time.CallList, "#0.0000") & " * (Not in total)" & CRLF & _
"Time.DrawFrame = " & FORMAT$(Time.DrawFrame, "#0.0000") & CRLF & _
"==================================" & CRLF & _
"Time.Total = " & FORMAT$(Time.Total, "#0.0000") & CRLF & _
"==================================" & CRLF & _
"Real Total = " & FORMAT$(Time.ForNext + Time.ClearFrame + Time.Camera + Time.Rotate + Time.M15DrawModel + Time.DrawFrame, "#0.0000"))
END SUB

FUNCTION SetupModel(Boxes AS LONG) AS BOOLEAN
Function = %FALSE
' -- This control is to keep all the OBJECT tricks in one location.
' -- 48000 is the number of vertices, which is about 24 per box, at 2000 boxes.
' -- Stop an EOF filling situation. Trying to fill a TBGL value with nothing.
IF Boxes <= 2000 THEN
TBGL_M15SetModelVertexCount(1, MIN(Boxes*24,48000))
TBGL_DeleteList 1
TBGL_NewList 1
TBGL_M15DrawModel 1
TBGL_EndList
Function = %TRUE
END IF
END FUNCTION

SUB RunTest()
' -- Removed due to unfinished portions
END SUB

Michael Clease
09-11-2008, 01:15
Short reply then I'm done with this topic.



Ok, then... Guess you are right, computers/video-cards don't talk to the monitor... (Then I guess you are not running at 75Mhz, if they are not talking. Do you set that from the button on the back of the monitor, like they did in the old days, or do you use a windows setting to tell your monitor how fast to redraw one whole screen? Sorry, but that comment you posted could have been sent in a PM, or another Karma, telling me I am stupid again. Don't hate the messenger, I didn't invent this junk.)
Sorry, I may have misled you slightly: monitors do talk to gfx cards, it's called DDC, but it's not what you are describing. No, I am not running at 75Mhz; that would need a video bandwidth of about 1.71 GHz and a line rate of about 600 kHz, and that technology is not available yet. As for your other comment, I was only correcting your incorrect information.

Last post which shows what I said, with less detail, and more words...
(Titled, "VSync and why people loath it.)
http://www.hardforum.com/showthread.php?t=928593
Very confused guy talking about FPS, which has nothing to do with v-sync.

Petr Schreiber
09-11-2008, 01:21
Hi Jason,

good start.
Just a few details:
- TBGL_ResetKeyState() is preferred over TBGL_GetAsyncKeyState(-1)
- there is no point calling TBGL_ResetMatrix right after TBGL_ClearFrame, as TBGL_ClearFrame does it on its own
- I would recommend mentioning the units in which the time intervals are measured.
- you can use LONG instead of BOOLEAN for TBGL purposes

I am unsure about measuring the performance of a single command in many loops.
The difference in speed between:



FOR i = 1 TO 2000000
  TBGL_Rotate (1, 1, 0, 0)
NEXT

and

FOR i = 1 TO 2000000/5
  TBGL_Rotate (1, 1, 0, 0)
  TBGL_Rotate (1, 1, 0, 0)
  TBGL_Rotate (1, 1, 0, 0)
  TBGL_Rotate (1, 1, 0, 0)
  TBGL_Rotate (1, 1, 0, 0)
NEXT


... is 20% on my PC.
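
If the gap between the two loops is mostly FOR/NEXT overhead, the two timings can be used to separate the cost of the call from the cost of the loop. The Python sketch below (not thinBasic) assumes a simple linear model: the rolled time per call equals call plus loop, and the unrolled time per call equals call plus loop divided by 5. The function name and the timings are hypothetical, chosen so that the unrolled version costs 20% less per call.

```python
def split_loop_overhead(rolled_per_call, unrolled_per_call, unroll=5):
    """Solve  rolled = call + loop  and  unrolled = call + loop / unroll
    for (call, loop). Inputs are average times per call in any one unit."""
    loop = (rolled_per_call - unrolled_per_call) / (1.0 - 1.0 / unroll)
    call = rolled_per_call - loop
    return call, loop


# Hypothetical timings in microseconds per TBGL_Rotate call.
call, loop = split_loop_overhead(1.00, 0.80)
print(f"call ~ {call:.2f} us, loop overhead ~ {loop:.2f} us per iteration")
```

Caching and interpreter effects would not follow this model exactly, so the split is only a first estimate.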

Also ( maybe just on my PC ), the clear frame test ( calling both TBGL_ClearFrame and TBGL_DrawFrame ) is 1.5x faster than the TBGL_DrawFrame test ( using just TBGL_DrawFrame )... that is a bit suspicious :)

Well, it is hard to write a benchmark, but we will figure it out :)

Where did you get the info on NVIDIA having trouble with V-Sync? I think ATi has this feature in drivers as well.


Petr

ISAWHIM
09-11-2008, 07:58
http://www.interfacebus.com/Design_SVGA_PinOuts.html
http://pinouts.ru/Video/VGAVesaDdc_pinout.shtml

Oops, I said Mhz... I was trying to say Hz.
Frames-Per-Second has nothing to do with the V-Sync that is controlling/drawing the frames being drawn per second? I have no problem with you saying that something is not correct, (Or not entirely correct.), but you have not stated anything, you just quoted my whole post and said it made you pee yourself.

Correcting what wrong information?
"I was only correcting your incorrect information."

Sorry, I will not say anything more about unrelated postings of monitor repairs and peeing. (I still talk to my monitor...with code. Even if no-one believes that it talks back to me! LOL.)

###########################################

Petr,

The units will not be seen by the user... not those ones...

The call with both Clear and Draw is there to compensate for a glitch... calling ClearFrame over and over does nothing, it has nothing to clear. You have to add DrawFrame, and then you can ClearFrame... (You have to subtract the DrawFrame time from that value to get the ClearFrame speed.) For me... the Draw value is 951 and the Clear is 531 (estimated). The total for both was 1482.

It is like REDIM on an empty variable, takes 0.000000001 seconds, but on a variable with a value, it takes 0.156433 seconds. (Just an example.)

There will rarely be a (ClearFrame, ClearFrame) situation, but there will always be a (DrawFrame, ClearFrame). I know that seems reversed, but something has to be written before it can be erased. There may, however, be multiple DrawFrame calls before a ClearFrame.

Those values are in millionths of a second.
DrawFrame = 951/1000000 = 0.000951 seconds per call
ClearFrame = 531/1000000 = 0.000531 seconds per call

60 FPS = 1/60 = 0.016667 seconds per frame
(1 frame of 0.016667 seconds) / (DrawFrame of 0.000951 seconds) = 17.525762 Draws per 1/60th of a second.
17.525762 * 60 = 1051.54572 (Potential frames per second. {Processed not displayed.})

Of the 1051 frames, only 60 fully processed images will be displayed, which leaves 991, or 94.29%, of the processed data completely unused: created, then erased before it could be displayed.
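
The arithmetic above can be re-checked with a short sketch. Python is used here only as a neutral calculator (the thread's own code is thinBasic); the 951 and 531 microsecond figures are the ones quoted in this post, and the function and variable names are made up for illustration. Dividing one second directly by the call time gives about 1051.5, marginally below the 1051.54572 above, which comes from rounding 1/60 to 0.016667 first.

```python
# Re-check of the frame budget arithmetic quoted in the post.
# Per-call times are the poster's own measurements, in microseconds.
DRAW_FRAME_US = 951
CLEAR_FRAME_US = 531
REFRESH_HZ = 60  # frames the monitor can actually show per second

def potential_fps(call_us):
    """Frames per second if every frame cost only `call_us` microseconds."""
    return 1_000_000 / call_us

fps = potential_fps(DRAW_FRAME_US)    # roughly 1051.5
unused = int(fps) - REFRESH_HZ        # 1051 - 60 = 991 frames never shown
unused_share = unused / int(fps)      # roughly 0.9429

print(f"potential fps: {fps:.1f}")
print(f"unused frames: {unused} ({unused_share:.2%})")
print(f"draw + clear per frame: {DRAW_FRAME_US + CLEAR_FRAME_US} us")
```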

About the nVidia thing... Sorry, it looks like it is only a temporary issue for nVidia... The only people complaining about it are the ones trying to turn it off. They can turn it off, but it does not actually turn-off (Override). (The games can turn it on and off, if they have coded it.)

Point taken... About the matrix thing. I was using matrix reset, then later added the clear screen. I will also change the key-state thing... (I just copied that over from your code. Not sure I even need it.)

The problem with...
FOR i = 1 TO 20000/5
  Call 1
  Call 2
  Call 3
  Call 4
  Call 5
NEXT

The gain is not desired for benchmarking... (Not at this portion of the benchmark.)

You just divided the FOR/NEXT time by five. The result is still the same. It is 20% faster, but only because of the time you gained from the 4/5 of the loop iterations that are missing; the CALL itself is not any faster. The problem with fewer loops is that the numbers become unreliable.
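
One way to read the 20% figure is through a simple linear cost model. This model is an assumption made here for illustration, not something measured in the thread: every call costs a fixed amount, every FOR/NEXT iteration costs a fixed overhead, and "20%" is taken to mean 20% less total time. Under those assumptions (sketched in Python as a neutral calculator, with hypothetical function names), unrolling by five and saving 20% would imply a loop overhead of about one third of the cost of a single call.

```python
def total_time(calls, call_cost, loop_cost, unroll=1):
    """Hypothetical linear model: each call costs `call_cost`,
    each FOR/NEXT iteration costs `loop_cost`."""
    iterations = calls / unroll
    return calls * call_cost + iterations * loop_cost

def loop_cost_from_saving(call_cost, saving, unroll):
    """Solve the model for loop_cost given the observed relative saving.

    saving = 1 - (c + l/u) / (c + l)   =>   l = saving * c / (1 - saving - 1/u)
    """
    return saving * call_cost / (1 - saving - 1 / unroll)

call_cost = 1.0  # arbitrary unit; only the ratio matters
loop_cost = loop_cost_from_saving(call_cost, saving=0.20, unroll=5)

plain = total_time(2_000_000, call_cost, loop_cost)
unrolled = total_time(2_000_000, call_cost, loop_cost, unroll=5)

print(f"loop overhead / call cost: {loop_cost / call_cost:.3f}")
print(f"unrolled / plain time:     {unrolled / plain:.3f}")
```

In this model the per-call cost is identical in both variants, which matches the point made above: the gain comes entirely from the skipped loop iterations.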

There is nothing wrong with the math; it has to do with the GetTickCount and HiResTimer_Get delay/lag, plus the ready-state that is used when the CPU has computed the answer before you asked for it. I am talking about interference from the CPU's branch prediction. It can't predict a GetTime result, but it does predict that you will use FOR i = 1 TO 100, call 1, call 2, call 3... after the NEXT statement. Those results are just sitting around waiting for you to push them over the edge. In long calls there is no wait-state, and the prediction tree has been saturated beyond its prediction ability, or reduced to no logical prediction results.

You also want to measure each loop at this point, because that is how a game would be programmed. That value is also a KNOWN value that can be added or subtracted.

You would not have five lines of the same code, because that would be redundant. Remember, this is a benchmark, and you need as many realistic constants as possible. You want to have isolated code which is a "Control Group", or you have nothing to compare with. This is the first set of isolated tests, which will determine how the real tests run. These are not the actual tests. These values are used to remove erroneous times from the actual tests... If a test has three ForNext loops, you subtract the time that it takes a ForNext loop to process, and you are left with the remaining time that is not a ForNext loop.
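
The "Control Group" idea can be sketched as follows. This is a minimal illustration in Python, not the benchmark's actual code: `time_loop`, `net_call_time` and `dummy_call` are hypothetical names, and `dummy_call` merely stands in for whatever command is being measured.

```python
import time

def time_loop(iterations, body=None):
    """Time `iterations` passes of a loop; with body=None this is the
    empty control loop that measures loop overhead alone."""
    start = time.perf_counter()
    if body is None:
        for _ in range(iterations):
            pass
    else:
        for _ in range(iterations):
            body()
    return time.perf_counter() - start

def net_call_time(iterations, body):
    """Per-call time with the control-loop time subtracted."""
    control = time_loop(iterations)
    measured = time_loop(iterations, body)
    # Timer noise can make the difference slightly negative; clamp at zero.
    return max(measured - control, 0.0) / iterations

def dummy_call():
    """Placeholder workload standing in for the command under test."""
    sum(range(50))

per_call = net_call_time(100_000, dummy_call)
print(f"net time per call: {per_call * 1_000_000:.3f} microseconds")
```

With too few iterations the subtraction is dominated by timer resolution, which is the unreliability mentioned above.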

For the optimized testing, where YOU/WE actually test YOUR/OUR code... or we set the multi-parameter test scenarios, this would be ideal. This type of code structure will show the potential gain of that specific setup.

I will post again soon.

I don't have the ability, at the moment, to create a model similar to the one you have for testing. (I was contemplating using code to spin a spiral of triangles, and save the values as an M15... But for now, your model works fine.)

Petr Schreiber
09-11-2008, 11:49
Hi Jason,

I am not sure this statement is correct:


calling ClearFrame over and over does nothing, it has nothing to clear


TBGL_ClearFrame by default:
- resets color buffer, does not matter whether screen is empty or not, this is getting slower with growing resolution
- resets Z-Buffer information, this is getting slower with growing resolution
- sets matrix transformation to default identity matrix 4x4 ( no rotation, no translation, scaling = 1 ).

I do not know if it is noticeable on all cards, but my old Radeon 7000 kept the last rendered frame ( sometimes corrupted ) until it was redrawn, so when I ran a new program, the first frame rendered was something that should not be there :)

Clearing the color buffer is quite expensive on low-end cards, which is why old games tried to cover the whole screen somehow, so that clearing the color buffer was not needed => all fragments were redrawn anyway.

Must go now, I will not be here for the rest of the day I guess.


Thanks,
Petr

P.S. The testing model is a bit evil - to demonstrate worst cases, it has a different texture for each quad, it is like 1,2,1,2 ... you can imagine the driver can go crazy from this state change. For more real-life tests, we could test single-texture models ( it is good practice to use 1 texture per model ).

ISAWHIM
09-11-2008, 12:28
I figured there were other things going on behind that call, but nothing as demanding (on this card) as the call when it follows a DrawFrame. (Value without a DrawFrame was like 0.00000013 seconds, but following a DrawFrame, it is like 500.)

To clarify... (What I was trying to say.) :-\
{No DrawFrame ever called Prior to this...}
ClearFrame
ClearFrame
ClearFrame
ClearFrame
ClearFrame
ClearFrame
... {1,000,000 loops}

(Called over and over and over, with nothing done between each ClearFrame, and with no DrawFrame called yet, it does nothing. The key point was that DrawFrame had not been called yet. I left both in the loop test because the results, even with one single DrawFrame call before the million ClearFrame calls, gave false readings. Again, I am sure this is related to video-card feedback: returning the call before it has actually finished processing. It turned the 99.8% accuracy into 85.0%, which is unacceptable time-demo accuracy.)

I think there is a bug in the M15 also... (Not sure, wanted you to look at it first, before I post.) ???

This runs from the same benchmark folder. (Up arrow will load +1 vertex, Down arrow -1)
Shows the vertex loading pattern...

0 = nothing
1 = One whole surface drawn (From one point of data?)
2 = nothing
3 = nothing
4 = nothing
5 = nothing
6 = New side drawn (Skipped 4 points)
7 = nothing
8 = nothing
9 = nothing
10 = New side drawn (Expected 3 point skip)
...
14 = New (Expected 3 point skip)
...
18 = New (...)


USES "TBGL"

DIM hWnd AS DWORD
DIM i, CountValue AS LONG

hWnd = TBGL_CreateWindowEx("Object Test", 640, 480, 32, %TBGL_WS_WINDOWED OR %TBGL_WS_DONTSIZE OR %TBGL_WS_CLOSEBOX)
TBGL_ShowWindow

FUNCTION TBMAIN()
  SetupScene()

  WHILE TBGL_IsWindow(hWnd)
    ' Up arrow: show one more vertex, clamped to the model size
    IF TBGL_GetWindowKeyState(hWnd, %VK_UP) THEN
      CountValue+=1
      IF CountValue <= 48000 THEN
        SetupModel(CountValue)
      ELSE
        CountValue = 48000
      END IF
    END IF
    ' Down arrow: show one vertex less, clamped to zero
    IF TBGL_GetWindowKeyState(hWnd, %VK_DOWN) THEN
      CountValue-=1
      IF CountValue >= 0 THEN
        SetupModel(CountValue)
      ELSE
        CountValue = 0
      END IF
    END IF

    DoEvents
    Sleep(100)
  WEND
  DoEvents
  Sleep(10)
  ' Destroy the window before STOP, otherwise this line is never reached
  TBGL_DestroyWindow
  STOP
END FUNCTION

SUB SetupScene()
  TBGL_M15InitModelBuffers 1, 50000
  TBGL_M15LoadModel "Models\bm_48000.m15", "Textures\", 1, 0, %TBGL_NORMAL_PRECISE

  TBGL_ResetKeyState()

  TBGL_UseVSYNC %FALSE
  TBGL_BackColor(0,0,0)
  TBGL_UseBlend %FALSE
  TBGL_UseDepthMask %TRUE
  TBGL_UseTexturing %FALSE
  TBGL_SetDrawDistance(200)

  TBGL_UseLighting %FALSE
  TBGL_UseLightsource(%GL_LIGHT0, %FALSE)
END SUB

FUNCTION SetupModel(Vertex AS LONG) AS BOOLEAN
  Function = %FALSE
  IF Vertex <= 48000 THEN
    IF Vertex >= 0 THEN
      TBGL_M15SetModelVertexCount(1, Vertex)
      TBGL_ClearFrame
      TBGL_Camera 10,10,10,-30,-30,-60
      TBGL_Rotate 180,1,0,0
      TBGL_M15DrawModel 1
      TBGL_DrawFrame
      TBGL_SetWindowTitle(hWnd, "(Count Value: " & STR$(CountValue) & ")")
      Function = %TRUE
    END IF
  END IF
END FUNCTION

ISAWHIM
10-11-2008, 03:08
Hi Jason,

I am not sure this statement is correct:


calling ClearFrame over and over does nothing, it has nothing to clear


TBGL_ClearFrame by default:
- resets color buffer, does not matter whether screen is empty or not, this is getting slower with growing resolution
- resets Z-Buffer information, this is getting slower with growing resolution
- sets matrix transformation to default identity matrix 4x4 ( no rotation, no translation, scaling = 1 ).

- Reset color buffer (No draw-frame = Should be no buffer allocated to erase, or it is empty if nothing has been drawn.)
- Resets Z-Buffer information (No draw-frame = Should be no z-order information on "0" objects.)
- Sets matrix transformation to default identity matrix 4x4 (If it is not set, should already be default 4x4 and =1)

This is also why I do resetting calls between each test... to ensure they are on a level playing field... (If you see any other items that I/WE would have to "Level the playing field" between individual tests... Don't hesitate to let me know. I am not sure how you programmed the "Things behind the curtain", you are the guy from OZ... I am just the Tin-Man.)

Though, I understand that there MAY be something left over from another device due to bad programming or failed clean-up calls, prior to this call here. (That ClearFrame would normally only be called once, and would give the same result as calling ClearFrame after your own DrawFrame. But to know how SLOW it may be, I have to ensure there is something there on every call. One card may use "Acceleration", see that nothing has been written, so no cleaning needs to be done, and pass back the useless call fast, while a dumb card would keep cleaning nothing over and over.)

One group of faulty cards "Errata", would/should not impact the results of every other correct working device. EG, if all those cards have that issue, the results will be the same for all of them. Which is what this testing method is designed to handle. For the cards where this is not an issue... the value of the speed-benchmark here, will point out that gain or loss.