Monday, April 03, 2006
RAM access bandwidth test for 3D in nVidia driver
I promised to tell you about the 'delta speed test' I did with the 3D driver, to get a better idea of how fast the 3D part of the acceleration engine can actually access RAM. I found this interesting because the 3D driver was rather slow compared to its Windows counterpart.
So how did I test?
It was rather simple, really. In my previous post I already 'calculated' some numbers for the RAM bandwidth, and for the bandwidth the CRTC needs to fetch the data that shows us the memory content on the monitor. So I thought: if I can ascertain how many fps are gained by stopping CRTC accesses to the memory, I also know how many fps are theoretically feasible if the engine could use the complete bandwidth.
In a formula
(total RAM bandwidth / bandwidth needed for CRTC access) * (fps without CRTC accessing memory - fps with CRTC accessing memory) = nominal fps rate possible.
The setup
Just modify the 2D driver to enter DPMS sleep mode as soon as the cursor is turned off, and to enter DPMS on mode when the cursor is turned back on. DPMS sleep mode in fact puts the CRTC in a 'reset' state, since it's no longer required to fetch any data: we won't be looking at it anyway (the monitor is shut off). So this DPMS sleep state gives the memory bandwidth otherwise used by the CRTC back to the 3D engine.
Since:
- you can start Quake2 from a terminal 'command line', including instructing it to do a timedemo test,
- you see the fps results back on the command line after quitting the game, and
- starting Quake2 turns off the driver's hardware cursor, while stopping it turns it back on,
this setup will work nicely.
Result for the Geforce4MX, NV18
(6000 / 225) * (29.9 - 26.3) = 96 fps. (Windows measured value = 119 fps)
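As a quick check of the arithmetic, here is a minimal Python sketch of the formula above. The function and parameter names are purely illustrative (they don't come from the driver), and the two bandwidth figures are simply assumed to be in the same unit, since only their ratio matters.

```python
def nominal_fps(total_bw, crtc_bw, fps_crtc_off, fps_crtc_on):
    """Extrapolate the fps gained by freeing the CRTC's share of the
    RAM bandwidth to the complete RAM bandwidth.

    total_bw and crtc_bw must use the same unit; only the ratio is used.
    """
    if crtc_bw <= 0:
        raise ValueError("CRTC bandwidth must be positive")
    return (total_bw / crtc_bw) * (fps_crtc_off - fps_crtc_on)


# Numbers from the GeForce4 MX (NV18) measurement in this post.
estimate = nominal_fps(total_bw=6000, crtc_bw=225,
                       fps_crtc_off=29.9, fps_crtc_on=26.3)
windows_fps = 119  # measured value under Windows, as quoted in this post

print(f"nominal fps: {estimate:.0f}")
print(f"share of the Windows figure: {estimate / windows_fps:.0%}")
```

Keep in mind that a delta of only 3.6 fps gets multiplied by roughly 27 here, so small measurement errors in the two timedemo runs are magnified by the same factor.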
Well, for my taste this is proof enough that RAM access is actually OK, and that the fault is not low-clocked RAM or something like that. So the reason for the low fps on BeOS has to be found somewhere else.
Conclusion
Why is the delta speed in the CRTC test actually OK, while the total rendering speed is much too low? Well, it's interesting to realize that CRTC accesses are spread evenly through time, while the 3D engine's memory access requests come in bursts.
The conclusion would be that there's some bottleneck in the GPU somehow after all... And with this new knowledge I went to sleep, not knowing yet what to make of this new information.
Comments:
Good to see the performance for RAM access is OK. Keep up the good work, and I hope you can find out how to fix the GPU bottleneck issue.
Hi,
Thanks for the compliment :-)
I can now inform you that I'll release 2D driver version 0.79 asap. This version adds 2D support for some 20 new cards or so, and also increases 3D performance on GeForce cards by a factor of 1.4 - 2.0, depending on the exact card:
GeForce 2Ti is now the fastest supported card with 45.4 fps in 1024x768x32 @ 75Hz for Q1 timedemo1 on the P4 2.8GHz system. With 2D driver 0.74 that same card runs at 23.3 fps...
In other words: yes! Seems I solved it, at least partially. :-)
I can't tell you how happy I am with this..