DarkBASIC Professional Discussion / Need For Speed. Engine speed. Where should entity positions be stored, accessed and calculated from to boost DBPRO speed?

Chris Tate
DBPro Master
15
Years of Service
User Offline
Joined: 29th Aug 2008
Location: London, England
Posted: 28th Nov 2011 03:43 Edited at: 28th Nov 2011 21:43
Hi Folks, this article I am working on is a WIP so I need your input:

Quote: "The Need for Speed. A test of how best to work with object positions
I have been looking into how best to use DBPRO positions to boost the performance of my game later on as its scale increases. It is not always about speed; sometimes it is more important to focus on convenience, but these questions and statistics are based on performance interests. There are other areas of code that slow things down, but usually with games, objects and their positions are the main focus.

How do you store your positions for your 3d objects, sprites and entities?

Some of you who are fortunate enough to have completed a medium to large game with DBPRO will already have the experience to say what kind of measures to use for calculating and manipulating positions. If not, then how are you doing it?

I made an attempt to create a test program to monitor the performance of position storage functions, and the results are quite interesting. The statistics come out the way they do only because it is a small program with no conflicting issues between plugins, content, user input and code; but is it not accurate enough?

The program tests the following float3 storage locations and performs a basic calculation on them, along with creation of the memory.

--------------------------------------------------------
UDT array
3 arrays (x,y,z)
Memblock
Memory Bank (Matrix1)
Mike Net (Dark Net) Packet
Vector
3D Object
3D Limb
Sprite (2D)"


------------------------------------------------------
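As a rough illustration of the first two layouts in that list, here is a minimal sketch in Python (not DBPRO code, and not the test program itself). The names `Position`, `move_records`, `move_parallel` and `time_ms` are made up for the example, and the timings it prints will not match DBPRO's; it only shows the shape of the comparison: the same calculation run against one record per entity and against separate x, y, z arrays.

```python
import time
from dataclasses import dataclass


@dataclass
class Position:
    """One record per entity, like a UDT with x, y and z fields."""
    x: float
    y: float
    z: float


def move_records(records, dx):
    """Shift every record along X (the 'UDT array' layout)."""
    for p in records:
        p.x += dx


def move_parallel(xs, dx):
    """Shift every X value (the 'three arrays' layout; ys and zs are untouched)."""
    for i in range(len(xs)):
        xs[i] += dx


def time_ms(func, *args):
    """Return how many milliseconds one call to func(*args) takes."""
    start = time.perf_counter()
    func(*args)
    return (time.perf_counter() - start) * 1000.0


count = 10000
records = [Position(float(i), 0.0, 0.0) for i in range(count)]
xs = [float(i) for i in range(count)]
ys = [0.0] * count
zs = [0.0] * count

print("records :", time_ms(move_records, records, 1.5), "ms")
print("parallel:", time_ms(move_parallel, xs, 1.5), "ms")
```

Both layouts end up holding the same values; only where they live differs, which is the thing being timed.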

View the article...

Feedback is desired as I can only test on one machine and the article is currently heavily opinionated.

Run the test (Matrix1 & DarkNet 2 required), and press spacebar to pause just at the desired test interval. (Send a screenshot if you would like to help support the research):

UDTs, variables and arrays (when not used with Add To Stack, inserts etc.) were created more quickly and performed calculations faster than the others. Second were memblocks and banks, with memblocks providing more speed and banks providing more quantity. Third were network packets and object/camera/sprite expressions. Last by a long shot were vectors and array/stack/queue inserts.

Edit: Added camera read and write: the result shows that camera writing [Position Camera N, 0, 0, 0] takes up to 20 times longer than variable writing, so it should be used as few times as possible.



Edit: Array Loop demonstration. A count variable is faster than the Array Count function. As expected, the loop that calls Array Count is twice as slow as the loop which uses a variable.
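The idea can be sketched as follows, in Python rather than DBPRO, with hypothetical function names. The first loop asks for the array size on every pass; the second reads it once into a variable. This only applies when the array does not change size inside the loop.

```python
def sum_with_count_call(values):
    """Ask for the array size on every pass of the loop."""
    total = 0
    i = 0
    while i < len(values):      # size is queried each iteration
        total += values[i]
        i += 1
    return total


def sum_with_cached_count(values):
    """Read the size once into a variable and loop against that."""
    total = 0
    count = len(values)         # size is queried once
    i = 0
    while i < count:
        total += values[i]
        i += 1
    return total


values = list(range(1000))
print(sum_with_count_call(values), sum_with_cached_count(values))
```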



Edit 2: Datatype calculation loops - Integers and floats are best on x86; faster than booleans, bytes, words and DWords. Using integers and floats in the same calculation took twice as long. Integers were 10-20% faster than floats.



Constant Vs Global Vs Local variables: Global variables slowest. Constants and locals just as fast as each other.


Edit: Move Sprite / Move Camera / Move Object Vs NewXYZValue: Mostly 40 ticks for 10 looped operations; however, NewXYZValue operations on float variables were the fastest at 14 ticks, and limb NewXYZValue operations were the slowest at 60 ticks. Move Sprite/Camera/Object is only best when used once; otherwise variable NewXYZ operations are up to 3 times faster:


Curve Angle Vs Curve Value: Curve Angle 5-6 times slower, although it is best for angles. 1000 operations; Value 100 ticks, Angle 500 - 600 ticks.



Loops : We all need to work through series of positions, so what is the fastest loop method? With 10000 loops, the For loop took 100 - 155 ticks; the others all scored anywhere between 1000 and 5000 ticks, 10 to 50 times slower due to their update requirements and variable operations. A continuous For loop with variable operations took 500 - 600 ticks, 2 to 9 times faster than the other loop types.


Inc Vs Addition : Literal Vs Constant : Fastest of each 10000 additions/increments on the Athlon was the integer increased by a literal value or unspecified value (1) at 190 - 210 ticks. Second was an integer incremented by a constant integer at 220 ticks, consistently. Inc Variable increments were 50% faster than Variable + Variable additions. The fastest of the non-constant/literals was [Inc MyInteger, Increment] at 240 - 300 ticks. The slowest were the Float + Float variable, literal and constant additions, all at 450 - 500 ticks.
As stated, increasing a float with an integer or vice versa is slower than using the same datatype. On one occasion Increase Float by Integer scored 650 ticks and Increase Integer by Float scored 710 ticks. On another occasion they both consistently scored 700 - 740 ticks.



Instance Vs Clone Vs Make Object From Limb Vs Add Limb : All clone and instance commands run at the same speed, although there will be a difference in capabilities. Deleting cloned/instanced objects was 5 - 10 times faster than deleting regular objects. Deleting an object with 100 limbs was twice as fast as deleting 100 objects - 2 ms vs 4 ms. Make Object From Limb took 400 ms, slow compared to Add Limb and Append Limb at 120 ms. Deleting 100 limbs took 3 ms on the Athlon + NVIDIA GeForce 8400 computer.



Sync Objects Vs Sync Limbs | Move Limbs Vs Move Object : In the tests, whether there are 1000 limbs or objects in the view, the render time is the same; the result was 3-4 FPS. Moving 1000 objects was twice as fast as moving 1000 limbs individually. Moving 1000 limbs individually could not be compared to moving their parent object:



WLGfx
16
Years of Service
User Offline
Joined: 1st Nov 2007
Location: NW United Kingdom
Posted: 28th Nov 2011 05:15
Hi Chris.

In the game I've just released (with source code), I've mainly stored the external values of the x,y,z positions of objects as integers in an array as most of the time you can get away with doing most of the calculations on integers as opposed to floats, which is quicker. However some things do require float precision. It's just a speed boost I've used at the time.

Mental arithmetic? Me? (That's for computers) I can't subtract a fart from a plate of beans!
Warning! May contain Nuts!
Chris Tate
DBPro Master
15
Years of Service
User Offline
Joined: 29th Aug 2008
Location: London, England
Posted: 28th Nov 2011 16:46 Edited at: 28th Nov 2011 17:02
Interesting point.

I took a brief moment to compare integer calculations and the results are interesting, although the test would not replicate your game engine. (It has been added to the original post.)

It shows so far that the integer binary operations are faster than the byte binary operations. I am guessing it is because of the sign, because all of the unsigned datatypes performed half as fast.

The float math calculations were 10% slower than the integer calculations. The difference is not too bad on my Athlon CPU, but it confirms your finding.

With the test, the smaller datatypes did not perform very fast compared to the medium (4 byte) datatypes. DWord and Integer maths run twice as fast as Word and Byte maths on my Athlon.

The double float/integer types were the slowest, and mixing floating point values with whole numbers in a calculation was slower.

So it may be worth sticking with floats if precision is needed and you work with 3D objects and plugins; even if working with rounded numbers like Number# = Time# * 10.0 rather than Number# = Time * 10

It is just how the infrastructure works really. So it is an issue of using the right datatypes and good program design.

To the main topic of positions, as illustrated in the article, Vectors are extremely slow, and UDTs and standard variables are the fastest on my machine.

On my CPU, expressions such as Object Position X( A ) + Object Position X( B ) or Sprite X( A ) + Sprite X( B ) or X Vector( A ) + X Vector( B ) are extremely slow compared to regular variables, UDT variables, memblocks and in some cases network packets. Limb expressions turned out to be slower than Object expressions, due to the querying I believe, although your environment can hold more limbs than objects.

In one instance, the Vector expression was 80 times slower than the regular variable expression.

Obviously Vectors use some kind of internal array system and are for use with the 3DMath functions, but many people use them for custom calculations. DBPRO arrays calculated 20 - 30 times faster. I'm not sure why this is, I do not know how deep the engine has to dig to obtain them.

As for Memblocks / Banks, the speed was similar; memblocks were always faster, but you can use more banks than memblocks. In some cases it was better to keep data in network packets than to extract them. Creating new packets all the time is a no-no; the MikeNet demo code shows you how to reuse the same packets.
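To picture what keeping positions in a memblock or bank involves, here is a minimal sketch in Python (not DBPRO or Matrix1 code). It treats the block as raw bytes and assumes each position is three 4-byte floats packed back to back, so entity N starts at byte N * 12. The helper names `make_block`, `write_position` and `read_position` are hypothetical.

```python
import struct

FLOAT3_SIZE = 12  # three 4-byte floats per position


def make_block(count):
    """Allocate a zeroed byte buffer big enough for `count` float3 positions."""
    return bytearray(count * FLOAT3_SIZE)


def write_position(block, index, x, y, z):
    """Pack x, y, z as little-endian 32-bit floats at the slot for `index`."""
    struct.pack_into("<3f", block, index * FLOAT3_SIZE, x, y, z)


def read_position(block, index):
    """Unpack the three floats stored at the slot for `index`."""
    return struct.unpack_from("<3f", block, index * FLOAT3_SIZE)


block = make_block(100)
write_position(block, 7, 1.0, 2.5, -4.0)
print(read_position(block, 7))
```

Every read and write goes through an offset calculation and a pack or unpack step, which is one plausible reason this kind of storage trails plain variables in the tests.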

You may not notice the speed difference in a small program, as your calculations may only use up 5 or 10 milliseconds; but for intensive programs with lots of entities and detailed terrains, a considerable number of frames could be lost if calculations are not performed efficiently. For 60 FPS you only have about 16 milliseconds to sync and sort out your data ready for the next sync. For 30 FPS you have more room, with up to 33 milliseconds to play with.
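The arithmetic behind those figures is simply 1000 milliseconds divided by the target frame rate. A small Python sketch (function names are made up for the example):

```python
def frame_budget_ms(fps):
    """Milliseconds available per frame at the given frame rate."""
    return 1000.0 / fps


def remaining_ms(fps, used_ms):
    """Time left in the frame after `used_ms` of work; negative means the frame is late."""
    return frame_budget_ms(fps) - used_ms


for fps in (60, 30):
    print(fps, "FPS ->", round(frame_budget_ms(fps), 2), "ms per frame")
```

So 10 milliseconds of position calculations at 60 FPS leaves under 7 milliseconds for everything else, including the sync itself.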

WLGfx
16
Years of Service
User Offline
Joined: 1st Nov 2007
Location: NW United Kingdom
Posted: 28th Nov 2011 17:09 Edited at: 28th Nov 2011 17:24
Also worth noting is that the x86 handles 4 byte integers much faster than words or bytes because of its design. A byte or a word in most cases has to be expanded to a 4 byte integer before a calculation can be done. At least this is true for a basic compiler such as DBPro.

Within very large programs and especially in loops it's always best to use local variables for the loop counters as opposed to global variables as the fetching and storing of the variable itself is faster using the CPU's cache. Most of the time it's not noticeable unless they are very large loops. Also counting down to 0 is usually quicker too, as a compare with 0 is quicker than a compare with a number.

EDIT: If using bytes or words for data storage in a program, then use a temp integer variable to do the calcs with and when finished store it back into the original variable. This will save on all the recasting the size into an integer.

Chris Tate
DBPro Master
15
Years of Service
User Offline
Joined: 29th Aug 2008
Location: London, England
Posted: 28th Nov 2011 18:33 Edited at: 28th Nov 2011 18:43
Quote: "Within very large programs and especially in loops it's always best to use local variables for the loop counters as opposed to global variables as the fetching and storing of the variable itself is faster using the CPU's cache."


Never thought about that actually; the CPU having a trip to make to obtain your global variable.

So as WLGfx advised, working on local variables is less CPU intensive. A test confirms this, as the global operation was twice as slow compared to the local. Every time I referenced a global, it took 75% to 100% more time, whether parameters were given or not. In line with your hint, returning a value was faster than setting a global value. Using expressions with constants was also just as fast.

One question that springs to mind, is why there is more speed when working with undeclared local variables in DBPRO? As shown in the test, declaring the local variables was 30% slower. Are we not making things easier on the engine by letting it know what variables exist during compile time?

Diggsey
17
Years of Service
User Offline
Joined: 24th Apr 2006
Location: On this web page.
Posted: 28th Nov 2011 20:24 Edited at: 28th Nov 2011 20:25
The reason unsigned types perform much slower is not so much to do with the processor (that difference is negligible) but because DBPro calls functions to convert from an unsigned type to an integer whenever you pass one to a function that expects integer arguments, even though it's not really necessary.

WLGfx
16
Years of Service
User Offline
Joined: 1st Nov 2007
Location: NW United Kingdom
Posted: 29th Nov 2011 00:53
If the compiler has to call a function to convert a UINT (unsigned integer) to an INT (integer), that is highly inefficient. Usually there is a standard optimisation in place for that. Even the GNU compiler accounts for type changes (signs included) if the same type holds 1, 2 or 4 bytes.

The declaring of local variables problem has really got me stumped as to why there's a difference at all. Maybe it's the compiler.

A lot of the above code from the first post I cannot compile as I don't have those libraries (ah plugins) but if I think of anything else I'll add it. But do speed test the reverse loop down to 0 though and let me know.

MrValentine
AGK Backer
13
Years of Service
User Offline
Joined: 5th Dec 2010
Playing: FFVII
Posted: 29th Nov 2011 10:54
Umm in which scenario will this benefit?

Is it just for saving game state?

Like at the end of a game or during?

If I got it all wrong... shoot me lol... or enlighten me with a little explanation...

And is this not a question of... how... not where?

Chris Tate
DBPro Master
15
Years of Service
User Offline
Joined: 29th Aug 2008
Location: London, England
Posted: 30th Nov 2011 00:40 Edited at: 30th Nov 2011 00:41
Quote: "A lot of the above code from the first post I cannot compile as I don't have those libraries"


Matrix1 and DarkNet.

Quote: "do speed test the reverse loop down to 0 though and let me know"


A For loop from 100,000 to 0 - Step -1 took 50-60% longer than 0 to 100,000 on my machine:
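For anyone without the plugins, the shape of that comparison can be reproduced in any language. Here is a minimal Python sketch with made-up function names; it runs the same body counting up and counting down and times each. The numbers it prints say nothing about DBPRO itself.

```python
import time


def sum_forward(limit):
    """Count up from 0 to limit inclusive."""
    total = 0
    for i in range(0, limit + 1):
        total += i
    return total


def sum_reverse(limit):
    """Count down from limit to 0 inclusive (a step of -1)."""
    total = 0
    for i in range(limit, -1, -1):
        total += i
    return total


def time_ms(func, *args):
    """Return how many milliseconds one call to func(*args) takes."""
    start = time.perf_counter()
    func(*args)
    return (time.perf_counter() - start) * 1000.0


print("forward:", time_ms(sum_forward, 100000), "ms")
print("reverse:", time_ms(sum_reverse, 100000), "ms")
```

Both loops do the same number of passes and produce the same total, so any gap in the timings comes from the loop direction alone.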



WLGfx
16
Years of Service
User Offline
Joined: 1st Nov 2007
Location: NW United Kingdom
Posted: 30th Nov 2011 03:14
Ah, that's interesting to note for DBPro when I use it in future. A simple reverse loop counting down to zero in C/C++ is only slightly faster but still an optimisation.

If I come across any others I'll certainly post them here. Thanks Chris...

