Dark GDK / Propose several new functions to "DarkGDK - Unofficial r114 update"

s_i
9
Years of Service
User Offline
Joined: 23rd May 2009
Location: Russia
Posted: 11th May 2014 16:47 Edited at: 12th May 2014 21:41
Hello! I propose several new functions for "DarkGDK - Unofficial r114 update". I use them successfully in my game and I think they could be useful to others. Maybe WickedX will agree to include them in the "DarkGDK - Unofficial r114 update" code.

My testbed: CPU Intel i5-4670 3.4 GHz, Windows 7 x64, Visual C++ 2008 Express Edition, DirectX SDK (August 2007).

1) dbFastSqrt() -- Square root.

Usage example:

With dbFastSqrt() built in the "Release" configuration I get this result: a call to dbFastSqrt() takes 21 to 26% of the time of a call to sqrt(), i.e. it is about 4 times faster.

You can choose either of the two variants. On my testbed both variants run at the same speed, but other people on other testbeds get the best speed from the second variant, so I personally use the second variant. (The second variant should be written in a .h file, not in a .cpp file, because it uses "__declspec (naked)".)

Update (12 May 2014): in Variant #1 I changed "double" to "float"; it works well and is as fast as with "double".
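A figure such as "21 to 26% of the time of sqrt()" comes from timing many calls of each routine and dividing the two totals. The sketch below shows one portable way to do that with std::chrono. It is not the author's test program: timeCalls, sqrtTimeRatio and candidateSqrt are hypothetical names, and candidateSqrt simply forwards to std::sqrt as a placeholder for the real dbFastSqrt().

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

// Hypothetical stand-in for the routine under test; swap in the real dbFastSqrt().
static float candidateSqrt(float x) { return std::sqrt(x); }

// Times `iterations` calls of fn and returns the elapsed time in seconds.
// The running sum is stored in *sink so the loop has an observable result.
template <typename Fn>
double timeCalls(Fn fn, int iterations, float* sink) {
    float sum = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 1; i <= iterations; ++i) {
        sum += fn(static_cast<float>(i));
    }
    const auto stop = std::chrono::steady_clock::now();
    *sink = sum;
    return std::chrono::duration<double>(stop - start).count();
}

// Returns time(candidate) / time(reference); 0.25 would mean "4 times faster".
// Returns 0 if the reference run was too short to measure.
double sqrtTimeRatio(int iterations) {
    float sinkA = 0.0f;
    float sinkB = 0.0f;
    const double reference =
        timeCalls([](float x) { return std::sqrt(x); }, iterations, &sinkA);
    const double candidate = timeCalls(candidateSqrt, iterations, &sinkB);
    // Printing the checksums keeps the optimizer from discarding the loops.
    std::printf("checksums: %f %f\n", sinkA, sinkB);
    return reference > 0.0 ? candidate / reference : 0.0;
}
```

Results from a harness like this depend heavily on the build configuration, so Debug and Release numbers should not be compared with each other.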

2) dbFind3DCoordinatesOn2DScreen() -- Finding the 3D coordinates of any point on the ground from its 2D screen position, for an ordinary (non-orthogonal) camera.

Usage example:

I hope that WickedX will be able to embed this in the r114 source code so that it uses the internal functions instead of dbPickScreen, dbGetPickVectorX, dbGetPickVectorY, dbGetPickVectorZ.
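The original code for this function is not reproduced in the thread. One common way to turn a screen pick into a ground position is to intersect the pick ray with a horizontal plane, which might look like the sketch below. It assumes a flat ground at a fixed height, and Vec3 and intersectGround are hypothetical names. In DarkGDK the ray origin would presumably be the camera position and the direction would come from the pick vector functions named above, but no DarkGDK calls are made here.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Intersects the ray origin + t * direction (t >= 0) with the plane y = groundY.
// Returns false when the ray is parallel to the plane or points away from it;
// otherwise writes the intersection point to *hit and returns true.
bool intersectGround(const Vec3& origin, const Vec3& direction,
                     float groundY, Vec3* hit) {
    const float epsilon = 1e-6f;
    if (std::fabs(direction.y) < epsilon) {
        return false;  // ray runs parallel to the ground
    }
    const float t = (groundY - origin.y) / direction.y;
    if (t < 0.0f) {
        return false;  // the plane lies behind the ray origin
    }
    hit->x = origin.x + t * direction.x;
    hit->y = groundY;
    hit->z = origin.z + t * direction.z;
    return true;
}
```

The direction does not need to be normalized, because t scales with its length.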

3) dbFind2DCoordinatesFrom3DScene() -- Finding the 2D screen coordinates of any 3D object (for camera 0). This piece of code was extracted from the original DarkGDK source code, so it will be easy to turn it into a user-accessible function.

Usage example:
The Tall Man
5
Years of Service
User Offline
Joined: 16th Nov 2013
Location: Earth
Posted: 11th May 2014 19:56 Edited at: 11th May 2014 20:23
Very nice! I've found that assembly-optimizing code can make a HUGE difference. I assembly-optimized a simple Fourier transform function I'd written, using speed measurements as a guide. Between making minor changes in the C code (such as incrementing pointers instead of doing array lookups each time, counting down to 0 instead of counting up, etc), and then converting that to assembly language, and using (for the first time) SSE2 instructions - the total increase in speed I consistently measured was about 760 to 1!!!!! That blew me away! I began with a pretty tightly written function to start with!

Anyway...

You forgot to finish the function with "fstp n".


I tend to use doubles instead of singles (floats) a lot too, but DarkGDK uses singles for everything. Might wanna have a single-precision overloaded version as well.

Judging what we see is the greatest blinder and self-limiter in the universe.

What we perceive is never reality. It is only a story we tell ourselves based on our current perspective, which has far more to do with our beliefs about ourselves than with anything else.
s_i
9
Years of Service
User Offline
Joined: 23rd May 2009
Location: Russia
Posted: 11th May 2014 21:06 Edited at: 11th May 2014 21:08
No, both variants of dbFastSqrt() work without "fstp n". Tested.
Quote: "Might wanna have a single-precision overloaded version as well."

I'm sorry, but unfortunately I hardly know assembler. For "float" I use it as in this example:

and it works 4 times faster than sqrt(fOne).
The Tall Man
5
Years of Service
User Offline
Joined: 16th Nov 2013
Location: Earth
Posted: 12th May 2014 01:32 Edited at: 12th May 2014 01:35
Very strange, that contradicts my knowledge and experience with the FPU. From what I see, you're loading n into the FPU, taking the square root within the FPU, then leaving it there, with the n variable still containing its original value.

To do single-precision floats, in variant 1, just use a float instead of double as a function parameter. For variant 2, use dword ptr instead of qword ptr.

Another idea - if sqrt is something you're going to call in an intensive loop (which is why I assume you've assembly-translated it), you could write a macro instead of a function. That would bypass a slow function call and stack processing, and make it faster yet. One thing about Microsoft's compiler is that in Debug mode it will refuse to obey your inline directives, even if you use "force".
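The two suggestions in this post (a single-precision version, and avoiding call overhead) can be combined in portable C++ with a pair of inline overloads. This is a minimal sketch: fastSqrt is a hypothetical name, and the bodies forward to std::sqrt as a stand-in for the FPU assembly discussed in the thread.

```cpp
#include <cmath>

// Single-precision overload, matching DarkGDK's use of float.
inline float fastSqrt(float n) { return std::sqrt(n); }

// Double-precision overload; the compiler picks it for double arguments.
inline double fastSqrt(double n) { return std::sqrt(n); }
```

Unlike a macro, an inline function evaluates its argument exactly once and is type checked, and an optimizing build can still expand it at the call site.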

s_i
9
Years of Service
User Offline
Joined: 23rd May 2009
Location: Russia
Posted: 12th May 2014 01:58
Both variants of dbFastSqrt() work for me without "fstp n". Other people tested too without problems:
http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

About the "intensive loop" -- I wrote it (in the first post) only as an example and for testing.
s_i
9
Years of Service
User Offline
Joined: 23rd May 2009
Location: Russia
Posted: 12th May 2014 21:32
@ The Tall Man
1) dbFastSqrt() Variant #1 -- I changed "double" to "float"; it works well and is as fast as with "double".
2) dbFastSqrt() Variant #2 -- changing "double" to "float" and "qword ptr" to "dword ptr" compiles without errors, but gives a runtime error.
WickedX
10
Years of Service
User Offline
Joined: 8th Feb 2009
Location: A Mile High
Posted: 15th May 2014 19:19 Edited at: 17th May 2014 04:52
I’m attaching three documents to this post I think you’ll find interesting. These documents cover the Intel 64 and IA-32 Architecture and Instruction Set including the FPU and the single-instruction multiple-data (SIMD) operations. These extensions include the MMX technology, SSE extensions, SSE2 extensions, SSE3 extensions, Supplemental Streaming SIMD Extensions 3, and SSE4.


1) dbFastSqrt(): Inline functions should naturally be faster. However, variant #1 shows only a very slight performance increase. Simply changing variant #1 to an inline function, however, shows a noticeable increase.



May I suggest instead of adding this function, We optimize the built in function. This will not change the functionality of the built in function in any way. We could then look into optimizing additional core math functions in this way.

2) dbFind3DCoordinatesOn2DScreen(): Handy function, looking into a more generalized procedure.

3) dbFind2DCoordinatesFrom3DScene(): Basically the internal function DB_ObjectScreenData(), but instead of getting an object's position we pass the 3D coordinates to the function. My suggestions: Name for the function - dbProjectPoint() or dbProject3DPoint(). Other than internal functions, Dark GDK does not pass parameters by reference. We could add this to the 3DMath Module and return the coordinates in a 2D Vector. Of course this is up for debate.
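To make the "return the coordinates in a 2D vector" idea concrete, here is a sketch of a perspective projection that returns its result by value instead of writing through reference parameters. It does not reproduce DB_ObjectScreenData(); Vec3, ScreenPoint and projectPoint are hypothetical names. It also assumes the point is already in camera space (+Z forward, +Y up), so a real implementation would first apply the view transform of camera 0.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Result returned by value: pixel coordinates plus a flag that is false
// when the point lies on or behind the camera plane.
struct ScreenPoint { float x, y; bool inFront; };

ScreenPoint projectPoint(const Vec3& p, float verticalFovRadians,
                         int screenWidth, int screenHeight) {
    ScreenPoint result = {0.0f, 0.0f, false};
    if (p.z <= 0.0f) {
        return result;  // cannot be projected onto the screen
    }
    const float focal = 1.0f / std::tan(verticalFovRadians * 0.5f);
    const float aspect =
        static_cast<float>(screenWidth) / static_cast<float>(screenHeight);
    // Perspective divide gives normalized coordinates in the range -1..1.
    const float ndcX = (p.x * focal / aspect) / p.z;
    const float ndcY = (p.y * focal) / p.z;
    // Map to pixels; screen Y grows downward, so the Y axis is flipped.
    result.x = (ndcX * 0.5f + 0.5f) * static_cast<float>(screenWidth);
    result.y = (0.5f - ndcY * 0.5f) * static_cast<float>(screenHeight);
    result.inFront = true;
    return result;
}
```

With a 90 degree vertical field of view on an 800x600 screen, a point straight ahead of the camera lands at the screen centre, (400, 300).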

Edit:
It looks like it’s time to start removing DBPro’s DLL support from the Dark GDK source. Optimizing dbSqrt() using inline FPU assembly is much faster, but I’m not getting the speed I expected. This is due in part to the DWORD casting required to return a float in DBPro. After changing this functionality in DBDLLCore.cpp and DarkSDKCore.cpp, it was necessary to rebuild Core.lib and DarkSDK.lib. Now I’m getting even better results. The only thing I see preventing the results I would expect is calling dbSqrt(), which in turn calls SqrtFF(). Removing DBPro from the source alone would greatly improve Dark GDK’s performance.

s_i
9
Years of Service
User Offline
Joined: 23rd May 2009
Location: Russia
Posted: 17th May 2014 22:01 Edited at: 17th May 2014 22:06
1.1. "May I suggest instead of adding this function, We optimize the built in function..." -- Do as you want. But I think adding a new function is a good choice too.
1.2. Why do you use "double" instead of "float" in "dbFastSqrt() Variant #1"? Using "float" may be better, because DarkGDK uses "float".
1.3. "Inline functions should naturally be faster..." -- On my testbed I use the "Release" configuration with speed optimization; under these settings Visual Studio inlines functions even without the word "inline" before the function dbFastSqrt(). Maybe you use "Debug" instead of "Release" for your tests?
2. Oops! I cannot understand the translation of your phrase "Handy function, looking into a more generalized procedure."
3. About the name of the function: I think the name should match the meaning of the function. But maybe "dbFind2DCoordinatesFrom3DScene" is too long. Maybe "dbProject3DPointToScreen()"? So do as you want.
WickedX
10
Years of Service
User Offline
Joined: 8th Feb 2009
Location: A Mile High
Posted: 17th May 2014 23:55 Edited at: 18th May 2014 00:05
1.1) In this case I don’t think it would be necessary.

1.2) Should make little if any difference. In the Dark GDK source I used float.

1.3) The only reason I have to build in debug would be to test that the Dark GDK modifications work in debug as well. You may have noticed I removed the _d suffix from the ogg debug libs.

2) In the end I may use it as is, with internal functions. The function seems a little specific to one task when it could possibly do more.

3) One definition of projection is an image or picture projected on a surface. That kind of makes “ToScreen” seem redundant, don’t you think?
s_i
9
Years of Service
User Offline
Joined: 23rd May 2009
Location: Russia
Posted: 18th May 2014 11:17
Ok, I agree with you.
