Propose several new functions to "DarkGDK - Unofficial r114 update"

s_i

14

Years of Service

Joined: 23rd May 2009

Location: Russia

Posted: 11th May 2014 16:47 Edited at: 12th May 2014 21:41

Hello! I propose several new functions for the "DarkGDK - Unofficial r114 update". I use them successfully in my game, and I think they could be useful to others. Maybe WickedX will agree to include them in the "DarkGDK - Unofficial r114 update" code.

My testbed: CPU Intel i5-4670 3.4 GHz, Windows 7 x64, Visual C++ 2008 Express Edition, DirectX SDK (August 2007).

1) dbFastSqrt(): square root.

+ Code Snippet

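//Variant #1:
//No explicit return statement: fsqrt leaves the result in ST(0), which is
//where 32-bit Visual C++ expects a floating-point return value.
//The compiler may still warn that the function has no return value (C4035).
float dbFastSqrt (float n)
{
	__asm{
	fld n	//push n onto the FPU stack
	fsqrt	//ST(0) = sqrt(ST(0))
	}
}

//Variant #2:
//Naked function: no prologue or epilogue is generated, so the argument is
//read straight from the stack and "ret 8" pops the 8-byte double.
double inline __declspec (naked) __fastcall dbFastSqrt (double n)
{
	_asm fld qword ptr [esp+4]
	_asm fsqrt
	_asm ret 8
}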

Usage example:

+ Code Snippet

float fff = 0.00001f;
for (int i=0; i<10000000; i++) { fff += dbFastSqrt (fff); }

With the "Release configuration", a call to dbFastSqrt() takes 21...26% of the time of a call to sqrt(), i.e. it is about 4 times faster.

You can choose either of the two variants. On my testbed both run at the same speed, but other people on other testbeds got the best speed from the second variant, so I personally use the second one. (The second variant should be placed in a .h file rather than a .cpp file, because it uses "__declspec (naked)".)

Update (12 May 2014): in Variant #1 I changed "double" to "float"; it works well and is as fast as with "double".
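
For anyone who wants to repeat the timing comparison above on their own machine, a minimal harness might look like the sketch below. It is an illustration under stated assumptions, not the code that produced the numbers in this post: `SqrtFn`, `stdSqrt` and `timeSqrtLoop` are hypothetical names, and only the standard `std::sqrt` is timed here. Under Visual C++ the inline-assembly variants could be passed in the same way.

```cpp
#include <cassert>
#include <cmath>
#include <ctime>

// Hypothetical signature for any square-root routine under test.
typedef float (*SqrtFn)(float);

// Reference implementation: the standard library square root.
static float stdSqrt(float n) { return std::sqrt(n); }

// Runs the same accumulation loop as the usage example and returns the
// elapsed CPU time in seconds. The accumulated value is written to *result
// so the optimizer cannot discard the loop.
static double timeSqrtLoop(SqrtFn fn, int iterations, float* result)
{
    assert(fn != 0 && iterations > 0 && result != 0);
    float acc = 0.00001f;
    const std::clock_t start = std::clock();
    for (int i = 0; i < iterations; ++i) {
        acc += fn(acc);
    }
    const std::clock_t stop = std::clock();
    *result = acc;
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling through a function pointer usually prevents inlining, so this measures call overhead as well as the square root itself. That is worth keeping in mind when comparing against a version the compiler is free to inline.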

2) dbFind3DCoordinatesOn2DScreen(): finds the 3D coordinates of the point on the ground under a 2D screen position, for an ordinary (non-orthogonal) camera.

+ Code Snippet

void dbFind3DCoordinatesOn2DScreen (const int& mouseX, const int& mouseY, const float& cameraX, const float& cameraY, const float& cameraZ, float& X, float& Z)
{	
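	//get the pick ray from the camera through the screen point
	dbPickScreen (mouseX, mouseY, -cameraY);
	const float Y = dbGetPickVectorY();
	X = dbGetPickVectorX();
	Z = dbGetPickVectorZ();
	//scale the ray so that it drops by cameraY, i.e. reaches the ground (Y = 0);
	//Y is zero when the ray is parallel to the ground, which is not handled here
	X = X*(-cameraY)/Y + cameraX;
	Z = Z*(-cameraY)/Y + cameraZ;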
}

Usage example:

+ Code Snippet

//The X and Z axes span the ground. The Y axis points up. The camera is looking at the ground.
float cX=5.0f, cY=5.0f, cZ=5.0f;
dbPositionCamera (cX, cY, cZ);
dbPointCamera (cX, 0.0f, cZ);
int mX=dbMouseX(), mY=dbMouseY();
float x, z;
dbFind3DCoordinatesOn2DScreen (mX, mY, cX, cY, cZ, x, z);
//now the 3D coordinates of the ground point under the cursor are: (x, 0.0f, z)

I hope WickedX will be able to embed this in the r114 source code so that it uses the internal functions instead of dbPickScreen, dbGetPickVectorX, dbGetPickVectorY, dbGetPickVectorZ.
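
The geometry behind the function above is a ray/plane intersection: follow the pick ray from the camera until its height reaches Y = 0. The sketch below shows only that step in plain C++, without any DarkGDK calls. `Vec3`, `intersectGround` and `groundPickExample` are hypothetical names, and the direction vector stands in for the values that dbGetPickVectorX/Y/Z would supply.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Intersects the ray cam + t*dir with the ground plane Y = 0.
// Returns false when the ray is parallel to the ground or points away from it.
static bool intersectGround(const Vec3& cam, const Vec3& dir, float& outX, float& outZ)
{
    if (std::fabs(dir.y) < 1e-6f) return false; // parallel to the ground
    const float t = -cam.y / dir.y;             // solves cam.y + t*dir.y = 0
    if (t < 0.0f) return false;                 // ground is behind the camera
    outX = cam.x + t * dir.x;
    outZ = cam.z + t * dir.z;
    return true;
}

// Usage sketch mirroring the example above: camera at (5, 5, 5) looking straight down.
static void groundPickExample()
{
    const Vec3 cam = { 5.0f, 5.0f, 5.0f };
    const Vec3 dir = { 0.0f, -1.0f, 0.0f };
    float x = 0.0f, z = 0.0f;
    const bool hit = intersectGround(cam, dir, x, z);
    assert(hit && x == 5.0f && z == 5.0f);
    (void)hit; // silences the unused-variable warning when asserts are disabled
}
```

Unlike the original, this version reports failure instead of dividing by zero when the ray never reaches the ground. Whether that check is worth its cost inside the library is a design choice.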

3) dbFind2DCoordinatesFrom3DScene(): finds the 2D screen coordinates of any 3D point (for camera 0). This piece of code was extracted from the original DarkGDK source code, so it should be easy to turn it into a user-accessible function.

+ Code Snippet

void dbFind2DCoordinatesFrom3DScene (const float& X, const float& Y, const float& Z, const int& screenCenterX, const int& screenCenterY, int& screenX, int& screenY)
{	
	D3DXVECTOR3 vecBob;
	vecBob.x=X; vecBob.y=Y; vecBob.z=Z;
	// get current camera transformation matrices (camera 0)
	D3DXMATRIX matTransform = dbGetViewMatrix(0) * dbGetProjectionMatrix(0);
	// Transform object position from world-space to screen-space
	D3DXVec3TransformCoord(&vecBob, &vecBob, &matTransform);
	// Screen data
	screenX=(vecBob.x+1.0f)*screenCenterX;
	screenY=(1.0f-vecBob.y)*screenCenterY;
}

Usage example:

+ Code Snippet

//center of the screen
int screenCX = GetSystemMetrics(SM_CXSCREEN)/2;
int screenCY = GetSystemMetrics(SM_CYSCREEN)/2;
//oX, oY, oZ are the 3D coordinates of any object; mX, mY will receive its coordinates on the screen.
int mX, mY;
dbFind2DCoordinatesFrom3DScene (oX, oY, oZ, screenCX, screenCY, mX, mY);
//now 2D coordinates on screen are: (mX, mY).
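
The two steps inside the function above (transform with a perspective divide, then map the -1..1 range to pixels) can be sketched without Direct3D as follows. This is only an illustration: `Point3`, `Matrix4`, `transformCoord` and `ndcToScreen` are hypothetical names, and the matrix layout assumes the row-vector convention that D3DX uses.

```cpp
#include <cassert>

struct Point3 { float x, y, z; };
struct Matrix4 { float m[4][4]; }; // row-major, row-vector convention as in Direct3D

// Transforms p by m and divides by w, which is what D3DXVec3TransformCoord does.
static Point3 transformCoord(const Point3& p, const Matrix4& m)
{
    const float x = p.x*m.m[0][0] + p.y*m.m[1][0] + p.z*m.m[2][0] + m.m[3][0];
    const float y = p.x*m.m[0][1] + p.y*m.m[1][1] + p.z*m.m[2][1] + m.m[3][1];
    const float z = p.x*m.m[0][2] + p.y*m.m[1][2] + p.z*m.m[2][2] + m.m[3][2];
    const float w = p.x*m.m[0][3] + p.y*m.m[1][3] + p.z*m.m[2][3] + m.m[3][3];
    assert(w != 0.0f); // w == 0 means the point lies in the camera plane
    const Point3 r = { x / w, y / w, z / w };
    return r;
}

// Maps normalized device coordinates (-1..1, Y up) to pixels (Y down).
static void ndcToScreen(const Point3& ndc, int halfWidth, int halfHeight,
                        int& screenX, int& screenY)
{
    screenX = static_cast<int>((ndc.x + 1.0f) * halfWidth);
    screenY = static_cast<int>((1.0f - ndc.y) * halfHeight);
}
```

With a real view-projection matrix, a point behind the camera ends up with a negative w and produces screen coordinates that look valid but are not, so a caller would probably want to check for that case before using the result.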

The Tall Man

10

Years of Service

Joined: 16th Nov 2013

Location: Earth

Posted: 11th May 2014 19:56 Edited at: 11th May 2014 20:23

Very nice! I've found that assembly-optimizing code can make a HUGE difference. I assembly-optimized a simple Fourier transform function I'd written, using speed measurements as a guide. Between making minor changes in the C code (such as incrementing pointers instead of doing array lookups each time, counting down to 0 instead of counting up, etc.), converting that to assembly language, and using SSE2 instructions for the first time, the total speedup I consistently measured was about 760 to 1! That blew me away, since the function was pretty tightly written to begin with.

Anyway...

You forgot to finish the function with

+ Code Snippet

fstp n

I tend to use doubles instead of singles (floats) a lot too, but DarkGDK uses singles for everything. Might wanna have a single-precision overloaded version as well.

Judging what we see is the greatest blinder and self-limiter in the universe.

What we perceive is never reality. It is only a story we tell ourselves based on our current perspective, which has far more to do with our beliefs about ourselves than with anything else.

s_i

Posted: 11th May 2014 21:06 Edited at: 11th May 2014 21:08

No, both variants of dbFastSqrt() work without "fstp n". Tested.

Quote: "Might wanna have a single-precision overloaded version as well."

I'm sorry, but unfortunately I hardly know assembler. For "float" I use it as in this example:

+ Code Snippet

float fOne = 1.0f;
float fTwo = dbFastSqrt (fOne);

and it works 4 times faster than sqrt(fOne).

The Tall Man

Posted: 12th May 2014 01:32 Edited at: 12th May 2014 01:35

Very strange; that contradicts my knowledge and experience with the FPU. From what I see, you're loading n into the FPU, taking the square root within the FPU, and then leaving it there, with the n variable still containing its original value.

To do single-precision floats, in variant 1, just use a float instead of double as a function parameter. For variant 2, use dword ptr instead of qword ptr.

Another idea: if sqrt is something you're going to call in an intensive loop (which is why I assume you've assembly-translated it), you could write a macro instead of a function. That would bypass a slow function call and stack processing, and make it faster yet. One thing about Microsoft is that in Debug mode the compiler will refuse to obey your inline directives, even if you use "force".

s_i

Posted: 12th May 2014 01:58

Both variants of dbFastSqrt() work for me without "fstp n". Other people have tested this too, without problems:
http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

About the "intensive loop": I wrote it (in the first post) only as an example and for testing.

s_i

Posted: 12th May 2014 21:32

@ The Tall Man
1) dbFastSqrt() Variant #1 -- I changed "double" to "float"; it works well and is as fast as with "double".
2) dbFastSqrt() Variant #2 -- changing "double" to "float" and "qword ptr" to "dword ptr" compiles without errors, but gives a runtime error.

WickedX

15

Years of Service

Joined: 8th Feb 2009

Location: A Mile High

Posted: 15th May 2014 19:19 Edited at: 17th May 2014 04:52

I’m attaching three documents to this post that I think you’ll find interesting. These documents cover the Intel 64 and IA-32 architecture and instruction set, including the FPU and the single-instruction multiple-data (SIMD) operations. The SIMD extensions covered include MMX technology, SSE, SSE2, SSE3, Supplemental Streaming SIMD Extensions 3 (SSSE3), and SSE4.

1) dbFastSqrt(): Inline functions should naturally be faster. As posted, variant #1 shows only a very slight performance increase. Simply changing variant #1 to an inline function, however, shows a noticeable increase.

+ Code Snippet

inline double dbFastSqrt (double n)
{
	__asm {
	fld n
	fsqrt
	}
}

May I suggest that, instead of adding this function, we optimize the built-in function. This will not change the functionality of the built-in function in any way. We could then look into optimizing additional core math functions in the same way.

2) dbFind3DCoordinatesOn2DScreen(): Handy function, looking into a more generalized procedure.

3) dbFind2DCoordinatesFrom3DScene(): Basically the internal function DB_ObjectScreenData(), but instead of getting an object’s position we pass the 3D coordinates to the function. My suggestions: name the function dbProjectPoint() or dbProject3DPoint(). Other than internal functions, Dark GDK does not pass parameters by reference. We could add this to the 3DMath module and return the coordinates in a 2D vector. Of course this is up for debate.

Edit:
It looks like it’s time to start removing DBPro’s DLL support from the Dark GDK source. Optimizing dbSqrt() using inline FPU assembly is much faster, but I’m not getting the speed I expected. This is due in part to the DWORD casting required to return a float in DBPro. After changing this functionality in DBDLLCore.cpp and DarkSDKCore.cpp, it was necessary to rebuild Core.lib and DarkSDK.lib. Now I’m getting even better results. The only thing I see still holding back the results I would expect is that dbSqrt() in turn calls SqrtFF(). Removing DBPro from the source alone would greatly improve Dark GDK’s performance.

s_i

Posted: 17th May 2014 22:01 Edited at: 17th May 2014 22:06

1.1. "May I suggest instead of adding this function, We optimize the built in function..." -- Do as you wish. But I think adding a new function is a good choice too.
1.2. Why do you use "double" instead of "float" in "dbFastSqrt() Variant #1"? Using "float" may be better, because DarkGDK uses "float".
1.3. "Inline functions should naturally be faster..." -- On my testbed I use the "Release configuration" with "speed optimization"; under these settings Visual Studio inlines functions even without the word "inline" before the function dbFastSqrt(). Maybe you use "Debug" instead of "Release" for tests?
2. Oops! I cannot understand the translation of your phrase "Handy function, looking into a more generalized procedure."

3. About the name of the function: I think the name should match the meaning of the function. But maybe "dbFind2DCoordinatesFrom3DScene" is too long. Maybe "dbProject3DPointToScreen()"? So do as you wish.

WickedX

Posted: 17th May 2014 23:55 Edited at: 18th May 2014 00:05

1.1) In this case I don’t think it would be necessary.

1.2) Should make little if any difference. In the Dark GDK source I used float.

1.3) The only reason I have to build in debug would be to test that the Dark GDK modifications work in debug as well. You may have noticed I removed the _d suffix from the ogg debug libs.

2) In the end I may use it as is, with internal functions. The function seems a little specific to one task when it could possibly do more.

3) One definition of projection is an image or picture projected on a surface. That kind of makes “ToScreen” seem redundant, don’t you think?

s_i

Posted: 18th May 2014 11:17

Ok, I agree with you.
