Quote: "blink0k is correct, imo. Thread count and rendering API have little to do with array performance (or much of any data set/type for that matter). What you are observing likely has more to do with the interpreted nature of AGK/S vs the machine code compiled structure of DBPro."
While DBP is a compiled language, it isn't compiled to machine code.
Instead, the virtual machine is compiled and optimised, and the bytecode is optimised as well... which allowed for better performance than DBC's embedded VM and parsed bytecode, but overall performance still wasn't as good as native machine code.
As for how that differs from AppGameKit... I'm not convinced it's actually interpreted.
If it were, it wouldn't strictly need to generate bytecode, as that's something virtual machine languages do... instead it could just parse the source just-in-time and work as a real-time scripting language.
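To make the bytecode-vs-parsing distinction concrete, here's a toy Python sketch. It is not DBP's or AGK's actual bytecode (the opcodes and function names are made up for illustration): `compile_expr` turns a simple left-to-right expression into stack-machine opcodes once, `run` is the little VM loop that executes them, and `interpret` re-parses the source text on every call, which is closer to what a pure real-time scripting language would do.

```python
# Toy illustration only: these opcodes are hypothetical, not DBP's or AGK's format.
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def apply(op, a, b):
    """Apply one arithmetic opcode to two integers."""
    if op == "ADD":
        return a + b
    if op == "SUB":
        return a - b
    if op == "MUL":
        return a * b
    if op == "DIV":
        return a // b  # integer division, like assigning to an Integer variable
    raise ValueError(f"unknown opcode {op}")


def compile_expr(source):
    """Compile 'n op n op n ...' (left to right, no precedence) into bytecode."""
    tokens = source.split()
    code = [("PUSH", int(tokens[0]))]
    for i in range(1, len(tokens), 2):
        code.append(("PUSH", int(tokens[i + 1])))
        code.append((OPS[tokens[i]], None))
    return code


def run(code):
    """The 'virtual machine': execute pre-built bytecode on a value stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(apply(op, a, b))
    return stack.pop()


def interpret(source):
    """No bytecode: re-tokenise and evaluate the source text on every call."""
    tokens = source.split()
    value = int(tokens[0])
    for i in range(1, len(tokens), 2):
        value = apply(OPS[tokens[i]], value, int(tokens[i + 1]))
    return value


program = compile_expr("10 + 10")  # parse once ...
print(program)                     # [('PUSH', 10), ('PUSH', 10), ('ADD', None)]
print(run(program))                # ... then run as often as needed: 20
print(interpret("15 / 5"))         # parsed again on every call: 3
```

The point of the split is that the parsing cost is paid once up front; the VM loop only dispatches on opcodes, though it is still slower than native machine code doing the same arithmetic.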
In any case... you might find the result of this interesting:
AGK-S v2020.3.27
// show all errors
SetErrorMode(2)
// set window properties
SetWindowTitle( "normalproblem" )
SetWindowSize( 1024, 768, 0 )
SetWindowAllowResize( 1 ) // allow the user to resize the window
// set display properties
SetVirtualResolution( 1024, 768 ) // doesn't have to match the window
SetOrientationAllowed( 1, 1, 1, 1 ) // allow both portrait and landscape on mobile devices
SetSyncRate( 0, 1 ) // 0 = no frame rate cap, so the benchmark loop runs as fast as it can
SetScissor( 0,0,0,0 ) // use the maximum available screen space, no black borders
UseNewDefaultFonts( 1 ) // since version 2.0.22 we can use nicer default fonts
Type Time
New As Float
Old As Float
Delta As Float
AvgDelta As Float
Period As Float
Counter As Integer
EndType
Type DebugStruct
A As Integer
B As Float
C As String
EndType
Global debugMain As Time : debugMain.New = Timer()
Global debugText As Integer : debugText = CreateText( "" )
SetTextPosition( debugText, 10.0, 10.0 )
SetTextSize( debugText, 16.0 )
debugMemblock = CreateMemblock( 64 )
Global debugArray As debugStruct[ 63 ]
Global A As Integer = 10
Global B As Float = 5.0
Global C As String = "Hello"
Repeat
debugMain.Old = debugMain.New
debugMain.New = Timer()
debugMain.Delta = debugMain.New - debugMain.Old
Inc debugMain.Counter
debugMain.Period = debugMain.Period + debugMain.Delta
i = 0
Repeat //For i = 0 To 99999 Step 4
SetMemblockInt( debugMemblock, 0x0000, 10 ) // each Int is 4 bytes, so step offsets by 4
SetMemblockInt( debugMemblock, 0x0004, 10 )
SetMemblockInt( debugMemblock, 0x0008, 10 )
SetMemblockInt( debugMemblock, 0x000C, 10 )
/*
debugArray[0].A = A
debugArray[1].A = A
debugArray[2].A = A
debugArray[3].A = A
*/
Inc i, 4
Until i >= 99999
SetTextString( debugText, Str( debugMain.AvgDelta * 1000.0, 2) + "ms" )
Sync()
If debugMain.Period >= 1.0
debugMain.AvgDelta = debugMain.Period / debugMain.Counter
debugMain.Period = 0.0
debugMain.Counter = 0
EndIf
Until GetRawKeyPressed(27)
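One thing worth double-checking in these benchmark loops: `SetMemblockInt` (and DBP's `Write Memblock Dword`) write 4 bytes each, so consecutive integer slots need offsets 4 bytes apart, and offsets like 0x03 / 0x07 / 0x0A overlap the previous write. The short Python check below (helper names are hypothetical, purely illustrative) lists which writes overlap and also counts how much work one frame does: a `Repeat ... Until i >= 99999` loop with `Inc i, 4` runs 25,000 passes, i.e. 100,000 memblock writes per frame.

```python
INT_SIZE = 4  # SetMemblockInt / Write Memblock Dword each write 4 bytes


def touched(offset, size=INT_SIZE):
    """Byte indices written by one 4-byte write at `offset`."""
    return set(range(offset, offset + size))


def overlapping_writes(offsets, size=INT_SIZE):
    """Return every pair of offsets whose writes share at least one byte."""
    pairs = []
    for i, a in enumerate(offsets):
        for b in offsets[i + 1:]:
            if touched(a, size) & touched(b, size):
                pairs.append((a, b))
    return pairs


def writes_per_frame(limit=99999, step=4, writes_per_pass=4):
    """Mirror of: i = 0 / Repeat ... Inc i, step / Until i >= limit."""
    passes = 0
    i = 0
    while True:  # Repeat ... Until always runs the body at least once
        passes += 1
        i += step
        if i >= limit:
            break
    return passes * writes_per_pass


print(overlapping_writes([0x00, 0x03, 0x07, 0x0A]))  # [(0, 3), (7, 10)]
print(overlapping_writes([0x00, 0x04, 0x08, 0x0C]))  # []
print(writes_per_frame())                            # 100000
```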
DBP v1.07.GG
Disable Escapekey
Set Window Size 1024, 768
Set Display Mode 1024, 768, 32
Sync On
Sync Rate 0
Type Time
New As Float
Old As Float
Delta As Float
AvgDelta As Float
Period As Float
Counter As Integer
EndType
Type debugStruct
A As Integer
B As Float
C As String
EndType
Global debugMain As Time : debugMain.New = 0.0 + Timer()
Set Text Font "Segoe UI"
Set Text Size 16
Global debugMemblock As Integer = 1
Make Memblock debugMemblock, 64
Global DIM debugArray(64) As debugStruct
Global A As Integer = 10
Global B As Float = 5.0
Global C As String = "Hello"
Repeat
Cls
debugMain.Old = debugMain.New
debugMain.New = Timer()
debugMain.Delta = debugMain.New - debugMain.Old
Inc debugMain.Counter
debugMain.Period = debugMain.Period + debugMain.Delta
i = 0
Repeat // For i = 0 To 99999 Step 4
Write Memblock Dword debugMemblock, 0x0000, 10 // each Dword is 4 bytes, so step offsets by 4
Write Memblock Dword debugMemblock, 0x0004, 10
Write Memblock Dword debugMemblock, 0x0008, 10
Write Memblock Dword debugMemblock, 0x000C, 10
remstart
A = 10 + 10
A = 10 - 5
A = 10 * 5
A = 15 / 5
remend
Inc i, 4
Until i >= 99999 //Next
Text 10, 10, Str$( debugMain.AvgDelta, 2) + "ms"
Sync
If debugMain.Period >= 1000.0
debugMain.AvgDelta = debugMain.Period / debugMain.Counter
debugMain.Period = 0.0
debugMain.Counter = 0
EndIf
Until EscapeKey()
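Both listings use the same frame-time averaging: accumulate each frame's delta into `Period`, count frames, and once a full second has passed publish `Period / Counter` as the average. The only difference is units: AGK's `Timer()` returns seconds (hence the `>= 1.0` check and the `* 1000.0` for display), while DBP's `Timer()` returns milliseconds (hence `>= 1000.0`). A Python mirror of the `Time` type, with a made-up `FrameTimer` class name, for anyone who wants to sanity-check the arithmetic:

```python
class FrameTimer:
    """Hypothetical Python port of the Time type used in both listings."""

    def __init__(self, start, window):
        self.new = start        # Time.New
        self.old = start        # Time.Old
        self.delta = 0.0        # Time.Delta
        self.avg_delta = 0.0    # Time.AvgDelta
        self.period = 0.0       # Time.Period
        self.counter = 0        # Time.Counter
        self.window = window    # 1.0 for AGK (seconds), 1000.0 for DBP (ms)

    def tick(self, now):
        """One pass of the outer Repeat loop; returns the current average."""
        self.old = self.new
        self.new = now
        self.delta = self.new - self.old
        self.counter += 1
        self.period += self.delta
        if self.period >= self.window:
            self.avg_delta = self.period / self.counter
            self.period = 0.0
            self.counter = 0
        return self.avg_delta


# AGK-style: Timer() in seconds, four 0.25 s frames -> 0.25 s average (250 ms shown)
agk = FrameTimer(0.0, 1.0)
for now in (0.25, 0.5, 0.75, 1.0):
    agk.tick(now)
print(agk.avg_delta * 1000.0)  # 250.0

# DBP-style: Timer() in milliseconds, fifty 20 ms frames -> 20 ms average
dbp = FrameTimer(0.0, 1000.0)
for frame in range(1, 51):
    dbp.tick(20.0 * frame)
print(dbp.avg_delta)  # 20.0
```

Because the average is only refreshed once per second, the on-screen figure lags by up to a second, which is fine for a benchmark readout like this.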
Now, I perhaps should've noted that the performance metrics I'm getting are from an AMD Ryzen 5 1600 at 3.2/3.5 GHz (stock), and it was a first-run, first-gen Ryzen... so it doesn't overclock particularly well (I think I can squeeze 3.72 GHz out of it on a good day).
Still, in any case... we're talking about 6 cores and 12 threads, which Windows 10 specifically sees as physical and logical processors.
DBP will recognise and use 4 cores for logical processes, but its optimisation is single-core.
So you only really gain performance from additional cores via DirectX operations (which I'm not using)... so for all intents and purposes it's single-threaded, and it can't use SMT/HTT.
And I can actually track this via the Performance tab in Task Manager.
Running the DBP code essentially just uses Core 0 and nothing else.
AppGameKit, on the other hand, is where things get a little "weird".
Core 3 sees a rise in utilisation, while Core 8 sees the same massive leap that Core 0 shows with DBP... The rise on Core 3 is about 30% of Core 8's and tracks the runtime, meaning there is "some" offloading occurring, but not much.
I say this is "weird" because of how AMD "cores" work.
See, with Intel, on a 6-core/12-thread CPU... Core 0 = physical core, Core 1 = its Hyper-Thread... so we can just say that the even numbers are physical cores while the odd numbers are logical cores.
With AMD, on the other hand, Cores 0-5 are the physical cores, while Cores 6-11 are the logical cores.
This means AppGameKit is more or less running almost exclusively on SMT.
In fact, if I had to guess, it's running on Core 3 with SMT... and the SMT thread actually has the bulk of the workload pushed onto it.
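Since the whole argument hinges on which logical processor (LP) numbers share a physical core, here's a small Python sketch of the two numbering schemes in play, assuming a 6-core/12-thread part (the function names are illustrative only). "Interleaved" puts sibling threads next to each other (LP 0/1 on core 0, LP 2/3 on core 1, ...); "split" makes LPs 0-5 the first thread of each core and LPs 6-11 the second. Windows exposes the real mapping through `GetLogicalProcessorInformationEx`, and Sysinternals Coreinfo prints it, so it's worth checking there rather than assuming either scheme.

```python
PHYSICAL_CORES = 6    # assumption: Ryzen 5 1600, 6 cores
THREADS_PER_CORE = 2  # assumption: SMT enabled, 12 logical processors


def interleaved(lp):
    """Sibling threads numbered adjacently: LP 0/1 -> core 0, LP 2/3 -> core 1, ..."""
    return lp // THREADS_PER_CORE, lp % THREADS_PER_CORE  # (core, thread)


def split(lp):
    """First threads first: LP 0..5 -> thread 0 of cores 0..5, LP 6..11 -> thread 1."""
    return lp % PHYSICAL_CORES, lp // PHYSICAL_CORES  # (core, thread)


def same_core(lp_a, lp_b, scheme):
    """True if two logical processors map to the same physical core."""
    return scheme(lp_a)[0] == scheme(lp_b)[0]


for name, scheme in (("interleaved", interleaved), ("split", split)):
    print(name, "LP3 ->", scheme(3), "LP8 ->", scheme(8),
          "same core:", same_core(3, 8, scheme))
```

Worth noting: under the split scheme LP 8 is the second thread of core 2, not core 3, so "Core 3" and "Core 8" in Task Manager land on different physical cores under either scheme.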
Now, SMT as AMD implements it, unlike Intel's Hyper-Threading, is essentially two general-purpose processing pipelines (ALX) operating beside the ALU; these are "general purpose" because either the SIMD unit or the ALU can use them as an "extra" processing pipeline, depending on which has priority.
Windows threading doesn't exactly use these well, which is why AMD recommends its CPU SDK on GPUOpen.
Still, in essence this means you have approx. 50% of the physical core's throughput, and as noted, that's where AppGameKit is pushing most of its logical processing.
At least in my case. I'm not sure whether that's different on Intel processors... and Bulldozer just has split physical cores (each module pairs two integer cores with one shared floating-point unit, which is why it sucks at floating-point: it's far too easy to bottleneck), so I'm sure they behave as you'd expect them to.
Mind you, that doesn't stop this being weird threading behaviour.
It still ends up being more or less "single-threaded"; it just isn't using the lowest thread.
And it ALWAYS uses those processor "cores"... despite others having lower utilisation, so it's not even testing for and using "unutilised" cores.
I'm going to experiment further, but I'm starting to get a good picture of why AppGameKit has relatively terrible performance with larger datasets.
Remember, my point here wasn't that AGK and DBP both drop performance with larger datasets... rather, it's that AGK-S drops performance *SUBSTANTIALLY*.
This new information doesn't really help in terms of improving performance, as the dataset I want to use for the other project more or less has to be processed every frame, so breaking it down into smaller loops isn't going to help.
Instead, I think this highlights what TGC could do to dramatically improve performance, as they're clearly under-utilising the available hardware... and I'd never have even thought to check had this not been an issue.