DirectX 9 is Single Threaded on a Single Core, so while CPU Performance has improved over the years, Single Core Performance hasn't really increased much.
You're looking at ~2x Performance over the past 12 years, while the decade before that saw almost an 8x Performance Increase.
Now the cost of a Draw Primitive is essentially the number of Draw Calls required to handle the State Changes in a given Primitive Object, and DX9 isn't exactly efficient with this.
Generally speaking, on a "Modern Processor" you should be able to handle ~1.2 Million Draw Calls / Second, or 20,000 Draw Calls / Frame at 60 FPS.
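To show where the per-frame figure comes from, here is a rough sketch of the arithmetic. The helper name drawCallsPerFrame is made up for illustration (it is not part of DirectX or AGK), and the 60 FPS target is an assumption implied by the two figures above.

```cpp
#include <cstdint>

// Hypothetical helper, not part of any DirectX or AGK API:
// converts a per-second Draw Call throughput into a per-frame budget.
// Returns 0 for a non-positive frame rate instead of dividing by zero.
constexpr std::int64_t drawCallsPerFrame(std::int64_t callsPerSecond, int targetFps)
{
    return targetFps > 0 ? callsPerSecond / targetFps : 0;
}

// ~1.2 Million Draw Calls / Second at an assumed 60 FPS.
static_assert(drawCallsPerFrame(1200000, 60) == 20000, "60 FPS budget");
// The same throughput at 30 FPS doubles the per-frame budget.
static_assert(drawCallsPerFrame(1200000, 30) == 40000, "30 FPS budget");
```

Plug in your own target frame rate to see how the budget shifts.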
This is where the newer DirectX versions get better:
DirectX 9 : 20,000 Draw Calls / Frame
DirectX 10 : 9,600 Draw Calls / Frame
DirectX 11 : 6,500 Draw Calls / Frame
I won't include DirectX 12 as it handles them differently: it drops State Changes that don't actually change anything, so what you get instead is an increase in the Draw Calls possible per Frame at the same Performance.
Still, the above is why you end up limited to ~1,000 Draw Primitives: it comes down to the Draw Calls per Primitive, which for DX9 is ~12-20 depending on Primitive Complexity.
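The ~1,000 figure falls out of dividing the frame budget by the Draw Calls per Primitive. A minimal sketch of that division, using the 20,000 budget and the 12-20 range quoted above; PrimitiveRange and primitivesPerFrame are hypothetical names used only for this example.

```cpp
#include <cstdint>

// Hypothetical result type: worst-case and best-case primitive counts.
struct PrimitiveRange {
    std::int64_t low;   // every primitive costs the maximum number of calls
    std::int64_t high;  // every primitive costs the minimum number of calls
};

// Hypothetical helper: how many Draw Primitives fit into a per-frame
// Draw Call budget when each primitive costs between minCalls and maxCalls.
// Non-positive costs yield 0 instead of dividing by zero.
constexpr PrimitiveRange primitivesPerFrame(std::int64_t frameBudget,
                                            int minCalls, int maxCalls)
{
    return PrimitiveRange{
        maxCalls > 0 ? frameBudget / maxCalls : 0,
        minCalls > 0 ? frameBudget / minCalls : 0
    };
}

// DX9 figures from the text: 20,000 calls per frame, 12-20 calls per primitive.
static_assert(primitivesPerFrame(20000, 12, 20).low == 1000, "complex primitives");
static_assert(primitivesPerFrame(20000, 12, 20).high == 1666, "simple primitives");
```

So the limit is closer to 1,000 when every object is complex and a bit higher when the objects are simple.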
OpenGL and Vulkan work a bit differently here.
See, in DirectX the Draw Calls are Command Lists generated based upon what DirectX thinks it needs, while with the Khronos APIs a call happens when you're DIRECTLY telling it to do it.
This actually makes it difficult to measure from the API perspective, as you could be doing just a single Draw Call or making multiple State Changes that increase the number of Calls Executed, and OpenGL / Vulkan doesn't know which is which.
The key difference between the two is that Vulkan, like DX12, will basically do a binary check to see if a given State has changed and, if it hasn't, simply not do another Call Execution for that Element; the idea again being to increase the potential Draw Calls made per Frame (while the actual executed calls remain relatively the same).
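The "binary check" idea can be illustrated with a small redundant-state filter. To be clear, this is only a sketch of the concept described above, not how Vulkan or DX12 implement it internally; StateSlot, StateCache and the three slots are all hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical state slots; a real renderer would have many more.
enum class StateSlot : std::size_t { Texture = 0, Shader = 1, BlendMode = 2 };
constexpr std::size_t kSlotCount = 3;

// Remembers the last value applied to each slot and filters out
// requests that would not change anything.
class StateCache {
public:
    // Returns true if the change was "executed", false if it was
    // dropped because the slot already holds that value.
    bool set(StateSlot slot, std::uint32_t value)
    {
        const std::size_t i = static_cast<std::size_t>(slot);
        assert(i < kSlotCount);
        ++requested_;
        if (known_[i] && current_[i] == value) {
            return false;  // unchanged state: skip the call
        }
        known_[i] = true;
        current_[i] = value;
        ++executed_;       // a real renderer would issue the API call here
        return true;
    }

    std::size_t requested() const { return requested_; }
    std::size_t executed() const { return executed_; }

private:
    std::array<std::uint32_t, kSlotCount> current_{};
    std::array<bool, kSlotCount> known_{};
    std::size_t requested_ = 0;
    std::size_t executed_ = 0;
};
```

Comparing requested() against executed() after a frame shows how many requests were filtered: the number of calls you can make goes up while the number actually executed stays roughly where it was.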
Simple rule of thumb here: make fewer changes, use fewer polygons and fewer objects, and you get better performance.
What you could do is run a silent benchmark for a given system to find its limitations, then just keep track of what's going on in-flight. But keep in mind that AppGameKit Script (BASIC) has terrible performance as you increase the number of memory accesses.
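A silent benchmark plus in-flight tracking could look roughly like the sketch below. It is written in plain C++ and deliberately calls no AGK functions; the `work` callable stands in for whatever draw workload you want to time, and every name here (measureCallsPerSecond, frameBudget, FrameTracker, the headroom factor) is an assumption made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Times `iterations` invocations of `work` and returns calls per second.
template <typename Work>
double measureCallsPerSecond(Work work, std::int64_t iterations)
{
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    // Guard against a zero reading on very coarse clocks.
    const double seconds = elapsed.count() > 0.0 ? elapsed.count() : 1e-9;
    return static_cast<double>(iterations) / seconds;
}

// Turns the measured rate into a per-frame budget, keeping some headroom
// (for example 0.5 means only half of the measured capacity is used).
std::int64_t frameBudget(double callsPerSecond, int targetFps, double headroom)
{
    assert(targetFps > 0 && headroom > 0.0 && headroom <= 1.0);
    return static_cast<std::int64_t>(callsPerSecond / targetFps * headroom);
}

// Keeps track of how much of the budget the current frame has used.
class FrameTracker {
public:
    explicit FrameTracker(std::int64_t budget) : budget_(budget) {}
    void beginFrame() { used_ = 0; }
    void recordCall() { ++used_; }
    std::int64_t remaining() const { return budget_ - used_; }
    bool overBudget() const { return used_ > budget_; }

private:
    std::int64_t budget_;
    std::int64_t used_ = 0;
};
```

You would run the measurement once at startup (behind a loading screen, say), build a FrameTracker from the resulting budget, then call beginFrame() every frame and recordCall() for each draw so you can scale detail back before going over.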
I have a thread somewhere that delves into it; Dark BASIC Professional can often go from similar performance to 100x faster, and has much greater limitations.
The only real way around that is to switch to "Tier 2" (aka C++ w/AGK SDK).