It probably won't make any noticeable difference, but you are using rgb(0,0,0) a lot. Instead of calculating the colour each time, you can store it in a variable and compare against that variable.
global _black as integer
_black = rgb(0,0,0)
Then to compare
if memblock dword(1,(px*4)+(py*4*640)) <> _black
instead of
if memblock dword(1,(px*4)+(py*4*640)) <> rgb(0,0,0)
I can't remember if DOT commands are sped up by locking the pixels first. You could try moving the UNLOCK PIXELS command to just above the text commands in the main loop, so that the DOT commands are done while the pixels are still locked.
do
    `write memory to screen
    lock pixels
    copy memory get pixels pointer(),get memblock ptr(1),memsize
    `make pixels around the mouse active
    if mouseclick()>0
        active_pixel(mousex(),mousey())
        active_pixel(mousex()+1,mousey())
        active_pixel(mousex(),mousey()+1)
        active_pixel(mousex()-1,mousey())
        active_pixel(mousex(),mousey()-1)
    endif
    pixmove()
    unlock pixels
    set cursor 0,0
    print screen fps()
    sync
loop
Perhaps precalculate some of the calculations you do often. You multiply by 640*4 a lot. You could either use 2560 directly or store the result in a variable and multiply by that, instead of calculating 640*4 thousands of times a second.
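Something along these lines is what I mean. It is only a sketch: _rowbytes and pos are names I made up, and I'm assuming the screen is 640 pixels wide at 4 bytes per pixel, as in your code.

```
` Sketch only: _rowbytes and pos are made-up names, not from your code
global _rowbytes as integer
_rowbytes = 640*4   ` 2560 bytes per row (640 pixels, 4 bytes each)

` then, wherever you read a pixel
pos = (px*4)+(py*_rowbytes)
if memblock dword(1,pos) <> rgb(0,0,0)
    ` ... handle the pixel here ...
endif
```

I don't know how much of this the compiler already works out for you, so check the FPS before and after.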
Most of the slowdown is coming from copying all the memory to the screen each loop.
It might be faster to use a proper image memblock, do all the manipulation on that memblock, and paste it to the screen each loop, but I don't know for sure.
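A rough, untested sketch of the image memblock idea is below. Memblock 2 and image 1 are numbers I picked arbitrarily, and I'm assuming a 640x480 screen at 32-bit depth. An image memblock starts with a 12-byte header (width, height, depth), so the pixel data begins at byte 12.

```
` Sketch only: memblock 2 and image 1 are arbitrary numbers
` 12-byte header (width, height, depth) followed by the pixel data
make memblock 2,12+(640*480*4)
write memblock dword 2,0,640
write memblock dword 2,4,480
write memblock dword 2,8,32

do
    ` ... change pixels here, for example:
    ` write memblock dword 2,12+(px*4)+(py*2560),rgb(255,255,255)
    make image from memblock 1,2
    paste image 1,0,0
    set cursor 0,0
    print screen fps()
    sync
loop
```

Rebuilding the image every loop has its own cost, so this may or may not beat the COPY MEMORY approach. You would have to try both and compare.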