With routers being so common these days, it's not common to see peer-to-peer networked games anymore.
I've posted this link many times on the forums already so you may have seen it, but I'll post it again anyway:
http://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking.
However, that doesn't cover in detail what you want to know. I figured out a suitable system which is probably the same as the way Source and other engines manage it, although it's pretty detailed and I don't want to type it all out here.
Here's some brief idea of what you have to do:
1. Client: Every tick, sample user input (up/down/left/right/jump/etc) and store this in a queue as a
user command, along with the time of the previous tick (this will be how long the command will operate over) and a sequence number.
2. Client: Every networking tick, send the server any queued commands (but do not remove them from the queue yet).
3. Server: Every tick, process all user commands for each client, using the time stored.
*[1]
4. Server: Every networking tick, send all coordinates to all relevant players (you should send players their own coordinates too so that they may adjust if an error occurred).
This is the very basic system you should use for handling movement server-side. However, this alone won't be satisfactory as there will be a delay for each player as they press keys and wait for their updated position from the server. So what's the cure here? Client-side input prediction, using the user commands stored. For each tick on the client, execute one of the stored commands and increment a counter to point to the next command.
So far, under good conditions it should perform pretty well, but not perfectly. There are a couple of things that will produce synchronization errors between the server's copy of the player object, and the client's copy. For example, due to delays between receiving other networked object data, the client will sometimes register (or not register) collision with an object which is not there anymore (or is there), and aside from this you have to account for packet loss. You may think that when an error occurs, you can simply have the client set the player object to the latest position received, so what's wrong with this? Due to the latency between the user commands being generated and being processed, the client's player object should
always be at least slightly ahead of the server's copy. If you place the player at the server's last indicated position, you'll be placing it at an old location - and the server may have processed other commands since then.
So what do you do? Well, when the server sends each players coordinates, it must send the sequence number that corresponds to the last acknowledged user command. This way, when an error occurs you can start interpolating the player coordinates from the last known position, through to the latest command. This way, the client-side prediction can stay slightly ahead (and correctly) as it should be.
Also, I haven't mentioned it already, but user commands must be stored on the client until they are acknowledged by the server. After that they can be removed from the queue.
*[1] You'll probably be thinking "but if the client dictates the amount of time for each command, surely they could cheat?" but I'm afraid this is about the only way you can make sure the server processes the commands in an identical fashion to the client's own input prediction. And besides, all you have to do here for security is track the amount of time a client uses, and make sure it doesn't change at a rate different from the server's clock (though you must allow for packet jitter). It's by far simpler than trying to prevent cheating where clients are allowed to dictate their own coordinates.
Of course, there's more to it - you'll probably want to adjust things for estimated latency, and add lag compensation, and I may have missed minute details from my explanation as it's all off the top of my head, but that should give you a basic idea of what you need to do.