Well, I can't do much in the way of explaining exactly what to type ... however, I can explain a method I'd use to achieve this.
Now, we'll start with the worlds. They should be kind of pseudo-3D. This is mainly for speed, but basically make a world that will only ever be seen head-on, from a raised angle. That lets you save on polygons, but more importantly you can render portions of it beforehand, which gets you a far better look than you could with a great number of textured objects.
For the actual characters, use textured planes that face the camera (billboarding), but using the floor as the axis, so they only turn around the vertical and stay standing upright.
That way you can use sprites and make it seem kind of 3D, and it'll still be good and speedy.
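Here's a rough sketch of that kind of floor-axis billboarding, just to show the idea. It assumes Y is up and that the plane faces +Z when its yaw is 0; the function name `billboard_yaw` and the tuple layout are made up for illustration, so adapt them to whatever your engine uses.

```python
import math

def billboard_yaw(sprite_pos, camera_pos):
    """Return the yaw in degrees (rotation around the vertical Y axis)
    that turns a plane at sprite_pos toward camera_pos.

    Assumptions (not from any particular engine): positions are (x, y, z)
    tuples, Y is up, and a yaw of 0 means the plane faces +Z.
    The camera height is ignored on purpose, so the plane never tilts.
    """
    dx = camera_pos[0] - sprite_pos[0]
    dz = camera_pos[2] - sprite_pos[2]
    return math.degrees(math.atan2(dx, dz))
```

You'd call this once per frame for each character and apply the result as the plane's Y rotation only, leaving the X and Z rotations at zero.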
The control system is really up to you, but if you've billboarded the planes and the background is 3D, then the camera should be angled down at it so you can see ... so the controls wouldn't be any different from, say, a third-person 3D game, just with the movement variables on a more linear axis.

Hopefully you understand that.
For the collision system you'll probably want to use some kind of distance checker rather than actual collision ... then set up a variable for whether a person is within some distance N. If they are, and the sprite's facing variable says it's facing them, then one thing happens; if it isn't facing them, you can have backward moves instead.
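Something like this is what I mean by a distance checker plus a facing variable. It's only a sketch under my own assumptions: positions are (x, z) points on the floor, facing is stored as +1 (right) or -1 (left), and the names `within_range`, `is_facing` and `choose_move` are hypothetical.

```python
def within_range(a, b, max_dist):
    """True if floor points a and b, given as (x, z), are within max_dist.
    Compares squared distances so no square root is needed."""
    dx = b[0] - a[0]
    dz = b[1] - a[1]
    return dx * dx + dz * dz <= max_dist * max_dist

def is_facing(attacker_x, facing, target_x):
    """facing is +1 (looking right) or -1 (looking left); assumed convention."""
    return (target_x - attacker_x) * facing > 0

def choose_move(attacker, facing, target, max_dist):
    """Pick which kind of move applies: 'none' when out of range,
    'front' when facing the target, 'back' when facing away."""
    if not within_range(attacker, target, max_dist):
        return "none"
    if is_facing(attacker[0], facing, target[0]):
        return "front"
    return "back"
```

For example, `choose_move((0, 0), 1, (1, 0), 2)` picks the front move, while the same call with a facing of -1 picks the backward one.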
Hopefully this'll give you some ideas on how to do this.