A search for "editor" on these forums (use the search box below) will turn up a lot of results.
Basically, you load different objects (and images, sounds, etc.) in response to various user inputs and let the user position them. Placement can be anything from full-out 3D mouse placement to the boring, yet workable, method of simply typing in the location coordinates. You then store that info, preferably in an array of a custom type, which you can save to a file and reload later to set up your scenes from the info it contains. (You will obviously add more features than that, but you get the idea.)
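To make the store/save/reload part concrete, here is a rough sketch in Python (not the language you are necessarily using, so treat it as pseudocode for the idea). The `PlacedObject` type, its fields, the file names, and the choice of JSON as the file format are all assumptions of mine for illustration; any format you can write and read back will do.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PlacedObject:
    """One entry in the scene: which asset to load and where to put it.
    (Hypothetical type; add rotation, scale, etc. as your editor grows.)"""
    model_file: str
    x: float
    y: float
    z: float


def save_scene(objects, path):
    """Write the list of placed objects to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(o) for o in objects], f, indent=2)


def load_scene(path):
    """Read the file back into a list of PlacedObject entries."""
    with open(path, "r", encoding="utf-8") as f:
        return [PlacedObject(**entry) for entry in json.load(f)]


if __name__ == "__main__":
    # In a real editor these values would come from the user's input
    # (mouse placement or typed coordinates); here they are hard-coded.
    scene = [
        PlacedObject("tree.x", 10.0, 0.0, -5.0),
        PlacedObject("house.x", 0.0, 0.0, 20.0),
    ]
    save_scene(scene, "level1.json")

    # Later (or in the game itself): reload and set up the scene.
    for obj in load_scene("level1.json"):
        print(f"load {obj.model_file} at ({obj.x}, {obj.y}, {obj.z})")
```

In your game you would replace the `print` with whatever call loads and positions an object in your engine.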
Again, search for it; there might even be complete tutorials covering this. I think I've seen some.
(Or, if you have any particular questions, by all means feel free to ask.)
"We know some things about poodles, for example that they are alive, they can bark, they eat meat..."
- Extract from "Objects First with Java".