Quote: "This game is huge project (about 50 levels with 500 HQ monsters)."
That would put me off if it weren't for the nice pre-rendered scenes you have made. But I have to tell you, you are better off making the 'actual game' first and worrying about cut scenes at the end. Also, aim smaller: 10 good monsters is better than 50 when it comes to getting the game done. On voice acting, again, I'd do this later on in your dev cycle. I use placeholder voices when I build the game, and I add voice artists at the end when I know the placeholders are working.
I use this text-to-speech tool for placeholder voices:
http://www2.research.att.com/~ttsweb/tts/demo.php
I even use it for sound fx. If I have a rocket launcher, I don't hunt for a placeholder sound. I get the actual working game in first, working mesh, working code, and I worry about the polish later. For the sound, if the wav is called rocket blast.wav, I record text to speech saying "rocket blast". It's a great way to check what works and what doesn't in a way that makes sense, not just for me as a producer and director, but also for the art and code departments. It's easy and quick to prototype and see what works, and what needs replacing with the finalized art at the end.
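To make the naming trick concrete, here is a rough sketch of how you could work out what each placeholder should say straight from the file name. The function names (placeholder_phrase, build_manifest) and the door_open.wav example are made up for illustration, and it only produces the list of phrases; you would still feed those into whatever text-to-speech tool you use.

```python
from pathlib import Path


def placeholder_phrase(asset_name: str) -> str:
    """Turn an asset file name into the phrase the TTS voice should say.

    Hypothetical helper: 'rocket blast.wav' becomes 'rocket blast'.
    """
    stem = Path(asset_name).stem  # drop the .wav extension
    # Treat underscores and hyphens as word breaks, then tidy the spacing.
    words = stem.replace("_", " ").replace("-", " ").split()
    return " ".join(words).lower()


def build_manifest(asset_names):
    """Map every asset file name to its spoken placeholder phrase."""
    return {name: placeholder_phrase(name) for name in asset_names}


if __name__ == "__main__":
    # Example asset list; door_open.wav is an invented name for illustration.
    manifest = build_manifest(["rocket blast.wav", "door_open.wav"])
    for name, phrase in manifest.items():
        print(f"{name}: record TTS saying '{phrase}'")
```

Because the phrase comes from the file name, anyone who hears the placeholder in game knows exactly which file needs the final sound later.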
If I make a film, I use still images and text to speech for the actors' voices. It's crude, but I can spot what works, what dialog or timing is off, and what scenes need cutting. I suggest you do the same: worry about final art only once the core of the project works.
When you can show a working game with placeholders, talent will work with you. Coders and artists know the project is 'on': they can see it, play it, it's a real thing. It just needs the pretty pony to come in. Even if the code isn't built, having placeholders that make sense and are in order, plus well-produced design docs, really makes their job a hell of a lot easier.