@TKF15H
I really haven't seen many compilers use ESP to fetch locals. The output of the few that I've looked at has always used EBP. What compiler did you use? The ones I've looked at, including MASM, all use EBP for locals.
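Also, I believe I read somewhere that using EBP is actually quicker than using ESP to access locals. Supposedly this is due to some kind of one clock cycle delay when decoding instructions that address through ESP. I'm not sure where I read it, but I'm pretty sure it is true. What I do know for certain is that addressing through ESP always requires a SIB byte, so those instructions are at least one byte longer.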
@Offset of reality
Not on vacation. Just rarely get around to posting in the forums these days.
Yes, I've figured out what the SIB byte is. It is used for accessing arrays. Assume for a moment that you have an array of bytes called "MyArray", and that you want to cycle through each element in it. First you would load the address of MyArray into a register. This register will be our "Base". We would then clear another register and use that to hold the index into our array. We'll call this the "Index" (noticing a pattern?). Now, in order to get the first byte of our array, we would use a piece of code like this:
MOV ebx, OFFSET MyArray ;Our base register (in MASM, OFFSET loads the address rather than the contents)
XOR ecx, ecx ;Our index register, cleared to 0
;Grab a byte
MOV al, [EBX + ECX]
What this does is add the index register to the pointer in our base register. Using the resulting pointer, it then fetches a byte into the AL register. It might seem easier to just use MOV al, [EBX], since anything plus zero is itself, and you'd be right. However, what if you are in a for/next loop and are cycling through the array with the loop index? That is where the real power of the SIB shines. If you hold the loop index in the ECX register, you can access each byte in the array sequentially, because the loop counter is incremented on every pass.
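For anyone who thinks better in C than in assembly, here is a rough sketch of what that loop is doing. The function name sum_bytes and the running total are made up for illustration. The only point is that each access is base plus index, which is what [EBX + ECX] computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walks a byte array the way [EBX + ECX] would.
   base plays the role of EBX, i plays the role of ECX. */
uint32_t sum_bytes(const uint8_t *base, size_t count)
{
    assert(base != NULL);
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        /* Effective address = base + i, same as MOV al, [EBX + ECX] */
        uint8_t al = *(base + i);
        total += al;
    }
    return total;
}
```

Normally you would just write base[i] in C. I spelled out the addition so it lines up with the assembly.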
This is a very useful optimization; however, there is a drawback. As written, it only works on arrays of bytes. If you had an array of integers, the first iteration would work out fine. However, with the index counter set to one you'd wind up fetching from the second byte of the first integer, not the second integer! Allow me to explain with some simple math.
Suppose your array is at address 10. Your EBX register (which contains the pointer to the array) would then hold the value 10. Since there are 4 bytes to an integer, the next integer would be located at address 14. With an index of 0 you'd get address 10, since 10 + 0 (EBX + ECX) equals 10. However, with an index of 1 you'd get 11. This is fine for bytes, but for anything larger, like an integer, it won't do, because the address needs to advance by 4 for each element. Bit of a severe restriction, eh? But there is a solution:
Scale to the rescue! Scale can be any one of 4 values: 1, 2, 4, or 8. It works like this: the scale value is multiplied by the index, and the product is then added to the base. So with a scale value of 4 our instruction would look like this:
MOV EAX, [EBX + ECX * 4]
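To tie this back to the address 10 example, here is the same math as a tiny C function. effective_address is just a name I made up to model the calculation. It is not anything the CPU or an assembler exposes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the SIB address math: base + index * scale.
   Only scale values of 1, 2, 4 and 8 can be encoded. */
uint32_t effective_address(uint32_t base, uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}
```

With a scale of 1, effective_address(10, 1, 1) gives 11, which is the middle of the first integer. With a scale of 4, effective_address(10, 1, 4) gives 14, which is where the second integer starts.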
Using this feature we can now access arrays of integers in a for/next loop using the loop counter. As I said before, scale is limited to 4 values, and one of them is redundant (anything * 1 = anything). But it is much more efficient and saves you from wasting a register to hold the pointer to the current element in the array.
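Here is a rough C sketch of that integer loop, assuming 32-bit integers. The name sum_ints is hypothetical, and I deliberately do the byte math by hand so you can see the scale of 4. Ordinary C indexing like array[i] does this scaling for you behind the scenes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: walks an array of 32-bit integers the way
   [EBX + ECX * 4] would. base is EBX, i is ECX, 4 is the scale. */
int32_t sum_ints(const int32_t *array, size_t count)
{
    assert(array != NULL);
    const unsigned char *base = (const unsigned char *)array;
    int32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t eax;
        /* Effective address = base + i * 4; copy 4 bytes from there */
        memcpy(&eax, base + i * 4, sizeof eax);
        total += eax;
    }
    return total;
}
```

Notice that the loop counter i is the only thing that changes. There is no second pointer being bumped by 4 each pass, which is exactly the register you save.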