Take on C ; (And all the great links on Compilers)

Author

Message

TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 17th Jun 2005 23:20 Edited at: 17th Jun 2005 23:21

Link

I've stopped working on my language for a while. Not much free time. The little free time I have is for a stupidly large project I managed to get obsessed with (no, it's not an RPG. It's not even a game).
I'll get back to language creation when I'm done with my current project (if ever).
*looks at watch*
ARG, I'm late for work!

WarBasic Scripting engine for DarkBasicPro


David T

Retired Moderator

22

Years of Service

User Offline

Joined: 27th Aug 2002

Location: England

Posted: 17th Jun 2005 23:39

Link

Quote: "The other dude's tutorial was Pascal. ARG. "

Is that a problem? You can just convert it...

I too have paused for a while on my uber-compiler written in C#. The "uber" doesn't describe functionality, but the sheer amount of bureaucracy involved in the whole thing. I suppose that's part and parcel of having every feature wrapped up in its own little class.

"A book. If u know something why cant u make a kool game or prog.
come on now. A book. I hate books. book is stupid. I know that I need codes but I dont know the codes"


PowerSoft

20

Years of Service

User Offline

Joined: 10th Oct 2004

Location: United Kingdom

Posted: 7th Jul 2005 04:26 Edited at: 7th Jul 2005 04:26

Link

You say you're making a C# compiler. What can it do?

PowerScript: Currently Working on new VB version


David T

Retired Moderator

22

Years of Service

User Offline

Joined: 27th Aug 2002

Location: England

Posted: 7th Jul 2005 05:07

Link

Not much on the outside, but it has very nice support for all manner of arrays.

"A book. If u know something why cant u make a kool game or prog.
come on now. A book. I hate books. book is stupid. I know that I need codes but I dont know the codes"


MikeS

Retired Moderator

22

Years of Service

User Offline

Joined: 2nd Dec 2002

Location: United States

Posted: 1st Aug 2005 03:14

Link

A little article on compiler construction and parsing.
http://www.cs.man.ac.uk/~pjj/farrell/compmain.html

Have recently been learning some 3D programming with OpenGL. Hope to have a nice little 3D wrapper complete soon.

Going to spend the rest of the night working on my interpreter. Progress has been slow, and I only have 1 month left of summer. My original deadline to have a working version of my interpreter was by the end of August, so I should still be able to hit that deadline.

How is progress coming along for everyone else?

A book? I hate book. Book is stupid.
(Formerly Yellow)


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 1st Aug 2005 05:13

Link

I've stopped working on WarBasic for a long while now, but I'm doing something similar. I'm making a Dreamcast emulator, which is very close to the VM programming you'd do in a JITted language. It isn't exactly an interpreter, as it translates the code it runs and puts it in a code cache so it doesn't need to be translated again until it's modified. Any languages I make in the future will be based on this engine as it has MUCH better results than plain interpreting.
I've almost finished the CPU emulation (over 556,000 lines of code in 15 MB of cpp/h files >_< ), and I've done bits of memory emulation also. After that I have to do the video card (which is gonna be tough as I don't have much experience with low-level 3D graphics manipulation), the MAPLE BUS (also gonna be tough because of the documentation), and some other stuff.
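
For anyone wondering what "translate once, keep it in a code cache until it's modified" looks like in practice, here is a minimal sketch in Python. It is not the emulator's actual code: the class name, the `translate` callback and the toy "translation" (a Python closure instead of real machine code) are all assumptions made up for illustration.

```python
# Toy sketch of a translation (code) cache. Not the emulator's real design:
# "translating" a block here just builds a Python closure.

class TranslationCache:
    def __init__(self, translate):
        self._translate = translate   # hypothetical: guest address -> callable
        self._blocks = {}             # guest address -> translated block
        self.translations = 0         # how many times we had to translate

    def run(self, address):
        block = self._blocks.get(address)
        if block is None:             # cache miss: translate once and keep it
            block = self._translate(address)
            self._blocks[address] = block
            self.translations += 1
        return block()

    def invalidate(self, address):
        # Called when the guest code at `address` is overwritten.
        self._blocks.pop(address, None)


def toy_translate(address):
    # Stand-in for real code generation.
    return lambda: address * 2


cache = TranslationCache(toy_translate)
cache.run(0x100)
cache.run(0x100)          # served from the cache, no second translation
cache.invalidate(0x100)   # guest code was modified
cache.run(0x100)          # has to be translated again
```

The point of the design is that the expensive step (translation) only happens on a miss or after an invalidation, which is where the gain over plain interpreting would come from.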

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 4th Aug 2005 06:11

Link

@MikeS

Quote: "How is progress coming along for everyone else?"

I'm probably on my third or fourth rewrite of the compiler I'm working on. I've ported what I have over to FreeBasic and I've managed to grab a little bit of spare time to work on it some more. But with my new job I really don't have a whole lot of time to work on anything compiler related.

There is talk of Basic4gl(The BASIC dialect I'm aiming my compiler for) being discontinued at the moment so if it goes under I might just switch my compiler over to some other BASIC variant. Maybe even DBPro. I really don't want to commit a whole lot of time to writing a parser for a dialect of BASIC that will no longer exist or be used by anyone.

I'll finish up my new lexical scanner and take a look at the status of Basic4gl then. If it looks like Tom is going to call it quits, I'll probably move it on over to DBPro or make my own mini-dialect.

Either way, if I finish anything significant I plan on writing a series of tutorials on how to write a BASIC compiler for everyone. That should be fun.


MikeS

Retired Moderator

22

Years of Service

User Offline

Joined: 2nd Dec 2002

Location: United States

Posted: 4th Aug 2005 23:59

Link

Wow TKF15H, sounds very impressive.

-------------------------
Good to hear you're still making progress. I heard about Basic4GL going under too.

There were rumors about it going open source though, so then it might be worthwhile to just keep what you have. Either way, I'd love to see a series of tutorials for creating a BASIC compiler.

A book? I hate book. Book is stupid.
(Formerly Yellow)


PowerSoft

20

Years of Service

User Offline

Joined: 10th Oct 2004

Location: United Kingdom

Posted: 5th Aug 2005 00:30

Link

And the thread lives on

Yay


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 6th Aug 2005 19:07

Link

PowerSoft: What happened to PowerScript? You can't say "the thread lives on" and just walk out.

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


PowerSoft

20

Years of Service

User Offline

Joined: 10th Oct 2004

Location: United Kingdom

Posted: 8th Aug 2005 12:31

Link

:S

PowerScript is on hold (see sig)

I'm just amazed that the thread is still in existence and hasn't auto-locked itself.


David T

Retired Moderator

22

Years of Service

User Offline

Joined: 27th Aug 2002

Location: England

Posted: 8th Aug 2005 12:49

Link

Autolocking was disabled a while ago I think.

"A book. If u know something why cant u make a kool game or prog.
come on now. A book. I hate books. book is stupid. I know that I need codes but I dont know the codes"


PowerSoft

20

Years of Service

User Offline

Joined: 10th Oct 2004

Location: United Kingdom

Posted: 8th Aug 2005 12:50

Link

oh. That explains it then


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 17th Nov 2005 00:33

Link

@Neophyte (if you're still around): I could really use that tutorial you were working on, even if it's incomplete. Could you please upload/e-mail it? I need it for my DC emulator. ^_^

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 17th Nov 2005 02:06

Link

@TKF15H

I'm in the process of completing the tutorial right now actually. I've been quite busy with work so I really haven't had the time to progress as far as I'd like, but I can give you a basic outline.

My plan so far is, instead of working from a basic language and compiling into machine code, to work from an assembly language that compiles into machine code and work from there into a basic language. That should make the parsing side of things easier and allow me to dig into the intricacies of machine code generation.

So far I have the lexical scanner completed and I'm working on the simple parsing routines. I almost have the symbol table implementation completed. Just need to make the delete symbol and delete symbol table functions and I should be set.

Once I've completed that I'll start with the tutorial series. Part 1 should cover symbol tables, linked lists, lexical scanning, simple parsing routines, and a skeletal code generator. When part 1 is completed, you will have a mini-assembler that will assemble a source file and output actual machine code that can then be linked into an executable.

I can't promise any deadlines right now, but I can promise you that I will finish this compiler.

I also have to update my old tutorial concerning the MOV instruction as I've learned quite a bit about the complex addressing mode of x86 instructions since I wrote it. A new version of the tutorial will probably appear somewhere in my compiler series of tutorials.


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 17th Nov 2005 02:19

Link

Yay! Good to know you're still working on this.
Since my project isn't a programming language, the lexical scanning and parsing bits aren't necessary (for now... I'll probably have to give them a look later on). I'm generating machine code (from SH4 machine code) into RAM and running it from there directly.
Some of the code in my emulator is based on your MOV tutorial, so if anything is wrong and needs updating, please tell!

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 19th Nov 2005 20:45 Edited at: 19th Nov 2005 21:15

Link

Neophyte: Know if the Offset part of the MOV instruction is signed? If it is, then why is ESP used as a local-variable pointer, despite its value changing all the time? Doesn't seem to make any sense.

If I can't use [EBP - 4], I'm thinking of storing local variables before doing the "Mov EBP, ESP" so I can use [EBP + 4] instead. Wonder if this has any implications, as nobody ever does it like this.
Normal function structure (Intel syntax):

+ Code Snippet

Push EBP     ; Store the old base pointer
Mov EBP, ESP ; Set the base pointer to the current stack position
Sub ESP, 8   ; Create 2 32-bit variables on the stack
...          ; Function code here.
Mov EAX, DWORD PTR[ ESP + 4 ] ; Access the second variable with ESP as an offset
...
Add ESP, 8   ; Delete locals
Pop EBP
Ret          ; End Function

My function structure (no idea if this works or is any better than the normal header):

+ Code Snippet

Push EBP     ; Store the old base pointer
Sub ESP, 8   ; Create 2 32-bit variables on the stack  <---
Mov EBP, ESP ; Set the base pointer to the current stack position <---
...          ; Function code here.
Mov EAX, DWORD PTR[ EBP + 4 ] ; Use EBP as an offset, as its value does not change when you Push/Pop function parameters.
...
Add ESP, 8   ; Delete locals
Pop EBP
Ret          ; End Function

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 21st Nov 2005 03:29

Link

Yes, I believe it is signed. I believe that EBP is used as the local variable pointer instead of the ESP register so you can arbitrarily push and pop values whenever you need to.

The interesting thing about the x86 architecture is that the stack grows downward. So the stack layout would be like this:

+ Code Snippet

Larger Address  [LOCALS FROM PREVIOUS FUNCTION]
      |         [PARAMETERS FOR FUNCTION PUSHED ON STACK]
      |         [RETURN ADDRESS PUSHED ON STACK BY CALL INSTRUCTION]
      |         [EBP REGISTER PRESERVED ON STACK]
      ↓         [SPACE FOR LOCAL VARIABLES OF FUNCTION]
Smaller Address

So your version should work. It is a little unorthodox, but I can't figure out any flaws in it right now. Of course, I haven't tested it, so due caution is appropriate when using that method.
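
To make the layout above concrete, here is a small Python model of a downward-growing stack using the conventional prologue (Push EBP / Mov EBP, ESP / Sub ESP, 8). It is only a sketch: the `Stack` class, the starting address 0x1000 and the stored values are invented for illustration. It shows why an EBP-relative offset stays put while the ESP-relative offset of the same local shifts after a push.

```python
# Toy model of a downward-growing 32-bit stack; all addresses are made up.
class Stack:
    def __init__(self, top=0x1000):
        self.esp = top
        self.mem = {}

    def push(self, value):
        self.esp -= 4              # the stack grows toward smaller addresses
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


s = Stack()
s.push(0xAAAA)        # parameter pushed by the caller
s.push(0xBBBB)        # return address pushed by CALL
s.push(0)             # Push EBP (old base pointer)
ebp = s.esp           # Mov EBP, ESP
s.esp -= 8            # Sub ESP, 8: room for two locals
s.mem[ebp - 4] = 11   # first local
s.mem[ebp - 8] = 22   # second local

esp_offset_before = (ebp - 8) - s.esp   # second local is at [ESP + 0]
s.push(123)                             # e.g. pushing an argument for a call
esp_offset_after = (ebp - 8) - s.esp    # now the same local is at [ESP + 4]

second_local = s.mem[ebp - 8]           # [EBP - 8] is unaffected by the push
parameter = s.mem[ebp + 8]              # the parameter sits at [EBP + 8]
```

With ESP-relative addressing, whoever generates the code has to track every push and pop to keep the offsets right; with EBP-relative addressing the offsets are fixed for the whole function.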


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 21st Nov 2005 11:46

Link

Quote: "It is a little unorthodox, but I can't figure out any flaws in it right now."

That's just the thing... if it's easier to just use EBP rather than ESP to keep track of locals, then WHY do compilers go through the trouble of tracking ESP when pushing/popping things?!? Everybody uses ESP, and I just don't see a reason to do so, which is why I'm thinking twice about using EBP instead.

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


empty

22

Years of Service

User Offline

Joined: 26th Aug 2002

Location: 3 boats down from the candy

Posted: 21st Nov 2005 13:06 Edited at: 21st Nov 2005 13:06

Link

Quote: "if it's easier to just use EBP rather than ESP to keep track of locals, then WHY do compilers go through the trouble of tracking ESP"

Not all do. In fact Borland compilers use EBP to calculate the offset to local variables (unless some of the countless optimisation routines prevent them from using the stack to store local variables at all). It's the standard way of stack framing.

Play Nice! Play Basic! Version 1.089


Three Score

20

Years of Service

User Offline

Joined: 18th Jun 2004

Location: behind you

Posted: 22nd Nov 2005 03:51 Edited at: 22nd Nov 2005 04:12

Link

Have you figured out what the SIB byte in a MOV instruction is for?
(I've just read through this thread today, for over an hour, so...)

And one more thing: what does this mean in the Mod R/M table?

+ Code Snippet

EAX/AX/AL/MM0/XMM0  11   000
ECX/CX/CL/MM1/XMM1       001
EDX/DX/DL/MM2/XMM2       010
EBX/BX/BL/MM3/XMM3       011
ESP/SP/AH/MM4/XMM4       100
EBP/BP/CH/MM5/XMM5       101
ESI/SI/DH/MM6/XMM6       110
EDI/DI/BH/MM7/XMM7       111

[edited]

ok, I just hit him with a shovel. Is he still conscious? Yea, I think so. Then hit him again!
If at first your dont succeed, then skydiving is not for you


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 22nd Nov 2005 13:28

Link

That is how you access a register. [EAX] (Mod 00) gets the value pointed to by EAX, and EAX (mod 11) gets the value stored in EAX.
Or at least, that's what I understood...

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Three Score

20

Years of Service

User Offline

Joined: 18th Jun 2004

Location: behind you

Posted: 22nd Nov 2005 23:27 Edited at: 22nd Nov 2005 23:29

Link

yes but why the / and all the different registers and mm0/xmm0

edit:
@neophyte
Can I host your mini MOV tutorial on my website (with credit to you, of course)? There is nowhere but this thread that explains it, and the tutorial is in the middle of the thread, so it's a bit hard to find.

ok, I just hit him with a shovel. Is he still conscious? Yea, I think so. Then hit him again!
If at first your dont succeed, then skydiving is not for you


empty

22

Years of Service

User Offline

Joined: 26th Aug 2002

Location: 3 boats down from the candy

Posted: 23rd Nov 2005 00:08

Link

Quote: "yes but why the / and all the different registers and mm0/xmm0"

That's just the list of registers a certain Mod R/M byte applies to.

Play Nice! Play Basic! Version 1.089


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 23rd Nov 2005 00:15

Link

Quote: " yes but why the / and all the different registers and mm0/xmm0"

Because it depends on the instruction. The same 3-bit code, 000, means AL in an 8-bit instruction, AX in a 16-bit one, and EAX in a 32-bit one. If it's an MMX instruction it will use.... (drum roll)... MM0!
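
If it helps, here is a small Python sketch of how the Mod R/M byte is packed: 2 bits of mod, 3 bits of reg, 3 bits of r/m. The helper names (`modrm`, `mov_r32_rm32`) are made up for this example; it uses opcode 8B (MOV r32, r/m32) and only handles the two simple cases discussed above, register-to-register (mod 11) and plain [register] (mod 00).

```python
# 3-bit register codes for the 32-bit general purpose registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod, reg, rm):
    """Pack the three fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    assert 0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7
    return (mod << 6) | (reg << 3) | rm


def mov_r32_rm32(dst, src, indirect=False):
    """Encode MOV dst, src or, if indirect, MOV dst, [src] (opcode 8B)."""
    if indirect and src in ("esp", "ebp"):
        # With mod 00, r/m 100 means "a SIB byte follows" and
        # r/m 101 means "32-bit displacement, no base register".
        raise ValueError("needs a SIB byte or displacement; not handled here")
    mod = 0b00 if indirect else 0b11
    return bytes([0x8B, modrm(mod, REG32[dst], REG32[src])])


print(mov_r32_rm32("eax", "ebx").hex())                 # 8bc3 -> MOV EAX, EBX
print(mov_r32_rm32("eax", "ebx", indirect=True).hex())  # 8b03 -> MOV EAX, [EBX]
```

Note that the 3-bit codes only say "register number 3"; whether that means BL, BX, EBX or MM3 is decided by the opcode and operand size, not by the Mod R/M byte itself.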

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Phaelax

DBPro Master

22

Years of Service

User Offline

Joined: 16th Apr 2003

Location: Metropia

Posted: 23rd Nov 2005 08:12

Link

Xcode, with Cocoa and objective C.

Deadly Night Assassins


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 23rd Nov 2005 13:38

Link

Quote: " Xcode, with Cocoa and objective C."

Eh???

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Three Score

20

Years of Service

User Offline

Joined: 18th Jun 2004

Location: behind you

Posted: 23rd Nov 2005 19:31

Link

did anyone else get that

btw
is neophyte on vacation or something

tutorials,programs,useful but simple php scripts, a place for code snipplets and more at
http://hackr83.0z0.co.uk
(still under construction)


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 25th Nov 2005 00:21 Edited at: 25th Nov 2005 00:31

Link

Something I found googling: http://www.swansontec.com/sregisters.html
It's an article regarding "the art of register picking".
Good stuff. Reading this reminds me how much I hate the x86 architecture, and makes me wish I could get a G5.

http://www.unixwiz.net/techtips/win32-callconv-asm.html
Covers function calls (cdecl, stdcall). Basic stuff, probably covered previously in this thread, but good to have around.

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 25th Nov 2005 02:36

Link

@TKF15H

I really haven't seen many compilers use ESP to fetch locals. The output of the few that I've looked at has always used EBP. What compiler did you use? The ones I've looked at, including MASM, all use EBP for locals.

Also, I believe that I read somewhere that using EBP is actually quicker than using ESP to access locals. This is due to some kind of one clock cycle delay when decoding instructions with ESP in the ModR/M byte. I'm not sure where I read it but I'm pretty sure it is true.

@Offset of reality

Not on vacation. Just rarely get around to posting in the forums these days.

Yes, I've figured out what the SIB byte is. It is used for accessing arrays. Assume for a moment that you have an array of bytes called "MyArray". Assume also that you want to cycle through each element in the array. First you would load up the address of MyArray into a register. This register will be our "Base". We would then clear some register and use that to hold the index of our array. We'll call this the "Index" (noticing a pattern?). Now in order to get the first byte of our array we would use a piece of code like this:

+ Code Snippet

MOV ebx, MyArray ;Our base register
XOR ecx, ecx ;Our index register

;Grab a byte
MOV al, [EBX + ECX]

What this does is take the pointer from our base register and add it to our index register. Using the resulting pointer it then fetches a byte into the al register. It might just seem easier to use MOV al, [EBX] since anything plus zero is going to be itself, and you'd be right. However, what if you are in a for next loop and are cycling through the array with the loop index? That is where the real power of the sib shines. If you held the loop index in the ECX register, you could access each byte in the array sequentially, because you'll be incrementing the for next counter each loop.

This is a very useful optimization; however, there is a drawback. It will only work on arrays of bytes. If you were to have an array of integers, the first iteration would work out really well. However, with an index counter set to one you'd wind up reading from the second byte of the first integer, not the second integer! Allow me to explain with some simple math.

Suppose you have your array at address 10. Your EBX register (which contains the pointer to the array) would then hold the value 10. Since there are 4 bytes to an integer, the next integer would be located at address 14. With an index of 0 you'd get the address 10, since 10 + 0 (EBX + ECX) equals 10. However, with an index of 1 you'd get 11. This is fine for bytes, but for anything larger, like an integer, it won't do, because the address needs to advance by 4 for each element. Bit of a severe restriction, eh? But there is a solution:

Scale to the rescue! Scale can be one of four values: 1, 2, 4, or 8. And it works like this: the Scale value is multiplied by the Index and the product is then added to the Base. So with a scale value of 4 our instruction would look like this:
MOV EAX, [EBX + ECX * 4]
Using this feature we can now access arrays of integers in a for next loop with the loop counter. As I said before, this is limited to 4 values and one of them is redundant (Anything * 1 = Anything). But it is much more efficient and saves you from wasting a register to hold the pointer to the current element in the array.
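
Here is the same arithmetic as a small Python sketch, along with how the SIB byte itself is packed (2 bits of scale, 3 bits of index, 3 bits of base). The function names are invented for this example, and it deliberately rejects the special cases (ESP as index, EBP as base with no displacement) rather than trying to handle them.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def effective_address(base, index, scale):
    """Address computed by [base + index * scale] on a 32-bit machine."""
    if scale not in SCALE_BITS:
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale) & 0xFFFFFFFF


def sib(scale, index, base):
    """Pack the SIB byte: scale (2 bits), index (3 bits), base (3 bits)."""
    if index == "esp":
        raise ValueError("ESP cannot be used as an index register")
    if base == "ebp":
        raise ValueError("EBP as base needs a displacement; not handled here")
    return (SCALE_BITS[scale] << 6) | (REG32[index] << 3) | REG32[base]


def mov_r32_sib(dst, base, index, scale):
    """Encode MOV dst, [base + index * scale] with opcode 8B."""
    modrm = (0b00 << 6) | (REG32[dst] << 3) | 0b100   # r/m 100: SIB follows
    return bytes([0x8B, modrm, sib(scale, index, base)])


print(effective_address(10, 1, 1))   # 11: one byte into the first integer
print(effective_address(10, 1, 4))   # 14: start of the second integer
print(mov_r32_sib("eax", "ebx", "ecx", 4).hex())   # 8b048b
```

So MOV EAX, [EBX + ECX * 4] comes out as three bytes: the opcode, a Mod R/M byte whose r/m field says "SIB follows", and the SIB byte carrying the scale, index and base.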


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 25th Nov 2005 11:33

Link

Ah, always figured it'd be something like that, but wasn't sure.
Regarding the EBP/ESP thing, turns out my compiler (MSVC2005) uses EBP as a general purpose register, therefore has to use ESP to access locals.

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Halo Man

19

Years of Service

User Offline

Joined: 5th Nov 2005

Location:

Posted: 7th Dec 2005 14:21 Edited at: 11th Dec 2005 03:18

Link

Good luck to everyone!

C++ Programming Tutorial - http://www.cplusplus.com/doc/tutorial/


MikeS

Retired Moderator

22

Years of Service

User Offline

Joined: 2nd Dec 2002

Location: United States

Posted: 20th Dec 2005 05:16

Link

Found a few more links that may be of interest.

Interpreter example:
http://en.wikipedia.org/wiki/Interpreter_(computing)#Example_of_a_simple_interpreter

Self-interpretation:
http://arxiv.org/html/cs.PL/0311032

Mostly Compiler Construction things:
http://www.angelfire.com/ar/CompiladoresUCSE/COMPILERS.html

A book? I hate book. Book is stupid.
(Formerly Yellow)


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 20th Dec 2005 11:16

Link

heh, self-interpretation... neat. I wonder what that guy was taking when he made it.

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Three Score

20

Years of Service

User Offline

Joined: 18th Jun 2004

Location: behind you

Posted: 21st Dec 2005 17:57

Link

thanks, that really helps
btw
I'm building a virtual pc though instead of a compiler but if you can write machine code then you can read it also

tutorials,programs,useful but simple php scripts, a place for code snipplets and more at
http://hackr83.0z0.co.uk
(still under construction)


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 21st Dec 2005 18:13

Link

Mind saying a bit more about your project? I'm making an emulator so the projects are a bit related. It is still related to the original topic as emulators are very similar to compilers.

WarBasic Scripting engine for DarkBasicPro
DC emulator code size: 14.3MB, 553,214 lines


Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 9th Apr 2006 10:47

Link

*Bump*

I have some news about my compiler that I'll post later at length. In the meantime, how is everyone doing with their respective projects?


MikeS

Retired Moderator

22

Years of Service

User Offline

Joined: 2nd Dec 2002

Location: United States

Posted: 9th Apr 2006 18:00

Link

Progress is a bit slow, unfortunately, on my compiler. However, I've begun working a bit with Lua, now that it's so available with DBP. I've actually also finished a parser in DBP quite a while ago. I'm going to work on translating it over to FreeBasic (very similar to qBasic, but faster and still growing) so I can work with it more in that language.

---------------------------------

Kind of off topic, but here's a little tool I made based off your shader tutorial that I wanted to give you credit for.
http://forum.thegamecreators.com/?m=forum_view&t=76210&b=5

A book? I hate book. Book is stupid.
(Formerly Yellow)


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 9th Apr 2006 23:10 Edited at: 9th Apr 2006 23:14

Link

Good to see you guys haven't given up.

I totally changed all my original plans:
I'm working on an Assembler that reads an XML file which defines the instructions and the output (so depending on the XML, it can generate Intel x86 code, or ARM code, etc.). Currently focused on ARM code, it can already output arithmetic instructions. Adding more instructions is (hopefully) just a matter of editing the configs.
When that's done, I'll have to make a linker. And after that, the actual compiler (aiming for a BASIC-like language).
This will take me a really long time though, I'm working on other things, and I have a job/classes to tend to. -_-
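
To give a rough idea of the XML-driven approach, here is a tiny Python sketch. The XML layout (the `isa` and `instruction` elements and their `mnemonic`/`bytes` attributes) is invented purely for this example and is not the actual config format; it only covers three operand-less x86 instructions with fixed one-byte encodings (NOP = 90, RET = C3, INT3 = CC).

```python
import xml.etree.ElementTree as ET

# Hypothetical instruction-definition format; the element and attribute
# names are made up for this sketch.
ISA_XML = """
<isa name="x86-subset">
  <instruction mnemonic="nop"  bytes="90"/>
  <instruction mnemonic="ret"  bytes="C3"/>
  <instruction mnemonic="int3" bytes="CC"/>
</isa>
"""


def load_isa(xml_text):
    """Build a mnemonic -> machine code table from the XML definition."""
    root = ET.fromstring(xml_text.strip())
    return {node.get("mnemonic"): bytes.fromhex(node.get("bytes"))
            for node in root.iter("instruction")}


def assemble(source, isa):
    """Assemble one mnemonic per line; ';' starts a comment."""
    out = bytearray()
    for line_no, line in enumerate(source.splitlines(), start=1):
        text = line.split(";", 1)[0].strip().lower()
        if not text:
            continue
        if text not in isa:
            raise ValueError(f"line {line_no}: unknown instruction {text!r}")
        out += isa[text]
    return bytes(out)


isa = load_isa(ISA_XML)
code = assemble("nop ; do nothing\nnop\nret", isa)
print(code.hex())   # 9090c3
```

A real version would need operand patterns and bit-field templates in the XML rather than fixed byte strings, but the overall shape (load a table, then look up each line in it) would stay the same, and targeting another CPU would mean swapping the XML rather than the assembler.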

DC emulator code size: 9MB. Compiled: 4MB.
Overall Status: 20% done. CPU: 80% (no floats), RAM: 10%, GFX: 0%


Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 13th Apr 2006 02:27

Link

@MikeS

Interesting tool. I'm glad my tutorials could be of assistance.

@Everyone

Right now, I've just recently completed a mini-assembler. It outputs valid COFF object files and calls Microsoft's linker to generate a valid executable. I'm thinking about writing a huge tutorial series that will cover how to make a compiler from the back end up. I could document the COFF object file format and how to output assembly to it.

Ultimately, I'd like to work from a working assembler, to working pseudo-assembler code, to a full-fledged BASIC compiler. I think I'll start documenting my progress so far and how I made out shortly. Would any of you guys be interested in anything like that?
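
As a small taste of what documenting COFF would involve, here is a Python sketch that packs just the 20-byte COFF file header found at the start of an object file. It is not the mini-assembler's code: the function name and its defaults are made up, and a real object file also needs section headers, raw section data, relocations and a symbol table after this header.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C   # machine type for 32-bit x86


def coff_file_header(number_of_sections, pointer_to_symbol_table=0,
                     number_of_symbols=0, timestamp=0, characteristics=0):
    """Pack the COFF file header (little-endian, 20 bytes)."""
    return struct.pack(
        "<HHIIIHH",
        IMAGE_FILE_MACHINE_I386,   # Machine
        number_of_sections,        # NumberOfSections
        timestamp,                 # TimeDateStamp
        pointer_to_symbol_table,   # PointerToSymbolTable (file offset)
        number_of_symbols,         # NumberOfSymbols
        0,                         # SizeOfOptionalHeader: 0 for object files
        characteristics,           # Characteristics flags
    )


header = coff_file_header(number_of_sections=1)
print(len(header))        # 20
print(header[:2].hex())   # 4c01 (0x014C stored little-endian)
```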


MikeS

Retired Moderator

22

Years of Service

User Offline

Joined: 2nd Dec 2002

Location: United States

Posted: 13th Apr 2006 04:28

Link

Sounds quite exciting, Neophyte, and I really think that's the path to go in terms of compiler development. Documentation will also be very important, for you might have to restart (as I have numerous times), and that'll definitely help out a lot in your development. So of course, I'd be extremely interested (if you couldn't guess).

A book? I hate book. Book is stupid.
(Formerly Yellow)


TKF15H

21

Years of Service

User Offline

Joined: 20th Jul 2003

Location: Rio de Janeiro

Posted: 13th Apr 2006 05:00

Link

I would. There's more to making an executable than code generation, so being able to generate COFF files is handy. Any info on other people's experience is always helpful.

DC emulator code size: 9MB. Compiled: 4MB.
Overall Status: 20% done. CPU: 80% (no floats), RAM: 10%, GFX: 0%


The resurrected anarchist

19

Years of Service

User Offline

Joined: 5th Apr 2006

Location:

Posted: 13th Apr 2006 14:03

Link

hmm, books on makin compilers, intersting, im a bit hung over, so ill check em out later

like wen i get bak from cafe! mmm greasy food


MikeS

Retired Moderator

22

Years of Service

User Offline

Joined: 2nd Dec 2002

Location: United States

Posted: 18th Jun 2006 02:42

Link

lol

-----------------
Found a great link for those interested in Tokens and parsing.
http://users.skynet.be/wvdd2/Tokenizers/tokenizers.html

As for me, currently I'm writing a tokenizer in ANSI C. After that I'm going to take a look back at the executable generation links in this thread and have a go at that, or just link the code with a C compiler to be compiled(Hopefully the former though).

My number 1 goal for this summer is to finish this project, I can't believe it's almost been 2 years. How's everyone else's project going?

A book? I hate book. Book is stupid.
(Formerly Yellow)


Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 20th Jun 2006 04:53

Link

@MikeS

I really haven't worked a whole lot on the little assembler I have since last time. I've been either busy with work or have been working on my tutorial series for this project.

So far I have an introduction, tutorial 1, and tutorial 2 basically finished. I'm not sure I want to release them all just yet. I'd like to output a tutorial a week, but usually work or something else pops up and I really don't get around to working on the tutorials. So I'm trying to get a few done and then I'll release them one by one. Sort of give myself a buffer.

The whole process of revisiting the code I have and explaining why I structured it the way I did and what it does has led me to make some minor revisions that have improved the clarity and functionality of the code. So I guess this tutorial writing is rather beneficial.

Quote: "My number 1 goal for this summer is to finish this project, I can't believe it's almost been 2 years."

I too can't believe that it has been almost 2 years. Seems like just yesterday I was digging up articles on compiler construction and posting links to them en masse.


PowerSoft

20

Years of Service

User Offline

Joined: 10th Oct 2004

Location: United Kingdom

Posted: 23rd Jun 2006 20:09

Link

I'm still here, people! lol


MikeS

Retired Moderator

22

Years of Service

User Offline

Joined: 2nd Dec 2002

Location: United States

Posted: 11th Jul 2006 01:24

Link

Some more links for those interested. These two are quite good ones worth looking at, especially if you're just getting into this kind of thing. Both are using BASIC code, so they could even be translated into DBP if necessary.

Full Basic Basic interpreter

Simple Compiler snippet by Mark Sibly

A book? I hate book. Book is stupid.
(Formerly Yellow)


Three Score

20

Years of Service

User Offline

Joined: 18th Jun 2004

Location: behind you

Posted: 11th Jul 2006 04:15

Link

Neophyte, do you have the instructions part of your tutorial done?
If so, could you please email it to me: hackr9483-AT-gmail.com

I'm attempting to make an emulator (only attempting the 8086 for now, but still). Or well, I was, but when I started reading this thread you encouraged me to start working on it again.

JouleOS and friends
great thanks to http://galekus.com for FREE HOSTING!


Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 9th Oct 2006 03:21

Link

Update:

It appears that I'm finding it easier to advance the compiler than it is to advance my tutorial series. I've modified the compiler that I have to accept initialized data and I can get it to process a program that outputs a messagebox. Haven't progressed much on the tutorial front though.

I was intending to finish a few tutorials to give me a head start, but at the rate I'm going I might as well start releasing the ones I've finished now rather than later.

Neophyte

22

Years of Service

User Offline

Joined: 23rd Feb 2003

Location: United States

Posted: 9th Oct 2006 03:21

Link

Creating a Compiler - Introduction

As the title suggests, this post and the posts after it are going to be about creating a compiler. However, the series will take an unusual approach to the subject. Instead of starting with a BASIC language and working toward outputting machine code, we're going to start with a simple ASSEMBLY language which generates machine code and work our way toward a BASIC language. The reason for this is that outputting an object file with correct machine code is generally the hardest part of making a compiler.

Many people can get the front end of a compiler working and make their own interpreter, but creating something that can output machine code and properly link it is a challenge that many fall short of completing. Consequently, it is my goal to start with the difficult part first and then work my way backwards toward the easy part.

Now I won't be claiming that this is the best way of going about things or that my way of programming the compiler is the most optimal way. In fact, I might wind up changing some things as I go along, and at the end of this series I might even rewrite these tutorials for better clarity in both code and instruction. I'm not entirely sure myself of the precise steps that I'll need to complete my task. This is very much a work in progress. But it's taken me almost forever to get this far, so I think waiting till I complete a compiler fully is out of the question. Better now than never.

Here is a brief overview of how our assembler-soon-to-be-compiler will be structured:

The compiler will be broken up into 4 parts:
The Lexer
The Parser
The IR Generator
The Code Generator

The source file for a program goes into the Lexer, which strips the source file of useless information and creates a linked list of all lexical tokens. That list is then sent to the Parser, which is divided into two parts: the Syntax Checker and the Semantic Checker. Loosely defined, the Syntax Checker makes sure that the program is structured right. The Semantic Checker makes sure that the meaning of the program makes sense. For example, if the following code were parsed, the Syntax Checker would throw an error:

+ Code Snippet

A as integer

If A = 0
  A = 1

This is because it is missing a key construct: a matching endif. If the following code was parsed the semantic checker would throw an error:

+ Code Snippet

A as integer
B$ as string

If A = 0
  A = B$
endif

Although we have our missing endif in place, A = B$ is incorrect because you can't assign a string to an integer. The meaning does not make sense because the types are incompatible. The Semantic Checker is the part of the compiler that catches incompatible types.
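To make the split between the two checkers a bit more concrete, here is a rough Python sketch that runs the two snippets above through a toy lexer, a syntax check, and a semantic check. This is not the tutorial's actual code: the function names (lex, check_syntax, check_semantics), the token format, and the error messages are all made up for illustration, and it uses plain Python lists where the real compiler will use a linked list. It only understands the handful of statements that appear in the snippets.

```python
# Toy illustration only: names and token format are assumptions, not the tutorial's design.
KEYWORDS = {"as", "integer", "string", "if", "endif"}

def lex(source):
    """Turn source text into a list of token lists, one per non-blank line."""
    lines = []
    for raw in source.splitlines():
        words = raw.split()
        if not words:
            continue  # blank lines carry no information, so drop them
        tokens = []
        for word in words:
            if word.lower() in KEYWORDS:
                tokens.append(("keyword", word.lower()))
            elif word == "=":
                tokens.append(("equals", word))
            elif word.isdigit():
                tokens.append(("number", word))
            else:
                tokens.append(("name", word))
        lines.append(tokens)
    return lines

def check_syntax(lines):
    """Structure check: every if needs a matching endif."""
    errors = []
    depth = 0
    for tokens in lines:
        if tokens[0] == ("keyword", "if"):
            depth += 1
        elif tokens[0] == ("keyword", "endif"):
            if depth == 0:
                errors.append("endif without matching if")
            else:
                depth -= 1
    if depth > 0:
        errors.append("if without matching endif")
    return errors

def check_semantics(lines):
    """Meaning check: both sides of an assignment must have the same type."""
    errors = []
    types = {}  # variable name -> declared type
    for tokens in lines:
        if len(tokens) == 3 and tokens[1] == ("keyword", "as"):
            types[tokens[0][1]] = tokens[2][1]          # <name> as <type>
        elif len(tokens) == 3 and tokens[0][0] == "name" and tokens[1][0] == "equals":
            target = types.get(tokens[0][1])            # <name> = <value>
            kind, text = tokens[2]
            value_type = "integer" if kind == "number" else types.get(text)
            if target != value_type:
                errors.append(
                    f"cannot assign {value_type} to {target} variable {tokens[0][1]}")
    return errors

MISSING_ENDIF = "A as integer\n\nIf A = 0\n  A = 1\n"
TYPE_MISMATCH = "A as integer\nB$ as string\n\nIf A = 0\n  A = B$\nendif\n"

print(check_syntax(lex(MISSING_ENDIF)))
print(check_semantics(lex(TYPE_MISMATCH)))
```

The first program passes the semantic check but fails the syntax check, and the second does the opposite, which is the distinction described above.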

Once the source code clears the parsing stage it is ready to be fed to the IR Generator. The IR Generator will transform the list of tokens into an Intermediate Representation. This is the format that our Backend will work with from here on out.
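The actual IR format hasn't been shown yet, so the following Python sketch is only a guess at what the idea looks like in practice. It turns a corrected version of the earlier snippet into a flat list of tuples. The instruction names (alloc, jump_if_not_equal, store, label) and the generate_ir function are invented for this example, and the sketch works straight from the source text rather than from a token list to keep it short.

```python
# Hypothetical IR: the tuple shapes and instruction names are assumptions for illustration.
def generate_ir(source):
    """Translate an already-checked program into a flat list of IR tuples."""
    ir = []
    open_ifs = []     # labels of if blocks that have not been closed yet
    next_label = 0
    for raw in source.splitlines():
        words = raw.split()
        if not words:
            continue
        head = words[0].lower()
        if len(words) == 3 and words[1].lower() == "as":
            # <name> as <type>: reserve storage for the variable
            ir.append(("alloc", words[0], words[2].lower()))
        elif head == "if":
            # If <left> = <right>: jump past the body when the test fails
            label = f"L{next_label}"
            next_label += 1
            open_ifs.append(label)
            ir.append(("jump_if_not_equal", words[1], words[3], label))
        elif head == "endif":
            # The end of the block is where the failed test lands
            ir.append(("label", open_ifs.pop()))
        elif len(words) == 3 and words[1] == "=":
            ir.append(("store", words[0], words[2]))
        else:
            raise ValueError(f"unsupported statement: {raw.strip()}")
    return ir

PROGRAM = "A as integer\n\nIf A = 0\n  A = 1\nendif\n"

for instruction in generate_ir(PROGRAM):
    print(instruction)
```

The point of a form like this is that the structured if/endif block has been flattened into jumps and labels, which is much closer to what the backend has to emit.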

With our new Intermediate Representation of the program, our Code Generator can get to work assembling the program, outputting a COFF object, and sending it to a linker. This is also the phase where optimizations take place, but this won't really be touched upon until much later since we'll be working with assembly in the beginning.
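As a small taste of the assembling step, here is a Python sketch that encodes three 32-bit x86 instructions into raw machine code bytes. The opcodes are the standard ones (B8 for mov eax, imm32; 05 for add eax, imm32; C3 for ret), but the assemble function and the instruction tuples are made up for this example. Wrapping the bytes in a COFF object and handing it to a linker is not shown.

```python
import struct

# Sketch only: the tuple-based instruction format is an assumption for illustration.
def assemble(instructions):
    """Encode a few 32-bit x86 instructions into machine code bytes."""
    code = bytearray()
    for op, *args in instructions:
        if op == "mov_eax_imm32":
            # B8 followed by the 32-bit immediate, little-endian
            code += b"\xB8" + struct.pack("<I", args[0])
        elif op == "add_eax_imm32":
            # 05 followed by the 32-bit immediate, little-endian
            code += b"\x05" + struct.pack("<I", args[0])
        elif op == "ret":
            code += b"\xC3"
        else:
            raise ValueError(f"unknown instruction: {op}")
    return bytes(code)

machine_code = assemble([
    ("mov_eax_imm32", 1),
    ("add_eax_imm32", 2),
    ("ret",),
])
print(machine_code.hex())  # b8010000000502000000c3
```

Those 11 bytes are what would end up in the code section of the object file for a routine that returns 3 in eax.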

This completes our brief introduction to our future compiler. The next post will contain a tutorial covering the code for one of our fundamental data structures: the Linked List.
