Geek Culture / Take on C; (And all the great links on Compilers)

Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 2nd Apr 2004 10:04
@Yellow

Sorry, I haven't posted sooner. I kind of forgot about this.

Anyway, good to see you are making some serious progress. You are definitely farther along than me. Currently, I'm stuck on how I'm going to organize my intermediate language. I'm trying to think of a format that would allow for easy optimization and assembly, but I keep running into some roadblocks, namely how to get around load-to-store delays. I could probably go into explicit detail here, but this post would go on forever and I'm not sure I have the time for that.

So, I'll just post what I came across, namely this:
http://elfz.laacz.lv/ms_exe_spec.html

It's the exact same Word document I found before, only in HTML format. The type is also smaller, and as a result I'm finding it easier to read and understand. The info here isn't different from what I posted before, but I thought I'd post it anyway just in case you want to view this stuff online from work or something.
MikeS
Retired Moderator
Joined: 2nd Dec 2002
Location: United States
Posted: 2nd Apr 2004 23:44
Thanks for the link, Neophyte. (As always.)

I actually think I'm behind you. What I've created is just a simple interpreter. I'm still reading through my book, but eventually I'm going to get some time to sit down and start a real compiler using C/C++.





A book? I hate book. Book is stupid.
Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 8th Apr 2004 11:02 Edited at: 20th Apr 2004 06:34
And just when you thought I couldn't find any more information...

As I posted previously, I was having some trouble determining what my intermediate language was going to be like. This led to a little research, and I found a load of stuff that is going to be very useful. First up is the first thing I stumbled across: Automatic Generation of Data-Flow Analyzers: A tool for building optimizers.

http://citeseer.ist.psu.edu/tjiang93automatic.html

Basically, this is someone's thesis on a tool that will build global optimizers using data-flow analysis. Essentially, a global optimizer looks over the entire program and makes a variety of optimizations to increase program speed. This differs from regular optimizers in that regular optimizers look at one statement or basic block at a time, while global optimizers look over the entire program. Modern compilers use global optimizers to get the best speed out of a program.

This paper discusses a tool that will help you build optimizers procedurally from a series of specifications. This kind of work is going to free up a lot of effort on my part that would otherwise be spent figuring out how to optimize my intermediate language, as it takes care of that for me.

There is also a discussion of the SUIF intermediate language, and I think I'll be using that as my intermediate language.

The tool Sharlit was also introduced in an earlier whitepaper here:
http://citeseer.ist.psu.edu/tjiang92sharlit.html

I also found quite a bit of info on SUIF and the various groups that are working on projects related to it.

An Introduction to Machine SUIF
and Its Portable Libraries for Analysis and Optimization
http://www.eecs.harvard.edu/hube/software/nci/overview.html

This describes Machine SUIF and how to begin using it. Their homepage is here:
http://www.eecs.harvard.edu/hube/research/machsuif.html

Here is a thorough overview of the SUIF compiler system and how to use it:
http://suif.stanford.edu/suif/suif2/doc-2.2.0-4/

Well, this search for an intermediate language got me really interested in data-flow analysis, so here is some more research I managed to dig up.

Program Dependence Graphs for the Rest of Us
http://citeseer.ist.psu.edu/10089.html

Here is a simplified approach to building a dependency graph. It's a short 24 pages, so it isn't too difficult to digest.

Optimix A Tool For Rewriting And Optimizing Programs
http://www.ida.liu.se/~chrke/courses/ACC/Optimix.pdf
Here is another intermediate language that you could use for your compiler. The intro to it is a brief 4 pages, but it is enough to get the gist of it.

Guaranteed Optimization: Proving Nullspace Properties of Compilers
http://citeseer.ist.psu.edu/veldhuizen02guaranteed.html
Another interesting paper worth taking a look at. Deals with the inconsistency of optimizations of imperative languages (C, Basic, Fortran, etc.) and defines a way of guaranteeing optimization through "super-analysis".

Automatic Construction of Optimizing, Parallelizing Compilers from Specifications
http://citeseer.ist.psu.edu/cohen94automatic.html

Here is a slightly more recent paper on constructing optimizers from a specification. I haven't really had the time to look this one over enough to figure out how good it is, but it looks to be pretty promising.

A FrameWork for Automatic Generation of Code Optimizers
http://www.cse.iitk.ac.in/research/mtech1999/9911115.html

Yet another optimizer generator. This one is in PostScript, so make sure you have a program like the GIMP or Ghostscript to read it.

Bidirectional Data Flow Analysis in Code Motion: Myth and Reality
http://citeseer.ist.psu.edu/164412.html

This deals with some of the problems with data-flow analysis. I haven't gotten into this one yet, but it could be useful later on for improving your optimizer.

Register Pressure Sensitive Redundancy Elimination
http://citeseer.ist.psu.edu/gupta99register.html

This one is a keeper. Basically, it addresses the problem of high register pressure and when not to do common subexpression elimination. I haven't begun reading it yet, but figuring out the optimal usage of registers with minimal "spill code" has been a problem that has vexed me ever since I started thinking about how to avoid load-to-store stalls between instructions.

Path-Sensitive, Value-Flow Optimizations of Programs
http://citeseer.ist.psu.edu/bodik99pathsensitive.html

I'll just quote the abstract on this one:
Quote: "framework that is powerful enough to remove nearly all redundancies, yet practical enough to permit an industrial-strength implementation."


Optimal and Efficient Speculation-based Partial Redundancy Elimination
http://citeseer.ist.psu.edu/qiong03optimal.html
A more recent paper that deals with the same problem as the paper above.

Path Profile Guided Partial Redundancy Elimination Using Speculation
http://citeseer.ist.psu.edu/12884.html

Again, I'll quote from the abstract:
Quote: "While programs contain a large number of paths, a very small fraction of these paths are typically exercised during program execution. Thus, optimization algorithms should be designed to trade off the performance of less frequently executed paths in favor of more frequently executed paths. However, traditional formulations to code optimizations are incapable of performing such a trade-off. We present a path profile guided partial redundancy elimination algorithm that uses speculation to enable... "


Path Profile Guided Partial Dead Code Elimination Using Predication
http://citeseer.ist.psu.edu/gupta97path.html

A similar paper as the one above. It discusses "a path profile guided partial dead code elimination algorithm that uses predication to enable sinking for the removal of deadness along frequently executed paths at the expense of adding additional instructions along infrequently executed paths."

Loop Quasi-Invariance Code Motion
http://citeseer.ist.psu.edu/pdf/508447

This one is interesting. It's a new twist on the old loop-invariance optimization that can deal with certain invariant expressions that can't normally be hoisted. Definitely worth taking a serious look at. Note: You have to download this from ftp://ftp.diku.dk/diku/semantics/papers/D-452.pdf, as CiteSeer only has a cached copy with no pages.

There is also another copy that is titled slightly differently, which can be found here:
http://citeseer.ist.psu.edu/song00loop.html
It is basically the exact same paper, only this time the PDF link works.

TenDRA
http://www.ten15.org/news/interest.html

Here is a whole load of links that are very interesting. TenDRA is a C/C++ compiler group that is working for the U.K. equivalent of the U.S. DARPA. They have a bunch of recent links here to good articles on compiler optimizations. A few good ones like...
Operator-Precedence Parsing
http://www.epaperpress.com/oper/index.html
A paper on, as you could have guessed, Operator-Precedence Parsing. Looks to be very useful and easy to understand.

Also...
Compiler Transformations for High-Performance Computing
http://citeseer.ist.psu.edu/bacon93compiler.html

This is basically an overview of compiler optimizations. It is relevant for our work as it deals with examples in imperative languages like C and Fortran. Since Basic is an imperative language, these observations also apply. A refreshing outlook, since most compiler optimization theses talk about object-oriented C++ almost exclusively.

And as if you couldn't get any more interesting stuff to read...
Genetic Programming applied to Compiler Heuristic Optimization
http://citeseer.ist.psu.edu/cache/papers/cs/27030/http:zSzzSzwww.metahuman.orgzSzmartinzSzpaperszSzGPApppliedToCompilerHeuristicOptimizationStephenson.pdf/genetic-programming-applied-to.pdf/

This looks like a pretty cool application of genetic programming, and it appears to offer some impressive speed-ups in compile time. Also, at 13 pages long, it makes for a quick read.

Meta-Optimization: Improving Compiler Heuristics with Machine Learning
http://citeseer.ist.psu.edu/stephenson02meta.html
This is basically an extension of the above. It's 11 pages long, so it is a nice short read. However, the PDF version on CiteSeer is utter crap. Someone screwed up when making the file, so it looks really huge and blown out of proportion. Simply unreadable. You'll have to use the PostScript version instead, as that one is just fine.

On the subject of Compile time:
Dynamic Control of Compile Time Using Vertical Region Based Compiling
http://www.crhc.uiuc.edu/IMPACT/ftp/report/ms-thesis-jaymie-braun.ps
This is PostScript only, as CiteSeer only had a cached copy of this available, so the download link is directly from the author's site.

Composing Dataflow Analysis and Transformations
http://citeseer.ist.psu.edu/553577.html
Interesting new approach to dataflow analysis.

Efficient Representation and Abstractions for Quantifying and Exploiting Data Reference Locality
http://citeseer.ist.psu.edu/chilimbi01efficient.html
This deals with cache optimization and the implementation thereof. There are not a whole lot of papers out there that deal with cache optimization, so this one is a keeper.

Cache-Conscious Structure Definition
http://citeseer.ist.psu.edu/2787.html
The second paper that deals with cache optimization. This one deals with data structures and how to reorganize them for cache coherency to improve performance.

And just in case I missed something, here is a link from the Nullstone site that I found that lists quite a few interesting papers. It's also how I found the first paper that I gave a link to here.
http://www-suif.stanford.edu/papers/

On a semi-related note:

I found this presentation on Assembly language to be quite informative:
http://www.ecs.syr.edu/faculty/fawcett/handouts/TestingSeminar/Chapter7a.ppt

It takes a quick look at assembly language from a debugging/disassembly point of view. It cleared a few things up for me, so it may be worth your time. Note: The presentation is in PowerPoint, so you'll need something that can view PowerPoint presentations.

Also from the same site are several chapters I found on debugging and debuggers:
http://www.ecs.syr.edu/faculty/fawcett/handouts/TestingSeminar/

It's in the directory of this teacher's old presentations. I stumbled across it by accident, but I found it to be interesting and easy to understand. The various PowerPoint chapters cover the dos and don'ts of debugging, along with debuggers: how they work and what they can do.

With all of this talk about data flow analyzers I think it best that some of the terms be clearly described so as to avoid confusion. Here is a glossary of the common terms that you'll run across when reading these papers.
http://www.cl.cam.ac.uk/~jds31/useful/dfgloss.html

Also from that site is a list of Jeremy Singer's research on the subject of data-flow analyzers, as well as some intermediate languages.
http://www.cl.cam.ac.uk/~jds31/research/ssi.html

I think he is at the same stage that you and I are at so his research might prove invaluable.

Whew!

That was a lot of typing. I think that the stuff I've unearthed just recently ought to keep both you and me busy for the next decade or so.

Hopefully, I'll have a simple demo to show next time I post. I'm getting close to completing my preliminary work with the MS Library file format. I've examined the OpenGL32.lib quite extensively and I think that I should soon be able to extract whatever object file in the library I need and link it into an executable. From there it is a short hop, skip, and a jump to making a simple executable that displays a colored triangle.

Well, best of wishes in your endeavours and happy coding.

EDIT: Changed one of the links to point to the correct ppt presentation.
Powersoft
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 8th Apr 2004 15:05
That was a long post!

Just to add to the confusion.
Look at my avatar
Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 21st Apr 2004 13:53
A few more links to add to the pile:

The text for a book "Compilers and Compiler Generators" is online and available here:
http://www.scifac.ru.ac.za/compilers/

The text for a book "Parsing Techniques - A Practical Guide" can be found on line as well:
http://www.cs.vu.nl/~dick/PTAPG.html
It contains a complete survey of all parsing techniques and not just the popular ones. Looks to be very promising.

I can't believe no one has posted this before:
http://compilers.iecc.com/index.html
Popular Usenet newsgroup about all things compilers, and sometimes stuff about language design. It dates back to 1986, so it has got to be crammed full of useful information (along with a bunch of irrelevant stuff, but I like to look on the bright side).

Very informative lecture on register allocation via Graph coloring.
http://www.ee.pdx.edu/~mperkows/temp/register-allocation.pdf
Note, this isn't a thesis. It appears to be some professor's lecture notes and as such is aimed more toward teaching the beginner. Contains quite a few pictures as well.

Spill Code Minimization Techniques for Graph Coloring Register Allocators
http://www.cs.pub.ro/~pt/articles/bergner.pdf
Should be read after the above lecture. It is a thesis concerning techniques for minimizing spill code. Promises an improvement in execution time of up to 15% over standard methods.

Yet another lecture on compiler design concerning register allocation. Deals with the problem through graph coloring.
http://www.cs.rutgers.edu/~ryder/415/lectures/registerAllocation.pdf

Another lecture. This one is in a weird page setup of four pages to a page. Still, it contains lots of pictures, and it couldn't hurt to read it in conjunction with the above two lectures. A lot of information gets repeated between the three, but in my opinion that is a good thing. Makes for easier learning.
http://ece-www.colorado.edu/~dconnors/CGO/notes/register-6.pdf

Preference-Directed Graph Coloring
http://www.trl.ibm.com/projects/jit/paper/slide_pref.pdf
Looks promising. Lots of pictures and colourful (can you tell I'm a visual learner?).

I should probably post a link to this, as it is the place to find the most up-to-date stuff on compilers. Here is the link to the main page of the ACM Transactions on Programming Languages and Systems.
http://www.cs.wustl.edu/~toplas/#online
It covers a lot more than compilers, but you can usually find stuff in their libraries about compilers. It requires a subscription fee, but don't let that deter you. Just search for the title you are interested in online and you'll usually find it for free on CiteSeer.

I could go on, but I think I've posted enough for now.

As for my compiler, I'm currently looking into the MOV instruction and all of its many encodings. It appears to be quite a bit more complex than I originally thought it was. I'll post an update once I've gained a firm enough grasp of how and why MOV instructions are encoded the way they are.
Jeku
Moderator
Joined: 4th Jul 2003
Location: Vancouver, British Columbia, Canada
Posted: 21st Apr 2004 22:57 Edited at: 21st Apr 2004 22:58
Holy crap, Neophyte, you are a research king!

I'm thinking about writing a BASIC compiler for my final CS degree project. Do you and Yellow recommend "Writing Compilers and Interpreters"? It looks pretty in-depth as it handles OOP.

One more question (sorry if it was asked before), but does the book show you how to make the compiler generate an EXE? I know you two are talking about the specs, so I wasn't sure.

Thanks

MikeS
Retired Moderator
Joined: 2nd Dec 2002
Location: United States
Posted: 21st Apr 2004 23:46 Edited at: 21st Apr 2004 23:50
Full list of almost every link found in this thread.



Preference-Directed Graph Coloring
http://www.trl.ibm.com/projects/jit/paper/slide_pref.pdf

http://ece-www.colorado.edu/~dconnors/CGO/notes/register-6.pdf

http://www.cs.rutgers.edu/~ryder/415/lectures/registerAllocation.pdf

Spill Code Minimization Techniques for Graph Coloring Register Allocators
http://www.cs.pub.ro/~pt/articles/bergner.pdf

Very informative lecture on register allocation via Graph coloring.
http://www.ee.pdx.edu/~mperkows/temp/register-allocation.pdf

http://compilers.iecc.com/index.html

The text for a book "Parsing Techniques - A Practical Guide" can be found on line as well:
http://www.cs.vu.nl/~dick/PTAPG.html

The text for a book "Compilers and Compiler Generators" is online and available here:
http://www.scifac.ru.ac.za/compilers/

http://www.cl.cam.ac.uk/~jds31/research/ssi.html

With all of this talk about data flow analyzers I think it best that some of the terms be clearly described so as to avoid confusion. Here is a glossary of the common terms that you'll run across when reading these papers.
http://www.cl.cam.ac.uk/~jds31/useful/dfgloss.html

I found this presentation on Assembly language to be quite informative:
http://www.ecs.syr.edu/faculty/fawcett/handouts/TestingSeminar/Chapter7a.ppt

http://www-suif.stanford.edu/papers/

http://citeseer.ist.psu.edu/2787.html

Composing Dataflow Analysis and Transformations
http://citeseer.ist.psu.edu/553577.html

On the subject of Compile time:
Dynamic Control of Compile Time Using Vertical Region Based Compiling
http://www.crhc.uiuc.edu/IMPACT/ftp/report/ms-thesis-jaymie-braun.ps

Meta-Optimization: Improving Compiler Heuristics with Machine Learning
http://citeseer.ist.psu.edu/stephenson02meta.html

Genetic Programming applied to Compiler Heuristic Optimization
http://citeseer.ist.psu.edu/cache/papers/cs/27030/http:zSzzSzwww.metahuman.orgzSzmartinzSzpaperszSzGPApppliedToCompilerHeuristicOptimizationStephenson.pdf/genetic-programming-applied-to.pdf/

Compiler Transformations for High-Performance Computing
http://citeseer.ist.psu.edu/bacon93compiler.html

Operator-Precedence Parsing
http://www.epaperpress.com/oper/index.html

TenDRA
http://www.ten15.org/news/interest.html

http://citeseer.ist.psu.edu/song00loop.html

ftp://ftp.diku.dk/diku/semantics/papers/D-452.pdf.

Loop Quasi-Invariance Code Motion
http://citeseer.ist.psu.edu/pdf/508447


Path Profile Guided Partial Dead Code Elimination Using Predication
http://citeseer.ist.psu.edu/gupta97path.html

Path Profile Guided Partial Redundancy Elimination Using Speculation
http://citeseer.ist.psu.edu/12884.html

Optimal and Efficient Speculation-based Partial Redundancy Elimination
http://citeseer.ist.psu.edu/qiong03optimal.html

Path-Sensitive, Value-Flow Optimizations of Programs
http://citeseer.ist.psu.edu/bodik99pathsensitive.html

Bidirectional Data Flow Analysis in Code Motion: Myth and Reality
http://citeseer.ist.psu.edu/164412.html

A FrameWork for Automatic Generation of Code Optimizers
http://www.cse.iitk.ac.in/research/mtech1999/9911115.html

Automatic Construction of Optimizing, Parallelizing Compilers from Specifications
http://citeseer.ist.psu.edu/cohen94automatic.html

Optimix A Tool For Rewriting And Optimizing Programs
http://www.ida.liu.se/~chrke/courses/ACC/Optimix.pdf

Program Dependence Graphs for the Rest of Us
http://citeseer.ist.psu.edu/10089.html

Here is a thorough overview of the SUIF compiler system and how to use it:
http://suif.stanford.edu/suif/suif2/doc-2.2.0-4/

This describes Machine SUIF and how to begin using it. Their homepage is here:
http://www.eecs.harvard.edu/hube/research/machsuif.html

An Introduction to Machine SUIF
and Its Portable Libraries for Analysis and Optimization
http://www.eecs.harvard.edu/hube/software/nci/overview.html

The tool Sharlit was also introduced in an early whitepaper here:
http://citeseer.ist.psu.edu/tjiang92sharlit.html

http://citeseer.ist.psu.edu/tjiang93automatic.html

http://elfz.laacz.lv/ms_exe_spec.html

Some URLs:

Teaches on Lex & YaCC
http://dinosaur.compilertools.net/

Tutorial on Programming a Compiler in C
http://compilers.iecc.com/crenshaw/


Some Book Links:

Writing Compilers and Interpreters
http://www.amazon.com/exec/obidos/tg/detail/-/0471113530/ref=pd_bxgy_text_1/102-8676814-4432937?v=glance&s=books&st=*

Sams Teach Yourself C in 21 Days
http://www.amazon.com/exec/obidos/tg/detail/-/0672324482/102-8676814-4432937?v=glance

Modern Compiler Design
http://www.amazon.com/exec/obidos/tg/detail/-/0471976970/qid=1049947481/sr=8-1/ref=sr_8_1/102-8676814-4432937?v=glance&s=books&n=507846

Game Scripting Mastery
*I highly recommend this book from the description and previous experience with the Premier Press series*
http://www.amazon.com/exec/obidos/ASIN/1931841578/qid%3D1049947504/sr%3D11-1/ref%3Dsr%5F11%5F1/102-8676814-4432937

Constructing Language Processors for Little Languages
http://www.amazon.com/exec/obidos/tg/detail/-/0471597546/ref=pd_bxgy_text_1/102-8676814-4432937?v=glance&s=books&st=*

Complete Book of C Programming
http://www.amazon.com/exec/obidos/tg/detail/-/0130960934/qid=1075167258/sr=1-17/ref=sr_1_17/102-8676814-4432937?v=glance&s=books

Compilers (better known as the DRAGON BOOK)
http://www.amazon.com/exec/obidos/ASIN/0201100886/qid%3D1049840066/sr%3D2-1/ref%3Dsr%5F2%5F1/102-8676814-4432937

C Programming for the Absolute Beginner
http://www.amazon.com/exec/obidos/tg/detail/-/1931841527/qid=1075167258/sr=1-15/ref=sr_1_15/102-8676814-4432937?v=glance&s=books

This one is a fairly extensive list of parsers:
http://www.thefreecountry.com/programming/compilerconstruction.shtml

And another one on programming grammars, half of which appear to be C or C++ based.
http://www.thefreecountry.com/sourcecode/grammars.shtml

Here's what appears to be the complete draft of a book dedicated to linkers and loaders:
http://www.iecc.com/linker/

Here is a free linker with source code and documentation included. It appears to be formerly a commercial product, so its quality is probably up there over other free linkers with source on the net. Might be a good example of how a professional linker is written. However, it is for MS-DOS and is quite old.
http://www.devoresoftware.com/freesource/wlsrc.htm

Here is another free linker. It appears to be much more up to date than WarpLink and hence more relevant.
http://alink.sourceforge.net/

Another linker with source. This one claims to be ten times faster than MS Link. Definitely worth a look. It is a DOS linker, though.
http://www.sudleyplace.com/

http://www.concerto.demon.co.uk/UPS/

http://root.cern.ch/root/Cint.html

Flip Code Tutorials


http://www.basic4gl.net/

Source 1: Iczelion's PE tutorials
http://win32asm.cjb.net/

Source 2: LUEVELSMEYER's Description
http://spiff.tripnet.se/~iczelion/files/pe1.zip

Source 3: Tool Interface Standards
http://x86.ddj.com/intel.doc/tools.htm

Part 1:
http://msdn.microsoft.com/msdnmag/issues/02/02/PE/default.aspx

Part 2:
http://msdn.microsoft.com/msdnmag/issues/02/03/PE2/default.aspx

http://www.smidgeonsoft.com
http://www.fairdell.com/

http://www.agner.org/assem/


Here is one on porting 3DNow! instructions. You mentioned earlier that you were working on a compiler for a math-oriented language (I think), so you might find these instructions useful.
http://www3pub.amd.com/products/cpg/athlon/techdocs/pdf/22621.pdf

Here is one on general AMD processor optimization:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf

Briefing on Compilers
http://www.flipcode.com/tfiles/henry02.shtml

[Important picture. ]


Books:

For those of you who, like me, like to look through books.
http://www.compilerconnection.com/books/books.htm

More compilers
http://www.compilers.net/Dir/Free/Compilers/Basic.htm

It can be found here, hidden at some remote part of a site I've never heard of:
http://www.microsoft.com/whdc/hwdev/hardware/PECOFF.mspx
http://www.flipcode.com/tutorials/layout.jpg

A book? I hate book. Book is stupid.
(Formerly known as Yellow)
MikeS
Retired Moderator
Joined: 2nd Dec 2002
Location: United States
Posted: 22nd Apr 2004 01:09
Hello Jeku,

I admit I'm not an owner of this book, but I'll give my review based on what I know and have read at Amazon.

First, you can see inside the book here.
http://www.amazon.com/gp/reader/0471113530/ref=sib_dp_pt/002-2455768-4429634#reader-link

Here are the reviews from the people at Amazon.
http://www.amazon.com/exec/obidos/tg/detail/-/0471113530/ref=cm_cr_dp_2_1/002-2455768-4429634?v=glance&s=books&vi=customer-reviews

My review:

One of the first reviewers starts off with some good news for you.
Quote: "I bought this book in 1996 when I was a CS graduate student"

He goes on to compliment the book, and for the most part the book got a majority of good reviews over bad. From what I saw in the Amazon preview of the book, most of it is indeed code. So if you've got a good working knowledge of C++ (and, ideally, Pascal), this should be a good book for you.

As far as building exe's, I can't really comment on that. From the index it looks like some topics are covered, but I really can't give you a precise answer; you'll just have to look at the index yourself (through the inside-the-book link provided above).



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 23rd Apr 2004 12:56 Edited at: 23rd Apr 2004 12:58
@Jeku

"Holy crap, Neophyte, you are a research king!"

Indeed.

"Do you and yellow recommend "Writing Compiliers and Interpreters"? It looks pretty in-depth as it handles OOP."

Well, I found a review of it in the compilers.iecc.com FAQ. Here is what they had to say:
Quote: "Ronald Mak, "Writing Compilers and Interpreters: An Applied Approach", 1991, John Wiley and Sons, Inc. ISBN 0-471-50968-X.

Andrew Tucker <[email protected]> writes: This 512-page book presents a strictly hands on approach, developing a Pascal interpreter and interactive debugger, then completing with a compiler which emits 8086 assembly. All source code is provided in print and on disk. This book is very low to non-existent in theoretical content, but is very practical and readable for an introduction. Taylor Hutt <[email protected]> comments that the book is a piece of junk. The code that is contained in the book is full of bugs, and the code that it generates will not work."


The first one seems favorable, but the second one says the code is full of bugs and won't produce working code. For a hands-on-approach kind of book that could be pretty bad. But I haven't personally read it, so I can't really make a good judgement call on how useful it will be to you.

"One more question (sorry if it was asked before), but does the book show you how to make the compiler generate an EXE. I know you two are talking about the specs, so I wasn't sure. "

I don't think it does. It talks about emitting assembly, but I think he means assembly in the text sense of the word. Like MOV ebx, eax or ADD edx, eax and not the actual binary values for the opcodes.

I'd also like to caution you at this point that he is talking about emitting 8086 assembly, and I don't think that is very useful when it comes to modern-day processors. The 8086 is really old and 16-bit (if I recall correctly), so if you were planning on building a modern compiler that emitted x86 instructions, I'd think twice before delving into that. Writing 8086 asm is a lot different from newer 32-bit 80x86 asm. You don't have to deal with segments and the like when you are writing 32-bit asm. You have a simple flat address space, and that makes things a lot easier.


@MikeS

Well, I've been progressing along with my compiler lately, and recently I've had to figure out the binary encoding of the MOV instruction, and boy was that a hassle. I've cooked up this little mini tutorial about the various encodings I've encountered and how they are laid out. I hope it saves you the time and frustration that I had to go through to figure out what Intel meant in its software developer's volumes. Most, if not all, of the info I picked up can be found in Intel's Intel Architecture Software Developer's Manual Volumes 2A and 2B. Volume 2A provided useful info in the Instruction Format chapter and the Instruction Set Reference A-M chapter. 2B mostly provided useful info in Appendix B under Machine Instruction Format and General Purpose Instruction Format Encoding.


MINI TUTORIAL - MOV ENCODING

The simple MOV instruction has many different encodings, and it's vital to have a firm grasp on what those encodings are, as you'll be hard pressed to find a program without a single MOV instruction!

Note: I'll be dealing with how to move data with the mov instruction here. Moving to and from Control registers and Debug registers seldom happens so I'll ignore them for now.

I'll start with simple stuff and deal with moving data from one register to another. Our example will be moving esp into ebp. So in assembler it would be this: MOV ebp, esp.

Now there are two different valid forms of register-to-register MOVs, but they both start with the same four bits: 1000.

The next three bits are where the two valid forms differ. Either 100 or 101 is valid, though the choice changes which register is placed where later.

Okay, so far we have 1000 101 or 1000 100. The next bit determines whether we are moving 8 bits or 32 bits. It's called the w bit, and it's 1 if we are moving 32 bits or 0 if we are moving 8 bits.

We'll be moving 32 bits in this mini tutorial, so we'll leave the w bit set to 1.

This completes the first byte of the opcode, so one down one more to go.

The next two bits are the same for both encodings and are always 11, so we'll briefly skim over them and move on to the more important next 6 bits. A quick recap before we do: we now have two valid encodings that differ so far by only one bit: 1000 1011 11 and 1000 1001 11.

These next 6 bits are where the difference between the two becomes important. They're divided into two sections of 3 bits apiece. Each section contains the encoding for which register it uses. The first section is called reg1 and the second section is called reg2. You can find a table of all of these encodings in Intel's Intel Architecture Software Developer's Manual Volume 2B. I'll list it here to save you the time, though. Note: This table is for when w is present in the instruction and the operation performed is 32-bit. When you have opcodes where that isn't the case, the table is different.
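For reference, these are the standard register encodings from Intel's manuals (the first column applies when w = 1, i.e. 32-bit operands; the second when w = 0, i.e. 8-bit operands):

000 = EAX / AL
001 = ECX / CL
010 = EDX / DL
011 = EBX / BL
100 = ESP / AH
101 = EBP / CH
110 = ESI / DH
111 = EDI / BH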


Since we are moving esp into ebp we want 100 and 101 respectively. Now, here is where the different encodings matter. For the first encoding, the reg1 field is moved into the reg2 field.

For the second encoding it is the other way around. Reg2 is moved into reg1. Now in case you missed my brief mention of what reg1 and reg2 are, reg2 is the last 3 bits of our MOV instruction and reg1 is the previous 3 bits before that. So reg1 and reg2 would be located here:
1000 1001 11 reg1 reg2

Now an encoding of 1000 1001 11 reg1 reg2 will move register1 into register2. The encoding 1000 1011 11 reg1 reg2 will move register2 into register1. So if we wanted to move esp into ebp we could have either 1000 1001 11 100 101 or 1000 1011 11 101 100. Both would be valid instructions and they would both do the same thing.
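To make that concrete, here is a minimal C sketch of an emitter for the 1000 1001 (0x89) form described above. The function, buffer, and register names are just illustrative, not from any particular compiler.

#include <stdint.h>

/* Register numbers from the encoding table above:
   eax = 0, ecx = 1, edx = 2, ebx = 3, esp = 4, ebp = 5, esi = 6, edi = 7. */
enum { R_EAX, R_ECX, R_EDX, R_EBX, R_ESP, R_EBP, R_ESI, R_EDI };

/* Emit "MOV dst, src" in the 0x89 form, where reg1 (the source) is moved
   into reg2 (the destination).  Writes 2 bytes into buf. */
static int emit_mov_reg32_reg32(uint8_t *buf, int dst, int src)
{
    buf[0] = 0x89;                        /* 1000 100 w, with w = 1 */
    buf[1] = 0xC0 | (src << 3) | dst;     /* 11 reg1 reg2           */
    return 2;
}

Calling emit_mov_reg32_reg32(buf, R_EBP, R_ESP) should give the bytes 89 E5, i.e. MOV ebp, esp.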

Now at this point I'd like to call attention to a statement that I made earlier after the w bit.

Namely, that the operation performed is 32-bit. This of course raises the question of how to perform a 16-bit operation, and, well, I'll tell ya.

Since MOV is naturally a 32-bit operation on IA-32, we need the operand-size override prefix before it to change it to 16-bit. In hexadecimal it is 0x66. So if we were doing something like this in assembly:
MOV ax, bx
We would need something like this in binary:
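Working it out from the rules above, one valid encoding is the prefix followed by the usual register-to-register form (with the prefix, the same codes select the 16-bit registers, so reg1 = bx = 011 is moved into reg2 = ax = 000):

0110 0110 1000 1001 11 011 000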

Before the instruction we have the prefix, and after that it is pretty normal. Exactly like the 32-bit version, in fact. So whenever you want to move 16 bits into anything, place the operand-size override prefix before the MOV opcode.

So now that we have both the 16-bit and 32-bit versions of the register-to-register MOV down, let's move on to something a bit more complicated, no? Let's try moving an immediate value into a register. For example: MOV al, 0x01

Now there are two valid forms for moving an immediate value into a register, but I'll only cover one here since it is half the size of the other. The first 4 bits are as follows:

1011

The next four bits can be broken up into two sections: w and reg. The w still means what it did before and so does reg. In this example we'll be moving just 8 bits so we'll set w to 0. The next part is the encoding for the register. Using the table above, we get 000 for al. So our final result for the opcode will look like this:

1011 0 000

After that we just tack on the immediate value that needs to be loaded: 0x01.

1011 0 000 0000 0001

Simple, isn't it?
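Continuing the illustrative C sketch from the register-to-register section (again, the names are made up), the immediate form is just as easy to emit:

/* Emit "MOV r8, imm8" in the 1011 w reg form with w = 0: the opcode byte
   is 0xB0 plus the register code, followed by the immediate itself. */
static int emit_mov_reg8_imm8(uint8_t *buf, int reg, uint8_t value)
{
    buf[0] = 0xB0 | reg;    /* 1011 0 reg            */
    buf[1] = value;         /* the immediate to load */
    return 2;
}

emit_mov_reg8_imm8(buf, 0, 0x01) should give B0 01, i.e. MOV al, 0x01.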

A quick recap:

So far we've covered how to move from one register to another, how to move 16-bit values using the operand-size override prefix, and how to move an immediate value into a register. But so far we have avoided what must be the most complicated part: system memory. How do we MOV to and from system memory? What about immediate values? How do we move them into system memory? Well, let's find out, shall we?

First up, how to move a value from system memory to a register. We'll use MOV eax, dword ptr [esp + 0x8] as an example. As usual, the first 4 bits are the same:

1000

The next 3 bits are also constant. They are 101. After that we have the w bit, and we'll leave it set to 1 for this example.

The next byte is called the ModR/M byte. It's a register and/or address-mode specifier. It can be broken up into three sections: the mod field, the reg field, and the R/M field. We can look up the needed values for these fields in Intel's Intel Architecture Software Developer's Manual Volume 2A under Chapter 2, but that isn't all that easy. I'll break the sections of the ModR/M down here.

Here is the Mod R/M table:


The first two bits are the mod field. Since we are using a register and an 8-bit displacement, we'll encode it as 01.

The next three bits are the reg field. This will indicate which register we plan on moving our data into. Since we are moving data into the EAX register, we will encode the reg field with 000.

Next up is the R/M field. You'll notice by now the [--] where ESP should be. This is what Intel uses to indicate that we will need another byte after this one called the SIB byte.

Scale-Index-Base. You'll probably only have to deal with this byte when you use the ESP register, but since ESP is the stack pointer, you'll be dealing with it a lot. But back to the R/M field.

Since, as you know, we are using ESP, our R/M field is going to be 100. So our ModR/M byte will look like this:

01 000 100

Now on to the SIB byte. The table for SIB values is just below the table for ModR/M values, but I'll list it here for completeness.


Now here is where the gaps in my knowledge appear. I'm not entirely sure what the purpose of the SIB byte is. I know it's divided into three sections, entitled Scale, Index, and Base. I know that Scale is two bits in size while Index and Base are 3 bits each and are used for encoding the two registers to be used. What I'm not entirely sure of is what Intel means by "Scale". I can only assume that it means multiply the register by the number next to it if applicable (i.e. *2, *4, or *8 in the above chart) and add that to the base register along with any displacements. But that is just a guess based on the bottom note. All instances of the SIB byte that I've come across so far have a hexadecimal value of 0x24. In binary, it would look something like this: 00 100 100.

If you look at the chart above, you would find this means it has a scaling factor of 0 and has "none" as both a base and an index register. In short, the SIB does nothing. Or so it would appear. I haven't come across another instruction that uses the SIB byte yet, but I haven't looked too far either. Use of the SIB byte depends on the presence of the ModR/M byte and the only fields that will allow for a SIB byte are ones that use ESP. However, ESP is the only field in the SIB byte that does nothing. So I can only guess that the register you use for addressing a point in memory is unrelated to what the SIB byte will use. It just happens that all references to memory using the stack pointer and a displacement have a SIB of 0x24.

I don't know why this is, and the docs don't explain it either (or they do and I just didn't see/decipher it). But I suppose that simplifies things for us.

Well, now that we have the SIB byte out of the way, we have our displacement of 0x8 left. So we tack that on to the end of the instruction, and for the final result we get this:
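Assembling the pieces worked out above, the full instruction should come out as:

1000 1011 01 000 100 00 100 100 0000 1000

(In hex, that's 8B 44 24 08.)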



Now to do the opposite, move data from a register into system memory, we do something very similar. Take for example the opposite of the above instruction: MOV dword ptr [esp+8], eax.

The encoding for that is exactly the same as the above instruction save for one bit. They both start with 1000, but the next three bits are 100 in this encoding instead of the 101 above. After that, how w, ModR/M, and SIB are determined is identical to how the above instruction determines those fields.

So our final result would look like this:
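Again assembling the fields described above, this should come out as:

1000 1001 01 000 100 00 100 100 0000 1000

(In hex, that's 89 44 24 08.)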



It's identical to MOV eax, dword ptr [esp+8] except for the seventh bit from the left.

Now, there is a shorter, special version of encoding we can use when we are dealing with system memory and the EAX/AX/AL register.

For moving data from system memory to the EAX/AX/AL register, we can skip the ModR/M and SIB fields altogether and just place a full displacement after the opcode, provided we have just a full displacement (as opposed to a partial displacement that requires the ESP register or any other register to get the effective address). We need to change how we encode the MOV instruction, though. Take MOV eax, dword ptr [0x8] for example.

For the first 4 bits we'll need this:

1010

This is slightly different from our usual 1000, but it's needed so the processor can recognize the instruction for what it is.

The next three bits will be this:

000

After that is the w bit, and we'll set it to 1 for this one. So, so far we have this:

1010 000 1

Next up is the displacement value. Since we are displacing by 8 we'll need this:

0000 1000 0000 0000 0000 0000 0000 0000

Notice how there are 3 bytes after the eight. Even though our value could easily fit into one byte, we need a full four-byte displacement after the opcode, as that is what the processor expects.
Another point of note is how the bytes are arranged. In hex this displacement value would look like this:

08 00 00 00

This might lead one to believe that we are displacing by 0x08000000 if you were looking at it in a hex viewer, but in reality we are displacing by 0x00000008. The byte order is reversed. To draw another example, say we were displacing by 0x407B9C. If we were to view this opcode through a hex viewer, we would see this:

A1 9C 7B 40 00

The A1 is our opcode and the next four bytes after that are our displacement displayed backwards.

Now, if we were moving data from EAX/AX/AL to some specific place in memory with a full displacement, we could use a similar encoding to the instruction above and get away with skipping the ModR/M and SIB bytes.

Take MOV dword ptr [0x8], eax for example. The first four bits would be as follows:

1010

It's the same as the above encoding. Now here is where they are different. The next 3 bits are as follows:

001

That last bit is set to 1 instead of zero. This signifies that we are moving into system memory and not out of it. After that is, of course, the w bit. We'll set it to 1 since we are dealing with a 32-bit value.

1

So now we have this:

1010 001 1

Like the previous encoding, the full displacement will follow the instruction. So as a final result we will have this:
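Putting that together, the result should be:

1010 0011 0000 1000 0000 0000 0000 0000 0000 0000

(In hex, that's A3 08 00 00 00.)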


Finally, we get to the last MOV instruction variation we will be dealing with: Moving an immediate value into a place in system memory. Take MOV dword ptr [esp+0x28], 0x23 as an example.

For the first four bits we will need this:

1100

The next three bits are as follows:

011

And then of course comes the ever predictable w bit. It is also 1.

1

Now here comes the interesting part. After this we'll need a ModR/M byte, but one of its fields, reg, is always set to 000. But let's not get ahead of ourselves too much. First we need to set the Mod field. Since we are displacing by an 8-bit value (0x28), we set our Mod field to 01.

Next of course is the reg field, and as I stated previously it is fixed at 000.

That leaves our R/M field. Since we are using ESP as the register to store our effective address in, we'll need a value of 100 once again in the R/M field.

So our ModR/M byte looks something like this:

01 000 100

As you probably recall, since we are using ESP as our register for the effective address, we will need an additional SIB byte after the ModR/M. The value for it will of course be 0x24, so I'll skip over to the next field we have to deal with: the displacement.

Displacements come before immediate values and since our displacement is 0x28 we will need a binary field that looks like this:

0010 1000

After the displacement comes our immediate data that we want to store in system memory. So here it is in binary:

0010 0011 0000 0000 0000 0000 0000 0000

Notice the reverse byte order.

So our final value will look something like this:
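Stringing all of the fields together, the final encoding should be:

1100 0111 01 000 100 00 100 100 0010 1000 0010 0011 0000 0000 0000 0000 0000 0000

(In hex, that's C7 44 24 28 23 00 00 00.)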



So there you have it. That concludes my mini-tutorial on the common encodings of the MOV instruction. I hope I saved you the trouble of having to experiment with an assembler and a hex viewer like I had to.
Jeku
Moderator
Joined: 4th Jul 2003
Location: Vancouver, British Columbia, Canada
Posted: 23rd Apr 2004 20:28
@Neophyte - Hmmm, well if I can generate assembly code then all it would take is a good linker to do the rest I assume (?). Well, I've got a while before my project is due so I've ordered the book off Amazon.

I've scoured the net for reviews and for some reason 90% of them just take the reviews off Amazon for their sites. It seems most people think the book is really great and practical--- which is good for me because I hate theory :p

Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 4th May 2004 11:11 Edited at: 4th May 2004 11:12
@Jeku

"Hmmm, well if I can generate assembly code then all it would take is a good linker to do the rest I assume (?). "

A good assembler, actually, but yes. However, if all you are generating is 8086 code, then the only assemblers that you find are going to generate 16-bit DOS programs. That could be a bit of a problem.

P.S. Sorry about the delay. I meant to respond earlier but I just didn't have the time.

@MikeS

I found a few more interesting tidbits.

A new technique for optimizing code: Linear Scan copy propagation.
http://compilers.iecc.com/comparch/article/03-11-046

I also found this tutorial from Intel that describes MMX technology. What it is, what it's good for, and how to use it. Quite informative.
https://shale.intel.com/SoftwareCollege/CourseDetails.asp?courseID=20
A few brief notes about that though. It requires you to download some files and install them on your computer (2.61 MB and 14.4 MB respectively). One is for some runtime libraries and the other is the actual tutorial itself.

There are also some more tutorials from Intel that cover SSE and SSE2. However, when I got to their download page I didn't get any link to download them from! I'll list these here anyway, just in case you have more luck.

SSE
https://shale.intel.com/softwarecollege/CourseDetails.asp?courseID=23

SSE2
https://shale.intel.com/softwarecollege/CourseDetails.asp?courseID=24
MikeS
Retired Moderator
Joined: 2nd Dec 2002
Location: United States
Posted: 23rd May 2004 03:48
Thanks for the links as always Neophyte. I've gotten pretty far with my PureBasic interpreter, and am starting to prep myself for making one in C soon. I'll keep you updated.



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 23rd May 2004 20:41
@MikeS

"Thanks for the links as always Neophyte."

You're welcome.

I haven't progressed very far with my compiler. I've been putting off work on it in favor of the shader tutorials. But I need some answers from MS, and the response they gave me was basically "We have forwarded your comments on to the appropriate Microsoft group for review and response," and I haven't heard anything since.

So I'm not even progressing on my shader tut.

Right now all I'm doing is waiting for MS's reply and trying to solve an old problem with the pathfinding in my 2D RTS game. Oh, and I'm trying to port all of my files to another computer as well, as this one's hard drive is giving up the ghost. *sigh*

At least I managed to get a real cheap copy of MS's book on the DX9 programmable pipeline. That ought to provide some entertainment for the time being, I suppose.
TKF15H
Joined: 20th Jul 2003
Location: Rio de Janeiro
Posted: 24th May 2004 05:38
I know you guys are talking about compilers, not interpreters, but I could use a hand.

When the bytecode is generated, how do numbers fit in? For example:
a=1
The bytecode wouldn't have the variable 'a' there, and most likely
the 1 wouldn't be either. What I'm currently doing is putting the
numbers as they are, to later on read them in as strings,
multiply+add+multiply+add.... I suppose this isn't the fastest way
to do it. And without some encoding, anyone can go into EDIT and mess
around with the numbers.

My guess on the best way to do this is to use unions to turn one
integer into two chars and then save those into the bytecode. Though,
I don't know how to do that (and there's probably a simpler way
right under my nose). I haven't used unions before, just
took a glimpse at them in a C++ book.

Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 25th May 2004 08:07
@TKF15H

Sorry for not responding sooner. I've been in the middle of transferring all of my important files off of my dying hard drive onto my backup.

I'm not writing an interpreter, but I've given the subject some thought. Basically, you have a special number in your bytecode that indicates that an actual number is stored next, and you read that number as a normal number rather than as bytecode. You'd also have to have different codes for different sizes of numbers (byte, word, dword, etc.) so you would know how much to read off.
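A minimal C sketch of that scheme (the opcode names and values here are made up for illustration, not from any particular interpreter):

#include <stdint.h>
#include <string.h>

enum { OP_CONST_BYTE = 0x01, OP_CONST_DWORD = 0x02 };   /* illustrative tags */

/* Read one tagged constant starting at code[*pc].  The tag byte tells the
   interpreter how many raw bytes of number follow it.  Assumes the bytecode
   was written on a machine with the same byte order. */
static int32_t read_constant(const uint8_t *code, size_t *pc)
{
    uint8_t tag = code[(*pc)++];
    if (tag == OP_CONST_BYTE) {
        return (int32_t)code[(*pc)++];      /* one raw byte follows   */
    } else {                                /* OP_CONST_DWORD         */
        int32_t value;
        memcpy(&value, &code[*pc], 4);      /* four raw bytes follow  */
        *pc += 4;
        return value;
    }
}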

MikeS might know of a better way to do this, though. But until he shows up, I'd try my suggestion out first and see if it works for you.
TKF15H
Joined: 20th Jul 2003
Location: Rio de Janeiro
Posted: 25th May 2004 16:30
I spoke with MikeS yesterday. He didn't know any better way either.
What I'm currently doing is this:
i,3,1,2,3
i: Says that an integer is next
3: number of digits
123: actual number.
But for the interpreter to decipher this, it has to:
Get the 1. x = 1.
Multiply x by 10.
Get the 2. x = 10 + 2.
Multiply x by 10.
Get the 3. x = 120 + 3.

Which is quite slow. Imagine if that had to be done in a For loop!

Kevin Picone
Joined: 27th Aug 2002
Location: Australia
Posted: 25th May 2004 20:19 Edited at: 11th Aug 2010 22:04
To repeat myself, all data types within the language have specific/unique identifiers, which I like to call tokens. So when the compiler sees a constant integer, for example, it pokes the "Token_Constant_integer" identifier into memory, then at the next location it pokes the actual integer.

Here's some PlayBASIC styled example code...



so to read a value from memory (as an integer) you might have something like this.
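In rough C terms (the token value and the helper names below are just illustrative, not the original PlayBASIC snippets), the two sides look something like this:

#include <stdint.h>
#include <string.h>

#define TOKEN_CONSTANT_INTEGER 0x10   /* illustrative token value */

/* Compile time: poke the token identifier into the code buffer, then poke
   the actual integer right after it.  Returns the new write position. */
static size_t poke_constant_int(uint8_t *code, size_t pos, int32_t value)
{
    code[pos++] = TOKEN_CONSTANT_INTEGER;
    memcpy(&code[pos], &value, sizeof value);   /* the raw 4-byte integer */
    return pos + sizeof value;
}

/* Run time: having just read TOKEN_CONSTANT_INTEGER, pull out the integer
   that follows it and advance the read position. */
static int32_t read_constant_int(const uint8_t *code, size_t *pos)
{
    int32_t value;
    memcpy(&value, &code[*pos], sizeof value);
    *pos += sizeof value;
    return value;
}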



Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 12th Jun 2004 04:11
*bump*

@MikeS

How's your interpreter coming along?

I've been working on mapping out the PE format lately and am almost finished. When I'm done, I'll probably post a whole lot of PureBasic code along with descriptions of the various sections and headers. My goal is to get down some working code that builds a PE file, along with a detailed explanation of how to work it.

I've also been continuing my research with regard to intermediate representations, and I think I've finally found a representation I would like to start out with. It's called Static Single Assignment, or SSA for short, and I think I posted a few links regarding it previously.

Well, since then I've found some more info and gone over it in more detail. I'll probably post the links to the new info, only this time I'll be giving some commentary on some of them and going over their usefulness and application in detail.

Anyway, that's all that I've managed to accomplish recently. Have you made significant headway with your interpreter or have other priorities snuck up on you like they did with me?
MikeS
Retired Moderator
Joined: 2nd Dec 2002
Location: United States
Posted: 12th Jun 2004 21:06
Phew, saved this post.

Haven't done much work on the interpreter; in fact, I'm planning on doing a re-write of it. The IDE is finished and has everything it needs except syntax highlighting (which I can deal with later).

Right now the core of the interpreter is pretty inefficient, unfortunately, so I'm going to re-write it for speed.

There's still only 10 commands or so but it's a decent enough start for anyone who likes text adventures.

So my plan is to totally re-build the core and get all the math operators and variables sorted first. After that I'll start rebuilding some of the commands (set display mode, print, input, etc.), and if I get adventurous, add in some 2D commands.



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 12th Jun 2004 21:27 Edited at: 12th Jun 2004 21:33
@MikeS

Good to hear.

I've managed to finish the code for the import section (which was giving me the most trouble). I now have code that will let me read in and display the contents of any .exe's import section, as well as the PE, Optional, and Section headers.

All I have to do now is recycle some of the structures and code I already have, and I'll probably have the hardest part of my .exe builder 80% finished. I just have to organize stuff and generalize my functions a little bit more. Remove some dependencies and basically clean things up.

It's pretty exciting to be this close to finally getting a working executable up.

The only thing I have to worry about after that is getting a simple machine code template of a windowed program to insert into my .exes to test them out. Shouldn't be too hard to come up with though.
Neophyte
Joined: 23rd Feb 2003
Location: United States
Posted: 16th Jun 2004 10:23 Edited at: 16th Jun 2004 10:29
@MikeS

As promised, an in-depth (somewhat) analysis of the papers that I've found. I'll summarise them and point out what I think is useful about each of them. Some of these you might find rather interesting.

[NOTE]
While I was writing this over the past few days, I was able to download several papers for free from IEEE's site. Although the page said you needed a subscription to download, I thought what the hell, may as well try. And lo and behold, I was able to download quite a few texts from IEEE. I figured all this talk about buying articles must have changed and they just didn't update their site. Well, two days later, when I'm checking the links for these papers, I notice that I can't download anything from them without a username and account.

So it appears they had a bit of a security glitch, and now quite a few of the links to the papers I mentioned are dead. I've searched the net for copies, but apparently only IEEE had a few of them. I've replaced the links with a [No Link] marker to signify which papers came from IEEE and couldn't be replaced.

Sadly, many of the good papers I have came from IEEE, so that leaves me with a bit of a dilemma. I guess if you are interested in checking out these papers yourself you could email me and I could give them to you, but I don't know if it is worth the hassle.

Since I spent a fair bit of time writing these little synopses and semi-reviews, I'll leave in the parts that mention the now-unobtainable papers. After all, I wouldn't want all of that to go to waste, eh?

Anyway, without further ado, here are the semi-somewhat in depth analyses of the papers I've found recently.


A VIRTUAL MACHINE FOR INTERPRETING PROGRAMS IN STATIC SINGLE ASSIGNMENT FORM

http://www.ics.uci.edu/~wangn/uci-ics-tr-03-12.pdf

This paper describes how to build a virtual machine that will interpret SSA form. SSA form is a widely used and simple intermediate language that compilers use for optimizations. I think this could serve as a stepping stone toward building an optimizing compiler by letting you get acquainted with SSA form without having to build a full-blown compiler.

It could also serve as a useful debugging tool for your compiler when you build one.

An interesting note: this document contains the C source code for the inner core of the virtual machine. It also has several references that you may be interested in as well.

This one is definitely worth giving a look at sometime soon.

ACCELERATING SINE AND COSINE EVALUATION WITH COMPILER ASSISTANCE

[No Link]

This one is more of interest to me, as Basic4GL is designed to run graphical programs and sine and cosine are abundant in them. It works by using approximations of sine and cosine and breaking up the code for them into two parts. The first part is the code that is common to both of them, and the second part adds small finishing pieces of code for each of them.

The method described has several advantages. First and foremost is the fact that existing compiler optimizations will allow for a cosine and sine operation on the same data to be optimized to the point that they execute in roughly the same time as a single sine or cosine operation would!

The method described is also extended to include tangents and hyperbolic functions.

It looks like this could be of great use to me.

On a side note: this technique isn't completely theoretical either, as it has been implemented in a production compiler, the "HP-UX compiler for the Itanium." So it has some solid performance statistics to back it up as well.

INTER-PROCEDURAL LOOP FUSION, ARRAY CONTRACTION AND ROTATION

[No Link]

This paper is straight out of Intel's Compiler Laboratory in Santa Clara, CA.

This technique has also been implemented in a production compiler and apparently gave a 12% increase in the SPECfp 2000 benchmarks.

The essential problem these techniques are trying to solve is the fact that processor speeds are increasing much faster than memory speeds. They try to improve the locality of the data referenced: basically more cache hits and fewer cache misses, which means less time spent fetching data from memory, which is really slow compared to getting it out of the on-chip cache or a register.

This document then goes into a VERY detailed discussion of how to go about implementing these techniques. It nicely describes inter-procedural analysis and the steps involved in it. There are a lot of diagrams in here, and as a visual learner I really appreciate the thoroughness Intel achieved by littering this document with them.

There are really only two drawbacks I can see to this paper. The first is the need for a profiler.

The paper mentions the need for a profiler, though it also talks about static profiling methods. Clearly, though, the best info you can get is from dynamic profiling, as it is the best shot you've got at accurately measuring what will get executed and when.

The second drawback is that it is discussed with the Itanium processor in mind. This isn't too bad, as the techniques mentioned are general enough that one could easily apply them to Intel's more popular processors.

It's hard to sum up all that is covered in this document. Although it is just 11 pages, it is jammed full of useful practical info.

Even if your compiler-writing days are a ways away yet, I'd suggest giving this doc a read, as it is really interesting and informative.

AN EXPERIMENTAL EVALUATION OF DATA DEPENDENCE ANALYSIS TECHNIQUES

[NO LINK]

This is a more recent paper (March 2004) that deals with three different data dependence analysis techniques and their performance with regard to compile time, accuracy, etc.

This one isn't too interesting, but it does serve as an overview of what appears to be the three dominant data dependence analysis techniques out there and their various pros and cons.

Not too helpful right now, but it might prove useful in the future.

COMPILING LANGUAGE DEFINITIONS: THE ASF+SDF COMPILER

http://arxiv.org/PS_cache/cs/pdf/0007/0007008.pdf

This might be of interest to you.

Quote: "
The ASF+SDF Meta-Environment is an interactive language development environment whose main application areas are definition and implementation of domain-specific languages, generation of program analysis and transformation tools, and production of software renovation tools.
"


I think what it does is basically help you to write functional languages. Sort of like YACC or LEX, only beefed up to include more.

I remember you saying that you were going to port your work over to C eventually so since this outputs C source code I thought you might be interested.

COMPILER OPTIMIZATION-SPACE EXPLORATION

http://www.cs.princeton.edu/~nvachhar/papers/cgo01_ose.pdf

One of the major problems facing compiler optimization today is the unpredictable effect that various optimizations can have on each other. It's entirely possible for an optimization to increase performance in one case while decreasing it in another.

Take Common Sub-Expression Elimination as an example. In Common Sub-Expression Elimination, CSE for short, you remove operations that will yield identical data and leave only one instance of that operation, with its result stored in a temporary variable. For example, examine the following code:

a = b + c

d = b + c + e * b / c

In this example b + c is computed twice, which is unnecessary since the values of b and c do not change between the time the first b + c is computed and the time the second one is. So a more optimal piece of code would look like the following:

a = b + c

d = a + e * b / c

But suppose that we were short on registers and for some inexplicable reason a needed to be evicted from the register that it was residing in. So basically, we only have room for d, b, c, and e, but not a. Now in the original code we wouldn't have had a problem since a could safely be moved back to memory while d takes its place. But thanks to our "optimization" that is no longer the case.

We now will have to push a value out of a register and into memory so we have room. Then we will have to pop that value back into the register it was pushed out of to complete the equation. These relatively expensive memory accesses will completely override any gain we obtained from only computing b + c once.

Now this may seem far-fetched, but in reality it can be pretty common. Remember, the x86 only has a handful of general purpose registers (essentially just EAX, EBX, ECX and EDX for data), and in the above not-too-complicated example we ended up needing 5 registers thanks to our "optimization".

So the effect of optimizations on each other is of great interest to the compiler writer. Especially x86 compiler writers, who have to deal with a horribly small number of registers.

What this paper deals with is how to effectively ascertain the impact of your optimizations not only on each other, but on the performance of your compiled code. It also outlines the key failures of the existing methods, mainly predictive heuristics and iterative compilation, and proposes a solution to the problem. By combining predictive heuristics with iterative compilation, they managed to achieve higher code quality while keeping the compile time reasonable.

PRACTICAL DEPENDENCE ANALYSIS IN A SIMD VECTORIZING COMPILER

[No Link]

This is an interesting paper. It deals with detecting data dependence using the Bresenham line algorithm. A novel idea, and it appears to be quite quick and exact. It's rather short and to the point (it's only eight pages long, with the last page filled with references) but interesting and possibly quite applicable.

A REGION-BASED COMPILATION INFRASTRUCTURE

http://ipf-orc.sourceforge.net/region-final-version.pdf

This paper deals with dividing a program up into sections using something other than function scope as the boundaries. According to this paper, not only does the described technique speed up compilation significantly (one benchmark compiled 63% faster!), but it also improves the execution speed of the compiled program.

This is definitely worth looking into, as a gain in not only compile time but also run-time performance is hard to shrug off as uninteresting.

However, the method doesn't do data-dependence analysis yet so there is still some room for improvement of it.

COMPILER-DIRECTED RESOURCE MANAGEMENT FOR ACTIVE CODE REGIONS

http://api.ece.uic.edu/~Interact7/paper4.2.pdf

As mentioned earlier, memory latencies are becoming a real source of slowdown, and this is only going to get worse as time goes on.

This paper deals with "improving cache efficiency by utilizing compiler-directed adaptive cache management techniques". In short, it talks about a series of compiler optimizations that deal with cache efficiency in small "hot spot" regions of code.

Some interesting results were had from the experiments. In one benchmark, a section of code that was only executed 3% of the time was optimized and resulted in a 19% performance improvement in the program overall. They go on to describe this in detail and explain why it occurred.

It's an interesting paper if you would like to improve cache performance.

MINIMUM REGISTER INSTRUCTION SEQUENCING TO REDUCE REGISTER SPILLS IN OUT OF ORDER ISSUE SUPERSCALAR ARCHITECTURE

http://citeseer.ist.psu.edu/573385.html

Lately, I've been interested in register allocation and the various methodologies that are used.

While researching the subject I came across this paper, which claims that in 99.2% of the cases studied their solution was optimal. That's an impressive number, so naturally I decided to include this paper even though it really focuses on non-x86 architectures.

But despite that, it shows promise, considering that the problem it solves is closely related to optimal code generation.

I'd look into this paper in the future, but not nearly so early as some of the other papers mentioned.

USING THE COMPILER TO IMPROVE CACHE REPLACEMENT DECISIONS

ftp://ftp.cs.umass.edu/pub/osl/papers/pact_zlwang_2002.ps.gz

[Note]
I could only find a PostScript version of the paper. Originally it was in PDF format, but I got that one from IEEE, so there's no chance of getting the PDF version now. If you don't have something that will display PostScript, Google for GhostScript. That program should allow you to display it.
[End Note]


Yet another paper on reducing cache misses. This one is more general than the last and talks about varying architectures. It seems rather promising, and it claims to have seen reductions in execution time of 4.89%, 8.39%, and 6.94%. It also predicts that in 5 years' time we'll see reductions of 9.99%, 15.62%, and 12.31% for the respective programs.

It appears that memory speeds are significantly behind processor speeds, and as current trends go that gap is only going to widen. To put this into perspective, processor speeds have been increasing about 55% per year since 1987, while memory access latencies have only been improving by about 7% per year. So their predictions for performance improvements 5 years from now aren't all that far-fetched.

COST EFFECTIVE COMPILER DIRECTED MEMORY PREFETCHING AND BYPASSING

[Note]
This looks like a PowerPoint presentation turned into a PDF. I included it here for two reasons:
1. The original text version came from IEEE.
2. It looks neat when you use the scroll wheel on the mouse to scroll down.

http://moss.csc.ncsu.edu/pact2002/slides/ortega117.pdf

[Other Note]
I found a good copy of the original.

http://www.csc.ncsu.edu:8080/faculty/mueller/pact02/papers/ortega117.pdf


You can really tell I'm interested in caches lately.

As the name suggests, it deals with prefetching which was introduced in the SSE extensions for the Pentium III(though it doesn't actually mention the Pentium III or SSE, the methods introduced apply to them as well).

The performance gained can be quite significant: one benchmark showed a speed increase of about 102%, and the average was a 43% speed increase over code with no software prefetching.

The methods outlined here could be combined with the earlier methods mentioned to create a powerful optimizing force for data-dependent applications.
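For a feel of what the inserted code looks like at the source level, here's a minimal hand-written C sketch using the SSE prefetch intrinsic (compile with SSE enabled). The prefetch distance of 16 elements is just a guess on my part; the whole point of the paper is to have the compiler pick the spots and distances for you.

#include <xmmintrin.h>   /* _mm_prefetch, part of the SSE intrinsics */

/* Sum an array while asking the cache to start fetching data we will need
 * a few iterations from now, so the loads have (hopefully) completed by the
 * time we actually touch that data. */
float sum_with_prefetch(const float *a, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++) {
        if (i + 16 < n)
            _mm_prefetch((const char *)&a[i + 16], _MM_HINT_T0);
        s += a[i];
    }
    return s;
}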

PROCEDURE MERGING WITH INSTRUCTION CACHES

http://citeseer.ist.psu.edu/292876.html

This one is rather old. It was written in 1991, so if you want to follow its methods just be on the lookout for something better, because by now someone has probably come up with a superior technique.

Anyway, this paper deals with inter-procedural optimizations with regards to instruction cache performance. It uses a combo of profiling info, data structures, the cache size and cache miss penalty as indicators as to whether the procedures in question should be merged or not.

I believe this paper was the first to decide whether to merge a procedure based on the effects it would have on the instruction cache. Procedure merging has always been a murky sort of business, with the rules that decide whether a procedure should be merged or not being rather arbitrary. There are trade-offs to be made between code size and execution time that are not always easy to ascertain.

I think this is a nice start when it comes to procedure inlining, but the Intel one is way better in terms of quality, presentation and relevance. Still, I wouldn't discount this paper yet. It could prove rather useful as a stepping stone for getting into procedure merging. I didn't see anything too complex in it, so I guess it is worth a look, at least to compare it to Intel's methods.

EVALUATING INLINING TECHNIQUES

http://citeseer.ist.psu.edu/539953.html

This is an overview of various inlining techniques and a new hybrid approach that they introduced. It is a more formal treatment of the subject than previous papers I've mentioned.

The paper is rather old at 1996, but it is still useful. It uses a probabilistic model though, and as a much newer paper that I mentioned demonstrated, these models are inherently flawed. Figuring out whether a function should be inlined or not is, in all but the most basic cases, at best an NP-hard problem. It's prohibitively expensive to calculate the precise effect of inlining for all functions.

FIAT - A FRAMEWORK FOR INTERPROCEDURAL ANALYSIS AND TRANSFORMATION

http://citeseer.ist.psu.edu/37218.html

This one is a keeper. It describes an abstract framework for interprocedural analysis. From the looks of it, it appears to be a rather easy-to-integrate, modular design for an interprocedural analyzer.

It doesn't modify source code either. It only deals with identifying areas where transformations can be applied in a safe and profitable manner. So you can choose whether to inline a function or not based on your own internal algorithms.

The ease of use of this system seems to be pretty impressive as well. In a short time it was integrated into several major compiler systems with no changes being needed to the core source code.

This and Intel's paper are probably the top two inter-procedural analysis papers to start with if you ever consider implementing inter-procedural analysis in your compiler.


OVERCOMING THE CHALLENGES TO FEEDBACK-DIRECTED OPTIMIZATION

http://www.eecs.harvard.edu/hube/publications/dynamo00.pdf

I've listed several papers so far that make mention of feedback-directed optimization techniques.

However, I haven't really given any papers that deal with just feedback-directed optimization. This one describes feedback-directed optimization and the challenges it currently faces.

Although I don't entirely agree with them, they do make some excellent points that are worth considering. I'd give this one a read if you are interested in feedback-directed optimization.

PRECISE REGISTER ALLOCATION FOR IRREGULAR ARCHITECTURES

http://citeseer.ist.psu.edu/435214.html

When it comes to register allocation you are going to be hearing a lot about graph coloring. Graph coloring register allocation was originally developed by Chaitin when he was working at IBM, and it was the first significant leap forward in the science of register allocation.

However, it was developed with a RISC machine with 16 registers in mind. Graph coloring algorithms for architectures that have fewer than 16 registers, like our poor x86, don't do too well.

Enter integer programming.

This paper describes an approach to register allocation that uses something called integer programming to solve the problem of register allocation for irregular architectures. The best part about this paper is that the x86 was selected as a test case to see how well it did. Other papers so far have dealt with different architectures or described things generally, but with this paper we have a method that is directly applicable to x86 architecture.

The authors set up their algorithm and test it against GCC's graph coloring algorithm and get some surprising results. Their IP allocator reduces register allocation overhead by 61% and produces faster code!

This is a must read paper if you are going to build a compiler for the x86 architecture.

GRAPH-COLORING REGISTER ALLOCATION FOR IRREGULAR ARCHITECTURES

http://citeseer.ist.psu.edu/382779.html

Just to be fair, I thought I'd let these guys talk about how they allocate registers on irregular architectures using a modified graph coloring algorithm.

They discuss several modifications to the graph coloring algorithm, and their work seems to solve the problem of irregular architectures not being amenable to graph coloring. They don't mention any real performance statistics though, so I'm a little suspicious as to whether this is better than the integer programming approach.

They mention the above paper in their references section, but they don't comment on its performance vis-a-vis their algorithm. The cynic in my head thinks that their algorithm, though better than other graph colorers, didn't measure up as favorably against an integer programming method. But that's just my suspicion.

OPTIMAL SPILLING FOR CISC MACHINES WITH FEW REGISTERS

http://citeseer.ist.psu.edu/339374.html

This is a more up-to-date approach to solving the register allocation problem for x86 architectures.

It mentions integer programming as a solution to register allocation and the current work being done with it. From what I can gather they got pretty far, with promising results, but they ran into a problem with optimal register coalescing. They gave two solutions, but neither was very satisfactory, as both traded off speed of compilation against speed of the executable.

But this paper came two years after the other paper regarding IP register allocation, so it does bring one more up to speed with the current state of the field.

I'd suggest reading PRECISE REGISTER ALLOCATION FOR IRREGULAR ARCHITECTURES first and then this one, to get a better grasp of the challenges that we face.

EFFECTIVE INSTRUCTION SCHEDULING WITH LIMITED REGISTERS

http://www.eecs.harvard.edu/hube/publications/cheng-thesis.pdf

This one is a big one. It's someone's thesis on optimal instruction scheduling. There is a whole lot here, and it contains a lot of useful information. Some of the info also applies to register allocation. In fact, there is even an entire appendix devoted to graph coloring.

But that aside, instruction scheduling is very important, especially on the x86 with its multiple pipelines. Properly scheduled instructions can boost performance up to four times or more depending on the architecture. For example, AMD's Athlon processors can execute up to 4 independent ADD instructions at a time.

This paper, although really long (134 pages), could prove rather useful in boosting the performance of your compiled applications.
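A tiny C illustration of why this matters (my own example, not from the thesis): both loops below do the same work, but the second exposes four independent add chains, which gives the scheduler and the hardware something to run in parallel instead of one long dependence chain.

/* One long dependence chain: every add has to wait for the previous one. */
long sum_serial(const long *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: several adds can be in flight per cycle.
 * (Assumes n is a multiple of 4 to keep the sketch short.) */
long sum_unrolled(const long *a, int n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return s0 + s1 + s2 + s3;
}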

EFFICIENT INSTRUCTION SCHEDULING USING FINITE STATE AUTOMATA

http://citeseer.ist.psu.edu/205264.html

This one I included just for fun. I found a reference to it in one of the papers bibliographies and I thought it'd be interesting.

It deals with instruction scheduling using two finite state automata and produces some interesting results. I found the paper to be quite descriptive, if a little outdated (1995).

You might want to give it a try when you have the spare time or become interested.

REGISTER ALLOCATION VIA GRAPH COLORING

http://citeseer.ist.psu.edu/43119.html

No mention of graph coloring would be complete without mentioning Briggs. I heard that this was recommended reading for anyone interested in graph coloring, and I found out why. This 154-page thesis extensively details the history of graph coloring and its modern application.

Although it was written in 1992, you'll see a lot of references to it in any paper that concerns graph coloring. It is a rather interesting read, and I'd suggest giving it a look even if you end up deciding against graph coloring as your register allocation scheme of choice.

PROCEEDINGS OF THE GCC DEVELOPER'S SUMMIT 2003

http://www.linux.org.uk/~ajh/gcc/gccsummit-2003-proceedings.pdf

I thought I'd add this in as the icing on the cake.

There are a ton of interesting papers in this PDF and I think it is definitely worth looking into.

There is just too much to summarise here. Check it out if you have (a lot of) some spare time.


Now that those are out of the way, why don't I discuss some interesting links I've managed to dig up.

First up, a page on compiling Java code that gave some interesting references:

http://www.bearcave.com/software/java/comp_java.html

Look toward the bottom under "References". You should see several descriptions of books that may be of interest to you.

I also found someone's notes on the compiler that they implemented. The compiler is aimed at the x86 architecture, and it looks like there could be several interesting tidbits of information here:

http://www.smlnj.org//compiler-notes/

Of note are his notes on implementing register allocation for the x86. The notes are in PostScript, though, so make sure you have something like GhostScript to read them.

I don't know if you've seen this one before, but it is a very helpful page on code optimization.

http://www.azillionmonkeys.com/qed/optimize.html

Well that's all for now. Hopefully next time I'll be posting PureBasic code on how to build a .exe.

MikeS
Retired Moderator
22
Years of Service
User Offline
Joined: 2nd Dec 2002
Location: United States
Posted: 16th Jun 2004 21:58
Whew, I got a lot more reading to do now.

The one thing that really caught my eye was the ASF+SDF Compiler. Really a good read.

Quote: "Well that's all for now. Hopefully next time I'll be posting PureBasic code on how to build a .exe. "


Sounds great. At this point I'm finished with my interpreter in PB. It doesn't do much, but I've learned a lot about building languages, so to speak. I'm now going to have to jog through this post and start on building a compiler. Like you mentioned though, that ASF+SDF has given me a good place to start if I decide to port over any code.

I'll keep you updated. (But first I must read through the rest of the links. )



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 17th Jun 2004 00:11
Mike, when you get a simple compiler sorted out with a "print" command clone and EXE-making capabilities, can I have a look at the code?


Create or Play? You choose!
TheAbomb12
21
Years of Service
User Offline
Joined: 14th Aug 2003
Location: Amist the blue skies...
Posted: 17th Jun 2004 00:14
Dang Neophyte, you really do enjoy writing long posts don't you

Amist the Blue Skies...
MikeS
Retired Moderator
22
Years of Service
User Offline
Joined: 2nd Dec 2002
Location: United States
Posted: 17th Jun 2004 01:04
@TheAbomb12

Hehehe, I think me and Neo are the only ones who've read through all this thread.

@Powersoft/RichS()

We'll see about it. The thing about compiler construction is that it can be complicated at times, and you need to understand exactly what is going on. (Even after reading Neo's posts I still have to re-read them to grasp what he's saying.) When I get to the point where I can make a "hello world" .exe I'll probably post it and document it for you and everyone, though. But only because I like your slogan.

"Create or Play? You choose!"



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
Neophyte
22
Years of Service
User Offline
Joined: 23rd Feb 2003
Location: United States
Posted: 17th Jun 2004 06:20
@MikeS

"At this point I'm finished with my interpreter in PB."

Good to hear.

@Powersoft

I'll be posting code on how to build a .exe. So that might be of some assistance.

I've decided to break it up into 2 halves. The first half will deal with the DOS stub, PE and Optional headers. The second half will deal with the section headers and the needed sections.

The first half should be done pretty soon, but I don't know how long the second half will take me.

@TheAbomb12

"Dang Neophyte, you really do enjoy writing long posts don't you"

Indeed.
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 18th Jun 2004 20:19
Well Neophyte, I see your post isn't as long this time. It would be useful if I could have a look at how it all works.

Thank you as well MikeS. The compiler tut thingy will help me let you create AND play. (eventually!)


Create or Play? You choose!
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 1st Jul 2004 03:07 Edited at: 6th Jul 2004 16:15
hows it going


Create or Play? You choose!
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 6th Jul 2004 16:16
*bump*


Create or Play? You choose!
Neophyte
22
Years of Service
User Offline
Joined: 23rd Feb 2003
Location: United States
Posted: 8th Jul 2004 00:21
@PowerSoft

"hows it going"

I've completed the first part of my tutorial regarding the layout of a PE file and how to create it. I've started work on the second part, but lately I've had other priorities.

I also haven't had a chance to work on my compiler at all lately either.

Oh, and thanks for keeping this thread alive.

Hopefully, after my vacation I'll be able to finish up my PE file creator tutorial. I should get back some time in August.
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 8th Jul 2004 01:51
i didnt want it to die...
sticky?


Create or Play? You choose!
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 17th Jul 2004 01:48
sorry didnt mean to post... honest


Create or Play? You choose!
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 11th Aug 2004 00:30
this isnt meant to be a bump.

How's the tutorial going? I'm looking forward to it.


Create or Play? You choose!
Neophyte
22
Years of Service
User Offline
Joined: 23rd Feb 2003
Location: United States
Posted: 12th Aug 2004 10:34
@Powersoft

"Hows the tutorial going? im looking forward to it"

Pretty good actually. As I've stated before, the first part is pretty much done. There has been some change in my plans regarding the tutorial though. Originally it was going to be a two part tutorial, but I've decided to expand it to three parts.

The second part is about 95% finished. All I have to do is write one more piece of code and its done. However, I want to test the code for the second one first, and for that I'll need some real live machine code. I'm not sure if I should borrow some from an existing file or generate my own. It would take me quite a bit of time either way.

I've already started some preliminary work on the third part, but it's a long way from complete.

The main focus of the third part is to demonstrate in code the creation of a valid PE file for a hardcoded assembly program. Basically, I'm going to hardcode an assembly program into the program and show you step by step how to generate the machine code for it and create a valid PE file. There is also some new information regarding the encodings of the MOV instruction that I've found out since writing this tutorial. I'll be adding that in to the third part as well. It should help clear up any confusion that my last tutorial dealing with the MOV instruction may have caused. So all in all, the third part should be pretty interesting and instructive.

This should provide a big leap toward having a complete functioning compiler on our hands. There will still be some difficult topics that I will need to address like register allocation, but when I finish the third part of this tutorial we should be very close to having a working compiler.

I'll probably release the first part of the tutorial tonight and finish up the second part.

I'm not sure when I'm going to release the second part of the tutorial, because I think I might be leaving out some necessary sections. I won't know for sure until I can get a working executable generated to test it out. So I probably won't be releasing the second part until I've finished up the third part.
MikeS
Retired Moderator
22
Years of Service
User Offline
Joined: 2nd Dec 2002
Location: United States
Posted: 12th Aug 2004 12:04
Sounds good Neo. Just started designing a new layout for my language today, so I'm sure your tutorials will be very helpful to me and others in the long run.



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
Neophyte
22
Years of Service
User Offline
Joined: 23rd Feb 2003
Location: United States
Posted: 12th Aug 2004 13:02
@MikeS

Well, here it is. The first part of my PE tutorial. Hopefully you'll find it to be of some use.

A bit of explanation first. The tutorial on how to generate a PE file will be split into three parts. This part will deal with the DOS 2.0 stub, the PE header (by this I mean the PE file signature and COFF File Header) and the Optional header, plus a general introduction to the various segments of the PE file. It will explain what they are and how to generate working versions of them for your executables.

The second part will contain the section headers and their various sections: what they do, and how to format them. The second part is by far the "meat" of the tutorial, as it contains a lot of non-trivial information about the import table that you need to know.

Finally, the third part brings all of this together into an example of assembling a mini-program and creating a valid executable for it. I plan on skipping the parsing of the assembly and just hardcoding the program to save space. This final part will probably be of the most interest, though I'll try to limit the amount of new information as much as I can. It will be kind of like a conclusion.

If you have any questions regarding clarity or content, by all means feel free to voice them. I intend for this to be the definitive guide to generating valid PE files with PureBasic code.

Before we begin, I'd like to recommend that you get a decent hex editor to check out a program in. It will come in handy being able to look at the bytes and their values. I'd also like to recommend PEBrowsePro. It's an excellent disassembler that also lists the values for the various headers and sections, as well as allowing you to do a hex dump of them.

I'd recommend just PEBrowsePro, but the hex dumping part isn't as good as some hex editors out there. You may find it adequate though, so I'll leave the choice of whether to download a hex editor to supplement it up to you.

You can find PEBrowsePro here:

http://www.smidgeonsoft.com/

Note: you want PEBrowse Professional, NOT PEBrowse Professional Interactive, which is a debugger (not that there is anything wrong with it, it's just that it won't be too useful for our tutorial).

As a reference you can use this website to aid your understanding. I believe I posted it before, but it's worth posting again:

http://elfz.laacz.lv/ms_exe_spec.html

So on to the introduction.

PE FILE FORMAT - PART ONE - DOS STUB, PE HEADER, OPTIONAL HEADER

The very beginning of any valid PE file starts with a DOS stub. This is basically a mini DOS program that will display the string "This program cannot be run in DOS mode." or something similar. The exact contents of it are not too important except for a few key bytes.

The stub must start with the ASCII characters MZ. These are 4D and 5A in hex respectively. The other thing to care about is the value stored at byte 60 ($3C) from the beginning of the file. This particular value (a 4-byte little-endian number) is very important: it is the offset from the beginning of the file to the beginning of the PE Header.

The best method of generating one of these stubs is to go the easy route. Find any old .exe on your computer and examine the contents with a hex editor. At byte 60 (counting from the beginning) you'll find the offset from the start of the file. All you have to do is read that many bytes off the beginning of your .exe and you'll have a valid DOS stub to tack onto every .exe that you generate.
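If you'd rather grab the stub programmatically, here's a rough C sketch of that idea (error handling kept minimal, and it assumes a little-endian machine, which is fine since we're targeting x86):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Copy the DOS stub out of an existing executable.  Returns a malloc'd
 * buffer holding the stub and stores its length in *stub_len. */
unsigned char *read_dos_stub(const char *exe_path, uint32_t *stub_len)
{
    FILE *f = fopen(exe_path, "rb");
    if (!f) return NULL;

    uint32_t pe_offset = 0;
    fseek(f, 60, SEEK_SET);        /* byte 60 ($3C): offset of the PE Header */
    fread(&pe_offset, 4, 1, f);    /* everything before that offset is stub  */

    unsigned char *stub = malloc(pe_offset);
    fseek(f, 0, SEEK_SET);
    fread(stub, 1, pe_offset, f);  /* the stub itself, starting with "MZ"    */
    fclose(f);

    *stub_len = pe_offset;
    return stub;
}

Writing the stub into your own generated .exe afterwards is then just a single write of stub_len bytes at offset 0.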

But to save you the trouble, here is one I ripped off of list_imports, a utility for listing the imports of a .exe.



You'd write this data to a file using the following code(assuming you've created a valid file):



Its that simple.

With the DOS Stub out of the way lets move on to the PE Header.

Here is a structure for the PE Header fields:
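For reference, this is roughly what that layout looks like written as a packed C struct. The field names follow the descriptions below, and the byte sizes are the ones the PE/COFF spec requires; the tutorial itself builds the equivalent structure in PureBasic.

#include <stdint.h>

#pragma pack(push, 1)              /* the file layout is byte-for-byte exact */
typedef struct {
    char     PESig[3];             /* "PE" plus its terminating zero         */
    uint8_t  PESigZero;            /* one more zero completes the signature  */
    uint16_t MachineID;            /* $014C: Intel 386 and later             */
    uint16_t NumberOfSections;     /* varies, so it is passed in             */
    uint32_t TimeStamp;            /* zero is acceptable                     */
    uint32_t pToSymbolTable;       /* zero unless you emit a symbol table    */
    uint32_t NumOfSymbols;         /* zero if pToSymbolTable is zero         */
    uint16_t SizeOfOptionalHeader; /* 224 for a normal PE32 .exe             */
    uint16_t Characteristics;      /* $0182 for our executables              */
} PEHeader;                        /* 24 bytes in total                      */
#pragma pack(pop)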



The PE Header has a few fields that can be taken care of via constants. These fields don't change or can be decided ahead of time what they will be.

The first four bytes of the PE Header are the PE signature, which is "PE" followed by two zeros. They will always be those four bytes, and if they aren't you don't have a valid PE file. We have a PESigZero field after the first entry for the second zero; the first field is terminated with a zero, so we only need one additional byte to complete the signature.

The next field is the MachineID. This is a two-byte value that tells Windows what machine this file is expected to run on. The machine that you want this to run on is an Intel 386 or later, so we use the value $014C, or 332 in decimal.

The next field is the number of sections contained in the file. Sections hold various pieces of code or data. Some sections have special meanings as well as being required for the executable to run. I'll get to them later, but there is one important point I'd like to make: this field cannot be dealt with by a constant, as the number of sections can vary. You'll need to pass it in as a parameter to your PEHeader-creating function when you write one.

The next field is TimeStamp. You can probably guess what this is. You create a valid TimeStamp by using the C runtime library routines. But it isn't really necessary. You can just place a zero in the timestamp field if you want.

The pToSymbolTable field is a file offset from the beginning to the symbol table. This is used for debugging and unless you are putting symbol tables into your PE files(which this tutorial will not cover) you should set this field to zero.

NumOfSymbols is the number of symbols in your symbol table. If your pToSymbolTable field is zero this one should be zero as well.

Next up is SizeOfOptionalHeader. Although officially this field can vary, I haven't really seen it vary in size much at all. It is usually 224 bytes. Since we control how big our Optional Header is, we might as well cheat a little and always set this one to 224.

The final field is Characteristics. As the name implies, this field describes the characteristics of the file. Since we are creating a PE .exe file, we can set our Characteristics field using a constant. The constant's value is $0182, which is 386 in decimal. For further information on what the various characteristic flags mean, refer to the link above.

So if we were to create a PEHeader creating function it would probably look like this:
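A rough C version of such a function, using the PEHeader struct sketched above, might look like the following. Only the number of sections has to be supplied by the caller; everything else uses the constants just discussed. Treat it as a sketch of the logic rather than the tutorial's actual PureBasic code.

#include <stdio.h>
#include <string.h>

/* Fill in a PEHeader and write it to the output file, directly after the
 * DOS stub.  Returns 1 on success, 0 on failure. */
int WritePEHeader(FILE *out, uint16_t numberOfSections)
{
    PEHeader h;
    memset(&h, 0, sizeof(h));

    memcpy(h.PESig, "PE", 3);          /* 'P', 'E', 0 and PESigZero stays 0  */
    h.MachineID            = 0x014C;   /* Intel 386 or later                 */
    h.NumberOfSections     = numberOfSections;
    h.TimeStamp            = 0;        /* a real timestamp is optional       */
    h.pToSymbolTable       = 0;        /* no symbol table in this tutorial   */
    h.NumOfSymbols         = 0;
    h.SizeOfOptionalHeader = 224;      /* standard PE32 optional header size */
    h.Characteristics      = 0x0182;   /* the flags described above          */

    return fwrite(&h, sizeof(h), 1, out) == 1;
}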



Now for the Optional Header.

There isn't too much about the Optional Header that is optional. It is required for a valid PE .exe. So unfortunately, we can't skip over it. Here is the Optional Header in structure format:



For that structure to be valid we need to have the SubDir structure above it:
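In C terms the pair would look roughly like this, with the SubDir entry declared first exactly because the header holds an array of 16 of them. Again, the field names follow the text and the sizes are the standard PE32 ones.

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {                   /* one data directory entry               */
    uint32_t RVA;                  /* relative virtual address of the table  */
    uint32_t Size;                 /* size of the table in bytes             */
} SubDir;

typedef struct {
    uint16_t Magic;                /* $10B = PE32, $20B = PE32+              */
    uint8_t  LNKMajor;             /* linker major version                   */
    uint8_t  LNKMinor;             /* linker minor version                   */
    uint32_t SizeOfCode;
    uint32_t SizeOfIniData;        /* size of initialized data               */
    uint32_t SizeOfUnIniData;      /* size of uninitialized data             */
    uint32_t AddressOfEntryPoint;  /* relative to ImageBase                  */
    uint32_t BaseOfCode;
    uint32_t BaseOfData;
    uint32_t ImageBase;            /* $400000 by default for .exes           */
    uint32_t SectionAlignment;     /* $1000                                  */
    uint32_t FileAlignment;        /* $200                                   */
    uint16_t MjOS, MnOS;           /* required OS version                    */
    uint16_t MjImgVer, MnImgVer;   /* image (program) version                */
    uint16_t MjSubSys, MnSubSys;   /* subsystem version, at least 4.0        */
    uint32_t WinVerVal;            /* reserved, always 0                     */
    uint32_t SizeOfImage;
    uint32_t SizeOfHeaders;
    uint32_t CheckSum;             /* 0 is fine for a normal .exe            */
    uint16_t SubSystem;            /* 2 = GUI, 3 = console                   */
    uint16_t DLLChar;              /* 0, we are not building a DLL           */
    uint32_t StackReserve;
    uint32_t StackCommit;
    uint32_t HeapReserve;
    uint32_t HeapCommit;
    uint32_t LoaderFlag;           /* obsolete, always 0                     */
    uint32_t NumberOfRVAandSizes;  /* 16                                     */
    SubDir   IMAGE_DATA_DIRECTORIES[16];
} OptionalHeader;                  /* 224 bytes in total                     */
#pragma pack(pop)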



The Optional Header is a pretty big header, but luckily like the PE Header you can set some of its fields using constants. So lets start from the beginning.

The first field is the "Magic" field. It's an unsigned short integer that identifies the state of the image file. If it is $10B (267 in decimal) it is a PE32 file. If it is $20B (523 in decimal) then it is a PE32+ file, which is meant for a non-32-bit machine, i.e. a 64-bit machine. Since we are creating a 32-bit executable we will always have a value of $10B in that field.

Next up are the LNKMajor and LNKMinor fields. These are supposed to be set by the linker and indicate the major and minor version number of the linker. You can safely set these to whatever you want, but we'll set them to zero in this tutorial.

SizeOfCode is obviously a field we can't set through a constant. This will have to be passed in as a parameter to our CreateOptionalHeader function.

The same goes for SizeOfIniData (size of initialized data) and SizeOfUnIniData (size of uninitialized data).

AddressOfEntryPoint is the address of the entry point relative to the ImageBase when loaded into memory. For program images it is the starting address. You'll need to pass this in as a parameter as well. AddressOfEntryPoint will always be greater than or equal to the BaseOfCode.

BaseOfCode is an address relative to the ImageBase of the code section when loaded into memory. This also must be passed in as a parameter.

BaseOfData is an address relative to the ImageBase of the data section when loaded into memory. Like BaseOfCode it too must be passed in as a parameter to our CreateOptionalHeader function.

ImageBase is the preferred loading address of the first byte of an image when loaded into memory. The default for .exes is $400000. However, if the specified address is not available, Windows will have to find another address to load it into. This is why all memory references like AddressOfEntryPoint, BaseOfCode, and BaseOfData are relative to the ImageBase. It saves Windows the effort of having to relocate BaseOfCode, BaseOfData, and AddressOfEntryPoint, as well as all of the memory addresses that depend upon them. (For example, with ImageBase at $400000 and BaseOfCode at $1000, the code section ends up at $401000 in memory.) ImageBase can be safely set with a constant. We'll use $400000 in this tutorial for the default. If you want to change that, you must remember that ImageBase has to be a multiple of 64K. Otherwise, it's invalid.

SectionAlignment is the alignment (in bytes) of sections when loaded into memory. It must be greater than or equal to FileAlignment. The default value is the page size for the architecture, which for Windows on x86 is 4K. This can be set with a constant. We'll set ours to $1000 (4096 in decimal, i.e. 4K).

FileAlignment is the alignment factor (in bytes) used to align the raw data in the image file. This value should be a power of 2 between 512 and 64K inclusive; the default is 512. If SectionAlignment is less than the architecture's page size, then this must equal SectionAlignment. This value can be set with a constant. We'll set ours to $200 (512 in decimal).

MjOS (Major Operating System Version) and MnOS (Minor Operating System Version) hold the major and minor version numbers of the required OS. They are often wrongly supplied or not supplied at all, and the Windows loader apparently doesn't use them. But we'll set MjOS to 4 and MnOS to 0 anyway.

MjImgVer (Major Image Version) and MnImgVer (Minor Image Version) are the major and minor versions of the program respectively. You can pass them in as parameters and set them if you want, but for this tutorial we'll simplify things and set them both to 0.

MjSubSys (Major Subsystem Version) and MnSubSys (Minor Subsystem Version) should be at least 4 and 0 respectively. Otherwise, Windows will think you intend to run the program on a primitive graphics system, so you'll lose the 3D look of your windows as well as some other things.

Next up is WinVerVal. It is reserved so we will always set this one to 0.

SizeOfImage will, of course, always be passed in as a parameter. Of note is the fact that it includes all of the file headers in it as well. It must be a multiple of SectionAlignment.

SizeOfHeaders is the combined size in bytes of the DOS Stub, PE and Optional Headers, and Section Headers rounded up to a multiple of FileAlignment.
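You end up doing that kind of rounding a lot while laying out the file, so a small helper is handy (my own utility, not from the tutorial; the alignment must be a power of two):

#include <stdint.h>

/* Round x up to the next multiple of align, where align is a power of two. */
static uint32_t align_up(uint32_t x, uint32_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* e.g. align_up(0x348, 0x200) == 0x400 and align_up(0x1234, 0x1000) == 0x2000 */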

CheckSum can be set to zero, since it isn't really checked unless you are writing a driver or a DLL loaded at boot time or into a server process. Since none of these things apply to us, we leave it as zero.

SubSystem deals with which subsystem the .exe will run in. There are really only two values that can apply to us: 2 for Windows GUI applications (you can still open a console in this mode if you want), and 3 for console applications (the program will get a console by default on startup or inherit the parent's console). We'll be using the value 2 for this tutorial.

DLLChar(DLL Characteristics) can be safely set to zero as we aren't creating a DLL in this tutorial.

The next four fields deal with the reserved and committed amounts for the stack and heap. You should probably keep these numbers as powers of 2, but I haven't read anywhere that this is a requirement. Reserved amounts mean address space that is reserved, not reserved amounts of RAM. Committed amounts mean actual amounts of RAM that are allocated; the commit size is also the increment by which the heap or stack can "grow" if need be. For this tutorial we will use a value of $100000 for both reserves and a value of $1000 for both commits.

LoaderFlag is obsolete so set it to zero.

NumberOfRVAandSizes is the number of data directory entries in the remainder of the Optional Header. Although officially this number can vary, it is currently always 16. In this tutorial we'll leave it at 16 and write out 16 data directories, as that is the amount specified in the docs.

Finally we have the IMAGE_DATA_DIRECTORIES static array. As you can see, we have 16 entries in the array, each of which contains an RVA (Relative Virtual Address) and a Size (in bytes). I won't go into too much detail about all of the various entries, as that would take too much time; I'll only be dealing with the few entries that almost always concern us. We will probably have to pass a pointer to an array that contains all 16 entries to our CreateOptionalHeader function, so no constants here.

The IMAGE_DATA_DIRECTORIES that you'll probably want to be aware of are the Import, Resource, BaseReloc, and IAT (Import Address Table) entries. The rest either don't apply to this tutorial, are reserved, or don't apply to the x86 architecture. However, since these deal with sections they will have to be dealt with in the next tutorial. For now, some code to write them to a file is all you'll need.

So to sum up, here is our CreateOptionalHeader function:
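As a rough C sketch of what such a function might look like (using the OptionalHeader and SubDir structs from above): the constants come straight from the discussion, and everything that varies per program is a parameter. The real tutorial code does the same thing in PureBasic.

#include <stdio.h>
#include <string.h>

/* Fill in an OptionalHeader with the defaults discussed above and write it
 * out.  The section-dependent sizes and the 16 data directory entries are
 * supplied by the caller.  Returns 1 on success, 0 on failure. */
int WriteOptionalHeader(FILE *out,
                        uint32_t sizeOfCode, uint32_t sizeOfIniData,
                        uint32_t sizeOfUnIniData, uint32_t entryPoint,
                        uint32_t baseOfCode, uint32_t baseOfData,
                        uint32_t sizeOfImage, uint32_t sizeOfHeaders,
                        const SubDir dirs[16])
{
    OptionalHeader oh;
    memset(&oh, 0, sizeof(oh));

    oh.Magic               = 0x10B;      /* PE32                             */
    oh.SizeOfCode          = sizeOfCode;
    oh.SizeOfIniData       = sizeOfIniData;
    oh.SizeOfUnIniData     = sizeOfUnIniData;
    oh.AddressOfEntryPoint = entryPoint; /* relative to ImageBase            */
    oh.BaseOfCode          = baseOfCode;
    oh.BaseOfData          = baseOfData;
    oh.ImageBase           = 0x400000;   /* default load address for .exes   */
    oh.SectionAlignment    = 0x1000;     /* one page                         */
    oh.FileAlignment       = 0x200;      /* 512 bytes                        */
    oh.MjOS     = 4;  oh.MnOS     = 0;
    oh.MjSubSys = 4;  oh.MnSubSys = 0;   /* lower than 4.0 loses the GUI look*/
    oh.SizeOfImage         = sizeOfImage;
    oh.SizeOfHeaders       = sizeOfHeaders;
    oh.SubSystem           = 2;          /* Windows GUI application          */
    oh.StackReserve        = 0x100000;
    oh.StackCommit         = 0x1000;
    oh.HeapReserve         = 0x100000;
    oh.HeapCommit          = 0x1000;
    oh.NumberOfRVAandSizes = 16;
    memcpy(oh.IMAGE_DATA_DIRECTORIES, dirs,
           sizeof(oh.IMAGE_DATA_DIRECTORIES));

    return fwrite(&oh, sizeof(oh), 1, out) == 1;
}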

Killswitch
22
Years of Service
User Offline
Joined: 2nd Oct 2002
Location: School damnit!! Let me go!! PLEASE!!!
Posted: 15th Aug 2004 09:54


~It's a common mistake to make, the rules of the English langauge do not apply to insanity~
Neophyte
22
Years of Service
User Offline
Joined: 23rd Feb 2003
Location: United States
Posted: 17th Aug 2004 17:43
@Killswitch

Why the ?

@MikeS

You'll be pleased to know, I've just finished the second part of my tutorial and have begun work on the third part in earnest. Hopefully, within a week I'll have the third part finished and I'll be able to test the code for both the second and third part.
MikeS
Retired Moderator
22
Years of Service
User Offline
Joined: 2nd Dec 2002
Location: United States
Posted: 18th Aug 2004 04:13
Excellent Neophyte, this is great news. I'm still taking in the first part. This is a job very well done.



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 18th Aug 2004 05:59
Yeah... can you release it all as a Word doc or RTF?


Create or Play? You choose!
Neophyte
22
Years of Service
User Offline
Joined: 23rd Feb 2003
Location: United States
Posted: 18th Aug 2004 15:42
@MikeS

"This is a job very well done."

Thanks.

@Powersoft

" yeah... can you releaswe it all as a word doc or rtf?"

When I've finished I could probably do that.
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 30th Aug 2004 01:53
hows it going?


Scorched Turf --> Project Thread
1010011 binary for 83
20
Years of Service
User Offline
Joined: 19th Aug 2004
Location: where ever your not
Posted: 30th Aug 2004 02:58
Dude, you bumped an old thread. The last post was more than a week ago. What page was it on? If it is past page 2 then don't post, start a new one or something,
especially just to say "how's it going".

http://free-space.myftp.org for free ftp
website hosting and e-mail 60mb for ftp/webste 70mb for e-mail
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 30th Aug 2004 03:10
how about no. . .

i dont do it on all threads just this one


Scorched Turf --> Project Thread
Neophyte
22
Years of Service
User Offline
Joined: 23rd Feb 2003
Location: United States
Posted: 31st Aug 2004 17:46
@Powersoft

"hows it going?"

I'm progressing, but not very quickly. Lately I've had a lot of other things I had to do. So I haven't really gotten the time to work on the final part too much.

However, from what I've managed to get down so far, I'm going to have to say the part I'm working on currently is going to be pretty big. The first part of the tutorial was 20 KB. The second part is currently at 24 KB. The third part is currently at 15 KB, and I don't think I'm even a third of the way through yet.

Hopefully, in a few days' time I'll have the opportunity to work on it again with all of my energies, instead of being spread out between 4 different projects.

In case you're curious though, I'm at the point where I'm discussing the way I encoded the assembly program into my test program. This part is a lot more complicated than I originally anticipated.

Personally, I'm thinking about ditching this part as it's kind of tedious and really strays from the subject of PE file generation. But I'm worried that if I don't explain how I hard-coded the assembly program into my program the later code which reads it will seem rather bizarre.

We'll see how it pans out later when I get back to working on it full time. I might just end up re-writing major sections of it(again).
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 31st Aug 2004 18:29
That's OK, just making sure this topic stayed alive.


Scorched Turf --> Project Thread
MikeS
Retired Moderator
22
Years of Service
User Offline
Joined: 2nd Dec 2002
Location: United States
Posted: 1st Sep 2004 08:37
If Powersoft doesn't bump the thread, I do every 28 or 29 days.

Been busy with school recently, and haven't done much work on my compiler project. Hopefully I will be able to get some work done this weekend.



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
Powersoft
21
Years of Service
User Offline
Joined: 1st Aug 2003
Location: United Kingdom
Posted: 12th Sep 2004 00:34
You go to school, Mike? Wow, I thought you worked.


Scorched Turf --> Project Thread
Neophyte
22
Years of Service
User Offline
Joined: 23rd Feb 2003
Location: United States
Posted: 28th Sep 2004 10:16
Well, I feel I owe an explanation for why the third part hasn't been released yet.

After I posted last time I ran into a little snag that prevented me from working on the tutorial any further. However, I figured out the problem and resumed work on it. Unfortunately, this is getting way more complicated than I had originally intended. The first part of my tutorial was 20 KB in size. The second part is 24 KB. The third part, which still isn't finished, is currently 26 KB. And I'm still not done explaining how I hard-coded the assembly program into my program!

I never imagined how complicated explaining how to store the program in an intermediate language could get. Worst case, I assumed it would take me 11 KB. I'm now at over twice my original expectation and I'm still not finished. Ouch!

On the positive side of things, I am making progress. Recently I went hunting on CiteSeer for more compiler-related info and came up with about 40 new documents, some of which are whole theses. I'll probably post those later when I've sorted through them. For now though, I'll have to get back to working on that third part. Hopefully, with enough persistence I can finally finish this thing and get it out to the general public. Its release is long overdue.
MikeS
Retired Moderator
22
Years of Service
User Offline
Joined: 2nd Dec 2002
Location: United States
Posted: 29th Sep 2004 05:29
It's excellent to see all the progress you've made, Neophyte. I'm still making slow and steady progress on an interpreter in my spare time. Though I've restarted and am doing it (or rather attempting to) in C++ now.

Keep up the awesome work, and I really mean that.



A book? I hate book. Book is stupid.
(Formerly known as Yellow)
