@TKF15H
I'll start working on that in a little while then. But first, the links.
@MikeS
I promised this a while ago but I've been a little busy (and a little lazy) lately. So better late than never I suppose.
A Formal, Language-Independent, and Compositional Approach to Interprocedural Control Dependence Analysis
http://citeseer.ist.psu.edu/600335.html
Discusses a formal approach to creating interprocedural dependency graphs. A bit on the advanced side, but interesting nonetheless.
An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms
http://citeseer.ist.psu.edu/kowarschik03overview.html
Caches are becoming more and more important as CPU speeds race on while memory speeds lag behind. This paper describes a number of advanced optimization techniques to improve cache performance.
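To give a flavor of the techniques the paper covers, here is a minimal sketch of cache blocking (loop tiling) applied to a matrix transpose. This is my own example, not code from the paper, and the tile size is just an illustrative guess that would need tuning for a real cache.
Code:
/* Cache blocking (tiling) sketch: transpose an N x N matrix of doubles.
   Working on BLOCK x BLOCK tiles keeps both the source rows and the
   destination columns resident in cache, instead of striding through
   memory by N elements on every inner iteration. */
#define N     1024
#define BLOCK 32        /* illustrative tile size; tune for the target cache */

void transpose_blocked(double dst[N][N], const double src[N][N])
{
    for (int ii = 0; ii < N; ii += BLOCK)
        for (int jj = 0; jj < N; jj += BLOCK)
            for (int i = ii; i < ii + BLOCK; i++)
                for (int j = jj; j < jj + BLOCK; j++)
                    dst[j][i] = src[i][j];
}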
Building a Control-flow Graph from Scheduled Assembly Code
http://citeseer.ist.psu.edu/cooper02building.html
This is more of an interesting topic than a useful one. It deals with, as the name implies, taking scheduled assembly code and reconstructing a control-flow graph from it.
Designing Intermediate Representations for Optimization
http://citeseer.ist.psu.edu/207376.html
This paper is particularly instructive. It discusses the various design issues one might encounter when designing an intermediate representation. Recommended reading.
Instruction Scheduling Using Integer Programming
http://citeseer.ist.psu.edu/230478.html
This paper discusses optimal instruction scheduling using integer programming techniques. The technique is a bit time-consuming, but it appears to produce good results.
Integer Programming Based Register Allocation
http://citeseer.ist.psu.edu/220772.html
I was considering integer programming as an alternative to graph coloring (since graph coloring works poorly on the IA-32). I later found out that it is not easy to implement and is slow at compile time, though it does produce good results. I've since changed plans and no longer want to rely on integer programming for register allocation, but you might decide differently, so here is the paper.
Operator Strength Reduction
http://citeseer.ist.psu.edu/54279.html
Operator strength reduction is a basic optimization that takes costly (strong) operations and replaces them with cheap (weak) operations. This paper describes a simple algorithm to do just that, designed with beginners in mind.
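A hand-written before/after illustration of the idea (my own example, not code from the paper):
Code:
/* Before: each iteration recomputes the element address with a multiply,
   roughly what a naive front end emits for a[i]. */
long sum_before(const long *a, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        long offset = (long)i * (long)sizeof(long);    /* strong operation: multiply */
        sum += *(const long *)((const char *)a + offset);
    }
    return sum;
}

/* After strength reduction: the multiply is replaced by an addition that
   advances a running offset by the element size each iteration. */
long sum_after(const long *a, int n)
{
    long sum = 0;
    long offset = 0;
    for (int i = 0; i < n; i++) {
        sum += *(const long *)((const char *)a + offset);
        offset += (long)sizeof(long);                  /* weak operation: add */
    }
    return sum;
}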
Practical Improvements to the Construction and Destruction of SSA Form
http://citeseer.ist.psu.edu/193349.html
Describes some improvements to Static Single Assignment form. Useful if you are going to use SSA form.
Value Driven Redundancy Elimination
http://citeseer.ist.psu.edu/66713.html
Advanced optimization. It is a combination of value numbering (see below) and code motion.
Combining Analyses, Combining Optimizations
http://citeseer.ist.psu.edu/213053.html
Nifty super-optimization that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding, and Global Value Numbering. This is also worth quoting:
Quote: "In conjunction with global code motion, these peephole optimizations generate excellent code very quickly, a useful feature for compilers that stress compilation speed over code quality."
Might be worth looking into if, like me, you are worried about compilation speed.
Value Numbering
http://citeseer.ist.psu.edu/86621.html
Value numbering assigns an identifying number to each expression. The numbers are assigned in such a way that if two numbers match, it can be proven that the two expressions are equal for all program inputs.
This paper discusses some improvements to the heuristic.
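A minimal sketch of the basic local (per-basic-block) idea; the table layout, fixed sizes, and function names are my own simplifications rather than the improved techniques from the paper:
Code:
/* Local value numbering for three-address code of the form  t = a OP b.
   Two expressions receive the same value number only if their operator
   and operand value numbers match, which proves they compute the same
   value for all inputs within the block. */
#define MAX_VALUES 256

typedef struct { char op; int left, right; } Value;   /* op == 0 marks a leaf */

static Value values[MAX_VALUES];
static int   num_values;

/* Give a fresh value number to a leaf operand (variable or constant). */
int leaf_value(void)
{
    values[num_values] = (Value){ 0, -1, -1 };
    return num_values++;
}

/* Value-number the expression  left OP right, reusing an existing number
   when the same computation has already been seen. */
int expr_value(char op, int left, int right)
{
    for (int i = 0; i < num_values; i++)
        if (values[i].op == op && values[i].left == left &&
            values[i].right == right)
            return i;                          /* redundant computation found */
    values[num_values] = (Value){ op, left, right };
    return num_values++;
}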
Advanced Copy Propagation for Arrays
http://citeseer.ist.psu.edu/vanbroekhoven03advanced.html
Advanced optimization. After an array element is assigned, we can, under certain conditions, replace a later read of that element with the right-hand side of the assignment, skipping the intermediate store. If the store then becomes dead code, it can be eliminated.
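A tiny before/after illustration in C (my own example; the paper works out the general conditions much more carefully):
Code:
/* Before: a[i] is stored and then immediately read back. */
void before(int *a, const int *b, int *out, int i)
{
    a[i]   = b[i] + 1;
    out[i] = a[i] * 2;
}

/* After array copy propagation: the read of a[i] is replaced by the
   right-hand side of the assignment.  If no other reads of a[i] remain,
   the store becomes dead code and can be eliminated. */
void after(int *a, const int *b, int *out, int i)
{
    a[i]   = b[i] + 1;          /* dead if a[i] is never read again */
    out[i] = (b[i] + 1) * 2;
}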
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures
[href]hpc.serc.iisc.ernet.in/papers/IEEETC03-GovEtAl.pdf[/href]
A method of ordering instructions so that the minimum number of registers is used. Related to instruction scheduling and register allocation. Looks interesting, but I'm unsure how useful it will be to us.
The Stratego Series
Program Transformation with Stratego_XT-Rules, Strategies, Tools, and Systems
[href]www.cs.uu.nl/~visser/ftp/Vis03-strategoxt.pdf[/href]
From the Abstract:
Quote: "Stratego/XT is a framework for the development of transformation
systems aiming to support a wide range of program transformations. The framework
consists of the transformation language Stratego and the XT collection of
transformation tools. Stratego is based on the paradigm of rewriting under the
control of programmable rewriting strategies. The XT tools provide facilities
for the infrastructure of transformation systems including parsing and prettyprinting.
The framework addresses the entire range of the development process;
from the specification of transformations to their composition into transformation
systems. This chapter gives an overview of the main ingredients involved in the
composition of transformation systems with Stratego/XT, where we distinguish
the abstraction levels of rules, strategies, tools, and systems."
Strategies for Source-to-Source Constant Propagation
http://citeseer.ist.psu.edu/olmos02strategies.html
Shows how data-flow optimizations can be applied to abstract syntax trees using Stratego.
The Stratego Tutorial and The Stratego Reference Manual
http://citeseer.ist.psu.edu/56794.html
http://citeseer.ist.psu.edu/294659.html
These explain the Stratego language: what it does, and how it does it.
Warm Fusion in Stratego - A Case Study in Generation of Program Transformation Systems
http://citeseer.ist.psu.edu/373123.html
Provides an example of Stratego in action on a non-trivial problem. This is the last Stratego paper that I'll be mentioning for now; the following papers are unrelated to Stratego.
Catamorphic Approach to Program Analysis
http://citeseer.ist.psu.edu/561292.html
Proposes a new framework for program analyses that regards them as maximum marking problems. Looks interesting and may prove useful, but it also looks kind of difficult to implement. I'll have to look into it more later.
Iterative-Free Program Analysis
http://citeseer.ist.psu.edu/601409.html
Proposes a new way to analyze a program using recursive graph traversal instead of the much more common iterative method. Looks interesting, but seems a bit too advanced.
Tupling Calculation Eliminates Multiple Data Traversals
http://citeseer.ist.psu.edu/21786.html
When optimizing a program, it is quite common to have to traverse a data structure multiple times. This paper describes the tupling calculation, which is "a well known transformation tactic to obtain new efficient recursive functions by grouping some recursive functions into a tuple." It seems to be intended for functional rather than imperative languages (C, Basic, etc.), so I don't know whether it will be useful to us or not.
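To show the flavor in imperative terms, here is my own C rendering of the classic sum-and-length example; the actual calculation operates on recursive functional programs:
Code:
struct node { int value; struct node *next; };

/* Before tupling: computing the average traverses the list twice. */
static int sum(const struct node *xs)    { return xs ? xs->value + sum(xs->next) : 0; }
static int length(const struct node *xs) { return xs ? 1 + length(xs->next)      : 0; }

double average_twice(const struct node *xs)      /* assumes a non-empty list */
{
    return (double)sum(xs) / length(xs);         /* two traversals */
}

/* After tupling: the two recursive functions are grouped into one that
   returns a (sum, length) pair, so the list is traversed only once. */
struct pair { int sum, length; };

static struct pair sum_and_length(const struct node *xs)
{
    if (!xs) return (struct pair){ 0, 0 };
    struct pair rest = sum_and_length(xs->next);
    return (struct pair){ xs->value + rest.sum, 1 + rest.length };
}

double average_once(const struct node *xs)       /* assumes a non-empty list */
{
    struct pair p = sum_and_length(xs);
    return (double)p.sum / p.length;             /* one traversal */
}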
Combined Code Motion and Register Allocation using the Value State Dependence Graph
http://www.cl.cam.ac.uk/~am/papers/cc03.ps.gz
This paper describes a method of efficiently combining the code motion and register allocation phases. It looks interesting; however, it uses a form of graph coloring as the register allocator, so I don't know how suitable it will be for x86 development.
Early Control of Register Pressure for Software Pipelined Loops
http://citeseer.ist.psu.edu/615495.html
This focuses on register allocation in loops. It brings up some good points about why register allocation should be favored over instruction scheduling: the penalty for cache misses and extra loads is higher than that for poorly pipelined code. It uses RISC-type machines as its example, though, so I don't know how well this will translate over to CISC machines like the x86.
SIRA-Schedule Independent Register Allocation for Software Pipelining
http://citeseer.ist.psu.edu/605752.html
This paper also deals with register allocation in loops, only this one uses integer programming to solve the problem.
Register Saturation in Data Dependence Graphs
http://citeseer.ist.psu.edu/touati00register.html
Yet another paper on register allocation and instruction scheduling. This one is rather long (about 106 pages) and goes into some detail. It basically describes how register constraints can be taken into account during the scheduling phase using multiple DAGs.
Optimal loop parallelization under register constraints
http://citeseer.ist.psu.edu/eisenbeis96optimal.html
Deals with the same topic as the above paper only it solves it with a linear integer programming solution.
LoRA: a Package for Loop Optimal Register Allocation
http://citeseer.ist.psu.edu/223965.html
Describes a package, LoRA, that implements several algorithms for trading the register pressure against the code size.
Interactive Compilation
Compiler Modifications to Support Interactive Compilation
http://citeseer.ist.psu.edu/471977.html
This is someone's thesis describing the steps they took to create an interactive compiler that allows the user to view and modify the program as it is compiled. This is a pretty interesting concept and I've been seriously considering implementing it. However, given the complexity involved, it would have to be a project you undertake from scratch rather than trying to tack a GUI onto an existing compiler. You might be interested in trying something like this if you have the time and ambition. It is 64 pages total.
Graphical User Interface for Compiler Optimizations with Simple-SUIF
http://citeseer.ist.psu.edu/252813.html
Huge technical report/thesis (138 pages). Discusses a GUI for a compiler. Quite similar to the paper above, except it is meant more as a means to study the process of back-end optimization than to modify the result.
The FeedBack Compiler
http://citeseer.ist.psu.edu/17434.html
This is the last paper in my interactive compilation series. It is much shorter than the previous two (8 pages) and focuses more on using user feedback to improve certain optimizations.
Value Dependence Graphs: Representation Without Taxation
http://citeseer.ist.psu.edu/345322.html
From the abstract:
Quote: "The value dependence graph(VDG) sparse dataflow-like representation that simplifies program analysis and transformation. It is a functional representation that represents control flow as data flow and makes explicit all machine quantities, such as stores and I/O channels."
Looks interesting. The paper goes on to describe several optimizations and how they can exploit the power of the VDG.
Fusion-Based Register Allocation
http://citeseer.ist.psu.edu/386426.html
From the abstract:
Quote: "The register allocation phase of a compiler maps live ranges of a program to registers. If there are more candidates than there are physical registers, the register allocator must spill a live range (the home location is in memory) or split a live range (the live range occupies multiple locations). One of the challenges for a register allocator is to deal with spilling and splitting together. Fusion based register allocation uses the structure of the program to make splitting and spilling decisions, with the goal to move overhead operations to infrequently executed parts of a program.
The basic idea of fusion based register allocation is to build up the interference graph. Starting with some base region (e.g., a basic block, a loop), the register allocator adds basic blocks to the region and incrementally builds the interference graph. When there are more live ranges than registers, the register allocator selects live ranges to split; these live ranges are split along the edge that was most recently added to the region."
This is a novel approach to register allocation. Though I'm not too sure if it will be suitable for x86 use.
Issues in Register Allocation by Graph Coloring
http://citeseer.ist.psu.edu/105594.html
Describes many of the shortcomings of graph coloring and three improvements to overcome them.
Call-Cost Directed Register Allocation
http://citeseer.ist.psu.edu/lueh97callcost.html
Takes the methods described in the paper above and uses them to select the right kind of register for a live range. Not too sure that this is useful.
Integer Linear Programming vs Graph-Based Methods in Code Generation
http://citeseer.ist.psu.edu/5186.html
Discusses the phase ordering problem and the integer programming approach in depth. Offers some approximations of the integer programming approach to speed up compile time.
Linear-Time Register Allocation for a Fixed Number of Registers
http://citeseer.ist.psu.edu/2404.html
This paper is somewhat different from all the other register allocation papers. The goal here is not to find the minimum number of registers needed to realize a given program without spilling, but to decide whether a program can be realized without spilling given a fixed number of registers n. The paper claims this can be solved in linear time as long as the program is well structured (i.e., goto-free).
Printing Floating-Point Numbers Quickly and Accurately
http://citeseer.ist.psu.edu/28233.html
This doesn't have to do with compilers, but I thought I'd include it anyway on the off chance you might want to write your own floating-point printing routine.
A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs
http://citeseer.ist.psu.edu/15321.html
Novel register allocation technique using cyclic interval graphs. Might be of some use.
Register Allocation for Indirect Addressing in Loops
http://citeseer.ist.psu.edu/38080.html
Although it doesn't deal with x86 CPUs directly, it does deal with CISC machines. It also mentions several interesting papers that will probably be worth looking into later on. Not too relevant to x86, but useful nonetheless in helping us find some interesting papers.
Combining Register Allocation and Instruction Scheduling
http://citeseer.ist.psu.edu/115521.html
Yet another method for combining register allocation with instruction scheduling. I'll leave it to you to judge its worth in light of the others.
Hardness and Algorithms for Local Register Allocation
http://citeseer.ist.psu.edu/54499.html
This paper deals with local register allocation, but without any instruction scheduling. It uses a novel integer programming algorithm to solve the problem quickly and efficiently. It is a bit complicated but it looks promising.
Register Allocation sans Coloring
http://citeseer.ist.psu.edu/42139.html
Implements a register allocator that doesn't use graph coloring. It's called fast allocation and is, well, fast. It is also remarkably simple. Definitely worth taking a look.
The Power of Belady's Algorithm in Register Allocation for Long Basic Blocks
http://citeseer.ist.psu.edu/647675.html
From the abstract:
Quote: "Optimization techniques such as loop-unrolling and trace-scheduling can result in long straight-line codes. It is, however, unclear how well the register allocation algorithms of current compilers perform on these codes. Compilers may well have been optimized for human written codes, which are likely to have short basic blocks. To evaluate how the state of the art compilers behave on long straight-line codes, we wrote a compiler that implements the simple Belady’s MIN algorithm.
The main contribution of this paper is the evaluation of Belady’s MIN algorithm when used for register allocation for long straight-line codes. These codes were executed on a MIPS R12000 processor. Our results show that applications compiled using Belady’s MIN algorithm run faster than when compiled with the MIPSPro or GCC compiler. In particular, Fast Fourier Transforms (FFTs) of size 32 and 64 run 12% and 33% faster than when compiled using the MIPSPro compiler."
Understanding and Improving Register Assignment
http://citeseer.ist.psu.edu/355529.html
Discusses coordinating register allocation strategies with instruction scheduling.
Evaluation of Algorithms for Local Register Allocation
http://citeseer.ist.psu.edu/665874.html
Four different LRA algorithms are compared for code efficiency and for the execution time of the algorithm itself. Worth looking into.
Quality and Speed in Linear-Scan Register Allocation
http://citeseer.ist.psu.edu/91122.html
Finally, we get to this paper. After a lot of thought, linear-scan register allocation is the algorithm I'm going to use for my compiler, or at least for the first implementation of it. It has the combination of speed of execution, simplicity of implementation, and quality of code that I've been looking for for quite a while. This thesis describes it in full. It is definitely worth a look, as it appears to be the algorithm to go for when writing x86 compilers.
Correction to 'Producing Good Code for the Case Statement'
http://citeseer.ist.psu.edu/kannan94correction.html
Single-page document that describes an O(n^2) algorithm for splitting a case statement's jump table into a minimum number of subtables (of a given density). Short and informative.
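Here is my own reconstruction of what such an O(n^2) dynamic program could look like; the density formula and threshold are illustrative assumptions rather than the paper's exact formulation:
Code:
/* Split a switch's sorted case labels into the minimum number of dense
   subtables.  A cluster covering labels[i..j] has density
   (j - i + 1) / (labels[j] - labels[i] + 1); below MIN_DENSITY a jump
   table wastes too much space.  min_tables[] and split[] must each hold
   n + 1 entries; split[] lets the caller recover the actual partition. */
#define MIN_DENSITY 0.5          /* illustrative threshold */

void split_jump_table(const int *labels, int n, int *min_tables, int *split)
{
    min_tables[0] = 0;
    for (int j = 1; j <= n; j++) {
        min_tables[j] = j;                       /* upper bound: one label per table */
        split[j] = j - 1;
        for (int i = 1; i <= j; i++) {           /* last cluster is labels[i-1 .. j-1] */
            double density = (double)(j - i + 1) /
                             (double)(labels[j - 1] - labels[i - 1] + 1);
            if (density >= MIN_DENSITY && min_tables[i - 1] + 1 < min_tables[j]) {
                min_tables[j] = min_tables[i - 1] + 1;
                split[j] = i - 1;
            }
        }
    }
}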
Code Generation Techniques
http://citeseer.ist.psu.edu/proebsting92code.html
This 154-page thesis introduces an alternative to graph coloring called Probabilistic Register Allocation. It promises to find the optimal solution to the old phase-ordering problem in linear(?) time. Also, it is quite simple, fitting on a single page. Might prove promising.
Code Generation Techniques for Irregular Architectures
http://citeseer.ist.psu.edu/173331.html
Another paper that deals with the phase-ordering problem. This time it deals with irregular architectures and the need to quickly retarget compilers to new processors.
Code Scheduling and Optimization for a Superscalar x86 Microprocessor
http://citeseer.ist.psu.edu/281885.html (This is the actual page. Just d/l the pdf.)
Mandatory reading. This is one of the few papers that I could find that deals with writing a compiler for the x86 architecture. It is a bit old unfortunately, but it is still relevant today. It goes into some depth on how to schedule code, create a good machine description language, etc. Very informative.
Effective Instruction Scheduling With Limited Registers
http://citeseer.ist.psu.edu/627129.html
Phase-ordering problem paper. It is about 134 pages long and was written quite recently (2001).
Experimental evaluation and improvements to linear scan register allocation
http://user.it.uu.se/~happi/publications/regalloc_spe.pdf
Mandatory reading. This one describes improvements to the linear scan register allocator that allow it to run efficiently on x86 hardware. Very useful. If you are thinking of implementing a linear-scan register allocator like I am, then this paper is definitely a must-read.
Improvements to Linear Scan register allocation
[href]www.llvm.org/ProjectsWithLLVM/2004-Fall-CS426-LS.pdf[/href]
I actually found this one while making this post. It was released on April 1, 2004, so it is pretty recent. It mentions several of the papers that I've mentioned before and builds upon them. Definitely worth looking into if you are planning on making a linear scan register allocator.
Formal Compiler Implementation in a Logical Framework
http://citeseer.ist.psu.edu/571905.html
In case you are having trouble designing and implementing a compiler, I found this. It looks like it could be quite useful.
From the abstract:
Quote: "In this paper, we present a new approach based on higher-order abstract syntax and term rewriting in a logical framework. All program transformations, from parsing to code generation, are cleanly isolated and specified as term rewrites. This has several advantages. The correctness of the compiler depends solely on a small set of rewrite rules that are written in a language of formal mathematics. In addition, the logical framework guarantees the preservation of scoping, and it automates many frequently occuring tasks including substitution and rewriting strategies. As we show, compiler development in a logical framework can be easier than in a general purpose language like ML, in part because of automation, and also because the framework provides extensive support for examination, validation, and debugging of the compiler transformations. This paper is organized around a case study, using the MetaPRL logical framework to compile an ML-like langauge to Intel x86 assembly. We also present a scoped formalization of x86 assembly in which all registers are immutable."
Linear Scan Register Allocation
http://citeseer.ist.psu.edu/poletto99linear.html
Another paper on the useful linear scan register allocator.
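For reference, here is a rough sketch of the core loop as I understand the basic linear-scan formulation; the register count, data structures, and spill handling are simplified placeholders, not a faithful transcription of the paper:
Code:
/* Linear scan register allocation sketch.  Each live interval
   [start, end) gets a register or is marked spilled (reg == -1) in one
   pass over the intervals sorted by increasing start point.  Interval
   construction and spill-code insertion are left out. */
#define NUM_REGS 6               /* illustrative; roughly the usable x86 GPRs */

typedef struct {
    int start, end;              /* live range, in instruction numbers */
    int reg;                     /* assigned register, or -1 if spilled */
} Interval;

void linear_scan(Interval *intervals, int n)
{
    Interval *active[NUM_REGS];  /* intervals currently holding a register */
    int num_active = 0;
    int free_reg[NUM_REGS];
    for (int r = 0; r < NUM_REGS; r++) free_reg[r] = 1;

    for (int i = 0; i < n; i++) {
        Interval *cur = &intervals[i];

        /* Expire intervals that end before cur starts; free their registers. */
        int j = 0;
        while (j < num_active) {
            if (active[j]->end <= cur->start) {
                free_reg[active[j]->reg] = 1;
                active[j] = active[--num_active];
            } else {
                j++;
            }
        }

        if (num_active == NUM_REGS) {
            /* No register free: spill whichever interval ends last. */
            Interval *spill = cur;
            int spill_idx = -1;
            for (int k = 0; k < num_active; k++)
                if (active[k]->end > spill->end) { spill = active[k]; spill_idx = k; }
            if (spill != cur) {
                cur->reg   = spill->reg;     /* take over the register */
                spill->reg = -1;             /* longer-lived interval goes to memory */
                active[spill_idx] = cur;
            } else {
                cur->reg = -1;               /* the current interval itself is spilled */
            }
        } else {
            /* Grab any free register. */
            for (int r = 0; r < NUM_REGS; r++)
                if (free_reg[r]) { free_reg[r] = 0; cur->reg = r; break; }
            active[num_active++] = cur;
        }
    }
}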
Compilation of Floating Point Arithmetic in the High Performance Erlang Compiler
http://citeseer.ist.psu.edu/547273.html
Although I have a register allocator planned in the form of the linear-scan algorithm, there is a slight problem: it only deals with integers. The peculiarity of the stack architecture of the x87 floating-point unit means that some modification of the linear scan register allocator is needed to compile floating-point arithmetic. This paper describes the kind of modifications that were made in an actual x86 compiler for the Erlang language.
Reducing loads and stores in stack architectures
http://citeseer.ist.psu.edu/378471.html
Although not directly related to the x87, these stack optimizations might prove handy when dealing with it.
SML/NJ Intel x86 back-end: Compiler Controlled Memory
http://citeseer.ist.psu.edu/647811.html
From the abstract:
Quote: "This note describes the code generation algorithm used for the Intel x86, introduced in version 110.16. The standard Chaitin graph coloring register allocation cannot be used directly for machines with few registers, as all temporaries wind up being spilled, making for a poor allocation[Cha82]. Thus, for the x86, the conceptual model of the architecture has been extended with a set of memory locations that are treated as registers. The net effect is one where temporaries are computed into memory locations and passed as arguments to functions. The use of these memory locations managed in this way can be as fast as using registers, where the register allocation algorithm is indirectly taking the hardware register renaming mechanism into account."
Some notes on the new MLRISC x86 floating point code generator
http://citeseer.ist.psu.edu/682023.html
Describes an efficient scheme for generating floating-point arithmetic. I believe this is the scheme that I'll be using for my floating-point register allocator. Mandatory reading.
Misc
A reading list that discusses interesting research papers on compiler optimizations
http://paramount.www.ecn.purdue.edu/ParaMount/reading.html
Code Specialization Literature Survey
http://www1.cs.columbia.edu/~locasto/projects/codespec/codespecialization.survey.html
Nifty page filled with examples of low-level optimizations for Pentiums. Must read.
http://www.goof.com/pcg/doc/pentium-optxref.html
PDF link to a book about assemblers and loaders, conveniently titled Assemblers and Loaders
http://www.ecs.csun.edu/~dsalomon/assem.advertis/asl.pdf
A page that contains some info about said book can be found here
http://www.ecs.csun.edu/~dsalomon/assem.advertis/AssemAd.html
The PowerPC Compiler Writer's Guide
http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF7785256996007558C6
Not x86-related, but in case you ever feel the urge to write a compiler for a Mac.