Compilers - SisInf Lab
Formal Languages and Compilers
Master's Degree Course in Computer Engineering, A.Y. 2015/2016
Compilers
Floriano Scioscia
DEI – Politecnico di Bari

Compiler history
• The name "compiler" was introduced in 1950. Translation was seen as a "compilation" of a sequence of machine-language subprograms, selected from a library.
• 1957: the FORTRAN team at IBM, led by John Backus, is credited with building the first complete compiler.
• 1960: COBOL was one of the first languages compiled for multiple architectures.
• A compiler is itself a program written in some language. The earliest compilers were written in assembly.
• 1962: the first self-compiled compiler (i.e. one capable of compiling its own source code) was developed for the Lisp language by Hart and Levin at MIT.
• Creating a self-compiling compiler introduces a bootstrapping problem: the very first compiler for that language must necessarily be written in another language, or compiled by running the compiler's own source through an interpreter (as Hart and Levin did with their Lisp compiler).
• Early 1970s: the use of high-level languages to write compilers received a boost, as Pascal and C compilers were written in the same languages.
• Ambitious "optimizations" were adopted to generate efficient machine code: this was essential for the first computers, which had limited resources. Efficient resource usage is still of high importance for modern compilers.

Compiler implementation goals
Correctness
• Compilers allow programmers to discover lexical and logical errors.
• Compilation techniques help improve security (e.g. the Java Bytecode Verifier).
Intellectual property protection
Efficiency
• Reduce time and memory occupation for data and code, at both compile and run time.
• Support language expressiveness.
Providing a "development environment"
• Grant a fast turn-around time.
• Enable separate compilation.
• Allow source code debugging.

"Compilers" as programming language translators only?
Beyond translating high-level programming languages to lower-level ones, compilers have further uses:
• TeX and LaTeX use compilers to translate text and formatting markup into a typeset document.
• PostScript (generated from LaTeX, Word, etc.) is a programming language interpreted by a printer or a PostScript viewer to produce a human-readable form of a document.
• Mathematica, MATLAB and others are interactive systems mixing programming with mathematics. They use compilation techniques to manage problem specifications, internal representations and solutions.
• Verilog and VHDL support the design of VLSI circuits. A silicon compiler specifies the layout and composition of VLSI circuit masks using standard building blocks: just like a typical compiler, it understands and applies design rules which determine the feasibility of a circuit design.
• Interactive tools often need a programming language to support automatic analysis and modification of a system.
What is a compiler?
• It is a program which reads a sentence in a language and translates it into sentence(s) of another language.
  Source program (usually a program written in a high-level language)
      → COMPILER → Object program (usually the equivalent program in machine code for a specific architecture)
  Errors are reported along the way.

Role of compilers
Support the use of high-level programming languages
• Increase programmers' productivity
• Easier code maintainability
• Higher code portability
Exploit opportunities provided by low-level architecture details
• Instruction selection
• Addressing modes
• Pipeline
• Cache usage
• Instruction-level parallelism
Compilers are needed to bridge the gap between high-level and low-level languages
• Architecture changes → compiler changes
• Significant performance differences

Compiler development
Compilers are software systems of large size and complexity. Compiler development requires knowledge about:
• Programming tools (compilers, debuggers)
• Program-generation tools (LEX/YACC, Flex/Bison)
• Software libraries (sets, collections)
• Simulators
Knowledge of compiler architecture improves the effectiveness of software designers/developers.

Compiler requirements
• Correctness of generated code
• Efficiency of generated code (output runs fast)
• Efficiency of the compiler (compiler runs fast)
• Compile time proportional to code size
• Separate compilation
• Accurate syntax error diagnostics
• Good interoperability with the debugger
• Accurate anomaly detection
• Allowing cross-language calls
• Predictable optimizations

Compiler architecture
• Compilers typically include two main phases: analysis and synthesis.
• In the analysis phase an intermediate representation of the source program is created. This phase includes:
  – lexical analyzer,
  – syntax analyzer,
  – semantic analyzer,
  – intermediate code generator.
• Starting from the intermediate representation, in the synthesis phase the equivalent target program is created. This phase includes:
  – code generator,
  – optimizer.

Compiler phases (1/2)
Source program → Scanner → Parser → Semantic checker → Intermediate code generator → Machine-independent code optimizer → Code generator → Machine-dependent code optimizer → Target program
The symbol table and the error handlers are used alongside all of these phases.

Compiler phases (2/2)
• Each phase transforms the source program from one representation to another.
• Error handlers and the symbol table are used in all phases.
• The symbol table contains information on all symbolic elements, such as name, scope, type (if present), etc.
FRONT END (Analysis):
  sequence of characters → LEXICAL ANALYZER → sequence of tokens → SYNTAX ANALYZER → parse tree → SEMANTIC ANALYZER → abstract syntax tree with attributes → INTERMEDIATE CODE GENERATOR → intermediate code
BACK END (Synthesis):
  intermediate code → ARCHITECTURE-INDEPENDENT OPTIMIZER → optimized intermediate code → CODE GENERATOR → machine code → ARCHITECTURE-DEPENDENT OPTIMIZER → optimized machine code
Compiler classification criteria (1/2)
Number of passes
• The number of times the source code is read during compilation
• Single-pass compilers – multi-pass compilers
Optimization
• No optimization
• Optimization in space
• Optimization in time
• Optimization in power consumption
Generated target format
• Assembly language
• Relocatable binary
• Memory image

Compiler classification criteria (2/2)
Generated object language
• Pure machine code
  – Compilers generate code for a particular machine instruction set, without presuming the presence of any operating system or function library. This approach is rare, used only in system implementation.
• Augmented machine code
  – Compilers generate code for a particular machine instruction set, augmented by operating system and support routines: in order to execute such object code, the target machine must have an operating system and a collection of run-time support routines (I/O, memory allocation, etc.) which must be combined with the object code. The degree of correspondence between code and hardware can vary widely.
• Virtual machine code
  – Compilers generate virtual machine code exclusively. This approach is attractive because the object code can be executed regardless of the underlying hardware. If the virtual machine is kept simple, its interpreter can be written easily (a minimal interpreter-loop sketch is given after the next slide).
  – This approach penalizes execution speed, typically by a factor of 3:1 to 10:1. A "just-in-time" (JIT) compiler can translate virtual code sections to native code to speed up execution.

Benefits of virtual machine code
The use of virtual machine code can be beneficial for several purposes:
• Simplifying a compiler by providing suitable primitives (e.g. method calls, string manipulation, etc.);
• Compiler portability;
• Decreasing the size of generated code, since the instruction set is designed for a particular programming language (e.g. JVM bytecode from Java).
In order to generate virtual machine code, almost all compilers, to a varying degree, need to interpret some operations.
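To make the "simple virtual machine" remark concrete, here is a minimal sketch (not part of the original slides) of a stack-based bytecode interpreter in Python; the opcodes (PUSH, LOAD, STORE, ADD, MUL) and the program encoding are invented for illustration only.

    # Minimal sketch of a stack-based virtual machine interpreter.
    # The instruction set is hypothetical, chosen only to show how
    # small a VM dispatch loop can be.
    def run(program):
        stack, memory, pc = [], {}, 0
        while pc < len(program):
            op, arg = program[pc]
            pc += 1
            if op == "PUSH":          # push a constant
                stack.append(arg)
            elif op == "LOAD":        # push the value of a variable
                stack.append(memory[arg])
            elif op == "STORE":       # pop a value into a variable
                memory[arg] = stack.pop()
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            else:
                raise ValueError(f"unknown opcode {op}")
        return memory

    # newval := oldval + 12, assuming oldval holds 30
    result = run([("PUSH", 30), ("STORE", "oldval"),
                  ("LOAD", "oldval"), ("PUSH", 12), ("ADD", None),
                  ("STORE", "newval")])
    print(result["newval"])           # 42

A JIT compiler would replace this dispatch loop with native code generated on the fly for frequently executed program sections.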
Generated target format (1/2)
Assembly Language (Symbolic) Format
• A text file containing the assembly source code is produced: some code-generation decisions (jump instruction targets, address structure, etc.) are left to the assembler.
  – This is a good approach for didactic projects.
  – It supports the generation of assembly code for "cross-compilation" (compilation occurs on a different machine from the one on which the code will be executed).
  – Generating assembly code simplifies debugging and understanding of a compiler (as the generated code can be inspected).
  – Rather than a specific assembly language, C can be used as a "universal assembly" language: C is more machine-independent than any particular assembly language. Nevertheless, some features of a program (such as run-time data representation) are not accessible from C code, while they are easily accessible in assembly language.

Generated target format (2/2)
Relocatable Binary Format
• The code can be generated in a binary format with references to external and local instructions and with data addresses not yet bound.
• Addresses are assigned relative to the start of the module or with respect to a symbolic unit name.
  – A linkage step adds support libraries and other separately compiled routines and produces an executable, absolute binary version of the program.
Memory-Image (Absolute Binary) Format
• The compiled code can be loaded into memory and executed immediately.
  – This is the fastest method, but the possibility of using libraries can be limited and the program must be recompiled for each run.
  – Memory-image compilers are useful for students, who frequently debug and change their code, and more generally when compilation costs are larger than execution costs.

Single-phase compilers
[Diagram: lexical analyzer, syntax and semantic analyzer, and code generator operating together in a single pass.]

Two-phase compilers (1/4)
• Current compilers split the compilation process into two main phases, the front end and the back end. Each may require reading the source code.
• In the front-end phase, the compiler translates source code into an intermediate language (usually internal to the compiler). Hence the front end depends on the source language, but not on the target machine.
• In the back-end phase, a preliminary optimization of the intermediate code sometimes occurs, then the object code is generated and optimized. The back end is therefore independent of the source language, but it depends on the target machine.
  Source language → Front end → Intermediate language → Back end → Object language (both phases report errors)

Two-phase compilers (2/4)
Phases and passes
• A single pass can be enough for several phases (interleaved during the pass): for example, analysis and generation of intermediate code can be done in a single pass "driven" by the syntax analyzer.
• Decreasing the number of passes reduces the required time, but it increases the required memory.
• The intermediate language (IL) can be of:
  – high level, implying that the source language operators are still present in the IL;
  – low level, with the source language operators translated into other, simpler or more specialized ones.
  For example, the statement
    if cond then branch1 else branch2
  may be represented in a high-level IL as
    ifop cond branch1 branch2
  and in a low-level IL as
    jump iffalse cond label2
    branch1
    jump label_exit
    label2: branch2
    label_exit:
  (a small lowering sketch for this example follows).
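As a concrete illustration of the lowering above, the following sketch (not from the slides) turns a high-level conditional node into the jump-and-label low-level form; the instruction names mirror the example but are otherwise hypothetical.

    # Minimal sketch: lowering "if cond then branch1 else branch2" from a
    # high-level IL node to the jump/label form shown in the example above.
    counter = 0

    def new_label(prefix):
        global counter
        counter += 1
        return f"{prefix}{counter}"

    def lower_if(cond, then_code, else_code):
        """Flatten ('ifop', cond, then, else) into a list of low-level instructions."""
        else_label = new_label("label")
        exit_label = new_label("label_exit")
        return ([("jump_iffalse", cond, else_label)]
                + then_code
                + [("jump", exit_label), ("label", else_label)]
                + else_code
                + [("label", exit_label)])

    # if cond then branch1 else branch2
    for instr in lower_if("cond", [("branch1",)], [("branch2",)]):
        print(instr)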
Two-phase compilers (3/4)
Consequences of two-phase compilation:
• Building a compiler for a new processor (retargeting) is simpler;
• Compilers with multiple front ends can be designed.
Front end and back end can be reused separately for new compilers.

Two-phase compilers (4/4)
[Diagram. Retargeting: the same Pascal front end is combined with back ends for processors P1, P2, P3 and P4, producing object code for each processor. Multiple front ends: Pascal, C and Ada front ends share the back ends for processors P1–P4.]

Lexical analyzer, a.k.a. scanner (1/2)
• It processes the preprocessor directives (include, define, etc.)
• It transforms the source code into a compact and uniform structure (a sequence of tokens, i.e. lexical elements)
• It removes information that is unnecessary later on (comments, whitespace)
• It identifies lexical errors
• It allows tokens to be described effectively through regular expression notation.
• A token describes a set of strings having the same role (e.g. identifiers, operators, keywords, numbers, delimiters, etc.).
• Tokens, described by regular expressions, are the smallest (not further decomposable) elements of a language, such as keywords (for, while), variable names (goofy), operators (+, -, <<).

Lexical analyzer, a.k.a. scanner (2/2)
• A lexical analyzer (a.k.a. scanner) reads the source text one character at a time and returns the tokens of the source program.
  Example: newval := oldval + 12
    newval  →  identifier
    :=      →  assignment operator
    oldval  →  identifier
    +       →  add operator
    12      →  number
• Information on identifiers is stored in a symbol table.
• Regular expressions are used to describe tokens.
• A deterministic finite-state automaton can be used to implement a scanner (a minimal regular-expression-based sketch follows).
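Here is a minimal sketch (not part of the original slides) of token recognition driven by regular expressions; the token class names are chosen to mirror the table above, and generators such as Flex would compile an equivalent deterministic automaton instead of matching at run time.

    import re

    # Token specification: one regular expression per token class.
    # Whitespace is matched and then discarded.
    TOKEN_SPEC = [
        ("NUMBER",     r"\d+"),
        ("ASSIGN",     r":="),
        ("ADD_OP",     r"\+"),
        ("IDENTIFIER", r"[A-Za-z_]\w*"),
        ("SKIP",       r"\s+"),
    ]
    MASTER_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

    def scan(source):
        """Return the list of (token_class, lexeme) pairs for a source string."""
        tokens, pos = [], 0
        while pos < len(source):
            m = MASTER_RE.match(source, pos)
            if not m:
                raise SyntaxError(f"lexical error at position {pos}: {source[pos]!r}")
            if m.lastgroup != "SKIP":          # drop whitespace
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    print(scan("newval := oldval + 12"))
    # [('IDENTIFIER', 'newval'), ('ASSIGN', ':='), ('IDENTIFIER', 'oldval'),
    #  ('ADD_OP', '+'), ('NUMBER', '12')]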
Syntax analyzer, a.k.a. parser
• Syntax analysis is the process of constructing the derivation of a sentence with respect to a given grammar.
• Therefore a parser is an algorithm working on a string: if the string belongs to the language generated by the grammar, parsing produces a derivation; otherwise it stops, reporting:
  – the place within the string where the error occurred;
  – the type of error (diagnosis).
• Parsing takes as input the sequence of tokens produced by the previous lexical analysis step and performs syntactic checks against a grammar. The output of this step is a parse tree.
• Parsing:
  – looks for syntax errors;
  – groups tokens into grammar sentences.

Syntax analyzer (CFG) (1/2)
• The syntax of a programming language is specified by means of a Context-Free Grammar (CFG).
• Rules of a CFG are recursive.
• To express a CFG, BNF (Backus-Naur Form) notation is often used:
    assgstmt   ::= identifier := expression
    expression ::= identifier
    expression ::= number
    expression ::= expression + expression
• A parser checks whether each program statement obeys the rules of the language CFG.
• If so, the parser creates a parse tree for the program statement.

Syntax analyzer (CFG) (2/2)
• A parser usually creates the syntactic structure as a parse tree for each specific construct of a programming language. Below is the parse tree of an assignment statement; likewise, parse trees are produced by the parser for loop statements, as we will see.
    assgstmt
    ├── identifier (newval)
    ├── :=
    └── expression
        ├── expression → identifier (oldval)
        ├── +
        └── expression → number (12)
• In a parse tree, all terminals are leaves. All other nodes are nonterminals.

Parsing techniques
Based on the way the parse tree is created, different parsing techniques exist. They can be classified in two groups:
Top-down parsing: descending or "predictive" parsing
• The construction of the parse tree starts from the root and proceeds toward the leaves.
• Efficient top-down parsers can easily be written by hand (a hand-written recursive-descent sketch for the toy grammar above follows).
• Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL parsing: Left-to-right scan, Left-most derivation).
Bottom-up parsing: ascending or "shift-reduce" parsing
• The construction of the parse tree starts from the leaves and proceeds toward the root.
• Efficient bottom-up parsers are usually created with the assistance of software tools.
• Bottom-up parsing is also known as shift-reduce parsing. Operator-precedence parsing is easy to implement.
• LR parsing: Left-to-right scan, Right-most derivation (constructed in reverse).
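The sketch below (not part of the slides) is a hand-written recursive-descent parser for the toy grammar above, working on (token class, lexeme) pairs like those produced by the scanner sketch; the token list is written out by hand so the sketch is self-contained. Since top-down parsing cannot use the left-recursive rule expression ::= expression + expression directly, it is rewritten here as "a primary followed by zero or more '+ primary'"; the AST node shapes are illustrative.

    # Hand-written recursive-descent parser for the toy grammar above.
    class Parser:
        def __init__(self, tokens):
            self.tokens = tokens          # list of (token_class, lexeme)
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

        def expect(self, token_class):
            if self.peek() != token_class:
                raise SyntaxError(f"expected {token_class} at token {self.pos}")
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok

        def assgstmt(self):               # assgstmt ::= identifier := expression
            target = self.expect("IDENTIFIER")
            self.expect("ASSIGN")
            return ("assign", target[1], self.expression())

        def expression(self):             # expression ::= primary { + primary }
            node = self.primary()
            while self.peek() == "ADD_OP":
                self.expect("ADD_OP")
                node = ("+", node, self.primary())
            return node

        def primary(self):                # primary ::= identifier | number
            if self.peek() == "NUMBER":
                return ("num", int(self.expect("NUMBER")[1]))
            return ("id", self.expect("IDENTIFIER")[1])

    tokens = [("IDENTIFIER", "newval"), ("ASSIGN", ":="),
              ("IDENTIFIER", "oldval"), ("ADD_OP", "+"), ("NUMBER", "12")]
    print(Parser(tokens).assgstmt())
    # ('assign', 'newval', ('+', ('id', 'oldval'), ('num', 12)))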
Semantic analysis
• Semantic analysis checks the meaning of statements in the source code. Typical checks in this step include type checking, verifying that identifiers have been declared before being used, etc. The outcome of this step is the Abstract Syntax Tree (AST).
• Semantic analysis:
  – looks for semantic errors;
  – gathers information on types by performing type checking.
• Example: in  newval := oldval + 12  the type of the identifier newval must match the type of the expression (oldval + 12).
• A semantic analyzer looks for semantic errors in the source program and collects information needed for the subsequent code generation and optimization.

Semantic analyzer
• Semantic information typically cannot be represented with a simple parse tree. Specific attributes (e.g. data types and values in expressions) must be associated with constructs (more precisely, with terminals and nonterminals of the grammar).
• Productions of a CFG must be augmented by annotating them with rules (semantic rules) and/or code fragments (semantic actions). Rules define how to compute the values of the attributes of the nodes in the production. Code fragments are executed when the production is used during syntax analysis.
• The execution of semantic actions, in the order determined by parsing, produces a syntax-directed translation of the program statements as a result.

Type checking
• Type checking is an important part of semantic analysis: it checks the static semantics of each node of the parse tree; it checks that the construct is legal (all involved identifiers must be declared, types must be correct, etc.); if the construct is semantically correct, type checking annotates the node, adding information about the type or the related symbol table entry, thus allowing the creation of the AST for the construct; if a semantic error is discovered, a proper error message is reported.
• Type checking depends purely on the semantic rules of the source language, i.e. it is independent of the target language (a minimal type-checking sketch follows).
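A minimal type-checking sketch (not from the slides), walking the AST shape produced by the parser sketch above; the symbol table contents, type names and error messages are assumptions made for illustration.

    # Minimal sketch of type checking over a small expression AST.
    SYMBOLS = {"oldval": "int", "newval": "int"}     # assumed prior declarations

    def check(node):
        """Return the type of a node, or raise a semantic error."""
        kind = node[0]
        if kind == "num":
            return "int"
        if kind == "id":
            name = node[1]
            if name not in SYMBOLS:                  # declared before use?
                raise TypeError(f"undeclared identifier '{name}'")
            return SYMBOLS[name]
        if kind == "+":
            left, right = check(node[1]), check(node[2])
            if left != right:
                raise TypeError(f"type mismatch: {left} + {right}")
            return left
        if kind == "assign":
            target, value = node[1], check(node[2])
            if SYMBOLS.get(target) != value:         # newval must match oldval + 12
                raise TypeError(f"cannot assign {value} to '{target}'")
            return value
        raise ValueError(f"unknown node {kind}")

    ast = ("assign", "newval", ("+", ("id", "oldval"), ("num", 12)))
    print(check(ast))   # int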
Syntax analyzer vs lexical analyzer
Which program constructs must be recognized by the lexical analyzer and which by the syntax analyzer?
• They both do the same kind of work, but the lexical analyzer deals only with non-recursive language constructs.
• The syntax analyzer deals with recursive language constructs.
• The lexical analyzer simplifies the work of the syntax analyzer.
• The lexical analyzer recognizes the smallest elements (the tokens) of the source program.
• The syntax analyzer works on tokens, recognizing the language structures.

Intermediate code generator (1/2)
• The abstract syntax tree is the first form of intermediate representation (IR) of the source program.
• A low-level IR can be obtained explicitly from it, which is similar to machine code and can be regarded as a kind of program for a virtual machine.
• This intermediate code is known as three-address code. It is easy to generate and to translate to machine code, both for arithmetic expressions and for statements.
• Example: intermediate representations of the statement  do i = i + 1; while (a[i] < v);
  [Figure: the abstract syntax tree of the do-while statement and the corresponding three-address code.]

Intermediate code generator (2/2)
• An element can be translated if it is semantically correct.
• Translation requires that the run-time "meaning" of the construct is captured.
• For example, the AST of a while loop contains two subtrees, one for the control expression, the other one for the body of the loop.
• Nothing in the AST shows that the loop "iterates". This meaning is captured when the AST of the while loop is translated. Translation depends on the semantics of the source language.
• In the annotation of the AST by means of semantic rules and actions, the notion of checking the value of the control expression and possibly executing the body of the loop becomes explicit.
• In other words, a CFG, besides defining the syntax of a programming language, can be used as a support to translation with the technique of syntax-directed translation.
• To do so, the AST (and therefore the statements and expressions of a programming language) must be transformed from infix to postfix notation (where operators appear after their operands). A simple example with arithmetic expressions follows.

Translation from infix to postfix notation
• A generic arithmetic expression can be described visually as a binary tree recursively containing an operator in the parent node and the two operands in the two child nodes.
  [Figure: a binary tree with the operator - in the root and the operands A and B in the leaves.]
• The tree in the picture represents the expression A-B or B-A, according respectively to the so-called left or right infix notation, in which the left (respectively, right) child is visited first. By applying the left (right) prefix notation, the tree corresponds to the string -AB (resp. -BA). Finally, by applying the left (right) postfix notation, the tree corresponds to the string AB- (resp. BA-).
• Prefix and postfix notations lend themselves to unambiguous storage and execution of arithmetic expressions. Conversely, the infix notation suffers from interpretation ambiguity (hence it requires parentheses), and therefore is less often adopted.
• Postfix notation allows the execution of the equivalent arithmetic expression if, reading the string from left to right, every operator is applied recursively to the two operands which immediately precede it. These two operands are the left one and the right one if the notation is left, or the right one and the left one if the notation is right.

Translation of an expression in postfix notation
Considering the source expression
    ALPHA = Beta * (Gamma - Delta) / (Omega + Psi)
the syntax analyzer creates the equivalent parse tree and then its AST:
    =
    ├── ALPHA
    └── /
        ├── *
        │   ├── Beta
        │   └── -
        │       ├── Gamma
        │       └── Delta
        └── +
            ├── Omega
            └── Psi
which is transformed, by means of a post-order traversal, into the corresponding left postfix notation
    ALPHA Beta Gamma Delta - * Omega Psi + / =
easily translatable to a sequence of three-address instructions.

Three-address code
• By transforming the AST into three-address instructions, a compiler can produce an explicit representation of the intermediate code.
• This code is generally independent of the target architecture. It takes its name from its instructions, which have the form  x = y op z , where op is a binary operator, y and z are the operands and x is the location of the result of the operation.
• A three-address instruction can perform a single operation, typically a computation, a comparison or a jump.
• The level of the intermediate code is generally close to machine code.
Example 1:  ALPHA Beta Gamma Delta - * Omega Psi + / =
    minus Delta, ,    t1
    add   Gamma, t1,  t2
    mult  Beta,  t2,  t3
    add   Omega, Psi, t1
    div   t3,    t1,  t2
    mov   t2,    ,    ALPHA
Example 2:  newval := oldval * fact + 1  (i.e.  id1 := id2 * id3 + 1 )
    MULT  id2,   id3, temp1
    ADD   temp1, #1,  temp2
    MOV   temp2, ,    id1
(a small code-generation sketch by post-order traversal follows).
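The following sketch (not from the slides) generates three-address code from an expression AST by post-order traversal. Unlike the hand translation in Example 1, it encodes Gamma - Delta with a binary sub instruction and always allocates fresh temporaries; opcode and node names are illustrative.

    # Minimal sketch of three-address code generation by post-order traversal.
    temp_count = 0

    def new_temp():
        global temp_count
        temp_count += 1
        return f"t{temp_count}"

    def gen(node, code):
        """Emit instructions for node; return the place holding its value."""
        if isinstance(node, str):               # a leaf: identifier or literal
            return node
        op, left, right = node                  # e.g. ('+', left, right)
        l_place = gen(left, code)
        r_place = gen(right, code)
        result = new_temp()
        opcode = {"+": "add", "-": "sub", "*": "mult", "/": "div"}[op]
        code.append(f"{opcode} {l_place}, {r_place}, {result}")
        return result

    # ALPHA = Beta * (Gamma - Delta) / (Omega + Psi)
    ast = ("/", ("*", "Beta", ("-", "Gamma", "Delta")),
                ("+", "Omega", "Psi"))
    code = []
    result = gen(ast, code)
    code.append(f"mov {result}, , ALPHA")
    print("\n".join(code))
    # sub Gamma, Delta, t1
    # mult Beta, t1, t2
    # add Omega, Psi, t3
    # div t2, t3, t4
    # mov t4, , ALPHA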
Optimizer
• Optimization aims to improve the code in order to reduce the running time and the required memory (this phase can be very complex).
• Example of optimization of the intermediate code of Example 2 above (the final move is folded into the addition):
    MULT  id2,   id3, temp1
    ADD   temp1, #1,  id1

Program synthesis (1/2)
• Little of the nature of the target machine needs to be manifest during the generation and optimization of intermediate code.
• Detailed information on the target machine architecture (available instructions, characteristics, etc.) is instead used in the final phase of target machine code generation.
• In simple non-optimizing compilers, the translator generates the target code directly, without using an IR.
• In more sophisticated compilers, a high-level (source-oriented) IR is generated first, then the code is translated into a low-level (target-oriented) IR.
• This approach allows a clear separation of the dependencies deriving from the source and from the target.

Program synthesis (2/2)
• The IR code is analyzed and transformed into equivalent IR code optimized for a specific architecture.
• Actually, the name "optimization" is somewhat misleading: the produced code is not always the best possible translation of the IR code:
  – Some optimizations cannot be applied in some circumstances, as they are undecidable problems. For example, the problem of removing "dead" code cannot be solved in general.
  – Other optimizations are simply too expensive. This concerns NP-hard problems, which are believed to be inherently exponential. Register allocation to variables is an example of an NP-hard problem.
• Optimization can be complex; it can involve several steps, which may need to be executed many times.
• Optimization can slow down translation. Nevertheless, a well-designed optimizer can increase the program execution speed significantly, by moving operations or deleting unneeded ones.

Code generation
• IR code is mapped to machine code by the code generator, which produces the target language for a particular architecture.
• The target program is typically a relocatable binary object containing machine code.
• Example (assuming an architecture on which at most one operand can be a register):
    MOVE id2, R1
    MULT id3, R1
    ADD  #1,  R1
    MOVE R1,  id1
• This phase uses detailed information on the target machine and includes optimizations tied to the specific machine, such as register allocation and code scheduling.
• The code generator can be rather complex, as many particular cases have to be considered in order to produce good target code.
• Automated code generators can be used. The basic approach is defining templates matching low-level IR instructions with target instructions.
• A famous compiler using automated code generation is GNU C, a strongly optimizing compiler exploiting description files containing machine architecture descriptions for more than ten CPU architectures and at least two languages (C and C++).

Symbol table(s)
• The compiler uses one or more symbol tables as data structures to store information about identifiers and constructs of the source language. This information is collected during the analysis phase and shared by all compilation steps.
• Each time an identifier is used, the symbol table gives access to the information about it, collected when its declaration was processed.
• It may seem more natural for the scanner, being the first component to process lexemes, to associate a lexeme with an element of the symbol table. Actually, in many cases it is the parser that does so, because it can distinguish among the different declarations and occurrences of an identifier.
• Frequently, each program scope (e.g. in C, each code block) is associated with a specific symbol table (a minimal sketch of scope-chained symbol tables follows).
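A minimal sketch (not part of the slides) of scope-chained symbol tables: each block gets its own table linked to the enclosing one, and lookup searches from the innermost scope outward. The attribute set and method names are assumptions.

    # Minimal sketch of scope-chained symbol tables.
    class SymbolTable:
        def __init__(self, parent=None):
            self.entries = {}          # name -> attributes (type, etc.)
            self.parent = parent       # enclosing scope, or None for globals

        def declare(self, name, **attrs):
            if name in self.entries:
                raise NameError(f"'{name}' already declared in this scope")
            self.entries[name] = attrs

        def lookup(self, name):
            scope = self
            while scope is not None:   # search innermost scope outward
                if name in scope.entries:
                    return scope.entries[name]
                scope = scope.parent
            raise NameError(f"undeclared identifier '{name}'")

    globals_ = SymbolTable()
    globals_.declare("oldval", type="int")
    block = SymbolTable(parent=globals_)   # entering a nested block
    block.declare("newval", type="int")
    print(block.lookup("oldval"))          # found in the enclosing scope

A hash table per scope is the usual choice, which is why hashing functions appear among the useful formalisms on the next slide.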
Tools for compiler design
• General-purpose software development tools + specialized tools.
• Tools for the automatic design of compiler components:
  – scanner generators: produce lexical analyzers;
  – parser generators: produce syntax analyzers;
  – syntax-directed translation engines: produce collections of routines which traverse the parse tree and generate intermediate code;
  – automatic code generators: translate the intermediate language to machine language through template-matching rules;
  – data-flow engines: data-flow analyzers optimize the code by means of information about the way data are propagated among the different parts of a program.

Useful formalisms for compiler design
• Lexical analysis
  – Regular grammars
  – Finite state automata
  – Regular expressions
• Syntax analysis
  – Context-free grammars (although not every element in a programming language can be treated with a context-free grammar)
  – CFGs with O(n³) or lower parsing complexity
  – Push-down automata
• Semantic analysis
  – Syntax-directed translation
  – Attribute grammars
  – More sophisticated approaches
• Code generation
  – Pattern matching
  – Heuristics
  – Ad-hoc solutions
• Symbol table
  – Hashing functions