The MiniJava Compiler is a modular toolchain that translates a simplified version of Java into MIPS assembly. Its design follows the traditional phases of compilation: a lexer transforms source code into tokens; a parser uses an LL(1) approach to build an abstract syntax tree (AST); a semantic analyzer checks for type compatibility, scoping, and correct declaration usage; and a code generator produces the corresponding assembly instructions.
Figure: High-Level Compilation Pipeline of the MiniJava Compiler
The MiniJava Compiler’s lexical analyzer is the first phase of the compilation process. It reads the raw source code character by character and converts it into a structured stream of tokens that the parser later uses. The analyzer keeps track of line and column positions, which helps pinpoint errors such as unclosed strings or invalid identifiers. It identifies elements like identifiers, numeric constants, and string literals using regular expressions, normalizes identifiers to lowercase, and stores them in a string table. Comments and extraneous whitespace are skipped to focus on meaningful tokens. A high-level overview of the tokenization process is presented on the right:
function yylex():
while not EOF:
ch = getNextCharacter()
if isWhitespace(ch): updatePosition(ch); continue
if isLetter(ch): return IDENTIFIER(toLowerCase(readIdentifier(ch)))
if isDigit(ch): return INTEGER_CONSTANT(readNumber(ch))
if isQuote(ch): return STRING_CONSTANT(readString(ch))
if startsComment(ch): skipComment(); continue
return SYMBOL(ch)
return EOF_TOKEN
The grammar phase defines the formal syntax of the MiniJava language by outlining production rules that combine the tokens produced by the lexer into valid program constructs. Our Yacc file specifies how tokens form higher-level structures such as class declarations, method definitions, and expressions, which are then assembled into an Abstract Syntax Tree (AST). Each production rule explicitly defines how different language elements—like keywords, identifiers, operators, and punctuation—work together to form a correct MiniJava program. For example, the grammar includes rules that dictate the structure of a class, how methods are declared and defined, and how statements and expressions are constructed. This precise mapping from tokens to non-terminal symbols ensures that only syntactically correct programs generate an AST, which is then used in later phases such as semantic analysis and code generation.
Below is a minimal MiniJava program alongside the corresponding Abstract Syntax Tree (AST) produced by the grammar.
program test;
class xyz {
method void main() {
System.println('Hello World!!!');
}
}
+-[IDNode,0,"xyz"]
R-[ProgramOp]
| +-[IDNode,4,"test"]
| +-[ClassDefOp]
| | +-[DUMMYnode]
| | +-[CommaOp]
| | | +-[STRINGNode,29,"Hello World!!!"]
| | +-[RoutineCallOp]
| | | +-[DUMMYnode]
| | | +-[SelectOp]
| | | +-[DUMMYnode]
| | | +-[FieldOp]
| | | +-[IDNode,21,"println"]
| | +-[VarOp]
| | +-[IDNode,14,"system"]
| +-[StmtOp]
| +-[DUMMYnode]
| +-[MethodOp]
| +-[DUMMYnode]
| +-[SpecOp]
| +-[DUMMYnode]
| +-[HeadOp]
| +-[IDNode,9,"main"]
+-[ClassOp]
+-[DUMMYnode]
The Semantic Analyzer traverses the AST to ensure that the MiniJava program adheres to the language’s rules. Its abilities include, for example, verifying that every identifier (IDNode) is declared by replacing raw identifier nodes with symbol table references (STNode). For assignments (AssignOp), it recursively processes the left-hand side (a variable) and the right-hand side (an expression) and checks that their types are compatible. Routine calls (RoutineCallOp) are validated by analyzing each argument and ensuring that the function or method signature matches the declaration. Control structures, such as conditionals (IfElseOp) and loops (LoopOp), trigger the opening and closing of new scopes to enforce proper visibility of variables. Array operations (represented using a combination of IndexOp and VarOp) are similarly checked to ensure that the number of indices matches the declared dimensions. In short, each node is processed according to its operation, making the AST semantically sound before code generation begins.
A high-level pseudocode overview of the semantic analysis process is shown on the right:
function analyzeAST(node):
if node is null: return
switch node.operation:
case ProgramOp:
analyzeAST(node.left)
analyzeAST(node.right)
case ClassDefOp:
openScope()
analyzeAST(node.left)
analyzeAST(node.right)
closeScope()
case MethodOp:
openScope()
analyzeAST(node.left)
analyzeAST(node.right)
closeScope()
case DeclOp
analyzeAST(node.left)
analyzeAST(node.right)
case IDNode:
node = lookupSymbol(node.lexeme)
case AssignOp:
analyzeAST(node.left)
analyzeAST(node.right)
if not compatible(node.left.type, node.right.type):
reportError("Type mismatch")
case RoutineCallOp:
for each arg in node.arguments:
analyzeAST(arg)
validateCall(node)
case IfElseOp:
analyzeAST(node.condition)
openScope(); analyzeAST(node.thenBranch); closeScope()
if node.hasElse:
openScope(); analyzeAST(node.elseBranch); closeScope()
case LoopOp:
analyzeAST(node.condition)
openScope(); analyzeAST(node.body); closeScope()
case IndexOp:
analyzeAST(node.array)
for each index in node.indices:
analyzeAST(index)
if count(node.indices) ≠ declaredDimension(node.array):
reportError("Incorrect array dimensions")
// ... other cases omitted for brevity ...
default:
analyzeAST(node.left)
analyzeAST(node.right)
The Code Generator is the final phase of the MiniJava Compiler, translating the semantically validated AST into executable MIPS assembly code. The process begins with codegenInit()
, which establishes the assembly file’s structure by initializing the .data
and .text
segments, defining global constants (e.g., a newline string labeled Enter
), and transferring control to the runtime entry point. For example, during initialization the following instructions are generated:
.data
Enter: .asciiz "\n" # Define newline string
.text
j main # Jump to the runtime entry point
Central to the code generation process is the visit()
function. This recursive function traverses the entire AST and dispatches each node to its specialized routine based on its type. For example, when encountering a class definition node (ClassDefOp), visit()
calls visitClassDefOp()
to generate class-specific code; similarly, method definitions, assignments, and control structures are handled by visitMethodOp()
, visitAssign()
, and visitIfStmt()
, respectively. A simplified overview of this traversal is shown on the right:
function visit(node):
if node is null: return
switch node.operation:
case ProgramOp:
// Process top-level structure
visit(node.left)
visit(node.right)
case ClassDefOp:
visitClassDefOp(node)
case MethodOp:
visitMethodOp(node)
case DeclOp:
visitDeclOp(node)
case AssignOp:
visitAssign(node)
case RoutineCallOp:
visitCall(node)
case IfElseOp:
visitIfStmt(node, false_label, end_label)
case LoopOp:
visitLoop(node)
case VarOp:
visitVarOp(node)
// ... other cases omitted for brevity ...
default:
visit(node.left)
visit(node.right)
Once the AST has been fully traversed and the corresponding assembly instructions generated, codegenFinish()
defines the main
label. This label serves as the runtime entry point, loading class-specific addresses (like the singleton instance for a class) and calling the user-defined Program.main
method. After executing user logic, the program terminates gracefully. Together, these components—initialization, recursive AST traversal via visit()
, and finalization—ensure that the high-level MiniJava program is accurately converted into low-level MIPS assembly code, ready for execution on a simulator.
main:
la $s0, xyz.singleton # Load address of the singleton instance for class xyz
jal xyz.main # Jump to the user-defined xyz.main method
li $v0, 10 # Set syscall code for exit
syscall # Terminate program
See the MiniJava-Compiler GitHub repository for source code, build instructions, and to try reproducing the compiler testing with your own MiniJava code.