
December 18, 2025
This guide provides a comprehensive technical deep dive into the Abstract Syntax Tree (AST) builder for the JS_CMP_LEXER project. It is intended for developers who want to understand the internal architecture, memory management, and parsing algorithms used to transform tokens into a structured syntax tree.
The AST builder transforms a linear stream of tokens into a hierarchical tree structure. This tree represents the grammatical structure of the source code and is the foundation for subsequent compilation steps (optimization, code generation).
The core of the AST is defined in include/AST/Node.hpp. We use a classical inheritance hierarchy:
Node: The abstract base class for all AST nodes. It defines the virtual interface for operations like printing and transpilation.Expr (Expression): Inherits from Node. Represents code that produces a value (e.g., 1 + 2, x, function() {}).Stmt (Statement): Inherits from Node. Represents an action or a unit of execution (e.g., if (...), var x = ..., return).We strictly use std::unique_ptr for all AST node pointers to ensure automatic memory management and clear ownership semantics.
// From include/AST/Node.hpp
class Node {
public:
using Ptr = std::unique_ptr<Node>;
virtual ~Node() = default;
// ...
};
std::move to transfer ownership of nodes during parsing (e.g., passing a parsed expression into a binary operator node).To reduce boilerplate code and maintain consistency, we use a powerful C++ macro system defined in include/AST/Node.hpp and utilized in include/AST/AST.hpp.
EXPR(ClassName, Constructor, Members...): Defines a class inheriting from Expr.STMT(ClassName, Constructor, Members...): Defines a class inheriting from Stmt.CTOR(...): A helper macro to define the constructor's initializer list.DECL(...): A helper macro to declare member variables.BinaryExprIn include/AST/AST.hpp, BinaryExpr is defined as:
EXPR(BinaryExpr,
CTOR(left(std::move(left)), op(std::move(op)), right(std::move(right))),
Expr::Ptr left, std::string op, Expr::Ptr right)
This single line expands to a full class definition equivalent to:
class BinaryExpr : public Expr {
public:
explicit BinaryExpr(Expr::Ptr left, std::string op, Expr::Ptr right)
: left(std::move(left)), op(std::move(op)), right(std::move(right)) {}
void print(size_t indent) const final;
std::ostringstream& transpile(...) const final;
static Expr::Ptr parse(Parser& parser, Expr::Ptr expr = nullptr);
Expr::Ptr left;
std::string op;
Expr::Ptr right;
};
This approach makes adding new nodes extremely fast and less error-prone.
The parsing logic is driven by the Parser class, declared in include/AST/Parser.hpp and implemented in src/AST/Parser.cpp.
We implement a Recursive Descent Parser. This means:
The entry point for parsing statements is Parser::parseStatement() in src/AST/Parser.cpp. It uses a lookahead token (peek()) to decide which specific statement parser to call.
Stmt::Ptr Parser::parseStatement() {
// ...
switch (peek().type) {
case TK_VAR: return VarDecl::parse(*this);
case TK_IF: return IfStmt::parse(*this);
case TK_FOR: return ForStmt::parse(*this);
// ...
default: return ExpressionStmt::parse(*this);
}
}
Expression parsing is layered to handle operator precedence automatically. The call hierarchy ensures that operations with higher precedence are parsed first (closer to the leaves of the tree).
Hierarchy (from lowest to highest precedence):
parseExpression (Comma)parseAssignment (=, +=, etc.)parseConditional (?:)parseLogicalOr (||)parseLogicalAnd (&&)parseMultiplicative (*, /, %)parseUnary (!, -, ++, --)parsePrimary (Literals, Identifiers, Parentheses)Each AST node is responsible for parsing itself via a static parse method. These are implemented in src/AST/AST_parser.cpp.
BinaryExpr::parseThis method is called when an operator is encountered. It takes the already parsed left-hand side as an argument.
Expr::Ptr BinaryExpr::parse(Parser& parser, Expr::Ptr expr) {
Token opTok = parser.previous(); // The operator token (e.g., '+')
// ... validation ...
Expr::Ptr right = parser.parseAssignment(); // Parse the right side
return std::make_unique<BinaryExpr>(std::move(expr), opTok.value, std::move(right));
}
VarDecl::parseThis method parses a variable declaration statement.
// (Simplified logic)
Stmt::Ptr VarDecl::parse(Parser& parser) {
parser.consume(TK_VAR, ...); // Consume 'var'
// ... parse identifier ...
// ... parse optional initializer ...
return std::make_unique<VarDecl>(...);
}
var x = 1 + 2;Let's trace exactly how the parser builds the AST for this line of code.
Parser::parseStatement() sees the var token.VarDecl::parse(*this).VarDecl::parse consumes var and the identifier x.= token, so it knows there is an initializer.parser.parseAssignment() to parse the value 1 + 2.parseAssignment calls down the chain to parseAdditive.parseAdditive calls parseMultiplicative -> parsePrimary.parsePrimary consumes the number 1 and returns a Number node.parseAdditive sees the + token.+ and calls BinaryExpr::parse passing the Number(1) node as expr.BinaryExpr::parse calls parseAssignment (recursively) to get the right side.2 into a Number(2) node.BinaryExpr::parse returns a BinaryExpr node: (left: Number(1), op: "+", right: Number(2)).VarDecl::parse receives the BinaryExpr node.VarDecl node containing the name "x" and the BinaryExpr as the initializer.The JS_CMP_LEXER AST builder combines a robust recursive descent parser with a flexible, macro-based node system. This architecture allows for easy extension of the language grammar while maintaining strict memory safety and code clarity.
For more details, explore the source code: