rustc
text -> tokens -> AST -> HIR -> MIR -> LLVM IR -> LLVM -> machine code (1110); ThinLTO, PGO
A very fast Rust primer
- low-level, no language runtime (like c/c++)
- focus on memory safety and practicality
- ownership, lifetimes, and borrow checking for references
rustc architecture
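As a minimal sketch of the three ideas in the last bullet (the function names `len_of`, `shout`, and `longest` are made up for illustration), the following shows a shared borrow, an exclusive borrow, an explicit lifetime, and a move of ownership:

```rust
// Reads through a shared borrow; the caller keeps ownership.
fn len_of(s: &str) -> usize {
    s.len()
}

// Mutates through an exclusive (mutable) borrow.
fn shout(s: &mut String) {
    s.push('!');
}

// The lifetime 'a tells the borrow checker that the returned reference
// must not outlive either argument.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let mut owned = String::from("hello");
    let n = len_of(&owned); // shared borrow ends after this call
    shout(&mut owned);      // exclusive borrow is now allowed
    let moved = owned;      // ownership moves to `moved`
    // println!("{}", owned); // would not compile: use of moved value
    println!("{} {} {}", n, moved, longest("ab", "abc"));
}
```

Uncommenting the `println!` on `owned` is a quick way to see the borrow checker's "use of moved value" diagnostic.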
- https://rustc-dev-guide.rust-lang.org
- https://internals.rust-lang.org
- compiler team: https://rust-lang.zulipchat.com
- rust-specific (rustc) frontend, llvm backend
- DAG of queries instead of passes
- many static analysis passes
- 5 code representations: AST, HIR, THIR, MIR, LLVM IR
Queries instead of stages:
type_check_crate() -> list_of_all_hir_items() -> type_check_item(foo) -> type_of(foo) -> hir(foo)
This enables incremental compilation and parallelization.
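The demand-driven chain above can be imitated with a memoized lookup. This is only a toy sketch: `Query`, `QueryCtxt`, the string results, and the fixed item list `["foo", "bar"]` are assumptions for illustration and are not rustc's real query machinery.

```rust
use std::collections::HashMap;

// Hypothetical query keys; the real rustc query set is far larger.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Query {
    TypeCheckCrate,
    TypeCheckItem(&'static str),
    TypeOf(&'static str),
    Hir(&'static str),
}

#[derive(Default)]
struct QueryCtxt {
    cache: HashMap<Query, String>,
    executed: Vec<Query>, // order in which queries were actually computed
}

impl QueryCtxt {
    // Returns the cached result, or computes it by asking for the
    // queries it depends on (which forms a DAG of requests).
    fn get(&mut self, q: Query) -> String {
        if let Some(v) = self.cache.get(&q) {
            return v.clone();
        }
        let result = match q {
            Query::Hir(item) => format!("hir({})", item),
            Query::TypeOf(item) => {
                let hir = self.get(Query::Hir(item));
                format!("type_of[{}]", hir)
            }
            Query::TypeCheckItem(item) => {
                let ty = self.get(Query::TypeOf(item));
                format!("ok[{}]", ty)
            }
            Query::TypeCheckCrate => {
                let items = ["foo", "bar"]; // stand-in for list_of_all_hir_items()
                let mut out = Vec::new();
                for item in items.iter() {
                    out.push(self.get(Query::TypeCheckItem(*item)));
                }
                out.join(",")
            }
        };
        self.executed.push(q);
        self.cache.insert(q, result.clone());
        result
    }
}

fn main() {
    let mut cx = QueryCtxt::default();
    println!("{}", cx.get(Query::TypeCheckCrate));
    // A repeated request is served from the cache and runs nothing new.
    cx.get(Query::TypeOf("foo"));
    println!("queries executed: {}", cx.executed.len());
}
```

Because every result is cached under its key, a later request for `type_of(foo)` does no new work, which is the property that incremental compilation builds on.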
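Explore while using
rustc -Z help
lists the unstable -Z options (these need a nightly toolchain). For example, to dump the AST or the MIR:
rustc -Z unpretty=ast-tree hello.rs
rustc -Z unpretty=mir hello.rs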
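Crates inside the compiler
rustc_main: the main entry point of the rustc compiler
rustc_interface: the basic compilation interface and its implementation
rustc_middle: defines the semantics of types and the type context
rustc_data_structures: basic data structures used throughout rustc
rustc_session: describes a compilation session and supports parallel multi-session compilation
rustc_traits: implements trait-related logic
rustc_ast_lowering: lowers the AST to HIR
rustc_typeck: type checking and related conversions
rustc_expand: macro expansion, including procedural macros and built-in macros
rustc_attr: attribute handling
rustc_mir: MIR data structures and related implementation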
rustc task list
Phase 1
Steps 1-3 operate on the AST, which supports syntactic manipulation of the program.
- 1 command line args, user environment, toolchain, load compilation cache
- 2 lexing, parsing
- 3 macro expansion, feature gate, various compiler magic
- compiler/rustc_ast/src/mut_visitor.rs#MutVisitor
// rustc_ast/src/ast.rs#
pub struct Crate {
    pub attrs: Vec<Attribute>,
    pub items: Vec<P<Item>>, // the root AST nodes (top-level items) of this crate
    pub spans: ModSpans,
    pub id: NodeId,
    pub is_placeholder: bool,
}
pub struct Item<K = ItemKind> { // an Item node in the AST
    pub attrs: Vec<Attribute>,
    pub id: NodeId,
    pub span: Span,
    pub vis: Visibility,
    pub ident: Ident,
    pub kind: K, // ItemKind defines how Item nodes connect to each other
    pub tokens: Option<LazyTokenStream>,
}
// The structure of each variant follows the corresponding grammar (BNF) rule
pub enum ItemKind {
    /// An `extern crate` item, with the optional *original* crate name if the crate was renamed.
    ///
    /// E.g., `extern crate foo` or `extern crate foo_bar as foo`.
    ExternCrate(Option<Symbol>),
    /// A use declaration item (`use`).
    ///
    /// E.g., `use foo;`, `use foo::bar;` or `use foo::bar as FooBar;`.
    Use(UseTree),
    /// A static item (`static`).
    ///
    /// E.g., `static FOO: i32 = 42;` or `static FOO: &'static str = "bar";`.
    Static(P<Ty>, Mutability, Option<P<Expr>>),
    /// A constant item (`const`).
    ///
    /// E.g., `const FOO: i32 = 42;`.
    Const(Defaultness, P<Ty>, Option<P<Expr>>),
    /// A function declaration (`fn`).
    ///
    /// E.g., `fn foo(bar: usize) -> usize { .. }`.
    Fn(Box<Fn>),
    Mod(Unsafe, ModKind),
    ForeignMod(ForeignMod),
    Enum(EnumDef, Generics),
    Struct(VariantData, Generics),
    Union(VariantData, Generics),
    Trait(Box<Trait>),
    Impl(Box<Impl>),
}
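To make the idea of syntactic manipulation on this tree concrete, here is a toy sketch of an in-place rewrite in the spirit of `MutVisitor`. The `Item` and `ItemKind` types below are heavily simplified stand-ins (three variants, a `String` identifier) and `prefix_fns` is a hypothetical helper; none of this is the real `rustc_ast` API.

```rust
// Simplified stand-ins for the rustc_ast types shown above.
#[derive(Debug)]
enum ItemKind {
    Fn,
    Struct,
    Mod(Vec<Item>), // a module owns nested items, which forms the tree
}

#[derive(Debug)]
struct Item {
    ident: String,
    kind: ItemKind,
}

// Hypothetical mutable visit: prefixes the name of every function item
// in place and returns how many items were renamed.
fn prefix_fns(items: &mut [Item], prefix: &str) -> usize {
    let mut renamed = 0;
    for item in items.iter_mut() {
        match &mut item.kind {
            ItemKind::Fn => {
                item.ident = format!("{}{}", prefix, item.ident);
                renamed += 1;
            }
            ItemKind::Struct => {}
            // Recurse into nested modules.
            ItemKind::Mod(children) => renamed += prefix_fns(children, prefix),
        }
    }
    renamed
}

fn main() {
    let mut krate = vec![
        Item { ident: "main".to_string(), kind: ItemKind::Fn },
        Item { ident: "Point".to_string(), kind: ItemKind::Struct },
        Item {
            ident: "util".to_string(),
            kind: ItemKind::Mod(vec![Item {
                ident: "helper".to_string(),
                kind: ItemKind::Fn,
            }]),
        },
    ];
    let n = prefix_fns(&mut krate, "my_");
    println!("renamed {} functions: {:?}", n, krate);
}
```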
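Phase 2
Steps 4-6 operate on HIR, which is easy to derive from the AST (it is still a tree) and well suited to these analyses:
- 4 type inference
- 5 type checking
- 6 trait solving/checking
Lowering to HIR desugars constructs such as loops and async/await.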
trait Foo {...}
fn baz<T: Foo>(t: T) {...}
// 1. type checking: t must implement the Foo trait
// 2. trait solving: decide which implementation of Foo to use
baz(3);
baz("hello");
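A runnable variant of the snippet above, as a sketch: the method `describe` and the two impls are assumptions added so that the example compiles; the original leaves the trait body elided.

```rust
trait Foo {
    fn describe(&self) -> String;
}

impl Foo for i32 {
    fn describe(&self) -> String {
        format!("i32:{}", self)
    }
}

impl<'a> Foo for &'a str {
    fn describe(&self) -> String {
        format!("str:{}", self)
    }
}

// Type checking verifies that the argument type satisfies `T: Foo`;
// trait solving then selects the matching impl for each call site.
fn baz<T: Foo>(t: T) -> String {
    t.describe()
}

fn main() {
    println!("{}", baz(3));       // resolved to the impl for i32
    println!("{}", baz("hello")); // resolved to the impl for &str
    // baz(1.5); // would not compile: f64 does not implement Foo
}
```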
Step 7 operates on THIR, which is easy to build from HIR and easy to lower to MIR.
- 7 pattern exhaustiveness checking
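A small illustration of what exhaustiveness checking guards against; the `Token` enum and `describe` function are hypothetical names chosen for this sketch.

```rust
enum Token {
    Number(i32),
    Plus,
    Minus,
}

fn describe(t: &Token) -> String {
    // Every variant is covered, so the match is exhaustive. Deleting any
    // one arm makes rustc reject the function with error E0004
    // (non-exhaustive patterns).
    match t {
        Token::Number(n) => format!("number {}", n),
        Token::Plus => String::from("plus"),
        Token::Minus => String::from("minus"),
    }
}

fn main() {
    for t in [Token::Number(7), Token::Plus, Token::Minus].iter() {
        println!("{}", describe(t));
    }
}
```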
Phase 3
Based on MIR, which is a control-flow graph.
MIR is convenient for borrow checking, dataflow analysis for correctness checks and optimizations, monomorphization, optimization, and code generation.
- Borrow checking (local: the scope is a single function body)
- Constant evaluation (after monomorphization)
- Rust-level optimizations
- Monomorphization (generating concrete instances of generic code)
- Saving compilation cache
- LLVM IR generation
Niko Matsakis MIR blog: Introducing MIR
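To show why a control-flow graph is a convenient shape for these analyses, here is a toy sketch of basic blocks with terminators plus a worklist reachability pass. The names `BasicBlock`, `Terminator`, and `reachable` are loosely inspired by MIR terminology but are simplified assumptions, not rustc's data structures.

```rust
// Each block ends in exactly one terminator that names its successors.
enum Terminator {
    Goto(usize),
    Branch { then_bb: usize, else_bb: usize },
    Return,
}

struct BasicBlock {
    statements: Vec<&'static str>, // straight-line code, shown as text here
    terminator: Terminator,
}

fn successors(t: &Terminator) -> Vec<usize> {
    match t {
        Terminator::Goto(bb) => vec![*bb],
        Terminator::Branch { then_bb, else_bb } => vec![*then_bb, *else_bb],
        Terminator::Return => vec![],
    }
}

// Worklist traversal from block 0: the simplest dataflow-style analysis.
fn reachable(blocks: &[BasicBlock]) -> Vec<bool> {
    let mut seen = vec![false; blocks.len()];
    if blocks.is_empty() {
        return seen;
    }
    let mut worklist = vec![0usize];
    while let Some(bb) = worklist.pop() {
        if seen[bb] {
            continue;
        }
        seen[bb] = true;
        worklist.extend(successors(&blocks[bb].terminator));
    }
    seen
}

fn main() {
    let blocks = vec![
        BasicBlock { statements: vec!["c = x > 0"], terminator: Terminator::Branch { then_bb: 1, else_bb: 2 } },
        BasicBlock { statements: vec!["y = 1"], terminator: Terminator::Goto(3) },
        BasicBlock { statements: vec!["y = 2"], terminator: Terminator::Goto(3) },
        BasicBlock { statements: vec!["ret = y"], terminator: Terminator::Return },
        BasicBlock { statements: vec!["dead"], terminator: Terminator::Return },
    ];
    for (i, live) in reachable(&blocks).iter().enumerate() {
        println!("bb{} {:?} reachable={}", i, blocks[i].statements, live);
    }
}
```

The same worklist pattern generalizes to analyses that carry per-block state, which is one reason borrow checking and optimizations are phrased over a graph of blocks instead of a syntax tree.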
Phase 4
- LLVM (more optimization, binary generation, linking, link-time optimization...)
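Four-operation arithmetic calculator
- Build an interpreter from the grammar, e.g. parse by recursive descent and evaluate while parsing
- Use the shunting-yard algorithm with a stack to convert infix to postfix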
fn main() {
    let source = "-11+2*(3+10)+2";
    let res = parser_expr(&mut Lexer::new(source.to_owned()));
    println!("{}={}", source, res); // prints -11+2*(3+10)+2=17
}

/// E -> E + T | E - T | T
/// T -> T * F | T / F | F
/// F -> (E) | number
fn parser_expr(lexer: &mut Lexer) -> i32 {
    let mut i = parser_term(lexer);
    while lexer.ch == '+' || lexer.ch == '-' {
        if lexer.ch == '+' {
            lexer.read_char();
            let right = parser_term(lexer);
            i += right;
        } else {
            lexer.read_char();
            let right = parser_term(lexer);
            i -= right;
        }
    }
    i
}

fn parser_term(lexer: &mut Lexer) -> i32 {
    let mut i = parser_factor(lexer);
    while lexer.ch == '*' || lexer.ch == '/' {
        if lexer.ch == '*' {
            lexer.read_char();
            i *= parser_factor(lexer);
        } else {
            // assert!(lexer.expect_char(b'/'));
            lexer.read_char();
            i /= parser_factor(lexer);
        }
    }
    i
}

fn parser_factor(lexer: &mut Lexer) -> i32 {
    if lexer.ch == '(' {
        // assert!(lexer.expect_char('('));
        lexer.read_char(); // consume '('
        let res = parser_expr(lexer);
        lexer.read_char(); // consume ')' (not validated)
        res
    } else {
        let mut res = 0i32;
        while lexer.ch >= '0' && lexer.ch <= '9' {
            res = res * 10 + lexer.ch as i32 - '0' as i32;
            lexer.read_char();
        }
        // A leading `-11` is handled as `0 - 11`: when the factor starts
        // with `-`, no digits are read and 0 is returned.
        res
    }
}

struct Lexer {
    input: Vec<char>,
    cur_position: usize,
    ch: char,
    next_position: usize,
}

impl Lexer {
    pub fn new(input: String) -> Self {
        let mut lexer = Self {
            input: input.chars().collect(),
            cur_position: 0,
            ch: '0',
            next_position: 0,
        };
        lexer.read_char();
        lexer
    }

    // Advances to the next character; `ch` becomes char::MAX at end of input.
    pub fn read_char(&mut self) {
        self.ch = self.peek_char();
        self.cur_position = self.next_position;
        self.next_position += 1;
    }

    // Returns the next character without consuming it.
    pub fn peek_char(&self) -> char {
        if self.next_position >= self.input.len() {
            return char::MAX;
        }
        self.input[self.next_position]
    }
}
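The code above covers only the recursive-descent approach. For the second approach in the list, here is a minimal sketch of shunting-yard conversion followed by postfix evaluation. The names `Tok`, `tokenize`, `to_postfix`, and `eval_postfix` are made up for this example, and it assumes non-negative integer literals and binary, left-associative operators only, so a leading unary minus is not handled.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(i32),
    Op(char),
    LParen,
    RParen,
}

fn tokenize(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_ascii_digit() {
            let mut n = 0i32;
            while i < chars.len() && chars[i].is_ascii_digit() {
                n = n * 10 + chars[i].to_digit(10).unwrap() as i32;
                i += 1;
            }
            out.push(Tok::Num(n));
            continue;
        }
        match c {
            '+' | '-' | '*' | '/' => out.push(Tok::Op(c)),
            '(' => out.push(Tok::LParen),
            ')' => out.push(Tok::RParen),
            _ => {} // skip whitespace and unrecognized characters
        }
        i += 1;
    }
    out
}

fn prec(op: char) -> u8 {
    if op == '+' || op == '-' { 1 } else { 2 }
}

// Shunting-yard: operands go straight to the output; operators wait on a
// stack until an operator of lower precedence or a parenthesis arrives.
fn to_postfix(tokens: &[Tok]) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut stack: Vec<Tok> = Vec::new();
    for t in tokens {
        match t {
            Tok::Num(_) => out.push(t.clone()),
            Tok::Op(o) => {
                loop {
                    let pop = matches!(stack.last(), Some(Tok::Op(top)) if prec(*top) >= prec(*o));
                    if !pop {
                        break;
                    }
                    out.push(stack.pop().unwrap());
                }
                stack.push(t.clone());
            }
            Tok::LParen => stack.push(Tok::LParen),
            Tok::RParen => {
                while let Some(top) = stack.pop() {
                    if top == Tok::LParen {
                        break;
                    }
                    out.push(top);
                }
            }
        }
    }
    while let Some(top) = stack.pop() {
        out.push(top);
    }
    out
}

// Evaluates postfix with a value stack; None signals malformed input
// or division by zero.
fn eval_postfix(postfix: &[Tok]) -> Option<i32> {
    let mut stack = Vec::new();
    for t in postfix {
        match t {
            Tok::Num(n) => stack.push(*n),
            Tok::Op(o) => {
                let r = stack.pop()?;
                let l = stack.pop()?;
                stack.push(match *o {
                    '+' => l + r,
                    '-' => l - r,
                    '*' => l * r,
                    '/' => l.checked_div(r)?,
                    _ => return None,
                });
            }
            _ => return None, // parentheses never appear in valid postfix
        }
    }
    if stack.len() == 1 { stack.pop() } else { None }
}

fn main() {
    let postfix = to_postfix(&tokenize("2*(3+10)+2-11"));
    println!("{:?} = {:?}", postfix, eval_postfix(&postfix));
}
```

Compared with recursive descent, this version keeps operator precedence in a table (`prec`) instead of in the call structure, so adding an operator means adding one table entry and one evaluation arm.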