rustc


rustc 🔗

Compilation pipeline: source text -> tokens -> AST -> HIR -> MIR -> LLVM IR -> LLVM -> machine code (1110...), optionally with ThinLTO and PGO

A very fast Rust primer 🔗

  • low-level, no language runtime (like C/C++)
  • focus on memory safety and practicality
  • ownership, lifetimes, and borrow checking for references
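
A minimal sketch of the last bullet: ownership moves on assignment, and a lifetime parameter ties a returned reference to its inputs. The function name longest is made up for this example.

```rust
/// Returns the longer of two string slices. The lifetime 'a ties the
/// result to both inputs so the borrow checker can validate callers.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let s = String::from("hello");
    let t = s; // ownership of the heap buffer moves from `s` to `t`
    // println!("{}", s); // would not compile: error E0382, `s` was moved

    let r = longest(&t, "hi"); // shared borrows of `t` and of a literal
    println!("{}", r);
}
```

Uncommenting the println! on s is a quick way to see the borrow checker's diagnostic.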

rust book
help forum


rustc architecture 🔗


  • rust-specific (rustc) frontend, llvm backend
  • DAG of queries instead of passes
  • many static analysis passes
  • 5 code representations: AST, HIR, THIR, MIR, LLVM IR

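Queries instead of stages: a query is computed on demand and pulls in the queries it depends on, for example:

type_check_crate() -> list_of_all_hir_items() -> type_check_item(foo) -> type_of(foo) -> hir(foo)

Query results can be cached and independent queries can run concurrently, which is what makes incremental compilation and parallelization possible.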
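
The demand-driven chain above can be sketched with a memoizing cache. This is only a toy model, not rustc's actual query system: the Queries struct, its fields, and the string-keyed items are hypothetical, and the method names simply mirror the chain above.

```rust
use std::collections::HashMap;

/// Toy query context: a query result is cached after its first computation.
struct Queries {
    hir_map: HashMap<&'static str, &'static str>, // item name -> pretend HIR (its declared type)
    type_of_cache: HashMap<&'static str, String>,
    type_of_runs: usize, // how many times type_of was really computed
}

impl Queries {
    fn new() -> Self {
        let mut hir_map = HashMap::new();
        hir_map.insert("foo", "i32");
        hir_map.insert("bar", "bool");
        Queries { hir_map, type_of_cache: HashMap::new(), type_of_runs: 0 }
    }

    /// hir(item): the leaf query, it only reads the input data.
    fn hir(&self, item: &'static str) -> &'static str {
        self.hir_map[item]
    }

    /// type_of(item): depends on hir(item) and is memoized.
    fn type_of(&mut self, item: &'static str) -> String {
        if let Some(cached) = self.type_of_cache.get(item) {
            return cached.clone();
        }
        self.type_of_runs += 1;
        let ty = self.hir(item).to_string();
        self.type_of_cache.insert(item, ty.clone());
        ty
    }

    /// type_check_item(item): depends on type_of(item).
    fn type_check_item(&mut self, item: &'static str) -> bool {
        !self.type_of(item).is_empty()
    }

    /// type_check_crate(): depends on type_check_item for every item.
    fn type_check_crate(&mut self) -> bool {
        let items: Vec<&'static str> = self.hir_map.keys().copied().collect();
        items.into_iter().all(|item| self.type_check_item(item))
    }
}

fn main() {
    let mut q = Queries::new();
    assert!(q.type_check_crate());
    assert!(q.type_check_crate()); // the second run is answered from the cache
    println!("type_of computed {} times", q.type_of_runs);
}
```

The second call to type_check_crate does no new type_of work, which is the property incremental compilation relies on.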

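Learning by exploring 🔗

Run rustc -Z help to list the available -Z options (they are unstable, so a nightly toolchain is needed). For example, to dump the AST or the MIR:

rustc -Z unpretty=ast-tree hello.rs
rustc -Z unpretty=mir hello.rs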

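Crates inside the compiler 🔗

rustc_main: main entry point of the rustc compiler
rustc_interface: the basic compilation interfaces and their implementation
rustc_middle: defines the semantics of types and the type context
rustc_data_structures: basic data structures used by rustc
rustc_session: describes a compilation session and supports parallel multi-session compilation
rustc_traits: trait-related logic
rustc_ast_lowering: lowers the AST to HIR
rustc_typeck: type checking and conversion logic
rustc_expand: macro expansion, including procedural macros and built-in macros
rustc_attr: attribute handling
rustc_mir: MIR data structures and related implementation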

rustc task list 🔗

Stage 1
Tasks 1, 2, and 3 work on the AST, which allows syntactic manipulation of the program.

  • 1 command line args, user environment, toolchain, load compilation cache
  • 2 lexing, parsing
  • 3 macro expansion, feature gate, various compiler magic
    • compiler/rustc_ast/src/mut_visitor.rs#MutVisitor
// rustc_ast/src/ast.rs#
pub struct Crate {
    pub attrs: Vec<Attribute>,
    pub items: Vec<P<Item>>, // the root AST nodes (items) of this crate
    pub spans: ModSpans,
    pub id: NodeId,
    pub is_placeholder: bool,
}
pub struct Item<K = ItemKind> { // an item node in the AST
    pub attrs: Vec<Attribute>,
    pub id: NodeId,
    pub span: Span,
    pub vis: Visibility,
    pub ident: Ident,
    pub kind: K,     // ItemKind defines how Item nodes connect to one another
    pub tokens: Option<LazyTokenStream>,
}
// One variant per syntactic construct, following the grammar (BNF) rules
pub enum ItemKind {
    /// An `extern crate` item, with the optional *original* crate name if the crate was renamed.
    ///
    /// E.g., `extern crate foo` or `extern crate foo_bar as foo`.
    ExternCrate(Option<Symbol>),
    /// A use declaration item (`use`).
    ///
    /// E.g., `use foo;`, `use foo::bar;` or `use foo::bar as FooBar;`.
    Use(UseTree),
    /// A static item (`static`).
    ///
    /// E.g., `static FOO: i32 = 42;` or `static FOO: &'static str = "bar";`.
    Static(P<Ty>, Mutability, Option<P<Expr>>),
    /// A constant item (`const`).
    ///
    /// E.g., `const FOO: i32 = 42;`.
    Const(Defaultness, P<Ty>, Option<P<Expr>>),
    /// A function declaration (`fn`).
    ///
    /// E.g., `fn foo(bar: usize) -> usize { .. }`.
    Fn(Box<Fn>),
    Mod(Unsafe, ModKind),
    ForeignMod(ForeignMod),
    Enum(EnumDef, Generics),
    Struct(VariantData, Generics),
    Union(VariantData, Generics),
    Trait(Box<Trait>),
    Impl(Box<Impl>),
}
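
To illustrate how a mutating visitor can rewrite such a tree in place, here is a toy sketch. It is not rustc's MutVisitor: the Expr enum, the trait shape, walk_expr, and ConstFold are all hypothetical and far smaller than anything in rustc_ast.

```rust
/// Toy expression AST (hypothetical, much simpler than rustc_ast).
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// A mutating visitor: the default method walks the children,
/// implementors override it to rewrite nodes in place.
trait MutVisitor {
    fn visit_expr(&mut self, expr: &mut Expr) {
        walk_expr(self, expr);
    }
}

/// Visits the children of `expr` with the given visitor.
fn walk_expr<V: MutVisitor + ?Sized>(v: &mut V, expr: &mut Expr) {
    match expr {
        Expr::Num(_) => {}
        Expr::Add(l, r) | Expr::Mul(l, r) => {
            v.visit_expr(l);
            v.visit_expr(r);
        }
    }
}

/// Folds constant subtrees such as 2 * 3 into a single number.
struct ConstFold;

impl MutVisitor for ConstFold {
    fn visit_expr(&mut self, expr: &mut Expr) {
        walk_expr(self, expr); // fold the children first
        let folded = match expr {
            Expr::Add(l, r) => match (&**l, &**r) {
                (Expr::Num(a), Expr::Num(b)) => Some(a + b),
                _ => None,
            },
            Expr::Mul(l, r) => match (&**l, &**r) {
                (Expr::Num(a), Expr::Num(b)) => Some(a * b),
                _ => None,
            },
            Expr::Num(_) => None,
        };
        if let Some(n) = folded {
            *expr = Expr::Num(n); // replace the node in place
        }
    }
}

fn main() {
    // 1 + 2 * 3
    let mut e = Expr::Add(
        Box::new(Expr::Num(1)),
        Box::new(Expr::Mul(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
    );
    let mut folder = ConstFold;
    folder.visit_expr(&mut e);
    println!("{:?}", e);
}
```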

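Stage 2
Tasks 4, 5, and 6 work on the HIR. The HIR is easy to derive from the AST (it is still a tree) and is well suited to these analyses:

  • 4 type inference
  • 5 type checking
  • 6 trait solving/checking

Lowering to HIR desugars some constructs, e.g. loops and async/await.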
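
As an illustration of loop desugaring, a for loop can be rewritten by hand as a loop over an iterator with a match. This is roughly the shape that lowering produces, written here as ordinary Rust; both function names are made up for the example.

```rust
/// Sum written with the surface syntax.
fn sum_for(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

/// The same loop, desugared by hand into `loop` + `match`.
fn sum_desugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(xs);
    loop {
        match iter.next() {
            Some(x) => total += *x,
            None => break,
        }
    }
    total
}

fn main() {
    let xs = [1, 2, 3, 4];
    assert_eq!(sum_for(&xs), sum_desugared(&xs));
    println!("{}", sum_desugared(&xs));
}
```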

trait Foo {...}
fn baz<T: Foo>(t: T) {...}

// 1. type checking: the argument type must implement the Foo trait
// 2. trait solving: select which implementation of Foo to use
baz(3); 
baz("hello");
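
A runnable variant of the snippet above, as a sketch. The method name and the two impls are assumptions added for illustration, since the original elides the trait body.

```rust
trait Foo {
    // Hypothetical method, only here so the two impls can be told apart.
    fn name(&self) -> String;
}

impl Foo for i32 {
    fn name(&self) -> String {
        format!("i32 {}", self)
    }
}

impl Foo for &str {
    fn name(&self) -> String {
        format!("str {}", self)
    }
}

/// Type checking verifies `T: Foo`; trait solving picks the impl to call.
fn baz<T: Foo>(t: T) -> String {
    t.name()
}

fn main() {
    println!("{}", baz(3)); // resolves to the impl for i32
    println!("{}", baz("hello")); // resolves to the impl for &str
}
```

Calling baz with a type that has no Foo impl, such as 1.5, is rejected during type checking.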

Task 7 works on the THIR, which is easy to build from the HIR and easy to lower to MIR.

  • 7 pattern exhaustiveness checking
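
A small sketch of what exhaustiveness checking enforces. The Shape enum and the area function are made up for this example.

```rust
#[derive(Debug, Clone, Copy)]
enum Shape {
    Circle(f64),
    Square(f64),
    Rect(f64, f64),
}

fn area(s: Shape) -> f64 {
    match s {
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Square(a) => a * a,
        // Removing this arm makes the match non-exhaustive, and rustc
        // rejects the program with error E0004.
        Shape::Rect(w, h) => w * h,
    }
}

fn main() {
    for s in [Shape::Circle(1.0), Shape::Square(2.0), Shape::Rect(2.0, 3.0)] {
        println!("{:?} has area {}", s, area(s));
    }
}
```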

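Stage 3

Works on the MIR, which is a control-flow graph.
That form is convenient for borrow checking, dataflow analysis (for correctness checks and optimizations), monomorphization, optimization, and code generation.

  • Borrow checking (local analysis: the scope is a single function body)
  • Constant evaluation (after monomorphization)
  • Rust-level optimizations
  • Monomorphization (instantiating generics for concrete types)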
  • Saving compilation cache
  • LLVM IR generation
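
To make the monomorphization item in the list above concrete, here is a minimal sketch: one generic function used at two types, so the compiler generates one specialized copy per type. The function name describe is an assumption for this example.

```rust
use std::any::type_name;
use std::mem::size_of;

/// A generic function. The compiler emits one specialized copy for each
/// concrete `T` it is called with.
fn describe<T>(_value: T) -> String {
    format!("{} ({} bytes)", type_name::<T>(), size_of::<T>())
}

fn main() {
    // Two instantiations: describe::<u8> and describe::<f64>.
    println!("{}", describe(1u8));
    println!("{}", describe(1.0f64));
}
```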

Niko Matsakis MIR blog: Introducing MIR


Stage 4

  • LLVM (more optimization, binary generation, linking, link-time optimization...)

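Arithmetic calculator 🔗

  • Build an interpreter from the grammar, e.g. parse by recursive descent and evaluate while parsing
  • Use the shunting-yard algorithm with a stack to convert infix to postfix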
fn main() {
    let source = "-11+2*(3+10)+2";
    let res = parser_expr(&mut Lexer::new(source.to_owned()));
    println!("{}={}", source, res); //17
}

/// E -> E + T | E - T | T
/// T -> T * F | T / F | F
/// F -> (E) | number
fn parser_expr(lexer: &mut Lexer) -> i32 {
    let mut i = parser_term(lexer);
    while lexer.ch == '+' || lexer.ch == '-' {
        if lexer.ch == '+' {
            lexer.read_char();
            let right = parser_term(lexer);
            i += right;
        } else {
            lexer.read_char();
            let right = parser_term(lexer);
            i -= right;
        }
    }
    i
}

fn parser_term(lexer: &mut Lexer) -> i32 {
    let mut i = parser_factor(lexer);
    while lexer.ch == '*' || lexer.ch == '/' {
        if lexer.ch == '*' {
            lexer.read_char();
            i *= parser_factor(lexer);
        } else {
            // assert!(lexer.expect_char(b'/'));
            lexer.read_char();
            i /= parser_factor(lexer);
        }
    }
    i
}

fn parser_factor(lexer: &mut Lexer) -> i32 {
    if lexer.ch == '(' {
        // assert!(lexer.expect_char('('));
        lexer.read_char();
        let res = parser_expr(lexer);
        lexer.read_char();
        res
    } else {
        let mut res = 0i32;
        while lexer.ch >= '0' && lexer.ch <= '9' {
            res = res * 10 + lexer.ch as i32 - '0' as i32;
            lexer.read_char();
        }
        // Unary minus: -11 is treated as 0-11. When '-' is seen here,
        // no digit is consumed and 0 is returned.
        res
    }
}

struct Lexer {
    input: Vec<char>,
    cur_position: usize,
    ch: char,
    next_position: usize,
}
impl Lexer {
    pub fn new(input: String) -> Self {
        let mut lexer = Self {
            input: input.chars().collect(),
            cur_position: 0,
            ch: '0',
            next_position: 0,
        };
        lexer.read_char();
        lexer
    }

    pub fn read_char(&mut self) {
        self.ch = self.peek_char();
        self.cur_position = self.next_position;
        self.next_position += 1;
    }

    pub fn peek_char(&self) -> char {
        if self.next_position >= self.input.len() {
            return char::MAX;
        }
        self.input[self.next_position]
    }
}
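
The code above covers the recursive-descent approach. The second approach from the list, shunting-yard, might look like the sketch below. The names to_postfix and eval_postfix are made up here, and unlike the parser above this version makes no attempt to handle a leading unary minus.

```rust
/// Convert an infix expression over non-negative integers, + - * / and
/// parentheses into postfix tokens with the shunting-yard algorithm.
fn to_postfix(src: &str) -> Vec<String> {
    fn prec(op: char) -> u8 {
        if op == '+' || op == '-' { 1 } else { 2 }
    }
    let chars: Vec<char> = src.chars().filter(|c| !c.is_whitespace()).collect();
    let mut output: Vec<String> = Vec::new();
    let mut ops: Vec<char> = Vec::new(); // operator stack
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        match c {
            '0'..='9' => {
                // Read a whole multi-digit number.
                let start = i;
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
                output.push(chars[start..i].iter().collect());
                continue; // `i` already points past the number
            }
            '+' | '-' | '*' | '/' => {
                // Pop operators of higher or equal precedence (left associative).
                while let Some(&top) = ops.last() {
                    if top != '(' && prec(top) >= prec(c) {
                        output.push(top.to_string());
                        ops.pop();
                    } else {
                        break;
                    }
                }
                ops.push(c);
            }
            '(' => ops.push(c),
            ')' => {
                // Pop until the matching '(' and discard it.
                while let Some(top) = ops.pop() {
                    if top == '(' {
                        break;
                    }
                    output.push(top.to_string());
                }
            }
            _ => panic!("unexpected character {:?}", c),
        }
        i += 1;
    }
    while let Some(op) = ops.pop() {
        output.push(op.to_string());
    }
    output
}

/// Evaluate postfix tokens with a value stack.
fn eval_postfix(tokens: &[String]) -> i32 {
    let mut stack: Vec<i32> = Vec::new();
    for t in tokens {
        match t.as_str() {
            "+" | "-" | "*" | "/" => {
                let r = stack.pop().expect("missing operand");
                let l = stack.pop().expect("missing operand");
                stack.push(match t.as_str() {
                    "+" => l + r,
                    "-" => l - r,
                    "*" => l * r,
                    _ => l / r,
                });
            }
            n => stack.push(n.parse().expect("not a number")),
        }
    }
    stack.pop().expect("empty expression")
}

fn main() {
    let postfix = to_postfix("2*(3+10)+2-11");
    println!("{} = {}", postfix.join(" "), eval_postfix(&postfix));
}
```

Separating conversion from evaluation keeps each stack's role clear: the first holds pending operators, the second holds intermediate values.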

regex 🔗

json 🔗