Good introductory text about GHC implementation?

前端 未结 2 1242
被撕碎了的回忆
被撕碎了的回忆 2020-12-12 09:40

When programming in Haskell (and especially when solving Project Euler problems, where suboptimal solutions tend to stress the CPU or memory needs) I\'m often puzzled why th

2条回答
  •  一整个雨季
    2020-12-12 09:53

    The majority of the technical information about the architecture and approach of the GHC system is in their wiki. I'll link to the key pieces, and some related papers that people may not know about.

    What typical optimizations are applied?

    The key paper on this is: A transformation-based optimiser for Haskell, SL Peyton Jones and A Santos, 1998, which describes the model GHC uses of applying type-preserving transformations (refactorings) of a core Haskell-like language to improve time and memory use. This process is called "simplification".

    Typical things that are done in a Haskell compiler include:

    • Inlining;
    • Beta reduction;
    • Dead code elimination;
    • Transformation of conditions: case-of-case, case elimiation.
    • Unboxing;
    • Constructed product return;
    • Full laziness transformation;
    • Specialization;
    • Eta expansion;
    • Lambda lifting;
    • Strictness analysis.

    And sometimes:

    • The static argument transformation;
    • Build/foldr or stream fusion;
    • Common sub-expression elimination;
    • Constructor specialization.

    The above-mentioned paper is the key place to start to understand most of these optimizations. Some of the simpler ones are given in the earlier book, Implementing Functional Languages, Simon Peyton Jones and David Lester.

    What is the execution order when there are multiple candidates for evaluation

    Assuming you're on a uni-processor, then the answer is "some order that the compiler picks statically based on heuristics, and the demand pattern of the program". If you're using speculative evaluation via sparks, then "some non-deterministic, out-of-order execution pattern".

    In general, to see what the execution order is, look at the core, with, e.g. the ghc-core tool. An introduction to Core is in the RWH chapter on optimizations.

    How are thunks represented?

    Thunks are represented as heap-allocated data with a code pointer.

    Heap object

    See the layout of heap objects. Specifically, see how thunks are represented.

    How are the stack and the heap used?

    As determined by the design of the Spineless Tagless G-machine, specifically, with many modifications since that paper was released. Broadly, the execution model:

    • (boxed) objects are allocated on the global heap;
    • every thread object has a stack, consisting of frames with the same layout as heap objects;
    • when you make a function call, you push values onto the stack and jump to the function;
    • if the code needs to allocate e.g. a constructor, that data is placed on the heap.

    To deeply understand the stack use model, see "Push/Enter versus Eval/Apply".

    What is a CAF?

    A "Constant Applicative Form". E.g. a top level constant in your program allocated for the lifetime of your program's execution. Since they're allocated statically, they have to be treated specially by the garbage collector.


    References and further reading:

    • The GHC Commentary
    • The Spinless Tagless G-machine
    • Compilation via Transformation
    • Push/Enter vs Eval/Apply
    • Unboxed Values as First-Class Citizens
    • Secrets of the Inliner
    • Runtime Support for Multicore Haskell

提交回复
热议问题