OCaml PPX
DaiLambda, Inc.
Jun FURUSE/古瀬 淳
Kyoto, 2021-04-09

What is PPX?

OCaml PreProcessor eXtension framework

What is PPX?

PPX is a preprocessor:

source code (*.ml, *.mli)

    ↓   Preprocessor (PPX, CamlP4, m4)

source code (*.ml, *.mli, or AST binary)

    ↓   OCaml compiler
          * Parsing
          * Typing
          * Compilation

object file, executable (*.cm*, *.exe)

PPX: AST transformer

Syntax tree → Syntax tree

Not a text-to-text preprocessor like m4 or cpp

Input
Must be a parsable OCaml AST (data structure of parsed code)
Output
Another parsable OCaml AST
No typing
No type-checking of the input and output.
They need not be well-typed.
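To make "syntax tree → syntax tree, no typing" concrete, here is a toy sketch in plain OCaml. The expr type is a hypothetical miniature AST, not the real Parsetree. The transformer only looks at tree shapes, so it rewrites the ill-typed input (1 + "x") + 0 without complaint, just as a PPX would.

```ocaml
(* A hypothetical miniature AST, standing in for OCaml's Parsetree *)
type expr =
  | Int of int
  | Str of string
  | Add of expr * expr

(* Syntax tree -> syntax tree: rewrite [e + 0] into [e].
   Only shapes are inspected; no type checking happens here. *)
let rec simplify (e : expr) : expr =
  match e with
  | Add (e1, Int 0) -> simplify e1
  | Add (e1, e2) -> Add (simplify e1, simplify e2)
  | Int _ | Str _ -> e

(* Ill-typed input (an int plus a string) is transformed anyway *)
let ill_typed = Add (Add (Int 1, Str "x"), Int 0)
let result = simplify ill_typed
```

Here result is Add (Int 1, Str "x"): the type error is left for the compiler to report later, if the output is still ill-typed.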

Applications

Macros
#ifdef, #include
Syntax
123456z for Z.of_string "123456"
Monadic let
Typed regex:
  regex "(?P<name>regex)" : <name : string> regex
Code generation
Deriving printers, parsers, etc. from type definitions.
type t = Foo [@@deriving show]
Etc., etc.
Your new PPX ideas go here.
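The 123456z idea can be sketched on a toy AST. In the real Parsetree, a literal such as 123456z is parsed as an integer constant carrying the suffix character 'z' (Pconst_integer ("123456", Some 'z')), which the type checker rejects unless a PPX rewrites it. The expr type below is a hypothetical miniature that only mimics that constructor.

```ocaml
(* Hypothetical miniature AST. [Integer] mimics Parsetree's
   [Pconst_integer of string * char option]: literal text + optional suffix. *)
type expr =
  | Integer of string * char option
  | String of string
  | Ident of string
  | Apply of expr * expr list

(* Rewrite [123456z] into [Z.of_string "123456"]; keep everything else *)
let rec rewrite_z (e : expr) : expr =
  match e with
  | Integer (digits, Some 'z') -> Apply (Ident "Z.of_string", [ String digits ])
  | Apply (f, args) -> Apply (rewrite_z f, List.map rewrite_z args)
  | Integer _ | String _ | Ident _ -> e

(* print 123456z  ~>  print (Z.of_string "123456") *)
let example =
  rewrite_z (Apply (Ident "print", [ Integer ("123456", Some 'z') ]))
```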

Limitations

It does not extend OCaml syntax
PPX does not change the OCaml syntax.
Attributes and extension points carry things outside the language.
No type information is available

PPX tends to be context-free.

  • No general way to know the types of expressions.
  • Hard to access the definition of variables.
  • Even worse, the input can be ill-typed.

(You can type-check the input in PPX, if you dare…)

OCaml syntax for PPX

Attributes and extension points

Attributes [@...]

OCaml syntax lets you attach postfix annotations:

42 [@the_answer]          (* for expression *)

let x = 42 [@@the_answer] (* for toplevel *)

[@@@the_module]           (* the entire file *)

Parsable OCaml code stays parsable after its attributes are removed.

ocamlc ignores most attributes,
except [@warning], [@tailcall], [@@inline], etc.

Attributes can give hints to PPX.
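A toy sketch of attributes as hints, on a hypothetical miniature AST where each node carries a list of attribute names (loosely like the real pexp_attributes field). The answer pass treats [@the_answer] as an instruction and consumes it; strip shows that dropping every attribute still leaves a well-formed tree.

```ocaml
(* Hypothetical miniature AST: every expression carries attributes *)
type expr = { desc : desc; attrs : string list }
and desc = Int of int | Add of expr * expr

let mk ?(attrs = []) desc = { desc; attrs }

(* PPX-like pass: an expression tagged [@the_answer] becomes 42,
   and the attribute is removed once it has been used *)
let rec answer (e : expr) : expr =
  if List.mem "the_answer" e.attrs then
    mk ~attrs:(List.filter (fun a -> a <> "the_answer") e.attrs) (Int 42)
  else
    match e.desc with
    | Int _ -> e
    | Add (a, b) -> { e with desc = Add (answer a, answer b) }

(* Removing all attributes never breaks the tree *)
let rec strip (e : expr) : expr =
  match e.desc with
  | Int _ -> { e with attrs = [] }
  | Add (a, b) -> { desc = Add (strip a, strip b); attrs = [] }

(* (0 [@the_answer]) + 1 *)
let input = mk (Add (mk ~attrs:[ "the_answer" ] (Int 0), mk (Int 1)))
```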

Attributes at infix positions

Postfix attributes can look ugly. Some constructs also accept them in infix position:

Infix                     Postfix equivalent
let [@a] x = .. in ..     let x = .. [@@a] in ..
match [@a] x with ..      (match x with ..) [@a]
if [@a] e then ..         (if e then ..) [@a]
struct [@a] .. end        (struct .. end) [@a]
module [@a] X = ..        module X = .. [@@a]
..                        ..

Extension points [%...]

Embed something outside OCaml:

[%the_answer] + 42        (* to be an expression *)

module X = struct
  [%%self_destruct]       (* to be a toplevel *)
end

PPX must replace extension points with ordinary code;
otherwise OCaml's type checker rejects them:

Error: Uninterpreted extension 'self_destruct'.
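The division of labour can be modeled in a few lines. This is a toy sketch on a hypothetical AST: expand plays the PPX, and check plays the type checker, rejecting any extension node that survives. The error string only imitates the compiler's message.

```ocaml
(* Hypothetical miniature AST with an extension node, like [%name] *)
type expr =
  | Int of int
  | Add of expr * expr
  | Extension of string

(* The "PPX": expand the extensions it knows, leave unknown ones as-is *)
let rec expand (e : expr) : expr =
  match e with
  | Extension "the_answer" -> Int 42
  | Add (a, b) -> Add (expand a, expand b)
  | Int _ | Extension _ -> e

(* The "type checker": any extension node still present is an error *)
let rec check (e : expr) : (unit, string) result =
  match e with
  | Int _ -> Ok ()
  | Extension name -> Error (Printf.sprintf "Uninterpreted extension '%s'." name)
  | Add (a, b) -> (
      match check a with Ok () -> check b | Error _ as err -> err)
```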

Payload

Each attribute and extension point can have a payload.

It is an OCaml expression, structure, type, signature, or pattern:

[@foo 42]                  (* expression and structure *)
[%bar fun x -> M.y ()]
[@pee let x = 1;; let y = 2]

[@foo: int -> int]         (* type and signature *)
[%bar: [`Foo]]             (*   prefixed with : *)
[@@baz: val x : int]

[@foo? M.Some _]           (* pattern *)
[%bar? 'a'..'z']           (*   prefixed with ? *)
[@@@boo? 1 | 2 when true]

Payloads must be parsable, but they are not type-checked.

Payload gives additional information to PPX.
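The compiler represents a payload as a variant with one case per form above (PStr, PSig, PTyp, PPat). Below is a toy mirror of that type with strings in place of the real subtrees, plus a hypothetical helper showing how a PPX typically accepts one payload shape and rejects the others.

```ocaml
(* Toy mirror of Parsetree's [payload]; strings stand in for real trees.
     [@x ...]  -> PStr
     [@x: ...] -> PSig (signature items) or PTyp (a type)
     [@x? ...] -> PPat, with an optional [when] guard *)
type payload =
  | PStr of string
  | PSig of string
  | PTyp of string
  | PPat of string * string option

(* Hypothetical helper: accept only an integer expression payload *)
let int_of_payload (p : payload) : (int, string) result =
  match p with
  | PStr s -> (
      match int_of_string_opt (String.trim s) with
      | Some n -> Ok n
      | None -> Error "expected an integer expression")
  | PSig _ | PTyp _ | PPat _ -> Error "expected an expression payload"
```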

Writing PPX

How to handle AST

PPX code transformer

PPX must implement 2 functions:

Transformer for *.ml
structure -> structure
Transformer for *.mli
signature -> signature

The types are defined in the OCaml compiler source code.

Parsetree

The main module for the parsed AST:
$OPAM_SWITCH_PREFIX/lib/ocaml/compiler-libs/parsetree.mli
or $OPAM_SWITCH_PREFIX/.opam-switch/build/ocaml-base-compiler.4.xx.y/parsing/parsetree.mli

Data types:

  • structure: module implementation
  • signature: signature declaration
  • expression
  • pattern
  • etc..

AST transformation tools

Also available in compiler-libs.common:

  • Ast_helper : builders
  • Ast_iterator : iterator
  • Ast_mapper : mapper for recursive code transformation
  • Attr_helper : attribute handling
  • etc..

AST mapper

Builds ty -> ty functions for the various Parsetree types.

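type mapper = { <ty> : mapper -> ty -> ty ; .. }

val default_mapper : mapper  (* identity mapper *)

Your own mapper inheriting default_mapper:

open Parsetree
open Ast_mapper

let super = default_mapper

let structure_item self sitem =
  match sitem.pstr_desc with
  | Pstr_eval (e, attrs) ->
      (* rewrite the evaluated expression through self.expr *)
      { sitem with pstr_desc = Pstr_eval (self.expr self e, attrs) }
  | _ -> super.structure_item self sitem (* recursive! *)

let expr self e =
  (* .. your transformation of e here .. *)
  super.expr self e

let my_mapper = { super with structure_item; expr }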

(ppxlib provides a class-based mapper.)
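The open-recursion idea behind Ast_mapper can be reproduced on a toy AST without compiler-libs. The sketch below is hypothetical (its mapper type has only two fields) but follows the same pattern: a record of functions that each take self, an identity default, and an override that falls back to super.

```ocaml
(* Hypothetical two-type AST *)
type expr = Int of int | Neg of expr | Block of stmt list
and stmt = Eval of expr | Print of string

(* Record of functions, each receiving the whole mapper as [self] *)
type mapper = {
  expr : mapper -> expr -> expr;
  stmt : mapper -> stmt -> stmt;
}

(* Identity mapper: rebuilds the tree, always recursing through [self] *)
let default : mapper = {
  expr = (fun self e ->
    match e with
    | Int _ -> e
    | Neg e' -> Neg (self.expr self e')
    | Block ss -> Block (List.map (self.stmt self) ss));
  stmt = (fun self s ->
    match s with
    | Eval e -> Eval (self.expr self e)
    | Print _ -> s);
}

let super = default

(* Override only [expr]; everything else is inherited from [super] *)
let incr_ints : mapper = {
  super with
  expr = (fun self e ->
    match e with
    | Int n -> Int (n + 1)
    | _ -> super.expr self e);
}

let result =
  incr_ints.expr incr_ints (Block [ Eval (Neg (Int 1)); Print "hi" ])
```

Because every recursive call goes through self, the single overridden field also applies to the integer nested inside the statement, giving Block [Eval (Neg (Int 2)); Print "hi"].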

PPX at the bottom

PPX must be compiled to an executable:

  • Takes an AST of *.ml or *.mli as an input
  • Outputs a transformed AST

The PPX executable is given to ocamlc with -ppx:

$ ocamlc -ppx ./ppx_my_own.exe code.ml

But do not do it by hand. Use ppxlib and Dune.

ppxlib

Makes writing and using PPX easier.

PPX complications

  • A "driver" is needed to build an executable from the transformers
  • Running multiple PPX executables slows down compilation.
  • More Parsetree tools are needed.

ppxlib

Now the de facto standard library for PPX.

  • PPXs built as libraries
  • 1 PPX driver executable linking multiple PPX libraries
  • Integration with the Dune build system
  • Some more Parsetree tools
    • Ast_traverse: class based AST mapper/iterator/fold

ppxlib example

A PPX that does nothing:

open Ppxlib

let impl : Ast.structure -> Ast.structure = fun x -> x
let intf : Ast.signature -> Ast.signature = fun x -> x

let () = Ppxlib.Driver.register_transformation
  ~impl: impl
  ~intf: intf
  "ppx_my_own"

Build with ppxlib

; dune
(library
 (kind ppx_rewriter)        ; kind: ppx_rewriter
 (name ppx_my_own)
 ; (public_name ppx_my_own) ; will need ppx_my_own.opam
 (libraries ppxlib))        ; library: ppxlib
$ dune build ppx_my_own.cma

Use PPX in your library

(library
 (name my_cool_library)
 (public_name my_cool_library)
 (preprocess (pps ppx_my_own        ; <== add this
                  ppx_not_my_own))  ;
 (libraries whatever))

PPXs built with ppxlib can be combined in a single (pps ..) list.
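Conceptually, combining several PPXs in one driver means applying all their transformations in one process instead of running one executable per PPX. The toy sketch below models that with a hypothetical register function and strings standing in for AST items; the real ppxlib driver has its own rules for ordering and merging rewriters, which this does not reproduce.

```ocaml
(* Stand-in for a real AST: each "item" is just a string *)
type structure = string list

(* Hypothetical registry of (name, transformation) pairs *)
let registered : (string * (structure -> structure)) list ref = ref []

let register ~impl name = registered := !registered @ [ (name, impl) ]

(* Toy driver: apply every registered transformation, in order, in one process *)
let run (str : structure) : structure =
  List.fold_left (fun acc (_name, impl) -> impl acc) str !registered

(* Two toy "PPXs" linked into the same driver *)
let () =
  register "ppx_upper" ~impl:(List.map String.uppercase_ascii);
  register "ppx_rev" ~impl:List.rev
```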

Ast_traverse

AST mapper, iterator, and folder

open Ppxlib

class my_map = object
  inherit Ast_traverse.map as super (* default *)
  method! expression e =
    match e.pexp_desc with
    (* | <some pattern> -> transform e here *)
    | _ -> super#expression e (* visit recursively *)
end

let () = Ppxlib.Driver.register_transformation
  ~impl: (new my_map)#structure
  ~intf: (fun i -> i)
  "ppx_my_own"

Ast_traverse: replace int constants with 42

class my_map = object
  inherit Ast_traverse.map

  method! constant c = 
    match c with
    | Pconst_integer _ -> 
        Pconst_integer ("42", None)
    | _ -> c
end
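The same class-based style can be tried on a toy AST without ppxlib. In this hypothetical sketch, map plays the role of Ast_traverse.map: one method per type, identity by default, with expr doing the traversal. Overriding only constant is enough, because the inherited expr calls self#constant.

```ocaml
(* Hypothetical miniature AST; [Integer] loosely mimics Pconst_integer *)
type constant = Integer of string | Char of char
type expr = Const of constant | Add of expr * expr | Neg of expr

(* Toy analogue of Ast_traverse.map: identity, recursing via [self] *)
class map = object (self)
  method constant (c : constant) : constant = c
  method expr (e : expr) : expr =
    match e with
    | Const c -> Const (self#constant c)
    | Add (a, b) -> Add (self#expr a, self#expr b)
    | Neg a -> Neg (self#expr a)
end

(* Override only [constant]; the inherited [expr] does the traversal *)
class replace_ints = object
  inherit map
  method! constant c =
    match c with
    | Integer _ -> Integer "42"
    | Char _ -> c
end

let result =
  (new replace_ints)#expr (Add (Const (Integer "1"), Neg (Const (Char 'a'))))
```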

Metaquote: “PPX for PPX authors”

It is cumbersome to write the AST for 42 + 1 by hand:

Ast_builder.Default.(eapply ~loc (evar ~loc "+") [eint ~loc 42; eint ~loc 1])

ppxlib.metaquot provides extensions that quote OCaml code as ASTs
(a value named loc must be in scope):

[%expr 42 + 1]

More constructs:

let ast = [%expr 1] in [%expr 42 + [%e ast]]

[%stri let () = print_string "hello"]

[%type: int -> float]

Build with ppxlib.metaquot

; dune
(library
 (kind ppx_rewriter)
 (name ppx_my_own)
 ; (public_name ppx_my_own)
 (preprocess (pps ppxlib.metaquot))  ; ppxlib.metaquot
 (libraries ppxlib))

ppxlib.metaquot: add 42 to int constants

class my_map = object
  inherit Ast_traverse.map as super

  method! expression e =
    match e.pexp_desc with
    | Pexp_constant (Pconst_integer _) ->
        let loc = e.pexp_loc in
        [%expr [%e e] + 42]
    | _ -> 
       (* do not forget to recurse *)
       super#expression e 
end
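Why the "do not forget to recurse" comment matters, shown on a toy AST with a hypothetical base class standing in for Ast_traverse.map: a mapper whose fallback case returns e unchanged never visits subexpressions, while one that falls back to super#expr does.

```ocaml
(* Hypothetical miniature AST *)
type expr = Int of int | Add of expr * expr

(* Toy identity traversal, recursing through [self] *)
class map = object (self)
  method expr (e : expr) : expr =
    match e with
    | Int _ -> e
    | Add (a, b) -> Add (self#expr a, self#expr b)
end

(* Forgets to recurse: only the root node is ever inspected *)
class forgetful = object
  inherit map
  method! expr e =
    match e with
    | Int n -> Add (Int n, Int 42)
    | _ -> e
end

(* Falls back to [super] for everything it does not rewrite *)
class careful = object
  inherit map as super
  method! expr e =
    match e with
    | Int n -> Add (Int n, Int 42)
    | _ -> super#expr e
end

let input = Add (Int 1, Int 2)
let lost = (new forgetful)#expr input  (* unchanged *)
let ok = (new careful)#expr input      (* both constants rewritten *)
```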

Sample code

Available at https://github.com/camlspotter/ppx_my_own

Tips

Random tips for PPX

Using ppxlib and compiler-libs

ppxlib exposes Parsetree related modules.
In most cases, you need not use compiler-libs directly.

If you need to use compiler-libs with ppxlib

The AST types of ppxlib are wrapped.
They are incompatible with those of compiler-libs.

Parsetree conversion is required between them:

Ppxlib_ast.Selected_ast.to_ocaml
ppxlib  →  compiler-libs
Ppxlib_ast.Selected_ast.of_ocaml
compiler-libs  →  ppxlib
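Why a conversion is needed at all: in OCaml, two variant types declared separately are distinct types even when their constructors match. The toy sketch below uses hypothetical modules standing in for the two AST flavours; for the real trees, the Selected_ast functions listed above play the role of to_compiler and of_compiler.

```ocaml
(* Two structurally identical but distinct AST types *)
module Ppx_side = struct
  type expr = Int of int | Add of expr * expr
end

module Compiler_side = struct
  type expr = Int of int | Add of expr * expr
end

(* Same shape, different types: an explicit copy is needed each way *)
let rec to_compiler : Ppx_side.expr -> Compiler_side.expr = function
  | Ppx_side.Int n -> Compiler_side.Int n
  | Ppx_side.Add (a, b) -> Compiler_side.Add (to_compiler a, to_compiler b)

let rec of_compiler : Compiler_side.expr -> Ppx_side.expr = function
  | Compiler_side.Int n -> Ppx_side.Int n
  | Compiler_side.Add (a, b) -> Ppx_side.Add (of_compiler a, of_compiler b)
```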

-dsource and -dparsetree

Useful to see how code is parsed:

$ ocaml -dsource
# let [@a] x [@b] = 1 + 1 [@c] [@@d];;
let ((x)[@b ]) = ((1 + 1)[@c ])[@@a ][@@d ];;
val x : int = 2

$ ocaml -dparsetree
# 1;;
Ptop_def
  [
    structure_item (//toplevel//[1,0+0]..[1,0+1])
      Pstr_eval
      expression (//toplevel//[1,0+0]..[1,0+1])
        Pexp_constant PConst_int (1,None)
  ]

- : int = 1

Check PPX output using ppxlib

NO easy way for the moment!

  • Write a test:

    (tests
     (names test)  ; Write test.ml
     (preprocess (pps ppx_my_own)))
  • Build a preprocessed file *.pp.ml and find it:

    $ dune build --verbose ./test.pp.ml
    ...
    - _build/default/ppx_simple/tests/test.pp.ml
  • Print the preprocessed file using ocamlc -dsource:

    $ ocamlc -dsource ../../_build/default/ppx_simple/tests/test.pp.ml

Debug print inside PPX

Use Pprintast.* for -dsource output:

Format.eprintf "%a@." Pprintast.structure str

or Ocaml_common.Printast.* for -dparsetree output:

let indent = 0 in
Format.eprintf "%a@." 
  (Ocaml_common.Printast.structure indent) str

Merlin

Type inspection, jump to definition, and more. Use it already!

let impl sitems =
  List.map (fun { pstr_desc; _ } ->
      match pstr_desc with
      | _(*<-cursor*) -> assert false) sitems

Pattern destruction (C-c C-d in Emacs):

let impl sitems =
  List.map (fun { pstr_desc; _ } ->
      match pstr_desc with
      | Ppxlib.Pstr_eval (_, _)|Ppxlib.Pstr_value (_, _)|Ppxlib.Pstr_primitive _
      |Ppxlib.Pstr_type (_, _)|Ppxlib.Pstr_typext _|Ppxlib.Pstr_exception _
      |Ppxlib.Pstr_module _|Ppxlib.Pstr_recmodule _|Ppxlib.Pstr_modtype _
      |Ppxlib.Pstr_open _|Ppxlib.Pstr_class _|Ppxlib.Pstr_class_type _
      |Ppxlib.Pstr_include _|Ppxlib.Pstr_attribute _|Ppxlib.Pstr_extension 
        (_, _) -> assert false) sitems