Optz is an optimizer of Michelson programs, the base language for Tezos smart contract. It takes a Michelson code as input and then returns a smaller and more efficient equivalent code.
Optimization of Michelson opcode sequences
The optimization consists of 2 parts: rule-based transformation and optimal stack manipulation search by A* algorithm.
Rule based code transformation
Dozens of rewriting rules replace opcodes with better ones, such as:
PAIR; CDR => DROP
DIP n { a }; DIP n { b } => DIP n { a; b }
SWAP; op => op // when op is a commutative binary operator
NOT; IF { a } { b } => IF { b } { a }
Search for the optimal stack manipulation sequences
After the rule-based transformation, A* algorithm finds the most optimal sequence of stack manipulation opcodes such as DROP
, SWAP
, DIG
, DUG
, DIP
, DUP
, CAR
, CDR
, PAIR
, UNPAIR
, and the push only opcodes like PUSH
and EMPTY_SET
. For example, the following opcode sequence found in an existing contract in Tezos mainnet:
DIG 1; DUP 1; DUG 2; PUSH c0; DIG 2; PUSH c1; DIG 4
is replaced with a shorter equivalent one:
PUSH c0; DIG 1; PUSH c1; DUP 4
The search space explodes exponentially when the range of the stack touched by the target opcodes and the number of constants in it grows. To finish the optimization in a reasonable time, the maximum size of the search space is introduced.
Special case for drop-only sequence
Compilers from higher-level languages to Michelson clean the stack when the program execution exits each functional and variable scope. We have observed some compilers tend to produce very long inefficient stack cleaning opcode sequences. They are too big for the above A* algorithm to optimize.
To optimize these stack cleaners, a special case of stack manipulation optimization is introduced for “drop only sequences”, which never push nor reorder stack elements but only drop some of them. The optimal opcodes are easily constructible for these drop-only sequences, therefore no size restriction is necessary. For example, the following code found in a smart contract in Tezos Mainnet:
DROP 2; DIG 1; DROP 1; DIG 1; DROP 1; DIG 3; DROP 1; DIG 3; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1; DIG 4; DROP 1
can be optimized to the following much shorter efficient code:
DROP 2; DUG 2; DROP 2; DIG 3; DIG 4; DROP 2; DIP 4 { DROP 6 }
Optimization of Tezos Mainnet contracts
We have tested Optz against 1711 different Michelson contracts deployed on Tezos Mainnet between 2021-02-18 and 2022-02-17.
All the contracts are optimized within 1 second by a laptop (Mac Book Pro 13-inch 2017) and the output Michelson encoding size is 92.8% of the inputs in average. Some are reduced by over 30%:
The optimization performance highly depends on how hard the original high-level language to Michelson compilers optimizes its code.
What we learned
The drop-only sequence optimization greatly reduces the code sizes of outputs of some compilers. We recommend these compilers implement this optimization. It is cheap since the optimal opcodes can be constructed without a heavy A* search. All we need is a stack operation simulator.
Presentation
For in-depth explanation, slides are available:
Try Optz
Web app
You can try the web interface of Optz at https://dailambda.jp/optz-js. It is compiled to JavaScript and optimizes the input in your web browser.
Docker image
You can also use our Docker image. Use the following script as optz.sh
:
#!/bin/sh
docker run --rm \
-v `pwd`:/work dailambda/scaml:master \
/root/.opam/4.12.1/bin/optz $1
then give your script at the current directory to the script to optimize:
$ ./optz.sh your_contract.tz
# optimized: bytes: 2516 -> 2438
parameter
(...
Source code
The source code is available at https://gitlab.com/dailambda/scaml. Optz was originally written as the last phase of optimization in SCaml compiler, an OCaml backend for Michelson. Since this phase is a pure conversion from Michelson to Michelson, we made it an independent tool. The build is quite straightforward if you have already set up OPAM environment correctly:
$ git clone https://gitlab.com/dailambda/scaml
$ cd scaml
$ opam switch create . ocaml-base-compiler.4.12.1 --deps-only -y
$ dune build
$ dune build install
Related work
The idea of A* algorithm to optimize Tezos stack operations was originally published by Arthur Breitman. We did not notice this post when we started writing the optimizer using simple Dijkstra search. Later we extended our algorithm to A* and reduced the optimization times of some very heavy cases in Dijkstra.
Updates (2022-05-06): Pair support of the optimal stack operation search
Optz’s optimal stack operation search now supports CAR
, CDR
, PAIR
, and UNPAIR
opcodes. For example, UNPAIR; SWAP; DROP
is now rewritten to CAR
. The longest rewrite we found in Mainnet contracts is as follows:
DUP 1; CAR; DIP 1 { CDR }; DUP 1; CAR; DIP 1 { CDR }; DUP 1; CAR; DIP 1 { CDR; } DIG 2; DUP 1; CAR; DIP 1 { CDR }; DIG 4; DUG 2
=> UNPAIR; UNPAIR; UNPAIR; DIG 3; DIG 3; UNPAIR
This support of the pair opcodes improves the average reduction rate of optimization from 94.0% to 92.8%.
UNPAIR; PAIR => NOP
is not always valid rewriting
At a first glance, the opcode sequence UNPAIR; PAIR
is trivially contractable to NOP, but it is not always valid due to the field annotation (%name
) of Michelson, therefore special care is required.
UNPAIR; PAIR
clears the field annotations of the pair type of the top stack element: ex. pair (int %x) (int %y) => pair int int
. Once the field annotations are removed, the pair type becomes compatible with other pair types with any field annotations, for example, pair (int %a) (int %b)
. If UNPAIR; PAIR
is removed, the result stack type keeps the original field annotations and becomes no longer compatible with pair types with other field annotations.
Indeed, we have found several Michelson codes in Tezos Mainnet whose UNPAIR; PAIR
cannot be removed away. For example,
IF { DIP { DROP } } { DROP; UNPAIR; PAIR }
is well-typed under the initial stack type pair (int %a) (int %b); pair (int %x) (int %y); A
and the result typestack type is pair (int %a) (int %b); A
. If UNPAIR; PAIR
is removed from the code, it is no longer well typed under the initial stack type, since pair (int %a) (int %b)
and pair (int %x) (int %y)
are incompatible.
It requires type analysis to know exactly whether UNPAIR; PAIR
and similar sequences are safely contractable, but Optz’s optimal stack operation search does not use types. Instead, it uses a conservative workaround: its rewriting has the following restriction: if a pair is formed by PAIR
in the input opcode sequence, it must be also created by PAIR
in the output. It makes sure the pair has a type without field annotations.