Correctness¶
Here we describe the theorem establishing the correctness of the Elle compiler, and sketch its proof.
Note that links in this
file are to the (frozen) ITP2019
branch of the repository, to ensure consistency of line numbers as master evolves.
Though the line numbers may change, the general ideas should not.
In Implementation, we discussed the steps taken by the Elle compiler when translating code from the Elle language to EVM, and the specifications proven about the code generated after each step. In this section we will discuss the proofs that formally establish these invariants for each step, as well as the final theorem about the end-to-end process of compilation that ties them all together.
- For each phase of the compiler (other than jump resolution), we prove that only annotations with no impact on the semantics of the program when translated to EVM are modified. This enables us to lift results about the outputs of compilation passes to results about their input programs.
- Syntax trees generated by
ll_phase1
satisfy the predicatell_valid_q
, so long as they only contain valid instructions according to theinst_valid
predicate (a predicate that rules out use of instructions that would violate Elle’s guarantees, such as arbitrary jumps not governed by Elle’s scoping mechanism). This is proved by a relatively straightforward induction on the structure of the tree. - We prove that subsequent passes of the compiler preserve
the
ll_valid_q
predicate. Because only annotations without effect on the generated code are changed in most passes, this is relatively straightforward. For jump resolution, we additionally prove that expanding jump nodes to allow for larger addresses preservesll_valid_q
via a lemma about the effect of increasing the size of a jump by 1 (ll_bump
). - We prove that the output of the second pass of the
compiler meets the predicate
ll_valid3
. This is done by means of a verified validator pass that runs after the second pass of the compiler. This pass essentially supplies an executable version ofll_valid3
: it iterates over the structure of the tree, gathering the set of label nodes that point up to each sequence node. Each sequence node annotation is then checked against the gathered label nodes to ensure the annotations match (that is, either there are 0 nodes pointing up at the sequence node in question and the annotation is[]
, or the annotation is a nonempty list corresponding to the child-path of the single node that points up to the sequence node that holds the annotation). Otherwise, the validation pass fails and the compiler produces no output. Verification of this pass involves a relatively straightforward induction on the structure of the input tree (strictly speaking, on the structure of the proof that the input tree is valid according toll_valid_q
, which mirrors the syntactic structure of the tree.) - For the third phase of the compiler (resolving and resizing jumps) we make use of another validation pass, which runs after the body of the compiler. This validator is proven to only accept input code which satisfies the predicate given in (TODO: link to code).
- After this final compilation phase, we run a validator to ensure
that the lengths of encoded jump addresses match the length annotations
on the
Jump
andJumpI
nodes; that is, that all addresses we are going to encode fit exactly into the space we have allocated for them. - Finally, we run one last validator to sanity-check the jumps contained in the final code output by Elle. This validator checks to ensure that jumps are encodable in EVM (that is, that no jumps to addresses less than 1 byte or more than 32 bytes in length are present in the code).
- At this point, we can easily dump the final, fully annotated version of the Elle syntax tree to EVM instructions. It is these instructions that the final proof of Elle’s correctness will reason over.
For details of the theorem statement, see elle_alt_correct
in elle/ElleCorrect/ElleAltSemantics.thy
.
The Corrcetness Theorem¶
This correctness theorem
can be paraphrased as follows: suppose we have an Elle
program t
that ends in a state st'
when started in state st
at node
cp
(under the Elle semantics). If the t
is valid under the valid3
predicate as well as passing jump-targets validation),
the result of dumping this Elle program to EVM bytecode yields a program that steps
from ir
(with program counter set to the program counter corresponding to the
start of the instruction pointed to by st
), it will end in a state that
will differ from ir'
only in the value of the program counter (unless insufficient
interpreter fuel has been given to the EVM interpreter, in which case a larger
value of fuel would yield such a final state - this is what is captured by the predicate
program_sem stopper prog fuel net (setpc_ir st targstart) = InstructionToEnvironment act vc venv
).
program_sem
comes from the original eth-isabelle
project on which Elle is based.
Setting Up the Proof¶
Programs produced by the Elle compiler meet the validity predicates valid3'
and pass the jump-target validator, so this means that when the output of the
Elle compiler is run in EVM semantics, the result will always be the same as
the result given by running the source Elle program under Elle’s semantics.
This establishes one direction of simulation: that Elle programs are simulated
by the EVM programs output by the Elle compiler under the conditions given above.
Because the state spaces for the input and output program semantics are
virtually the same, and both languages are deterministic (assuming the Elle
program in question is valid, but the compiler will produce None
as an
output in the case that it is not), proving this one direction is sufficient
to establish the stronger bisimulation results that are common in the
compiler-correctness literature -
informally the argument is as follows: suppose we have that the EVM
semantics steps from st
to st'
for some program prog
, that
prog
was produced from an Elle program eprog
, and that st
has
a program counter value of pc
. Unless pc
is the address-value of
the middle of a multi-word instruction (see below), pc
will be the address
of the start of an instruction, meaning there will be at least one node
in eprog
with a left-side annotation equal to this address.
Since the
Elle semantics is deterministic for valid programs, and eprog
must be valid
(or the compiler would not have produced any output program), we know that there
must exist some st''
such that eprog
steps from st
to st''
under the
Elle semantics with a childpath pointing to any node with an annotation matching
the value of pc
. (There may be more than one such node, if the node is the
first element of a sequence, but the sequence semantics implies that the
semantics of running the Elle program from any of these nodes will produce the
same result). Now, we have two cases. If st'' = st'
(with the possible exception
of the final program-counter values being different), we have the converse result
that we wanted to prove and are done. Otherwise, we can apply the correctness
theorem of Elle to get that prog
steps from st
to st''
under the EVM
semantics, a contradiction since st' != st''
(beyond just differences in program counter values) and the EVM semantics is
deterministic.
As alluded to above, there are some EVM states which do not have a corresponding state in the Elle semantics for a particular program, but these states correspond to instruction-pointer values that are invalid in the sense of not being the address of the beginning of any instruction. An EVM program executing from the beginning should never reach such a program-counter value, so it is safe not to consider such states if what we care about ultimately is the behavior of whole EVM programs run from the beginning (which it is).
Correctness of Elle: Proof Sketch¶
In order to establish the correctness theorem of Elle, we use the
induction principle for the elle_alt_sem
predicate.
This requires us to prove 11 goals (corresponding to the
11 cases in the semantics).
For the first case, we need to prove that instructions at the end of an Elle program behave the same way as running the same instruction at the end of the corresponding EVM program. This amounts to a straightforward case-analysis on possible EVM instructions, in order to demonstrate that running the instructions (followed by
elle_halt
) yields the same result as running the same instruction at the end of the corresponding EVM program.For the second case, we need to prove that program executions beginning with instructions not at the end of the program behave the same way as their corresponding EVM programs. This proof is similar to the first case, except that instead of running
elle_halt
at the end and comparing the final results, we need to appeal to our inductive hypothesis (which says that running the Elle program and EVM programs after the given instruction yield the same result.) In order to apply our inductive hypothesis, we need to prove that the states match after executing a single instruction, which is similar to the beginning to the proof of the first case.The third case is for label nodes at the end of Elle programs, its proof is almost identical to the first case (minus the exhaustive case-analysis of EVM instructions).
The fourth case is similar to the second case in the same way that the third case is to the first: again we need to appeal to an inductive hypothesis about running the remainder of the program, and doing so involves reasoning about the execution of a single JUMPDEST instruction (as in the third case).
The fifth case involves reasoning about the effects of the
Jump
Elle instruction. The core of this proof is a case-analysis on the three disjunctive cases of the specification for jump-target validity. Since we are talking about a whole tree (rather than a tree in context) we only have two cases: one where the jump in question points up to the context corresponding to the root of the tree (aSeq
node) and one where the jump points up to an intermediate node which is, in turn, a descendant of the root.In both cases, the
ll_valid3
predicate is used to show that, because the label node corresponding to the jump node’sSeq
node is unique, the Elle semantics selects a single jump target which corresponds to the jump target given by the EVM semantics of the compiled code.Additionally, some (rather tedious) reasoning is needed to prove that when the target address of the jump is serialized in to an EVM stack value and then deserialized again (to calculate the new program-counter value in the EVM semantics) the value is preserved. This mostly boils down to proving that the address in question does not overflow a 256-bit EVM integer, which is guaranteed by the fact that the address is not more than 32 bytes (shown by an additional validation pass run at the end of the compiler).
The second case is largely similar to the first, except that some extra reasoning needs to be done to show that the descended
Seq
node also satisfies thell_valid3
predicate, and then to translate the results of the reasoning on the subtree in which the jump is taking place back up to a statement about the meaning of the jump in the context of the overall syntax tree descended from the root node. For details, the reader can refer to our Isabelle formalization.The sixth case involves reasoning about the effects of the
JumpI
Elle instruction when the conditional jump is taken. This case is similar to the fifth case, except that a bit of additional reasoning is needed to prove that the jump is indeed taken.The seventh case involves reasoning about
JumpI
in the case where the conditional jump is not taken and theJumpI
instruction in question is at the end of the code. This case is similar overall to cases 1 and 3, since the semantics ofJumpI
where the jump is not taken are not dissimilar to that of the label case (decrement gas by the correct amount, increment the program counter)The eighth case involves reasoning about
JumpI
in the case where the conditional jump is not taken and theJumpI
instruction in question is not at the end of the code. This case is similar overall to cases 2 and 4.The ninth case is the case of executing an empty sequence node at the end of an Elle program. Because the empty sequence has no effect on the machine state (nor does running an empty series of EVM instructions), this corresponds to showing that the effects of running
elle_halt
matches the effect of running at the end of an EVM program. Essentially this is an easier version of cases 1, 3, and 7.The tenth case corresponds to executing an empty sequence node somewhere other than the end of an Elle program. As with cases 2, 4, and 8, this involves applying an inductive hypothesis stating that the execution behaviors of the remainder of the program are the same between the Elle an EVM versions when started in the same state. Because an empty sequence of instructions leaves the state unchanged in both the Elle and EVM versions of the program, the hypothesis almost immediately applies.
The eleventh case corresponds to running a nonempty sequence in Elle. In this case, we get an inductive hypothesis about the effect of running the given Elle program starting at the first element of that sequence. We need to show that the address in the output code corresponding to the start of the sequence in Elle is the same as that of the start of its first element, which is straightforwardly proved using auxiliary lemmas about the behavior of the
ll_valid_q
predicate on lists. Once we have this, we can apply our inductive hypothesis to complete the proof.
It should be noted that we prove two different versions of this lemma.
The first, elle_alt_correct
talks about correctness in cases where the execution of the
Elle and EVM programs terminate successfully; the second,
elle_alt_correct_fail
, deals with the
case where the
execution ends in a “crashed” state (either because the EVM stack limit was exceeded,
the EVM was not supplied enough gas to complete the execution,
or an invalid instruction was reached).