Here we describe the theorem establishing the correctness of the Elle compiler, and sketch its proof.

Note that links in this file are to the (frozen) ITP2019 branch of the repository, to ensure consistency of line numbers as master evolves. Though the line numbers may change, the general ideas should not.

In Implementation, we discussed the steps taken by the Elle compiler when translating code from the Elle language to EVM, and the specifications proven about the code generated after each step. In this section we will discuss the proofs that formally establish these invariants for each step, as well as the final theorem about the end-to-end process of compilation that ties them all together.

  • For each phase of the compiler (other than jump resolution), we prove that only annotations with no impact on the semantics of the program when translated to EVM are modified. This enables us to lift results about the outputs of compilation passes to results about their input programs.
  • Syntax trees generated by ll_phase1 satisfy the predicate ll_valid_q, so long as they only contain valid instructions according to the inst_valid predicate (a predicate that rules out use of instructions that would violate Elle’s guarantees, such as arbitrary jumps not governed by Elle’s scoping mechanism). This is proved by a relatively straightforward induction on the structure of the tree.
  • We prove that subsequent passes of the compiler preserve the ll_valid_q predicate. Because only annotations without effect on the generated code are changed in most passes, this is relatively straightforward. For jump resolution, we additionally prove that expanding jump nodes to allow for larger addresses preserves ll_valid_q via a lemma about the effect of increasing the size of a jump by 1 (ll_bump).
  • We prove that the output of the second pass of the compiler meets the predicate ll_valid3. This is done by means of a verified validator pass that runs after the second pass of the compiler. This pass essentially supplies an executable version of ll_valid3: it iterates over the structure of the tree, gathering the set of label nodes that point up to each sequence node. Each sequence node annotation is then checked against the gathered label nodes to ensure the annotations match (that is, either there are 0 nodes pointing up at the sequence node in question and the annotation is [], or the annotation is a nonempty list corresponding to the child-path of the single node that points up to the sequence node that holds the annotation). Otherwise, the validation pass fails and the compiler produces no output. Verification of this pass involves a relatively straightforward induction on the structure of the input tree (strictly speaking, on the structure of the proof that the input tree is valid according to ll_valid_q, which mirrors the syntactic structure of the tree.)
  • For the third phase of the compiler (resolving and resizing jumps) we make use of another validation pass, which runs after the body of the compiler. This validator is proven to only accept input code which satisfies the predicate given in (TODO: link to code).
  • After this final compilation phase, we run a validator to ensure that the lengths of encoded jump addresses match the length annotations on the Jump and JumpI nodes; that is, that all addresses we are going to encode fit exactly into the space we have allocated for them.
  • Finally, we run one last validator to sanity-check the jumps contained in the final code output by Elle. This validator checks to ensure that jumps are encodable in EVM (that is, that no jumps to addresses less than 1 byte or more than 32 bytes in length are present in the code).
  • At this point, we can easily dump the final, fully annotated version of the Elle syntax tree to EVM instructions. It is these instructions that the final proof of Elle’s correctness will reason over.

For details of the theorem statement, see elle_alt_correct in elle/ElleCorrect/ElleAltSemantics.thy.

The Corrcetness Theorem

This correctness theorem can be paraphrased as follows: suppose we have an Elle program t that ends in a state st' when started in state st at node cp (under the Elle semantics). If the t is valid under the valid3 predicate as well as passing jump-targets validation),

the result of dumping this Elle program to EVM bytecode yields a program that steps from ir (with program counter set to the program counter corresponding to the start of the instruction pointed to by st), it will end in a state that will differ from ir' only in the value of the program counter (unless insufficient interpreter fuel has been given to the EVM interpreter, in which case a larger value of fuel would yield such a final state - this is what is captured by the predicate program_sem stopper prog fuel net (setpc_ir st targstart) = InstructionToEnvironment act vc venv). program_sem comes from the original eth-isabelle project on which Elle is based.

Setting Up the Proof

Programs produced by the Elle compiler meet the validity predicates valid3' and pass the jump-target validator, so this means that when the output of the Elle compiler is run in EVM semantics, the result will always be the same as the result given by running the source Elle program under Elle’s semantics. This establishes one direction of simulation: that Elle programs are simulated by the EVM programs output by the Elle compiler under the conditions given above.

Because the state spaces for the input and output program semantics are virtually the same, and both languages are deterministic (assuming the Elle program in question is valid, but the compiler will produce None as an output in the case that it is not), proving this one direction is sufficient to establish the stronger bisimulation results that are common in the compiler-correctness literature - informally the argument is as follows: suppose we have that the EVM semantics steps from st to st' for some program prog, that prog was produced from an Elle program eprog, and that st has a program counter value of pc. Unless pc is the address-value of the middle of a multi-word instruction (see below), pc will be the address of the start of an instruction, meaning there will be at least one node in eprog with a left-side annotation equal to this address.

Since the Elle semantics is deterministic for valid programs, and eprog must be valid (or the compiler would not have produced any output program), we know that there must exist some st'' such that eprog steps from st to st'' under the Elle semantics with a childpath pointing to any node with an annotation matching the value of pc. (There may be more than one such node, if the node is the first element of a sequence, but the sequence semantics implies that the semantics of running the Elle program from any of these nodes will produce the same result). Now, we have two cases. If st'' = st' (with the possible exception of the final program-counter values being different), we have the converse result that we wanted to prove and are done. Otherwise, we can apply the correctness theorem of Elle to get that prog steps from st to st'' under the EVM semantics, a contradiction since st' != st'' (beyond just differences in program counter values) and the EVM semantics is deterministic.

As alluded to above, there are some EVM states which do not have a corresponding state in the Elle semantics for a particular program, but these states correspond to instruction-pointer values that are invalid in the sense of not being the address of the beginning of any instruction. An EVM program executing from the beginning should never reach such a program-counter value, so it is safe not to consider such states if what we care about ultimately is the behavior of whole EVM programs run from the beginning (which it is).

Correctness of Elle: Proof Sketch

In order to establish the correctness theorem of Elle, we use the induction principle for the elle_alt_sem predicate. This requires us to prove 11 goals (corresponding to the 11 cases in the semantics).

  1. For the first case, we need to prove that instructions at the end of an Elle program behave the same way as running the same instruction at the end of the corresponding EVM program. This amounts to a straightforward case-analysis on possible EVM instructions, in order to demonstrate that running the instructions (followed by elle_halt) yields the same result as running the same instruction at the end of the corresponding EVM program.

  2. For the second case, we need to prove that program executions beginning with instructions not at the end of the program behave the same way as their corresponding EVM programs. This proof is similar to the first case, except that instead of running elle_halt at the end and comparing the final results, we need to appeal to our inductive hypothesis (which says that running the Elle program and EVM programs after the given instruction yield the same result.) In order to apply our inductive hypothesis, we need to prove that the states match after executing a single instruction, which is similar to the beginning to the proof of the first case.

  3. The third case is for label nodes at the end of Elle programs, its proof is almost identical to the first case (minus the exhaustive case-analysis of EVM instructions).

  4. The fourth case is similar to the second case in the same way that the third case is to the first: again we need to appeal to an inductive hypothesis about running the remainder of the program, and doing so involves reasoning about the execution of a single JUMPDEST instruction (as in the third case).

  5. The fifth case involves reasoning about the effects of the Jump Elle instruction. The core of this proof is a case-analysis on the three disjunctive cases of the specification for jump-target validity. Since we are talking about a whole tree (rather than a tree in context) we only have two cases: one where the jump in question points up to the context corresponding to the root of the tree (a Seq node) and one where the jump points up to an intermediate node which is, in turn, a descendant of the root.

    In both cases, the ll_valid3 predicate is used to show that, because the label node corresponding to the jump node’s Seq node is unique, the Elle semantics selects a single jump target which corresponds to the jump target given by the EVM semantics of the compiled code.

    Additionally, some (rather tedious) reasoning is needed to prove that when the target address of the jump is serialized in to an EVM stack value and then deserialized again (to calculate the new program-counter value in the EVM semantics) the value is preserved. This mostly boils down to proving that the address in question does not overflow a 256-bit EVM integer, which is guaranteed by the fact that the address is not more than 32 bytes (shown by an additional validation pass run at the end of the compiler).

    The second case is largely similar to the first, except that some extra reasoning needs to be done to show that the descended Seq node also satisfies the ll_valid3 predicate, and then to translate the results of the reasoning on the subtree in which the jump is taking place back up to a statement about the meaning of the jump in the context of the overall syntax tree descended from the root node. For details, the reader can refer to our Isabelle formalization.

  6. The sixth case involves reasoning about the effects of the JumpI Elle instruction when the conditional jump is taken. This case is similar to the fifth case, except that a bit of additional reasoning is needed to prove that the jump is indeed taken.

  7. The seventh case involves reasoning about JumpI in the case where the conditional jump is not taken and the JumpI instruction in question is at the end of the code. This case is similar overall to cases 1 and 3, since the semantics of JumpI where the jump is not taken are not dissimilar to that of the label case (decrement gas by the correct amount, increment the program counter)

  8. The eighth case involves reasoning about JumpI in the case where the conditional jump is not taken and the JumpI instruction in question is not at the end of the code. This case is similar overall to cases 2 and 4.

  9. The ninth case is the case of executing an empty sequence node at the end of an Elle program. Because the empty sequence has no effect on the machine state (nor does running an empty series of EVM instructions), this corresponds to showing that the effects of running elle_halt matches the effect of running at the end of an EVM program. Essentially this is an easier version of cases 1, 3, and 7.

  10. The tenth case corresponds to executing an empty sequence node somewhere other than the end of an Elle program. As with cases 2, 4, and 8, this involves applying an inductive hypothesis stating that the execution behaviors of the remainder of the program are the same between the Elle an EVM versions when started in the same state. Because an empty sequence of instructions leaves the state unchanged in both the Elle and EVM versions of the program, the hypothesis almost immediately applies.

  11. The eleventh case corresponds to running a nonempty sequence in Elle. In this case, we get an inductive hypothesis about the effect of running the given Elle program starting at the first element of that sequence. We need to show that the address in the output code corresponding to the start of the sequence in Elle is the same as that of the start of its first element, which is straightforwardly proved using auxiliary lemmas about the behavior of the ll_valid_q predicate on lists. Once we have this, we can apply our inductive hypothesis to complete the proof.

It should be noted that we prove two different versions of this lemma. The first, elle_alt_correct talks about correctness in cases where the execution of the Elle and EVM programs terminate successfully; the second, elle_alt_correct_fail, deals with the case where the execution ends in a “crashed” state (either because the EVM stack limit was exceeded, the EVM was not supplied enough gas to complete the execution, or an invalid instruction was reached).