Wednesday, April 22, 2020

Precise gas metering

  The goal of gas metering in next generation blockchains, such as NEAR, is to provide mutually acceptable by validators and users cost of elementary computations, such as hashing, storage access, and smart contract regular execution. Previously, NEAR used gas metering calibration based on time elapsed, but this approach is tied to the hardware used, affected by the OS scheduler behavior and system load, and is not very stable or reproducible. Instead, we decided to come up with the completely transparent and reproducible approach to the gas metering, which could be reproduced and understood by any NEAR blockchain user.

 What could be such a metric, if we are talking about computation intensive operations? The number of CPU instructions produced by an optimizing compiler required for performing certain action looks like sensible and reproducible metrics. But how could we count that? Of course, modern CPU has counters, like RDTSC, which shows the number of CPU clock generator ticks at the moment, but it doesn't properly account for task switching, different CPU microarchitectures (so that same instruction may take different number of ticks) and other instability sources. And we want to be precise and stable!

  Fortunately, QEMU emulator has the mode for running only userland part of the program. We likely do not care about the kernel price of operation, as it mostly reflects the kernel mechanisms, not the computational complexity of what we are doing. And moreover, recent QEMU provides API for instrumenting QEMU's dynamic translator named TCG (Tiny Code Generator).

 We can now combine those technologies and implement the exact CPU instructions measurement technology suitable for arbitrary Linux program. Plugin need to instrument all code generated by TCG, when requested — start counting, and on another request — stop counting and report back the number of operations performed since counting started. Note that we need to always instrument code produced by the TCG, as it maintains the compiled code cache and it is possible (and highly likely) that code participating in the certain operation (for example, malloc(3) implementation) is already compiled at the moment we want to start counting. We also need to capture the moment when code is ready to start and stop counting. Easiest way to achieve those two goals is to install listeners on code compilation by TCG and on system call execution.
Following code install those two handlers:



In TCG instrumentation callback we do the following operation:


i.e. on every instruction translated by TCG we register execution callback which generates call to function vcpu_insn_exec_before before executing the code emulating the actual instruction.

And the callback itself is implemented as:


We do compare the current thread with the one we remembered earlier to only count instructions executed on the particular thread, as QEMU in usermode emulation maps 1:1 guest to host threads.

Another interesting problem is how do we active and deactivate counters, and return back the results.
It is achieved by the syscall listener we installed during plugin installation function.


This function is invoked when any system call is executed. Here we decided to take syscall number 0 (implementing read(2) in Linux kernel).  and check if the file descriptor argument (first argument a1) is equal to the magic constant CATCH_BASE, which is the huge number 0xcafebabe, so it's very unlikely the actual file descriptor of normal read(2) call will be equal to that value.

It means, if the guest code will execute code like this one:


we will start counting. We stop counting in the similar manner, with an operation like:


and here the matching code in the plugin does an interesting trick:


So we look at the third argument of the syscall a3 (which is the buffer size), and if it is equal to 8, we use the second argument a2 (pointer to the buffer) to store the number of instructions counted there. It works, as QEMU uses 1:1 address space translation between the guest and the host, so pointer passed from the guest is the valid pointer in the simulator.

As the result, we are able to obtain reproducible (modulo compiler and libraries versions) instruction count from arbitrary Linux userland program and successfully use the obtained instruction counter to measure the gas cost for the NEAR blockchain.

 Of course, life would be too easy, if it just worked out of the box, and my previous post describes the QEMU memory leak which had to be fixed before the whole gas price estimator can be run.

 Afterwards, the described technique is pretty generic and could be used for fair gas price estimation in other blockchains and in other scenarios, where exact instruction cost of a userland program had to be figured out!

No comments:

Post a Comment