Benchmarking

The default Substrate block production systems produce blocks at consistent intervals. This is known as the target block time. Given this requirement, Substrate based blockchains are only able to execute a limited number of extrinsics per block. The time it takes to execute an extrinsic may vary based on the computational complexity, storage complexity, hardware used, and many other factors. We use generic measurement called weight to represent how many extrinsics can fit into one block.

In Substrate, 10^12 weight units = 1 second, or 1,000 weight units = 1 nanosecond. This is measured on specific reference hardware: Intel Core i7-7700K CPU with 64GB of RAM and an NVMe SSD.

Substrate does not use a mechanism similar to "gas metering" for extrinsic measurement due to the large overhead such a process would introduce. Instead, Substrate expects benchmarking to provide an approximate maximum for the worst case scenario of executing an extrinsic. Users are charged assuming this worst case scenario path was taken, and if the extrinsic turns out needing less resources, some of the estimated weight and fees can be returned. This is further explained in the Transaction Weights and Fees chapter.

Why benchmark a pallet

Denial-of-service (DoS) is a common attack vector for distributed systems, including blockchain networks. A simple example of such an attack would be for a user to repeatedly execute an extrinsic that involve intensive computation. To prevent users from spamming the network, we charge fee to the user for making that call. The cost of the call should reflect the computation and storage cost incurred to the system such that the more complex the call, the higher the fee. However, we still want to encourage users to use our blockchain system, so we also want this estimated cost to be relatively accurate so we don't charge users more than necessary.

Benchmarking allows developers to charge appropriate transaction fees to end users, corresponding to a more accurate representation of the cost an extrinsic has on the system. Setting a proper weight function that accurately reflects the underlying computation and storage is also an important security safeguard in Substrate.

How to benchmark

The FRAME benchmarking module has a set of tools to help determine the worst case scenario computation time of runtime extrinsics in order to determine appropriate weights for those extrinsics. It does this by executing a pallet's extrinsics multiple times within a mock runtime environment, and keeps track of the execution time.

In summary, it:

Sets up and executes extrinsics from your pallets.
Captures the raw data of these benchmarks over with these varying inputs, including how many database reads and writes have been performed.
Uses linear regression analysis to determine the relationship between computation time and the extrinsic input.
Outputs a Rust file with ready to use weight functions that can be easily integrated in your runtime.

This framework uses Rust macros to help developers easily integrate benchmarking into their runtimes. A Substrate benchmark will look something like this:

benchmarks! {
  benchmark_name {
    /* set-up initial state */
  }: {
    /* the code to be benchmarked */
  } verify {
    /* verifying final state */
  }
}

You can see that the benchmark macro:

Sets up the initial state before running the benchmark.
Measures the execution time, along with the number of database reads and writes.
Verifies the final state of the runtime, ensuring that the benchmark executed as expected.

You can configure your benchmark to run over different varying inputs. For each input, you can configure the range of those variables, and use them within the benchmark set-up or execution logic.

The full syntax and functionality can be seen in the benchmarks! macro API documentation.

Best practices and common patterns

There are a few best practices for writing extrinsics in order to avoid any surprise on your extrinsics computation time and benchmarking.

Initial weight calculation must be lightweight

Extrinsic weight function will be called, probably multiple times, when an extrinsic is going to be called. So the weight function must be lightweight and should not perform any storage read/write, an expensive operation.

Set bounds and assume worst case

If the extrinsic computation time depends on an existing storage value, then set a maximum bound on those storage items and assume the worst case. Once the actual weight is known, the difference can be returned in the extrinsic.

Keep extrinsics simple

Try to keep an extrinsic simple and to perform only one function. Sometimes it may be better to separate the complex logic into multiple extrinsic calls and have a front-end abstracting these extrinsic interactions away to provide a clean and friendly user experience.

Separate benchmarks per logical path and use the worst case

If your extrinsic has multiple logical paths with significantly different execution time, separate these paths in multiple benchmarking cases and measure them. In the actual pallet weight macro above the extrinsics, you could combine them with a max function, e.g.

#[pallet::weight(
  <T::WeightInfo::path_a()
    .max(T::WeightInfo::path_b())
    .max(T::WeightInfo::path_c())>
)]

Note the weight returned here is more as the worst case of the weight estimate. You can then decide if you want to return some weight value back at the end of the extrinsic once you know what computations have happened. Otherwise it will be always overcharging users for calling this extrinsic.

Minimize usage of `on_finalize`, and transition logic to `on_initialize`

Substrate provides runtime developers with multiple hooks when writing their own runtime, with on_initialize and on_finalize being two of them. As on_finalize is the last thing that happens in a block, variable weight requirements in on_finalize can easily lead to an overweight block and should be avoided.

If possible, move the logic to on_intialize hook that happens at the beginning of the block. Then the number of extrinsics to be included in the block can be adjusted accordingly. Another trick is to put the weight of on_finalize on to on_initialize or the extrinsic itself. This leads to another tip of trying to keep the pallet hook execution in constant time for only set up and clean up, but not doing fancy computation.

Command arguments

To find out what the different CLI commands do, run:

 cargo build --features runtime-benchmarks --help

Here's an example of launching a node with benchmarking features enabled:

 ./target/release/node-template benchmark \
    --chain dev \               # Configurable Chain Spec
    --execution wasm \          # Always test with Wasm
    --wasm-execution compiled \ # Always used `wasm-time`
    --pallet pallet_example \   # Select the pallet
    --extrinsic '*' \          # Select the benchmark case name, using '*' for all
    --steps 20 \                # Number of steps across component ranges
    --repeat 10 \               # Number of times we repeat a benchmark
    --json-file=raw.json \      # Optionally output json benchmark data to a file
    --output ./                 # Output results into a Rust file

A recent argument that has been introduced is --template. With it, you can specify your own weight template file and the benchmarking toolchain will fill in the exact numbers from the measured result. This enables automating weight generation in your desired code format and integrates this in your CI process.

(--raw has recently been replaced with --json and --json-file)

The template is in rust handlebars format.

Example

Let's take accumulate_dummy benchmark case from the example pallet as an example.

accumulate_dummy {
  let b in 1 .. 1000;
  let caller = account("caller", 0, 0);
}: _ (RawOrigin::Signed(caller), b.into())

Using the benchmarking CLI, we can specify the number of steps and repeats. This means how many steps will be taken to walk through each variable range, and how many times the execution state will be repeated.

For example:

./target/release/node-template benchmark \
    --chain dev \
    --execution wasm \
    --wasm-execution compiled \
    --pallet pallet_example \
    --extrinsic '\*' \
    --steps 20 \
    --repeat 10 \
    --json \
    --output ./pallets/example/weights.rs

With --steps 20 --repeat 10 in the benchmark input arguments, b will walk 20 steps to reach 1,000, so b will start from 1 and increment by about 50. For each value of b, we will execute the benchmark 10 times and record the benchmark information. The resulting weights will be outputted into a weights.rs file.

multi-variables-regression

Raw data output

The first output is the raw data recording how much time is spent on running the execution state when varying the input variables. At the end for each variable, the coefficient, assuming linear relationship, between the execution time with respect to change in the variable is determined.

This is a snippet of the output:

Pallet: "pallet_example", Extrinsic: "accumulate_dummy", Lowest values: [], Highest values: [], Steps: [10], Repeat: 5
b,extrinsic_time,storage_root_time,reads,repeat_reads,writes,repeat_writes
1,1231926,162640,1,4,1,2
1,1245146,128021,1,4,1,2
1,1238746,126051,1,4,1,2
1,1206004,126391,1,4,1,2
1,1212564,127941,1,4,1,2
100,1257646,129750,1,4,1,2
100,1232476,125780,1,4,1,2
100,1215466,128310,1,4,1,2
100,1205835,129070,1,4,1,2
...
991,1208294,125820,1,4,1,2
991,1209305,126921,1,4,1,2
991,1203275,125031,1,4,1,2
991,1234855,124190,1,4,1,2
991,1337136,125060,1,4,1,2

Median Slopes Analysis
========
-- Extrinsic Time --

Model:
Time ~=     1231
    + b        0
              µs

Reads = 1 + (0 * b)
Writes = 1 + (0 * b)

Min Squares Analysis
========
-- Extrinsic Time --

Data points distribution:
    b   mean µs  sigma µs       %
    1      1230     12.61    1.0%
  100      1230     16.53    1.3%
  199      1227     12.88    1.0%
  298      1215     12.94    1.0%
  397      1239     19.83    1.6%
  496      1234     18.51    1.5%
  595      1228     7.167    0.5%
  694      1233     13.51    1.0%
  793      1218     9.876    0.8%
  892      1225     14.15    1.1%
  991      1220     11.63    0.9%

Quality and confidence:
param     error
b         0.006

Model:
Time ~=     1230
    + b        0
              µs

Reads = 1 + (0 * b)
Writes = 1 + (0 * b)

With the median slopes analysis, this is the weight function:

1231 µs + (0 * b) +
  [1 + (0 * b)] * db read time +
  [1 + (0 * b)] * db write time

We separate the db/storage read and write out because they are particularly expensive and their respective operation time will be retrieved from the runtime. We just need to measure how many db read write are performed with respect to the change of b.

The benchmark result is telling us that it will always perform 1 db read and 1 db write no matter how b changes.

The benchmark library gives us both a median slope analysis; that the execution time of a particular b value is taken as the median value of the repeated runs, and a min. square analysis that is better explained in a statistics primer.

You can also derive your own coefficients given you have the raw data on each run, say maybe you know the computation time will not be a linear but an O(nlogn) relationship with the input variable. So you need to determine the coefficient differently.

Auto-generated `WeightInfo` implementation

The second output is an auto-generated WeightInfo implementation. This file defines weight functions of benchmarked extrinsics with the computed coefficient above. We can directly integrated this file in your pallet or further customize them if so desired. The auto-generated implementation is designed to make end-to-end weight updates easy.

To use this file, we define a WeightInfo trait, for example in the Example pallet:

pub trait WeightInfo {
  fn accumulate_dummy(b: u32, ) -> Weight;
  fn set_dummy(b: u32, ) -> Weight;
  fn sort_vector(x: u32, ) -> Weight;
}