Benchmarking commands with hyperfine

hyperfine lets you run benchmarks of commands. A similar tool is poop. poop appears to be more feature-rich, but I could not get it to work in my environment. In this post I note the steps up to getting hyperfine running. I tested on Ubuntu 24.04.

Installation

A .deb package is provided for Debian / Ubuntu. I installed from that package.

curl -LO https://github.com/sharkdp/hyperfine/releases/download/v1.18.0/hyperfine_1.18.0_amd64.deb
apt -y install ./hyperfine_1.18.0_amd64.deb
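
Alternatively, hyperfine is published on crates.io, so if a Rust toolchain is available it can also be installed with cargo (an alternative route, not the one I took here):

# cargo install hyperfine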

Usage

Basic usage

To run a benchmark, invoke hyperfine followed by the command you want to benchmark. By default the command is run 10 times (more precisely, at least 10 times; hyperfine determines the exact count automatically).

# hyperfine "sleep 0.3"
Benchmark 1: sleep 0.3
  Time (mean ± σ):     304.0 ms ±   1.3 ms    [User: 1.8 ms, System: 2.1 ms]
  Range (min … max):   301.2 ms … 304.8 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
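
You can also pass several commands at once, and hyperfine will print a relative-speed comparison at the end (see the <command>... description in the help output below). A minimal sketch:

# hyperfine "sleep 0.3" "sleep 0.5"

After both benchmarks finish, a summary line reports how many times faster the fastest command ran than each of the others.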

Specifying the number of runs

To change the number of runs, specify the count with the --runs option.

# hyperfine "sleep 0.3" --runs 5
Benchmark 1: sleep 0.3
  Time (mean ± σ):     303.0 ms ±   0.7 ms    [User: 1.3 ms, System: 1.5 ms]
  Range (min … max):   302.2 ms … 304.1 ms    5 runs
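
If you want to bound the run count rather than fix it, the --min-runs / --max-runs options from the help below can be used instead; for example, to let hyperfine decide the count but require at least 20 runs:

# hyperfine "sleep 0.3" --min-runs 20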

Specifying warmup runs

For benchmarks involving disk I/O, results can be expected to vary greatly depending on whether caches are warm. In such cases you can specify warmup runs, which are executed beforehand but excluded from the statistics. In the example below, the command is run 3 times as warmup, then run 10 times (the default, since no count is specified), and the statistics are computed from those runs.

# hyperfine "sleep 0.3" --warmup 3
Benchmark 1: sleep 0.3
  Time (mean ± σ):     303.3 ms ±   0.6 ms    [User: 2.5 ms, System: 0.9 ms]
  Range (min … max):   302.8 ms … 304.5 ms    10 runs
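
Conversely, to benchmark the cold-cache case, the --prepare option (see the help below) runs a command before every timing run; a common use is dropping the Linux page cache. A sketch assuming a root shell, with "grep -r TODO ." standing in for an arbitrary I/O-heavy workload:

# hyperfine --prepare "sync; echo 3 > /proc/sys/vm/drop_caches" "grep -r TODO ."

Note that dropping caches this way is Linux-specific and requires root privileges.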

Reference

Help output

# hyperfine --help
A command-line benchmarking tool.

Usage: hyperfine [OPTIONS] <command>...

Arguments:
  <command>...
          The command to benchmark. This can be the name of an executable, a
          command line like "grep -i todo" or a shell command like "sleep 0.5 &&
          echo test". The latter is only available if the shell is not
          explicitly disabled via '--shell=none'. If multiple commands are
          given, hyperfine will show a comparison of the respective runtimes.

Options:
  -w, --warmup <NUM>
          Perform NUM warmup runs before the actual benchmark. This can be used
          to fill (disk) caches for I/O-heavy programs.
  -m, --min-runs <NUM>
          Perform at least NUM runs for each command (default: 10).
  -M, --max-runs <NUM>
          Perform at most NUM runs for each command. By default, there is no
          limit.
  -r, --runs <NUM>
          Perform exactly NUM runs for each command. If this option is not
          specified, hyperfine automatically determines the number of runs.
  -s, --setup <CMD>
          Execute CMD before each set of timing runs. This is useful for
          compiling your software with the provided parameters, or to do any
          other work that should happen once before a series of benchmark runs,
          not every time as would happen with the --prepare option.
  -p, --prepare <CMD>
          Execute CMD before each timing run. This is useful for clearing disk
          caches, for example.
          The --prepare option can be specified once for all commands or
          multiple times, once for each command. In the latter case, each
          preparation command will be run prior to the corresponding benchmark
          command.
  -c, --cleanup <CMD>
          Execute CMD after the completion of all benchmarking runs for each
          individual command to be benchmarked. This is useful if the commands
          to be benchmarked produce artifacts that need to be cleaned up.
  -P, --parameter-scan <VAR> <MIN> <MAX>
          Perform benchmark runs for each value in the range MIN..MAX. Replaces
          the string '{VAR}' in each command by the current parameter value.

            Example:  hyperfine -P threads 1 8 'make -j {threads}'

          This performs benchmarks for 'make -j 1', 'make -j 2', …, 'make -j 8'.

          To have the value increase following different patterns, use shell
          arithmetics.

            Example: hyperfine -P size 0 3 'sleep $((2**{size}))'

          This performs benchmarks with power of 2 increases: 'sleep 1', 'sleep
          2', 'sleep 4', …
          The exact syntax may vary depending on your shell and OS.
  -D, --parameter-step-size <DELTA>
          This argument requires --parameter-scan to be specified as well.
          Traverse the range MIN..MAX in steps of DELTA.

            Example:  hyperfine -P delay 0.3 0.7 -D 0.2 'sleep {delay}'

          This performs benchmarks for 'sleep 0.3', 'sleep 0.5' and 'sleep 0.7'.
  -L, --parameter-list <VAR> <VALUES>
          Perform benchmark runs for each value in the comma-separated list
          VALUES. Replaces the string '{VAR}' in each command by the current
          parameter value.

          Example:  hyperfine -L compiler gcc,clang '{compiler} -O2 main.cpp'

          This performs benchmarks for 'gcc -O2 main.cpp' and 'clang -O2
          main.cpp'.

          The option can be specified multiple times to run benchmarks for all
          possible parameter combinations.
  -S, --shell <SHELL>
          Set the shell to use for executing benchmarked commands. This can be
          the name or the path to the shell executable, or a full command line
          like "bash --norc". It can also be set to "default" to explicitly
          select the default shell on this platform. Finally, this can also be
          set to "none" to disable the shell. In this case, commands will be
          executed directly. They can still have arguments, but more complex
          things like "sleep 0.1; sleep 0.2" are not possible without a shell.
  -N
          An alias for '--shell=none'.
  -i, --ignore-failure
          Ignore non-zero exit codes of the benchmarked programs.
      --style <TYPE>
          Set output style type (default: auto). Set this to 'basic' to disable
          output coloring and interactive elements. Set it to 'full' to enable
          all effects even if no interactive terminal was detected. Set this to
          'nocolor' to keep the interactive output without any colors. Set this
          to 'color' to keep the colors without any interactive output. Set this
          to 'none' to disable all the output of the tool.
      --sort <METHOD>
          Specify the sort order of the speed comparison summary and the
          exported tables for markup formats (Markdown, AsciiDoc, org-mode):
            * 'auto' (default): the speed comparison will be ordered by time and
              the markup tables will be ordered by command (input order).
            * 'command': order benchmarks in the way they were specified
            * 'mean-time': order benchmarks by mean runtime
  -u, --time-unit <UNIT>
          Set the time unit to be used. Possible values: microsecond,
          millisecond, second. If the option is not given, the time unit is
          determined automatically. This option affects the standard output as
          well as all export formats except for CSV and JSON.
      --export-asciidoc <FILE>
          Export the timing summary statistics as an AsciiDoc table to the given
          FILE. The output time unit can be changed using the --time-unit
          option.
      --export-csv <FILE>
          Export the timing summary statistics as CSV to the given FILE. If you
          need the timing results for each individual run, use the JSON export
          format. The output time unit is always seconds.
      --export-json <FILE>
          Export the timing summary statistics and timings of individual runs as
          JSON to the given FILE. The output time unit is always seconds
      --export-markdown <FILE>
          Export the timing summary statistics as a Markdown table to the given
          FILE. The output time unit can be changed using the --time-unit
          option.
      --export-orgmode <FILE>
          Export the timing summary statistics as an Emacs org-mode table to the
          given FILE. The output time unit can be changed using the --time-unit
          option.
      --show-output
          Print the stdout and stderr of the benchmark instead of suppressing
          it. This will increase the time it takes for benchmarks to run, so it
          should only be used for debugging purposes or when trying to benchmark
          output speed.
      --output <WHERE>
          Control where the output of the benchmark is redirected. Note that
          some programs like 'grep' detect when standard output is /dev/null and
          apply certain optimizations. To avoid that, consider using
          '--output=pipe'.

          <WHERE> can be:

            null:     Redirect output to /dev/null (the default).

            pipe:     Feed the output through a pipe before discarding it.

            inherit:  Don't redirect the output at all (same as
            '--show-output').

            <FILE>:   Write the output to the given file.
      --input <WHERE>
          Control where the input of the benchmark comes from.

          <WHERE> can be:

            null:     Read from /dev/null (the default).

            <FILE>:   Read the input from the given file.
  -n, --command-name <NAME>
          Give a meaningful name to a command. This can be specified multiple
          times if several commands are benchmarked.
  -h, --help
          Print help
  -V, --version
          Print version
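
As one concrete use of the export options above, the timing summary can be written out as a Markdown table (results.md is an arbitrary file name):

# hyperfine --export-markdown results.md "sleep 0.3" "sleep 0.5"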