perf-cpp: Hardware Performance Monitoring for C++

LGPL-3.0 | Linux Kernel >= 4.0 | C++17

Quick Start | How to Build | Documentation | System Requirements

perf-cpp lets you profile specific parts of your code, not the entire program.

Tools like Linux Perf, Intel® VTune™, and AMD uProf profile everything: application startup, configuration parsing, data loading, and all your helper functions. perf-cpp is different: place start() and stop() around exactly the code you want to measure. Profile one sorting algorithm. Measure cache misses in your hash table lookup. Compare two memory allocators. Skip all the noise.
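One way to keep start() and stop() paired around a region is a small scope guard. The following is only a sketch: `ScopedMeasurement`, `FakeCounter`, and `measure_region` are hypothetical names that are not part of perf-cpp, and the guard assumes nothing beyond an object with `start()` and `stop()` member functions. The stand-in counter lets the sketch compile without the library.

```cpp
/// Hypothetical RAII helper (not part of perf-cpp): calls start() on
/// construction and stop() on destruction of any object offering both.
template <typename Counter>
class ScopedMeasurement
{
public:
    explicit ScopedMeasurement(Counter& counter) : _counter(counter) { _counter.start(); }
    ~ScopedMeasurement() { _counter.stop(); }

    /// Non-copyable, so stop() runs exactly once per guard.
    ScopedMeasurement(const ScopedMeasurement&) = delete;
    ScopedMeasurement& operator=(const ScopedMeasurement&) = delete;

private:
    Counter& _counter;
};

/// Stand-in for a real counter so the sketch compiles without perf-cpp.
struct FakeCounter
{
    int starts = 0;
    int stops = 0;
    void start() { ++starts; }
    void stop() { ++stops; }
};

/// Measures only the inner scope; setup and teardown stay outside.
inline FakeCounter measure_region()
{
    FakeCounter counter;
    /// ... setup work that should not be measured ...
    {
        ScopedMeasurement<FakeCounter> guard{counter};
        /// ... code to measure ...
    }
    /// ... teardown work that should not be measured ...
    return counter;
}
```

If stop() on the real counter can throw, calling it from a destructor needs extra care; in that case explicit start() and stop() calls are the safer choice.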

What can perf-cpp do?

Built around Linux's perf subsystem, perf-cpp lets you count and sample hardware events for specific code blocks.

See various practical examples and the full documentation for more details.

Quick Start

Record Hardware Event Statistics

Count hardware events such as instructions, cycles, and cache misses while your code runs, similar to perf stat.

#include <iostream>
#include <perfcpp/event_counter.hpp>

/// Initialize the counter
auto event_counter = perf::EventCounter{};

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
code_to_profile(); /// <-- Statistics recorded during execution
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}

Possible output:

seconds:      0.0955897 
instructions: 5.92087e+07
cycles:       4.70254e+08
cache-misses: 1.35633e+07

Note

See the guides on recording event statistics and event statistics on multiple CPUs/threads. Check out the hardware events documentation for built-in and processor-specific events.
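Raw counts are often easier to compare as ratios. The sketch below derives cycles per instruction and cache misses per 1,000 instructions from three plain numbers. `DerivedMetrics` and `derive_metrics` are hypothetical helpers written for illustration, not perf-cpp API; the inputs would be the values read from the counter result.

```cpp
/// Hypothetical container for two ratios derived from raw counts.
struct DerivedMetrics
{
    double cycles_per_instruction;
    double cache_misses_per_kilo_instruction;
};

/// Derives ratios from raw event counts; returns zeros if no
/// instructions were counted, to avoid dividing by zero.
inline DerivedMetrics derive_metrics(const double instructions,
                                     const double cycles,
                                     const double cache_misses)
{
    if (instructions <= 0.0)
    {
        return DerivedMetrics{0.0, 0.0};
    }

    return DerivedMetrics{
        cycles / instructions,
        cache_misses / (instructions / 1000.0)
    };
}
```

Called with the example output above, `derive_metrics(5.92087e+07, 4.70254e+08, 1.35633e+07)` yields roughly 7.9 cycles per instruction and roughly 229 cache misses per 1,000 instructions.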

Record Samples

Record snapshots containing the instruction pointer, CPU, and timestamp every 50,000 cycles, similar to perf [mem] record.

#include <iostream>
#include <perfcpp/sampler.hpp>

/// Create the sampler
auto sampler = perf::Sampler{};

/// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});

/// Specify what data is included in a sample: timestamp, CPU ID, instruction pointer
sampler.values()
    .timestamp(true)
    .cpu_id(true)
    .logical_instruction_pointer(true);

/// Run the workload
sampler.start();
code_to_profile(); /// <-- Samples recorded during execution
sampler.stop();

const auto samples = sampler.result();

/// Export samples to CSV.
samples.to_csv("samples.csv");

/// Or access samples programmatically.
for (const auto& record : samples)
{
    const auto timestamp = record.metadata().timestamp().value();
    const auto cpu_id = record.metadata().cpu_id().value();
    const auto instruction = record.instruction_execution().logical_instruction_pointer().value();
    
    std::cout 
        << "Time = " << timestamp << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}

Possible output:

Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c 

Note

See the sampling guide for what data you can record. Also check out the sampling on multiple CPUs/threads guide for parallel sampling.
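A common next step after collecting samples is to count how often each instruction pointer appears, since frequently sampled addresses point to hot code. The sketch below works on a plain, hypothetical `SampleRecord` struct instead of the library's sample type, so it compiles without perf-cpp; in practice the fields would be copied from each recorded sample as shown in the loop above. `hottest_instructions` is likewise an illustrative name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

/// Hypothetical plain record holding the three sampled fields.
struct SampleRecord
{
    std::uint64_t timestamp;
    std::uint32_t cpu_id;
    std::uint64_t instruction_pointer;
};

/// Returns (instruction pointer, hit count) pairs, most frequent first.
/// Ties are ordered by ascending address to keep the result deterministic.
inline std::vector<std::pair<std::uint64_t, std::size_t>>
hottest_instructions(const std::vector<SampleRecord>& samples)
{
    /// Count the samples per instruction pointer.
    std::unordered_map<std::uint64_t, std::size_t> hits;
    for (const auto& sample : samples)
    {
        ++hits[sample.instruction_pointer];
    }

    /// Copy into a vector and sort by descending hit count.
    std::vector<std::pair<std::uint64_t, std::size_t>> ranked(hits.begin(), hits.end());
    std::sort(ranked.begin(), ranked.end(), [](const auto& left, const auto& right) {
        return left.second != right.second ? left.second > right.second
                                           : left.first < right.first;
    });

    return ranked;
}
```

Mapping the resulting addresses back to functions or source lines requires symbol information and is outside the scope of this sketch.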

Building

perf-cpp is designed as a library (static or shared) that can be linked to your application.

git clone https://github.com/jmuehlig/perf-cpp.git
cd perf-cpp
cmake . -B build
cmake --build build

Note

See the building guide for CMake integration and build options.

Documentation

The full documentation is available at jmuehlig.github.io/perf-cpp.

See also: Examples | Changelog

System Requirements

  • Clang / GCC with support for C++17 features.
  • CMake version 3.10 or higher.
  • Linux kernel 4.0 or newer (note that some features need a newer kernel).
  • perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the perf paranoid documentation).
  • Python 3, if you make use of processor-specific hardware event generation.

Contribute and Contact

Contributions and feedback are welcome. For feature requests or bug reports, please use the issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.


Further PMU-related Projects

Other profiling tools:

Resources about (Perf-) Profiling

Papers and articles about profiling (feel free to add your own via pull request):

Academic Papers

Blog Posts