Optimize ZSTD_count with RISC-V Vector (RVV) intrinsics #4629

Open
Polaris-911 wants to merge 1 commit into facebook:dev from Polaris-911:pr2

@Polaris-911
Contributor

Description

This PR introduces RISC-V Vector (RVV) intrinsics to optimize the ZSTD_count function in zstd_compress_internal.h.

ZSTD_count is called very frequently during the match-finding phase. The original scalar implementation compares data one machine word (sizeof(size_t) bytes) at a time and needs a separate scalar path to handle the remaining tail bytes safely.
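For comparison, the word-at-a-time scalar approach described above can be sketched roughly as follows. This is not zstd's code: `scalar_count_sketch` and `read_word` are hypothetical names, `memcpy` stands in for `MEM_readST`, and the mismatching word is resolved byte by byte to stay endian-neutral, whereas zstd uses a bit scan (`ZSTD_NbCommonBytes`). Like `ZSTD_count`, it assumes `match` has at least as many readable bytes as `[in, in_limit)`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an unaligned native-word load (zstd uses
 * MEM_readST); memcpy avoids alignment and strict-aliasing problems. */
static size_t read_word(const unsigned char *p)
{
    size_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Sketch of a word-at-a-time match counter in the spirit of the scalar
 * ZSTD_count. Assumes match has at least (in_limit - in) readable bytes. */
static size_t scalar_count_sketch(const unsigned char *in,
                                  const unsigned char *match,
                                  const unsigned char *in_limit)
{
    const unsigned char *const start = in;
    assert(in <= in_limit);

    /* Main loop: compare whole words while a full word remains. */
    while ((size_t)(in_limit - in) >= sizeof(size_t)) {
        if (read_word(in) != read_word(match))
            break; /* the first mismatch lies inside this word */
        in += sizeof(size_t);
        match += sizeof(size_t);
    }

    /* Separate scalar path: covers both the final partial word (tail)
     * and locating the first differing byte inside a mismatching word. */
    while (in < in_limit && *in == *match) {
        in++;
        match++;
    }
    return (size_t)(in - start);
}
```

The second loop is exactly the extra tail handling that the RVV version aims to fold into its main loop.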

Using RVV intrinsics (starting with __riscv_vsetvl_e8m1) makes the loop much simpler. vsetvl takes the number of remaining bytes as the application vector length (AVL) and returns the vector length (vl) to process in this iteration, which never exceeds the AVL. The final iteration therefore covers just the tail bytes, with no extra branching or scalar tail-handling code, and never reads out of bounds.
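The tail-absorbing behaviour of vsetvl can be illustrated without RVV hardware by emulating the returned vl. This is a minimal sketch assuming the common policy vl = min(AVL, VLMAX) and a hypothetical VLMAX passed in as a parameter; `emulated_vsetvl` and `strip_mine_lengths` are illustrative names, not intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates the vl returned by vsetvl for a given AVL, ASSUMING the common
 * policy vl = min(AVL, VLMAX). The RVV spec also lets an implementation
 * split the last two strips more evenly when AVL < 2 * VLMAX; a loop that
 * only ever advances by the returned vl is correct under either policy. */
static size_t emulated_vsetvl(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Records the vl of each strip when processing n bytes with a hypothetical
 * VLMAX (storing at most cap entries) and returns the number of strips.
 * The last strip simply gets a shorter vl; no scalar tail loop is needed. */
static size_t strip_mine_lengths(size_t n, size_t vlmax, size_t *vls, size_t cap)
{
    size_t strips = 0;
    size_t done = 0;
    assert(vlmax > 0);
    while (done < n) {
        size_t vl = emulated_vsetvl(n - done, vlmax); /* AVL = bytes left */
        assert(vl > 0 && vl <= n - done);             /* never past the end */
        if (strips < cap)
            vls[strips] = vl;
        strips++;
        done += vl;
    }
    return strips;
}
```

For example, 100 bytes with a VLMAX of 16 give six strips of 16 bytes followed by one strip of 4.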

To maximize performance, the initial scalar fast path (the MEM_readST word comparison) is preserved so that early mismatches, which cover the vast majority of cases, are caught quickly; the RVV loop is entered only for longer matches.
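A minimal sketch of this hybrid scheme follows, assuming a toolchain that defines `__riscv_vector` and provides the ratified `__riscv_`-prefixed intrinsics (`__riscv_vle8_v_u8m1`, `__riscv_vmsne_vv_u8m1_b8`, `__riscv_vfirst_m_b8`). It is not the code in this PR: `hybrid_count_sketch` and `first_mismatch` are hypothetical names, and on non-RVV targets a plain byte loop stands in for the vector loop so the sketch still compiles and runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical helper: index of the first differing byte in the first n
 * bytes of a and b, or n if all of them match. */
static size_t first_mismatch(const unsigned char *a, const unsigned char *b,
                             size_t n)
{
#if defined(__riscv_vector)
    size_t done = 0;
    while (done < n) {
        /* AVL = bytes still to compare; vl <= AVL, so the tail needs no
         * special case and nothing past a + n or b + n is loaded. */
        size_t vl = __riscv_vsetvl_e8m1(n - done);
        vuint8m1_t va = __riscv_vle8_v_u8m1(a + done, vl);
        vuint8m1_t vb = __riscv_vle8_v_u8m1(b + done, vl);
        vbool8_t ne = __riscv_vmsne_vv_u8m1_b8(va, vb, vl); /* differing lanes */
        long first = __riscv_vfirst_m_b8(ne, vl);            /* -1 if none set */
        if (first >= 0)
            return done + (size_t)first;
        done += vl;
    }
    return n;
#else
    /* Portable stand-in for the vector loop on non-RVV targets. */
    size_t i = 0;
    while (i < n && a[i] == b[i])
        i++;
    return i;
#endif
}

/* Sketch of the hybrid: one scalar word compare as a fast path, then the
 * (vector) loop for longer matches. Assumes match has at least
 * (in_limit - in) readable bytes, as ZSTD_count does. */
static size_t hybrid_count_sketch(const unsigned char *in,
                                  const unsigned char *match,
                                  const unsigned char *in_limit)
{
    assert(in <= in_limit);
    size_t const len = (size_t)(in_limit - in);

    if (len >= sizeof(size_t)) {
        size_t wi, wm;
        memcpy(&wi, in, sizeof wi); /* stand-in for MEM_readST */
        memcpy(&wm, match, sizeof wm);
        if (wi != wm) {
            /* Early mismatch inside the first word; zstd locates the byte
             * with a bit scan (ZSTD_NbCommonBytes) on wi ^ wm instead. */
            size_t i = 0;
            while (in[i] == match[i])
                i++;
            return i;
        }
        return sizeof(size_t) + first_mismatch(in + sizeof(size_t),
                                                match + sizeof(size_t),
                                                len - sizeof(size_t));
    }
    return first_mismatch(in, match, len);
}
```

Combining vmsne with vfirst yields the index of the first differing byte directly from the comparison mask, so once a strip contains a mismatch no separate reduction or scalar rescan is needed.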

Performance Evaluation

Test Environment:

  • CPU: RISC-V r2044
  • Dataset: silesia.tar (211,957,760 bytes)
  • Command: numactl -l -C 12 ./zstd -b1 -e1 ../silesia.tar

Benchmark Results

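Compression speed at level 1 (MB/s, higher is better):

| Implementation | Run 1 (MB/s) | Run 2 (MB/s) | Run 3 (MB/s) | Average (MB/s) | Speedup |
|---|---|---|---|---|---|
| Scalar (before PR) | 85.2 | 85.5 | 85.2 | 85.30 | baseline |
| RVV (after PR) | 91.9 | 91.4 | 91.7 | 91.67 | +7.47% |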
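As a quick check, the averages and the speedup in the table follow from the per-run figures (the speedup is computed from the rounded averages):

$$
\bar{s}_{\text{scalar}} = \frac{85.2 + 85.5 + 85.2}{3} = 85.30, \qquad
\bar{s}_{\text{RVV}} = \frac{91.9 + 91.4 + 91.7}{3} \approx 91.67, \qquad
\frac{91.67}{85.30} \approx 1.0747 \;\Rightarrow\; +7.47\%
$$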

Before PR (scalar): [benchmark output screenshot]

After PR (RVV optimized): [benchmark output screenshot]

meta-cla bot added the CLA Signed label on Mar 25, 2026