Skip to content

IlyaGrebnov/libcubwt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

libcubwt

The libcubwt is a library for fast (see Benchmarks below) GPU-based Burrows-Wheeler transform construction and inversion on the CUDA platform using prefix doubling + Skew/DC3 approach as described in the following papers:

  • Vitaly Osipov, Parallel Suffix Array Construction for Shared Memory Architectures, 2012
  • Leyuan Wang, Sean Baxter, and John D. Owens Fast Parallel Suffix Array on the GPU, 2015
  • Florian BĂĽren, Daniel JĂĽnger, Robin Kobus, Christian Hundt, Bertil Schmidt Suffix Array Construction on Multi-GPU Systems, 2019

Copyright (c) 2022-2025 Ilya Grebnov ilya.grebnov@gmail.com

Introduction

The libcubwt provides simple API to construct Burrows-Wheeler transform from a given string over constant-size alphabet using 20.5n bytes of GPU memory.

  • The libcubwt works with any CUDA compatible GPU, but I recommend SM 8.9+ Ada Lovelace (CUDA 11.8 and later) due to very large L2 cache.
  • The libcubwt is sensitive to fast GPU memory and might not be suitable for some workloads. Please benchmark yourself.

Suffix array computation

  • Starting with version 1.5.0, libcubwt algorithm was changed and no longer supports the computation of suffix arrays or inverse suffix arrays. To compute these arrays, please consider using a version prior to 1.5.0.

License

The libcubwt is released under the Apache License Version 2.0 and is considered suitable for production use. However, no warranty or fitness for a particular purpose is expressed or implied.

Changes

  • May 5, 2025 (1.6.2)
    • Support for CUDA Toolkit 12.9 & CCCL 3.0.
  • January 24, 2025 (1.6.1)
    • Support for Blackwell architecture.
  • January 24, 2024 (1.6.0)
    • Inverse Burrows-Wheeler transform.
  • March 24, 2023 (1.5.0)
    • Reduced memory usage and improved performance.
  • February 10, 2023 (1.0.0)
    • Initial public release of the libcubwt.

Examples of APIs (see libcubwt.cuh for complete APIs list)

    /**
    * Allocates storage on the CUDA device that allows reusing allocated memory with each libcubwt operation.
    * @param max_length The maximum length of string to support.
    * @return LIBCUBWT_NO_ERROR if no error occurred, libcubwt error code otherwise.
    */
    int64_t libcubwt_allocate_device_storage(void ** device_storage, int64_t max_length);

    /**
    * Destroys the previously allocated storage on the CUDA device.
    * @param device_storage The previously allocated storage on the CUDA device.
    * @return LIBCUBWT_NO_ERROR if no error occurred, libcubwt error code otherwise.
    */
    int64_t libcubwt_free_device_storage(void * device_storage);

    /**
    * Constructs the Burrows-Wheeler Transform (BWT) of a given string.
    * @param device_storage The previously allocated storage on the CUDA device.
    * @param T [0..n-1] The input string.
    * @param L [0..n-1] The output string (can be T).
    * @param n The length of the input string.
    * @return The primary index if no error occurred, libcubwt error code otherwise.
    */
    int64_t libcubwt_bwt(void * device_storage, const uint8_t * T, uint8_t * L, int64_t n);

    /**
    * Reconstructs the original string from a given burrows-wheeler transformed string (BWT) with primary index.
    * @param device_storage The previously allocated storage on the CUDA device.
    * @param T [0..n-1] The input string.
    * @param U [0..n-1] The output string (can be T).
    * @param n The length of the given string.
    * @param freq [0..255] The input symbol frequency table (can be NULL).
    * @param i The primary index.
    * @return LIBCUBWT_NO_ERROR if no error occurred, libcubwt error code otherwise.
    */
    int64_t libcubwt_unbwt(void * device_storage, const uint8_t * T, uint8_t * U, int64_t n, const int32_t * freq, int32_t i);

Benchmarks

Methodology

  • Input files were capped at 352MB due to GPU memory limit.
  • All timings exclude initialization and memory allocations. However, the time for data transfer in and out of GPU is included.
  • The timings are minimum of five runs measuring multi-threading performance of Burrows-Wheeler transform construction.

Specification (x86-64 architecture)

  • OS: Microsoft Windows 10 Pro 64-Bit
  • MB: MSI MPG Z390M GAMING EDGE AC, PCIe 3.0 x16
  • CPU: Intel Core i7-9700K Processor (12M Cache, 5GHz all cores)
  • RAM: 2x8 GB dual-channel DDR4 (4133 MHz, 17-17-17-37)
  • GPU: ASUS TUF Gaming GeForce RTX 4070 Ti 12GB GDDR6X OC Edition
  • Compiler: Microsoft Visual C++ compiler v19.34.31937 for x64
  • Optimizations (CPU): /MD /DNDEBUG /O2 /GL /arch:AVX2 /openmp
  • Optimizations (GPU): -arch=native --use_fast_math

Silesia Corpus

file size libcubwt 1.5.0 (GPU) libsais 2.7.1 (CPU) divsufsort 2.0.2 (CPU)
dickens 10192446 0.012 sec ( 844.66 MB/s) 0.115 sec ( 88.27 MB/s) 0.317 sec ( 32.12 MB/s)
mozilla 51220480 0.107 sec ( 479.70 MB/s) 0.722 sec ( 70.91 MB/s) 1.296 sec ( 39.51 MB/s)
mr 9970564 0.044 sec ( 228.12 MB/s) 0.124 sec ( 80.11 MB/s) 0.283 sec ( 35.21 MB/s)
nci 33553445 0.097 sec ( 347.06 MB/s) 0.378 sec ( 88.67 MB/s) 1.141 sec ( 29.40 MB/s)
ooffice 6152192 0.007 sec ( 924.93 MB/s) 0.085 sec ( 72.66 MB/s) 0.117 sec ( 52.59 MB/s)
osdb 10085684 0.012 sec ( 841.83 MB/s) 0.152 sec ( 66.35 MB/s) 0.299 sec ( 33.72 MB/s)
reymont 6627202 0.008 sec ( 834.20 MB/s) 0.077 sec ( 86.48 MB/s) 0.199 sec ( 33.23 MB/s)
samba 21606400 0.044 sec ( 488.45 MB/s) 0.277 sec ( 77.87 MB/s) 0.502 sec ( 43.04 MB/s)
sao 7251944 0.006 sec (1275.49 MB/s) 0.145 sec ( 50.11 MB/s) 0.158 sec ( 46.04 MB/s)
webster 41458703 0.065 sec ( 640.98 MB/s) 0.603 sec ( 68.71 MB/s) 1.768 sec ( 23.46 MB/s)
x-ray 8474240 0.007 sec (1297.70 MB/s) 0.149 sec ( 56.84 MB/s) 0.198 sec ( 42.90 MB/s)
xml 5345280 0.009 sec ( 594.05 MB/s) 0.054 sec ( 98.87 MB/s) 0.117 sec ( 45.77 MB/s)

Large Canterbury Corpus

file size libcubwt 1.5.0 (GPU) libsais 2.7.1 (CPU) divsufsort 2.0.2 (CPU)
bible.txt 4047392 0.004 sec (1063.62 MB/s) 0.042 sec ( 97.15 MB/s) 0.106 sec ( 38.16 MB/s)
E.coli 4638690 0.006 sec ( 771.33 MB/s) 0.040 sec ( 116.30 MB/s) 0.166 sec ( 27.98 MB/s)
world192.txt 2473400 0.003 sec ( 885.03 MB/s) 0.027 sec ( 90.86 MB/s) 0.053 sec ( 46.41 MB/s)

Maximum Compression Corpus

file size libcubwt 1.5.0 (GPU) libsais 2.7.1 (CPU) divsufsort 2.0.2 (CPU)
A10.jpg 842468 0.001 sec (1338.95 MB/s) 0.020 sec ( 42.90 MB/s) 0.018 sec ( 45.71 MB/s)
AcroRd32.exe 3870784 0.007 sec ( 577.69 MB/s) 0.059 sec ( 65.08 MB/s) 0.070 sec ( 55.50 MB/s)
english.dic 465211 0.001 sec ( 567.47 MB/s) 0.007 sec ( 70.86 MB/s) 0.008 sec ( 54.87 MB/s)
FlashMX.pdf 4526946 0.004 sec (1104.21 MB/s) 0.099 sec ( 45.58 MB/s) 0.086 sec ( 52.37 MB/s)
FP.LOG 20617071 0.047 sec ( 437.20 MB/s) 0.261 sec ( 79.07 MB/s) 0.715 sec ( 28.82 MB/s)
MSO97.DLL 3782416 0.003 sec (1178.36 MB/s) 0.061 sec ( 62.34 MB/s) 0.072 sec ( 52.74 MB/s)
ohs.doc 4168192 0.006 sec ( 674.87 MB/s) 0.051 sec ( 81.72 MB/s) 0.094 sec ( 44.17 MB/s)
rafale.bmp 4149414 0.004 sec (1060.26 MB/s) 0.043 sec ( 96.65 MB/s) 0.083 sec ( 49.81 MB/s)
vcfiu.hlp 4121418 0.005 sec ( 797.61 MB/s) 0.046 sec ( 89.91 MB/s) 0.078 sec ( 52.78 MB/s)
world95.txt 2988578 0.004 sec ( 836.76 MB/s) 0.033 sec ( 91.77 MB/s) 0.067 sec ( 44.32 MB/s)

Large Text Compression Benchmark Corpus

file size libcubwt 1.5.0 (GPU) libsais 2.7.1 (CPU) divsufsort 2.0.2 (CPU)
enwik8 100000000 0.147 sec ( 681.07 MB/s) 1.633 sec ( 61.22 MB/s) 4.616 sec ( 21.67 MB/s)
enwik9 369098752 0.579 sec ( 637.83 MB/s) 6.363 sec ( 58.01 MB/s) 20.633 sec ( 17.89 MB/s)

The Gauntlet Corpus

file size libcubwt 1.5.0 (GPU) libsais 2.7.1 (CPU) divsufsort 2.0.2 (CPU)
abac 200000 0.004 sec ( 47.69 MB/s) 0.002 sec ( 89.59 MB/s) 0.002 sec ( 105.77 MB/s)
abba 10500596 0.038 sec ( 278.77 MB/s) 0.092 sec ( 114.41 MB/s) 0.494 sec ( 21.26 MB/s)
book1x20 15375420 0.076 sec ( 201.57 MB/s) 0.192 sec ( 80.14 MB/s) 0.535 sec ( 28.75 MB/s)
fib_s14930352 14930352 0.111 sec ( 134.06 MB/s) 0.181 sec ( 82.57 MB/s) 1.164 sec ( 12.83 MB/s)
fss10 12078908 0.073 sec ( 166.32 MB/s) 0.135 sec ( 89.26 MB/s) 0.885 sec ( 13.65 MB/s)
fss9 2851443 0.014 sec ( 205.39 MB/s) 0.027 sec ( 107.11 MB/s) 0.132 sec ( 21.53 MB/s)
houston 3839141 0.023 sec ( 164.68 MB/s) 0.020 sec ( 188.82 MB/s) 0.028 sec ( 138.15 MB/s)
paper5x80 956322 0.002 sec ( 403.97 MB/s) 0.010 sec ( 94.51 MB/s) 0.021 sec ( 46.35 MB/s)
test1 2097152 0.006 sec ( 358.13 MB/s) 0.023 sec ( 92.50 MB/s) 0.082 sec ( 25.46 MB/s)
test2 2097152 0.006 sec ( 356.80 MB/s) 0.022 sec ( 96.95 MB/s) 0.055 sec ( 37.91 MB/s)
test3 2097088 0.004 sec ( 474.53 MB/s) 0.035 sec ( 60.43 MB/s) 0.049 sec ( 42.93 MB/s)

Manzini Corpus

file size libcubwt 1.5.0 (GPU) libsais 2.7.1 (CPU) divsufsort 2.0.2 (CPU)
chr22.dna 34553758 0.093 sec ( 369.82 MB/s) 0.436 sec ( 79.18 MB/s) 1.837 sec ( 18.81 MB/s)
etext99 105277340 0.199 sec ( 528.25 MB/s) 1.780 sec ( 59.14 MB/s) 5.590 sec ( 18.83 MB/s)
gcc-3.0.tar 86630400 0.221 sec ( 392.32 MB/s) 1.215 sec ( 71.32 MB/s) 2.682 sec ( 32.30 MB/s)
howto 39422105 0.065 sec ( 603.01 MB/s) 0.565 sec ( 69.75 MB/s) 1.366 sec ( 28.86 MB/s)
jdk13c 69728899 0.174 sec ( 400.57 MB/s) 1.031 sec ( 67.66 MB/s) 2.734 sec ( 25.50 MB/s)
linux-2.4.5.tar 116254720 0.252 sec ( 461.40 MB/s) 1.698 sec ( 68.45 MB/s) 3.808 sec ( 30.53 MB/s)
rctail96 114711151 0.254 sec ( 451.11 MB/s) 1.864 sec ( 61.54 MB/s) 5.452 sec ( 21.04 MB/s)
rfc 116421901 0.228 sec ( 510.92 MB/s) 1.725 sec ( 67.48 MB/s) 4.510 sec ( 25.81 MB/s)
sprot34.dat 109617186 0.211 sec ( 519.63 MB/s) 1.756 sec ( 62.42 MB/s) 4.810 sec ( 22.79 MB/s)
w3c2 104201579 0.286 sec ( 364.16 MB/s) 1.603 sec ( 65.01 MB/s) 3.885 sec ( 26.82 MB/s)

Pizza & Chilli Corpus

file size libcubwt 1.5.0 (GPU) libsais 2.7.1 (CPU) divsufsort 2.0.2 (CPU)
dblp.xml 296135874 0.558 sec ( 530.99 MB/s) 4.860 sec ( 60.93 MB/s) 13.964 sec ( 21.21 MB/s)
dna 369098752 0.591 sec ( 624.79 MB/s) 5.837 sec ( 63.24 MB/s) 30.630 sec ( 12.05 MB/s)
english.1024MB 369098752 0.758 sec ( 486.98 MB/s) 6.794 sec ( 54.33 MB/s) 24.330 sec ( 15.17 MB/s)
pitches 55832855 0.097 sec ( 578.30 MB/s) 0.807 sec ( 69.19 MB/s) 1.521 sec ( 36.70 MB/s)
proteins 369098752 0.740 sec ( 499.06 MB/s) 7.085 sec ( 52.10 MB/s) 20.326 sec ( 18.16 MB/s)
sources 210866607 0.391 sec ( 539.50 MB/s) 3.254 sec ( 64.80 MB/s) 7.826 sec ( 26.94 MB/s)

Pizza & Chilli Repetitive Corpus

file size libcubwt 1.5.0 (GPU) libsais 2.7.1 (CPU) divsufsort 2.0.2 (CPU)
cere 369098752 1.788 sec ( 206.43 MB/s) 5.648 sec ( 65.35 MB/s) 24.792 sec ( 14.89 MB/s)
coreutils 205281778 0.892 sec ( 230.01 MB/s) 3.097 sec ( 66.29 MB/s) 8.828 sec ( 23.25 MB/s)
einstein.de.txt 92758441 0.547 sec ( 169.56 MB/s) 1.363 sec ( 68.03 MB/s) 4.038 sec ( 22.97 MB/s)
einstein.en.txt 369098752 2.195 sec ( 168.17 MB/s) 5.845 sec ( 63.15 MB/s) 21.941 sec ( 16.82 MB/s)
Escherichia_Coli 112689515 0.302 sec ( 372.84 MB/s) 1.695 sec ( 66.48 MB/s) 6.832 sec ( 16.49 MB/s)
influenza 154808555 0.527 sec ( 293.57 MB/s) 2.234 sec ( 69.31 MB/s) 8.538 sec ( 18.13 MB/s)
kernel 257961616 1.063 sec ( 242.78 MB/s) 3.856 sec ( 66.89 MB/s) 10.715 sec ( 24.08 MB/s)
para 369098752 1.335 sec ( 276.43 MB/s) 5.725 sec ( 64.47 MB/s) 25.627 sec ( 14.40 MB/s)
world_leaders 46968181 0.312 sec ( 150.45 MB/s) 0.457 sec ( 102.68 MB/s) 1.119 sec ( 41.98 MB/s)
dblp.xml.00001.1 104857600 0.424 sec ( 247.09 MB/s) 1.739 sec ( 60.31 MB/s) 5.610 sec ( 18.69 MB/s)
dblp.xml.00001.2 104857600 0.432 sec ( 242.87 MB/s) 1.729 sec ( 60.63 MB/s) 5.669 sec ( 18.50 MB/s)
dblp.xml.0001.1 104857600 0.371 sec ( 282.50 MB/s) 1.743 sec ( 60.17 MB/s) 5.600 sec ( 18.73 MB/s)
dblp.xml.0001.2 104857600 0.372 sec ( 282.22 MB/s) 1.729 sec ( 60.64 MB/s) 5.585 sec ( 18.77 MB/s)
dna.001.1 104857600 0.275 sec ( 381.66 MB/s) 1.571 sec ( 66.73 MB/s) 6.097 sec ( 17.20 MB/s)
english.001.2 104857600 0.284 sec ( 369.03 MB/s) 1.795 sec ( 58.43 MB/s) 4.912 sec ( 21.35 MB/s)
proteins.001.1 104857600 0.490 sec ( 214.02 MB/s) 1.898 sec ( 55.26 MB/s) 4.256 sec ( 24.64 MB/s)
sources.001.2 104857600 0.300 sec ( 349.51 MB/s) 1.717 sec ( 61.07 MB/s) 4.295 sec ( 24.42 MB/s)
fib41 267914296 2.480 sec ( 108.03 MB/s) 3.898 sec ( 68.73 MB/s) 40.566 sec ( 6.60 MB/s)
rs.13 216747218 2.100 sec ( 103.23 MB/s) 3.062 sec ( 70.79 MB/s) 34.326 sec ( 6.31 MB/s)
tm29 268435456 2.343 sec ( 114.58 MB/s) 4.134 sec ( 64.93 MB/s) 37.548 sec ( 7.15 MB/s)

About

libcubwt is a library for GPU accelerated suffix array and burrows wheeler transform construction.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy