py-pdf/benchmarks

PDF Library Benchmarks

This benchmark is about reading native (born-digital) PDF files, not scanned documents and not documents whose text layer was produced by OCR.

Benchmarking machine

Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz

Input Documents

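| # | Name | File Size | Pages |
|---|------|-----------|-------|
| 1 | 2201.00214 | 2.4MiB | 22 |
| 2 | GeoTopo-book | 5.1MiB | 117 |
| 3 | 2201.00151 | 1.5MiB | 12 |
| 4 | 1707.09725 | 7.0MiB | 134 |
| 5 | 2201.00021 | 2.6MiB | 10 |
| 6 | 2201.00037 | 2.9MiB | 33 |
| 7 | 2201.00069 | 14.7MiB | 15 |
| 8 | 2201.00178 | 2.3MiB | 16 |
| 9 | 2201.00201 | 1.3MiB | 9 |
| 10 | 1602.06541 | 2.9MiB | 16 |
| 11 | 2201.00200 | 284.8KiB | 7 |
| 12 | 2201.00022 | 1.1MiB | 11 |
| 13 | 2201.00029 | 797.6KiB | 12 |
| 14 | 1601.03642 | 1004.9KiB | 8 |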

Libraries

| Name | Last PyPI Release | License | Version | Dependencies |
|------|-------------------|---------|---------|--------------|
| Borb | 2023-06-23 | AGPL/Commercial | 2.1.16 | |
| pypdfium2 | 2023-07-04 | Apache-2.0 or BSD-3-Clause | 4.18.0 | PDFium (Foxit/Google) |
| pdfminer.six | 2022-11-05 | MIT/X | 20221105 | |
| pdfplumber | 2023-07-29 | MIT | 0.10.2 | pdfminer.six |
| pdfrw | 2017-09-18 | MIT | 0.4 | |
| pdftotext | - | GPL | 0.86.1 | build-essential libpoppler-cpp-dev pkg-config python3-dev |
| PyMuPDF | 2023-08-24 | GNU AFFERO GPL 3.0 / Commercial | 1.23.1 | MuPDF |
| pypdf | 2023-08-26 | BSD 3-Clause | 3.15.4 | |
| Tika | 2023-01-01 | Apache v2 | 2.6.0 | Apache Tika |

Text Extraction Speed

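| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | PyMuPDF | 0.1s | 0.4s | 0.2s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s |
| 2 | pypdfium2 | 0.2s | 1.9s | 0.2s | 0.2s | 0.2s | 0.0s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s |
| 3 | pdftotext | 0.3s | 0.8s | 1.0s | 0.3s | 0.8s | 0.1s | 0.2s | 0.2s | 0.1s | 0.0s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
| 4 | Tika | 1.1s | 12.9s | 0.9s | 0.6s | 0.4s | 0.1s | 0.3s | 0.2s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
| 5 | pypdf | 2.6s | 18.7s | 4.8s | 5.3s | 2.3s | 0.7s | 0.9s | 0.4s | 0.5s | 0.3s | 0.6s | 0.5s | 0.4s | 0.4s | 0.2s |
| 6 | pdfminer.six | 4.5s | 26.0s | 12.9s | 8.0s | 4.6s | 1.3s | 2.1s | 1.0s | 1.2s | 0.8s | 1.5s | 0.9s | 0.9s | 0.6s | 0.6s |
| 7 | pdfplumber | 6.7s | 41.7s | 10.9s | 11.5s | 8.4s | 2.4s | 4.3s | 2.0s | 1.9s | 1.9s | 2.7s | 1.8s | 1.7s | 1.0s | 1.2s |
| 8 | Borb | 34.7s | 111.2s | 105.0s | 1.4s | 87.2s | 21.1s | 7.4s | 83.5s | 16.4s | 20.3s | 5.4s | 3.4s | 18.8s | 3.2s | 2.1s |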
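
This README does not show how the timings were collected. As a rough illustration only, the sketch below times a single text extraction with `time.perf_counter`. The helper names `time_extraction` and `extract_with_pypdf` are hypothetical and not taken from the benchmark's code. The pypdf calls (`PdfReader`, `reader.pages`, `page.extract_text()`) are that library's public API, and pypdf is assumed to be installed.

```python
import time
from typing import Callable


def time_extraction(extract: Callable[[str], str], path: str) -> tuple[str, float]:
    """Run extract(path) once and return (text, elapsed seconds)."""
    start = time.perf_counter()
    text = extract(path)
    elapsed = time.perf_counter() - start
    return text, elapsed


def extract_with_pypdf(path: str) -> str:
    """Extract the text of every page with pypdf (imported lazily)."""
    from pypdf import PdfReader  # third-party; assumed to be installed

    reader = PdfReader(path)
    return "\n".join(page.extract_text() for page in reader.pages)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python time_extract.py a.pdf b.pdf
    for pdf_path in sys.argv[1:]:
        text, seconds = time_extraction(extract_with_pypdf, pdf_path)
        print(f"{pdf_path}: {seconds:.1f}s, {len(text)} characters")
```

Passing the extractor as a callable keeps the timing code identical across libraries, so only the extraction function changes per row of the table. A single run is noisy; repeating each measurement and reporting the median would be a reasonable refinement.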

Image Extraction Speed

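| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | PyMuPDF | 0.5s | 0.3s | 0.5s | 0.0s | 1.7s | 0.4s | 0.0s | 3.2s | 0.4s | 0.4s | 0.1s | 0.0s | 0.3s | 0.2s | 0.0s |
| 2 | pypdf | 2.8s | 16.4s | 2.1s | 0.8s | 9.2s | 1.1s | 0.0s | 6.7s | 0.9s | 0.9s | 0.4s | 0.0s | 0.7s | 0.2s | 0.1s |
| 3 | pdfminer.six | 6.5s | 31.8s | 13.7s | 9.2s | 24.0s | 1.5s | 2.3s | 1.5s | 1.4s | 0.9s | 1.5s | 0.9s | 1.0s | 0.6s | 0.5s |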

Watermarking Speed

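| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | PyMuPDF | 0.0s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s |
| 2 | pdfrw | 0.1s | 0.0s | 0.4s | 0.0s | 0.3s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 3 | pypdf | 0.4s | 0.6s | 1.7s | 0.4s | 0.9s | 0.2s | 0.3s | 0.4s | 0.3s | 0.2s | 0.3s | 0.1s | 0.2s | 0.0s | 0.2s |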

Watermarking File Size

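| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | pdfrw | 3.4MB | 2.5MB | 5.7MB | 1.6MB | 7.3MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.1MB | 0.8MB | 1.0MB |
| 2 | pypdf | 3.5MB | 2.5MB | 5.7MB | 1.6MB | 7.3MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.1MB | 0.8MB | 1.0MB |
| 3 | PyMuPDF | 3.7MB | 2.7MB | 6.8MB | 1.7MB | 8.5MB | 2.8MB | 3.4MB | 15.5MB | 2.5MB | 1.4MB | 3.2MB | 0.3MB | 1.2MB | 0.9MB | 1.1MB |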
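
The input sizes are listed in MiB while the watermarked sizes are listed in MB, so the two cannot be compared digit for digit. The sketch below converts between them and estimates how much each output grew. It assumes that "MB" here means 10^6 bytes and "MiB" means 2^20 bytes, which this README does not state. The function names are illustrative, and the figures are the rounded values from the tables, so the results are approximate.

```python
MIB = 2**20  # bytes per mebibyte
MB = 10**6   # bytes per megabyte (assumed meaning of "MB" in the table)


def mib_to_mb(size_mib: float) -> float:
    """Convert a size in MiB to MB."""
    return size_mib * MIB / MB


def overhead_mb(input_mib: float, output_mb: float) -> float:
    """Growth of the watermarked file over its input, in MB."""
    return output_mb - mib_to_mb(input_mib)


# Document 2 (GeoTopo-book): 5.1MiB input.
# Watermarked output: 5.7MB with pypdf, 6.8MB with PyMuPDF.
input_mib = 5.1
print(f"input:   {mib_to_mb(input_mib):.1f} MB")
print(f"pypdf:   +{overhead_mb(input_mib, 5.7):.2f} MB")
print(f"PyMuPDF: +{overhead_mb(input_mib, 6.8):.2f} MB")
```

Under that assumption the 5.1MiB input is about 5.3 MB, so most of each output is the original document and only the remainder is attributable to watermarking.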

Text Extraction Quality

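| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---------|---------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|
| 1 | pypdfium2 | 98% | 99% | 97% | 94% | 99% | 98% | 96% | 99% | 98% | 99% | 99% | 98% | 98% | 99% | 99% |
| 2 | pypdf | 97% | 98% | 93% | 94% | 98% | 98% | 96% | 97% | 98% | 99% | 99% | 98% | 98% | 98% | 99% |
| 3 | PyMuPDF | 97% | 98% | 96% | 93% | 97% | 98% | 96% | 98% | 98% | 98% | 98% | 97% | 97% | 98% | 99% |
| 4 | Tika | 96% | 99% | 98% | 92% | 97% | 98% | 96% | 93% | 97% | 98% | 93% | 98% | 93% | 98% | 96% |
| 5 | pdftotext | 93% | 96% | 93% | 91% | 94% | 92% | 96% | 96% | 96% | 97% | 83% | 94% | 96% | 96% | 79% |
| 6 | pdfminer.six | 90% | 95% | 79% | 86% | 92% | 86% | 93% | 95% | 93% | 92% | 92% | 93% | 86% | 98% | 86% |
| 7 | pdfplumber | 75% | 94% | 84% | 61% | 97% | 61% | 93% | 61% | 89% | 57% | 59% | 67% | 59% | 98% | 67% |
| 8 | Borb | 45% | 70% | 79% | 0% | 40% | 48% | 92% | 0% | 64% | 51% | 41% | 55% | 43% | 0% | 53% |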
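
The Average column appears to be the arithmetic mean of the 14 per-document scores. A minimal sketch that recomputes it for four of the rows follows. The scores are copied from the table, the variable names are illustrative, and because the published values are already rounded, a recomputed average could differ from the published one by a point in other rows.

```python
from statistics import mean

# Per-document quality scores (documents 1-14), in percent, copied from the table.
scores = {
    "pypdfium2": [99, 97, 94, 99, 98, 96, 99, 98, 99, 99, 98, 98, 99, 99],
    "pypdf": [98, 93, 94, 98, 98, 96, 97, 98, 99, 99, 98, 98, 98, 99],
    "pdfplumber": [94, 84, 61, 97, 61, 93, 61, 89, 57, 59, 67, 59, 98, 67],
    "Borb": [70, 79, 0, 40, 48, 92, 0, 64, 51, 41, 55, 43, 0, 53],
}

# Mean score per library, then list libraries from best to worst.
averages = {name: mean(values) for name, values in scores.items()}
for name, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {avg:.0f}%")
```

A mean hides outliers: Borb scores 0% on three documents, which lowers its average far more than its typical per-document score would suggest, so the per-document columns are worth reading alongside the summary.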