Description
Feature request
The idea (discussed in the Discord server with @lhoestq) is to have a Pdf type like Image/Audio/Video. For example, the recently added Video type defines how to decode a video file, stored as a dictionary like {"path": ..., "bytes": ...}, into a VideoReader using decord. We want to do the same for PDFs and decode them into a pypdfium2.PdfDocument.
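To make the proposal concrete, here is a rough sketch of what the decoding step could look like. It is not the actual datasets implementation: the helper names resolve_pdf_source and decode_pdf are hypothetical, and preferring "bytes" over "path" when both are set is an assumption. The only pypdfium2 call used is the PdfDocument constructor, which accepts either a file path or raw bytes.

```python
from typing import Optional, Union


def resolve_pdf_source(value: dict) -> Union[bytes, str]:
    """Pick what to hand to the PDF reader from a {"path": ..., "bytes": ...} dict.

    Hypothetical helper: in-memory bytes win over the path (an assumption).
    """
    data: Optional[bytes] = value.get("bytes")
    path: Optional[str] = value.get("path")
    if data is not None:
        return data
    if path is not None:
        return path
    raise ValueError("Expected at least one of 'path' or 'bytes' to be set.")


def decode_pdf(value: dict):
    """Decode the stored dict into a pypdfium2.PdfDocument (hypothetical helper)."""
    # Imported lazily so pypdfium2 is only required when a PDF is actually decoded.
    import pypdfium2 as pdfium

    return pdfium.PdfDocument(resolve_pdf_source(value))
```

With pypdfium2 installed, decode_pdf({"path": "report.pdf", "bytes": None}) would open the (hypothetical) file report.pdf, and len() on the result gives the page count.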
Motivation
In many cases PDFs contain very valuable information beyond text (e.g. images, figures). Support for PDFs would help create datasets where all the information is preserved.
Your contribution
I can start the implementation of the Pdf type :)