Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

tuffstuff9/nextjs-pdf-parser

Next.js PDF Parser Template 📄🔍

nextjs-pdf-parser.mp4

Introduction

I was having some trouble parsing PDFs in Next.js, so I made this template for anyone else facing the same issues. I hope it saves you some time and trouble. It's a basic create-next-app project with PDF parsing implemented using the pdf2json library and file uploading handled by FilePond.

Installation & Setup 🚀

  1. Clone the repository:

    git clone [repository-url]

  2. Navigate to the project directory:

    cd nextjs-pdf-parser

  3. Install dependencies:

    npm install
    # or
    yarn install

  4. Windows only: In app\api\upload\route.ts on line 22, change tempFilePath to a valid path. Make sure it starts from the root drive, for example: C:/coding/nextjs-pdf-parser/public/${fileName}.pdf

  5. Run the development server:

    npm run dev
    # or
    yarn dev

    Visit http://localhost:3000 to view the application.
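
As an alternative to hard-coding a drive path in the Windows-only step, the temporary file path could be built from the operating system's temp directory. The following is a minimal sketch under that assumption, not the template's actual code: buildTempPdfPath is a hypothetical helper, and it assumes the route only needs a scratch location for the uploaded PDF.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of the template): builds a path for the
// uploaded PDF that works on Windows, macOS, and Linux.
function buildTempPdfPath(fileName: string, baseDir: string = os.tmpdir()): string {
  // Keep only the final path segment so a name containing directory parts
  // cannot point outside baseDir.
  const safeName = path.basename(fileName);
  // path.join uses the platform's separator, so no drive letter is hard-coded.
  return path.join(baseDir, `${safeName}.pdf`);
}

// Example: a scratch path inside the OS temp directory.
const tempFilePath = buildTempPdfPath("upload-1234");
console.log(tempFilePath);
```

Because os.tmpdir() typically returns an absolute path on every platform, the same line should work without a per-machine edit. Files written there are not served from public/, which is fine if the file is only read back by the parser.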

Usage 🖱

Navigate to http://localhost:3000 and use the FilePond uploader to select and upload a PDF. Once the upload completes, the content of the PDF is parsed and printed to the server console, that is, the terminal running the development server. It is not printed to the browser console.

Technical Details 🛠

  • nodeUtil is not defined Error:

    To bypass the nodeUtil is not defined error, the following configuration was added to next.config.js:

    const nextConfig = {
      experimental: {
        serverComponentsExternalPackages: ['pdf2json'],
      },
    };

    module.exports = nextConfig;

See more details here
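
The configuration above uses the experimental.serverComponentsExternalPackages key. In Next.js 15 that option was renamed to a top-level serverExternalPackages key, so on a newer Next.js version the equivalent setting might look like the sketch below. This assumes a project upgraded past the version this template was built with and a next.config.ts file, neither of which is part of the template.

```typescript
// next.config.ts (assumed file name; the template itself ships next.config.js)
// In a real project this object can be annotated with the NextConfig type
// exported by "next".
const nextConfig = {
  // Keep pdf2json out of the server bundle so it is loaded with Node's
  // own module resolution at runtime.
  serverExternalPackages: ["pdf2json"],
};

export default nextConfig;
```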

  • Blank output from pdfParser.getRawTextContent():

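    This issue is most likely due to incorrect type definitions: getRawTextContent() only returns text when the parser is constructed with its second argument (raw-text mode) enabled, and the typings may reject that call. There are two potential solutions: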

    1. Fix TypeScript definitions: Update the type definition for PDFParser.

    2. Bypass type checking: Instantiate PDFParser as shown:

      const pdfParser = new (PDFParser as any)(null, 1);

    For more details, refer to my comment on this GitHub issue.
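
To show how the parser instantiated above is typically driven, the event-based flow can be wrapped in a Promise so that server code can await the text. This is a minimal sketch rather than the template's actual code: extractText and the RawTextParser interface are hypothetical names, and the sketch assumes pdf2json's pdfParser_dataReady and pdfParser_dataError events together with its loadPDF method.

```typescript
// Minimal shape this sketch relies on (hypothetical interface; a pdf2json
// PDFParser instance is expected to satisfy it).
interface RawTextParser {
  on(event: string, listener: (payload: any) => void): unknown;
  loadPDF(filePath: string): unknown;
  getRawTextContent(): string;
}

// Resolves with the raw text once parsing finishes, or rejects with
// whatever payload the parser emits on failure.
function extractText(parser: RawTextParser, filePath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    parser.on("pdfParser_dataError", (err) => reject(err));
    parser.on("pdfParser_dataReady", () => resolve(parser.getRawTextContent()));
    // Register the listeners first, then start parsing.
    parser.loadPDF(filePath);
  });
}

// Usage inside server-side code (assumes pdf2json is installed):
//   import PDFParser from "pdf2json";
//   const parser = new (PDFParser as any)(null, 1); // second argument enables raw text
//   const text = await extractText(parser, tempFilePath);
//   console.log(text); // appears in the server console, not the browser
```

Accepting the parser as a parameter keeps the helper independent of the type-definition problem described above, since the caller decides how to construct it.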

Acknowledgements 🙏

A special thanks to the following libraries and their contributors:

  • FilePond: For providing a seamless and user-friendly file uploading experience.
  • pdf2json: For its efficient and robust PDF parsing capabilities.

License 📜

MIT License
