StabRise

Document processing solutions

Hi there 👋

StabRise - Document Processing Solutions

Our projects

PDF DataSource for Apache Spark


Source Code: https://github.com/StabRise/spark-pdf

Home page: https://stabrise.com/spark-pdf/

Quick Start Jupyter Notebook: https://github.com/StabRise/spark-pdf/blob/main/examples/PdfDataSource.ipynb


The project provides a custom data source for Apache Spark that lets you read PDF files into a Spark DataFrame.

Key features:

  • Reads PDF documents into a Spark DataFrame
  • Supports reading PDF files lazily, page by page
  • Supports large files, up to 10k pages
  • Supports scanned PDF files (via OCR)
  • No need to install Tesseract OCR; it is included in the package
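
To illustrate what lazy, per-page reading can mean for a large document, here is a minimal sketch in plain Python. It is not the spark-pdf implementation or its API. The function name `page_partitions` and the parameter `pages_per_partition` are hypothetical, chosen only for this example. The idea is that a reader hands out small groups of page indexes on demand, so a 10k-page file never has to be materialized at once.

```python
from typing import Iterator, List


def page_partitions(num_pages: int, pages_per_partition: int) -> Iterator[List[int]]:
    """Lazily yield lists of zero-based page indexes, one list per partition.

    Hypothetical helper for illustration only; not part of spark-pdf.
    """
    if pages_per_partition < 1:
        raise ValueError("pages_per_partition must be at least 1")
    for start in range(0, num_pages, pages_per_partition):
        # Only page indexes are produced here. In a real reader, the page
        # content would be decoded later, when a worker processes the group.
        yield list(range(start, min(start + pages_per_partition, num_pages)))


# Nothing is computed until a partition is requested.
partitions = page_partitions(num_pages=10_000, pages_per_partition=2)
first = next(partitions)
print(first)
```

Because the function is a generator, requesting the first group does not enumerate the remaining pages. A distributed reader could follow the same pattern by assigning each group to a separate task.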

ScaleDP


Source Code: https://github.com/StabRise/scaledp

Home page: https://stabrise.com/scaledp/

Quick Start Jupyter Notebook: https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb


ScaleDP is an Open-Source Library for processing documents using Apache Spark.

Key features:

  • Load PDF documents and images
  • Extract text from PDF documents and images
  • Extract images from PDF documents
  • Run OCR on images and PDF documents
  • Run NER on text extracted from PDF documents and images
  • Visualize NER results

PDF Redaction

pdf-redaction

Home page: https://pdf-redaction.com

Free AI-powered online tool for redacting PDF files (removing sensitive information).

Pinned

  1. spark-pdf Public

    PDF DataSource for Apache Spark

    Scala 44 3

  2. ScaleDP Public

    ScaleDP is an Open-Source extension of Apache Spark for Document Processing

    Python 6

  3. ScaleDP-Tutorials Public

    Tutorials for ScaleDP library. ScaleDP is an Open-Source Library for Processing Documents in Apache Spark.

    Jupyter Notebook 3

Repositories

Showing 6 of 6 repositories
  • ScaleDP Public

    ScaleDP is an Open-Source extension of Apache Spark for Document Processing

    Python 6 AGPL-3.0 0 7 0 Updated Mar 17, 2025
  • ScaleDP-Tutorials Public

    Tutorials for ScaleDP library. ScaleDP is an Open-Source Library for Processing Documents in Apache Spark.

    Jupyter Notebook 3 AGPL-3.0 0 0 0 Updated Mar 17, 2025
  • stabrise Public

    Stabrise - Document Processing Solutions

    TypeScript 2 0 0 0 Updated Mar 17, 2025
  • .github Public

    Document processing solutions

    1 0 0 0 Updated Mar 9, 2025
  • spark-pdf Public

    PDF DataSource for Apache Spark

    Scala 44 AGPL-3.0 3 4 1 Updated Mar 6, 2025
  • De-Identify Public

    Data De-Identification Tools

    Jupyter Notebook 1 AGPL-3.0 0 0 0 Updated Jan 3, 2025
