@DS4SD

IBM Deep Search

Developer tools for IBM Deep Search

Welcome to our OSS organization for document processing

The DS4SD organization is the home of the open-source projects of the AI for Knowledge group at IBM Research Europe - Zurich.

Docling

Docling is our main open-source package. It simplifies document processing by parsing diverse formats, including advanced PDF understanding, and integrates with the gen AI ecosystem.

We support an amazing community which helps us drive forward the adoption of Docling. Give it a try and join the community!
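For readers who want to try Docling right away, a minimal sketch of a conversion is shown here. It assumes the docling package has been installed from PyPI (for example with pip install docling) and that the first run may need to download model files. The helper name convert_to_markdown and the output file name output.md are illustrative choices, not part of Docling.

```python
import sys
from pathlib import Path


def convert_to_markdown(source: str) -> str:
    """Convert a document (local path or URL) to Markdown using Docling."""
    # Imported lazily so this file still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # parses the document and runs the pipeline
    return result.document.export_to_markdown()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Illustrative usage: python convert.py report.pdf
    markdown = convert_to_markdown(sys.argv[1])
    Path("output.md").write_text(markdown, encoding="utf-8")
```

The converted result exposes a DoclingDocument, whose types and serializers live in docling-core, so other export formats follow the same pattern.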



The key repositories of Docling are:

  • docling - The home of the main docling package.
  • docling-core - The definition of types, transforms, serializers, etc. If it has to do with the DoclingDocument you will find it here.
  • docling-parse - The backend PDF parser used by Docling.
  • docling-serve - The FastAPI wrappers for running Docling as a REST API and distributing large jobs.
  • docling-ibm-models - The AI models powering Docling.

Deep Search

Deep Search leverages the output of Docling to Interpret, Index and Integrate the knowledge encoded in your documents. It offers a seamless chat interface for interacting with its RAG backend and navigating your data collections.

Deep Search is a service that provides programmatic access, both for easy integration with other tools and for bulk conversion. Our Python toolkit offers this functionality both as a client and as a library. Our examples repository is a good place to get started.

PatCID

PatCID is a collection of chemical structures in patent documents to facilitate search of patent documents in the organic-chemistry domain. Programmatic access to PatCID can facilitate discovery of molecules. This collection was created by processing molecular-structure images in United States Patent and Trademark Office, Japan Patent Office, European Patent Office, Korean Intellectual Property Office, and China National Intellectual Property Administration patent documents.

The key repositories of the PatCID tools are:

  • PatCID - Examples and demonstrators of PatCID.
  • MolGrapher - The graph-based visual recognition of chemical structures leveraged when building the PatCID database.
  • deepsearch-toolkit - The programmatic toolkit for interacting with the database and performing chemistry searches.

Publications

Find here our extensive list of publications!

IBM ❤️ Open Source AI

All our projects are brought to you by IBM.

Pinned

  1. deepsearch-toolkit (Public)

    Interact with the Deep Search platform for new knowledge explorations and discoveries

    Python 205 27

  2. deepsearch-examples (Public)

    Examples using the Deep Search functionalities

    Python 80 21

  3. DocLayNet (Public)

    DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

    346 19

