Insights: docling-project/docling
Overview
1 Release published by 1 person
-
v2.36.1
published
Jun 4, 2025
10 Pull requests merged by 7 people
-
fix: NoneTypeError in MsPowerpointDocumentBackend
#1747 merged
Jun 10, 2025 -
feat: support xlsm files
#1520 merged
Jun 10, 2025 -
fix: prov for merged-elems
#1728 merged
Jun 10, 2025 -
fix: initialize df_osd to avoid uninitialized variable error
#1718 merged
Jun 10, 2025 -
fix: allow custom torch_dtype in vlm models
#1735 merged
Jun 10, 2025 -
docs: add open webui
#1734 merged
Jun 10, 2025 -
fix: Improve extraction from textboxes in Word docs
#1701 merged
Jun 6, 2025 -
fix: Add WEBP to the list of image file extensions
#1711 merged
Jun 5, 2025 -
fix: remove typer and click constraints
#1707 merged
Jun 4, 2025 -
docs: flash-attn usage and install
#1706 merged
Jun 4, 2025
3 Pull requests opened by 3 people
-
fix: #1469 handle list index out of range error with page_range param
#1717 opened
Jun 5, 2025 -
fix: Handle multiple formatted elements in list MD parsing
#1725 opened
Jun 6, 2025 -
feat: Make Page.parsed_page the only source of truth for text cells, add OCR cells to it
#1745 opened
Jun 10, 2025
8 Issues closed by 4 people
-
pptx parsing
#1742 closed
Jun 10, 2025 -
DoclingDocument validation fails with "Nothing to be deleted" error on list_item elements
#1741 closed
Jun 10, 2025 -
ValueError: ListItem's parent must be a list group
#1732 closed
Jun 8, 2025 -
Legitimate duplicate text in textbox in docx is being unexpectedly removed
#1668 closed
Jun 6, 2025 -
Loading Failure of Files with Chinese Filenames
#1218 closed
Jun 5, 2025 -
Controlled requests to external inference provider.
#1661 closed
Jun 5, 2025 -
Load DoclingDocument from .doctags file?
#1713 closed
Jun 5, 2025 -
How to Avoid Duplicate Table Content in Text Extraction with Docling
#1675 closed
Jun 5, 2025
21 Issues opened by 21 people
-
Documents output filled with GLYPH word
#1744 opened
Jun 10, 2025 -
Error while parsing : maximum recursion depth exceeded
#1743 opened
Jun 10, 2025 -
Docling does not work well with Vietnamese
#1740 opened
Jun 10, 2025 -
Table omission in conversion from Docling Document to markdown
#1738 opened
Jun 9, 2025 -
Words in a line break into multiple lines
#1737 opened
Jun 9, 2025 -
error running the simple convert example from the home page
#1736 opened
Jun 9, 2025 -
Add LangChain ChainVlmOptions to complete the conversion of Vlm documents with the help of LangChain
#1733 opened
Jun 8, 2025 -
Use flash attention with VLM
#1730 opened
Jun 7, 2025 -
Improve VLM API model calls
#1729 opened
Jun 7, 2025 -
New RapidOCR Version Support
#1727 opened
Jun 6, 2025 -
No CLI or docling-serve option to set truncation for tokenizer; warning cannot be suppressed
#1726 opened
Jun 6, 2025 -
KeyError: -1
#1722 opened
Jun 5, 2025 -
[Bee] Add OCR-derived word cells into SegmentedPage
#1721 opened
Jun 5, 2025 -
Layout Parser Fine-Tune?
#1719 opened
Jun 5, 2025 -
Error while parsing a .csv file
#1716 opened
Jun 5, 2025 -
Conversion seems slow, while GPU is underutilized - num_threads seem to have absolutely no effect
#1715 opened
Jun 5, 2025 -
It says "Input pdf "filename.pdf" is not valid" when run as an .exe
#1714 opened
Jun 5, 2025 -
Severe Markdown Conversion Issues: Cropped/Missing Images, Mixed Layout
#1712 opened
Jun 5, 2025 -
CLI exporting of images from Excel and PowerPoint files fails
#1710 opened
Jun 4, 2025 -
Add Comparison Chart to Top 8 Alternatives
#1709 opened
Jun 4, 2025 -
Bug: Miss first list item when new list begins
#1705 opened
Jun 4, 2025
29 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
fix: pptx line break and space handling
#1664 commented on
Jun 10, 2025 • 4 new comments -
fix: pptx shape order
#1658 commented on
Jun 10, 2025 • 0 new comments -
fix(msword_backend): Identify text in the same line after an image #1425
#1610 commented on
Jun 10, 2025 • 0 new comments -
fix: Capturing of pptx images following the docx backend
#1328 commented on
Jun 7, 2025 • 0 new comments -
Getting word-level bounding boxes from DocumentConverter
#1704 commented on
Jun 10, 2025 • 0 new comments -
Performance Issue. 28.1% less inference time on demo case with a simple change.
#1521 commented on
Jun 10, 2025 • 0 new comments -
Poor performance on pages with many elements
#1624 commented on
Jun 10, 2025 • 0 new comments -
How do I add GPU parallelization for the do_formula_enrichment model?
#1693 commented on
Jun 10, 2025 • 0 new comments -
Inconsistent Markdown Output with generate_multimodal_pages Method in Docling
#1584 commented on
Jun 10, 2025 • 0 new comments -
Bugs new Docling release
#1703 commented on
Jun 9, 2025 • 0 new comments -
Embedded Image data was not loaded correctly when opening a MD file
#1305 commented on
Jun 9, 2025 • 0 new comments -
CUDA support
#1649 commented on
Jun 9, 2025 • 0 new comments -
Issue: EasyOCR model cannot be accessed from Spark workers when using pre-downloaded docling model
#1414 commented on
Jun 8, 2025 • 0 new comments -
Refined layout parsing
#1614 commented on
Jun 8, 2025 • 0 new comments -
Downloading detection model, please wait. This may take several minutes depending upon your network connection.
#1638 commented on
Jun 7, 2025 • 0 new comments -
Export to markdown does not save image references
#1574 commented on
Jun 6, 2025 • 0 new comments -
Footer Text Interferes with Main PDF Content During Parsing
#1625 commented on
Jun 6, 2025 • 0 new comments -
Stride - chunk overlap
#1686 commented on
Jun 5, 2025 • 0 new comments -
As multiple PDFs are parsed, the memory of Docling continues to increase and will not decrease.
#1311 commented on
Jun 5, 2025 • 0 new comments -
Incorrectly populated ProvenanceItem in ReadingOrderModel
#1699 commented on
Jun 5, 2025 • 0 new comments -
Runtime Error since v.2.34.0 related to OSD detection
#1657 commented on
Jun 5, 2025 • 0 new comments -
Memory leak caused by EasyOCR
#1343 commented on
Jun 5, 2025 • 0 new comments -
Multi-threading and multi-processing for faster parsing
#1256 commented on
Jun 5, 2025 • 0 new comments -
Incorrect Table Columns
#1678 commented on
Jun 5, 2025 • 0 new comments -
Export to markdown only contains H2 headers
#1023 commented on
Jun 4, 2025 • 0 new comments -
html parser KeyError: -1
#1702 commented on
Jun 4, 2025 • 0 new comments -
Conversion of the document contains false positive classified tables
#1680 commented on
Jun 4, 2025 • 0 new comments -
KeyError and RuntimeError occurred when opening a document(docx)
#1645 commented on
Jun 4, 2025 • 0 new comments -
Docling getting killed when I feed bigger pdf files which have 900+ pages
#1654 commented on
Jun 4, 2025 • 0 new comments