Issue 2790/xlsx stream missing worksheets #2791

Open
wants to merge 2 commits into
base: master

LarryKen
@LarryKen LarryKen commented Jul 7, 2024

Summary

Provides an integration test case for issue 2790 as well as a possible solution.

The xlsx workbook reader in the stream namespace does not correctly parse workbooks with a large number of worksheets (more than 100). The cause is the way it unzips the file: it reads the entries from the unzipper parser but does not drain each entry, which causes the node-unzipper stream to stall. Specifically, the iterateStream util does not call autodrain on each entry when reading the unzipper stream.
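To illustrate the draining problem described above, here is a minimal sketch using only Node's built-in streams. It is not the code in this PR. `makeEntry` and `readWorksheetEntries` are hypothetical names, and plain `Readable` streams stand in for zip entries. With node-unzipper itself, calling `entry.autodrain()` is the documented way to discard an entry you do not want to read.

```javascript
const { Readable } = require('node:stream');
const { once } = require('node:events');

// Hypothetical stand-in for a zip entry: a readable stream with a `path`.
function makeEntry(path, chunks) {
  const entry = Readable.from(chunks);
  entry.path = path;
  return entry;
}

// Read the worksheet entries and drain every other entry.
// An entry that is neither read nor drained keeps its data buffered,
// which is how a parent stream can end up stalled.
async function readWorksheetEntries(entries) {
  const seen = [];
  for (const entry of entries) {
    if (/^xl\/worksheets\/sheet\d+\.xml$/.test(entry.path)) {
      let size = 0;
      for await (const chunk of entry) size += chunk.length;
      seen.push({ path: entry.path, size });
    } else {
      // Same idea as unzipper's entry.autodrain(): discard the data.
      entry.resume();
      await once(entry, 'end');
    }
  }
  return seen;
}
```

The point of the sketch is only that every entry is either fully consumed or explicitly discarded before moving on to the next one.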

Test plan

This is the test xlsx file that has 188 worksheets.
WORKBOOK_WITH_188_SHEETS.xlsx

I've created a test case that will read the file and check whether the count of worksheets emitted is equal to the count of worksheets in the file.
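The shape of that check can be sketched roughly as follows. This is not the actual test added in this PR. `countEmitted` and `fakeReader` are hypothetical names, and the async generator stands in for the workbook reader, which can be iterated with `for await` in the same way.

```javascript
// Count how many items an async iterable yields.
async function countEmitted(iterable) {
  let count = 0;
  for await (const _item of iterable) count += 1;
  return count;
}

// Hypothetical stand-in for a reader that emits `n` worksheets.
async function* fakeReader(n) {
  for (let i = 1; i <= n; i += 1) {
    yield { name: `Sheet${i}` };
  }
}
```

In a real test the count returned for the reader would be compared against the known number of worksheets in the file, 188 here.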

Running npm run test:integration gives 194 tests passing and 2 failing (the 2 failing tests also fail on master).

image

Related to source code (for typings update)

@elouie99
elouie99 commented Sep 17, 2024

Are there any guarantees that the workbookReader returns worksheets in the order they appear in the workbook? I ran a test and it seems the order is not guaranteed. The total number of worksheets returned appears correct.

In addition, each worksheet doesn't appear to return the correct row data when I iterate through its rows and add them to a new worksheet. If you add some sample rows to a few worksheets in the workbook and run this code, it should reproduce the problem.

const ExcelJS = require('exceljs');

async function printWorkbook() {
    const outputWorkbook = new ExcelJS.Workbook();
    const combinedWorksheet = outputWorkbook.addWorksheet('Test worksheet');
    const filePath = 'WORKBOOK_WITH_188_SHEETS.xlsx';
    // Stream the workbook, emitting each worksheet as it is parsed.
    const workbookReader = new ExcelJS.stream.xlsx.WorkbookReader(filePath, {
        worksheets: 'emit',
    });

    for await (const worksheet of workbookReader) {
        console.log('===> worksheet:', worksheet.name);
        // Copy every row of every worksheet into the single combined sheet.
        for await (const row of worksheet) {
            combinedWorksheet.addRow(row.values).commit();
        }
    }
    await outputWorkbook.xlsx.writeFile('combined.xlsx');
}

printWorkbook().catch(console.error);

@LarryKen
Author

There is no guarantee that the worksheets are returned in the order they appear in the workbook. I believe the order is effectively determined by how unzipper extracts the xlsx, and that the order on this branch is the same as on master.

As for the rows not returning the correct data, I don't think that issue was introduced by my changes. I can reproduce the same results on master as on my branch.
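If a caller needs a particular worksheet order despite that, one possible workaround is to buffer the rows per worksheet and reorder them afterwards. The sketch below assumes the desired order of names is already known from somewhere else. `collectInOrder` and `fakeSheet` are hypothetical names, and the fake sheets stand in for the worksheets emitted by the reader. Buffering gives up the memory benefit of streaming, so this only suits workbooks that fit in memory.

```javascript
// Hypothetical stand-in for a streamed worksheet: a name plus async-iterable rows.
function fakeSheet(name, rows) {
  return {
    name,
    async *[Symbol.asyncIterator]() {
      yield* rows;
    },
  };
}

// Buffer the rows of each worksheet, then return them in the requested order.
// Names in `orderedNames` that were never emitted are skipped.
async function collectInOrder(worksheets, orderedNames) {
  const rowsByName = new Map();
  for await (const worksheet of worksheets) {
    const rows = [];
    for await (const row of worksheet) rows.push(row);
    rowsByName.set(worksheet.name, rows);
  }
  return orderedNames
    .filter((name) => rowsByName.has(name))
    .map((name) => ({ name, rows: rowsByName.get(name) }));
}
```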
