Issue 2790/xlsx stream missing worksheets #2791

LarryKen · 2024-07-07T21:13:07Z

Summary

Provides an integration test case for issue 2790 as well as a possible solution.

The xlsx workbook reader in the stream namespace does not correctly parse workbooks with a large number of worksheets (> 100). The cause is the way it unzips the file: it reads the entries from the unzipper parser but does not drain each entry, which halts the node-unzipper stream. Specifically, the iterateStream util does not call autodrain on each entry when reading the unzipper stream.
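The stall described above can be reproduced without a real zip file. The following is a minimal sketch using only Node's built-in streams, not the code changed in this PR: `fakeZipEntries` and `countWorksheets` are hypothetical names, and the generator only imitates a parser that will not emit the next entry until the current one has been fully read. With node-unzipper, the corresponding call on a real entry is `entry.autodrain()`.

```javascript
const { Readable } = require('node:stream');
const { once } = require('node:events');

// Hypothetical stand-in for a zip parser: it hands out one entry stream at a
// time and does not continue until that entry has been read to the end.
async function* fakeZipEntries(paths) {
  for (const path of paths) {
    const entry = Readable.from([`contents of ${path}`]);
    entry.path = path;
    const ended = once(entry, 'end'); // subscribe before handing the entry out
    yield entry;
    await ended; // never resolves if the consumer leaves the entry unread
  }
}

// Counts worksheet entries. Every entry is drained, including the ones we do
// not care about, so the parser can move on to the next entry.
async function countWorksheets(entries) {
  let count = 0;
  for await (const entry of entries) {
    if (/^xl\/worksheets\/sheet\d+\.xml$/.test(entry.path)) {
      count += 1;
    }
    entry.resume(); // discard the body; without this the loop stalls
  }
  return count;
}

const samplePaths = [
  'xl/workbook.xml',
  'xl/worksheets/sheet1.xml',
  'xl/worksheets/sheet2.xml',
  'xl/styles.xml',
  'xl/worksheets/sheet3.xml',
];

countWorksheets(fakeZipEntries(samplePaths)).then((count) => {
  console.log('worksheets seen:', count);
});
```

Removing the `entry.resume()` line makes the loop stop after the first entry, which mirrors the missing-worksheets symptom in miniature.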

Test plan

This is the test xlsx file that has 188 worksheets.
WORKBOOK_WITH_188_SHEETS.xlsx

I've created a test case that will read the file and check whether the count of worksheets emitted is equal to the count of worksheets in the file.

Running npm run test:integration gives 194 tests passing and 2 failing (the 2 failing tests also fail on master).

Related to source code (for typings update)

elouie99 · 2024-09-17T21:32:14Z

Are there any guarantees that the workbookReader returns worksheets in the order they appear in the workbook? I ran a test and it seems the order is not guaranteed. The total number of worksheets returned appears correct.

In addition, each worksheet returned doesn't appear to return the correct row data if I iterate through the rows and add rows to a new worksheet. If you add some sample rows to a few worksheets in the workbook and run this code, it should reproduce.

const ExcelJS = require('exceljs');

async function printWorkbook() {
    const outputWorkbook = new ExcelJS.Workbook();
    const combinedWorksheet = outputWorkbook.addWorksheet('Test worksheet');
    const filePath = 'WORKBOOK_WITH_188_SHEETS.xlsx';
    // Stream the workbook, emitting each worksheet as it is read
    const workbookReader = new ExcelJS.stream.xlsx.WorkbookReader(filePath, {
        worksheets: 'emit',
    });

    for await (const worksheet of workbookReader) {
        console.log('===> worksheet:', worksheet.name);
        // Copy every row into the single combined worksheet
        for await (const row of worksheet) {
            combinedWorksheet.addRow(row.values).commit();
        }
    }
    await outputWorkbook.xlsx.writeFile('combined.xlsx');
}

printWorkbook().catch(console.error);

LarryKen · 2024-09-27T17:01:51Z

There is no guarantee that the worksheets are returned in the order they appear in the workbook. I believe the order is effectively set by how unzipper extracts the xlsx. I believe the order on this branch is the same as on master.

As for the rows not returning the correct data, I don't think that issue was introduced by my changes. I am able to replicate the same results on master as on my branch.

LarryHeydoc added 2 commits July 7, 2024 20:47

bug/xlsx-stream-missing-worksheets -> fix for missing worksheets

17df5fd

add tests for issue 2790

aa59a76
