Add df.duplicated() method for DataFrame similar to pandas #667

RahulDas-dev · 2025-04-30T20:32:58Z

Is your feature request related to a problem? Please describe.
Currently, Danfo.js does not have a method to identify duplicate rows in a DataFrame, which is a common data manipulation task. This makes it harder to clean or preprocess data efficiently, especially when working with large datasets. In pandas, for example, the df.duplicated() method flags duplicate rows, but Danfo.js has no equivalent.

The existing helpers do not cover this case:
For utilities: The getDuplicate method is array-specific and doesn't directly address DataFrame row duplication.
For Series: The dropDuplicates method works on Series and can help remove duplicate values, but it isn't designed for identifying duplicate rows in a DataFrame.

Describe the solution you'd like
I would like Danfo.js to implement a df.duplicated() method for DataFrames, similar to pandas. This method should return a boolean Series indicating whether each row in the DataFrame is a duplicate of a previous row. The method should also include parameters such as:

subset: Specify columns to consider when identifying duplicates.
keep: Define which occurrence of a duplicated row is left unmarked ('first', 'last', or false to mark every occurrence).

Describe alternatives you've considered
An alternative would be to manually implement a custom function to compare rows and identify duplicates. However, this approach is less efficient and may lead to inconsistent or error-prone implementations across different projects. Providing a built-in method would standardize and simplify the process for all users.
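For reference, a minimal sketch of such a manual workaround is shown below. It works on a plain array of row objects rather than a DataFrame, and markDuplicateRows is a hypothetical helper name, not a Danfo.js function. It only covers the "flag every repeat of an earlier row" case.

```javascript
// Hypothetical helper, not part of Danfo.js: flags rows that repeat an earlier row.
function markDuplicateRows(rows) {
  const seen = new Set();
  return rows.map((row) => {
    // Serialize [column, value] pairs in sorted column order so that
    // the order of keys in each object does not affect the comparison.
    const key = JSON.stringify(
      Object.keys(row).sort().map((col) => [col, row[col]])
    );
    const isDuplicate = seen.has(key);
    seen.add(key);
    return isDuplicate;
  });
}

const rows = [
  { col1: 1, col2: 2 },
  { col1: 1, col2: 2 },
  { col1: 3, col2: 4 },
];
console.log(markDuplicateRows(rows)); // [ false, true, false ]
```

Every project that needs this ends up choosing its own row key and its own handling of edge cases (NaN, nested values, column subsets), which is where the inconsistency comes from.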

Additional context
This feature would align Danfo.js closely with pandas, making it easier for users transitioning from Python to JavaScript.
The implementation could leverage existing internal methods for row/column comparisons to ensure optimal performance.
This feature is particularly useful in data cleaning workflows and preprocessing pipelines.

Proposed API:

const df = new DataFrame([
    { col1: 1, col2: 2 },
    { col1: 1, col2: 2 },
    { col1: 3, col2: 4 },
]);

const duplicates = df.duplicated();
console.log(duplicates); // Output: [false, true, false]

// Second example (separate snippet): sample DataFrame for subset and keep
const data = [
    { col1: 1, col2: 2, col3: 'A' },
    { col1: 1, col2: 2, col3: 'B' },
    { col1: 3, col2: 4, col3: 'A' },
    { col1: 1, col2: 2, col3: 'A' }
];

const df = new DataFrame(data);

// Find duplicates considering only 'col1' and 'col2'
const duplicates = df.duplicated({ subset: ['col1', 'col2'], keep: 'first' });

console.log(duplicates);
// Output: [false, true, false, true]

// Explanation:
// - Row 1 is not duplicate.
// - Row 2 is duplicate of Row 1 based on 'col1' and 'col2'.
// - Row 3 is not duplicate.
// - Row 4 is duplicate of Row 1 based on 'col1' and 'col2'.

Parameters Explained

subset: Specifies the columns to consider when checking for duplicates. In this example, only col1 and col2 are considered.
keep: Determines which occurrence in each group of duplicate rows is left unmarked (false):

'first' (default): Marks every duplicate as true except the first occurrence.
'last': Marks every duplicate as true except the last occurrence.
false: Marks all duplicates as true, including the first and last occurrences.
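To make the intended semantics concrete, here is a rough sketch of how the subset and keep options could behave. It operates on a plain array of row objects, and duplicatedWithKeep is a hypothetical name used only for illustration; it is not the Danfo.js implementation and makes no claim about how the library would do this internally.

```javascript
// Hypothetical sketch of the proposed semantics; not the Danfo.js implementation.
function duplicatedWithKeep(rows, { subset, keep = "first" } = {}) {
  // Build a comparison key per row from the chosen columns
  // (all columns, in sorted order, when no subset is given).
  const keys = rows.map((row) => {
    const cols = subset ?? Object.keys(row).sort();
    return JSON.stringify(cols.map((col) => [col, row[col]]));
  });

  if (keep === false) {
    // Mark every row whose key occurs more than once.
    const counts = new Map();
    for (const k of keys) counts.set(k, (counts.get(k) ?? 0) + 1);
    return keys.map((k) => counts.get(k) > 1);
  }
  if (keep !== "first" && keep !== "last") {
    throw new Error("keep must be 'first', 'last', or false");
  }

  // Walk forward for 'first' and backward for 'last', so the occurrence
  // that is kept is always the first one encountered in the walk.
  const order = keys.map((_, i) => i);
  if (keep === "last") order.reverse();

  const seen = new Set();
  const flags = new Array(rows.length).fill(false);
  for (const i of order) {
    flags[i] = seen.has(keys[i]);
    seen.add(keys[i]);
  }
  return flags;
}

const sample = [
  { col1: 1, col2: 2, col3: "A" },
  { col1: 1, col2: 2, col3: "B" },
  { col1: 3, col2: 4, col3: "A" },
  { col1: 1, col2: 2, col3: "A" },
];
const subset = ["col1", "col2"];

console.log(duplicatedWithKeep(sample, { subset, keep: "first" })); // [ false, true, false, true ]
console.log(duplicatedWithKeep(sample, { subset, keep: "last" }));  // [ true, true, false, false ]
console.log(duplicatedWithKeep(sample, { subset, keep: false }));   // [ true, true, false, true ]
```

Comparing rows via JSON.stringify is a simplification that is adequate for primitive values; a real implementation would need to decide how NaN, null, and non-primitive cells compare.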

This feature would be very useful in filtering and processing data efficiently, similar to pandas' duplicated() method. Let me know if you'd like further clarifications!

RahulDas-dev mentioned this issue May 3, 2025

feat: add dataframe duolicated issue - #667 #669

Open
