Add renameAll {} DSL #1168

Open
Jolanrensen opened this issue May 2, 2025 · 0 comments
Labels
enhancement New feature or request
Milestone

Comments

@Jolanrensen
Collaborator

Renaming multiple columns at once is a hassle,
especially when you want to use column accessors or the compiler plugin.

The options currently are:

df.rename { all() }.into {
    when (it.name) {
        "arter" -> "species"
        "ø" -> "island"
        "næblængde_mm" -> "bill_length_mm"
        "næbdybde_mm" -> "bill_depth_mm"
        "luffelængde_mm" -> "flipper_length_mm"
        "kropsmasse_g" -> "body_mass_g"
        "køn" -> "sex"
        "måledato" -> "measurement_date"
        else -> error("for ${it.name}")
    }
}

This is unsafe and might break: a typo in a column name, or a column missing from the when branches, only fails at runtime. The compiler plugin cannot interpret it either.

df.rename(
    "arter" to "species",
    "ø" to "island",
    "næblængde_mm" to "bill_length_mm",
    "næbdybde_mm" to "bill_depth_mm",
    "luffelængde_mm" to "flipper_length_mm",
    "kropsmasse_g" to "body_mass_g",
    "køn" to "sex",
    "måledato" to "measurement_date",
)

While the compiler plugin can interpret this code, it can still contain typos, so I'd count it as unsafe too.

df.rename { all() }.into(
    "species",
    "island",
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
    "sex",
    "measurement_date"
)

This is order-dependent, which is seldom a good thing.

df
    .rename { arter }.into("species")
    .rename { ø }.into("island")
    .rename { næblængde_mm }.into("bill_length_mm")
    .rename { næbdybde_mm }.into("bill_depth_mm")
    .rename { luffelængde_mm }.into("flipper_length_mm")
    .rename { kropsmasse_g }.into("body_mass_g")
    .rename { køn }.into("sex")
    .rename { måledato }.into("measurement_date")

This is the safest solution, but it is a hassle to type out every time and not very readable.

df.select {
    cols(
        arter into "species",
        ø into "island",
        næblængde_mm into "bill_length_mm",
        næbdybde_mm into "bill_depth_mm",
        luffelængde_mm into "flipper_length_mm",
        kropsmasse_g into "body_mass_g",
        køn into "sex",
        måledato into "measurement_date",
    )
}

Funnily enough, the most readable solution doesn't even use the rename operation ;P That should be a good indication that we need an improvement.

I'd suggest:

df.renameAll {
    arter into "species"
    ø into "island"
    næblængde_mm into "bill_length_mm"
    næbdybde_mm into "bill_depth_mm"
    luffelængde_mm into "flipper_length_mm"
    kropsmasse_g into "body_mass_g"
    køn into "sex"
    måledato into "measurement_date"
}

We cannot reuse rename {} because it doesn't return a DataFrame; it returns an intermediate clause that still needs an into call.
We would need to carefully shadow into from the ColumnsSelectionDsl, but otherwise, we should be fine.
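
To make the idea a bit more concrete, here is a rough, self-contained sketch of how such a block could collect renames and apply them in one go. It does not use the DataFrame library at all: ToyFrame, ToyColumn, RenameAllScope, col and this renameAll are hypothetical stand-ins invented for illustration, and the real implementation would work on column accessors and would likely look quite different. The only point is that into, declared as a member of the DSL scope, can record a rename instead of returning a renamed column.

```kotlin
// Toy stand-in, NOT the Kotlin DataFrame API: a "frame" is an ordered map of column name -> values.
typealias ToyFrame = Map<String, List<Any?>>

// Hypothetical column handle; in the real library a column accessor would play this role.
data class ToyColumn(val name: String)

// Hypothetical DSL receiver: every `into` call inside the block records one rename.
class RenameAllScope(private val frame: ToyFrame) {
    val renames = LinkedHashMap<String, String>()

    // Looks a column up by name and fails early if it does not exist.
    fun col(name: String): ToyColumn {
        require(name in frame) { "No column named '$name'" }
        return ToyColumn(name)
    }

    // Member infix function: records the rename rather than returning a renamed column.
    infix fun ToyColumn.into(newName: String) {
        require(name !in renames) { "Column '$name' is renamed twice" }
        renames[name] = newName
    }
}

// Hypothetical entry point: runs the block, then applies all recorded renames at once.
// Columns that are not mentioned keep their name and their position.
fun ToyFrame.renameAll(block: RenameAllScope.() -> Unit): ToyFrame {
    val scope = RenameAllScope(this)
    scope.block()
    val result = LinkedHashMap<String, List<Any?>>()
    for ((name, values) in this) {
        val newName = scope.renames[name] ?: name
        require(newName !in result) { "Duplicate column name '$newName' after renaming" }
        result[newName] = values
    }
    return result
}

fun main() {
    val df: ToyFrame = linkedMapOf(
        "arter" to listOf("Adelie"),
        "ø" to listOf("Torgersen"),
        "køn" to listOf("male"),
    )
    val renamed = df.renameAll {
        col("arter") into "species"
        col("køn") into "sex"
    }
    println(renamed.keys) // [species, ø, sex]
}
```

Because the block returns nothing per line and the renames are applied by name, the result does not depend on the order in which the into lines are written, and the function returns a frame directly, unlike rename {}.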

@Jolanrensen Jolanrensen added the enhancement New feature or request label May 2, 2025
@Jolanrensen Jolanrensen added this to the Backlog milestone May 2, 2025