how to optimize this kind of process

driving_crooner@lemmy.eco.br · 11 months ago

Goku@lemmy.world · edit-2 11 months ago

In that case you can iterate over the rows instead of using apply()

Test it out and see if it’s more efficient.
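A minimal sketch of what that comparison could look like, assuming pandas and a toy frame. The column names and the body of `function` are placeholders, not the original poster's code. `itertuples()` is used here because it is usually the cheapest way to loop over rows in pandas.

```python
import pandas as pd

# Hypothetical data; 'a' and 'b' stand in for the real columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def function(a, b):
    # Placeholder for the real per-row computation.
    return a + b

# Row-wise apply(), kept for comparison.
via_apply = df.apply(lambda row: function(row["a"], row["b"]), axis=1)

# Explicit iteration: itertuples() yields one lightweight namedtuple per row.
via_loop = [function(row.a, row.b) for row in df.itertuples(index=False)]

df["c"] = via_loop
```

Timing both variants on the real data (for example with `time.perf_counter()`) is the only reliable way to tell which is faster for a given function.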

Also, you can improve performance by passing only the required columns to apply():

df['c'] = df[['a','b']].apply(function, axis=1)

Actually this seems like a better solution for you.

Here’s another approach, I like this one more because it is a closer match to the problem you described.

Check the result_type='expand' argument of df.apply().
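A small sketch of how `result_type='expand'` might be used, assuming the function returns several values per row. The data, the function body, and the new column names `c` and `d` are made up for illustration.

```python
import pandas as pd

# Hypothetical data; 'a' and 'b' stand in for the real columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def function(row):
    # Placeholder: returns two values per row, one for each new column.
    return [row["a"] + row["b"], row["a"] * row["b"]]

# result_type='expand' turns each returned list into columns of a DataFrame.
expanded = df[["a", "b"]].apply(function, axis=1, result_type="expand")
expanded.columns = ["c", "d"]  # the expanded columns are numbered 0, 1 by default

# Attach the new columns to the original frame (aligned on the index).
df = df.join(expanded)
```

This way the function runs once per row and produces all new columns in a single pass, instead of one apply() call per column.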

driving_crooner@lemmy.eco.br · edit-2 11 months ago

> Actually this seems like a better solution for you.

> Here’s another approach, I like this one more because it is a closer match to the problem you described.

Thanks, I tried the first approach but it was slower than what I was doing. The second one didn’t work in a single pass because I use some of the newly generated columns to create other ones. Running the process twice, so the second pass can use the columns from the first pass to create the additional columns, worked well and reduced the processing time from 22 to 13 minutes. Maybe there are ways to optimize the code even more, but 13 minutes is good enough for me.

Edit: for some reason it broke the information in some way and the next steps of the process are giving me errors 😐

Edit2: I’m an idiot, I made an error while updating the code to the new method.
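For anyone reading later, the two-pass idea described above could look roughly like this. It is only a sketch under assumptions: the data, both functions, and the column names `c` to `f` are invented, and the real code was not posted.

```python
import pandas as pd

# Hypothetical data; 'a' and 'b' stand in for the real input columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def first_pass(row):
    # Placeholder: columns that depend only on the original data.
    return [row["a"] + row["b"], row["a"] * row["b"]]

def second_pass(row):
    # Placeholder: columns that depend on the output of the first pass.
    return [row["c"] - row["a"], row["d"] - row["c"]]

# Pass 1: build 'c' and 'd' from the original columns.
first = df[["a", "b"]].apply(first_pass, axis=1, result_type="expand")
first.columns = ["c", "d"]
df = df.join(first)

# Pass 2: build 'e' and 'f', which need 'c' and 'd' to exist already.
second = df[["a", "c", "d"]].apply(second_pass, axis=1, result_type="expand")
second.columns = ["e", "f"]
df = df.join(second)
```

Assigning explicit column names after each pass and comparing a few rows against the old method is a cheap way to catch mistakes when switching over.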