Add aggregate_function argument to utils.concat() #401
import audformat
import numpy as np
import pandas as pd
index = audformat.filewise_index(['f1', 'f2', 'f3', 'f4'])
df1 = pd.DataFrame(
{
'a': [1, 1, 1],
'b': [1, 1, 1],
},
index=index[:3],
)
df2 = pd.DataFrame(
{
'a': [2, 2, 2],
'b': [2, 2, 2],
},
index=index[1:],
)
audformat.utils.concat([df1, df2], aggregate_function=np.sum)
But I was expecting:
Thanks for spotting this, I fixed it (and a related issue) and now we get:
Co-authored-by: Johannes Wagner <[email protected]>
I can confirm that the example now works with two frames, but when I add a third one I get:
index = audformat.filewise_index(['f1', 'f2', 'f3', 'f4'])
df1 = pd.DataFrame(
{
'a': [1, 1, 1],
'b': [1, 1, 1],
},
index=index[:3],
)
df2 = pd.DataFrame(
{
'a': [2, 2, 2],
'b': [2, 2, 2],
},
index=index[1:],
)
df3 = pd.DataFrame(
{
'a': [3],
'b': [3],
},
index[:1],
)
audformat.utils.concat([df1, df2, df3], aggregate_function=np.sum)
but what I would expect is:
Thanks for finding another example that didn't work. There was indeed a bigger error in how the overlapping values were collected: for the very first column we need to collect all values that overlap with any of the other columns, whereas for all other columns it is enough to collect the values that overlap with the first column. I changed the code accordingly and added a few more tests (most likely still not enough ;) ). Now we get:
import audformat
import numpy as np
import pandas as pd
index = audformat.filewise_index(['f1', 'f2', 'f3', 'f4'])
df1 = pd.DataFrame(
{
'a': [1, 1, 1],
'b': [1, 1, 1],
},
index=index[:3],
)
df2 = pd.DataFrame(
{
'a': [2, 2, 2],
'b': [2, 2, 2],
},
index=index[1:],
)
df3 = pd.DataFrame(
{
'a': [3],
'b': [3],
},
index[:1],
)
audformat.utils.concat([df1, df2, df3], aggregate_function=np.sum)
Looks good, can't find another failing example. Nice extension, strange we didn't come up with this aggregate solution earlier.
I guess there was no urgent need to have it earlier.
This adds the aggregate_function argument to audformat.utils.concat(). If overwrite=False and aggregate_function is not None, it will be used as a callable that combines values for all entries that have more than one data point.
NOTE: in test_utils_concat.py only the tests test_concat_aggregate_function() and test_concat_overwrite_aggregate_function() are new, the other part is moved from test_util.py.
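To make the intended behavior concrete, here is a minimal plain-Python sketch of one reading of the description above: values are collected per file, and the callable is applied only to entries with more than one data point. It is not the audformat implementation. The helper concat_with_aggregation() is hypothetical, and dictionaries stand in for a single data frame column.

```python
from collections import defaultdict


def concat_with_aggregation(columns, aggregate_function):
    """Combine mappings of file -> value.

    Hypothetical helper for illustration only, not audformat code.
    """
    # Collect all values per file, in order of first appearance
    collected = defaultdict(list)
    for column in columns:
        for file, value in column.items():
            collected[file].append(value)
    # Aggregate only entries that have more than one data point
    return {
        file: aggregate_function(values) if len(values) > 1 else values[0]
        for file, values in collected.items()
    }


files = ['f1', 'f2', 'f3', 'f4']
col1 = dict(zip(files[:3], [1, 1, 1]))  # mirrors column 'a' of df1
col2 = dict(zip(files[1:], [2, 2, 2]))  # mirrors column 'a' of df2
col3 = dict(zip(files[:1], [3]))        # mirrors column 'a' of df3

result = concat_with_aggregation([col1, col2, col3], sum)
print(result)  # {'f1': 4, 'f2': 3, 'f3': 3, 'f4': 2}
```

Under this reading, f4 appears in only one input and keeps its value unchanged, while f1, f2 and f3 are summed over every frame that contains them.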