Add aggregate_function argument to utils.concat() #401

hagenw · 2023-10-25T11:04:12Z

This adds the aggregate_function argument to audformat.utils.concat().
If overwrite=False and aggregate_function is not None, it is called to combine the values of every entry that has more than one data point.

NOTE: in test_utils_concat.py only the tests test_concat_aggregate_function() and test_concat_overwrite_aggregate_function() are new; the rest was moved from test_util.py.
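
To make the intended semantics concrete, here is a minimal plain-Python sketch. It is not audformat's implementation: plain dicts stand in for series with a filewise index, `concat_with_aggregate` is a hypothetical name, and raising `ValueError` when no `aggregate_function` is given is an assumption of this sketch.

```python
from collections import defaultdict


def concat_with_aggregate(objs, aggregate_function=None):
    """Combine dict-based "series", aggregating keys with several values.

    Illustrative stand-in only, not audformat's implementation.
    """
    # Gather every value seen for each key, in the order of the inputs.
    values_per_key = defaultdict(list)
    for obj in objs:
        for key, value in obj.items():
            values_per_key[key].append(value)

    result = {}
    for key, values in values_per_key.items():
        if len(values) == 1:
            # Only one data point: keep it untouched.
            result[key] = values[0]
        elif aggregate_function is not None:
            # More than one data point: let the callable combine them.
            result[key] = aggregate_function(values)
        else:
            # Assumption of this sketch: refuse to pick a value silently.
            raise ValueError(f"Found more than one value for {key!r}: {values}")
    return result


y1 = {"f1": 1, "f2": 1, "f3": 1}
y2 = {"f2": 2, "f3": 2, "f4": 2}

summed = concat_with_aggregate([y1, y2], aggregate_function=sum)
as_tuples = concat_with_aggregate([y1, y2], aggregate_function=lambda y: tuple(y))

print(summed)     # {'f1': 1, 'f2': 3, 'f3': 3, 'f4': 2}
print(as_tuples)  # {'f1': 1, 'f2': (1, 2), 'f3': (1, 2), 'f4': 2}
```

Entries with a single data point pass through unchanged; only entries with several data points reach the callable.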

codecov · 2023-10-25T11:06:40Z

Codecov Report

Merging #401 (2dc437e) into main (d461614) will not change coverage.
The diff coverage is 100.0%.

Files	Coverage Δ
audformat/core/utils.py	`100.0% <100.0%> (ø)`

frankenjoe · 2023-10-25T12:36:26Z

index = audformat.filewise_index(['f1', 'f2', 'f3', 'f4'])
df1 = pd.DataFrame(
    {
        'a': [1, 1, 1],
        'b': [1, 1, 1],
    },
    index=index[:3],
)
df2 = pd.DataFrame(
    {
        'a': [2, 2, 2],
        'b': [2, 2, 2],
    },
    index=index[1:],
)

audformat.utils.concat([df1, df2], aggregate_function=np.sum)

         a     b
file            
f1    <NA>  <NA>
f2       3     3
f3       3     3
f4       2     2

But I was expecting:

         a     b
file            
f1       1     1
f2       3     3
f3       3     3
f4       2     2

hagenw · 2023-10-25T13:29:40Z

Thanks for spotting this, I fixed it (and a related issue) and now we get:

frankenjoe · 2023-10-25T16:45:56Z

> Thanks for spotting this, I fixed it (and a related issue) and now we get:

I can confirm that the example now works with two frames, but when I add a third one I get:

index = audformat.filewise_index(['f1', 'f2', 'f3', 'f4'])
df1 = pd.DataFrame(
    {
        'a': [1, 1, 1],
        'b': [1, 1, 1],
    },
    index=index[:3],
)
df2 = pd.DataFrame(
    {
        'a': [2, 2, 2],
        'b': [2, 2, 2],
    },
    index=index[1:],
)
df3 = pd.DataFrame(
    {
        'a': [3],
        'b': [3],
    },
    index=index[:1],
)

audformat.utils.concat([df1, df2, df3], aggregate_function=np.sum)

but what I would expect is:

hagenw · 2023-10-26T09:48:46Z

Thanks for finding another example that didn't work. There was indeed a bigger error in how the overlapping values were collected: for the very first column, we need to collect all values that overlap with any of the other columns, whereas for all other columns it is sufficient to collect the values that overlap with the first column. I changed the code accordingly and added a few more tests (most likely still not enough ;) ).

Now we get:

import audformat
import numpy as np
import pandas as pd

index = audformat.filewise_index(['f1', 'f2', 'f3', 'f4'])
df1 = pd.DataFrame(
    {
        'a': [1, 1, 1],
        'b': [1, 1, 1],
    },
    index=index[:3],
)
df2 = pd.DataFrame(
    {
        'a': [2, 2, 2],
        'b': [2, 2, 2],
    },
    index=index[1:],
)
df3 = pd.DataFrame(
    {
        'a': [3],
        'b': [3],
    },
    index=index[:1],
)

audformat.utils.concat([df1, df2, df3], aggregate_function=np.sum)
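
As a side note on the fix itself, the overlap collection described in this comment can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code in audformat/core/utils.py: dicts stand in for single columns, `collect_overlaps` and `aggregate` are hypothetical helpers, and the "first column" is modelled as a running base that also absorbs keys it has not seen before. The printed values are computed by this sketch, not copied from audformat's output.

```python
def collect_overlaps(columns):
    """Return (base, overlaps) for a list of dict-based columns.

    base holds the first value seen per key; overlaps maps every key
    with more than one value to the list of all its values.
    Hypothetical helper for illustration only.
    """
    base = dict(columns[0])
    overlaps = {}
    for column in columns[1:]:
        for key, value in column.items():
            if key in base:
                # The base value is collected once per key, no matter
                # how many later columns overlap with it.
                overlaps.setdefault(key, [base[key]]).append(value)
            else:
                base[key] = value
    return base, overlaps


def aggregate(columns, aggregate_function):
    """Apply aggregate_function to every key with overlapping values."""
    base, overlaps = collect_overlaps(columns)
    for key, values in overlaps.items():
        base[key] = aggregate_function(values)
    return dict(sorted(base.items()))


# Column 'a' of df1, df2, and df3 from the example above.
a1 = {"f1": 1, "f2": 1, "f3": 1}
a2 = {"f2": 2, "f3": 2, "f4": 2}
a3 = {"f1": 3}

_, overlaps = collect_overlaps([a1, a2, a3])
result = aggregate([a1, a2, a3], sum)

print(overlaps)  # {'f2': [1, 2], 'f3': [1, 2], 'f1': [1, 3]}
print(result)    # {'f1': 4, 'f2': 3, 'f3': 3, 'f4': 2}
```

The key point is that the value of the first column at f1 is picked up even though it only overlaps with the third frame, not the second.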

frankenjoe · 2023-10-26T13:52:26Z

Looks good, can't find another failing example. Nice extension, strange we didn't come up with this aggregate solution earlier.

hagenw · 2023-10-26T13:56:35Z

I guess there was no urgent need to have it earlier.

hagenw added 2 commits October 25, 2023 11:47

Move test for concat() to extra file

253fb28

Add aggregate_function argument

8ac6c22

hagenw added 5 commits October 25, 2023 13:15

Handle dtype conversion

957a577

Expand docstring

947438a

Make sure overwrite wins

283797e

Fix docstring

ed244a9

Add tests for different number of columns

4589744

hagenw requested a review from frankenjoe October 25, 2023 11:57

hagenw mentioned this pull request Oct 25, 2023

Add audformat.Database.get() #399

Merged

frankenjoe reviewed Oct 25, 2023

frankenjoe reviewed Oct 25, 2023

frankenjoe reviewed Oct 25, 2023

frankenjoe reviewed Oct 25, 2023

hagenw added 2 commits October 25, 2023 14:57

Add test for different indices

b553963

Add more tests and fix issue

032835a

hagenw and others added 4 commits October 25, 2023 15:32

Update audformat/core/utils.py

100f82b

Co-authored-by: Johannes Wagner

Update audformat/core/utils.py

d0087a4

Co-authored-by: Johannes Wagner

Be more explicit in docstring

d520900

Remove debug statement

3a88079

frankenjoe reviewed Oct 25, 2023

frankenjoe reviewed Oct 25, 2023

hagenw and others added 3 commits October 25, 2023 15:47

Increase readability of tests

8dbc83b

Update audformat/core/utils.py

88e8bb4

Co-authored-by: Johannes Wagner

Change to lambda y: tuple(y)

595b15a

frankenjoe reviewed Oct 25, 2023

Add failing test for 3 dfs with different index

16283db

hagenw added 5 commits October 26, 2023 11:33

Fix overlap collection issue

aa7aa2a

Use y for series in tests

117c8e1

Clean up code and docstring

d116e1f

Link to numpy docs

edd532e

Simplify docstring

c28eeda

hagenw added 2 commits October 26, 2023 11:50

Show lambda in docstring

c5de2ad

Add more tests

2dc437e

frankenjoe merged commit 77f69bf into main Oct 26, 2023
10 checks passed

frankenjoe deleted the concat-aggregate branch October 26, 2023 13:53
