Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure littlefs to not use recursion, measure stack usage #658

Merged
merged 22 commits into from
Apr 11, 2022

Conversation

geky
Copy link
Member

@geky geky commented Mar 21, 2022

This PR restructures the internals of littlefs a bit to completely avoid recursion. In theory the design of littlefs uses strictly bounded RAM, though it was written using some recursive functions, and at least one code path relied on tail-call optimization for bounded RAM consumption, which isn't guaranteed in standard C.

This broke down into three pieces of logic that used recursion:

  1. lfs_dir_commit/lfs_fs_relocate - 84da4c0

    The biggest culprit of recursion was the relocation logic responsible for moving metadata to new blocks when a block goes bad or has high wear. The reason is because this occurs deep within the commit logic and is inherently recursive (any commit can require relocation, which needs to commit to fix the relocation!)

    This required quite a bit of restructuring, inverting most of the commit logic so that relocations occur on an outer loop that invokes the core commit logic. The commit logic ends up with several new entry points that "fall back" to outer loops under different error conditions, with the top level loop being a full lfs_fs_deorphan pass.

  2. lfs_file_read/write/flush - fedf646

    A bit easier to solve, lfs_file_read and lfs_file_write sometimes need to flush state from other lfs_file_read/lfs_file_write calls. The only problem being that lfs_file_flush calls lfs_file_read/lfs_file_write to do the flush!

    This was easy to solve with an extra set of lfs_file_flushedread/lfs_file_flushedwrite entry points that avoid this. Note that
    this code path couldn't actually occur normally outside of filesystem bugs.

  3. lfs_dir_traverse - 8109f28

    lfs_dir_traverse was also a bit tricky. It's inherently recursive, but has a strict upper bound of 4 calls (a traverse with a filter with a move with a filter).

    I tried a few different things, but it was difficult to unroll things without an extra ~1KiB of code cost. The solution here
    is to add an explicit, bounded stack (just an array of state), which turned out to not increase the code cost too badly.

    I suspect this hints that modeling traversals with callbacks is a poor choice for a non-recursive code, so this may be worth implementing differently at some point in the future.


This PR also adds a few more scripts and utilities to help analyze builds and compare different implementations:

  • make code or ./scripts/code.py - Reports the code size of functions
  • make data or ./scripts/data.py - Reports any data sections, this should always be zero for littlefs
  • make stack or ./scripts/stack.py - Reports stack frames and maximum stack usage of functions
  • make structs or ./scripts/structs.py - Reports the size of any public structs

For example:

$ make stack
Output
./scripts/stack.py lfs.ci lfs_util.ci -S
name                                   frame   limit
lfs_dir_close                              0       0
lfs_dir_tell                               0       0
lfs_file_size                              0       0
lfs_file_tell                              0       0
lfs_fs_preporphans                         0       0
lfs_fs_size_count                          0       0
lfs_pair_isnull                            0       0
lfs_tag_dsize                              0       0
lfs_alloc_lookahead                        8       8
lfs_bd_erase.isra                          8       8
lfs_cache_zero.isra                        8       8
lfs_deinit                                 8       8
lfs_dir_commit_size                        8       8
lfs_gstate_xor                             8       8
lfs_pair_cmp                               8       8
lfs_crc                                   16      16
lfs_dir_traverse_filter                   16      16
lfs_gstate_hasmovehere                     8      16
lfs_unmount                                8      16
lfs_ctz_index.isra                        24      24
lfs_init                                  16      24
lfs_bd_read                               64      64
lfs_fs_parent_match                       40     104
lfs_bd_cmp.constprop                      64     128
lfs_ctz_find.constprop                    64     128
lfs_dir_getslice                          64     128
lfs_ctz_traverse                          72     136
lfs_dir_get                               24     152
lfs_dir_find_match                        32     160
lfs_dir_fetchmatch                       112     176
lfs_bd_flush                              56     184
lfs_dir_getgstate                         40     192
lfs_dir_getinfo                           40     192
lfs_dir_getread.constprop                 64     192
lfs_dir_fetch                             24     200
lfs_dir_rewind                             8     208
lfs_bd_prog                               32     216
lfs_dir_read                              24     224
lfs_fs_pred                               32     232
lfs_dir_seek                              40     240
lfs_fs_parent                             64     240
lfs_file_flushedread                      56     248
lfs_dir_commitprog                        40     256
lfs_dir_find                              96     272
lfs_fs_rawtraverse                        88     288
lfs_dir_commitcrc                         80     296
lfs_fs_traverse                            8     296
lfs_fs_rawsize                            16     304
lfs_mount                                112     304
lfs_fs_size                                8     312
lfs_alloc                                 32     320
lfs_dir_commitattr                        64     320
lfs_dir_open                              48     320
lfs_dir_commit_commit                      8     328
lfs_stat                                  56     328
lfs_dir_traverse.constprop               272     336
lfs_getattr                               64     336
lfs_dir_alloc                             32     352
lfs_file_relocate                         56     376
lfs_dir_compact                          136     472
lfs_file_flushedwrite                     96     472
lfs_dir_split                             96     568
lfs_file_flush                           112     584
lfs_file_rawseek                          16     600
lfs_file_read                             24     608
lfs_file_rewind                            8     608
lfs_file_seek                              8     608
lfs_file_rawwrite                         40     624
lfs_file_write                             8     632
lfs_file_truncate                         48     672
lfs_dir_relocatingcommit                 128     696
lfs_dir_orphaningcommit                  136     832
lfs_fs_deorphan                          160     992
lfs_dir_commit                             8    1000
lfs_dir_drop                              24    1024
lfs_file_rawsync                          48    1048
lfs_fs_forceconsistency                   48    1048
lfs_file_sync                              8    1056
lfs_commitattr                            64    1064
lfs_file_rawclose                         16    1064
lfs_file_close                             8    1072
lfs_removeattr                            16    1080
lfs_setattr                               24    1088
lfs_format                               104    1104
lfs_file_rawopencfg                       64    1128
lfs_file_open                             16    1144
lfs_file_opencfg                          16    1144
lfs_remove                               128    1176
lfs_mkdir                                184    1232
lfs_rename                               216    1264
TOTAL                                   4248    1264

Or with callgraph info:

$ make stack STACKFLAGS+='-L1'
Output
./scripts/stack.py lfs.ci lfs_util.ci -S -L1
name                                   frame   limit
lfs_dir_close                              0       0
lfs_dir_tell                               0       0
lfs_file_size                              0       0
lfs_file_tell                              0       0
lfs_fs_preporphans                         0       0
lfs_fs_size_count                          0       0
lfs_pair_isnull                            0       0
lfs_tag_dsize                              0       0
lfs_alloc_lookahead                        8       8
lfs_bd_erase.isra                          8       8
lfs_cache_zero.isra                        8       8
lfs_deinit                                 8       8
lfs_dir_commit_size                        8       8
'-> lfs_tag_dsize                          0       0
lfs_gstate_xor                             8       8
lfs_pair_cmp                               8       8
lfs_crc                                   16      16
lfs_dir_traverse_filter                   16      16
lfs_gstate_hasmovehere                     8      16
'-> lfs_pair_cmp                           8       8
lfs_unmount                                8      16
'-> lfs_deinit                             8       8
lfs_ctz_index.isra                        24      24
lfs_init                                  16      24
|-> lfs_cache_zero.isra                    8       8
'-> lfs_deinit                             8       8
lfs_bd_read                               64      64
lfs_fs_parent_match                       40     104
|-> lfs_pair_cmp                           8       8
'-> lfs_bd_read                           64      64
lfs_bd_cmp.constprop                      64     128
'-> lfs_bd_read                           64      64
lfs_ctz_find.constprop                    64     128
|-> lfs_ctz_index.isra                    24      24
'-> lfs_bd_read                           64      64
lfs_dir_getslice                          64     128
|-> lfs_tag_dsize                          0       0
|-> lfs_gstate_hasmovehere                 8      16
'-> lfs_bd_read                           64      64
lfs_ctz_traverse                          72     136
|-> lfs_ctz_index.isra                    24      24
'-> lfs_bd_read                           64      64
lfs_dir_get                               24     152
'-> lfs_dir_getslice                      64     128
lfs_dir_find_match                        32     160
'-> lfs_bd_cmp.constprop                  64     128
lfs_dir_fetchmatch                       112     176
|-> lfs_tag_dsize                          0       0
|-> lfs_crc                               16      16
|-> lfs_gstate_hasmovehere                 8      16
'-> lfs_bd_read                           64      64
lfs_bd_flush                              56     184
|-> lfs_cache_zero.isra                    8       8
'-> lfs_bd_cmp.constprop                  64     128
lfs_dir_getgstate                         40     192
|-> lfs_gstate_xor                         8       8
'-> lfs_dir_get                           24     152
lfs_dir_getinfo                           40     192
'-> lfs_dir_get                           24     152
lfs_dir_getread.constprop                 64     192
'-> lfs_dir_getslice                      64     128
lfs_dir_fetch                             24     200
'-> lfs_dir_fetchmatch                   112     176
lfs_dir_rewind                             8     208
'-> lfs_dir_fetch                         24     200
lfs_bd_prog                               32     216
'-> lfs_bd_flush                          56     184
lfs_dir_read                              24     224
|-> lfs_dir_getinfo                       40     192
'-> lfs_dir_fetch                         24     200
lfs_fs_pred                               32     232
|-> lfs_pair_isnull                        0       0
|-> lfs_pair_cmp                           8       8
'-> lfs_dir_fetch                         24     200
lfs_dir_seek                              40     240
|-> lfs_pair_cmp                           8       8
'-> lfs_dir_fetch                         24     200
lfs_fs_parent                             64     240
|-> lfs_pair_isnull                        0       0
'-> lfs_dir_fetchmatch                   112     176
lfs_file_flushedread                      56     248
|-> lfs_bd_read                           64      64
|-> lfs_ctz_find.constprop                64     128
'-> lfs_dir_getread.constprop             64     192
lfs_dir_commitprog                        40     256
|-> lfs_crc                               16      16
'-> lfs_bd_prog                           32     216
lfs_dir_find                              96     272
|-> lfs_dir_get                           24     152
'-> lfs_dir_fetchmatch                   112     176
lfs_fs_rawtraverse                        88     288
|-> lfs_pair_isnull                        0       0
|-> lfs_ctz_traverse                      72     136
|-> lfs_dir_get                           24     152
'-> lfs_dir_fetch                         24     200
lfs_dir_commitcrc                         80     296
|-> lfs_crc                               16      16
|-> lfs_bd_read                           64      64
|-> lfs_bd_flush                          56     184
'-> lfs_bd_prog                           32     216
lfs_fs_traverse                            8     296
'-> lfs_fs_rawtraverse                    88     288
lfs_fs_rawsize                            16     304
'-> lfs_fs_rawtraverse                    88     288
lfs_mount                                112     304
|-> lfs_pair_isnull                        0       0
|-> lfs_init                              16      24
|-> lfs_dir_get                           24     152
|-> lfs_dir_fetchmatch                   112     176
'-> lfs_dir_getgstate                     40     192
lfs_fs_size                                8     312
'-> lfs_fs_rawsize                        16     304
lfs_alloc                                 32     320
'-> lfs_fs_rawtraverse                    88     288
lfs_dir_commitattr                        64     320
|-> lfs_tag_dsize                          0       0
|-> lfs_bd_read                           64      64
'-> lfs_dir_commitprog                    40     256
lfs_dir_open                              48     320
|-> lfs_dir_get                           24     152
|-> lfs_dir_fetch                         24     200
'-> lfs_dir_find                          96     272
lfs_dir_commit_commit                      8     328
'-> lfs_dir_commitattr                    64     320
lfs_stat                                  56     328
|-> lfs_dir_getinfo                       40     192
'-> lfs_dir_find                          96     272
lfs_dir_traverse.constprop               272     336
|-> lfs_tag_dsize                          0       0
'-> lfs_bd_read                           64      64
lfs_getattr                               64     336
|-> lfs_dir_get                           24     152
|-> lfs_dir_fetch                         24     200
'-> lfs_dir_find                          96     272
lfs_dir_alloc                             32     352
|-> lfs_bd_read                           64      64
'-> lfs_alloc                             32     320
lfs_file_relocate                         56     376
|-> lfs_bd_erase.isra                      8       8
|-> lfs_cache_zero.isra                    8       8
|-> lfs_bd_read                           64      64
|-> lfs_dir_getread.constprop             64     192
|-> lfs_bd_prog                           32     216
'-> lfs_alloc                             32     320
lfs_dir_compact                          136     472
|-> lfs_pair_isnull                        0       0
|-> lfs_bd_erase.isra                      8       8
|-> lfs_gstate_xor                         8       8
|-> lfs_pair_cmp                           8       8
|-> lfs_dir_getgstate                     40     192
|-> lfs_dir_commitprog                    40     256
|-> lfs_dir_commitcrc                     80     296
|-> lfs_alloc                             32     320
|-> lfs_dir_commitattr                    64     320
'-> lfs_dir_traverse.constprop           272     336
lfs_file_flushedwrite                     96     472
|-> lfs_bd_erase.isra                      8       8
|-> lfs_cache_zero.isra                    8       8
|-> lfs_ctz_index.isra                    24      24
|-> lfs_bd_read                           64      64
|-> lfs_ctz_find.constprop                64     128
|-> lfs_bd_prog                           32     216
|-> lfs_alloc                             32     320
'-> lfs_file_relocate                     56     376
lfs_dir_split                             96     568
|-> lfs_pair_cmp                           8       8
|-> lfs_dir_alloc                         32     352
'-> lfs_dir_compact                      136     472
lfs_file_flush                           112     584
|-> lfs_bd_flush                          56     184
|-> lfs_file_flushedread                  56     248
|-> lfs_file_relocate                     56     376
'-> lfs_file_flushedwrite                 96     472
lfs_file_rawseek                          16     600
'-> lfs_file_flush                       112     584
lfs_file_read                             24     608
|-> lfs_file_flushedread                  56     248
'-> lfs_file_flush                       112     584
lfs_file_rewind                            8     608
'-> lfs_file_rawseek                      16     600
lfs_file_seek                              8     608
'-> lfs_file_rawseek                      16     600
lfs_file_rawwrite                         40     624
|-> lfs_file_flushedwrite                 96     472
'-> lfs_file_flush                       112     584
lfs_file_write                             8     632
'-> lfs_file_rawwrite                     40     624
lfs_file_truncate                         48     672
|-> lfs_ctz_find.constprop                64     128
|-> lfs_file_flush                       112     584
|-> lfs_file_rawseek                      16     600
'-> lfs_file_rawwrite                     40     624
lfs_dir_relocatingcommit                 128     696
|-> lfs_gstate_xor                         8       8
|-> lfs_pair_cmp                           8       8
|-> lfs_dir_getgstate                     40     192
|-> lfs_dir_fetch                         24     200
|-> lfs_fs_pred                           32     232
|-> lfs_dir_commitcrc                     80     296
|-> lfs_fs_rawsize                        16     304
|-> lfs_dir_commitattr                    64     320
|-> lfs_dir_traverse.constprop           272     336
|-> lfs_dir_compact                      136     472
'-> lfs_dir_split                         96     568
lfs_dir_orphaningcommit                  136     832
|-> lfs_fs_preporphans                     0       0
|-> lfs_pair_cmp                           8       8
|-> lfs_gstate_hasmovehere                 8      16
|-> lfs_dir_getgstate                     40     192
|-> lfs_fs_pred                           32     232
|-> lfs_fs_parent                         64     240
|-> lfs_file_relocate                     56     376
|-> lfs_file_flush                       112     584
'-> lfs_dir_relocatingcommit             128     696
lfs_fs_deorphan                          160     992
|-> lfs_fs_preporphans                     0       0
|-> lfs_pair_isnull                        0       0
|-> lfs_gstate_hasmovehere                 8      16
|-> lfs_dir_get                           24     152
|-> lfs_dir_getgstate                     40     192
|-> lfs_dir_fetch                         24     200
|-> lfs_fs_parent                         64     240
'-> lfs_dir_orphaningcommit              136     832
lfs_dir_commit                             8    1000
|-> lfs_dir_orphaningcommit              136     832
'-> lfs_fs_deorphan                      160     992
lfs_dir_drop                              24    1024
|-> lfs_dir_getgstate                     40     192
'-> lfs_dir_commit                         8    1000
lfs_file_rawsync                          48    1048
|-> lfs_pair_isnull                        0       0
|-> lfs_file_flush                       112     584
'-> lfs_dir_commit                         8    1000
lfs_fs_forceconsistency                   48    1048
|-> lfs_dir_fetch                         24     200
|-> lfs_fs_deorphan                      160     992
'-> lfs_dir_commit                         8    1000
lfs_file_sync                              8    1056
'-> lfs_file_rawsync                      48    1048
lfs_commitattr                            64    1064
|-> lfs_dir_fetch                         24     200
|-> lfs_dir_find                          96     272
'-> lfs_dir_commit                         8    1000
lfs_file_rawclose                         16    1064
'-> lfs_file_rawsync                      48    1048
lfs_file_close                             8    1072
'-> lfs_file_rawclose                     16    1064
lfs_removeattr                            16    1080
'-> lfs_commitattr                        64    1064
lfs_setattr                               24    1088
'-> lfs_commitattr                        64    1064
lfs_format                               104    1104
|-> lfs_deinit                             8       8
|-> lfs_init                              16      24
|-> lfs_dir_fetch                         24     200
|-> lfs_dir_alloc                         32     352
'-> lfs_dir_commit                         8    1000
lfs_file_rawopencfg                       64    1128
|-> lfs_cache_zero.isra                    8       8
|-> lfs_dir_get                           24     152
|-> lfs_dir_find                          96     272
|-> lfs_dir_commit                         8    1000
|-> lfs_fs_forceconsistency               48    1048
'-> lfs_file_rawclose                     16    1064
lfs_file_open                             16    1144
'-> lfs_file_rawopencfg                   64    1128
lfs_file_opencfg                          16    1144
'-> lfs_file_rawopencfg                   64    1128
lfs_remove                               128    1176
|-> lfs_fs_preporphans                     0       0
|-> lfs_dir_get                           24     152
|-> lfs_dir_fetch                         24     200
|-> lfs_fs_pred                           32     232
|-> lfs_dir_find                          96     272
|-> lfs_dir_commit                         8    1000
|-> lfs_dir_drop                          24    1024
'-> lfs_fs_forceconsistency               48    1048
lfs_mkdir                                184    1232
|-> lfs_fs_preporphans                     0       0
|-> lfs_dir_fetch                         24     200
|-> lfs_dir_find                          96     272
|-> lfs_dir_alloc                         32     352
|-> lfs_dir_commit                         8    1000
'-> lfs_fs_forceconsistency               48    1048
lfs_rename                               216    1264
|-> lfs_fs_preporphans                     0       0
|-> lfs_pair_cmp                           8       8
|-> lfs_dir_get                           24     152
|-> lfs_dir_fetch                         24     200
|-> lfs_fs_pred                           32     232
|-> lfs_dir_find                          96     272
|-> lfs_dir_commit                         8    1000
|-> lfs_dir_drop                          24    1024
'-> lfs_fs_forceconsistency               48    1048
TOTAL                                   4248    1264

Note this depends on the -fstack-usage flag introduced in GCC 10.

Also, if you're curious what the deepest call is in littlefs after this change:

$ ./scripts/stack.py lfs.ci lfs_util.ci -L -s
name                                                         frame   limit
lfs_rename                                                     216    1264
|-> lfs_fs_forceconsistency                                     48    1048
|   |-> lfs_dir_commit                                           8    1000
|   |   |-> lfs_fs_deorphan                                    160     992
|   |   |   |-> lfs_dir_orphaningcommit                        136     832
|   |   |   |   |-> lfs_dir_relocatingcommit                   128     696
|   |   |   |   |   |-> lfs_dir_split                           96     568
|   |   |   |   |   |   |-> lfs_dir_compact                    136     472
|   |   |   |   |   |   |   |-> lfs_dir_traverse.constprop     272     336
|   |   |   |   |   |   |   |   |-> lfs_bd_read                 64      64

These changes do come with a code cost, as some of the recursion allowed a bit better code reuse:

Configuration Code Stack Structs
Default 14656 B (+5.1%) 1264 B (-∞%) 772 B (+0.0%)
Readonly 5346 B (+0.0%) 336 B (+0.0%) 772 B (+0.0%)
Threadsafe 15586 B (+4.7%) 1264 B (-∞%) 780 B (+0.0%)
Migrate 16308 B (+4.4%) 1528 B (-∞%) 776 B (+0.0%)
Error-asserts 15264 B (+5.0%) 1256 B (-∞%) 772 B (+0.0%)

Should resolve #417

geky added 22 commits March 11, 2022 14:36
Now with -s/--sort and -S/--reverse-sort for sorting the functions by
size.

You may wonder why add reverse-sort, since its utility doesn't seem
worth the cost to implement (these are just helper scripts after all),
the reason is that reverse-sort is quite useful on the command-line,
where scrollback may be truncated, and you only care about the larger
entries.

Outside of the command-line, normal sort is prefered.

Fortunately the difference is just the sign in the sort key.

Note this conflicts with the short --summary flag, so that has been
removed.
This is to avoid unexpected script behavior even though data.py should
always return 0 bytes for littlefs. Maybe a check for this should be
added to CI?
scripts/coverage.py was missed originally because it's not ran as often
as the others. Since it requires run-time info, it's usually only used
in CI.
Note this detects loops (recursion), and renders this as infinity.
Currently littlefs does have a single recursive function and you can see
how this infects the full call graph. Eventually this should be removed.
- size  -> code_size
- size  -> data_size
- frame -> stack_frame
- limit -> stack_limit
- hits  -> coverage_hits
- count -> coverage_count
Note this does include internal structs, so this should probably
be limited to informative purposes.
This required a patch to the --diff flag for the scripts to ignore
a missing file. This enables the useful one liner for making comparisons
with potentially missing previous versions:

    ./scripts/code.py lfs.o -d lfs.o.code.csv -o lfs.o.code.csv

    function (0 added, 0 removed)            old     new    diff
    TOTAL                                  25476   25476      +0

One downside, these previous files are easy to delete as a part of make
clean, which limits their usefulness for comparing configuration
changes...
- Added -L/--depth argument to show dependencies for scripts/stack.py,
  this replaces calls.py
- Additional internal restructuring to avoid repeated code
- Removed incorrect diff percentage when there is no actual size
- Consistent percentage rendering in test.py
A full summary of static measurements (code size, stack usage, etc) can now
be found with:

    make summary

This is done through the combination of a new ./scripts/summary.py
script and the ability of existing scripts to merge into existing csv
files, allowing multiple results to be merged either in a pipeline, or
in parallel with a single ./script/summary.py call.

The ./scripts/summary.py script can also be used to quickly compare
different builds or configurations. This is a proper implementation
of a similar but hacky shell script that has already been very useful
for making optimization decisions:

    $ ./scripts/structs.py new.csv -d old.csv --summary
    name (2 added, 0 removed)               code             stack            structs
    TOTAL                                  28648 (-2.7%)      2448               1012

Also some other small tweaks to scripts:

- Removed state saving diff rules. This isn't the most useful way to
  handle comparing changes.

- Added short flags for --summary (-Y) and --files (-F), since these
  are quite often used.
- Changed `make summary` to show a one line summary
- Added `make lfs.csv` rule, which is useful for finding more info with
  other scripts
- Fixed small issue in ./scripts/summary.py
- Added *.ci (callgraph) and *.csv (script output) to CI
- Added to GitHub statuses (61 results)

- Reworked generated release table to include these (16 results, only thumb)

These also required a surprisingly large number of other changes:

- Bumbed CI Ubuntu version 18.04 -> 20.04, 22.04 is already on the
  horizon but not usable in GitHub yet

- Manualy upgrade to GCC v10, this is required for the -fcallgraph-info
  flag that scripts/stack.py uses.

- Increased paginated status queries to 100 per-page. If we have more
  statuses than this the status diffs may get much more complicated...

- Forced whitespace in generated release table to always be nbsp. GitHub
  tables get scrunched rather ugly without this, prefering margins to
  readable tables.

- Added limited support for "∞" results, since this is returned by
  ./scripts/stack.py for recursive functions.

As a side-note, this increases the number of statuses reported
per-commit from 6 to 61, so hopefully that doesn't cause any problems...
Using errors=replace in python utf-8 decoding makes these scripts more
resilient to underlying errors, rather than just throwing an unhelpfully
generic decode error.
GCC is a bit frustrating here, it really wants to generate every file in
a single command, which _is_ more efficient if our build system could
leverage this. But -fcallgraph-info is a rather novel flag, so we can't
really rely on it for generally compiling and testing littlefs.

The multi-file output gets in the way when we want an explicitly
separate rule for callgraph-info generation. We can't generate the
callgraph-info without generating the objects files.

This becomes a surprsing issue when parallel building (make -j) is used!
Suddenly we might end up with both the .o and .ci rules writing to .o
files, which creates a really difficult to track down issue of corrupted
.o files.

The temporary solution is to use an order-only prerequisite. This still
ends up building the .o files twice, but it's an acceptable tradeoff for
not requiring the -fcallgraph-info for all builds.
This requires parsing an additional section of the dwarfinfo (--dwarf=rawlines)
to get the declaration file info.

---

Interpreting the results of ./scripts/structs.py reporting is a bit more
complicated than other scripts, structs aren't used in a consistent
manner so the cost of a large struct depends on the context in which it
is used.

But that being said, there really isn't much reason to report
internal-only structs. These structs really only exist for type-checking
in internal algorithms, and their cost will end up reflected in other RAM
measurements, either stack, heap, or other.
…ntu snaps

Introduced when updating CI to Ubuntu 20.04, Ubuntu snaps consume
loop devices, which conflict with out assumption that /dev/loop0
will always be unused. Changed to request a dynamic loop device from
losetup, though it would have been nice if Ubuntu snaps allocated
from the last device or something.
Avoids redundant counting of structs shared in multiple .c files, which
is very common. This is different from the other scripts,
code.py/data.py/stack.py, but this difference makes sense as struct
declarations have a very different lifetime.
As noted in Python's subprocess library:

> This will deadlock when using stdout=PIPE and/or stderr=PIPE and the
> child process generates enough output to a pipe such that it blocks
> waiting for the OS pipe buffer to accept more data.

Curiously, this only became a problem when updating to Ubuntu 20.04
in CI (python3.6 -> python3.8).
lfs_dir_commit originally relied heavily on tail-recursion, though at
least one path (through relocations) was not tail-recursive, and could
cause unbounded stack usage in extreme cases of bad blocks. (Keep in
mind even extreme cases of bad blocks should be in scope for littlefs).

In order to remove recursion from this code path, several changed were
raequired:

- The lfs_dir_compact logic had to be somewhat inverted. Instead of
  first compacting and then resolving issues such as relocations and
  orphans, the overarching lfs_dir_commit now contains a state-machine
  which after committing or compacting handles the extra changes to the
  filesystem in a single, non-recursive loop

- Instead of fixing all relocations recursively, >1 relocation requires
  defering to a full deorphan step. This step is unfortunately an
  additional n^2 process. It also required some changes to lfs_deorphan
  in order to ignore intentional orphans created as an intermediary in
  lfs_mkdir. Maybe in the future we should remove these.

- Tail recursion normally found in lfs_fs_deorphan had to be rewritten
  as a loop which restarts any time a new commit causes a relocation.
  This does show that the algorithm may not terminate, but only if every
  block is bad, which will eventually cause littlefs to run out of
  blocks to write to.
This mostly just required separate functions for "lfs_file_rawwrite" and
"lfs_file_flushedwrite", since lfs_file_flush recursively invokes
lfs_file_rawread and lfs_file_rawwrite.

This comes at a code cost, but gives us bounded and measurable RAM usage
on this code path.
lfs_dir_traverse is a bit unpleasant in that it is inherently a
recursive function, but without a strict bound of 4 calls (commit -> filter ->
move -> filter), and efforts to unroll the recursion comes at a
signification code cost.

It turns out the best solution I've found so far is to simple create an
explicit stack with an explicit bound of 4 calls (or more accurately,
3 pushed frames).

---

This actually highlights one of the bigger flaws in littlefs right now,
which is that this function, lfs_dir_traverse, takes O(n^2) disk reads
to traverse.

Note that LFS_FROM_MOVE can only occur once per commit, which is why
this code is O(n^2) and not O(n^4).
@geky geky added needs minor version new functionality only allowed in minor versions next minor stack usage labels Mar 21, 2022
@dpgeorge
Copy link
Contributor

3. lfs_dir_traverse was also a bit tricky. It's inherently recursive, but has a strict upper bound of 4 calls (a traverse with a filter with a move with a filter).

Did you see #621 (I'm sure you did), in particular:

Yet, despite (2), the overall CPU time only decrease by 75%, leaving the huge number of times lfs_dir_traverse is called.
We analyze the call graph and saw that 4 levels of recursive calls of lfs_dir_traverse are done during compaction.
A O(n^4) is obviously a problem...

And then:

Yet, for the particular case of cb=lfs_dir_traverse_filter for [3] and [4], we saw that all the calls are useless ... The shortcut of this particular case reduce the number of lfs_dir_traverse calls by 97.8%

So I assume the optimisation being talked about there is to cut down this recursion significantly. But it looks like that optimisation cannot be applied generally, is that true? It would be great to see this optimisation applied because it really does help to improve littlefs performance.

@geky
Copy link
Member Author

geky commented Mar 21, 2022

Did you see #621 (I'm sure you did), in particular:

Ah I did, and was actually just re-reading it, but haven't been able to fully understand/provide useful feedback to the proposal yet. I hadn't realized it was related to this, thanks for pointing that out.

A O(n^4) is obviously a problem...

It actually should be "only" O(n^2+n^2) = O(n^2), since a LFS_FROM_MOVE tag can only occur once in an attribute list (here, the LFS_FROM_* tags can't occur on disk).

That being said, this O(n^2) loop has been be a big mistake. It's something I'm actively exploring, with at least one potential prototype, but it's quite difficult to solve. I'd be quite interested to know @invoxiaamo's ideas, but that might be a discussion for that PR.

This is the main reason behind #203

Yet, for the particular case of cb=lfs_dir_traverse_filter for [3] and [4], we saw that all the calls are useless

I was initially skeptical but looking into it a bit more this might be true. If so this would be a great improvement to have and might be able to reduce the stack usage in lfs_dir_traverse down to 3 frames.

@dpgeorge
Copy link
Contributor

Regarding this PR: in general I think it's a good idea to eliminate the use of recursion, or at least have more control over the recursion. If it's fully eliminated it might be interesting to add a check to CI to ensure it's not added again accidentally (and that will limit the features that can be added in the future, which IMO is a good thing).

As for code size increase: we do use littlefs in a bootloader and it needs to be small, but it's in read-only mode so looks like that won't change with this PR.

@geky
Copy link
Member Author

geky commented Mar 21, 2022

It's not a hard error, but CI results in the "show all checks" panel would show "∞ B" instead of "1696 B", unfortunately GitHub has made these hard to view recently. At the very least it would be difficult to ignore come release time. But making it a hard error is an even better idea.

@geky geky added this to the v2.5 milestone Apr 9, 2022
@geky geky changed the base branch from master to devel April 9, 2022 17:42
@geky geky added v2.5 and removed needs minor version new functionality only allowed in minor versions labels Apr 10, 2022
@geky geky merged commit 4ce078c into devel Apr 11, 2022
geky added a commit that referenced this pull request Apr 11, 2022
Restructure littlefs to not use recursion, measure stack usage
@geky geky mentioned this pull request Apr 11, 2022
@warasilapm
Copy link

@geky I recognize this was a difficult and sizeable undertaking and I am extremely thankful for the work you've done to accomplish this. I had a whole class of bugs based around stack usage within littlefs in one of our applications on a RAM constrained device where every bit of stack per task counted. I ended up guessing and checking the RAM usage but was always worried about usage during relocation for worn blocks down the road. Having this be deterministic is exactly what I was hoping for out of littlefs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

What is the maximum stack depth of LFS?
3 participants