Call for volunteers (英雄帖): Investigate SIMD optimization for SoundTouch #52

Open
xen0n opened this issue Apr 24, 2024 · 1 comment
Labels
AREA: Arch optimization: Architecture-specific enablement or optimization (e.g. addition of LoongArch SIMD codepaths)
英雄帖 (call for volunteers): Volunteers welcome!

Comments

Member

xen0n commented Apr 24, 2024

Repo: https://codeberg.org/soundtouch/soundtouch

This library is used by Firefox to handle audio time-stretching for <video> or <audio> elements. It is not a bottleneck for video playback, but some of its functions do show up in perf top measurements, for example when I watch a video on Bilibili at 2x playback rate.

Example perf top output on 3A6000:

   PerfTop:    5738 irqs/sec  kernel:27.2%  exact:  0.0% lost: 0/0 drop: 0/0 [4000Hz cycles:P],  (all, 8 CPUs)
-----------------------------------------------------------------------------------------------------------------------------

     9.08%  liblgpllibs.so  [.] soundtouch::TDStretch::calcCrossCorrAccumulate
     4.73%  [kernel]        [k] finish_task_switch.isra.0
     3.09%  [kernel]        [k] __arch_cpu_idle
     1.56%  firefox         [.] arena_t::MallocSmall
     1.08%  libc.so.6       [.] memset
     1.08%  firefox         [.] arena_dalloc
     1.06%  libc.so.6       [.] _wordcopy_fwd_aligned
     1.05%  libc.so.6       [.] __GI___pthread_mutex_unlock_usercnt
     0.94%  libc.so.6       [.] pthread_mutex_lock@@GLIBC_2.36
     0.74%  liblgpllibs.so  [.] soundtouch::FIRFilter::evaluateFilterStereo
     0.69%  libxul.so       [.] nsIFrame::BuildDisplayListForChild
     0.66%  [kernel]        [k] try_to_wake_up
     0.61%  libc.so.6       [.] memcpy
     0.58%  libc.so.6       [.] memcmp
     0.51%  [kernel]        [k] syscall_enter_from_user_mode

Feel free to investigate whether hand-written SIMD optimization or generic auto-vectorization would help. (In my case the hottest function was not using LoongArch SIMD, but I build with just -O2, so I can't rule out that simply switching to -O3 would solve it.)
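For anyone picking this up, here is a rough sketch of the kind of loop in question, assuming float samples. It is not SoundTouch's actual calcCrossCorrAccumulate; the name crossCorrSketch and the lane count are made up for illustration. It computes a normalized cross-correlation with the sums split into independent partial accumulators, which may give the auto-vectorizer more room; whether it actually emits LoongArch SIMD would need checking in the generated assembly.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative only: crossCorrSketch is a hypothetical name and this is not
// SoundTouch's implementation. It returns
//   sum(a*b) / sqrt(sum(a*a) * sum(b*b))
// and 0.0 when either input has zero energy.
double crossCorrSketch(const float* a, const float* b, std::size_t n)
{
    // Assumption: 4 partial sums, matching a 128-bit vector of floats.
    constexpr std::size_t kLanes = 4;
    float corr[kLanes] = {};
    float normA[kLanes] = {};
    float normB[kLanes] = {};

    std::size_t i = 0;
    // Main loop: each lane accumulates independently, so no reordering of
    // floating-point additions is needed to process kLanes samples at once.
    for (; i + kLanes <= n; i += kLanes) {
        for (std::size_t k = 0; k < kLanes; ++k) {
            corr[k]  += a[i + k] * b[i + k];
            normA[k] += a[i + k] * a[i + k];
            normB[k] += b[i + k] * b[i + k];
        }
    }

    // Horizontal sum of the partial accumulators.
    float c = 0, na = 0, nb = 0;
    for (std::size_t k = 0; k < kLanes; ++k) {
        c += corr[k];
        na += normA[k];
        nb += normB[k];
    }

    // Scalar tail for lengths that are not a multiple of kLanes.
    for (; i < n; ++i) {
        c  += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }

    const double denom = std::sqrt(static_cast<double>(na) * static_cast<double>(nb));
    return denom > 0.0 ? static_cast<double>(c) / denom : 0.0;
}
```

One thing to keep in mind when comparing -O2 and -O3: under strict floating-point semantics the compiler is generally not allowed to reorder additions into a single float accumulator, so a plain reduction loop may stay scalar at any optimization level unless something like -ffast-math is in effect. Splitting the sum into lanes, as above, makes that reordering explicit in the source instead. On GCC, LoongArch vector code generation also depends on target flags such as -mlsx or -mlasx.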

xen0n added the 英雄帖 (call for volunteers) and AREA: Arch optimization labels on Apr 24, 2024
Member

xry111 commented May 2, 2024

Is --enable-openmp helping?
