vdisp/sdl: R10k conversion optimized
Create an inner loop with a fixed number of iterations (16). This allows
the compiler to unroll the inner loop and vectorize it (16 iterations of
4 bytes each is 64 bytes = 512 bits, enough for instructions up to
512 bits wide).

The remainder, if any (count % 16 != 0), is computed per pixel as before.
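The blocking scheme described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the code from sdl2.c: `be32_load`, `convert_simple`, `convert_blocked` and `variants_agree` are hypothetical names, and `be32_load` is a portable stand-in assumed to be equivalent to `htonl(*buf)` (it reads the four bytes in memory as a big-endian value). Whether the compiler actually unrolls or vectorizes the inner loop depends on the compiler, flags and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LOOP_ITEMS = 16 }; /* fixed trip count of the inner loop */

/* Hypothetical helper: read the 4 bytes at p as a big-endian 32-bit value. */
static uint32_t be32_load(const uint32_t *p)
{
        unsigned char b[4];
        memcpy(b, p, sizeof b);
        return (uint32_t) b[0] << 24 | (uint32_t) b[1] << 16 |
               (uint32_t) b[2] << 8 | (uint32_t) b[3];
}

/* Reference: one pixel per iteration, trip count known only at run time. */
static void convert_simple(size_t count, uint32_t *buf)
{
        for (size_t i = 0; i < count; ++i) {
                buf[i] = be32_load(&buf[i]) >> 2;
        }
}

/* Blocked: full groups of LOOP_ITEMS pixels first, then the remainder. */
static void convert_blocked(size_t count, uint32_t *buf)
{
        size_t i = 0;
        for (; i < count / LOOP_ITEMS; ++i) {
                /* constant bound, so the compiler may unroll/vectorize */
                for (int j = 0; j < LOOP_ITEMS; ++j) {
                        *buf = be32_load(buf) >> 2;
                        buf++;
                }
        }
        i *= LOOP_ITEMS; /* number of pixels already processed */
        for (; i < count; ++i) { /* the remaining count % LOOP_ITEMS pixels */
                *buf = be32_load(buf) >> 2;
                buf++;
        }
}

/* Returns 1 if both variants produce identical output for `count` pixels. */
static int variants_agree(size_t count)
{
        uint32_t a[64];
        uint32_t b[64];
        assert(count <= 64);
        for (size_t i = 0; i < count; ++i) {
                a[i] = b[i] = 0x9E3779B9u * (uint32_t) (i + 1);
        }
        convert_simple(count, a);
        convert_blocked(count, b);
        return memcmp(a, b, count * sizeof a[0]) == 0;
}
```

Calling `variants_agree` with counts such as 0, 15, 16 and 37 exercises the empty case, a remainder-only case, exactly one full block, and full blocks followed by a remainder.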
MartinPulec committed Sep 18, 2024
1 parent 70e169d commit 3a238de
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions src/video_display/sdl2.c
@@ -833,7 +833,17 @@ static struct video_frame *display_sdl2_getf(void *state)
 static void
 r10k_to_sdl2(size_t count, uint32_t *buf)
 {
+        enum {
+                LOOP_ITEMS = 16,
+        };
         unsigned int i = 0;
+        for (; i < count / LOOP_ITEMS; ++i) {
+                for (int j = 0; j < LOOP_ITEMS; ++j) {
+                        uint32_t val = htonl(*buf);
+                        *buf++ = val >> 2;
+                }
+        }
+        i *= LOOP_ITEMS;
         for (; i < count; ++i) {
                 uint32_t val = htonl(*buf);
                 *buf++ = val >> 2;