[TriEmbed] optimized switch statements Re: Hacking a fake vintage radio (with Arduino + Pi 0)
Pete Soper
pete at soper.us
Wed Jul 1 17:39:56 CDT 2020
Gcc already does use jump tables without the user having to force
anything. Below is a simple C program followed by the X64 assembly
language. Notice the table of addresses and the jump through rax. But
what the programmer has to do is craft their case values to make a
"dense" range that is mostly consecutive. How smart this is, I don't
recall (e.g. if case values 4, 8, 12, 16 would be recognized as easy to
compute an index with, how many "gaps" in the case value range would be
tolerated, etc). But also the compiler won't use the jump table until
there are more than a handful of cases (the example below, with only
three cases compiled into sequences of tests and conditional branches
because the compiler writer judged the tradeoff of speed and space
wasn't worth it).
The reason I'm familiar with this is that the gcc compiler's smarts gave
the Linux version of the Java virtual machine an edge with dispatching
op codes but the Sun Solaris C++ compiler didn't have the optimization
(and was in a 'caretaker state' with no hope of getting the optimization
by then, in the early 2000s). But the speed of dispatch was about
interpretation and that was an absolute nit for code that executed
frequently, as it was automagically compiled into native code by the
late 90s and the quality of that native code got better and better over
the years.
-Pete
// Just for fun try putting "static" in front of the two int declarations
// (first mystery _import and then also for disgusting_global_side
effect) and see what comes out with gcc -S -O4
int mystery_import;
int disgusting_global_side_effect;
int main(int argc, char **argv) {
switch (mystery_import) {
case 0:
disgusting_global_side_effect = 123;
break;
case 1:
disgusting_global_side_effect = 124;
break;
case 2:
disgusting_global_side_effect = 125;
break;
case 3:
disgusting_global_side_effect = 126;
break;
case 4:
disgusting_global_side_effect = 127;
break;
case 5:
disgusting_global_side_effect = 128;
break;
}
return 0;
}
.file "dense-switch.c"
.text
.section .text.startup,"ax", at progbits
.p2align 4
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
endbr64
cmpl $5, mystery_import(%rip)
ja .L2
movl mystery_import(%rip), %eax
leaq .L4(%rip), %rdx
movslq (%rdx,%rax,4), %rax
addq %rdx, %rax
notrack jmp *%rax
.section .rodata
.align 4
.align 4
.L4:
.long .L9-.L4
.long .L8-.L4
.long .L7-.L4
.long .L6-.L4
.long .L5-.L4
.long .L3-.L4
.section .text.startup
.L3:
movl $128, disgusting_global_side_effect(%rip)
.L2:
xorl %eax, %eax
ret
.L5:
movl $127, disgusting_global_side_effect(%rip)
jmp .L2
movl $123, disgusting_global_side_effect(%rip)
jmp .L2
.L8:
movl $124, disgusting_global_side_effect(%rip)
jmp .L2
.L7:
movl $125, disgusting_global_side_effect(%rip)
jmp .L2
.L6:
movl $126, disgusting_global_side_effect(%rip)
jmp .L2
.cfi_endproc
.LFE0:
.size main, .-main
.comm disgusting_global_side_effect,4,4
.comm mystery_import,4,4
.ident "GCC: (Ubuntu 9.3.0-10ubuntu2) 9.3.0"
.section .note.GNU-stack,"", at progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
On 7/1/20 2:12 PM, Jon Wolfe via TriEmbed wrote:
>
> This is one of those rare circumstances, imho, where goto can
> reasonably be used, Dijkstra be damned 😊.
>
> Where I have seen it used like this , it’s an optimization, and
> doesn’t change the semantics of code flow from a traditional
> switch-case, and it was wrapped in a macro I think so it could be
> used, if supported, by the compiler. It’s like putting inline assembly
> in the code: only do it if the benefit is worth the sacrifice in code
> portability and maintainability.
>
> I’m not a compiler optimization expert, but the code generation
> optimization of using a jump table would be at the discretion of the
> optimizer, using a goto construct would be a way to force that. It may
> also be the case that the way it is implemented at the machine code
> level is faster using gotos.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.triembed.org/pipermail/triembed_triembed.org/attachments/20200701/58a5bcb0/attachment.htm>
More information about the TriEmbed
mailing list