tracing/hist: Call hist functions directly via a switch statement
Due to retpolines, indirect calls are much more expensive than direct
calls. The histogram code only uses a select set of functions, so instead
of calling them through function pointers, create a hist_fn_call()
function that uses a switch statement to call the histogram functions
directly. This gives a 13% speedup to the histogram logic.
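As a rough illustration of the idea (not the actual patch), the sketch below
replaces a per-field function pointer with an enum tag that a switch
statement dispatches on, so every call site is a direct call. Only the name
hist_fn_call() comes from this change; the enum values, the struct layout
and the helper functions are simplified, hypothetical stand-ins.

```c
#include <stdint.h>

/* Hypothetical tags, one per histogram function (not the kernel's enum). */
enum example_fn_num {
	EXAMPLE_FN_NOP,
	EXAMPLE_FN_CONST,
	EXAMPLE_FN_COUNTER,
};

/* Simplified stand-in for a histogram field: a tag instead of a pointer. */
struct example_field {
	enum example_fn_num fn_num;
	uint64_t constant;
};

/* Hypothetical helper: return the constant stored in the field. */
static uint64_t example_fn_const(struct example_field *field)
{
	return field->constant;
}

/* Hypothetical helper: each hit counts as one. */
static uint64_t example_fn_counter(struct example_field *field)
{
	(void)field;
	return 1;
}

/*
 * Dispatch on the tag. Each case is a direct call the compiler can see,
 * so no function pointer is dereferenced and the retpoline thunk used
 * for indirect calls is not involved.
 */
static uint64_t hist_fn_call(struct example_field *field)
{
	switch (field->fn_num) {
	case EXAMPLE_FN_CONST:
		return example_fn_const(field);
	case EXAMPLE_FN_COUNTER:
		return example_fn_counter(field);
	default:
		return 0;
	}
}
```

A caller would then write hist_fn_call(field) where it previously invoked
the function pointer stored in the field.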
Using the histogram benchmark:
Before:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 129 } hitcount: 2213
{ delta: 130 } hitcount: 285965
{ delta: 131 } hitcount: 1146545
{ delta: 132 } hitcount: f5819fb
{ delta: 133 } hitcount: 19896215
{ delta: 134 } hitcount: 53118616
{ delta: 135 } hitcount: 83816709
{ delta: 136 } hitcount: 68329562
{ delta: 137 } hitcount: 41859349
{ delta: 138 } hitcount: 46257797
{ delta: 139 } hitcount: 54400831
{ delta: 140 } hitcount: 72875007
{ delta: 141 } hitcount: 76193272
{ delta: 142 } hitcount: 49504263
{ delta: 143 } hitcount: 38821072
{ delta: 144 } hitcount: 47702679
{ delta: 145 } hitcount: 41357297
{ delta: 146 } hitcount: 22058238
{ delta: 147 } hitcount: 9720002
{ delta: 148 } hitcount: 3193542
{ delta: 149 } hitcount: 927030
{ delta: 150 } hitcount: 850772
{ delta: 151 } hitcount: 1477380
{ delta: 152 } hitcount: 2687977
{ delta: 153 } hitcount: 2865985
{ delta: 154 } hitcount: 1977492
{ delta: 155 } hitcount: 2475607
{ delta: 156 } hitcount: 3403612
After:
# event histogram
#
# trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
#
{ delta: 113 } hitcount: 272
{ delta: 114 } hitcount: 840
{ delta: 118 } hitcount: 344
{ delta: 119 } hitcount: 25428
{ delta: 120 } hitcount: 350590
{ delta: 121 } hitcount: 1892484
{ delta: 122 } hitcount: 6205004
{ delta: 123 } hitcount: 11583521
{ delta: 124 } hitcount: 37590979
{ delta: 125 } hitcount: 108308504
{ delta: 126 } hitcount: 131672461
{ delta: 127 } hitcount: 88700598
{ delta: 128 } hitcount: 65939870
{ delta: 129 } hitcount: 45055004
{ delta: 130 } hitcount: 33174464
{ delta: 131 } hitcount: 31813493
{ delta: 132 } hitcount: 29011676
{ delta: 133 } hitcount: 22798782
{ delta: 134 } hitcount: 22072486
{ delta: 135 } hitcount: 17034113
{ delta: 136 } hitcount: 8982490
{ delta: 137 } hitcount: 2865908
{ delta: 138 } hitcount: 980382
{ delta: 139 } hitcount: 1651944
{ delta: 140 } hitcount: 4112073
{ delta: 141 } hitcount: 3963269
{ delta: 142 } hitcount: 1712508
{ delta: 143 } hitcount: 575941
{ delta: 144 } hitcount: 351427
{ delta: 145 } hitcount: 218077
{ delta: 146 } hitcount: 167297
{ delta: 147 } hitcount: 146198
{ delta: 148 } hitcount: 116122
{ delta: 149 } hitcount: 58993
{ delta: 150 } hitcount: 40228
The delta above is in nanoseconds. It brings the fastest time down from
129ns to 113ns, and the peak from 141ns to 126ns.
Link: https://lkml.kernel.org/r/20220906225529.411545333@goodmis.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>