For Box2D version 3.0 I decided to finally try using SIMD as it is meant to be used for solving contacts. Making contacts solve faster could yield large performance gains so I decided it would be worth the effort.
But how can I gather 4 or 8 contact pairs to be solved simultaneously? The key is graph coloring. The idea is to have a handful of colors to be assigned to all the contact constraints. For example, suppose I have 6 colors and I want to assign all the contacts to one of those 6 colors. Contact constraints act upon two bodies at a time. With graph coloring the restriction is that within a single color a body can only appear once or not at all.
…
I did all this work to enable SIMD processing. Did it help? Box2D has a benchmarking console application to help answer this question. I implemented SSE2, Neon, and AVX SIMD instruction sets in the Box2D contact solver. I also implemented a scalar reference implementation. I have 5 benchmarks scenarios that push Box2D in various ways. See the benchmark results here.
The large pyramid benchmark with 4 workers has the following numbers:
- AVX2 : 1117 fps = 0.90 ms
- Neon : 1058 fps = 0.95 ms
- SSE2 : 982 fps = 1.02 ms
- scalar (AMD): 524 fps = 1.91 ms
- scalar (M2): 679 fps = 1.47 ms
…
The bottom line is that making good use of SIMD can be a lot of work but it is worth the effort because it can make games run significantly faster and handle more rigid bodies.