Loading video...

Video Failed to Load

Go Home

I've been trying to implement a parallel rigid body solver. It's a tricky problem in physics to solve a huge pile of objects on multiple threads. I've tried graph coloring and other means of splitting up the problem, but so far a locking primitive per body is the most efficient..

65,477 views • 1 year ago •via X (Twitter)

11 Comments

Sebastian Aaltonen's profile picture
Sebastian Aaltonen1 year ago

CPU or GPU? I used graph colorization in Claybook GPU solver. You can also accumulate corrections in a lockless way, but that requires more iterations to converge. I wrote four different solvers for GPU before I was happy about it.

Dennis Gustafsson's profile picture
Dennis Gustafsson1 year ago

This is on CPU, I tried a Jacobi approach today, but it converges so slowly that it eats up all gains… Not sure if that is what you meant though. Anyway, I’m pretty happy with the unordered Gauss-Seidel I have now. I changed it from one mutex per body to hashing though.

Storm's profile picture
Storm1 year ago

I run into this problem every time I start trying to build a game engine. It's a huge rabbit hole. The problem is that you're trying to represent and calculate something continuous (quantum, even) in discrete steps and that *necessarily* requires making some tradeoffs in correctness or speed. In the real world every object interacts with every other object at all times (e.g. gravity), there's not really a notion of "event propagation" because there's not really an "origin" and thus no real "direction" in which to propagate events. You end up running into fundamental issues with our human conception of space and time. (e.g. trying to answer the question of "what is time, really?") As far as I have discovered, this makes it impossible to "correctly" implement the sort of cause-and-effect physics that collision detection requires. Granted, there are plenty of acceptable well-documented tradeoffs to be made! Best of luck to you, hope you solve this!

Alex's profile picture
Alex1 year ago

Maybe you'd want to look into the implementation of @RossNordby's Bepuphysics2 on Github. He really knows his way around multithreading and cache opt. The solver code isn't super insane (~4k LOC). For someone with your experience it's probably not too difficult to read.

Awesome New Year Robot's profile picture
Awesome New Year Robot1 year ago

Yeah, the "pile of objects" complexity is inherently n-squared :-( This is very annoying for networking, too. Long time ago I tried a system that double-buffered; read object states from an input array, write new object states to the output array. Parallelizes perfectly if you break it by object type and then sub-ranges of each type array. You may need to run narrow-phase for the same pairs multiple times, but that's at worst twice the linear cost, and thus at 3+ threads are still a win. The drawback is that you can't run a "big matrix of constraints" type solver in that fashion, so instead you have to use penalty/damping methods, use double precision, and crank up the frame rate to 1000 Hz. But that's not necessarily a bad thing ...

Kiaran Ritchie's profile picture
Kiaran Ritchie1 year ago

Impressive pile of boxes! Crazy that a mutex per body is better than graph coloring.

Nicholas Chapman's profile picture
Nicholas Chapman1 year ago

Have a look at Jolt physics, which uses striped mutexes.

Fire Totem Games 🧡 A Webbing Journey 🕷️🍪's profile picture
Fire Totem Games 🧡 A Webbing Journey 🕷️🍪1 year ago

This looks sooooo satisfying 🤩

SpeedKomodo's profile picture
SpeedKomodo1 year ago

Looks like one of those 2000s era PhysX benchmarks.

Charlie Shenton's profile picture
Charlie Shenton1 year ago

How many bodies is this, and on what CPU?

Dennis Gustafsson's profile picture
Dennis Gustafsson1 year ago

This is around 5000 bodies in an i9 cpu. Not optimized yet though :)

Related Videos