正在加载视频...
视频加载失败
I've been trying to implement a parallel rigid body solver. It's a tricky problem in physics to solve a huge pile of objects on multiple threads. I've tried graph coloring and other means of splitting up the problem, but so far a locking primitive per body is the most efficient..
65,477 次观看 • 1 年前 •via X (Twitter)
11 条评论

CPU or GPU? I used graph colorization in Claybook GPU solver. You can also accumulate corrections in a lockless way, but that requires more iterations to converge. I wrote four different solvers for GPU before I was happy about it.

This is on CPU, I tried a Jacobi approach today, but it converges so slowly that it eats up all gains… Not sure if that is what you meant though. Anyway, I’m pretty happy with the unordered Gauss-Seidel I have now. I changed it from one mutex per body to hashing though.

I run into this problem every time I start trying to build a game engine. It's a huge rabbit hole. The problem is that you're trying to represent and calculate something continuous (quantum, even) in discrete steps and that *necessarily* requires making some tradeoffs in correctness or speed. In the real world every object interacts with every other object at all times (e.g. gravity), there's not really a notion of "event propagation" because there's not really an "origin" and thus no real "direction" in which to propagate events. You end up running into fundamental issues with our human conception of space and time. (e.g. trying to answer the question of "what is time, really?") As far as I have discovered, this makes it impossible to "correctly" implement the sort of cause-and-effect physics that collision detection requires. Granted, there are plenty of acceptable well-documented tradeoffs to be made! Best of luck to you, hope you solve this!

Maybe you'd want to look into the implementation of @RossNordby's Bepuphysics2 on Github. He really knows his way around multithreading and cache opt. The solver code isn't super insane (~4k LOC). For someone with your experience it's probably not too difficult to read.

Yeah, the "pile of objects" complexity is inherently n-squared :-( This is very annoying for networking, too. Long time ago I tried a system that double-buffered; read object states from an input array, write new object states to the output array. Parallelizes perfectly if you break it by object type and then sub-ranges of each type array. You may need to run narrow-phase for the same pairs multiple times, but that's at worst twice the linear cost, and thus at 3+ threads are still a win. The drawback is that you can't run a "big matrix of constraints" type solver in that fashion, so instead you have to use penalty/damping methods, use double precision, and crank up the frame rate to 1000 Hz. But that's not necessarily a bad thing ...

Impressive pile of boxes! Crazy that a mutex per body is better than graph coloring.

Have a look at Jolt physics, which uses striped mutexes.

This looks sooooo satisfying 🤩

Looks like one of those 2000s era PhysX benchmarks.

How many bodies is this, and on what CPU?

This is around 5000 bodies in an i9 cpu. Not optimized yet though :)

