How I made my particle life simulator 1000% faster

The beginning

About a year ago, I put my particle-sim project on the shelf. It was capable of simulating around 4-5 thousand particles on my gaming PC. I had just finished making the project multi-threaded to fix the performance issue that came from my poorly optimized updateVelocities function that took \(O(n^2)\) time, where \(n\) is the number of particles. It was a very obviously lazy implementation, but I just didn't have the care or the time to do any proper optimization. That is until very recently...

Very recently

I was going through my old projects, and came across my particle life simulator. Realistically, this is one of my favorite projects. It is written in a language I enjoy, it is visually appealing, and it has a lot of room for improvement. I was looking for a project to revisit and update, and particle-sim was the perfect option. First I took a moment to read my own code from a year ago. It wasn't the prettiest, but it was commented enough for me to understand it. Turns out I had begun a quadtree implementation before I put this project on the shelf, so I knew exactly what direction I needed to go to update this project.

What is a quadtree?

Before I continue, I would like to take a moment to explain what a quadtree is. A quadtree is a tree-like data structure. Every node in the quadtree can split into four children, each of whom can then split into four more children. They are often used to represent may points on a 2d plane, with a certain number of points allowed in each quadrant until it splits again. (This is a very brief definition, please look at the wikipedia for more information.) For my project, it is extremely useful for being able to find every point within a certain radius of a given point. This works by checking which quadrants have an edge that overlaps the circle created by the radius, and then marking that quadrant to be searched for points. (It also adds the quadrant the point itself is stored in, and also wraps around edges, but that's not important for this explanation.) Since know what quadrants that are within a specified radius of the point, I no longer end up needing to check every single particle and now I only search the quadrants that could possibly contain a valid point.

The implementation

I looked over the implementation I had made, and realized I would need to make many changes to it. In my initial code, I had created the quadtree in such a way where every quad contained a single node, thus meaning if I generated 50 thousand particles, there would be 50 thousand quadrants, and it would be faster at that point to just check every particle individually. My first change was to instead have every leaf quad contain an array of particles with a fixed size. (A leaf quad is just a quadrant with no children.) This size will later be known as the "split limit", AKA the maximum number of particles any given quad can contain before splitting. I now have the information I need to decide how and when a quadrant should split. Deciding when to split is quite simple. If the number of particles within a given quad is greater than the split limit, then split. Going about the split is similarly simple. After creating the 4 children, determine the child each node belongs in, then insert the node into that child. The next important part of the implementation is in the radiusSearch function. (I won't cover radiusSearchWrapping because it is essentially just a little bit of edge-case checking around radiusSearch.) This function takes a point and a radius, and then fills an array with every particle within the radius of that point. It does this by descending into the leaf quadrants, and if the leaf quadrant intersects the circle represented by the radius and the center, then it checks every node within the quadrant. If it doesn't intersect the circle, it simply does nothing. After I have collected every node (particle) within the specified radius, I return back to updateVelocities and apply the necessary forces to a particle based on the particles within the radius. With this implementation, the average runtime for my updateVelocities becomes \(O(n \log{n})\), since radiusSearch is \(O(\log{n})\) on average. (Worst case it is \(O(n^2)\) but it realistically never happens because it requires every particle to be in the radius of every other particle.)

But wait...

After the quadtree optimization, I was able to safely simulate around ten thousand particles at 60fps. But didn't I say I could simulate upwards of fifty thousand? So how did I get there? The primary obstacle at that point was my allocations. There were simply too many allocations and deallocations within my main loop, so I went on a hunt to get rid of as many as I could. Successfully too! The only allocations remaining inside of the main loop are the ones to build the quadtree every frame. Everything else was factored out to the beginning of the program, so now I instead pass around pre-allocated lists. (This was also done after the update to zig 0.15.1, which meant a lot of stuff had to change anyways.) Then, as a little cherry on top for my speed, I used the smp_allocator provided by zig to speed up the allocations I was doing to be faster than even the c_allocator.

That's all, folks!

And that was it! I built with --release=fast and I was able to simulate fifty thousand particles at 60fps. I am currently working on getting this project to compile to emscripten, that way I can show it off on this website. I also plan to add a toggle to display the quadrant boundaries, just because I think that would be cool. Maybe in the future I will create a 3D version, but that would be a while off with school and all. Thanks for reading, and be sure to email me with any questions or comments you have!

Tags: particle-sim, optimization, projects, zig, raylib, multi-threading

Nic's posts