How I made my particle life simulator 1000% faster
How I made my particle life simulator 1000% faster
The beginning
About a year ago, I put my particle-sim project
on the shelf. It was capable of simulating around 4-5 thousand particles on my gaming PC.
I had just finished making the project multi-threaded to fix the performance issue that came
from my poorly optimized updateVelocities function that took \(O(n^2)\) time, where \(n\) is
the number of particles. It was a very obviously lazy implementation, but I just didn't have
the care or the time to do any proper optimization. That is until very recently...
Very recently
I was going through my old projects, and came across my particle life simulator. Realistically, this is one of my favorite projects. It is written in a language I enjoy, it is visually appealing, and it has a lot of room for improvement. I was looking for a project to revisit and update, and particle-sim was the perfect option. First I took a moment to read my own code from a year ago. It wasn't the prettiest, but it was commented enough for me to understand it. Turns out I had begun a quadtree implementation before I put this project on the shelf, so I knew exactly what direction I needed to go to update this project.
What is a quadtree?
Before I continue, I would like to take a moment to explain what a quadtree is. A quadtree is a tree-like data structure. Every node in the quadtree can split into four children, each of whom can then split into four more children. They are often used to represent may points on a 2d plane, with a certain number of points allowed in each quadrant until it splits again. (This is a very brief definition, please look at the wikipedia for more information.) For my project, it is extremely useful for being able to find every point within a certain radius of a given point. This works by checking which quadrants have an edge that overlaps the circle created by the radius, and then marking that quadrant to be searched for points. (It also adds the quadrant the point itself is stored in, and also wraps around edges, but that's not important for this explanation.) Since know what quadrants that are within a specified radius of the point, I no longer end up needing to check every single particle and now I only search the quadrants that could possibly contain a valid point.
The implementation
I looked over the implementation I had made, and realized I would need to make many changes to it.
In my initial code, I had created the quadtree in such a way where every quad contained a single node,
thus meaning if I generated 50 thousand particles, there would be 50 thousand quadrants, and it would
be faster at that point to just check every particle individually. My first change was to instead have
every leaf quad contain an array of particles with a fixed size.
(A leaf quad is just a quadrant with no children.)
This size will later be known as the "split limit", AKA the maximum number of particles any given quad
can contain before splitting. I now have the information I need to decide how
and when a quadrant should split. Deciding when to split is quite simple. If the number of particles
within a given quad is greater than the split limit, then split. Going about the split is
similarly simple. After creating the 4 children, determine the child each node belongs in,
then insert the node into that child.
The next important part of the implementation is in the radiusSearch function. (I won't cover
radiusSearchWrapping because it is essentially just a little bit of edge-case checking around
radiusSearch.) This function takes a point and a radius, and then fills an array with every
particle within the radius of that point. It does this by descending into the leaf quadrants,
and if the leaf quadrant intersects the circle represented by the radius and the center, then it
checks every node within the quadrant. If it doesn't intersect the circle, it simply does nothing.
After I have collected every node (particle) within the specified radius, I return back to
updateVelocities and apply the necessary forces to a particle based on the particles within the
radius. With this implementation, the average runtime for my updateVelocities becomes
\(O(n \log{n})\), since radiusSearch is \(O(\log{n})\) on average. (Worst case it is
\(O(n^2)\) but it realistically never happens because it requires every particle to be in the radius
of every other particle.)
But wait...
After the quadtree optimization, I was able to safely simulate around ten thousand particles
at 60fps. But didn't I say I could simulate upwards of fifty thousand? So how did I get there?
The primary obstacle at that point was my allocations. There were simply too many
allocations and deallocations within my main loop, so I went on a hunt to get rid of as many
as I could. Successfully too! The only allocations remaining inside of the main loop are the ones
to build the quadtree every frame. Everything else was factored out to the beginning of the program,
so now I instead pass around pre-allocated lists. (This was also done after the update to zig
0.15.1, which meant a lot of stuff had to change anyways.) Then, as a little cherry on top
for my speed, I used the smp_allocator provided by zig to speed up the allocations I was doing
to be faster than even the c_allocator.
That's all, folks!
And that was it! I built with --release=fast and I was able to simulate fifty thousand
particles at 60fps. I am currently working on getting this project to compile to emscripten, that
way I can show it off on this website. I also plan to add a toggle to display the quadrant
boundaries, just because I think that would be cool. Maybe in the future I will create a 3D version,
but that would be a while off with school and all. Thanks for reading, and be sure to email
me with any questions or comments you have!
Tags: particle-sim, optimization, projects, zig, raylib, multi-threading