Unity Multithreaded Header Image

Update: Want to benchmark some of the things below on your own hardware? Check out our Artemis Benchmarking Tool for Unity 3d inspired by this article. Best of all, it’s free and open source!

Note: The percentage graphs below are labeled "% faster than 2c/2t". They would be more accurately described as "% compared to the 2c/2t baseline".

With the launch of new Ryzen processors this year, choosing the right processor can be tricky: do you go for the multi-core performance powerhouse that AMD has released, or the better per-core performance of the newest Intel Core i series CPU? Surprisingly little information is available about how well Unity scales with extra cores and threads. If you're trying to make this decision yourself, hopefully the results below will give you what you need.

Also, if you have any tips on what to improve/change about the way I’m doing things here, please feel free to let me know! I’m probably most responsive on Twitter (@PixelSpice) or Reddit (/u/PixelSpice).

Testing Approach

We selected the areas where we typically experience delays, ran multiple benchmarks at each configuration, and took the average. To get reasonably standardized and repeatable results, we only used first-party assets directly from Unity. Each of our selected tests was performed multiple times within the Courtyard package (version 1.2), available for free on the Unity Asset Store directly from Unity Technologies.

Additionally, we used new projects for import and build testing (each iteration), and the same Unity project for all other tests.

Selected tests:

Package Decompressing & Importing

Lightmap & Reflection generation

Windows build time

Occlusion baking

Navmesh generation

“Play button” delay

Testing Setup:

CPU – Ryzen 5 1600X @ Stock speeds

RAM – 32 GB DDR4 2666 MHz

SSD – Samsung 850 Evo 120 GB

GPU – Radeon R9 Nano

We used the same hardware for each test, with every possible variation of core (c) and thread (t) configurations tested – from 2c/2t to 6c/12t. Additionally, we used a separate SSD for Windows 10 in order to reduce the performance impact the OS had on read/write speeds from the SSD.

Ryzen Master Screenshot

AMD's new Ryzen Master utility was helpful when it came to configuring the core counts for these benchmarks.
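For reference, the set of configurations can be listed with a short sketch. It assumes cores were enabled in steps of two (2, 4, 6), each with SMT off and on. That step size is an assumption based on the configurations named in this article, not something the setup description states outright.

```python
# Assumption: physical cores are enabled in steps of two, each with SMT off or on.
PHYSICAL_CORES = (2, 4, 6)   # assumed step size, inferred from the labels in the article
THREADS_PER_CORE = (1, 2)    # 1 = SMT off, 2 = SMT on


def configurations():
    """Return labels such as '2c/2t', ordered by core count and then SMT state."""
    return [
        f"{cores}c/{cores * threads_per_core}t"
        for cores in PHYSICAL_CORES
        for threads_per_core in THREADS_PER_CORE
    ]


print(configurations())
# ['2c/2t', '2c/4t', '4c/4t', '4c/8t', '6c/6t', '6c/12t']
```

Under that assumption there are six configurations per test, with 2c/2t as the baseline.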

 

Test Results

The heading for each area includes the performance improvement of the 6c/12t configuration compared to the 2c/2t configuration. Additionally, charts that are orange/gray are “higher is better” while blue/gray charts are “lower is better”.
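The difference between "% faster than 2c/2t" and "% compared to the 2c/2t baseline" is easiest to see with numbers. The sketch below shows one plausible way to derive both figures from averaged run times. The function names and the timings are hypothetical and are not the measurements or the exact formula used for the charts.

```python
from statistics import mean


def speed_vs_baseline(baseline_runs, runs):
    """Speed relative to the baseline, as a percentage (100.0 means equal speed)."""
    return mean(baseline_runs) / mean(runs) * 100.0


def improvement(baseline_runs, runs):
    """Percentage faster than the baseline (0.0 means equal speed)."""
    return speed_vs_baseline(baseline_runs, runs) - 100.0


# Hypothetical timings in seconds, NOT measurements from this article.
runs_2c2t = [118.0, 120.0, 122.0]   # averages to 120 s
runs_6c12t = [59.0, 60.0, 61.0]     # averages to 60 s

print(speed_vs_baseline(runs_2c2t, runs_6c12t))  # 200.0 (% compared to baseline)
print(improvement(runs_2c2t, runs_6c12t))        # 100.0 (% faster than baseline)
```

With these made-up numbers, halving the time reads as 200% on a "compared to baseline" chart but only 100% on a "faster than" chart, which is why the labeling matters.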

Areas with limited improvement

While the majority of the tests clearly benefited from more cores, several of the tests ended up seeing little to no improvement from the additional cores. These tests are multi-threaded but limited by other factors (storage, RAM, or GPU speed for example). If there is enough interest in this type of review I’ll revisit these tests with a faster GPU/SSD/RAM.

Before diving into the more interesting tests, the following areas saw minimal to no gains with additional thread/core counts:

Package decompression: ±8%

Navmesh Generation: ±11%

“Play button” delay: ±2%

All three of the above areas followed no trend related to core count, and individual results varied as much within each configuration as they did across the configurations. There were a few outliers for the above tests in the 2c/4t and 2c/2t configurations, likely caused by the lower tolerance those configurations have for OS interruptions. While we tried to keep the background load minimal and consistent, there will always be a bit of background noise. Keep this in mind if you plan to have 10, 25, or 50 tabs open in Chrome looking for that one piece of information you need.
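A rough way to check for "as much variation within a configuration as across configurations" is to compare the typical spread of repeated runs against the spread of the per-configuration averages. The sketch below is a simple heuristic, not a proper statistical test, and the data in it is hypothetical.

```python
from statistics import mean, pstdev


def spread_summary(results):
    """Compare run-to-run noise with differences between configurations.

    `results` maps a configuration label to a list of run times in seconds.
    Returns (within, across): the average standard deviation inside each
    configuration, and the standard deviation of the per-configuration means.
    """
    within = mean([pstdev(runs) for runs in results.values()])
    across = pstdev([mean(runs) for runs in results.values()])
    return within, across


# Hypothetical timings in seconds, NOT measurements from this article.
hypothetical = {
    "2c/2t": [10.0, 12.0, 11.0],
    "4c/8t": [11.0, 10.0, 12.0],
    "6c/12t": [12.0, 11.0, 10.0],
}

within, across = spread_summary(hypothetical)
print(within >= across)  # True: run-to-run noise is at least as large as any trend
```

When `within` is at least as large as `across`, any apparent difference between core counts is hard to distinguish from background noise.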

Package Importing – 77% improvement

Even if you aren’t a big fan of the Unity asset store, package importing is a useful test to estimate the time to import custom packages or assets. Add a new .fbx file to the project? Import. Switch from one build target to another? Re-import. Upgrading Unity? Import (probably). Want the newest post processing effects? Package import – from the asset store. You get the idea – this comes up all the time. If you’re like us, it’s probably one of the top reasons you see that “Hold on” progress bar.

Honestly, we were a bit surprised this saw a benefit from more cores/threads – we assumed the drive speed would be a significant limitation. Fortunately for us, it seems that’s not the case:

Simultaneous Multi-Threading (SMT) creates two logical cores from one physical core. This helps boost performance, but is not a 1:1 performance improvement.

Importing packages is faster with more cores, but the scaling is not linear. SMT also has a clear impact, though less than adding another physical core.

Overall pretty good results for a test you would expect to be primarily storage speed limited.
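To put a number on "faster, but not linearly", one option is to divide the measured speedup by the ideal speedup you would get if performance doubled every time the thread count doubled. The sketch below does that. The function names and the timings are hypothetical and only illustrate the calculation.

```python
def speedup(baseline_time, time):
    """How many times faster a run is than the baseline run."""
    return baseline_time / time


def scaling_efficiency(baseline_time, baseline_threads, time, threads):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    ideal = threads / baseline_threads
    return speedup(baseline_time, time) / ideal


# Hypothetical import times in seconds, NOT measurements from this article.
t_2c2t = 100.0   # baseline
t_2c4t = 80.0    # same two cores with SMT enabled
t_4c4t = 62.5    # four physical cores, SMT disabled

smt_efficiency = scaling_efficiency(t_2c2t, 2, t_2c4t, 4)
core_efficiency = scaling_efficiency(t_2c2t, 2, t_4c4t, 4)

print(smt_efficiency)   # 0.625
print(core_efficiency)  # 0.8
```

In this made-up case both configurations have four threads, but the one with four physical cores gets closer to ideal scaling than the one relying on SMT, which is the pattern described above.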

Lightmap & Reflection generation – 131% improvement

Lightmap building can take forever depending on the level. Depending on the size and complexity of the scene, these tasks can be very RAM intensive. Keeping your object size fairly small (like those in the demo asset) helps minimize the impact of these other potential bottlenecks.

Lightmap generation scales well with additional cores, but the gains appear to fall off at higher counts. This is a fairly cumbersome test since there is no easy way to fully reset the lightmap data, but it would be interesting to see how a high core count CPU (like Threadripper or the Core X series) would compare.

Averaged times showed notable improvements with added core count, and performed well with SMT on the low end as well.

Lightmap generation in Unity could easily have its own benchmarking topic, as there are many, many options to choose from. We kept the default settings for these tests, and it would be interesting to see if these results are consistent across the various settings. Overall the results were as expected, apart from 6c/6t vs 6c/12t, where there was significant overlap in times with a nearly identical average.

Occlusion baking – 115% improvement

The most interesting thing we discovered as part of our occlusion testing isn’t well represented in the charts below. Originally, the camera had geometry in frame during our 4c/8t and 2c/4t tests, as opposed to the default camera position with no geometry in view. This caused a large enough difference in baking time that those results came out worse than their SMT-disabled counterparts.

Redoing the tests with the default camera position and no geometry in view returned the results graphed below. It might be a good idea to switch away from the scene/game view before occlusion baking, or to face the camera away from the scene while baking. This is really unexpected behavior, but I suspect drawing the occlusion culling visualizations can actually slow down the baking process.

The results mirror the lightmap generation benchmarks with solid improvements from additional horsepower. The same limited improvement from 6c/6t to 6c/12t also exists, most likely due to the same limitation.

Windows build – 170% improvement

We saved the best for last – build times! Unity offers a new cloud build service, but it isn’t without drawbacks. If you’re like us and still doing builds locally, you probably know how tedious they can be. Fortunately, the results are pretty fantastic, and speak for themselves:

Overall, we see a pretty consistent improvement when more cores or threads are added. While there is potential for a bottleneck due to storage and RAM speed again, we certainly don’t see it before 6c/12t on the 1600X.

Conclusion

There are certainly benefits to having a higher core count machine for working with Unity, but not all of the time-consuming tasks thread well. Additionally, while anecdotal, Unity seemed to run smoother overall with the higher core count, particularly for tasks like scene swapping and project loading.

Since Unity has real-time lighting generation, it’s good to see improvements with additional cores. Newer machines with higher core counts should be able to leave auto generation enabled in more scenarios. Lighting, navmesh, and occlusion all need to be recalculated when moving static game objects, and of these only navmesh baking didn’t see an improvement. Navmesh threading may simply be on the back burner, since Unity games often use A* or their own path-finding system.

If you’re buying a new machine to use Unity, hopefully this provides you a bit more information about whether you should spend a bit extra for those extra couple of cores. Generally AMD will give you more cores and threads at a price point, while Intel will give you more single-core performance.


Like what we do? Follow us on Twitter for Unity tips, reviews, and news about our upcoming game Jackie & the Crystal!