Update: Want to benchmark some of the things below on your own hardware? Check out our Artemis Benchmarking Tool for Unity 3d inspired by this article. Best of all, it’s free and open source!
Note: The percentage graphs below are marked as % faster than 2c/2t, when it should more accurately be described as % compared to the baseline 2c/2t.
With the launch of new Ryzen processors this year, choosing the right processor can be tricky – do you go for the multi-core performance powerhouse that AMD has released, or the better per-core performance of the newest Intel Core i series CPU? Surprisingly little information is available about how well Unity scales with extra cores/threads. If you’re making this decision yourself, hopefully the results below give you what you need.
Also, if you have any tips on what to improve/change about the way I’m doing things here, please feel free to let me know! I’m probably most responsive on Twitter (@PixelSpice) or Reddit (/u/PixelSpice).
We selected the areas where we typically experience delays, ran multiple benchmarks at each configuration, and took the average. To get somewhat standardized and repeatable results, we only used first-party assets directly from Unity for our testing. Each of our selected tests was performed multiple times within the Courtyard package (version 1.2), available for free on the Unity Asset Store directly from Unity Technologies.
Additionally, we used new projects for import and build testing (each iteration), and the same Unity project for all other tests.
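For readers who want to reproduce the "run it several times and average" approach on their own machine, here is a minimal Python sketch. It is not the tooling used for this article. The `benchmark` helper and the `placeholder_task` function are hypothetical names, and the placeholder simply burns a little CPU where a real step (a build, an import, a bake) would go.

```python
import statistics
import time
from typing import Callable, List


def time_once(task: Callable[[], None]) -> float:
    """Return the wall-clock seconds that one run of `task` takes."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start


def benchmark(task: Callable[[], None], runs: int = 5) -> dict:
    """Run `task` `runs` times and summarize the timings."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples: List[float] = [time_once(task) for _ in range(runs)]
    return {
        "samples": samples,
        "mean": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
    }


def placeholder_task() -> None:
    """Hypothetical stand-in for a real step such as a build or import."""
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    result = benchmark(placeholder_task, runs=5)
    print(f"mean {result['mean']:.4f}s over {len(result['samples'])} runs")
```

Keeping the individual samples alongside the mean makes it easy to spot a single outlier run, which matters for the noisier tests discussed later.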
Tests
Package decompressing & importing
Lightmap & reflection generation
Windows build time
“Play button” delay
Test hardware
CPU – Ryzen 5 1600X @ stock speeds
RAM – 32 GB DDR4 2666 MHz
SSD – Samsung 850 Evo 120 GB
GPU – Radeon R9 Nano
We used the same hardware for each test, with every possible variation of core (c) and thread (t) configurations tested – from 2c/2t to 6c/12t. Additionally, we installed Windows 10 on a separate SSD to reduce the impact of the OS on the read/write speeds of the test SSD.
The heading for each area includes the performance improvement of the 6c/12t configuration compared to the 2c/2t baseline. Additionally, charts that are orange/gray are “higher is better” while blue/gray charts are “lower is better”.
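The note at the top distinguishes "% faster than 2c/2t" from "% compared to the baseline 2c/2t". The article does not spell out the formula behind the heading percentages, so the sketch below only illustrates the difference between the two readings. The function names and the timings in the example are made up for illustration.

```python
def speedup(baseline_seconds: float, seconds: float) -> float:
    """How many times faster a run is than the baseline (lower time is better)."""
    if baseline_seconds <= 0 or seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / seconds


def percent_of_baseline(baseline_seconds: float, seconds: float) -> float:
    """Performance expressed as a percentage of the baseline (baseline = 100%)."""
    return speedup(baseline_seconds, seconds) * 100.0


def percent_faster(baseline_seconds: float, seconds: float) -> float:
    """Performance gain over the baseline (baseline = 0%)."""
    return (speedup(baseline_seconds, seconds) - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up timings: a task that takes 300 s on the baseline and 100 s
    # on a faster configuration.
    print(percent_of_baseline(300.0, 100.0))  # 300.0
    print(percent_faster(300.0, 100.0))       # 200.0
```

The two numbers always differ by exactly 100 percentage points, so it is worth checking which one a chart is using before comparing results from different sources.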
Areas with limited improvement
While the majority of the tests clearly benefited from more cores, several of them saw little to no improvement. These tests are multi-threaded but limited by other factors (storage, RAM, or GPU speed, for example). If there is enough interest in this type of review, I’ll revisit these tests with a faster GPU/SSD/RAM.
Before diving into the more interesting tests, the following areas saw minimal to no gains with additional thread/core counts:
Package decompression: ±8%
Navmesh generation: ±11%
“Play button” delay: ±2%
All three of the above areas followed no trend related to core count, and individual results varied as much within each configuration as they did across the configurations. There were a few outliers for the above tests in the 2c/4t and 2c/2t configurations, likely because those configurations have less tolerance for OS interruptions. While we tried to keep the background load minimal and consistent, there will always be a bit of background noise. Keep this in mind if you plan to have 10, 25, or even 50 tabs open in Chrome looking for that one piece of information you need.
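One rough way to check whether a set of results "varies as much within each configuration as across them" is to compare the gap between configuration averages with the run-to-run range inside a configuration. The sketch below is an assumption about how one might do that, not the analysis used for this article, and the sample timings are invented placeholders rather than measured data.

```python
import statistics
from typing import Dict, List


def looks_like_noise(results: Dict[str, List[float]]) -> bool:
    """Return True when the gap between configuration means is no larger
    than the largest run-to-run range seen within any one configuration."""
    means = [statistics.mean(samples) for samples in results.values()]
    across = max(means) - min(means)
    within = max(max(samples) - min(samples) for samples in results.values())
    return across <= within


if __name__ == "__main__":
    # Invented placeholder timings in seconds, not results from this article.
    flat = {
        "2c/2t": [10.2, 9.6, 10.9],
        "4c/8t": [10.0, 10.6, 9.8],
        "6c/12t": [9.9, 10.4, 10.1],
    }
    scaling = {
        "2c/2t": [30.1, 29.8, 30.4],
        "4c/8t": [16.0, 15.7, 16.2],
        "6c/12t": [11.2, 11.0, 11.5],
    }
    print(looks_like_noise(flat))     # True
    print(looks_like_noise(scaling))  # False
```

This is a deliberately crude check. With only a handful of runs per configuration, a proper statistical test would not say much more, but the comparison is enough to separate "no trend" from "clear scaling".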
Package Importing – 77% improvement
Even if you aren’t a big fan of the Unity asset store, package importing is a useful test to estimate the time to import custom packages or assets. Add a new .fbx file to the project? Import. Switch from one build target to another? Re-import. Upgrading Unity? Import (probably). Want the newest post processing effects? Package import – from the asset store. You get the idea – this comes up all the time. If you’re like us, it’s probably one of the top reasons you see that “Hold on” progress bar.
Honestly, we were a bit surprised this saw a benefit from more cores/threads – we assumed the drive speed would be a significant limitation. Fortunately for us, it seems that’s not the case:
Overall pretty good results for a test you would expect to be primarily storage speed limited.
Lightmap & Reflection generation – 131% improvement
Lightmap building can take forever, and depending on the size and complexity of the scene, it can also be very RAM intensive. Keeping your object size fairly small (like those in the demo asset) helps keep RAM from becoming the bottleneck.
Lightmap generation in Unity could easily have its own benchmarking topic, as there are many, many options to choose from. We kept the default settings for these tests, and it would be interesting to see whether the results are consistent across the various settings. Overall the results were as expected, apart from 6c/6t vs 6c/12t, where the times overlapped significantly and the averages were nearly identical.
Occlusion baking – 115% improvement
The most interesting thing we discovered as part of our occlusion testing isn’t well represented in the charts below. Originally, the camera had geometry in frame during our 4c/8t and 2c/4t testing, as opposed to the default camera position with no geometry in view. This caused a large enough difference in baking time to push those results below their SMT-disabled counterparts.
Redoing the tests with the default camera position and no geometry in view returned the results graphed below. It might be a good idea to switch away from the scene/game view, or to face the camera away from the scene, before doing occlusion baking. This is really unexpected behavior, but I suspect drawing the occlusion culling visualizations can actually slow down the baking process.
Windows build – 170% improvement
We saved the best for last – build times! Unity offers a new cloud build service, but it isn’t without drawbacks. If you’re like us and still doing builds locally, you probably know how tedious they can be. Fortunately, the results are pretty fantastic, and speak for themselves:
Overall, we see a pretty consistent improvement when more cores or threads are added. While there is potential for a bottleneck due to storage and RAM speed again, we certainly don’t see it before 6c/12t on the 1600X.
There are certainly benefits to having a higher core count machine for working with Unity, but it seems like not all of the time-consuming tasks thread well. Additionally, while anecdotal, Unity seemed to run smoother overall with the higher core count, particularly for tasks like scene swapping and project loading.
Since Unity has real-time lighting generation, it’s good to see improvements with additional cores. Newer machines with higher core counts should be able to leave auto generation enabled in more scenarios. Lighting, navmesh, and occlusion all need to be recalculated when moving static game objects, and of these only navmesh baking didn’t see an improvement. Navmesh baking performance may simply be a lower priority, since Unity games often use A* or their own path-finding system.
If you’re buying a new machine to use Unity, hopefully this provides you a bit more information about whether you should spend a bit extra for those extra couple of cores. Generally AMD will give you more cores and threads at a price point, while Intel will give you more single-core performance.