Project update 5 of 15
In this update we explore the relative performance of the Talos™ Secure Workstation and one of the most powerful libre-friendly, blob-free x86 machines available.
The ASUS KGPE-D16 is likely the most powerful libre-friendly x86 machine available. When populated with two Opteron G34 Piledriver CPUs clocked at 3.2 GHz, it represents the performance ceiling for x86 workstation- and server-class machines that still respect your freedom. How does it compare against a low-end Talos™ configuration in real life? Let’s find out by measuring the wall time of something we’re all familiar with: a full Linux kernel compilation!
First, the detailed specifications of both test machines:
The benchmark is simple: compile a Linux kernel and all relevant modules. To avoid variance in compile times caused by differing target driver support or optimization passes, both machines built the kernel for a POWER target, which means a cross-compile on the x86 KGPE-D16. Both machines ran an up-to-date copy of Debian Stretch as the build environment. The entire Linux kernel source was loaded into a dedicated tmpfs ramdisk on each machine, and as much CPU and memory traffic as possible was quiesced before the benchmark started.
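The exact ramdisk commands are not given in this update, so the following is only a rough sketch of how such a setup might look. The mount point, source path, and 8G size are assumptions chosen for illustration, and the script defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical locations and size; none of these come from the benchmark itself.
RAMDISK=${RAMDISK:-/mnt/kbuild}
SRC=${SRC:-$HOME/linux}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a tmpfs mount and copy the kernel tree into it.
prepare_ramdisk() {
    run sudo mkdir -p "$RAMDISK"
    run sudo mount -t tmpfs -o size=8G tmpfs "$RAMDISK"
    run cp -a "$SRC" "$RAMDISK/linux"
}

prepare_ramdisk
```

Building from tmpfs keeps disk I/O out of the measurement, so the wall time mostly reflects CPU and memory performance.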
The commands used on each system, followed by the timing each one reported, were:
ASUS KGPE-D16:
time CROSS_COMPILE=/usr/bin/powerpc64le-linux-gnu- ARCH=powerpc make -j16
real 12m30.152s
user 146m10.076s
sys 12m34.760s
Talos™ Secure Workstation:
ppc64_cpu --smt=8
time make -j64
real 9m26.571s
user 296m42.477s
sys 28m41.248s
As you can see, even though the KGPE-D16 has double the core count, a higher overall TDP, comparable boost clocks to the POWER8, and is a much larger machine overall, the Talos™ Secure Workstation still manages to compile a kernel in about 3/4 of the time required by the KGPE-D16 (9m27s versus 12m30s of wall time)! This represents a very real saving in developer time and in electrical power consumed.
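As a quick check of the "3/4" figure, the two wall times reported above can be converted to seconds and divided. This is a minimal sketch; the helper names to_seconds and wall_ratio are made up for this example.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "12m30.152s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Divide the first duration (in seconds) by the second.
wall_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

kgpe=$(to_seconds 12m30.152s)   # KGPE-D16 "real" time from the run above
talos=$(to_seconds 9m26.571s)   # Talos "real" time from the run above

printf 'Talos / KGPE-D16 wall-time ratio: %s\n' "$(wall_ratio "$talos" "$kgpe")"
```

With the times above this gives a ratio of about 0.755, so the Talos™ build finished in roughly three-quarters of the KGPE-D16's wall time.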
Why is Talos™ so much faster than the KGPE-D16? Much of the increase is directly related to the more powerful cores of the newer POWER8 processor combined with the large caches and fast memory system of the POWER machines. To further explore the contribution of the memory subsystem to the performance differential, we ran STREAM memory bandwidth tests on both of these test machines:
ASUS KGPE-D16 (four of eight memory channels populated):
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           14129.6     0.049492     0.045295     0.055945
Scale:          15236.6     0.048226     0.042004     0.056949
Add:            15078.2     0.073871     0.063668     0.088305
Triad:          15466.7     0.071833     0.062069     0.088462
Talos™ Secure Workstation:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           36279.5     0.018876     0.017641     0.022749
Scale:          37932.3     0.017997     0.016872     0.021693
Add:            41974.5     0.024270     0.022871     0.026481
Triad:          43338.9     0.023190     0.022151     0.023880
A word of caution: these results exaggerate the memory bandwidth limitations of the Opteron CPUs, because only four of the eight possible memory channels were populated on the test system. However, even assuming the Opteron's bandwidth doubles with the remaining four channels populated (which is very unlikely), the Talos™ mainboard still has a significant edge over the Opteron system in this best-case streaming memory benchmark.
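The doubling argument above can be checked with a little arithmetic on the STREAM best rates. The sketch below divides the Talos™ rate by the Opteron rate, first as measured and then with the Opteron rate hypothetically doubled; the helper name talos_edge is invented for this example.

```shell
#!/bin/sh
# Talos advantage when the Opteron's measured rate is scaled by a factor.
# Arguments: opteron_mbps talos_mbps scale_factor
talos_edge() {
    awk -v o="$1" -v t="$2" -v f="$3" 'BEGIN { printf "%.2f\n", t / (o * f) }'
}

# Best rates in MB/s copied from the STREAM results above:
# function name, Opteron rate, Talos rate.
while read -r name opteron talos; do
    printf '%-6s measured %sx, Opteron doubled %sx\n' "$name" \
        "$(talos_edge "$opteron" "$talos" 1)" \
        "$(talos_edge "$opteron" "$talos" 2)"
done <<'EOF'
Copy   14129.6 36279.5
Scale  15236.6 37932.3
Add    15078.2 41974.5
Triad  15466.7 43338.9
EOF
```

For Triad, for example, the measured advantage is about 2.80x, and it would still be about 1.40x if the Opteron figure were doubled.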
From the test results shown, Talos™ is a significant upgrade from existing libre-friendly, owner-controlled machines. Talos™ will accelerate development and lower ongoing costs, all while preserving owner control.
Conspicuously absent from these tests are Intel processors released concurrently with or subsequent to POWER8. We specifically chose not to compare Talos™ against Xeon® systems because they are not libre-friendly, require network-enabled signed blobs to run continuously during system operation, and otherwise require compromising security and owner control if used as an upgrade from an existing blob-free system such as the KGPE-D16. You can run the same tests on Xeon® hardware and compare against the raw numbers above if desired. In general, Xeon® is on par with POWER8, but because of the aforementioned issues with owner control and freedom, Xeon® systems do not present a viable upgrade path for many use cases.