One key differentiator of Ten64 from general-purpose and media-oriented appliances is the networking-oriented acceleration capabilities of Ten64’s LS1088 System-on-Chip.
The previous 10G Options & Performance post described some of the options available to improve packet routing performance - all the way up to the programmable offload engine (AIOP).
There are two other workloads you can accelerate on Ten64. In this post, we will describe how Ten64 can accelerate cryptography (important for VPNs) and AI workloads using an AI acceleration card.
The LS1088 SoC provides two separate methods of cryptography acceleration: the ARMv8 Cryptography Extensions and the NXP SEC engine.
The ARMv8 Cryptography Extensions provide acceleration for AES, SHA-1, SHA-224 and SHA-256. They are analogous to AES-NI in most modern x86 processors. The extension is optional and is not present on all ARM-powered processors, but it is present on the LS1088. You can check if it is available on your ARM machine by looking at the flags in /proc/cpuinfo:
With the extension present, as on the Ten64's LS1088:
$ cat /proc/cpuinfo | grep Features | head -n 1
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
Without the extension:
$ cat /proc/cpuinfo | grep Features | head -n 1
Features : fp asimd evtstrm crc32 cpuid
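The same check can be scripted. The following is a minimal sketch, not an official tool: the function name crypto_flags_present and the choice of which flags to look for (aes, pmull, sha1, sha2, taken from the Features output above) are our own assumptions.

```python
# Flags that indicate the ARMv8 Cryptography Extensions, as seen in the
# Features output above. The selection of flags is an assumption.
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")


def crypto_flags_present(cpuinfo_text):
    """Return the crypto-related flags found in the first Features line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("Features"):
            flags = line.split(":", 1)[1].split()
            return [flag for flag in CRYPTO_FLAGS if flag in flags]
    return []  # no Features line at all (for example, on an x86 machine)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = crypto_flags_present(f.read())
    except OSError:
        found = []
    print("crypto flags:", " ".join(found) if found else "none")
```

On a machine with the extension this should list all four flags; on one without it, it prints "none".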
To illustrate the difference, we ran OpenSSL’s speed benchmark on the Ten64 and on the Raspberry Pi 3 and 4. The Raspberry Pi 3 also uses the Cortex-A53 core (like the LS1088), but does not have the ARMv8 crypto extension.
The newer Raspberry Pi 4 uses the Cortex-A72 - a faster, out-of-order core, but also lacks the cryptography extension.
In our results, the lack of AES acceleration is a major handicap: the LS1088 is 18-22x faster in this particular use case.
The ARMv8 Cryptography extension is used by OpenSSL and wolfSSL, and by the Linux kernel through the arm/sha*-ce modules, so most applications using these libraries should be able to take advantage of it.
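For a quick sanity check from a script, a rough throughput measurement can be sketched with Python's hashlib, which is usually (but not always) backed by OpenSSL. This is not the OpenSSL speed benchmark used for the comparison above, and it measures SHA-256 rather than AES because the Python standard library has no AES primitive. The function name and buffer sizes are arbitrary choices for illustration.

```python
import hashlib
import time


def sha256_throughput(total_mib=32, chunk_kib=16):
    """Hash total_mib MiB of zeros in chunk_kib KiB pieces; return MiB/s."""
    chunk = bytes(chunk_kib * 1024)
    rounds = (total_mib * 1024) // chunk_kib
    digest = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(rounds):
        digest.update(chunk)
    # Guard against a zero reading from a coarse timer.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_mib / elapsed


if __name__ == "__main__":
    print(f"SHA-256: {sha256_throughput():.1f} MiB/s")
```

Running the same script on two boards gives a rough idea of the relative difference, but treat the numbers as indicative only.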
The NXP SEC engine (also known as CAAM) is NXP’s encryption acceleration block. It is designed to accelerate communications workloads like IPSec, as well as some earlier versions of TLS and ciphers used in standards such as 3G/UMTS (Kasumi, Snow) and Wi-Fi. It also implements some older, but still relevant standards such as RSA and 3DES.
The SEC engine is best at accelerating packets to and from the network stack in the kernel (or similar environments such as DPDK). Its latency is higher because data packets need to be transferred in and out of it via DMA, whereas the ARMv8 crypto extensions are part of the CPU instruction set.
It is possible to use SEC from userspace, using mechanisms such as cryptodev, but you might end up with better performance using the CPU instructions.
Nonetheless, you can get some impressive performance from the SEC engine for IPSec workloads, because it can accelerate not only the encryption cipher but also a chain of related operations such as AEAD and HMAC, as can be seen in /proc/crypto when the SEC drivers are compiled into the kernel:
$ cat /proc/crypto | grep aes | grep caam
driver : cmac-aes-caam
driver : xcbc-aes-caam
driver : seqiv-authenc-hmac-sha512-rfc3686-ctr-aes-caam
driver : authenc-hmac-sha512-rfc3686-ctr-aes-caam
driver : seqiv-authenc-hmac-sha384-rfc3686-ctr-aes-caam
driver : authenc-hmac-sha384-rfc3686-ctr-aes-caam
driver : seqiv-authenc-hmac-sha256-rfc3686-ctr-aes-caam
driver : authenc-hmac-sha256-rfc3686-ctr-aes-caam
driver : seqiv-authenc-hmac-sha224-rfc3686-ctr-aes-caam
driver : authenc-hmac-sha224-rfc3686-ctr-aes-caam
driver : seqiv-authenc-hmac-sha1-rfc3686-ctr-aes-caam
driver : authenc-hmac-sha1-rfc3686-ctr-aes-caam
driver : seqiv-authenc-hmac-md5-rfc3686-ctr-aes-caam
driver : authenc-hmac-md5-rfc3686-ctr-aes-caam
driver : echainiv-authenc-hmac-sha512-cbc-aes-caam
driver : authenc-hmac-sha512-cbc-aes-caam
driver : echainiv-authenc-hmac-sha384-cbc-aes-caam
driver : authenc-hmac-sha384-cbc-aes-caam
driver : echainiv-authenc-hmac-sha256-cbc-aes-caam
driver : authenc-hmac-sha256-cbc-aes-caam
driver : echainiv-authenc-hmac-sha224-cbc-aes-caam
driver : authenc-hmac-sha224-cbc-aes-caam
driver : echainiv-authenc-hmac-sha1-cbc-aes-caam
driver : authenc-hmac-sha1-cbc-aes-caam
driver : echainiv-authenc-hmac-md5-cbc-aes-caam
driver : authenc-hmac-md5-cbc-aes-caam
driver : gcm-aes-caam
driver : rfc4543-gcm-aes-caam
driver : rfc4106-gcm-aes-caam
driver : ecb-aes-caam
driver : xts-aes-caam
driver : rfc3686-ctr-aes-caam
driver : ctr-aes-caam
driver : cbc-aes-caam
(For a full output from /proc/crypto, see the cryptographic acceleration page in the Ten64 manual.)
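If you want to inspect this list from a script rather than with grep, a minimal sketch follows. It assumes the usual /proc/crypto layout of "key : value" lines with a blank line between entries; the function names are our own.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto text into one dict per algorithm entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def caam_aes_drivers(text):
    """Driver names containing both 'aes' and 'caam', like the grep above."""
    return [
        entry["driver"]
        for entry in parse_proc_crypto(text)
        if "aes" in entry.get("driver", "") and "caam" in entry.get("driver", "")
    ]


if __name__ == "__main__":
    try:
        with open("/proc/crypto") as f:
            for name in caam_aes_drivers(f.read()):
                print(name)
    except OSError:
        print("/proc/crypto is not available on this system")
```

Because each entry is kept as a dict, the same parser can be extended to report other fields of an entry alongside the driver name.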
IPSec may not be the easiest VPN solution to use (especially compared with alternatives like OpenVPN and WireGuard), but this is balanced by its ubiquity (many operating systems and network appliances implement it) and its ability to leverage hardware offloads such as the SEC engine.
If you are interested in machine learning and AI, the Coral AI EdgeTPU cards work in the Ten64. The Coral PCIe cards are available in both Mini PCIe and M.2 form factors.
While we haven’t had an opportunity to piece together an AI/ML demo of our own, the TensorFlow Lite image classification example shows an impressive speedup:
Without the Edge TPU (CPU only):
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
140.4ms
138.9ms
139.1ms
139.3ms
139.3ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.77734
With the Edge TPU:
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.6ms
2.5ms
2.4ms
2.4ms
2.4ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.77734
That is a speedup of more than 50x, which opens up possibilities involving real-time processing, such as classifying objects from a video feed.
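The arithmetic behind that figure can be reproduced from the two runs above. This is a small sketch that assumes the slower run is the CPU-only one and the faster run uses the Edge TPU, and that skips the first inference of each run because it includes loading the model.

```python
from statistics import mean

# Inference times in milliseconds, copied from the two runs above.
cpu_ms = [140.4, 138.9, 139.1, 139.3, 139.3]
tpu_ms = [12.6, 2.5, 2.4, 2.4, 2.4]


def steady_state(times_ms):
    """Mean of all runs except the first, which includes model loading."""
    return mean(times_ms[1:])


speedup = steady_state(cpu_ms) / steady_state(tpu_ms)
inferences_per_second = 1000.0 / steady_state(tpu_ms)

print(f"speedup: {speedup:.1f}x")
print(f"Edge TPU throughput: {inferences_per_second:.0f} inferences/s")
```

With these numbers the steady-state speedup works out to roughly 57x, and about 2.4 ms per inference corresponds to roughly 400 inferences per second, which is why per-frame classification of a video feed becomes plausible.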
For information on how to set up a development environment for the Coral EdgeTPU, see our application note.