Perhaps you’ve heard of the HyperThreading bug found in Intel Kaby Lake and some Skylake processors. (This article is intended for Linux-based servers.)
No? Then you may wish to read this article from the Debian support list:
Intel Skylake/Kaby Lake processors: broken hyper-threading
It is about a 4 minute read, I’ll wait.
From that article you will see that if a Skylake or Kaby Lake processor is in use, there is a very good chance that HyperThreading needs to be disabled (in the BIOS/UEFI setup) until a patched version of the microcode is installed. Otherwise, inexplicable system crashes can occur.
If you read the follow-up posts as well, you will have seen that the ‘ht’ flag in /proc/cpuinfo indicates only whether a processor is HyperThread capable, not whether HyperThreading is enabled. The lscpu utility can be used to determine whether HT is actually enabled.
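As a minimal sketch of that distinction, the function below reads lscpu output on standard input and reports whether more than one thread per core is active. The name ht_status is made up for illustration, and the sketch assumes the English "Thread(s) per core:" label that lscpu from util-linux prints.

```shell
#!/bin/bash
# Hypothetical helper (not part of ht-bug-chk.sh): decide from lscpu output
# whether HyperThreading is actually enabled.
# ASSUMPTION: lscpu prints an English "Thread(s) per core:" line.
ht_status() {
    local threads
    # Take the value after the colon on the "Thread(s) per core" line.
    threads=$(awk -F: '/^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); print $2; exit }')
    case $threads in
        ''|0|*[!0-9]*) echo unknown ;;    # line missing or not a number
        1)             echo disabled ;;   # one thread per core: HT is off
        *)             echo enabled ;;    # two or more threads per core
    esac
}
```

Typical use would be `LC_ALL=C lscpu | ht_status` on the local machine, or the same pipeline fed from `ssh user@host lscpu` for a remote one.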
If you only have a handful of servers to check, then the methods found in the mailing list post will work fine.
What if, however, you have a few hundred servers to check?
Or perhaps many different clients or sites, each with a large number of servers?
The fully manual method then becomes too time-consuming and too error-prone.
Some form of at least semi-automated detection is then called for. As stated in the title, this does not require Sherlock Holmes level detection skills (or even Philip Marlowe, Sam Spade or Nick Charles) but a little bit of work is required.
My goal was to keep this as simple as possible: run a script against a list of servers, all from a single control server.
(Though you may suspect the ‘control’ keyword suggests an automation tool such as Ansible, that level of automation didn’t really seem necessary in this case.)
To get started, these scripts were developed: Intel HT Bug Detector
The name is a bit of a misnomer in that what is being detected is whether or not the CPU is one that can be affected by the HyperThread issue. The name however does make the intent clear.
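To make "a CPU that can be affected" concrete, here is a rough sketch of one way such a check could work. It is not necessarily how ht-bug-chk.sh does it. The function name cpu_candidate and its output strings are invented for illustration, and the model numbers (78 and 94 for Skylake, 142 and 158 for Kaby Lake, all in family 6) are my assumption of the models named in the Debian advisory; verify them against the advisory before relying on this.

```shell
#!/bin/bash
# Hypothetical sketch (not the logic of ht-bug-chk.sh): classify a CPU from
# /proc/cpuinfo text supplied on standard input.
# ASSUMPTION: family 6 with model 78/94 (Skylake) or 142/158 (Kaby Lake) is
# treated as a candidate; stepping and microcode revision are not examined.
cpu_candidate() {
    awk -F: '
        # Remember the first "cpu family" and "model" values seen.
        $1 ~ /^cpu family[ \t]*$/ && !got_family { family = $2 + 0; got_family = 1 }
        $1 ~ /^model[ \t]*$/      && !got_model  { model  = $2 + 0; got_model  = 1 }
        END {
            hit = (family == 6) && (model == 78 || model == 94 || model == 142 || model == 158)
            print (hit ? "possibly-affected" : "notaffected")
        }'
}
```

On a Linux host this would be called as `cpu_candidate < /proc/cpuinfo`. Because only family and model are compared, a "possibly-affected" result means the CPU deserves a closer look, not that it is known to be broken.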
The script is designed to be run via ssh, which allows every server to be checked from a single location. It also detects when it is being run against the local server.
ht-bug-chk.sh takes two optional arguments: hostname and username.
If no arguments are supplied the script checks the local server. If the supplied hostname matches the local hostname, the local server is checked.
If the supplied hostname does not match the current hostname, then ssh is used to run the commands remotely, either as the current username, or the username supplied on the command line.
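The local-versus-remote decision described above can be sketched roughly as follows. This is a dry-run illustration that only prints the command it would run; the function names is_local_host and build_check_cmd, and the grep command used as the payload, are hypothetical and not taken from ht-bug-chk.sh.

```shell
#!/bin/bash
# Hypothetical sketch of the local-versus-remote decision; the real
# ht-bug-chk.sh may be written differently.

# Succeeds when no host is given or the host names this machine.
is_local_host() {
    local target="$1" me
    me=$(uname -n)
    [ -z "$target" ] || [ "$target" = "$me" ] || [ "$target" = "${me%%.*}" ]
}

# Print the command line that would collect CPU details for a host.
# Arguments: [hostname] [username]; both optional.
build_check_cmd() {
    local host="$1" user="${2:-$(id -un)}"
    local payload='grep -m1 "model name" /proc/cpuinfo'
    if is_local_host "$host"; then
        printf '%s\n' "$payload"                              # run directly
    else
        printf "ssh %s@%s '%s'\n" "$user" "$host" "$payload"  # run through ssh
    fi
}
```

Comparing against both the full and the short form of `uname -n` is a design choice of this sketch, so that a bare hostname and a fully qualified one are both recognized as local.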
Following are some examples:
Checking the current host
    $ ./ht-bug-chk.sh
    Host to check: someserver
    CPU Info : model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
    CPU Model: notaffected
    the model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz is HyperThread capable
    however HyperThreading is not enabled on this CPU
Checking a host on the local network as current user
    $ ./ht-bug-chk.sh japp
    Host to check: japp
    CPU Info : model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
    CPU Model: notaffected
    CPU Architecture: notaffected
    This CPU is not affected by HyperThread bugs
Checking a remote host as a specific user
    $ ./ht-bug-chk.sh 137.62.169.72 someuser
    Host to check: 137.62.169.72
    CPU Info : model name : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
    CPU Model: notaffected
    the model name : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz is HyperThread capable
    however HyperThreading is not enabled on this CPU
Put it in a loop
While an automation tool such as Ansible could be used to run this script, that really seemed like overkill to me: once the servers have been taken care of, we are done with this script.
It is a simple matter to wrap this script with another bash script. The new script can read from a list of servers and output a report.
The file used to supply the list of server names and user names is server-list.txt.
The format of the file is ‘servername username’ separated by a space.
This is my file (edited to mask real info)
server1 myuser
server2 myuser
server3 myuser
server4
137.62.169.72 myuser
Notice that server4 does not have an associated username, and so the current username will be used.
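For illustration, the following sketch shows how a line with a missing username falls back to a default when the file is read. The function name list_targets is hypothetical, and skipping blank lines and '#' comment lines is a convenience added here, not something the file format above is stated to support.

```shell
#!/bin/bash
# Hypothetical sketch: read "servername [username]" lines from standard input
# and print the user@host target each line resolves to.
# ASSUMPTION: blank lines and lines starting with '#' should be ignored.
list_targets() {
    local default_user="$1" server username
    while read -r server username; do
        case $server in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        # An empty username falls back to the default passed as argument 1.
        printf '%s@%s\n' "${username:-$default_user}" "$server"
    done
}
```

It could be called as `list_targets "$(id -un)" < server-list.txt`, which would resolve the server4 line to the current user.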
Here is the wrapper script, ht-server-chk.sh (also found in the git repo):
    #!/bin/bash

    # check for HT bug vulnerable CPUs
    # read the list of servers from a file

    timestamp=$(date '+%Y-%m-%d_%H-%M-%S')
    logfile="ht-chk_${timestamp}.log"

    # each line of server-list.txt is "servername [username]"
    while read server username
    do
        echo Server : $server
        echo Username: $username
        # $username is left unquoted so that a missing username passes no argument
        ./ht-bug-chk.sh $server $username
        echo
        echo "########################################"
        echo
    done < <( cat server-list.txt ) | tee $logfile
This script will read server-list.txt, and then run ht-bug-chk.sh for each of the servers, using the designated username if provided. Output will be logged.
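One general pitfall with loops of this shape is worth a small demonstration: a command inside a while-read loop that reads standard input (ssh does so unless told otherwise) can swallow the rest of the list, so only the first entry is processed. The test run in this article shows every listed server being handled, so this does not appear to bite here; the sketch is a reminder in case the wrapper is adapted to call ssh directly. The helper greedy is a hypothetical stand-in for such a command, and no ssh is involved.

```shell
#!/bin/bash
# Sketch of a common pitfall: a command in a "while read" loop that reads
# standard input consumes the remaining lines of the list.
# "greedy" is a hypothetical stand-in for such a command (e.g. plain ssh).
greedy() { cat > /dev/null; }

# Count how many list entries the loop body gets to see.
count_unguarded() {
    local n=0 server username
    while read -r server username; do
        greedy                # swallows every line still waiting on stdin
        n=$((n + 1))
    done
    echo "$n"
}

count_guarded() {
    local n=0 server username
    while read -r server username; do
        greedy < /dev/null    # stdin detached, the same idea as "ssh -n"
        n=$((n + 1))
    done
    echo "$n"
}
```

Feeding three lines to each function shows the difference: the unguarded loop stops after one entry, while the guarded loop sees all three.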
Here is a test run of ht-server-chk.sh:
    $ ./ht-server-chk.sh
    Server : server1
    Username: myuser
    Host to check: server1
    CPU Info : model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
    CPU Model: notaffected
    CPU Architecture: notaffected
    This CPU is not affected by HyperThread bugs

    ########################################

    Server : server2
    Username: myuser
    Host to check: server2
    CPU Info : model name : Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
    CPU Model: notaffected
    the model name : Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz is HyperThread capable
    however HyperThreading is not enabled on this CPU

    ########################################

    Server : oradns02
    Username: myuser
    Host to check: oradns02
    bash: lscpu: command not found
    CPU Info : model name : Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
    CPU Model: notaffected
    CPU Architecture: notaffected
    This CPU is not affected by HyperThread bugs

    ########################################

    Server : poirot
    Username:
    Host to check: poirot
    CPU Info : model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
    CPU Model: notaffected
    the model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz is HyperThread capable
    however HyperThreading is not enabled on this CPU

    ########################################

    Server : 137.62.169.72
    Username: myuser
    Host to check: 137.62.169.72
    CPU Info : model name : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
    CPU Model: notaffected
    the model name : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz is HyperThread capable
    however HyperThreading is not enabled on this CPU

    ########################################

    $ ls -l *.log
    -rw-r--r-- 1 oracle dba 1499 Jul 13 12:04 ht-chk_2017-07-13_12-04-48.log
Yet To Do
There is more that could be done with this script. Some Skylake processors have microcode patches available to correct these problems. The availability of such patches is platform and vendor dependent.
Those Skylake processors for which the patches may be available (as of this writing) could be identified in the script and a notification printed.
It would also be possible to determine whether the microcode is up to date: /proc/cpuinfo reports the microcode revision currently loaded, which could be compared with the revision that contains the fix.
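A first step in that direction might look like the sketch below, which pulls the CPU signature and the loaded microcode revision out of /proc/cpuinfo text. The function name cpu_signature is hypothetical. The revision a given CPU needs is deliberately left out, since it depends on the processor and the vendor and would have to come from the advisory or the vendor's release notes.

```shell
#!/bin/bash
# Hypothetical sketch: summarize the CPU signature and the loaded microcode
# revision from /proc/cpuinfo text on standard input (first processor only).
cpu_signature() {
    awk -F: '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Keep only the first occurrence of each field.
        $1 ~ /^cpu family[ \t]*$/ && !("f" in seen) { seen["f"] = trim($2) }
        $1 ~ /^model[ \t]*$/      && !("m" in seen) { seen["m"] = trim($2) }
        $1 ~ /^stepping[ \t]*$/   && !("s" in seen) { seen["s"] = trim($2) }
        $1 ~ /^microcode[ \t]*$/  && !("u" in seen) { seen["u"] = trim($2) }
        END {
            printf "family=%s model=%s stepping=%s microcode=%s\n",
                   seen["f"], seen["m"], seen["s"], seen["u"]
        }'
}
```

Run as `cpu_signature < /proc/cpuinfo`, this yields one line per host that is easy to collect in the same log as the other checks and compare by eye or by script.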
At the very least, ht-bug-chk.sh will help you quickly determine if any of your servers are vulnerable to this issue, freeing up time for more interesting and productive work.