Channel: Jared Still – Official Pythian Blog

Intel Skylake and Kaby Lake CPU Detector – Sherlock Holmes need not apply


Perhaps you’ve heard of the HyperThreading bugs found in Intel Kaby Lake and some Skylake processors. (This article is intended for Linux-based servers.)

No?  Then you may wish to read this article from the Debian support list:

Intel Skylake/Kaby Lake processors: broken hyper-threading

It is about a 4-minute read; I’ll wait.

From that article you will see that if a Skylake or Kaby Lake processor is in use, there is a very good chance that HyperThreading must be disabled until a patched version of the microcode is installed.  If not, then inexplicable system crashes can occur.

If you read the follow-up posts as well, you will have seen that finding the ‘ht’ flag in /proc/cpuinfo indicates only if a processor is HyperThread capable, not if HyperThreading is enabled.  The lscpu utility can be used to determine if HT is actually enabled.
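The difference between "capable" and "enabled" can be checked from the "Thread(s) per core" line of lscpu output. The following is a minimal sketch only; ht_enabled is a hypothetical helper name and is not taken from the scripts discussed in this article.

```shell
# Sketch: decide whether HyperThreading is enabled from lscpu output.
# ht_enabled is a hypothetical helper, not part of ht-bug-chk.sh.
# It reads lscpu output on stdin and succeeds (exit 0) only when
# the "Thread(s) per core" value is greater than 1.
ht_enabled() {
    awk -F: '/^Thread\(s\) per core/ { n = $2 + 0 } END { exit !(n > 1) }'
}

# Usage on a live system:
#   if lscpu | ht_enabled; then echo "HT enabled"; else echo "HT not enabled"; fi
```

Reading from stdin rather than calling lscpu inside the function keeps the check usable on output captured from a remote host.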

If you only have a handful of servers to check, then the methods found in the mailing list post will work fine.

What if, however, you have a few hundred servers to check?

Or perhaps many different clients or sites, each with a large number of servers?

The fully manual method then becomes too time consuming and too error prone.

Some form of at least semi-automated detection is then called for.  As stated in the title, this does not require Sherlock Holmes level detection skills (or even Philip Marlowe, Sam Spade or Nick Charles) but a little bit of work is required.

My goal was to simplify this as much as possible by making it possible to run a script against a list of servers, doing so from a single control server.

(Though you may suspect the ‘control’ keyword suggests an automation tool such as Ansible, that level of automation didn’t really seem necessary in this case.)

To get started, these scripts were developed:  Intel HT Bug Detector

The name is a bit of a misnomer in that what is being detected is whether or not the CPU is one that can be affected by the HyperThread issue.  The name however does make the intent clear.
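To illustrate what "a CPU that can be affected" might mean in practice, here is a rough sketch that classifies a CPU from /proc/cpuinfo-style text. It is not the logic used by ht-bug-chk.sh. The function name and the "candidate" label are hypothetical, and the model numbers (family 6, models 78 and 94 for Skylake, 142 and 158 for Kaby Lake) are my reading of the Debian advisory; they are not exhaustive and should be verified against it before use.

```shell
# Sketch only: cpu_ht_bug_candidate is a hypothetical helper.
# Reads /proc/cpuinfo-style text on stdin and prints "candidate" or "notaffected".
# ASSUMPTION: the model list below is illustrative, not a complete list.
cpu_ht_bug_candidate() {
    awk -F: '
        {
            key = $1; gsub(/[ \t]+$/, "", key)   # trim trailing whitespace from the field name
            val = $2; gsub(/^[ \t]+/, "", val)   # trim leading whitespace from the value
        }
        key == "vendor_id"  { vendor = val }
        key == "cpu family" { family = val + 0 }
        key == "model"      { model = val + 0 }
        key == "flags"      { exit }             # the first logical CPU is enough
        END {
            hit = (vendor == "GenuineIntel" && family == 6 &&
                   (model == 78 || model == 94 || model == 142 || model == 158))
            print (hit ? "candidate" : "notaffected")
        }'
}

# Usage on a live system:
#   cpu_ht_bug_candidate < /proc/cpuinfo
```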

The script is designed to be run via ssh, which allows checking many servers from a single location. It also detects when the host to check is the local server, in which case ssh is not used.

ht-bug-chk.sh takes two optional arguments: hostname and username

If no arguments are supplied the script checks the local server. If the supplied hostname matches the local hostname, the local server is checked.

If the supplied hostname does not match the current hostname, then ssh is used to run the commands remotely, either as the current username, or the username supplied on the command line.
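The local-versus-remote decision described above could be sketched roughly as follows. This is an assumption about how such a check might be written, not the actual code in ht-bug-chk.sh; run_mode is a hypothetical name, and the short-hostname comparison is my own addition.

```shell
# Sketch only: run_mode is a hypothetical helper, not taken from ht-bug-chk.sh.
# Prints "local" when no host is given or the host matches this machine,
# otherwise prints the ssh target that would be used.
run_mode() (
    host="${1:-}"
    user="${2:-$(id -un)}"        # default to the current username
    this_host="$(uname -n)"       # this machine's hostname
    if [ -z "$host" ] || [ "$host" = "$this_host" ] || [ "$host" = "${this_host%%.*}" ]; then
        echo "local"
    else
        echo "ssh ${user}@${host}"
    fi
)
```

For example, run_mode with no arguments prints local, while run_mode server1 myuser prints ssh myuser@server1 unless this machine is itself named server1.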

Following are some examples:

 

Checking the current host

$ ./ht-bug-chk.sh
Host to check: someserver
CPU Info : model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
CPU Model: notaffected

the model name  : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz is HyperThread capable
however HyperThreading is not enabled on this CPU

Checking a host on the local network as current user

$ ./ht-bug-chk.sh  japp
Host to check: japp
CPU Info : model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
CPU Model: notaffected

CPU Architecture: notaffected


This CPU is not affected by HyperThread bugs

Checking a remote host as a specific user

 

$ ./ht-bug-chk.sh 137.62.169.72 someuser
Host to check: 137.62.169.72
CPU Info : model name : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
CPU Model: notaffected

the model name  : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz is HyperThread capable
however HyperThreading is not enabled on this CPU

 

Put it in a loop

While an automation tool such as Ansible could be used to run this script, that really seemed like overkill to me, as once the servers are cared for, we are done with this script.

It is a simple matter to wrap this script with another bash script. The new script can read from a list of servers and output a report.

The file used to supply the list of server names and user names is server-list.txt.

The format of the file is ‘servername username’ separated by a space.

This is my file (edited to mask real info)

server1 myuser
server2 myuser
server3 myuser
server4
137.62.169.72 myuser

Notice that server4 does not have an associated username, and so the current username will be used.

Here’s an example (also found in the git repo):

ht-server-chk.sh

 

#!/bin/bash

# check for HT bug vulnerable CPUs
# read the list of servers from a file

timestamp=$(date '+%Y-%m-%d_%H-%M-%S')
logfile="ht-chk_${timestamp}.log"

while read server username
do
   echo Server  : $server
   echo Username: $username
   ./ht-bug-chk.sh $server $username
   echo
   echo "########################################"
   echo

done < <( cat server-list.txt ) | tee $logfile

 

This script will read server-list.txt, and then run ht-bug-chk.sh for each of the servers, using the designated username if provided. Output will be logged.

Here is a test run of ht-server-chk.sh

 

$ ./ht-server-chk.sh

Server : server1
Username: myuser
Host to check: server1
CPU Info : model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
CPU Model: notaffected

CPU Architecture: notaffected


This CPU is not affected by HyperThread bugs


########################################

Server : server2
Username: myuser
Host to check: server2
CPU Info : model name : Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
CPU Model: notaffected

the model name  : Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz is HyperThread capable
however HyperThreading is not enabled on this CPU


########################################

Server : oradns02
Username: myuser
Host to check: oradns02
bash: lscpu: command not found
CPU Info : model name : Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
CPU Model: notaffected

CPU Architecture: notaffected


This CPU is not affected by HyperThread bugs


########################################

Server : poirot
Username:
Host to check: poirot
CPU Info : model name : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz
CPU Model: notaffected

the model name  : Intel(R) Core(TM) i7-4790S CPU @ 3.20GHz is HyperThread capable
however HyperThreading is not enabled on this CPU


########################################

Server : 137.62.169.72
Username: myuser
Host to check: 137.62.169.72
CPU Info : model name : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
CPU Model: notaffected

the model name  : Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz is HyperThread capable
however HyperThreading is not enabled on this CPU


########################################


$ ls -l *.log
-rw-r--r-- 1 oracle dba 1499 Jul 13 12:04 ht-chk_2017-07-13_12-04-48.log
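With a few hundred servers, the log itself becomes long enough that a summary helps. Here is a minimal sketch that lists only the hosts needing attention. It assumes the log format shown in the test run above, and it assumes that any "CPU Model:" or "CPU Architecture:" value other than notaffected marks a host worth a closer look; the article does not show what the script prints for an affected CPU. flagged_hosts is a hypothetical name.

```shell
# Sketch only: flagged_hosts is a hypothetical helper.
# Reads an ht-chk log on stdin and prints each host once if any of its
# "CPU Model:" or "CPU Architecture:" lines is something other than "notaffected".
# ASSUMPTION: the wording printed for an affected CPU is not known from the article.
flagged_hosts() {
    awk '
        /^Host to check:/ { host = $4 }
        /^CPU (Model|Architecture):/ && $NF != "notaffected" {
            if (!(host in seen)) { print host; seen[host] = 1 }
        }'
}

# Usage:
#   flagged_hosts < ht-chk_2017-07-13_12-04-48.log
```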


Yet To Do

There is more that could be done with this script. Some Skylake processors have microcode patches available to correct these problems.  The availability of such patches is platform and vendor dependent.

Those Skylake processors for which the patches may be available (as of this writing) could be identified in the script and a notification printed.

It would also be possible to determine if the microcode is up to date: /proc/cpuinfo reports the currently loaded microcode revision, which could be compared against the revision that contains the fix.
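As a starting point for that check, the loaded revision can be pulled from the microcode line of /proc/cpuinfo. This is only a sketch: microcode_rev is a hypothetical helper, and the revision to compare against is deliberately left out because it depends on the processor and vendor.

```shell
# Sketch only: microcode_rev is a hypothetical helper.
# Reads /proc/cpuinfo-style text on stdin and prints the microcode
# revision reported for the first logical CPU (for example 0xba).
microcode_rev() {
    awk -F: '$1 ~ /^microcode/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Usage on a live system:
#   microcode_rev < /proc/cpuinfo
```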

 

At the very least, ht-bug-chk.sh will help you quickly determine if any of your servers are vulnerable to this issue, freeing up time for more interesting and productive work.

