A couple of weeks back we heard the news of Spectre and Meltdown. Together, these are quite possibly the most significant flaws ever found in Intel architectures. Essentially, Spectre is a vulnerability in speculative execution that, should an exploit be created, could trick one application into leaking its data to another. A leak like that would certainly be a data breach, and catastrophic results could ensue. Meltdown is the CPU flaw that breaks the isolation between applications and the operating system. Together, we’re looking at significant flaws.
Intel has stated that the flaws affect almost every key processor on the market. Intel, AMD, and ARM processors are all known to be affected. Patches have been issued, though, at present, the performance overhead of patching could be as great as 30%.
It should be noted that application servers are not the only ones affected. Practically every server-based storage infrastructure is very likely subject to the same vulnerabilities. It is important to ensure that patching for such infrastructures is monitored and maintained. Proceed with caution! Your mileage may vary.
What does that mean for the administrator? If a well-provisioned, but not over-provisioned, environment is in place, and all patching is done, we’re potentially seeing data centers being forced to add as much as a third more servers to their architecture. Is this a reasonable expectation? Who eats the cost of this? Would Intel be liable? Is there even a possibility of a lawsuit there?
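Whether a 30% overhead really translates into a third more servers depends on what the 30% measures. As a rough sketch (the function names and the 100-server fleet are hypothetical, purely for illustration), the arithmetic for two possible readings of the figure looks like this in Python:

```python
def servers_for_throughput_loss(current_servers: int, loss_pct: int) -> int:
    """Servers needed if each patched server delivers loss_pct% LESS throughput.

    Each server now does (100 - loss_pct)% of its old work, so the fleet must
    grow by a factor of 100 / (100 - loss_pct), rounded up to a whole server.
    """
    if not 0 <= loss_pct < 100:
        raise ValueError("loss_pct must be in the range 0..99")
    # Integer ceiling division: -(-a // b) rounds a / b upward.
    return -(-current_servers * 100 // (100 - loss_pct))


def servers_for_cpu_overhead(current_servers: int, overhead_pct: int) -> int:
    """Servers needed if the same work simply costs overhead_pct% MORE CPU."""
    if overhead_pct < 0:
        raise ValueError("overhead_pct must not be negative")
    return -(-current_servers * (100 + overhead_pct) // 100)


# Hypothetical fleet of 100 fully utilized servers and the quoted 30% figure.
fleet = 100
print("30% throughput loss ->", servers_for_throughput_loss(fleet, 30), "servers")
print("30% extra CPU cost  ->", servers_for_cpu_overhead(fleet, 30), "servers")
```

Under the first reading the fleet has to grow by roughly 43%, under the second by 30%, so it is worth pinning down which one a vendor's number actually describes before budgeting for hardware.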
The Verge wrote an interesting piece on how Spectre was kept secret for as long as it was.
Here’s the paper describing Spectre and the speculative execution exploit: https://spectreattack.com/spectre.pdf
Many vendors, from Google and AWS to Microsoft and VMware, have written about these flaws, as it seems likely that the exploits could affect virtualized workloads more significantly than physical workloads. The key hardware vendors, particularly in the x86 space, have also written responses, among them HPE, Lenovo, Cisco UCS, Dell, and Nutanix.
There is much speculation as to their approaches. Microsoft has issued patches for its operating systems, as have Red Hat and VMware. Intel has issued a firmware patch for processors, which is said to carry the potential 30% hit to performance.
A couple of things have become clear.
- There was a large gap between when the flaws were discovered and when they were made public.
- The true effects of patching, or for that matter not patching, are unknown at this point.
ZDNet has offered advice in this article.
Perhaps the most significant response to the CPU flaw comes from Intel in this article.
What do I recommend to my customers? Well, for those with on-premises workloads, I’d absolutely recommend patching within a sandbox environment first, so that they can see the true performance impact, and then testing the heck out of it. LoadRunner would be a good tool for placing significant workloads on those machines. I’d want baseline performance numbers from before patching as a basis for comparison. If the measured overhead turns out to be meaningfully less than the estimated 30%, then it may be viable to patch at will. It’d be potentially premature to purchase another slew of servers to mitigate the overutilization that the overhead may represent.
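To make the before-and-after comparison concrete, here is a minimal Python sketch. It assumes you have recorded throughput samples (say, requests per second) from the same load test run before and after patching. The function name and the sample figures are made up for illustration and are not measurements from any real system.

```python
from statistics import mean


def throughput_loss_pct(baseline_samples, patched_samples):
    """Percentage drop in average throughput after patching.

    A positive result means the patched system is slower than the baseline.
    """
    baseline = mean(baseline_samples)
    patched = mean(patched_samples)
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline - patched) * 100 / baseline


# Made-up requests-per-second samples from a sandbox load test.
baseline_rps = [1000, 1020, 980]   # recorded before patching
patched_rps = [900, 910, 890]      # recorded after patching

loss = throughput_loss_pct(baseline_rps, patched_rps)
print(f"Measured throughput loss: {loss:.1f}%")
print("Below the 30% estimate" if loss < 30 else "At or above the 30% estimate")
```

Averages hide variance, so in practice it is worth running the test several times and comparing the spread as well as the mean.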
Disclosure: The following response comes from my company’s Managed Services Practice:
We have been updating our managed customers’ operating systems, web browsers and firmware (as available) in response to the massive holes that affect nearly all system flavors. We also have signatures for both types of flaws, and those have been pushed out. We’re awaiting the big Intel push by January 12th. We’ve been telling our non-MSP customers: update your operating system, check for firmware updates, update your browser, keep your antivirus active. We’ve been taking a lot of calls from customers asking which devices are safe. The issues, dubbed Meltdown and Spectre, exist in the CPU hardware itself. Windows, Linux, Android, macOS, iOS, Chromebooks and other operating systems all need to protect against these.
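As an aside to the response above: on Linux hosts, one way to see what the kernel itself reports about these flaws is to read the files under /sys/devices/system/cpu/vulnerabilities. The Python sketch below assumes a kernel recent enough (or patched enough) to expose that directory. On older kernels and on other operating systems it will not exist, and the helper name is my own.

```python
from pathlib import Path

# Directory exposed by patched Linux kernels; absent on older kernels
# and on non-Linux systems.
SYSFS_DIR = "/sys/devices/system/cpu/vulnerabilities"


def read_vulnerability_status(base=SYSFS_DIR):
    """Return {flaw_name: kernel_status_line}, or {} if the directory is missing."""
    base_path = Path(base)
    if not base_path.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(base_path.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    status = read_vulnerability_status()
    if not status:
        print("No vulnerability information exposed by this kernel.")
    for name, line in status.items():
        # Typical entries include 'meltdown', 'spectre_v1' and 'spectre_v2'.
        print(f"{name}: {line}")
```

A line starting with "Mitigation" means the kernel has a countermeasure active, while "Vulnerable" means it does not. Firmware and browser updates still have to be checked separately.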
As to cloud-related workloads, the picture becomes a bit muddier. In the case of SaaS infrastructures, more often than not you may need to do nothing, as the provider is responsible for putting ample processing power in place. Alternatively, if your own workloads require a given number of servers, you may actually need to provision more compute power to support them.
Again, the answer is muddy. Knowledge is power, so gather as much of it as you can about your server requirements.