Chip Industry Meltdown: An Insider’s Account Exposed

Researchers began writing about the potential for security weaknesses at the heart of central processing units, or CPUs, at least as early as 2005.

It was late November and former Intel Corp. engineer Thomas Prescher was enjoying beers and burgers with friends in Dresden, Germany, when the conversation turned, ominously, to semiconductors.

Months earlier, cybersecurity researcher Anders Fogh had posted a blog suggesting a possible way to hack into chips powering most of the world’s computers, and the friends spent part of the evening trying to make sense of it. The idea nagged at Prescher, so when he got home he fired up his desktop computer and set about putting the theory into practice. At 2am, a breakthrough: he’d strung together a code that reinforced Fogh’s idea and suggested there was something seriously wrong.

“My immediate reaction was, ‘It can’t be true, it can’t be true’,” Prescher said.

Last week, his worst fears were proved right when Intel, one of the world’s largest chipmakers, said all modern processors can be attacked by techniques dubbed Meltdown and Spectre, exposing crucial data, such as passwords and encryption keys. The biggest technology companies, including Microsoft Corp., Apple Inc., Google and Amazon.com Inc. are rushing out fixes for PCs, smartphones and the servers that power the internet, and some have warned that their solutions may dent performance in some cases.

Prescher was one of at least 10 researchers and engineers working around the globe—sometimes independently, sometimes together—who uncovered Meltdown and Spectre. Interviews with several of these experts reveal a chip industry that, while talking up efforts to secure computers, failed to spot that a common feature of their products had made machines so vulnerable.

“It makes you shudder,” said Paul Kocher, who helped find Spectre and started studying trade-offs between security and performance after leaving a full-time job at chip company Rambus Inc. last year. “The processor people were looking at performance and not looking at security.” Kocher still works as an adviser to Rambus.

All processor makers have tried to speed up the way chips crunch data and run programs by making them guess. Using speculative execution, the microprocessor fetches data it predicts it’s going to need next.

Spectre fools the processor into running speculative operations—ones it wouldn’t normally perform— and then uses information about how long the hardware takes to retrieve the data to infer the details of that information. Meltdown exposes data directly by undermining the way information in different applications is kept separate by what’s known as a kernel, the key software at the core of every computer.

Researchers began writing about the potential for security weaknesses at the heart of central processing units, or CPUs, at least as early as 2005. Yuval Yarom, at the University of Adelaide in Australia, credited with helping discover Spectre last week, penned some of this early work.

By 2013, other research papers showed that CPUs let unauthorized users see the layout of the kernel, a set of instructions that guide how computers perform key tasks like managing files and security and allocating resources. This vulnerability became known as a KASLR break and was the foundation for some of last week’s revelations.

In 2016, research by Felix Wilhelm and others demonstrated how an early version of speculative execution could make chips vulnerable to data leaks. Jann Horn, a young Google researcher credited with first reporting the Meltdown and Spectre weaknesses, was inspired by some of this work, according to a recent tweet.

At Black Hat USA, a major cybersecurity conference in Las Vegas, in August 2016 a team from Graz Technical University presented their research from earlier in the year on a way to prevent attacks against the kernel memory of Intel chips. One of the group, Daniel Gruss, shared a hotel room with Fogh, a malware researcher at G Data Advanced Analytics, an IT security consulting firm. Fogh had long been interested in “side-channel” attacks, ways to use the structure of chips to force computers to reveal data.

Fogh and Gruss stayed up late at night discussing the theoretical basis for what would later become Spectre and Meltdown. But, like Prescher more than a year later, the Graz team was skeptical this was a real flaw. Gruss recalls telling Fogh that the chipmakers would have uncovered such a glaring security hole during testing and would never have shipped chips with a vulnerability like that.

Fogh made the case again at Black Hat Europe, in early November 2016 in London, this time to Graz researcher Michael Schwarz. The two discussed how side-channel attacks might overcome the security of “virtualized” computing, where single servers are sliced up into what looks, to users, like multiple machines. This is a key part of increasingly popular cloud services. It’s supposed to be secure because each virtual computing session is designed to keep different customers’ information separate even when it’s on the same server.

Despite Fogh’s encouragement, the Graz researchers still didn’t think attacks would ever work in practice. “That would be such a major f*ck-up by Intel that it can’t be possible,” Schwarz recalled saying. So the team didn’t dedicate much time to it.

In January 2017, Fogh said he finally made the connection to speculative execution and how it could be used to attack the kernel. He mentioned his findings at an industry conference on 12 January, and in March he pitched the idea to the Graz team.

By the middle of the year, the Graz researchers had developed a software security patch they called KAISER that was designed to fix the KASLR break. It was made for Linux, the world’s most popular open-source operating system. Linux controls servers—making it important for corporate computing—and also supports the Android operating system used by the majority of mobile devices. Being open source, all suggested Linux updates must be shared publicly, and KAISER was well received by the developer community. The researchers did not know it then, but their patch would turn out to help prevent Meltdown attacks.

Fogh published his blog on 28 July detailing efforts to use a Meltdown-style attack to steal information from a real computer running real software. He failed, again fuelling doubts among other researchers that the vulnerabilities could really be used to steal data from chips. Fogh also mentioned unfinished work on what would become Spectre, calling it “Pandora’s Box.” That got little reaction, too.

The Graz team’s attitude quickly changed, though, as summer turned to fall. They noticed a spike in programming activity on their KAISER patch from researchers at Google, Amazon and Microsoft. These giants were pitching updates and trying to persuade the Linux community to accept them—without being open about their reasons sometimes.

“That made it a bit suspicious,” Schwarz said. Developers submitting specific Linux updates usually say why they’re proposing changes, “and on some of the things they didn’t explain. We wondered why these people were investing so much time and were working on it so hard to integrate it into Linux at any cost.”

To Schwarz and his fellow researchers, there was only one explanation: A potentially much bigger attack method that could blow open these vulnerabilities, and the tech giants were scrambling to fix it secretly before every malicious hacker on Earth found out.

Unbeknownst to the Graz team and Fogh, a 22-year-old wunderkind at Alphabet Inc.’s Google called Jann Horn had independently discovered Spectre and Meltdown in April. He’s part of Google’s Project Zero, a team of crack security researchers tasked with finding “zero-day” security holes—vulnerabilities that trigger attacks on the first day they become known.

On 1 June, Horn told Intel and other chip companies Advanced Micro Devices Inc. and ARM Holdings what he’d found. Intel informed Microsoft soon after. That’s when the big tech companies began working on fixes, including Graz’s KAISER patch, in private.

By November, Microsoft, Amazon, Google, ARM and Oracle Corp. were submitting so many of their own Linux updates to the community that more cybersecurity researchers began to realize something big— and strange—was happening.

Tests on the patches these tech giants were advocating showed serious implications for the performance of key computer systems. In one case, Amazon found that a patch increased the time it took to run certain operations by about 400%, and yet the cloud leader was still lobbying that every Linux user ought to take the fix, according to Gruss. He said this made no sense for their original KAISER patch, which would only ever impact a small sub-section of users.

Gruss and other researchers became more suspicious that these companies weren’t being completely honest about the rationale for their proposals. Intel said it is standard practice not to disclose vulnerabilities until a full remedy has been put in place. The chipmaker and other tech companies have also said their tests show minimal or no impact on performance, although certain unusual workloads may be slowed by as much as 30%.

In late November, another team of researchers at IT firm Cyberus Technology became convinced that Intel had been telling its main clients, such as Amazon and Microsoft, all about the issue, while keeping the full scale of the crisis hidden from Linux development groups.

Prescher, the former Intel engineer, was part of the Cyberus team. After his late-night discovery in Dresden, he told Cyberus chief technology officer Werner Haas what he’d found. Before their next in-person meeting, Haas made sure to wear a Stetson, so he could say to Prescher, “I take my hat off to you.”

On 3 December, a quiet Sunday afternoon, the Graz researchers ran similar tests, proving Meltdown attacks worked. “We said, ‘Oh God, that can’t be possible. We must have a mistake. There shouldn’t be this sort of mistake in processors,” recalled Schwarz.

The team told Intel the next day—around the same time Cyberus informed the chip giant. They heard nothing for more than a week. “We were amazed—there was no response,” Schwarz said.

On 13 December, Intel let Cyberus and the Graz team know that the problems they found had already been reported by Horn and others. The chipmaker was initially reluctant to let them contribute. But after being pressed, Intel put both groups in touch with the other researchers involved. They all began coordinating a broader response, including releasing updated patches at the same time.

Once inside the secret circle of the large tech companies, the Graz researchers expected they would have the typical 90 days to come up with comprehensive fixes before telling the world. “They said we know it, but will publish it at the beginning of January,” Schwarz said. It had been roughly 180 days since Google unearthed it, and keeping such issues under wraps for more than 90 days is unusual, he noted.

A group of 10 researchers coalesced and kept in touch via Skype every two days. “It was a lot of work on Christmas. There wasn’t a single day where we didn’t work. Holidays were cancelled,” Schwarz said.

Their public security updates soon attracted the attention of The Register, a U.K.-based technology news site, which wrote a story on 2 January saying Intel products were at risk.

Usually, flaws and their fixes are announced at the same time, so hackers don’t quickly abuse the vulnerabilities. This time, the details emerged early and patches weren’t ready. That led to a day and a night of frantic activity to arrange what all the companies would say in unison.

Intel put the statement out at 12pm Pacific Time on 3 January and held a conference call two hours later to explain what it said was a problem that could impact the whole industry.

The solidarity was a mirage, though. Rival AMD issued its own statement shortly before Intel’s call began, saying its products were at little or no risk of being exploited. After more than six months of coordinated work, Intel went into lock-down in the final hours and didn’t consult with its erstwhile partners to speed up a public statement, according to a person familiar with what happened.

Underlining the panic that spread following the announcement, Intel had to follow up with calming statements. The next day, the company said it had made “significant progress” in deploying updates, adding that by the end of this week 90% of processors made in the last five years will have been secured.

Steve Smith and Donald Parker, the two Intel executives questioned on the call, argued things progressed in the measured way that Intel approaches any report of a threat to its technology. The difference this time was that their work ended up “in the spotlight,” according to Smith. They would have preferred to complete the work in secret.

Indeed, Intel’s reticence rankled some outside researchers. The company operates on a need-to-know basis, said Cyberus’s Haas, who worked at Intel for about a decade. “I’m not a huge fan of that.”

“Our first priority has been to have a complete mitigation in place,” said Intel’s Parker. “We’ve delivered a solution.”

Some in the cybersecurity community aren’t so sure. Kocher, who helped discover Spectre, thinks this is just the beginning of the industry’s woes. Now that new ways to exploit chips have been exposed, there’ll be more variations and more flaws that will require more patches and mitigation.

“This is just like peeling the lid off the can of worms,” he said. #KhabarLive