Inside the AMD Microcode ROM [video]

atq2119 · on Dec 28, 2018

According to a question at the end, this is about very old CPUs, K8/K10, because the newer ones authenticate microcode updates with public key cryptography which hasn't been broken. Still pretty amazing stuff.

loeg · on Dec 28, 2018

Yeah, the description says "up to 2013." I think that's likely a bit more recent than K10 but I don't know.

TazeTSchnitzel · on Dec 28, 2018

That's just the tail end of K10 production (2012 according to Wikipedia). Its successor, Bulldozer, came out in 2011, but a new architecture being out doesn't mean its predecessor immediately stops production.

snovv_crash · on Dec 28, 2018

I wonder whether it would be possible to add aftermarket AVX-512 instruction handling? Not for performance necessarily, but for compatibility.

ip26 · on Dec 28, 2018

We have robust support for alternate code paths based on CPU flags after many years of this kind of thing. Is that really necessary or useful?

Custom microcode handling is a lot more brittle and chip-specific, yet nearly equivalent to overloading your call to some AVX-512 op in software.
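For anyone who hasn't seen the software side of this, here's a minimal sketch of flag-based dispatch in Python. It assumes a Linux-style `flags` line as found in /proc/cpuinfo. `parse_cpu_flags`, `pick_sum_impl`, `sum_scalar` and `sum_wide` are made-up names for illustration, and the "wide" path is only a stand-in for a real AVX-512 kernel in native code.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def sum_scalar(values):
    # Baseline path that works on any CPU.
    total = 0
    for v in values:
        total += v
    return total


def sum_wide(values):
    # Stand-in for a kernel that would use AVX-512 in native code.
    return sum(values)


def pick_sum_impl(flags):
    # Choose once, at startup, based on what the CPU advertises.
    return sum_wide if "avx512f" in flags else sum_scalar
```

On a real Linux box you would feed it `open("/proc/cpuinfo").read()`; the caller never needs to know which path was picked.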

raverbashing · on Dec 28, 2018

You're right, it isn't needed.

The "last time" this was done was in the x87 days, where if the math coprocessor wasn't installed you could trap the corresponding interrupt and handle it to emulate the instructions.

bpye · on Dec 28, 2018

This is done for MIPS FP emulation too - https://www.linux-mips.org/wiki/Floating_point#The_Linux_ker...

mschuster91 · on Dec 28, 2018

> The "last time" this was done was in the x87 days, where if the math coprocessor wasn't installed you could trap the corresponding interrupt and handle it to emulate the instructions.

Wasn't Hackintosh (or getting new OS X running on too old hardware) also using this technology to "support" CPUs without the newest SSEx instruction sets?

TazeTSchnitzel · on Dec 28, 2018

The tricky thing there would be those aren't just new instructions, they extend existing registers and add new ones. Where do you fit the extra bytes?

en4bz · on Dec 28, 2018

AMD already implements 256bit operations in terms of 2 128bit operations. Seems like going to 4 wouldn't be a stretch, at least for some of the simpler operations. Seems like a subset of operations would be possible.
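To make the "going to 4" idea concrete, here's a toy model in Python, not a description of any real microcode. It assumes 32-bit integer lanes with wraparound, and `add_128` and `add_wide` are hypothetical names: a 512-bit lane-wise add is just four independent 128-bit adds.

```python
MASK32 = 0xFFFFFFFF


def add_128(a, b):
    """Add two 128-bit values as four independent 32-bit lanes (wraparound)."""
    assert len(a) == len(b) == 4
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def add_wide(a, b, lanes_per_op=4):
    """Add two wider vectors by issuing one 128-bit add per 4-lane chunk."""
    assert len(a) == len(b) and len(a) % lanes_per_op == 0
    out = []
    for i in range(0, len(a), lanes_per_op):
        out.extend(add_128(a[i:i + lanes_per_op], b[i:i + lanes_per_op]))
    return out
```

With 16 lanes per operand, `add_wide` issues four `add_128` calls. This only works so neatly because the lanes are independent; operations that move data across lanes would not split this way, which is presumably why only a subset looks feasible.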

TazeTSchnitzel · on Dec 28, 2018

> AMD already implements 256bit operations in terms of 2 128bit operations.

Current Zen has actual 256-bit registers, it just doesn't have the execution units to process the whole register at once. It's not really the same thing.

bayindirh · on Dec 28, 2018

I think Zen2 implements 256 bit instructions natively [0]. For AVX512, the new instructions [1] rather than the floating point arithmetic will be a problem IMHO. Emulating them with the microcode will be expensive and will provide no performance gains.

[0]: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2

[1]: https://en.wikipedia.org/wiki/AVX-512

ah- · on Dec 28, 2018

Unlikely. IIRC you only had something on the order of 32 three-instruction slots of memory.

rzzzt · on Dec 28, 2018

I guess it would be easier to do that with virtualization. Advertise the capability to the VM, catch illegal instructions, emulate the missing pieces.
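Roughly, the trap-and-emulate loop would look like the sketch below. This is a toy interpreter in Python, not hypervisor code: the opcodes, the register dict, and the names `native_execute`, `emulate` and `run` are all invented for illustration.

```python
class IllegalInstruction(Exception):
    """Raised by the toy CPU when it meets an opcode it does not implement."""


def native_execute(op, regs, args):
    # The "hardware": it only knows ADD.
    if op == "ADD":
        dst, a, b = args
        regs[dst] = regs[a] + regs[b]
    else:
        raise IllegalInstruction(op)


def emulate(op, regs, args):
    # The trap handler: software versions of the missing opcodes.
    if op == "FMA":  # dst = a * b + c
        dst, a, b, c = args
        regs[dst] = regs[a] * regs[b] + regs[c]
    else:
        raise IllegalInstruction(op)  # genuinely unknown, give up


def run(program, regs):
    for op, args in program:
        try:
            native_execute(op, regs, args)
        except IllegalInstruction:
            # Trap, emulate, then resume at the next instruction.
            emulate(op, regs, args)
    return regs
```

The guest program never notices that FMA took the slow path, apart from the time it costs.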

cafxx · on Dec 29, 2018

I wonder if it would be possible to dump, from microcode, the contents of the microcode ROM. This would neatly sidestep the problems inherent in decoding the ROM contents from pictures of decapped chips.

choonway · on Dec 28, 2018

Is it possible to hack the microcode so that it can run ARM assembly natively?

monocasa · on Dec 28, 2018

Nope. The chip is very much designed around x86 decoding, even before you get to the ucode ROM/RAM. Additionally, you only have a handful of patch RAM locations.

bayindirh · on Dec 28, 2018

It would be hard, because the ISA is tightly bound to the underlying silicon's structure.

Some of the instructions can be translated to the silicon only inefficiently, or not at all.

e.g.: MIPS64 has 32 x 64-bit general-purpose registers. You can use any of them as a source or a destination; x86, however, always designates EAX as the ALU accumulator. This has some profound effects on silicon design.

gpderetta · on Dec 28, 2018

> x86 always designates EAX as the ALU accumulator

Actually no. After decoding there is nothing special in the EAX register.

AMD at some point was going to release K12, which was basically Zen but with an ARM decoder. It got cancelled when Zen proved viable and AMD decided it was better to compete with Intel than with all the ARM vendors.

bayindirh · on Dec 28, 2018

> Actually no. After decoding there is nothing special in the EAX register.

The microcode, or more specifically modern x86 processors, use register renaming to move things around, but the actual ASM instructions still imply that the result ends up in the EAX register. You cannot arbitrarily do a MUL and get the result from EBX, for example [3]; i.e. x86 assembly dictates where the results end up.
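As a worked example of that implicit destination, here's a small Python model of the one-operand 32-bit MUL, where the product always lands in EDX:EAX. The register dict and the `mul32` helper are made up for illustration; this is a sketch of the architectural behaviour, not of how the silicon does it.

```python
MASK32 = 0xFFFFFFFF


def mul32(regs, src):
    """Model of the one-operand 32-bit MUL: EDX:EAX = EAX * src (unsigned)."""
    product = (regs["eax"] & MASK32) * (regs[src] & MASK32)
    regs["eax"] = product & MASK32          # low half, always in EAX
    regs["edx"] = (product >> 32) & MASK32  # high half, always in EDX
    return regs
```

So `MUL EBX` with EAX = 0x80000000 and EBX = 4 leaves EAX = 0 and EDX = 2, and clobbers whatever was in EDX. The two- and three-operand IMUL forms do let you name the destination, so this is specific to the one-operand encoding.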

AMD played with two ideas: A pure ARM core, and a hybrid x86 core with ARM co-processor. The ARM core missed the performance targets [0], and they also abandoned the ARM accelerated x86 core [1], but I don't know why.

They never intended to go full TransMeta and transcode the x86 ASM into something proprietary or ARM.

Bonus: It seems they are still mulling the idea of an x86/ARM hybrid [2].

[0]: https://www.theregister.co.uk/2018/11/27/amazon_aws_graviton...

[1]: https://www.extremetech.com/computing/205078-amds-project-sk...

[2]: https://www.reddit.com/r/AMD_Stock/comments/8x4sba/the_retur...

[3]: https://c9x.me/x86/html/file_module_x86_id_210.html

floatboth · on Dec 28, 2018

https://softiron.com/development-tools/overdrive-1000/

They actually "released" (to one manufacturer it seems) the Opteron A1100. With stock Cortex-A57 cores, not "Zen with an ARM decoder".

pkaye · on Dec 28, 2018

The instruction decoder that breaks up the variable length instruction set into micro-ops is likely hard coded.

shmerl · on Dec 28, 2018

Why is AMD microcode not open source to begin with?

anonymouzz · on Dec 28, 2018

It's firmware. Very little firmware is. In an information-theoretic sense, it's much more surprising when some firmware is open source.

shmerl · on Dec 28, 2018

I'm asking why. Is there some reason for them not to open it? AMD are quite positive about opening up other things, like GPU drivers for example. So why not firmware as well?

In the GPU case I know the reason: it's the DRM garbage (HDCP and co.). Support for DRM requires them to keep it closed. But even there, they could provide alternative firmware without DRM and make it open. For the CPU, though, there seems to be no real reason.

slededit · on Dec 28, 2018

GPU vendors refused to open source their drivers and firmware long before HDCP was a thing.

shmerl · on Dec 30, 2018

Things have changed for drivers. Not for firmware though, and DRM is the reason.

jshap70 · on Dec 28, 2018

because there's a lot of proprietary stuff in microcode that's used for accelerations. gfx drivers too. it's the reason the closed amd drivers are so much faster than the open mesa ones.

shmerl · on Dec 28, 2018

> it's the reason the closed amd drivers are so much faster than the open mesa ones.

On the contrary, Mesa is faster than their blob. AMD themselves are working on replacing blob with Mesa in the long term.

Firmware doesn't offer any acceleration advantages, it's used for different purposes.

jshap70 · on Dec 28, 2018

yeah... I don't know what numbers you're looking at but that's not true in the general case. and this isn't firmware, it's microcode. firmware is already on the chip. microcode is used so the os can take advantage of chip specific features, like security patches or even acceleration.

atq2119 · on Dec 28, 2018

Do you have actual benchmarks which show the closed source OpenGL driver significantly faster than the open source one? In Phoronix benchmarks I've seen, the open source driver beats the closed source one by a large margin.

jshap70 · on Dec 28, 2018

https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-P...

shmerl · on Dec 28, 2018

That's years ago and is outdated. Today Mesa beats the blob point blank, thanks to AMD themselves working on optimizing radeonsi.

dralley · on Dec 28, 2018

A lot has changed in the last two years. Nowadays you have an occasional game that is faster on the blob driver, but most are faster under Mesa, often significantly so.

shmerl · on Dec 28, 2018

They clearly said in the presentation that microcode is a form of CPU firmware.

monocasa · on Dec 28, 2018

Mesa almost always uses proprietary firmware. The fail0verflow guys did some work last year to at least document it for the PS4's GPU to patch a bug. But the upstream Radeon Mesa guys are really hesitant to upstream it to avoid pissing off AMD. https://github.com/fail0verflow/radeon-tools/tree/master/f32

Of course that's all sorta orthogonal because that's all not really microcode or firmware in the classic definition, but just "code for an embedded processor I don't want to document."