CPUs are insanely inefficient.
Fast, yes. They run at billions of calculations per second. But they also carry around a bit of bloat.
Bloat is one of the biggest tiny issues affecting tech today.
A CPU (Intel’s and AMD’s come to mind) has an instruction set that is rather large. It takes energy to cart all of those instructions around, even when they aren’t used.
Setting aside the concept of word size* for a moment, we refer to CPUs by an addressable bit-length: 4-bit, 8-bit… 64-bit. But let this sink in: a 4-bit designation implies an instruction set of only 2^4 = 16 items, yet the opcodes were actually encoded in a full 8-bit word*. That gives a list of roughly 256 possible instructions from which to draw. From those foundational instructions, we’ve managed to design and accomplish a great deal.
* Yes, I know it still depends on word size and on linear and physical address space. This is meant to be a rant and is a gross generalization.
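To put quick numbers on that back-of-the-envelope arithmetic (just an illustration, nothing architecture-specific):

```c
#include <stdio.h>

int main(void) {
    /* An n-bit opcode field can encode 2^n distinct values. */
    for (int bits = 4; bits <= 16; bits *= 2) {
        unsigned long slots = 1UL << bits;   /* 2^bits */
        printf("%2d-bit field -> %lu possible encodings\n", bits, slots);
    }
    return 0;   /* prints 16, 256, 65536 */
}
```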
Consider the process of moving to a new memory location, reading a value, adding it to a value at a separate location, putting the result into yet another memory location… maybe 63 steps to carry out one particular task.
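Here’s a rough sketch of that shuffle in plain C; the comments stand in for the individual machine-level steps a simple CPU would walk through, and the function name is just mine for illustration:

```c
#include <stdint.h>

/* Each line of C below hides several machine-level steps:
 * compute an address, load, add, store. */
int32_t add_two_locations(const int32_t *a, const int32_t *b, int32_t *result) {
    int32_t x = *a;        /* move to one memory location and read a value    */
    int32_t y = *b;        /* move to a separate location and read another    */
    int32_t sum = x + y;   /* add the two values in registers                 */
    *result = sum;         /* put the result into yet another memory location */
    return sum;
}
```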
And we want to do more. Decades ago, we observed that the way we were writing instructions was, in fact, insanely inefficient. The same sequences were being repeated over and over, so we decided to expand the instruction set. Rather than have compute cycles consumed by the common work of interpreting and reinterpreting those sequences, we could simply grow the list of instructions the chip knows and use a hard-coded function instead.
It’s identical capability, but it’s now a single instruction built right onto the chip. Instead of needing 372 of our steps to accomplish a particular task, it may actually need just three. It’s substantially more efficient.
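As one concrete illustration (my example, not a specific vendor’s): copying a block of memory. The first function below spells out every little step; the second hands the whole job to memcpy, which on x86 chips a compiler or library may lower to a hardware-assisted form such as a single rep movsb sequence.

```c
#include <stddef.h>
#include <string.h>

/* The long way: every iteration is a handful of explicit little steps. */
void copy_bytes_by_hand(unsigned char *dst, const unsigned char *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];   /* load a byte, store a byte, bump the counter, compare, branch */
    }
}

/* The short way: let memcpy do it, which the toolchain may turn into
 * a hardware-assisted instruction sequence (e.g. rep movsb on x86). */
void copy_bytes_the_short_way(unsigned char *dst, const unsigned char *src, size_t n) {
    memcpy(dst, src, n);
}
```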
And with the current swath of CPUs available, it has ballooned to 64-bit architectures (which, in practice, implement 48-bit address spaces). Do the math and that is an addressable space in the hundreds of trillions, far beyond the few thousand instructions actually defined. Of course, the list isn’t full; it still has blank spots that literally do nothing at all. And there are so many hard-coded instructions available that it’s unlikely anyone knows exactly what’s on the list.
But that one chip will still do anything one can currently conceive of.
Can’t find an instruction that does what you want? Write code that leverages the other instructions to do it. You’re going to take a small performance hit, but it’s so small you won’t even notice it; the thing runs at three billion operations per second. How bad could it be?
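For instance (my pick of a missing instruction, purely for illustration): counting set bits. Many chips offer a dedicated population-count instruction; where it isn’t available, ordinary shifts, masks, and adds get you the same answer in a few more steps.

```c
#include <stdint.h>

/* Software fallback for a missing population-count instruction:
 * count the set bits in a 32-bit word using only shifts, masks, and adds. */
unsigned popcount32_software(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* totals per 2-bit pair  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* totals per nibble      */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* totals per byte        */
    return (x * 0x01010101u) >> 24;                   /* sum of the byte totals */
}
```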
Either way, there’s a price (before or after hard-coding the process). The chip still spends time carting all of those instructions around, and it pays in inefficiency and power consumption just keeping them at the ready… about 65 watts per chip package.
Unless you want to throw out those entirely unneeded architectural instructions, that is. Then you’re looking at designing a new chip with a reduced instruction set. Hmm, what if we call it a Reduced Instruction Set Computer?
Intel’s and AMD’s x86 architecture (where x86/x86-64 is, in the current age, a family name and doesn’t refer directly to the instruction-set size) has the entire possible set of addressable instructions onboard.
But if you shift to a RISC-based system whose instruction set includes most of what you want to accomplish, and which isn’t taking up resources to keep unneeded capabilities alive, it’ll be more efficient.
So, where AMD’s and Intel’s desktop-class packages consume about 65 watts per CPU package (server-class parts draw roughly twice that) to provide incredible capability, there are RISC packages that can outperform them while drawing only about 10 watts.
Take some RISC and move away from these monolithic architectures.