• 0 Posts
  • 414 Comments
Joined 1 year ago
Cake day: June 22nd, 2023

  • Prediction is a hard problem when coupled with caches. It’s relatively easy to say that no speculative instruction has any effect until it’s confirmed taken if you ignore caches. However, caches need to fetch information from memory to let an instruction evaluate, and rewinding a cache to its previous state on a mispredict is almost impossible (a Spectre-style gadget, sketched below, is the classic illustration). That matters all the more when you consider how little of its time a modern processor spends executing non-speculative code.

    Not having prediction consigns you to 1990s performance, just with faster clocks.
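
    A minimal sketch of why the cache state can’t be rewound, assuming a Spectre-v1-style bounds-check bypass (the array names and sizes here are purely illustrative):

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];
    uint8_t array2[256 * 512];

    uint8_t victim(size_t x, size_t array1_size) {
        if (x < array1_size) {              /* predictor may guess "taken" even for an out-of-bounds x */
            return array2[array1[x] * 512]; /* the speculative load pulls a cache line in, and that
                                               line stays resident after the mispredict is caught */
        }
        return 0;
    }

    The registers are rolled back when the misprediction is detected, but the line fetched into the cache is not, and its presence can later be observed through timing. That leftover state is exactly the part that is almost impossible to rewind.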






  • The AI numbers are pretty solid. Papers published on Hugging Face list training times and platforms, and those can be converted into CO2 (a rough sketch of that conversion is at the end of this comment). Those runs are at full load for weeks or months across arrays of GPUs.

    In this case, I don’t see why you’d need that kind of hardware for this application. You might be right that it’s not running at maximum load; if so, somebody has been mis-sold the hardware. Whatever it’s doing, though, it will be at a consistent load, because it’s always doing the same thing.
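
    A rough sketch of the usual conversion (every number below is an assumed placeholder, not a figure from any particular paper):

    #include <stdio.h>

    int main(void) {
        double gpus           = 512.0;   /* accelerators in the training cluster (assumed) */
        double hours          = 24 * 30; /* a month of wall-clock training (assumed)       */
        double watts_per_gpu  = 400.0;   /* sustained board power at full load (assumed)   */
        double pue            = 1.2;     /* datacentre power usage effectiveness (assumed) */
        double kg_co2_per_kwh = 0.4;     /* grid carbon intensity (assumed)                */

        double kwh = gpus * hours * (watts_per_gpu / 1000.0) * pue;
        double kg  = kwh * kg_co2_per_kwh;
        printf("%.0f kWh -> %.0f kg of CO2\n", kwh, kg);
        return 0;
    }

    The training time and platform reported in a paper slot straight into a calculation like this, which is why those published numbers are reasonably trustworthy.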






  • We do, depending on how you count it.

    There are two major widths in a processor, the data register width and the address bus width, but even that is not the whole story. If you go back to a processor like the 68000, the classic 16-bit processor, it has:

    • 32-bit data registers
    • 16-bit ALU
    • 16-bit data bus
    • 32-bit address registers
    • 24-bit address bus

    Some people called it a 16/32-bit processor, but really it was the 16-bit ALU that classified it as 16-bit.

    If you look at a Zen 4 core it has:

    • 64-bit data registers
    • 512-bit AVX data registers
    • 6 x 64-bit integer ALUs
    • 4 x 256-bit AVX ALUs
    • 2 x 128-bit data bus to DDR5 (dual edge 64-bit)
    • ~40 bits of addressable physical RAM

    So, what do you want to call this processor?

    64-bit (integer width), 128-bit (physical data bus width), 256-bit (widest ALU) or 512-bit (widest register width)? Do you want to multiply those numbers up by the number of ALUs in a core? …by the number of cores on a piece of silicon?

    Me, I’d say Zen 4 was a 256-bit core, but you could argue for any of the above numbers.

    Basically, it’s a measurement that lost all meaning, so people stopped using it; the sketch below shows how several of those widths coexist in the same code.
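
    For a concrete feel of the ambiguity, here’s a minimal sketch (my own illustration, not AMD’s terminology; build with AVX-512 enabled, e.g. -mavx512f) of 64-bit scalar and 512-bit vector arithmetic living in the same core:

    #include <stdint.h>
    #include <immintrin.h>

    /* One add on a 64-bit integer ALU. */
    uint64_t scalar_add(uint64_t a, uint64_t b) {
        return a + b;
    }

    /* Eight 64-bit adds packed into one 512-bit AVX-512 register; on Zen 4
       these are executed on the 256-bit AVX ALUs internally. */
    __m512i vector_add(__m512i a, __m512i b) {
        return _mm512_add_epi64(a, b);
    }

    Is that 64-bit code, 256-bit code or 512-bit code? All three answers are defensible, which is the point.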


  • Disagree. You quite often have a fair amount of scalar code in between the portions that are embarrassingly parallel. If you don’t have a decent scalar core, you are destined to become bottlenecked on it (see the sketch at the end of this comment). It’s not that different to a CPU/GPU pairing: if one is underpowered, it determines the speed of the overall system.

    If you look at what a company like Tenstorrent is doing, they are designing high-performance RISC-V cores as a side aspect of their main goal of building array processors. The reason is that they couldn’t find scalar cores on the market with enough performance not to bottleneck the system.
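
    A minimal Amdahl’s-law sketch of that bottleneck (the 95% parallel fraction is just an assumed figure for illustration):

    #include <stdio.h>

    int main(void) {
        double parallel_fraction = 0.95; /* assumed: 95% of the work runs on the parallel units */
        for (int units = 1; units <= 1024; units *= 4) {
            /* Amdahl's law: the serial (scalar) fraction never gets faster. */
            double speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / units);
            printf("%4d parallel units -> %5.1fx overall speedup\n", units, speedup);
        }
        return 0;
    }

    However wide the parallel part gets, the overall speedup stalls at about 20x here, so the scalar core ends up setting the ceiling for the whole system.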