The primary idea behind HT/SMT was that when one thread stalls, another thread on the same core can co-opt the rest of that core's idle time and run with it, transparently.
As far as I know, and in my experience as a developer in the field of high-throughput computation, SMT/HT has only one useful application; in all others it at best doesn't make things worse:
In virtualization, SMT/HT helps reduce the cost of (thread) context switching and thus greatly reduces latency when multiple VMs share the same cores.
But regarding throughput, I have never encountered a case in practice where SMT/HT did not make things slower. Theoretically, it would be neither slower nor faster if the OS scheduled processes optimally, but in practice the scheduler ends up placing two demanding processes on the same physical core because of SMT, which reduces throughput.
So on all machines that are used for high performance calculations we disable HT and SMT. In all our tests they slow down calculation by around 10-20%.
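Where switching SMT off in firmware is not an option, a softer variant of the same idea is to pin a compute process to one logical CPU per physical core, so two demanding threads never land on sibling hardware threads. The following is only a minimal sketch, assuming Linux and its sysfs topology files (`thread_siblings_list`); the helper names are made up for illustration, and it has not been benchmarked against the numbers above.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-1,8-9' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU from each group of SMT siblings."""
    return sorted({min(parse_cpu_list(s)) for s in sibling_lists})


def read_sibling_lists():
    """Read every CPU's sibling list from sysfs (Linux only, assumed layout)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


def pin_to_physical_cores():
    """Restrict the current process to one logical CPU per physical core."""
    cpus = one_cpu_per_core(read_sibling_lists())
    if cpus and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

Calling `pin_to_physical_cores()` at the start of a compute job would leave the sibling threads idle for that process only; other processes can still be scheduled on them, so this is weaker than disabling SMT outright.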
If somebody has a real-world example (throughput, not latency) where SMT/HT actually didn't slow things down, I would be very curious.
AMD has moved to full SMT now in the Zen microarchitecture.
Regardless of how well your code is written and running on the machine, there will be relatively long periods of CPU idle time where the CPU is just waiting on something to happen. Cache misses are only a subset of the problem; waiting for I/O, user input, etc. can all lead to lengthy stalls during which progress can still be made on the second set of registers. Also, there are several causes of cache misses that you can't plan for or around (an example is loading new instructions after a branch, since your executable probably doesn't all fit into the Level 3 cache).
One of the main reasons that Silvermont went away from HT is the fact that at 22 nm, you have a lot of die (relatively) to play with. As a result, you can get away with more physical cores for increased parallelism.
ARM and AMD have not implemented Hyper-Threading as such because it is Intel's proprietary implementation of, and trademark for, SMT.
Whether hyper-threading helps, and by how much, depends very much on what the threads are doing. It isn't just about doing work in one thread while the other thread waits on I/O or a cache miss - although that is a big part of the rationale. It is about efficiently using the CPU resources to increase total system throughput. Suppose you have two threads with different profiles: one doing integer work that frequently misses the cache, and one doing floating-point work over sequential data.
With hyper-threading these two threads can share the same CPU core: one is doing integer operations, taking cache misses and stalling, while the other is using the floating-point unit, with the data prefetcher running well ahead, anticipating the sequential data from memory. The system throughput is better than if the OS alternately scheduled both threads on the same CPU core.
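To make that argument concrete, here is a deliberately crude toy model, not a description of any real core. It assumes each thread is a string of per-cycle operations, where "I" needs the integer unit, "F" needs the floating-point unit, and "S" is a stall cycle (for example a cache miss) that uses no execution unit. It also assumes one unit of each kind per core and gives thread A priority on conflicts. All names are hypothetical.

```python
def run_time_sliced(trace_a, trace_b):
    """Cycles needed when only one thread owns the core at a time.

    Every cycle of each trace, including its stall cycles, is paid in full.
    Context-switch overhead is ignored in this toy model.
    """
    return len(trace_a) + len(trace_b)


def run_smt(trace_a, trace_b):
    """Cycles needed when both threads may issue in the same cycle.

    A thread advances in a cycle if its next op is a stall, or if the
    execution unit it needs has not already been taken this cycle.
    """
    traces = [trace_a, trace_b]
    pos = [0, 0]
    cycles = 0
    while pos[0] < len(traces[0]) or pos[1] < len(traces[1]):
        units_in_use = set()
        for t in (0, 1):  # thread A always gets first pick
            if pos[t] >= len(traces[t]):
                continue
            op = traces[t][pos[t]]
            if op == "S":
                pos[t] += 1  # the stall elapses without using a unit
            elif op not in units_in_use:
                units_in_use.add(op)
                pos[t] += 1
        cycles += 1
    return cycles


# Integer thread that stalls a lot, next to a streaming floating-point thread.
int_thread = "ISSSISSS"
fp_thread = "FFFFFFFF"
print(run_time_sliced(int_thread, fp_thread), run_smt(int_thread, fp_thread))

# Two threads fighting over the same unit gain nothing in this model.
print(run_time_sliced("IIII", "IIII"), run_smt("IIII", "IIII"))
```

In this model the complementary pair finishes in half the cycles, while the two integer-only threads take exactly as long as they would time-sliced. A real core shares caches, issue width and memory bandwidth as well, which the model ignores, so it cannot show the slowdowns reported for throughput-bound workloads.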
Intel chose not to include hyper-threading in Silvermont, but that doesn't mean it will do away with it in high-end Xeon server processors, or even in processors targeted at laptops. Choosing the micro-architecture for a processor involves trade-offs, and there are many considerations:
Silvermont's die size budget per core and power budget precluded having both out-of-order execution and hyperthreading, and out-of-order execution gives better single threaded performance. Here's Anandtech's assessment:
If I had to describe Intel’s design philosophy with Silvermont it would be sensible scaling. We’ve seen this from Apple with Swift, and from Qualcomm with the Krait 200 to Krait 300 transition. Remember the design rule put in place back with the original Atom: for every 2% increase in performance, the Atom architects could at most increase power by 1%. In other words, performance can go up, but performance per watt cannot go down. Silvermont maintains that design philosophy, and I think I have some idea of how.
Previous versions of Atom used Hyper Threading to get good utilization of execution resources. Hyper Threading had a power penalty associated with it, but the performance uplift was enough to justify it. At 22nm, Intel had enough die area (thanks to transistor scaling) to just add in more cores rather than rely on HT for better threaded performance so Hyper Threading was out. The power savings Intel got from getting rid of Hyper Threading were then allocated to making Silvermont an out-of-order design, which in turn helped drive up efficient use of the execution resources without HT. It turns out that at 22nm the die area Intel would’ve spent on enabling HT was roughly the same as Silvermont’s re-order buffer and OoO logic, so there wasn’t even an area penalty for the move.
After using the 8-core Atoms with virtualization, I salivate over the prospect of such a chip with HT. For most workloads I agree it may not matter, but with ESXi? You get truly impressive use of HT. The low power consumption just seals the deal on them for me. If you could get 16 logical cores on ESXi, the price/performance would be truly through the roof. I mean, there is no way to afford the current Intel chips with 8 cores and HT, and because vSphere and products for vSphere are licensed per processor, dual-processor hosts just don't make sense cost-wise anymore for true small businesses.