bee_rider a day ago

Oh dang, branch hints. I always thought they were so obvious, and never implemented, so they must be obviously bad. But, Intel is giving them a shot. Neat!

  • pclmulqdq a day ago

    Giving them another shot. Pentium 4 had them, but there were some skill issues on the part of programmers using them, and so code quality rose when CPUs started ignoring them.

    • rayiner 11 hours ago

      I don’t think it’s a skill issue. It’s that they’re really only useful with profile guided optimization—so you can have hints that reflect actual branch probabilities. But most developers don’t seem to bother to do that.

      • pcwalton 3 hours ago

        I wouldn't say that PGO is necessary to predict branches correctly. Lots of branches are exceptional control flow (error checks, bounds checks, etc.) and compilers will almost always correctly statically predict not taken for them. Any branch with a target that's postdominated by a throw or a process abort can basically be safely predicted not taken (after all, if you got it wrong, the penalty is going to be minuscule compared to the cost of the exception).
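
        A small sketch of the kind of branch being described here (the function and its names are illustrative, not from the thread): the error path is postdominated by abort(), so a compiler can statically treat the branch as not-taken and move it out of line.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: the if-body ends in abort(), so compilers
 * statically predict this branch not-taken and lay the error
 * path out of line, keeping the common path fall-through. */
double checked_div(double num, double den) {
    if (den == 0.0) {
        fprintf(stderr, "division by zero\n");
        abort();
    }
    return num / den;
}
```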

        • eigenform 2 hours ago

          The hardware itself is already very good at predicting the direction. I think the point about PGO here is that you need it to find infrequently-occurring biased-taken branches.

          When you encounter a biased-taken branch that isn't currently being tracked by the machine, you're always condemned to pay the cost of a misprediction because the default prediction is "not-taken". The hinting here is supposed to indicate "when encountering this branch, don't predict not-taken by default."

      • acdha 6 hours ago

        If memory serves, wasn't that a bit of a tool minefield back then, too? It’s been a while but I thought I remembered a few colleagues trying GCC’s version which involved building a new version of GCC, slowing down the builds a fair amount, and then seeing only a small benefit – far less than they got switching to the first AMD Opteron when it came out a year later.

    • TrainedMonkey 12 hours ago

      Arguably on P4 they were required to get any kind of throughput due to an incredibly long execution pipeline Intel contrived to keep pushing clock frequency up. Branch mispredictions were extremely costly on that architecture.

      • pclmulqdq 12 hours ago

        Yes, the Pentium 4 pipeline was ~32 cycles long, which meant that a branch misprediction carried a huge penalty. Processors today have settled on using a ~15-cycle pipeline.

        • magicalhippo 5 hours ago

          I've probably read about it and forgotten, but what on earth was it doing in all those pipeline stages?

          Were there a lot of idle/no-op stages for simple instructions? Or did most instructions, even simple ones, actually get some useful work done for each of those 32ish stages?

          • pclmulqdq 2 hours ago

            Here's a good source for the earliest Pentium 4, which had a 20-cycle pipeline: https://courses.cs.washington.edu/courses/cse378/10au/lectur...

            Mostly the stages you expect from an out-of-order core all had their own pipeline slot in order to go super fast, and some were split into two stages. For example, you have two stages to calculate the next instruction (branch prediction), two stages to fetch, a stage to redrive a value from the memory bus, etc. From there, the ALUs could only do a 16-bit add in one cycle, so even an addition took at least 3 cycles. The later Pentium 4s took this subdivision even further with the goal of reaching 4-5 GHz clocks in 2005.

            The whole chip was hand-laid-out, so it was possible to put pipeline registers in places that would have been weird to do in Verilog.

  • yjftsjthsd-h a day ago

    IIRC, SPARC also used them, which... probably suggests that it's not totally terrible.

    • nineteen999 13 hours ago

      Then again SPARC had register windows ... so ...

    • duskwuff a day ago

      I believe PowerPC had them as well. No idea how effective they were.

      • seanmcdirmid a day ago

        I took computer architecture in the mid-late-90s and branch hints were talked about as a thing that was being done.

        • bee_rider 10 hours ago

          Yeah, in my class (a couple years ago, but not decades) they were covered as something that seemed like a neat idea, but then didn’t help much for actual programs. The general theme in that class was that everything cool added more complexity than the performance benefit could justify, and got in the way of making things wider.

          Same for branch delays slots.

          • duskwuff 10 hours ago

            Branch delay slots were just a straight-up instance of exposing an ugly quirk of the pipeline to the programmer. They became a liability in later revisions of MIPS, as it no longer arose naturally from the architecture and had to be deliberately included.

      • KerrAvon 13 hours ago

        From distant memory: they were effective for at least some PowerPC implementations. I don't recall how much, but it was considered worthwhile in hot code for at least some machines. However, IIRC, branch prediction wasn't as advanced at that point; a modern implementation might not see the same benefits.

  • drmpeg 19 hours ago

    The i960CA and CF had branch hints. It was bit 1 in the branch and the compare and branch opcodes.

  • jeffbee 14 hours ago

    I can see how branch hints are useful, but in practice isn't the sign of the distance to the branch target the implicit hint? If the predictor doesn't have any other information it assumes that backwards branches are taken and forward branches are not. So, you can imagine how the compiler would rearrange the program to align with that implication, where possible.

    One of the things that the BOLT post-link optimizer does is rearrange basic blocks to reduce taken forward branches, based on the profiles.
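
    As a tiny illustration of that heuristic (my own sketch, not from the comment): a loop's exit test compiles to a backward conditional branch that is taken on every iteration but the last, which is exactly what the static "backward taken, forward not-taken" rule assumes.

```c
/* Sketch: the loop's back-edge is a backward branch, taken n-1
 * times out of n -- the "backward taken" static heuristic gets
 * it right on all but the final iteration. */
long sum(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```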

    • marcosdumay 12 hours ago

      You can't reorder some kinds of branches, like retry sequences.

      I have absolutely no idea how important that is. But that bit of extra complication does exist in the real world.

      • taeric 8 hours ago

        Retry sequences are unlikely to be the kind where branch prediction is that important? I'd also expect that they are larger than the threshold that is common for branches that are likely to be taken many times.

        That is, the stuff that will go into a retry of something likely has far more setup than your typical hot loop.

    • bee_rider 11 hours ago

      Huh, that’s a funny and true perspective. I always took the “branches backwards tend to be loops” assumption as true, but of course there’s no reason a compiler or linker couldn’t use that assumption as well.

  • IshKebab a day ago

    Do compilers even generate these hints?

    • duskwuff a day ago

      Usually not by default, but GCC and other compilers have intrinsics like __builtin_expect [1] which may generate branch hints.

      [1]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index...
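
      A typical usage sketch (the likely/unlikely macro names are a common convention, e.g. in the Linux kernel, not part of GCC itself):

```c
/* Conventional wrappers around the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sketch: mark the error path as unlikely so the compiler keeps
 * the common path as straight-line fall-through code. */
int parse_byte(int c) {
    if (unlikely(c < 0))
        return -1;
    return c & 0xff;
}
```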

      • menaerus 17 hours ago

        In practice this doesn't result in a special instruction that hints the CPU branch predictor; it only changes the codegen the compiler produces, optimizing it for better CPU instruction-cache utilization. E.g. it will try to move the less likely code out of the hot path, so the likely code ends up denser and co-located.

        • vlovich123 13 hours ago

          It can also change the instruction selection. For example, __builtin_unpredictable which is a close cousin of expect will indicate there's no prediction possible on a branch. This causes the compiler to select branchless instructions & is particularly useful in things like binary search where the CPU attempting to do prediction is worse than using a branchless version of the code.
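
          A sketch of that idea for a branchless lower bound (this particular variant is my own, not from the comment): the data-dependent comparison feeds a select instead of a conditional jump, so there is nothing for the predictor to guess wrong.

```c
#include <stddef.h>

/* Sketch of a branchless lower_bound: the comparison result is
 * multiplied into the pointer advance instead of controlling a
 * jump, which compilers typically lower to a cmov/select. With
 * Clang, wrapping the condition in __builtin_unpredictable()
 * nudges the compiler the same way. */
size_t lower_bound_branchless(const int *a, size_t n, int key) {
    if (n == 0)
        return 0;
    const int *base = a;
    while (n > 1) {
        size_t half = n / 2;
        base += (base[half - 1] < key) * half; /* select, not branch */
        n -= half;
    }
    return (size_t)(base - a) + (size_t)(*base < key);
}
```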

    • me_me_me 15 hours ago

      They were not necessary, as the CPU would run both sides of the branch in parallel and scrap the irrelevant side once it was known to be false. That essentially made a hint pointless, as it was not utilized at all.

      Now the execution of both sides of a branch is the basis for the 'Spectre' side-channel attack: access restricted data in the false branch, and the data access still happens even though it was restricted.

      That is essentially impossible to prevent, and disabling it killed a lot of CPU performance, so branch hints might be the next best thing.

      • LegionMammal978 14 hours ago

        > Now the execution of both sides of a branch is the basis for the 'Spectre' side-channel attack: access restricted data in the false branch, and the data access still happens even though it was restricted.

        By all accounts of Spectre I've seen, the processor never tries to speculate both sides of the branch. Instead, it always predicts one side of the branch, and speculatively executes that side only: when it later turns out to have been a misprediction, it rolls back the speculative execution, and proceeds to execute the other side from scratch. Both sides are executed at some point, but not simultaneously.

      • hinoki 15 hours ago

        If you’re not speculatively executing, you don’t need a branch hint because you don’t need to speculate which way it goes.

        Also, I don’t think executing both sides of a branch ever took off on any mainstream CPUs (unless Itanium counts). It wastes power to spend half your execution units on things that won’t be committed, why not use them on the other hyperthread instead?

        • bee_rider 14 hours ago

          I’ve always thought this would be an interesting use for a hyperthread (send it to execute both sides of an if, when the programmers knows the branch predictor is likely to not be able to know which side is right). But, never got around to coding anything like that up…

          • markhahn 12 hours ago

            the problem is that when you care (unpredictable but hot branches), there are probably too many of them to execute in parallel (use up all your speculation depth). 2^n, you know!

            not to mention that you burn a lot more power.

        • me_me_me 14 hours ago

          > Also, I don’t think executing both sides of a branch ever took off on any mainstream CPUs

          That part I am sure of. I will double-check with a friend of mine out of curiosity, but one thing to note is that the execution units process branches up to the point where the branch is evaluated, and then the false path is dropped.

          Back then speed trumped power draw. I am not sure what the priorities are today.

          In terms of hyperthreads, I don't think you can safely execute instructions of both siblings due to possible shared-cache clashes. But I am guessing now. It's been a while since I worked at that low a level, so I don't remember the details.

          • zerohp 12 hours ago

            I don't know of any CPU that speculates both sides of a branch. I work on a CPU design team.

            Modern CPUs speculate hundreds of instructions ahead, and with just a dozen branches you can have a few thousand different paths. It makes more sense to speculate down one path with very high accuracy.

            • Remnant44 5 hours ago

              This is the right answer. :)

              I think a lot of folks get mixed up with GPU and/or SIMD architecture, where you execute both sides of the branch out of necessity: Some of the lanes need to go one way and some the other, so you have to do both.

      • IshKebab 13 hours ago

        Yeah I'm pretty sure most CPUs don't actually execute both branches. They pick the most likely one and execute that.

        See https://people.computing.clemson.edu/~mark/eager.html

        • adrian_b 11 hours ago

          That is right.

          The only purpose of the branch predictor, which has become one of the biggest and most important parts of any modern CPU core, is to execute only one of the two branches, hoping that you have guessed the right one.

          On a CPU that executes both branches, there is no reason for a branch predictor to exist.

          The equivalent of executing both branches is obtained by replacing the conditional branches with conditional move or conditional select instructions, which are used after both values corresponding to the two alternatives have been computed. The use of conditional move/select instructions is justified only when the direction of the conditional branch would have been quasi-random, so the branch predictor would have failed to predict it.

          Executing both branches has an exponential cost in the number of conditional branches that are speculated ahead, so it is neither feasible nor desirable, as it would greatly increase both the power consumption and the die area for a given performance.
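
          A minimal sketch of the conditional-select pattern (my own example, not from the comment): both alternatives are cheap to compute, so the ternaries can become cmov/csel instructions and a quasi-random comparison costs no mispredictions.

```c
/* Sketch: clamping with selects. Compilers typically lower these
 * ternaries to cmov (x86) or csel (ARM) rather than branches. */
int clamp(int x, int lo, int hi) {
    int y = (x < lo) ? lo : x;
    return (y > hi) ? hi : y;
}
```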

zdw a day ago

"AI has been very popular among people wishing they had seven fingers on each hand instead of five"

  • hedora 15 hours ago

    Newer models don’t have this issue.

    Now, everyone gets five fingers and a thumb, as requested.

  • ForOldHack a day ago

    But most unfortunate for Numbers. We have 4 fingers, making binary capabilities, now if we had 60 fingers... then it would be a different story. With 5 fingers on each hand, and two hands, that is the product of two primes, and with 7 fingers on 3 hands, also the product of two primes... You have something to look forward to in Genetic Engineering 2077. Eight fingers on 4 hands? Personally, my fingers count only to 1023, and have for decades.

    The article points out specific use of specific techniques to probe cache and other efficiencies, which is valuable.

    • tbalsam 15 hours ago

      "....and 17 other bus stop conversations you'd never expect to have."

  • bravetraveler a day ago

    Think of how much more you could get done

    • Rinzler89 a day ago

      I call it "the stranger".

    • dylan604 11 hours ago

      I think another hand instead of extra fingers would be much more productive

      • bravetraveler 8 hours ago

        Totally, I'm making a somewhat 'disposable'/weak joke

        It's a fun play on the idea of a productivity enhancer. AI - the extra fingers or hand - does well to spare me effort, I just think productivity may not be the right way to look at things

  • anthk 17 hours ago

    Emacs users and Lisp? For sure, since the ITS days.

  • LeoPanthera a day ago

    People downvoting you probably don't realise this is a direct quote from the article. Maybe if you had written some actual comment, too...

    • zdw a day ago

      Sorry about no additional comment, I couldn't think of something clever that included an Inigo Montoya reference...

      • adityaathalyo 15 hours ago

        > Inigo Montoya reference

        Huh? Who?

        • kstrauser 14 hours ago

          Inconceivable.

          (But also, you must go watch “The Princess Bride” tonight to catch the million cultural references you’ve been missing.)

    • kelnos 19 hours ago

      It's the latter bit for me. Ok, so it's a quote from the article. So what? What's interesting about it? What's funny about it? Don't make me guess. Tell me why you thought it was worth your time copy-pasting it into the comment box. Foster some discussion!

      • ezst 18 hours ago

        FTR, that had me chuckle; it's quite funny considering the finger nightmares current-day generative "AI" comes up with

    • roenxi a day ago

      It was a no-content throwaway comment in the article too; highlighting it on HN is pointless. I doubt the downvoters would care much whether it was in the article or not.