bee_rider a day ago

Oh dang, branch hints. I always thought they were so obvious, and never implemented, so they must be obviously bad. But, Intel is giving them a shot. Neat!

  • pclmulqdq a day ago

    Giving them another shot. Pentium 4 had them, but there were some skill issues on the part of programmers using them, and so code quality rose when CPUs started ignoring them.

    • rayiner 11 hours ago

      I don’t think it’s a skill issue. It’s that they’re really only useful with profile guided optimization—so you can have hints that reflect actual branch probabilities. But most developers don’t seem to bother to do that.

      • pcwalton 3 hours ago

        I wouldn't say that PGO is necessary to predict branches correctly. Lots of branches are exceptional control flow (error checks, bounds checks, etc.) and compilers will almost always correctly statically predict not taken for them. Any branch with a target that's postdominated by a throw or a process abort can basically be safely predicted not taken (after all, if you got it wrong, the penalty is going to be minuscule compared to the cost of the exception).
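
        A small sketch of the kind of branch being described here (the function and its names are illustrative, not from the thread): the error path is postdominated by abort(), so a compiler can statically treat the branch as not-taken and move it out of line.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: the if-body ends in abort(), so compilers
 * statically predict this branch not-taken and lay the error
 * path out of line, keeping the common path fall-through. */
double checked_div(double num, double den) {
    if (den == 0.0) {
        fprintf(stderr, "division by zero\n");
        abort();
    }
    return num / den;
}
```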

        • eigenform 2 hours ago

          The hardware itself is already very good at predicting the direction. I think the point about PGO here is that you need it to find infrequently-occurring biased-taken branches.

          When you encounter a biased-taken branch that isn't currently being tracked by the machine, you're always condemned to pay the cost of a misprediction because the default prediction is "not-taken". The hinting here is supposed to indicate "when encountering this branch, don't predict not-taken by default."

      • acdha 6 hours ago

        If memory serves, wasn't that a bit of a tool minefield back then, too? It’s been a while but I thought I remembered a few colleagues trying GCC’s version which involved building a new version of GCC, slowing down the builds a fair amount, and then seeing only a small benefit – far less than they got switching to the first AMD Opteron when it came out a year later.

    • TrainedMonkey 12 hours ago

      Arguably on P4 they were required to get any kind of throughput due to an incredibly long execution pipeline Intel contrived to keep pushing clock frequency up. Branch mispredictions were extremely costly on that architecture.

      • pclmulqdq 12 hours ago

        Yes, the Pentium 4 pipeline was ~32 cycles long, which meant that a branch misprediction carried a huge penalty. Processors today have settled on using a ~15-cycle pipeline.

        • magicalhippo 5 hours ago

          I've probably read about it and forgotten, but what on earth was it doing in all those pipeline stages?

          Were there a lot of idle/no-op stages for simple instructions? Or did most instructions, even simple ones, actually get some useful work done for each of those 32ish stages?

          • pclmulqdq 2 hours ago

            Here's a good source for the earliest Pentium 4, which had a 20-cycle pipeline: https://courses.cs.washington.edu/courses/cse378/10au/lectur...

            Mostly the stages you expect from an out-of-order core all had their own pipeline slot in order to go super fast, and some were split into two stages. For example, you have two stages to calculate the next instruction (branch prediction), two stages to fetch, a stage to redrive a value from the memory bus, etc. From there, the ALUs could only do a 16-bit add in one cycle, so even an addition took at least 3 cycles. The later Pentium 4s took this subdivision even further with the goal of reaching 4-5 GHz clocks in 2005.

            The whole chip was hand-laid-out, so it was possible to put pipeline registers in places that would have been weird to do in Verilog.

  • yjftsjthsd-h a day ago

    IIRC, SPARC also used them, which... probably suggests that it's not totally terrible.

    • nineteen999 13 hours ago

      Then again SPARC had register windows ... so ...

    • duskwuff a day ago

      I believe PowerPC had them as well. No idea how effective they were.

      • seanmcdirmid a day ago

        I took computer architecture in the mid-late-90s and branch hints were talked about as a thing that was being done.

        • bee_rider 10 hours ago

          Yeah, in my class (a couple years ago, but not decades) they were covered as something that seemed like a neat idea, but then didn’t help much for actual programs. The general theme in that class was that everything cool added more complexity than the performance benefit could justify, and got in the way of making things wider.

          Same for branch delays slots.

          • duskwuff 10 hours ago

            Branch delay slots were just a straight-up instance of exposing an ugly quirk of the pipeline to the programmer. They became a liability in later revisions of MIPS, as it no longer arose naturally from the architecture and had to be deliberately included.

      • KerrAvon 13 hours ago

        From distant memory: they were effective for at least some PowerPC implementations. I don't recall how much, but it was considered worthwhile in hot code for at least some machines. However, IIRC, branch prediction wasn't as advanced at that point; a modern implementation might not see the same benefits.

  • drmpeg 19 hours ago

    The i960CA and CF had branch hints. It was bit 1 in the branch and the compare and branch opcodes.

  • jeffbee 14 hours ago

    I can see how branch hints are useful, but in practice isn't the sign of the distance to the branch target the implicit hint? If the predictor doesn't have any other information it assumes that backwards branches are taken and forward branches are not. So, you can imagine how the compiler would rearrange the program to align with that implication, where possible.

    One of the things that the BOLT post-link optimizer does is rearrange basic blocks to reduce taken forward branches, based on the profiles.
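
    As a tiny illustration of that heuristic (my own sketch, not from the comment): a loop's exit test compiles to a backward conditional branch that is taken on every iteration but the last, which is exactly what the static "backward taken, forward not-taken" rule assumes.

```c
/* Sketch: the loop's back-edge is a backward branch, taken n-1
 * times out of n -- the "backward taken" static heuristic gets
 * it right on all but the final iteration. */
long sum(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```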

    • marcosdumay 12 hours ago

      You can't reorder some kinds of branches, like retry sequences.

      I have absolutely no idea how important that is. But that bit of extra complication does exist in the real world.

      • taeric 8 hours ago

        Retry sequences are unlikely to be the kind where branch prediction is that important? I'd also expect that they are larger than the threshold that is common for branches that are likely to be taken many times.

        That is, the stuff that will go into a retry of something likely has far more setup than your typical hot loop.

    • bee_rider 11 hours ago

      Huh, that’s a funny and true perspective. I always took the “branches backwards tend to be loops” assumption as true, but of course there’s no reason a compiler or linker couldn’t use that assumption as well.

  • IshKebab a day ago

    Do compilers even generate these hints?

    • duskwuff a day ago

      Usually not by default, but GCC and other compilers have intrinsics like __builtin_expect [1] which may generate branch hints.

      [1]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index...
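
      A typical usage sketch (the likely/unlikely macro names are a common convention, e.g. in the Linux kernel, not part of GCC itself):

```c
/* Conventional wrappers around the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sketch: mark the error path as unlikely so the compiler keeps
 * the common path as straight-line fall-through code. */
int parse_byte(int c) {
    if (unlikely(c < 0))
        return -1;
    return c & 0xff;
}
```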

      • menaerus 17 hours ago

        In practice this doesn't result in a special instruction that hints the CPU branch predictor; it only changes the codegen the compiler produces, optimizing it for better CPU instruction-cache utilization. E.g. it will try to move the less likely code out of the hot path, so the likely code ends up denser and co-located.

        • vlovich123 13 hours ago

          It can also change the instruction selection. For example, __builtin_unpredictable which is a close cousin of expect will indicate there's no prediction possible on a branch. This causes the compiler to select branchless instructions & is particularly useful in things like binary search where the CPU attempting to do prediction is worse than using a branchless version of the code.
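
          A sketch of that idea for a branchless lower bound (this particular variant is my own, not from the comment): the data-dependent comparison feeds a select instead of a conditional jump, so there is nothing for the predictor to guess wrong.

```c
#include <stddef.h>

/* Sketch of a branchless lower_bound: the comparison result is
 * multiplied into the pointer advance instead of controlling a
 * jump, which compilers typically lower to a cmov/select. With
 * Clang, wrapping the condition in __builtin_unpredictable()
 * nudges the compiler the same way. */
size_t lower_bound_branchless(const int *a, size_t n, int key) {
    if (n == 0)
        return 0;
    const int *base = a;
    while (n > 1) {
        size_t half = n / 2;
        base += (base[half - 1] < key) * half; /* select, not branch */
        n -= half;
    }
    return (size_t)(base - a) + (size_t)(*base < key);
}
```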

    • me_me_me 15 hours ago

      They were not necessary, as the CPU would run both sides of the branch in parallel and scrap the irrelevant side once it was known to be false. That essentially made a hint pointless, as it was not utilized at all.

      Now the execution of both sides of a branch is the basis for the 'Spectre' side-channel attack: access restricted data in the false branch, and the data access still happens even though it was restricted.

      That is essentially impossible to prevent, and disabling it killed a lot of CPU performance, so branch hints might be the next best thing.

      • LegionMammal978 14 hours ago

        > Now the execution of both sides of a branch is the basis for the 'Spectre' side-channel attack: access restricted data in the false branch, and the data access still happens even though it was restricted.

        By all accounts of Spectre I've seen, the processor never tries to speculate both sides of the branch. Instead, it always predicts one side of the branch, and speculatively executes that side only: when it later turns out to have been a misprediction, it rolls back the speculative execution, and proceeds to execute the other side from scratch. Both sides are executed at some point, but not simultaneously.

      • hinoki 15 hours ago

        If you’re not speculatively executing, you don’t need a branch hint because you don’t need to speculate which way it goes.

        Also, I don’t think executing both sides of a branch ever took off on any mainstream CPUs (unless Itanium counts). It wastes power to spend half your execution units on things that won’t be committed, why not use them on the other hyperthread instead?

        • bee_rider 14 hours ago

          I’ve always thought this would be an interesting use for a hyperthread (send it to execute both sides of an if, when the programmers knows the branch predictor is likely to not be able to know which side is right). But, never got around to coding anything like that up…

          • markhahn 12 hours ago

            the problem is that when you care (unpredictable but hot branches), there are probably too many of them to execute in parallel (use up all your speculation depth). 2^n, you know!

            not to mention that you burn a lot more power.

        • me_me_me 14 hours ago

          > Also, I don’t think executing both sides of a branch ever took off on any mainstream CPUs

          That part I am sure of. I will double-check with a friend of mine out of curiosity, but one thing to note is that the execution units process branches up to the point where the branch is evaluated, and then the false path is dropped.

          Back then speed trumped power draw. I am not sure what the priorities are today.

          In terms of hyperthreads, I don't think you can safely execute instructions of both siblings due to possible shared-cache clashes. But I am guessing now. It's been a while since I worked at that low a level, so I don't remember the details.

          • zerohp 12 hours ago

            I don't know of any CPU that speculates both sides of a branch. I work on a CPU design team.

            Modern CPUs speculate hundreds of instructions ahead, and with just a dozen branches you can have a few thousand different paths. It makes more sense to speculate down one path with very high accuracy.

            • Remnant44 5 hours ago

              This is the right answer. :)

              I think a lot of folks get mixed up with GPU and/or SIMD architecture, where you execute both sides of the branch out of necessity: Some of the lanes need to go one way and some the other, so you have to do both.

      • IshKebab 13 hours ago

        Yeah I'm pretty sure most CPUs don't actually execute both branches. They pick the most likely one and execute that.

        See https://people.computing.clemson.edu/~mark/eager.html

        • adrian_b 11 hours ago

          That is right.

          The only purpose of the branch predictor, which has become one of the biggest and most important parts of any modern CPU core, is to execute only one of the two branches, hoping that you have guessed the right one.

          On a CPU that executes both branches, there is no reason for a branch predictor to exist.

          The equivalent of executing both branches is obtained by replacing the conditional branches with conditional move or conditional select instructions, which are used after both values corresponding to the two alternatives have been computed. The use of conditional move/select instructions is justified only when the direction of the conditional branch would have been quasi-random, so the branch predictor would have failed to predict it.

          Executing both branches has an exponential cost in the number of conditional branches that are speculated ahead, so it is neither feasible nor desirable, as it would greatly increase both the power consumption and the die area for a given performance.
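
          A minimal sketch of the conditional-select pattern (my own example, not from the comment): both alternatives are cheap to compute, so the ternaries can become cmov/csel instructions and a quasi-random comparison costs no mispredictions.

```c
/* Sketch: clamping with selects. Compilers typically lower these
 * ternaries to cmov (x86) or csel (ARM) rather than branches. */
int clamp(int x, int lo, int hi) {
    int y = (x < lo) ? lo : x;
    return (y > hi) ? hi : y;
}
```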

zdw a day ago

"AI has been very popular among people wishing they had seven fingers on each hand instead of five"

  • hedora 15 hours ago

    Newer models don’t have this issue.

    Now, everyone gets five fingers and a thumb, as requested.

  • ForOldHack a day ago

    But most unfortunate for Numbers. We have 4 fingers, making binary capabilities, now if we had 60 fingers... then it would be a different story. With 5 fingers on each hand, and two hands, that is the product of two primes, and with 7 fingers on 3 hands, also the product of two primes... You have something to look forward to in Genetic Engineering 2077. Eight fingers on 4 hands? Personally, my fingers count only to 1023, and have for decades.

    The article points out specific use of specific techniques to probe cache and other efficiencies, which is valuable.

    • tbalsam 15 hours ago

      "....and 17 other bus stop conversations you'd never expect to have."

  • bravetraveler a day ago

    Think of how much more you could get done

    • Rinzler89 a day ago

      I call it "the stranger".

    • dylan604 11 hours ago

      I think another hand instead of extra fingers would be much more productive

      • bravetraveler 8 hours ago

        Totally, I'm making a somewhat 'disposable'/weak joke

        It's a fun play on the idea of a productivity enhancer. AI - the extra fingers or hand - does well to spare me effort, I just think productivity may not be the right way to look at things

  • anthk 17 hours ago

    Emacs users and Lisp? For sure, since the ITS days.

  • LeoPanthera a day ago

    People downvoting you probably don't realise this is a direct quote from the article. Maybe if you had written some actual comment, too...

    • zdw a day ago

      Sorry about no additional comment, I couldn't think of something clever that included an Inigo Montoya reference...

      • adityaathalyo 15 hours ago

        > Inigo Montoya reference

        Huh? Who?

        • kstrauser 14 hours ago

          Inconceivable.

          (But also, you must go watch “The Princess Bride” tonight to catch the million cultural references you’ve been missing.)

    • kelnos 19 hours ago

      It's the latter bit for me. Ok, so it's a quote from the article. So what? What's interesting about it? What's funny about it? Don't make me guess. Tell me why you thought it was worth your time copy-pasting it into the comment box. Foster some discussion!

      • ezst 18 hours ago

        FTR, that had me chuckle; it's quite funny considering the finger nightmares current-day generative "AI" comes up with

    • roenxi a day ago

      It was a no-content throwaway comment in the article too; highlighting it on HN is pointless. I doubt the downvoters would care much whether it was in the article or not.