Age of Ascent - Developer Blog

A Journey of Optimization - The Saga of Ben's Magic Number

November 23rd, 2016 by Ben Adams

ASP.NET Community Standup: De Bruijn and Ben's Magic Number

A pull request from the Age of Ascent team was recently merged into the ASP.NET Core Kestrel web server.

Excerpts from this week’s ASP.NET Community Standup where they discuss the Saga of Ben’s Magic Number…

39:15

Scott Hanselman

I was at Connect(); you know, trying to do right by the team. Maria gave a great talk and I gave a talk, and we showed Visual Studio 2017.

We showed the perf stuff that was all the rage. The Hacker News coverage seemed to be fairly positive. People feel like we are actually serious. With each new release there is that question: “You are serious about this, aren’t you?”

Jon Galloway

You know there was a thing today on the front page of Hacker News? Talking about the whole Ben’s magic number thing; that pull request on Kestrel.

Damian Edwards

The one we just accepted. Months in the making that one; it was crazy.

Scott Hanselman

What was that? Can you provide some context there?

Damian Edwards

There is a point in Kestrel’s code where bytes are coming off the wire; it’s basically when we are parsing HTTP and we are looking at blocks of bytes at a time using the vectorization stuff that we have spoken about a couple of times. Once we run the vector operation and we get back a result, that result will tell us “yes, there was a byte that you are interested in in this chunk of bytes”; but it doesn’t tell us where, because it gives you a result in the same size that you passed to it.

So if you pass it a 64-bit block of bytes, or 8 bytes, you get back a result that’s 8 bytes. And you have to find: where is the byte that I care about? Because we are looking for a carriage return line feed, or something like that.
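As a rough sketch of the step Damian is describing (illustrative only, not Kestrel’s actual code; the method name and the choice of '\n' are just for the example), a vector compare over a block of bytes produces a same-sized mask of 0xFF/0x00 lanes, and the remaining problem is working out which lane matched:

```csharp
using System.Numerics;

static class VectorScanSketch
{
    // Hypothetical helper: does this block of bytes contain '\n', and if so where?
    public static int IndexOfNewline(byte[] buffer, int offset)
    {
        var newline = new Vector<byte>((byte)'\n');
        var block = new Vector<byte>(buffer, offset);

        // One vector operation: every lane equal to '\n' becomes 0xFF, the rest 0x00.
        Vector<byte> matches = Vector.Equals(block, newline);
        if (matches.Equals(Vector<byte>.Zero))
            return -1; // no byte of interest anywhere in this block

        // The mask says "there is a match in here" but not where it is.
        // Walking the lanes like this is the naive answer; the rest of the
        // discussion is about replacing this scan with a few bit operations.
        for (int i = 0; i < Vector<byte>.Count; i++)
        {
            if (matches[i] != 0)
                return offset + i;
        }
        return -1; // unreachable
    }
}
```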

And surprisingly, on plaintext that actually shows up on the profiles, because the vast majority of the time is parsing bytes as they come in off the wire and turning them into HTTP objects.

Previously we were using what’s called a quasi-tree search, which was basically a binary tree search through the resulting bytes that came back: we would check one half against a known mask, and if that didn’t work we’d check the other half, then we’d dive down a half every time till we found the region it was in. So in the worst case it was, I don’t know, whatever the big O notation is, O(log N) or something.
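Roughly, that earlier approach looks like the following sketch (again illustrative, not the code that was actually in Kestrel): treat the 8-byte comparison result as a 64-bit value and repeatedly test which half the match falls in, halving the search space each time:

```csharp
// Illustrative halving ("quasi-tree") search over an 8-byte match mask,
// where each matching byte is 0xFF and every other byte is 0x00.
// Worst case is a handful of branches: O(log n) in the number of bytes.
static int FirstMatchByHalving(ulong mask)
{
    if (mask == 0)
        return -1; // no match in this block

    int index = 0;
    for (int width = 32; width >= 8; width /= 2)
    {
        ulong lowHalf = mask & ((1UL << width) - 1);
        if (lowHalf != 0)
        {
            mask = lowHalf;        // earliest match is in the low half
        }
        else
        {
            mask >>= width;        // earliest match is in the high half
            index += width / 8;
        }
    }
    return index; // byte position 0..7 of the first matching byte
}
```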

So that was ok, but someone discovered there are other interesting bit things you can do for these types of patterns. Specifically the chess community, from my understanding, which has been building chess software - they have to deal with a lot of these types of problems. “I have to find the least significant blah blah in a matrix of blah”; and they have all these crazy tricks and mathematical algorithms that they use to do these things.

Some of these are only very recently known, like in the last decade or so. There is a very good page on Stanford’s website called Bit Twiddling Hacks and it has all these interesting algorithms for doing stuff.

One of them was called a De Bruijn sequence. It’s basically an algorithm you can run where, given an alphabet of so many characters and a payload size of things composed of that alphabet, you can sorta “do the math” and come up with a magic number; you can then use that with some crazy bit-shifting math to find, in just a couple of CPU operations, where the least significant or most significant bit exists in a given word made up of that alphabet.
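The best-known example of the trick is the 32-bit version on the Bit Twiddling Hacks page (the constant and table below are that published version, not the number from the Kestrel pull request): isolate the lowest set bit, multiply by a De Bruijn constant, and use the top five bits of the product to index a small lookup table:

```csharp
static class DeBruijnLookup
{
    // Lookup table and constant from Sean Eron Anderson's "Bit Twiddling Hacks".
    static readonly int[] BitPosition =
    {
         0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
        31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
    };

    // Index of the least significant set bit of a non-zero value:
    // no loop, no branch - just isolate, multiply, shift and a table load.
    public static int LowestSetBit(uint value)
    {
        uint isolated = value & unchecked(~value + 1);   // keep only the lowest set bit
        return BitPosition[unchecked(isolated * 0x077CB531u) >> 27];
    }
}
```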

So using that, someone has been, for the last three months, going through all these crazy iterations to find the fastest possible way of performing what seems a fairly trivial operation: scanning a bunch of bytes and finding the byte that we are interested in.

While taking into consideration things like: how the jitter works; how the vector libraries work; what generation of CPU you have; what op codes get generated based on what version of the jitter you have. Because the difference between running 17 op codes and 8 op codes is significant when you are doing this many times; but depending on what op codes they are and what generation of CPU you have, you have to take that into account as well.

At some point I just cease understanding what they are all talking about; but I read it anyway, and it’s really interesting.

This pull request has been active for months now, and it’s had Ben and three other people from the community who have jumped in.

And we’ve had someone from the JIT team on this PR as well, because they have made fixes in the JIT as a result of this type of investigation.

“Oh, we’ve found problems with the vectorization support; it’s suboptimal when you do these types of operations.”

And the JIT team are:

“Oh, we’ve fixed that in the next version of the JIT; can you rerun it with this version of the build?” and dah dah dah.

So anyway, end result: it got checked in today, I think. And on our big iron here it results in a 5% improvement for the plaintext benchmark!

Scott Hanselman

5 percent?

Damian Edwards

It went from 5 million to 5.25 million requests per second, or something like that. The real-world improvement on a big machine is a quarter of a million requests per second.

Scott Hanselman

It’s also worth pointing out the amount of patience required. Like, you pointed out something in a kinda throwaway statement: “Oh, the pull request was open for months”; that’s a big deal. You can’t just go off by yourself, do a huge pull request, send it off to you and the team, and expect it to be done in a week.

Additionally, you have to be prepared for discussion to occur, for other things downstream and possibly upstream of you, and to figure those things out. You can’t just decide to drop it, wait a month, get mad, and say, fine, “screw you all, I don’t want to be involved”.

But one of the things that is fun about Ben, and of course other people who have worked in the community, is their patience and their willingness to stick with it. And the benefit is immense.

Damian Edwards

Yeah, and it wasn’t just open and idle. It was open and in discussion. Then a little bit idle while we were shipping 1.1.0 because we just didn’t have the cycles to go and do the review.

Then Stephen and a couple of others from the Kestrel team did the work over the last couple of days to actually go through and review this line by line, and really understand it, because some of this code is really gnarly, and obviously we care about the code being understandable so that the next person that comes along can understand it.

Even in the review we found places where reordering two lines broke Kestrel. And we knew about that, but we hadn’t properly commented the places where that was important, because of lock synchronisation and rubbish like this. And Ben’s like “Oh, I didn’t realise that”; and that’s because, yeah, we forgot to write the comments there.

So we then did a sweep and made sure we commented all those places. We only knew that because we had broken it ourselves 6 months ago.

We had done the same thing: “Oh look, it’s faster, we’ll reorder these things”, and didn’t realise we’d broken it till we ran a bunch of other tests.

So at this level the code is, you know, pretty complicated; so it takes a bunch of eyes and a bunch of deep thinking to really appreciate whether this is the right change.

But as a result, we’ve gotten through this, the tests have passed, and we’ve got a good 5% improvement in plaintext.

And as a result of the investigation, Ben has said, “Oh, and I’ve noticed as part of this investigation there is a whole bunch of other low-hanging fruit of the same caliber, in lots of other places, to do with how we are scanning bytes”. He’s focused on this one, and now he is going to apply what he has learnt to other areas and see if we can chip away and get another couple of percent: 3% here or 5% there.

Plus the JIT team has improved the jitter; so if you are using a new CPU architecture we can improve the code even more, and make it easier to understand, because the JIT can do the hard work rather than us having to do the hard work in the code.

So it’s great stuff; it seriously is!

Jon Galloway

One other thing, just to echo what Scott said: it’s a conversation. I was talking to Ben at the MVP Summit. There is an interesting part of the code in this long pull request where he came up with this “Ben’s magic number”.

Damian Edwards

Yes 😁

Jon Galloway

Which is a long, and it’s, like, specially designed so that when you [bitwise] AND it with certain things it’s going to give you the… so it’s all one AND operation; you give it the string as an int and it will immediately give you the position.

Damian Edwards

It’s literally a number that, if you bitmask it with random input, is guaranteed to give you a correct result, as long as the input is a certain size; as in, it will give you the correct number that represents what place in the input the character you are looking for is.
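For example (purely illustrative; Kestrel’s actual constant, shift and layout differ), reusing the 32-bit DeBruijnLookup sketch from earlier, an 8-byte match mask can be turned straight into a byte position:

```csharp
// The vector compare put 0xFF in byte 5 of the block, 0x00 everywhere else.
ulong mask = 0x0000FF0000000000UL;

// Find the bit index of the lowest set bit (checking the low half first),
// then convert bits to bytes. One mask trick replaces the halving search.
int bitIndex = (uint)mask != 0
    ? DeBruijnLookup.LowestSetBit((uint)mask)
    : 32 + DeBruijnLookup.LowestSetBit((uint)(mask >> 32));

int byteIndex = bitIndex / 8;   // => 5, the position of the byte we care about
```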

Scott Hanselman

That’s insane. It sounds like the kinda thing that “old Microsoft” would have patented.

Jon Galloway

😂 So the funny thing with this is that Ben came up with this, and for a day or two he said he was… his wife was like, “You’ve never been this happy, you are so…”

And then somebody else was like, “Well, if you just bitshift it by two places, then the number is a lot simpler and it’s this and this”. And then he was all depressed, and his wife was like, “What’s wrong?”; and he was like, “He’s right, it is a better number than mine!”

Damian Edwards

Ben’s magic number prime! 😆

I think there was another point where he said he hit some wall and went to bed because he had a headache; then he woke up in the middle of the night, went “wait!”, tried it again, and figured the thing out. He’d done a whole bunch of research trying to find the magic number, as he didn’t want to have to compute it; he was hoping someone else had a similar scenario and had already generated this magic number. A lot of the work is in generating the number, and then he could just plug that number into our case and it would work.

But in the end he had to build his own; and then, like you said, it got derived and derived, and people simplified it, and we’ve ended up where we are today.

So anyway… that happened this week. It’s crazy crazy stuff…

You can watch it below or watch this one and all the ASP.NET Community Standups at live.asp.net.
