A Math-er of fact

*Filed under: I better write this down before I forget how I did it.

A co-worker at a previous workplace came to me with a problem they were working on. They needed a skewed pseudorandom number generator that, for a given range, would return numbers at the end of the range more often than numbers at the start.

We had no other requirements, nor a definition of what 'more often' should mean, so there was free reign to come up with the algorithm ourselves. 

We did a few attempts at hand crafting some pretty awful solutions, involving conditional statements based on the results of other random numbers. They were pretty poor solutions. Deep down I knew there was a more mathematical solution somewhere. 

It wasn't until I decided to draw what I needed as a graphical representation that I had an insight into a better solution. I needed a function f(x) that created an end result that was more likely to be a larger value than a smaller one. In graphical terms, for any input on the x-axis, my "randomised" (y) value would lie on the following graph:

required-graph.png

Here, for a uniformly spaced value of x the value of y is skewed towards the upper bounds, while still resulting in low values occasionally getting through if the value of x was sufficiently low. Now all I had to do was create an algorithm that could replicate this graph.

Reinventing the wheel

Did I say create? I mean find. There is absolutely no need to try to create such a curve when they probably already exist in nature. This graph is pretty simple, there's surely an easily calculated function that will approximate what I want.

Any time you're dealing with curves you should be thinking of two things, π and trigonometry. They are everywhere in nature and just because what you're looking for isn't an obvious application of trigonometric functions doesn't mean you should discount it.

For example, here's sin(x):

That's not bad, the curve starts off steeply, but it's a little too uniform for my tastes. Also, this curve is mirrored, it will descend and repeat periodically. We want a curve that will stay high no matter the range.

Without turning this blog into a treasure trove of trig-y curve-y goodness, I'll cut this short. The answer (as I require it) doesn't lie in trigonometric functions, it lies in inverse trigonometric functions. Specifically the inverse of tan(x), or arctan(x) as it is also known. In many C based libraries, it is usually called atan(x).

arctan(x) is bounded in the range (-π/2, π/2) for values of x in the range (-∞, ∞) so our solution will need to restrict itself to non-negative values of x. For values of x in (0, ) the value of arctan(x) is bounded in the range (0, π/2).

Dividing by π/2 will give us a normalised value of y between (0, 1) which we can now use to generate our skewed random numbers. Here are two different random number generators, one is for your standard uniform distribution, the other a skewed distribution which should favour higher numbers.

Plugging this into a playground, like so

results in the following output

uniform distribution [0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9, 9, 9]
skewed distribution[0, 0, 5, 5, 5, 5, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

Good enough for me.

This wasn't the final algorithm we settled on, we found that the higher values were appearing way more often than we liked, but with a little tweaking we softened the curve a little to give the lower values more of a fighting chance. But the overall idea is what you see here.

Shoutout to my parents for spending a fortune on my Mathematics degree, just so I could one day do this one tiny thing.