Inverse Transform Sampling in Python

6 minute read

When doing data work, we often need to sample random variables. This is easy to do if one wishes to sample from a Gaussian, or a uniform random variable, or a variety of other common distributions, but what if we want to sample from an arbitrary distribution? There is no obvious way to do this within scipy.stats. So, I build a small library, inverse-transform-sample, that allows for sampling from arbitrary user provided distributions. In use, it looks like this:

import numpy as np
pdf = lambda x: np.exp(-x**2/2) # unit Gaussian, not normalized
from itsample import sample
samples = sample(pdf,1000) # generate 1000 samples from pdf	

The code is available on GitHub. In this post, I’ll outline the theory of inverse transform sampling, discuss computational details, and outline some of the challenges faced in implementation.

Introduction to Inverse Transform Sampling

Suppose we have a probability density function \(p(x)\), which has an associated cumulative density function (CDF) \(F(x)\), defined as usual by

Recall that the cumulative density function \(F(x)\) tells us the probability that a random sample from \(p\) is less than or equal to x.

Let’s take a second to notice something here. If we knew, for some x, that \(F(x)=t\), then drawing \(x\) from \(p\) is in some way equivalent to drawing \(t\) from a uniform random variable on \([0,1]\), since the CDF for a uniform random variable is \(F_u(t) = t\).1

That realization is the basis for inverse transform sampling. The procedure is:

  1. Draw a sample \(t\) uniformly from the inverval \([0,1]\).
  2. Solve the equation \(F(x)=t\) for \(x\) (invert the CDF).
  3. Return the resulting \(x\) as the sample from \(p\).

Computational Considerations

Most of the computational work done in the above algorithm comes in at step 2, in which the CDF is inverted.2 Consider Newton’s method, a typical routine for finding numerical solutions to equations: the approach is iterative, and so the function to be inverted, in our case the CDF \(F(x)\), is evaluated many times. Now, in our case, since \(F\) is a (numerically computed) integral of \(p\), this means that we will have to run our numerical quadrature routine once for each evaluation of \(F\). Since we need many evaluations of \(F\) for a single sample, this can lead to a significant slowdown in sampling.

Again, the pain point here is that our CDF \(F(x)\) is slow to evaluate, because each evaluation requires numerical quadrature. What we need is an approximation of the CDF that is fast to evaluate, as well as accurate.

Chebyshev Approximation of the CDF

I snooped around on the internet a bit, and found this feature request for scipy, which is related to this same issue. Although it never got off the ground, I found an interesting link to a 2013 paper by Olver & Townsend, in which they suggest using Chebyshev polynomials to approximate the PDF. The advantage of this approach is that the integral of a series of Chebyshev polynomials is known analytically - that is, if we know the Chebyshev expansion of the PDF, we automatically know the Chebyshev expansion of the CDF as well. This should allow us to rapidly invert the (Chebyshev approximation of the) CDF, and thus sample from the distribution efficiently.

Other Approaches

There are also less mathematically sophisticated approaches that immediately present themselves. One might consider solving \(F(x)=t\) on a grid of \(t\) values, and then building the function \(F^{-1}(x)\) by interpolation. One could even simply transform the provided PDF into a histogram, and then use the functionality built in to scipy.stats for sampling from a provided histogram (more on that later). However, due to time constraints, inverse-transform-sample only includes the numerical quadrature and Chebyshev approaches.

Implementation in Python

The implementation of this approach is not horribly sophisticated, but in exchange it exhibits that wonderful readability characteristic of Python code. The complexity is the highest in the methods implementing the Chebyshev-based approach; those without a background in numerical analysis may wonder, for example, why the function is evaluted on that particularly strange set of nodes.

In the quadrature-based approach, both the numerical quadrature and root-finding are both done via scipy library (scipy.integrate.quad and scipy.optimize.root, respectively). When using this approach, one can set the boundaries of the PDF to be infinite, as scipy.integrate.quad supports improper integrals. In the notebook of examples, we show that the samples generated by this approach do, at least in the eyeball norm, conform to the provided PDF. As we expected, this approach is slow - it takes about 7 seconds to generate 5,000 samples from a unit normal.

As with the quadrature and root-finding, pre-rolled functional from scipy was used to both compute and evaluate the Chebyshev approximants. When approximating a PDF using Chebyshev polynomials, finite bounds must be provided. A user-determined tolerance determines the order of the Chebyshev approximation; however, rather than computing a true error, we simply use the size of the last few coefficients of the Chebyshev coefficients as an approximation. Since this approach differs from the previousl only in the way that the CDF is constructed, we use the same function sample for both approaches; an option chebyshev=True will generate a Chebyshev approximant of the CDF, rather than using numerical quadrature.

I hoped that the Chebyshev approach would improve on this by an order of magnitude or two; however, my hopes were thwarted. The implementation of the Chebyshev approach is faster by perhaps a factor of 2 or 3, but does not offer the kind of improvement I had hoped for. What happened? In testing, a single evaluation of the Chebyshev CDF was not much faster than a single evaluation of the quadrature CDF. The advantage of the Chebyshev CDF comes when one wishes to evaluate a long, vectorized set of inputs; in this case, the Chebyshev CDF is orders of magnitude faster than quadrature. But scipy.optimize.root does not appear to take advantage of vectorization, which makes sense - in simple iteration schemes, the value at which the next iteration occurs depends on the outcome of the current iteration, so there is not a simple way to vectorize the algorithm.

Conclusion

I suspect that the reason this feature is absent from large-scale library like scipy and numpy is that it is difficult to build a sampler that is both fast and accurate over a large enough class of PDFs. My approach sacrifices speed; other approximation schemes may be very fast, but may not provide the accuracy guarantees needed by some users.

What we’re left with is a library that is useful for generating small numbers (less than 100,000) of samples. It’s worth noting that in the work of Olver & Townsend, they seem to be able to use the Chebyshev approach to sample orders of magnitude faster than my impelmentation, but sadly their Matlab code is nowhere to be found in the Matlab library chebfun, which is the location advertised in their work. Presumably they implemented their own root-finder, or Chebyshev approximation scheme, or both. There’s a lot of space for improvement here, but I simply ran out of time and energy on this one; if you feel inspired, fork the repo and submit a pull request!

  1. This is only true for \(t\in [0,1]\). For \(t<0\), \(F_u(t)=0\), and for \(t>1\), \(F_u(t)=1\). 

  2. The inverse of the CDF is often called the percentile point function, or PPF. 

Updated: