In this post, let’s talk about a variance reduction method called “control variates”, yet another technique besides importance sampling [1].
We start from a simple example: we want to estimate $\mathbb{E}_{x \sim p(x)}[f(x)]$, the expectation of a function $f$ applied to a random variable $x$ sampled from $p(x)$. Analytically, $\mathbb{E}_{x \sim p(x)}[f(x)] = \int f(x)\,p(x)\,dx$. You can also think of $\mathbb{E}_{x \sim p(x)}[f(x)]$ as $\mathbb{E}[y]$, the expectation of another random variable $y = f(x)$ without any function exerted on it. Since I like the form $\mathbb{E}_{x \sim p(x)}[f(x)]$ better, I’ll keep using this format throughout the post. In our setting, we can only evaluate $f(x)$ for any given $x$ but do not know the form of $f$ in advance. Using Monte Carlo naively, we can draw $N$ numbers $x_1, \dots, x_N$ from $p(x)$ and take the average of their associated values of $f$:
$$\hat{f}_{mc} = \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$
We can use a control variate to reduce the variance of our estimate of $\mathbb{E}_{x \sim p(x)}[f(x)]$. In the control variate method, we introduce another function $h(x)$, for which we can evaluate $h(x)$ for any given $x$ and even know its expectation $\mathbb{E}_{x \sim p(x)}[h(x)]$ (note: we don’t know that for $f$).
The control variate method says that the following estimator, which can be seen as another random variable, is an unbiased estimate of $\mathbb{E}_{x \sim p(x)}[f(x)]$:
$$\hat{f}_{cv} = f(x) - h(x) + \mathbb{E}_{x \sim p(x)}[h(x)], \quad x \sim p(x)$$
It is easy to see that $\hat{f}_{cv}$ is an unbiased estimator:
$$\mathbb{E}[\hat{f}_{cv}] = \mathbb{E}\big[f(x)\big] - \mathbb{E}\big[h(x)\big] + \mathbb{E}\big[h(x)\big] = \mathbb{E}\big[f(x)\big]$$
Now let’s look at the variance of $\hat{f}_{cv}$ (the added constant $\mathbb{E}[h(x)]$ does not affect variance):
$$Var(\hat{f}_{cv}) = Var\big(f(x) - h(x)\big) = Var\big(f(x)\big) + Var\big(h(x)\big) - 2\,Cov\big(f(x), h(x)\big)$$
Thus, as long as we introduce $h(x)$ such that $Var\big(h(x)\big) < 2\,Cov\big(f(x), h(x)\big)$, we will reduce the variance of the estimate $\hat{f}_{cv}$.
A more advanced method is called regression sampling [2]. We can have a tunable parameter $c$ in $\hat{f}_{cv}$:
$$\hat{f}_{cv}(c) = f(x) - c\,\big(h(x) - \mathbb{E}_{x \sim p(x)}[h(x)]\big)$$
In the control variate example above, $c = 1$. However, when $c$ is tunable, the variance of $\hat{f}_{cv}(c)$ becomes:
$$Var\big(\hat{f}_{cv}(c)\big) = Var\big(f(x)\big) + c^2\,Var\big(h(x)\big) - 2c\,Cov\big(f(x), h(x)\big),$$
which is a quadratic formula with regard to $c$. Therefore, the optimal coefficient is:
$$c^* = \frac{Cov\big(f(x), h(x)\big)}{Var\big(h(x)\big)}$$
With $c = c^*$, the optimal variance of $\hat{f}_{cv}(c)$ can be achieved at:
$$Var\big(\hat{f}_{cv}(c^*)\big) = Var\big(f(x)\big) - \frac{Cov\big(f(x), h(x)\big)^2}{Var\big(h(x)\big)} = \big(1 - \rho_{f,h}^2\big)\,Var\big(f(x)\big),$$
where $\rho_{f,h}$ is the correlation coefficient between $f(x)$ and $h(x)$.
This concludes that the more correlated $f(x)$ and $h(x)$ are with each other, the more variance we can potentially reduce.
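To make this concrete, here is a minimal sketch (not from the original post) of how one could pick the coefficient when $Cov\big(f(x), h(x)\big)$ and $Var\big(h(x)\big)$ are not known analytically: estimate them from an independent pilot sample, then plug the estimated $c^*$ into $\hat{f}_{cv}(c)$. The function names and pilot size below are illustrative assumptions, and the example reuses the uniform $[0, 10]$ setup of the experiment below.
import numpy

def cv_estimate(f, h, expct_hx, x, x_pilot):
    # estimate c* = Cov(f, h) / Var(h) from an independent pilot sample,
    # so that the final estimator stays unbiased
    f_pilot, h_pilot = f(x_pilot), h(x_pilot)
    c_star = numpy.cov(f_pilot, h_pilot)[0, 1] / numpy.var(h_pilot, ddof=1)
    # control variate estimate with tunable coefficient: f(x) - c*(h(x) - E[h(x)])
    return numpy.mean(f(x) - c_star * (h(x) - expct_hx))

x_pilot = numpy.random.uniform(0., 10., size=100)  # pilot sample used only for c*
x = numpy.random.uniform(0., 10., size=10)         # samples used for the actual estimate
print(cv_estimate(lambda v: v ** 2, lambda v: v, 5., x, x_pilot))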
Now, let’s do an experiment to verify that the control variate method indeed reduces variance. We let $p(x)$ be a uniform distribution in range $[0, 10]$, $f(x) = x^2$, and $h(x) = x$. Statistical quantities can be calculated as:
$$\mathbb{E}\big[f(x)\big] = \frac{100}{3}, \quad Var\big(f(x)\big) = \frac{8000}{9}, \quad \mathbb{E}\big[h(x)\big] = 5, \quad Var\big(h(x)\big) = \frac{25}{3}, \quad Cov\big(f(x), h(x)\big) = \frac{250}{3}$$
Therefore, according to the derived theory, we have:
$$\frac{Var(\hat{f}_{cv})}{Var\big(f(x)\big)} = \frac{Var\big(f(x)\big) + Var\big(h(x)\big) - 2\,Cov\big(f(x), h(x)\big)}{Var\big(f(x)\big)} = \frac{8000/9 + 25/3 - 500/3}{8000/9} = \frac{6575}{8000} \approx 0.822.$$
In other words, we will expect to see around 82.2% of the original variance.
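As a quick sanity check of the arithmetic (a sketch I added, not part of the original experiment code), the moments of the uniform $[0, 10]$ setup can be approximated on a dense grid and plugged into the ratio above:
import numpy

grid = numpy.linspace(0., 10., 1_000_001)  # dense grid over [0, 10]; for a uniform p(x),
                                           # E[g(x)] is approximately the average of g over the grid
f_vals, h_vals = grid ** 2, grid
var_f = numpy.mean(f_vals ** 2) - numpy.mean(f_vals) ** 2  # ~8000/9
var_h = numpy.mean(h_vals ** 2) - numpy.mean(h_vals) ** 2  # ~25/3
cov_fh = numpy.mean(f_vals * h_vals) - numpy.mean(f_vals) * numpy.mean(h_vals)  # ~250/3
print((var_f + var_h - 2 * cov_fh) / var_f)  # ~0.822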
Here is the code:
import numpy

T = 20000  # num of rounds
N = 10     # num of samples per round
a, b = 0., 10.  # lower bound, upper bound of the uniform distribution
expct_hx = 5.   # E[h(x)] for h(x) = x under uniform [a, b]

# monte carlo estimation
mc_res = []
for t in range(T):
    x = numpy.random.uniform(low=a, high=b, size=N)
    mc_res.append(numpy.mean(x ** 2))

# control variate estimation using h(x) = x
cv_res = []
for t in range(T):
    x = numpy.random.uniform(low=a, high=b, size=N)
    cv_res.append(numpy.mean(x ** 2 - x + expct_hx))

print("MC estimation (mean/std): {0}/{1}".format(numpy.mean(mc_res), numpy.std(mc_res)))
print("Control variate estimation (mean/std): {0}/{1}".format(numpy.mean(cv_res), numpy.std(cv_res)))
print(f"Remaining variance ratio (cv/mc) {(numpy.std(cv_res)/numpy.std(mc_res))**2}")
Here’s the result, and indeed we see around 82% of the original variance when we use the control variate:
MC estimation (mean/std): 33.292873225286186/9.426090731496881
Control variate estimation (mean/std): 33.37154726826849/8.61508822016471
Remaining variance ratio (cv/mc) 0.8353264371911822
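As an extension not in the original experiment, we can also try the optimal coefficient derived earlier. For this setup, $c^* = \frac{Cov(f(x), h(x))}{Var(h(x))} = \frac{250/3}{25/3} = 10$ and $\rho_{f,h}^2 = \frac{(250/3)^2}{(8000/9)(25/3)} = 0.9375$, so the theory predicts the variance should drop to about $1 - \rho_{f,h}^2 = 6.25\%$ of the original. A sketch of the corresponding loop, reusing the variables defined in the code above:
# control variate estimation with the optimal coefficient c* = 10
cv_opt_res = []
for t in range(T):
    x = numpy.random.uniform(low=a, high=b, size=N)
    cv_opt_res.append(numpy.mean(x ** 2 - 10. * (x - expct_hx)))
print(f"Remaining variance ratio (optimal c) {(numpy.std(cv_opt_res)/numpy.std(mc_res))**2}")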
In the end, I want to mention that we are analyzing $\hat{f}_{cv}$ with the assumption that it is a single-sample random variable. If we always collect $N$ samples and use their mean $\frac{1}{N}\sum_{i=1}^{N}\hat{f}_{cv,i}$ as the estimate of $\mathbb{E}_{x \sim p(x)}[f(x)]$, then $Var\big(\frac{1}{N}\sum_{i=1}^{N}\hat{f}_{cv,i}\big) = \frac{1}{N}Var(\hat{f}_{cv})$. This matches our intuition that the more samples you average over, the less variance you have. However, the relative ratio of variance you can save by introducing $h(x)$ will remain the same no matter how large $N$ is, which is $\frac{2\,Cov(f(x),\,h(x)) - Var(h(x))}{Var(f(x))}$.
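To illustrate this point, here is a small sketch (an addition, assuming the same uniform $[0, 10]$, $f(x) = x^2$, $h(x) = x$ setup) that varies the per-round sample size $N$: both standard deviations shrink as $N$ grows, but their ratio stays roughly constant.
import numpy

T = 20000
a, b, expct_hx = 0., 10., 5.
for N in (10, 100, 1000):
    mc, cv = [], []
    for t in range(T):
        x = numpy.random.uniform(low=a, high=b, size=N)
        mc.append(numpy.mean(x ** 2))
        cv.append(numpy.mean(x ** 2 - x + expct_hx))
    ratio = (numpy.std(cv) / numpy.std(mc)) ** 2
    print(f"N={N}: mc std {numpy.std(mc):.3f}, cv std {numpy.std(cv):.3f}, remaining variance ratio {ratio:.3f}")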
I also want to mention that there are other variance reduction techniques usable in other scenarios, such as importance sampling and the reparameterization trick. People have shown empirically that the reparameterization trick can also reduce variance [4], but so far I haven’t found any theoretical proof.
References
[1] https://czxttkl.com/2017/03/30/importance-sampling/
[2] https://en.wikipedia.org/wiki/Control_variates