In this post, let’s talk about another variance reduction method besides importance sampling [1]: the control variate method.
We start from a simple example: we want to estimate $\mathbb{E}_{x \sim p(x)}[f(x)]$, the expectation of a function $f$ applied to a random variable $x$ sampled from $p(x)$. Analytically, $\mathbb{E}_{x \sim p(x)}[f(x)] = \int f(x)\,p(x)\,dx$. You can also think of $\mathbb{E}_{x \sim p(x)}[f(x)]$ as $\mathbb{E}[Y]$, the expectation of another random variable $Y = f(x)$ obtained by applying the function to $x$. Since I like the form $\mathbb{E}_{x \sim p(x)}[f(x)]$ better, I’ll keep using this format throughout the post. In our setting, we can only evaluate $f(x)$ for any given $x$ but do not know the form of $f$ in advance. Using Monte Carlo naively, we can draw $N$ numbers $x_1, \dots, x_N$ from $p(x)$ and take the average of their values under $f$:

$$\hat{f}_{mc} = \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$
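To make this concrete, here is a minimal sketch of the naive Monte Carlo estimator. The distribution and function below are arbitrary illustrative choices of mine (a standard normal $p(x)$ and $f(x) = e^x$), not the setup used in the experiment later in the post:

import numpy
# illustrative only: p(x) = N(0, 1), f(x) = exp(x); the true expectation is exp(0.5) ~ 1.6487
N = 1000
x = numpy.random.normal(loc=0., scale=1., size=N)  # draw N samples from p(x)
print(numpy.mean(numpy.exp(x)))                    # Monte Carlo estimate of E[f(x)]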
We can use a control variate to reduce the variance of our estimate of $\mathbb{E}_{x \sim p(x)}[f(x)]$. In the control variate method, we introduce another function $h(x)$, for which we can evaluate $h(x)$ for any $x$ and, moreover, know its expectation $\mathbb{E}_{x \sim p(x)}[h(x)]$ in closed form (note: we don’t know that for $f(x)$).
The control variate method says that the following estimator, which can be seen as another random variable built from a single sample $x \sim p(x)$, is an unbiased estimate of $\mathbb{E}_{x \sim p(x)}[f(x)]$:

$$\hat{f}_{cv} = f(x) - h(x) + \mathbb{E}_{x \sim p(x)}[h(x)]$$
It is easy to see that $\hat{f}_{cv}$ is an unbiased estimator:

$$\mathbb{E}[\hat{f}_{cv}] = \mathbb{E}[f(x)] - \mathbb{E}[h(x)] + \mathbb{E}[h(x)] = \mathbb{E}_{x \sim p(x)}[f(x)]$$

Now let’s look at the variance of $\hat{f}_{cv}$ (the term $\mathbb{E}_{x \sim p(x)}[h(x)]$ is a constant, so it does not contribute):

$$Var(\hat{f}_{cv}) = Var\big(f(x) - h(x)\big) = Var(f(x)) + Var(h(x)) - 2\,Cov\big(f(x), h(x)\big)$$

Thus, as long as we introduce $h(x)$ such that $Var(h(x)) < 2\,Cov\big(f(x), h(x)\big)$, we will reduce the variance of the estimate of $\mathbb{E}_{x \sim p(x)}[f(x)]$.
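As a quick sketch of what checking this condition might look like in code (again with illustrative choices of mine, not the experiment setup below: a standard normal $p(x)$, $f(x) = e^x$, and $h(x) = x$ with known $\mathbb{E}[h(x)] = 0$):

import numpy
# illustrative only: p(x) = N(0, 1), f(x) = exp(x), h(x) = x with E[h(x)] = 0
x = numpy.random.normal(size=100000)
fx, hx = numpy.exp(x), x
var_h = numpy.var(hx)
cov_fh = numpy.cov(fx, hx)[0, 1]
print("Var(h) < 2 Cov(f, h)?", var_h < 2 * cov_fh)
# variance of the single-sample control variate estimator vs. the naive one
print("variance ratio:", numpy.var(fx - hx + 0.) / numpy.var(fx))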
A more advanced method is called regression sampling [2]. We can have a tunable coefficient $c$ in the estimator:

$$\hat{f}_{cv} = f(x) + c\,\big(h(x) - \mathbb{E}_{x \sim p(x)}[h(x)]\big)$$

In the control variate example above, $c = -1$. However, when $c$ is tunable, the variance of $\hat{f}_{cv}$ becomes:

$$Var(\hat{f}_{cv}) = Var(f(x)) + c^2\,Var(h(x)) + 2c\,Cov\big(f(x), h(x)\big),$$

which is a quadratic formula with regard to $c$. Therefore, the optimal coefficient is:

$$c^{*} = -\frac{Cov\big(f(x), h(x)\big)}{Var(h(x))}$$

With $c^{*}$, the optimal variance of $\hat{f}_{cv}$ is achieved at:

$$Var(\hat{f}_{cv}) = Var(f(x)) - \frac{Cov\big(f(x), h(x)\big)^2}{Var(h(x))} = \big(1 - \rho_{f,h}^2\big)\,Var(f(x)),$$

where $\rho_{f,h}$ is the correlation coefficient between $f(x)$ and $h(x)$. This concludes that the more correlated $f(x)$ and $h(x)$ are with each other, the more variance we can potentially reduce.
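For example, plugging illustrative numbers into this formula: a correlation of $\rho_{f,h} = 0.9$ would leave only $1 - 0.81 = 19\%$ of the original variance, while an uncorrelated $h(x)$ ($\rho_{f,h} = 0$) gives no reduction at all.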
Now, let’s do an experiment to verify that the control variate method indeed reduces variance. We let $p(x)$ be a uniform distribution on the range $[0, 10]$, $f(x) = x^2$ and $h(x) = x$. Statistical quantities can be calculated as:

$$\mathbb{E}[f(x)] = \mathbb{E}[x^2] = \frac{100}{3} \approx 33.33, \qquad Var(f(x)) = \mathbb{E}[x^4] - \mathbb{E}[x^2]^2 = \frac{8000}{9} \approx 888.9$$

$$\mathbb{E}[h(x)] = \mathbb{E}[x] = 5, \qquad Var(h(x)) = \frac{100}{12} \approx 8.33, \qquad Cov\big(f(x), h(x)\big) = \mathbb{E}[x^3] - \mathbb{E}[x^2]\,\mathbb{E}[x] = \frac{250}{3} \approx 83.33$$

Therefore, according to the derived theory, we have:

$$\frac{Var\big(f(x) - h(x) + \mathbb{E}[h(x)]\big)}{Var(f(x))} = \frac{Var(f(x)) + Var(h(x)) - 2\,Cov\big(f(x), h(x)\big)}{Var(f(x))} = \frac{8000/9 + 100/12 - 500/3}{8000/9} \approx 0.822.$$

In other words, we will expect to see around 82.2% of the original variance.
Here is the code:
import numpy
T = 20000  # num of rounds
N = 10  # num of samples per round
a, b = 0., 10.  # lower bound, upper bound
expct_hx = 5.  # analytic E[h(x)] = E[x] for x ~ Uniform(0, 10)
# monte carlo estimation
mc_res = []
for t in range(T):
    x = numpy.random.uniform(low=a, high=b, size=N)
    mc_res.append(numpy.mean(x ** 2))
# control variate estimation using h(x)
cv_res = []
for t in range(T):
    x = numpy.random.uniform(low=a, high=b, size=N)
    cv_res.append(numpy.mean(x ** 2 - x + expct_hx))
print("MC estimation (mean/std): {0}/{1}".format(numpy.mean(mc_res), numpy.std(mc_res)))
print("Control variate estimation (mean/std): {0}/{1}".format(numpy.mean(cv_res), numpy.std(cv_res)))
print(f"Variance reduction percentage {(numpy.std(cv_res)/numpy.std(mc_res))**2}")
Here’s the result, and indeed we see around 82% of the original variance when we use the control variate:
MC estimation (mean/std): 33.292873225286186/9.426090731496881
Control variate estimation (mean/std): 33.37154726826849/8.61508822016471
Variance reduction percentage 0.8353264371911822
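As a side note, we can also sketch the regression-sampling variant on this same setup. Plugging the analytic moments above into $c^{*} = -Cov(f(x), h(x)) / Var(h(x))$ gives $c^{*} = -(250/3)/(100/12) = -10$, and the predicted variance ratio becomes $1 - \rho_{f,h}^2 = 0.0625$, so in theory the tunable coefficient does much better than the fixed $c = -1$. Here is a rough sketch, reusing the variables (T, N, a, b, expct_hx, mc_res) from the code above:

# regression sampling: f(x) + c * (h(x) - E[h(x)]) with the optimal coefficient
c_star = -10.  # c* = -Cov(x^2, x) / Var(x) for x ~ Uniform(0, 10)
rs_res = []
for t in range(T):
    x = numpy.random.uniform(low=a, high=b, size=N)
    rs_res.append(numpy.mean(x ** 2 + c_star * (x - expct_hx)))
print("Regression sampling estimation (mean/std): {0}/{1}".format(numpy.mean(rs_res), numpy.std(rs_res)))
print(f"Variance reduction percentage {(numpy.std(rs_res)/numpy.std(mc_res))**2}")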
In the end, I want to mention that we are analyzing $\hat{f}_{cv} = f(x) - h(x) + \mathbb{E}_{x \sim p(x)}[h(x)]$ with the assumption that it is a single-sample random variable. If we always collect $N$ samples and use their mean as the estimate of $\mathbb{E}_{x \sim p(x)}[f(x)]$, then

$$Var\Big(\frac{1}{N}\sum_{i=1}^{N}\big(f(x_i) - h(x_i) + \mathbb{E}[h(x)]\big)\Big) = \frac{1}{N}\,Var\big(f(x) - h(x) + \mathbb{E}[h(x)]\big).$$

This matches our intuition that the more samples you average over, the less variance you have. However, the relative ratio of variance you can save by introducing $h(x)$ will remain the same no matter how large $N$ is, which is $\frac{Var(f(x) - h(x) + \mathbb{E}[h(x)])}{Var(f(x))}$.
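Here is a minimal sketch to check this numerically, reusing the same setup with different values of $N$:

import numpy
a, b, expct_hx, T = 0., 10., 5., 5000
for N in (10, 100, 1000):
    mc, cv = [], []
    for t in range(T):
        x = numpy.random.uniform(low=a, high=b, size=N)
        mc.append(numpy.mean(x ** 2))
        cv.append(numpy.mean(x ** 2 - x + expct_hx))
    # variances shrink roughly like 1/N, but their ratio stays around 0.822
    print(N, numpy.var(mc), numpy.var(cv), numpy.var(cv) / numpy.var(mc))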
I also want to mention that there are other variance reduction techniques usable in other scenarios, such as importance sampling and the reparameterization trick. People have shown empirically that the reparameterization trick can also reduce variance [4], but so far I haven’t found any theoretical proof.
References
[1] https://czxttkl.com/2017/03/30/importance-sampling/
[2] https://en.wikipedia.org/wiki/Control_variates