Estimating Non-Gaussianity in The Microwave Background: A. F. Heavens

Mon. Not. R. Astron. Soc. 299, 805–808 (1998)

Estimating non-Gaussianity in the microwave background

A. F. Heavens
Institute for Astronomy, University of Edinburgh, Blackford Hill, Edinburgh EH9 3HJ

Accepted 1998 May 13. Received 1998 April 23; in original form 1998 January 7


ABSTRACT
The bispectrum of the microwave background sky is a possible discriminator between
inflationary and defect models of structure formation in the Universe. The bispectrum,
which is the analogue of the temperature three-point correlation function in harmonic
space, is zero for most inflationary models, but non-zero for non-Gaussian models. The
expected departures from zero are small, and easily masked by noise, so it is important to be
able to estimate the bispectrum coefficients as accurately as possible, and to know the errors
and correlations between the estimates so that they may be used in combination as a diagnostic
to rule out non-Gaussian models. This paper presents a method for estimating in an unbiased
way the bispectrum from a microwave background map in the near-Gaussian limit. The
method is optimal, in the sense that no other method can have smaller error bars, and, in
addition, the covariances between the bispectrum estimates are calculated explicitly. The
method deals automatically with partial sky coverage and arbitrary noise correlations without
modification. A preliminary application to the Cosmic Background Explorer 4-yr data set
shows no evidence for non-Gaussian behaviour.
Key words: cosmic microwave background – cosmology: theory – early Universe – large-scale structure of Universe.

1 INTRODUCTION
The question of whether or not the microwave background sky is well-approximated by a Gaussian random field is important for distinguishing inflationary and defect models of the early Universe. Inflationary models predict immeasurably small non-Gaussian components, arising from gravity waves (Bharadwaj, Munshi & Souradeep 1997) and the Rees–Sciama effect (Mollerach et al. 1995; Munshi, Souradeep & Starobinsky 1995). The difficulty which besets such tests is that the predicted departures from Gaussian behaviour for most defect models are small (e.g. Luo 1994a), and are correspondingly difficult to detect in the presence of noise, which may be instrumental or cosmic variance. This makes it extremely important to be able to calculate the statistical properties of the non-Gaussian discriminants. In particular, it will probably be necessary to combine the results of a large number of estimates, to obtain a statistically significant departure from Gaussianity (if it exists), or to make a convincing case that the sky is indeed Gaussian.

Ironically, the best evidence for an inflationary model may well come not from a specific test for non-Gaussianity, but rather from the power spectrum. For inflationary models, the power spectrum of temperature fluctuations is predicted to have a reasonable amount of structure in it, with multiple acoustic oscillation peaks which should be measurable by future satellites such as MAP and Planck (Jungman et al. 1996; Bersanelli et al. 1996). Should such structures be found, and found to agree with the inflationary model predictions within the errors, the case against any form of defect model would be strong, and even the present knowledge of the power spectrum appears already to rule out many defect models (Pen, Seljak & Turok 1997). In this case, the absence of non-Gaussian signatures would be a useful confirmation. However, if the power spectrum turns out not to be well-fitted by inflationary models, the question of the Gaussian nature or otherwise of the fluctuations becomes correspondingly more important.

There are many ways to approach the problem of determining whether fluctuations are Gaussian, in large-scale structure and in the cosmic microwave background (CMB). These include the three-point function (e.g. Falk, Rangarajan & Srednicki 1993; Luo & Schramm 1993; Hinshaw et al. 1994; Gangui et al. 1994), the genus and Euler–Poincaré statistic (Coles 1989; Gott et al. 1990; Luo 1994b; Smoot et al. 1994), peak statistics (Bond & Efstathiou 1987; Kogut et al. 1995, 1996) and studies of tensor modes in the CMB (Coulson, Crittenden & Turok 1994). The approach that we take here is to investigate the bispectrum, for which some studies have been made in the large-scale structure literature (Hivon et al. 1995; Matarrese, Verde & Heavens 1997; Verde et al. 1998) and for the CMB (Luo 1994a). There may be sharper tools for detecting specific non-Gaussian models, but the rationale for this approach is that the bispectrum offers a generic test for non-Gaussian models, in the following sense: a general field will have non-zero connected n-point functions at all orders, and the bispectrum is the lowest statistic (with n > 1) for which a Gaussian field has zero expectation value.

The principal advantages of the approach detailed in this paper are that it deals automatically with masked regions of the sky and
correlated noise, and that the estimates of the bispectrum coefficients come with error bars and covariances between the errors. This last point is particularly important when one realizes that a single bispectrum coefficient estimate is unlikely to rule out a model, because the cosmic variance is often larger than the expected signal, and so one is going to need many coefficients in practice. A final point is that, for Gaussian fluctuations, the estimator below cannot be improved, in the sense of having a smaller error bar.

2 METHOD
The optimization procedure in this paper is a generalization of the optimal quadratic estimator for the power spectrum, presented by Tegmark (1997). For consistency, we follow his notation as far as possible. Let $x_i$ be the temperature fluctuation $\Delta T/T$ in some sky pixel $i$. The temperature map is expanded in spherical harmonics $Y_\ell^m$ in the usual way:

$$a_\ell^m \equiv \int d\Omega\, \frac{\Delta T}{T}\, Y_\ell^{m*}(\Omega) \simeq \sum_i x_i\, Y_\ell^{m*}(\theta_i,\phi_i)\, \Delta\Omega_i, \tag{1}$$

where $d\Omega$, $\Delta\Omega_i$ represent elements of solid angle, and $\theta$ and $\phi$ are polar coordinates. The power spectrum is defined as

$$C_\ell = \langle |a_\ell^m|^2 \rangle, \tag{2}$$

where the angle brackets indicate an ensemble average. Expectation values of products of distinct spherical harmonic coefficients are zero by isotropy, independently of whether the temperature map is Gaussian or not. The bispectrum is defined as

$$B(\ell_1 \ell_2 \ell_3;\, m_1 m_2 m_3) \equiv \langle a_{\ell_1}^{m_1} a_{\ell_2}^{m_2} a_{\ell_3}^{m_3} \rangle. \tag{3}$$

It is zero, unless the indices comply with the following triangle closure constraints (e.g. Edmonds 1957; Luo 1994a):

$$m_1 + m_2 + m_3 = 0; \qquad \ell_1 + \ell_2 + \ell_3 = \mathrm{even}; \qquad |\ell_i - \ell_j| \le \ell_k \le \ell_i + \ell_j$$

for $i, j, k = 1, 2, 3$.

We seek an estimator of $B$ that is lossless, if possible, in the sense that it contains as much information as the original map $\{x_i\}$. Ideally it should be unbiased, and with calculable statistical properties. In the spirit of Tegmark's optimal quadratic estimator for the power spectrum, we seek an estimator for the bispectrum that is cubic. We consider quantities $y_\alpha$ of the following form:

$$y_\alpha = \sum_{\mathrm{pixels}\ ijk} E^{\alpha}_{ijk}\, x_i x_j x_k. \tag{4}$$

We will find that the $y_\alpha$ are related to the bispectrum estimates, but will not be the bispectrum estimates themselves. We introduce the shorthand notation $\alpha \equiv \{\ell_1, \ell_2, \ell_3, m_1, m_2, m_3\}$, and we also combine the list of data triplets into a data vector with elements labelled by $A$:

$$D_A \equiv x_i x_j x_k, \tag{5}$$

where $A$ represents some triplet $\{i, j, k\}$. The $E_{ijk}$ are some coefficients to be determined. The mean of $y_\alpha$ involves the three-point function, which may be written in terms of the bispectra as follows:

$$\mu_A \equiv \langle x_i x_j x_k \rangle = \sum_\alpha B_\alpha R^{\alpha}_{ijk}, \tag{6}$$

where we have assumed that the noise has a zero three-point function. If it is known and non-zero, it may be added. The functions connecting the three-point functions in real and harmonic space are (Gangui et al. 1994):

$$R^{\alpha}_{ijk} = \frac{\pi}{2}\, N(\gamma_{ij},\gamma_{jk},\gamma_{ki})\, W_{\ell_1} W_{\ell_2} W_{\ell_3} \sum_{\ell_4 \ell_5 \ell_6} \sum_{m_4 m_5 m_6} P_{\ell_4}(\cos\gamma_{ij})\, P_{\ell_5}(\cos\gamma_{jk})\, P_{\ell_6}(\cos\gamma_{ki})\, H^{m_4 m_6 m_1}_{\ell_4 \ell_6 \ell_1} H^{m_5 m_4 m_2}_{\ell_5 \ell_4 \ell_2} H^{m_5 m_6 m_3}_{\ell_5 \ell_6 \ell_3}, \tag{7}$$

where

$$H^{m_1 m_2 m_3}_{\ell_1 \ell_2 \ell_3} = \int d\Omega\, Y_{\ell_1}^{m_1 *}\, Y_{\ell_2}^{m_2}\, Y_{\ell_3}^{m_3} \tag{8}$$

and can be related to Clebsch–Gordan coefficients. The effect of beam-smearing (here modelled by a Gaussian) is through the window functions

$$W_\ell = \exp\left[-\ell(\ell+1)\sigma^2/2\right]. \tag{9}$$

$\gamma_{ij}$ is the angle between pixels $i$ and $j$, and

$$N^2(\gamma_{ij},\gamma_{jk},\gamma_{ki}) \equiv 1 - \cos^2\gamma_{ij} - \cos^2\gamma_{jk} - \cos^2\gamma_{ki} + 2\cos\gamma_{ij}\cos\gamma_{jk}\cos\gamma_{ki}. \tag{10}$$

2.1 Optimal estimator $y_\alpha$
We wish to minimize the variance of $y$ (cf. Tegmark 1997 for the power spectrum), which involves the six-point function. The mean is

$$\langle y_\alpha \rangle = \sum_{\alpha'} \sum_{ijk} B_{\alpha'} R^{\alpha'}_{ijk} E^{\alpha}_{ijk}. \tag{11}$$

The covariance between the $y$s is $C_{\alpha\alpha'} \equiv \langle y_\alpha y_{\alpha'} \rangle - \langle y_\alpha \rangle \langle y_{\alpha'} \rangle$, which we obtain from the triplet data covariance matrix:

$$\langle x_i x_j x_k x_{i'} x_{j'} x_{k'} \rangle - \langle x_i x_j x_k \rangle \langle x_{i'} x_{j'} x_{k'} \rangle. \tag{12}$$

We now make an assumption concerning the departures from Gaussianity. Since these are expected to be small, we approximate the covariance matrix by the covariance matrix for a Gaussian field with the same power spectrum. This assumes that the bispectrum is small compared with the cosmic variance, and also assumes that the connected four-point function is small. Strictly, this method is optimal for testing the hypothesis that the field is Gaussian, but it should be very close to optimal for practical cases, since the expectation is that the bispectrum will be small. If the assumption is not justified, and the bispectrum is not small compared with the cosmic variance, detection will not be difficult in any event. If this turns out to be the case, it will be important to check that the estimator remains unbiased in the case of large intrinsic bispectrum.

In the Gaussian approximation, $\langle x_i x_j x_k \rangle = 0$, and we use Wick's theorem to write

$$\langle x_i x_j x_k x_{i'} x_{j'} x_{k'} \rangle = \xi_{ij}\, \xi_{ki'}\, \xi_{j'k'} + \mathrm{permutations\ (15\ terms)}, \tag{13}$$

where we have defined the two-point function of the temperature field:

$$\xi_{ij} \equiv \langle x_i x_j \rangle = \sum_\ell \frac{2\ell+1}{4\pi}\, C_\ell\, P_\ell(\cos\gamma_{ij})\, W_\ell^2 + N_{ij}. \tag{14}$$

$P_\ell$ are Legendre polynomials and $N_{ij}$ is the noise covariance matrix. We can then compute the covariance matrix for $y_\alpha$:

$$V_{\alpha\alpha'} \equiv \langle y_\alpha y_{\alpha'} \rangle = \left(\xi_{ij}\, \xi_{ki'}\, \xi_{j'k'} + \mathrm{perm.}\right) E^{\alpha}_{ijk} E^{\alpha'}_{i'j'k'}, \tag{15}$$

and we have from now adopted the summation convention for repeated pixel indices, and also, unless stated otherwise, $\alpha$ indices.
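As an illustration of equation (14), the following sketch assembles the pixel covariance matrix $\xi_{ij}$ numerically. It is not the code used for the analysis in this paper: the function name `two_point_matrix`, the random pixel directions, the toy spectrum $C_\ell \propto 1/[\ell(\ell+1)]$ and the white-noise level are all assumptions chosen for the example. The Legendre sum is evaluated with NumPy's `legval`.

```python
import numpy as np
from numpy.polynomial.legendre import legval


def two_point_matrix(unit_vectors, c_ell, sigma_beam, noise_cov):
    """Pixel covariance xi_ij of equation (14).

    unit_vectors : (N, 3) array of pixel directions (assumed normalised)
    c_ell        : power spectrum C_l for l = 0 .. l_max
    sigma_beam   : Gaussian beam width in radians, as in equation (9)
    noise_cov    : (N, N) noise covariance matrix N_ij
    """
    ells = np.arange(len(c_ell))
    window = np.exp(-ells * (ells + 1) * sigma_beam**2 / 2.0)
    # Coefficient multiplying P_l(cos gamma_ij) in the sum over l.
    coeffs = (2 * ells + 1) / (4 * np.pi) * c_ell * window**2
    # cos(gamma_ij) is the dot product of the two pixel directions.
    cos_gamma = np.clip(unit_vectors @ unit_vectors.T, -1.0, 1.0)
    return legval(cos_gamma, coeffs) + noise_cov


# Toy inputs (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_pix, l_max = 20, 40
vecs = rng.normal(size=(n_pix, 3))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

ells = np.arange(l_max + 1)
c_ell = np.zeros(l_max + 1)
c_ell[2:] = 1.0 / (ells[2:] * (ells[2:] + 1.0))  # scale-invariant shape, arbitrary units
noise = 1e-3 * np.eye(n_pix)                      # independent pixel noise

xi = two_point_matrix(vecs, c_ell, np.radians(3.2), noise)
xi_inv = np.linalg.inv(xi)                        # an N x N inverse, cheap compared with N^3 x N^3
```

Only the $N \times N$ matrix $\xi$ and its inverse are formed here, and a non-zero noise term keeps $\xi$ well conditioned in this toy setting.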

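The count of 15 terms in equation (13) can be checked by brute force. The sketch below is an illustration only, not part of the original analysis: it enumerates every way of pairing the six pixel labels and sorts the pairings by how many pairs link an undashed label to a dashed one. The helper names `pairings` and `is_mixed` are assumptions made for this example.

```python
def pairings(items):
    """Yield every way to split `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for n, partner in enumerate(rest):
        remaining = rest[:n] + rest[n + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def is_mixed(pair):
    """True if the pair links an undashed label to a dashed one."""
    a, b = pair
    return a.endswith("'") != b.endswith("'")


labels = ["i", "j", "k", "i'", "j'", "k'"]
all_pairings = list(pairings(labels))

# Histogram: number of mixed pairs -> number of pairings with that many.
counts = {}
for p in all_pairings:
    n_mixed = sum(is_mixed(pair) for pair in p)
    counts[n_mixed] = counts.get(n_mixed, 0) + 1

print(len(all_pairings), counts)
```

Each pairing corresponds to one product of three two-point functions, so the histogram separates the terms in which every $\xi$ mixes dashed and undashed indices from those in which only one does.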
The products of $\xi$ terms are of two types: there are six terms like $\xi_{ii'}\xi_{jj'}\xi_{kk'}$, with one of each pair of subscripts from each distinct $E$, and nine terms of the form $\xi_{ij}\xi_{ki'}\xi_{j'k'}$ where only one $\xi$ mixes dashed and undashed indices. Since the $E^{\alpha}_{ijk}$ are symmetric to permutations in the $\{ijk\}$, we obtain

$$V_{\alpha\alpha'} = \xi_{ii'}\left(6\,\xi_{jj'}\xi_{kk'} + 9\,\xi_{jk}\xi_{j'k'}\right) E^{\alpha}_{ijk} E^{\alpha'}_{i'j'k'}. \tag{16}$$

We now minimize the variance $V_{\alpha\alpha}$ (not summed) with respect to $E_{ijk}$, subject to a normalization constraint on $E$ to ensure that it is not driven to zero. We choose

$$R^{\alpha}_{ijk} E^{\alpha}_{ijk} = 1, \tag{17}$$

giving

$$\xi_{ii'}\left(6\,\xi_{jj'}\xi_{kk'} + 9\,\xi_{jk}\xi_{j'k'}\right) E^{\alpha}_{i'j'k'} = \lambda R^{\alpha}_{ijk}, \tag{18}$$

where $\lambda$ is a Lagrange multiplier. Multiplying by $\xi^{-1}_{j''j}\,\xi^{-1}_{i''i}$, and summing over $ij$, gives

$$6\,\xi_{kk'} E^{\alpha}_{i''j''k'} + 9\,\xi_{j'k'}\,\delta^{K}_{j''k}\, E^{\alpha}_{i''j'k'} = \lambda\, \xi^{-1}_{j''j}\,\xi^{-1}_{i''i}\, R^{\alpha}_{ijk}. \tag{19}$$

$\delta^{K}_{ij}$ is a Kronecker delta function. Defining $r^{\alpha}_{i''jk} = \xi^{-1}_{i''i} R^{\alpha}_{ijk}$, we get, after some relabelling of indices,

$$6\,\xi_{ij} E^{\alpha}_{i''jk} + 9\,\delta^{K}_{ik}\,\xi_{jm} E^{\alpha}_{i''mj} = \lambda\, \xi^{-1}_{kj}\, r^{\alpha}_{i''ij}. \tag{20}$$

Taking the trace of this equation, and inserting it into the second term, gives the coefficients that we require for $y_\alpha$ to have its minimum error bar:

$$E^{\alpha}_{ijk} = \frac{1}{6}\,\xi^{-1}_{ii'}\left(\xi^{-1}_{jj'}\xi^{-1}_{kk'} - \frac{3}{2+3N}\,\xi^{-1}_{jk}\xi^{-1}_{j'k'}\right) R^{\alpha}_{i'j'k'}. \tag{21}$$

In this expression, $\lambda$ has been set to unity for convenience, and $N$ is the number of pixels in the map. Provided that $\lambda$ is finite, its value does not affect the bispectrum information content of $y_\alpha$: any multiple of $y_\alpha$ contains the same information as $y_\alpha$ itself. This makes obvious sense, and can be shown formally via the Fisher information matrix (below).

3 LOSSLESS ESTIMATOR OF THE BISPECTRUM
In order to estimate how well the $y_\alpha$ will perform in estimating the desired parameters $B_\alpha$, we compute the Fisher information matrix (Tegmark, Taylor & Heavens 1997)

$$F_{\alpha\alpha'} \equiv -\left\langle \frac{\partial^2 \ln p}{\partial B_\alpha\, \partial B_{\alpha'}} \right\rangle, \tag{22}$$

where $p$ is the posterior probability distribution for the parameters (equal to the likelihood, if uniform priors for the parameters are assigned). For a data vector with components with means $\mu$ and covariance matrix $C$, the Fisher matrix is

$$F_{\alpha\alpha'} = \frac{1}{2}\,\mathrm{Tr}\left[C^{-1}\frac{\partial C}{\partial B_\alpha}\, C^{-1}\frac{\partial C}{\partial B_{\alpha'}} + 2\,C^{-1}\frac{\partial \mu}{\partial B_\alpha}\frac{\partial \mu^{\rm T}}{\partial B_{\alpha'}}\right]. \tag{23}$$

The error on the parameters $B_\alpha$ is contained in this matrix: if all other parameters are known, the minimum error is the conditional one, $\sigma_{B_\alpha} = 1/\sqrt{F_{\alpha\alpha}}$. If all parameters are to be estimated from the data, then the appropriate error is the marginal error $\sqrt{(F^{-1})_{\alpha\alpha}}$. This assumes that the probability surface is adequately approximated by a second-order Taylor expansion at the peak.

As expected for the 'near-Gaussian' approximation, the covariance matrix for either the triplets $x_i x_j x_k$ or the $y_\alpha$ does not depend on the parameters to be estimated – the Fisher matrix is determined only by the derivatives of the mean values. For the triplets,

$$F_{\alpha\alpha'} = C^{-1}_{ijki'j'k'}\, R^{\alpha}_{ijk} R^{\alpha'}_{i'j'k'} = \left[\xi_{ii'}\left(6\,\xi_{jj'}\xi_{kk'} + 9\,\xi_{jk}\xi_{j'k'}\right)\right]^{-1} R^{\alpha}_{ijk} R^{\alpha'}_{i'j'k'}. \tag{24}$$

A similar procedure to the computation of $E$ above gives the inverse covariance matrix

$$C^{-1}_{ijki'j'k'} = \frac{1}{6}\,\xi^{-1}_{kk'}\xi^{-1}_{jj'}\xi^{-1}_{ii'} - \frac{1}{2(2+3N)}\,\xi^{-1}_{j'k'}\xi^{-1}_{jk}\xi^{-1}_{ii'}. \tag{25}$$

This is an important simplification for computational reasons: without it, the inversion of an $N^3 \times N^3$ matrix ($C$) would be impractically slow. Decomposing its inverse into $N \times N$ matrices $\xi^{-1}$ is much faster.

Since we can recreate the original temperature map $\{x_i\}$ from the triplets, (24) is also the Fisher information matrix for the original, entire map. We now make a comparison with the Fisher matrix for the $y_\alpha$ – are their errors as small as is possible with the entire map? The covariance matrix for the $y_\alpha$ (16) is, for the optimal choice of $E$ coefficients (21),

$$V_{\alpha\alpha'} = \xi_{ii'}\left(6\,\xi_{jj'}\xi_{kk'} + 9\,\xi_{jk}\xi_{j'k'}\right) C^{-1}_{ijki''j''k''} R^{\alpha}_{i''j''k''}\, C^{-1}_{i'j'k'i'''j'''k'''} R^{\alpha'}_{i'''j'''k'''} = F_{\alpha\alpha'}. \tag{26}$$

Since the ensemble average of $y_\alpha$ is

$$\langle y_\alpha \rangle = B_{\alpha'} R^{\alpha'}_{ijk} E^{\alpha}_{ijk} = B_{\alpha'} R^{\alpha'}_{ijk} R^{\alpha}_{i'j'k'}\, C^{-1}_{ijki'j'k'}, \tag{27}$$

we obtain the simple result

$$\langle y_\alpha \rangle = B_{\alpha'} F_{\alpha\alpha'}. \tag{28}$$

Consequently, we can use the vector

$$\hat{B}_\alpha = F^{-1}_{\alpha\alpha'}\, y_{\alpha'} \tag{29}$$

as an estimator of the bispectrum. It is unbiased:

$$\langle \hat{B}_\alpha \rangle = F^{-1}_{\alpha\alpha'} \langle y_{\alpha'} \rangle = F^{-1}_{\alpha\alpha'} B_{\alpha''} F_{\alpha'\alpha''} = B_\alpha, \tag{30}$$

and the bispectrum estimates also have calculable covariance

$$C_{\alpha\alpha'} \equiv \langle \hat{B}_\alpha \hat{B}_{\alpha'} \rangle - \langle \hat{B}_\alpha \rangle \langle \hat{B}_{\alpha'} \rangle = F^{-1}_{\alpha\alpha''} F^{-1}_{\alpha'\alpha'''} \langle y_{\alpha''} y_{\alpha'''} \rangle - F^{-1}_{\alpha\alpha''} F^{-1}_{\alpha'\alpha'''} \langle y_{\alpha''} \rangle \langle y_{\alpha'''} \rangle = F^{-1}_{\alpha\alpha''} F^{-1}_{\alpha'\alpha'''} F_{\alpha''\alpha'''} = F^{-1}_{\alpha\alpha'}. \tag{31}$$

This also proves that the estimators are optimal, by the Fisher–Cramer–Rao inequality.

4 APPLICATION TO COBE 4-YR DMR DATA
We illustrate the method by applying it to the COBE DMR 4-yr data, focusing on measuring low-order coefficients. For this experiment, the width of the approximately Gaussian beam is $\sigma = 3.2^\circ$ (Wright et al. 1992). The method is computationally expensive, and is in the process of being optimized, but, for the moment, the approach taken is to average the ~4000 unmasked pixels of the COBE data set into larger pixels of roughly 12 degrees square. This introduces an additional effective Gaussian smoothing
for large-scale coefficients of $12^\circ/\sqrt{12}$, which is added in quadrature to the COBE beam. We shall see the effect of this additional pixelization below, in the form of an error bar larger than that of cosmic variance, especially for the higher harmonics. The effective beam suppresses contributions to the bispectrum from harmonics with $\ell$ larger than the value at which $\ell(\ell+1)\sigma_{\rm eff}^2/2 \simeq 1$, i.e. $\ell > 17$. We therefore truncate the summations in the estimator for $B$, and the Fisher matrix (scalar in this case), at the conservative limit of $\ell_{\rm max} = 40$. Pixel errors are taken from the COBE DMR data sets, and assumed to be independent. Averaging is done by inverse-variance weighting. The power spectrum is assumed to have a Harrison–Zeldovich spectrum, and the normalization is $Q_{\rm rms} = 18.4\ \mu$K (Gorski et al. 1996). Coefficients are chosen for which non-Gaussian predictions are quoted in Luo (1994a):

$$\begin{aligned}
B(2\ 2\ 2;\ 1\ 1\ {-2}) &= (2.5 \pm 4.1) \times 10^{-15},\\
B(4\ 4\ 4;\ 2\ 2\ {-4}) &= (5.4 \pm 6.5) \times 10^{-15},\\
B(6\ 6\ 6;\ 3\ 3\ {-6}) &= (12.4 \pm 8.9) \times 10^{-15},\\
B(8\ 8\ 8;\ 4\ 4\ {-8}) &= (1.5 \pm 11.6) \times 10^{-15},\\
B(10\ 10\ 10;\ 5\ 5\ {-10}) &= (-16.3 \pm 16.6) \times 10^{-15},
\end{aligned} \tag{32}$$

from which we see that there is no evidence of non-Gaussianity, at least for these coefficients. Note that, for the first bispectrum estimate, the cosmic variance corresponds to an error of $1.4 \times 10^{-15}$ (e.g. Luo 1994a). These very large-scale modes are not ideal for this sort of study – a higher-resolution experiment generally has higher signal-to-noise ratio (Luo 1994a). A recent preprint (Ferreira, Magueijo & Gorski 1998) claims a detection of non-Gaussianity at $\ell = 16$, a mode which is not probed here.

5 SUMMARY
In this paper, we see that it is possible to construct an estimator for the bispectrum which is lossless, in the sense that it contains as much information on the bispectrum as the entire map. It is also unbiased, and the covariance properties of the estimators are calculable. The estimator involves one approximation – that the departures from Gaussianity are small. In this limit, there is no other method that will lead to smaller error bars. The fact that the covariance properties are known is important in practical cases, because the bispectrum may well be small in comparison with cosmic variance, so a single estimate is unlikely to be sufficient to rule out many non-Gaussian models. Many estimates (with different $\ell_1 \ell_2 \ell_3 m_1 m_2 m_3$) would be required. As a test for non-Gaussianity, this method has the advantage that confidence levels can be computed analytically, without recourse to Monte Carlo simulation.

The notion of an optimal method is defined rather precisely in terms of information content and bias, but the issue of whether a method is good or not is wider than this. There is no doubt that the number of computations required to do this analysis is very large, dominated for reasonable pixel counts by computation of the $R$ coefficients. This can be aided by pre-computing Clebsch–Gordan coefficients (Vermaak, Vermaak & Miller 1984) and using a packed-storage algorithm, and by using parallel computers, for which this problem is ideal. However, it is clear that it will not be possible to deal directly with the entire data set from the Planck satellite without some form of pre-compression, perhaps along the lines above for the low-$\ell$ modes. For high-$\ell$ modes, subdivision of the sky into essentially independent data sets may be required. This is an inherent problem for high-order statistics, but it may also be required even for the power spectrum.

A further point to note is that the method deals automatically with sky coverage that is not complete, and arbitrary noise correlations. These are the two crucial tests of any method, since they will be a feature of future high-quality experiments such as Planck and MAP. Three-point function estimators have traditionally not computed the covariances directly, but rely on simulation tests to decide significance (e.g. Kogut et al. 1996). This is the disadvantage of many methods (e.g. genus, extrema correlation functions) for which the evaluation of errors may be difficult analytically. Against this one has, of course, to balance speed advantages.

A preliminary application to the COBE 4-yr data shows large-scale bispectrum coefficients consistent with zero. However, the errors are sufficiently large that these coefficients do not rule out non-Gaussian models with confidence.

REFERENCES
Bersanelli M. et al., 1996, ESA Report on Phase A Study D/SCI, 96, 3
Bharadwaj S., Munshi D., Souradeep T., 1997, astro-ph/9708015
Bond J. R., Efstathiou G., 1987, MNRAS, 226, 655
Coles P., 1989, MNRAS, 234, 509
Coulson D., Crittenden R., Turok N., 1994, Phys. Rev. Lett., 73, 2390
Edmonds A., 1957, Angular Momentum in Quantum Mechanics. Princeton Univ. Press, Princeton, NJ
Falk T., Rangarajan R., Srednicki M., 1993, ApJ, 403, L1
Ferreira P., Magueijo J., Gorski K., 1998, astro-ph/9803256
Gangui A., Lucchin F., Matarrese S., Mollerach S., 1994, ApJ, 430, 447
Gorski K., Banday A. J., Bennett C. L., Hinshaw G., Kogut A., Smoot G. F., Wright E. L., 1996, ApJ, 464, L11
Gott J. R., Park C., Juszkiewicz R., Bies W., Bennett D., Bouchet F., Stebbins A., 1990, ApJ, 352, 1
Hinshaw G. et al., 1994, ApJ, 431, 1
Hivon E., Bouchet F., Colombi S., Juszkiewicz R., 1995, A&A, 298, 643
Jungman G., Kamionkowski M., Kosowsky A., Spergel D. N., 1996, Phys. Rev. D, 54, 1332
Kogut A., Banday A., Bennett C., Hinshaw G., Lubin P., Smoot G., 1995, ApJ, 439, L29
Kogut A., Banday A., Bennett C., Gorski K., Hinshaw G., Smoot G., Wright E., 1996, ApJ, 464, L29
Luo X., 1994a, ApJ, 427, L71
Luo X., 1994b, Phys. Rev. D, 49, 3810
Luo X., Schramm D., 1993, Phys. Rev. Lett., 71, 1124
Matarrese S., Verde L., Heavens A., 1997, MNRAS, 290, 651
Mollerach S., Gangui A., Lucchin F., Matarrese S., 1995, ApJ, 453, 1
Munshi D., Souradeep T., Starobinsky A., 1995, ApJ, 454, 552
Pen U.-L., Seljak U., Turok N., 1997, Phys. Rev. Lett., 79, 1611
Smoot G. F., Tenorio L., Banday A., Kogut A., Wright E. L., Hinshaw G., Bennett C. L., 1994, ApJ, 437, 1
Tegmark M., 1997, Phys. Rev. D, 55, 5895
Tegmark M., Taylor A., Heavens A., 1997, ApJ, 480, 22
Verde L., Heavens A., Matarrese S., Moscardini L., 1998, MNRAS,
Vermaak C., Vermaak D., Miller H., 1984, Comput. Phys. Commun., 31, 41
Wright E. et al., 1992, ApJ, 396, L13

This paper has been typeset from a TeX/LaTeX file prepared by the author.

