generalized method of moments estimation

From The New Palgrave Dictionary of Economics, Second Edition, 2008

Generalized method of moments estimates econometric models without requiring a full statistical specification. One starts with a set of moment restrictions that depend on data and an unknown parameter vector to be estimated. When there are more moment restrictions than underlying parameters, there is a family of such estimators. The tractable form of the large sample properties of this family facilitates efficient estimation and statistical testing. This article motivates the method, presents some of the underlying statistical properties and discusses implementation.

1 Introduction

Generalized method of moments (GMM) refers to a class of estimators constructed from the sample moment counterparts of population moment conditions (sometimes known as orthogonality conditions) of the data generating model. GMM estimators have become widely used, for the following reasons:
  • 1. GMM estimators have large sample properties that are easy to characterize. A family of such estimators can be studied simultaneously in ways that make asymptotic efficiency comparisons easy. The method also provides a natural way to construct tests which take account of both sampling and estimation error.
  • 2. In practice, researchers find it useful that GMM estimators may be constructed without specifying the full data generating process (which would be required to write down the maximum likelihood estimator). This characteristic has been exploited in analysing partially specified economic models, studying potentially misspecified dynamic models designed to match target moments, and constructing stochastic discount factor models that link asset pricing to sources of macroeconomic risk.
Books with good discussions of GMM estimation with a wide array of applications include: Cochrane (2001), Arellano (2003), Hall (2005), and Singleton (2006). For a theoretical treatment of this method see Hansen (1982) along with the self-contained discussions in the books. See also Ogaki (1993) for a general discussion of GMM estimation and applications, and see Hansen (2001) for a complementary article that, among other things, links GMM estimation to related literatures in statistics. For a collection of recent methodological advances related to GMM estimation, see the journal issue edited by Ghysels and Hall (2002). While some of these other references explore the range of substantive applications, in what follows we focus more on the methodology.

2 Set-up

As we will see, formally there are two alternative ways to specify GMM estimators, but they have a common starting point. Data are a finite number of realizations of the process {xt : t = 1, 2, …}. The model is specified as a vector of moment conditions:

E[f(xt, β0)] = 0,   (1)

where f has r coordinates and β0 is an unknown vector in a parameter space P ⊂ Rk. To achieve identification we assume that on the parameter space P,

E[f(xt, β)] = 0 implies β = β0.
The parameter β0 is typically not sufficient to write down a likelihood function. Other parameters are needed to specify fully the probability model that underlies the data generation. In other words, the model is only partially specified.
Examples include:

Regarding example (a), many related methods have been developed for estimating correctly specified models, dating back to some of the original applications in statistics of method-of-moments-type estimators. The motivation for such methods was computational. See Hansen (2001) for a discussion of this literature and how it relates to GMM estimation. With advances in numerical methods, the fully efficient maximum likelihood method and Bayesian counterparts have become much more tractable. On the other hand, there continues to be an interest in the study of dynamic stochastic economic models that are misspecified because of their purposeful simplicity. Thus moment matching remains an interesting application for the methods described here. Testing target moments remains valuable even when maximum likelihood estimation is possible (for example, see Bontemps and Meddahi, 2005).
2.1 Central limit theory and martingale approximation
The parameter-dependent sample average

gN(β) = (1/N) Σ_{t=1}^{N} f(xt, β)

is featured in the construction of estimators and tests. When the law of large numbers is applicable, this average converges to E[f(xt, β)]. As a refinement of the identification condition, a central limit theorem holds at β0:

√N gN(β0) ⇒ Normal(0, V),   (2)

where ⇒ denotes convergence in distribution and V is a covariance matrix assumed to be nonsingular. In an iid data setting, V is the covariance matrix of the random vector f(xt, β0). In a time series setting,

V = Σ_{j=−∞}^{+∞} E[f(xt, β0) f(xt+j, β0)′],   (3)
which is the long-run counterpart to a covariance matrix.
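To make the long-run covariance concrete, here is a minimal sketch of a Bartlett-weighted (Newey and West, 1987b) sample estimate of V for a serially correlated vector series. The data, lag choice and function name are invented for illustration; this is a sketch, not production code.

```python
import numpy as np

def long_run_cov(f, lags):
    """Bartlett-kernel (Newey-West) estimate of V = sum_j E[f_t f_{t+j}']
    for an r-dimensional series f of shape (N, r)."""
    N, _ = f.shape
    fc = f - f.mean(axis=0)                # demean the moment series
    V = fc.T @ fc / N                      # lag-0 term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)         # Bartlett weights keep the estimate psd
        G = fc[j:].T @ fc[:-j] / N         # j-th sample autocovariance
        V = V + w * (G + G.T)
    return V

rng = np.random.default_rng(0)
e = rng.standard_normal((501, 2))
f_series = e[1:] + 0.5 * e[:-1]            # MA(1) series: genuine serial correlation
V_hat = long_run_cov(f_series, lags=4)
```

The Bartlett weights downweight higher-order autocovariances so that the estimate stays positive semidefinite; an unweighted truncated sum need not be.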
Central limit theory for time series is typically built on martingale approximation (see Gordin, 1969; Hall and Heyde, 1980). For many time series models the martingale approximators can be constructed directly, and there is specific structure to the V matrix. A leading example is when f(xt, β0) defines a conditional moment restriction. Suppose that {xt : t = 0, 1, …} generates a sigma algebra Ft, that E[|f(xt, β0)|²] < ∞, and that

E[f(xt, β0) | Ft−ℓ] = 0

for some ℓ ≥ 1. This restriction is satisfied in models of multi-period security market pricing and in models that restrict multi-period forecasting. If ℓ = 1, then N gN(β0) is itself a martingale; when ℓ > 1 it is straightforward to find a martingale mN with stationary increments and finite second moments such that

lim_{N→∞} (1/N) E[|N gN(β0) − mN|²] = 0,

where |·| is the standard Euclidean norm. Moreover, the finite-lag structure may be exploited to show that the limit in (3) is

V = Σ_{j=−ℓ+1}^{ℓ−1} E[f(xt, β0) f(xt+j, β0)′].   (4)

(The sample counterpart to this formula is not guaranteed to be positive semidefinite. There are a variety of ways to exploit this dependence structure to construct a positive semidefinite estimate; see Eichenbaum, Hansen and Singleton, 1988, for an example.) When there is no exploitable structure to the martingale approximator, the matrix V is the spectral density at frequency zero.

2.2 Minimizing a quadratic form
One approach for constructing a GMM estimator is to minimize a quadratic form in the sample moments:

bN = argmin_{β∈P} gN(β)′ W gN(β)

for some positive definite weighting matrix W. Alternative weighting matrices W are associated with alternative estimators. Part of the justification for this approach is that

β0 = argmin_{β∈P} E[f(xt, β)]′ W E[f(xt, β)].

The GMM estimator mimics this identification scheme by using a sample counterpart.
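As a small illustration of minimizing such a quadratic form, the sketch below fits the mean and standard deviation of simulated normal data using three moment conditions (r = 3 > k = 2). The data and the particular conditions are invented for this example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=2000)        # simulated data, true (mu, sigma) = (2, 1.5)

def f(x, b):
    """r = 3 moment conditions for k = 2 parameters (mu, sigma)."""
    mu, s = b
    return np.column_stack([
        x - mu,                             # E x = mu
        (x - mu)**2 - s**2,                 # E (x - mu)^2 = sigma^2
        x**3 - (mu**3 + 3 * mu * s**2),     # E x^3 under normality
    ])

def gN(b):
    return f(x, b).mean(axis=0)             # sample moment vector

W = np.eye(3)                               # any positive definite W gives consistency
objective = lambda b: gN(b) @ W @ gN(b)
bN = minimize(objective, x0=[1.0, 1.0], method="Nelder-Mead").x
```

With more moment conditions than parameters, the sample moments cannot all be driven to zero; the weighting matrix W determines which linear combinations the estimator effectively sets to zero.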
There are a variety of ways to prove consistency of GMM estimators. Hansen (1982) established a uniform law of large numbers for random functions when the data generation is stationary and ergodic. This uniformity is applied to show that

sup_{β∈P} |gN(β) − E[f(xt, β)]| → 0 with probability one,

and presumes a compact parameter space. The uniformity in the approximation carries over directly to the GMM criterion function gN(β)′ W gN(β). See Newey and McFadden (1994) for a more complete catalogue of approaches of this type.
In applications, however, the compactness of the parameter space is often ignored, and this commonly invoked result is therefore less useful than it might seem. The compactness restriction is best viewed as a substitute for checking the behaviour of the approximating function far away from β0, to make sure that spurious optimizers are not induced by approximation error. This tail behaviour can be important in practice, so a direct investigation of it can be fruitful. For models with parameter separation,

f(x, β) = X h(β),

where X is an r×m matrix constructed from x and h is a one-to-one function mapping P into a subset of Rm, there is an alternative way to establish consistency (see Hansen, 1982, for details). Models that are linear in the variables, or models based on matching moments that are nonlinear functions of the underlying parameters, can be written in this separable form.
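A linear instrumental-variables model illustrates this separable form: with f(x, β) = z(y − wβ), one can write f(x, β) = X h(β) for X = [zy, −zw] and h(β) = (1, β)′. The simulated data below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000
z = rng.standard_normal(T)                  # instrument
w = 0.8 * z + rng.standard_normal(T)        # regressor correlated with the instrument
y = 2.0 * w + rng.standard_normal(T)        # true coefficient is 2.0

# Separable form: f(x, beta) = z*(y - w*beta) = X @ h(beta),
# with X = [z*y, -z*w] and h(beta) = (1, beta)'.
X = np.column_stack([z * y, -z * w])
Xbar = X.mean(axis=0)                       # g_N(beta) = Xbar @ (1, beta)'

# Setting g_N(beta) = 0 in this just-identified case gives the IV estimator:
beta_hat = -Xbar[0] / Xbar[1]               # = mean(z*y) / mean(z*w)
```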
The choice of W = V−1 receives special attention, in part because

N gN(β0)′ V−1 gN(β0) ⇒ χ²(r).
While the matrix V is typically not known, it can be replaced by a consistent estimator without altering the large sample properties of bN. When using martingale approximation, the implied structure of V can often be exploited as in formula (4). When there is no such exploitable structure, methods based on frequency-domain considerations for time series data, such as that of Newey and West (1987b), can be employed.
For asset pricing models there are other choices of a weighting matrix motivated by considerations of misspecification. In these models with parameterized stochastic discount factors, the sample moment conditions gN(β) can be interpreted as a vector of pricing errors associated with the parameter vector β. A feature of W = V−1 is that, if the sample moment conditions (the sample counterpart to a vector of pricing errors) happened to be the same for two models (two choices of β), the one for which the implied asymptotic covariance matrix is larger will have the smaller objective. Thus there is a reward for parameter choices that imply variability in the underlying central limit approximation. To avoid such a reward, it is also useful to compare models or parameter values in other ways. An alternative weighting matrix is constructed by minimizing the least squares distance between the parameterized stochastic discount factor and one among the family of discount factors that correctly price the assets. Equivalently, parameters or models are selected on the basis of the maximum pricing error among constant weighted portfolios with payoffs that have common magnitude (a unit second moment). See Hansen and Jagannathan (1997) and Hansen, Heaton and Luttmer (1995) for this and related approaches.
2.3 Selection matrices
An alternative depiction is to introduce a selection matrix A of dimension k×r and to solve the equation system

A gN(β) = 0
for some choice of β, which we denote bN. The selection matrix A reduces the number of equations to be solved from r to k. Alternative selection matrices are associated with alternative GMM estimators, so by relating estimators to their corresponding selection matrices we have a convenient device for studying an entire family of GMM estimators simultaneously. Specifically, we explore the consequences of using alternative subsets of the moment equations or, more generally, alternative linear combinations of the moment equation system. This builds on the approach of Sargan (1958; 1959) and is most useful for characterizing limiting distributions. When the matrix A is replaced by a consistent estimator, the asymptotic properties of the estimator are preserved. This option expands considerably the range of applicability and, as we will see, is important for implementation.
Since alternative choices of A may give rise to alternative GMM estimators, we index alternative estimators by the choice of A. In what follows, replacing A by a consistent estimator does not alter the limiting distribution. For instance, the first-order conditions from minimizing a quadratic form can be represented using a selection matrix that converges to a limiting matrix A. Let

D = E[∂f(xt, β0)/∂β′].
Two results are central to the study of GMM estimators:

√N (bN − β0) ≈ −(AD)−1 A √N gN(β0),   (5)

√N gN(bN) ≈ [I − D(AD)−1 A] √N gN(β0).   (6)

Both approximation results are expressed in terms of √N gN(β0), which obeys a central limit theorem; see (2). These approximation results are obtained by standard local methods and require the square matrix AD to be nonsingular. Thus, for there to exist a valid selection matrix, D must have full column rank k. Notice from (6) that the sample moment conditions evaluated at bN have a degenerate limiting distribution: pre-multiplying the right-hand side of (6) by A gives zero. This is to be expected because linear combinations of the sample moment conditions are set to zero in estimation.
These approximations can be used to assess the accuracy of the estimator (approximation (5)) and to validate the moment conditions (approximation (6)). In addition, Newey and West (1987a) and Eichenbaum, Hansen and Singleton (1988) show how to use these and related approximations to devise tests of parameter restrictions. (Their tests imitate the likelihood ratio, Lagrange multiplier and Wald tests familiar from likelihood inference methods.)
Next we derive a sharp lower bound on the asymptotic covariance matrices of the family of GMM estimators indexed by the selection matrix A. For a given A, the asymptotic covariance matrix of the corresponding GMM estimator is

(AD)−1 A V A′ [(AD)′]−1.
A selection matrix in effect over-parameterizes a GMM estimator, as can be seen from this formula. Two such estimators with selection matrices of the form A and BA for a nonsingular matrix B imply the same asymptotic covariance matrix,

(BAD)−1 BA V A′B′ [(BAD)′]−1 = (AD)−1 A V A′ [(AD)′]−1,

because the same linear combinations of moment conditions are being used in estimation. Thus without loss of generality we may assume that AD = I. With this restriction we may imitate the proof of the famed Gauss–Markov theorem to show that

(D′V−1D)−1 ≤ A V A′   (7)

in the sense that the difference is positive semidefinite, and that the lower bound on the left is attained by any Ã such that Ã = B D′V−1 for some nonsingular B. The quadratic-form version of a GMM estimator typically satisfies this restriction when WN is a consistent estimator of V−1; this follows from the first-order conditions of the minimization problem.
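The bound can be checked numerically. The sketch below takes an invented Jacobian D and covariance V, builds one admissible selection matrix with AD = I, and verifies that its asymptotic covariance exceeds (D′V−1D)−1 in the positive semidefinite sense.

```python
import numpy as np

# Invented ingredients: D (r x k) Jacobian and V (r x r) long-run covariance.
D = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [0.2, 0.3]])
V = np.diag([1.0, 2.0, 4.0])

# Efficiency bound on the asymptotic covariance of sqrt(N)(b_N - beta_0).
bound = np.linalg.inv(D.T @ np.linalg.inv(V) @ D)

# One admissible selection with A @ D = I (a least-squares selection, not efficient).
A = np.linalg.solve(D.T @ D, D.T)

# Gauss-Markov-style comparison: A V A' - bound is positive semidefinite.
gap = A @ V @ A.T - bound
```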
To explore further the implications of this choice, factor the inverse covariance matrix as V−1 = Λ′Λ and form Δ = ΛD. Then

(D′V−1D)−1 = (Δ′Δ)−1.
The matrices Δ(Δ′Δ)−1Δ′ and I − Δ(Δ′Δ)−1Δ′ are each idempotent, and

Λ √N gN(β0) = [I − Δ(Δ′Δ)−1Δ′] Λ √N gN(β0) + Δ(Δ′Δ)−1Δ′ Λ √N gN(β0).
The first term on the right-hand side is an approximation for √N Λ gN(bN), and the sum of the two terms is √N Λ gN(β0). Thus we may decompose the quadratic form as

N gN(β0)′ V−1 gN(β0) ≈ N gN(bN)′ V−1 gN(bN) + N gN(β0)′ Λ′ Δ(Δ′Δ)−1Δ′ Λ gN(β0),   (8)

where the two terms on the right-hand side are asymptotically distributed as independent chi-squares. The first has r − k degrees of freedom and the second has k degrees of freedom.
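The first term is the familiar statistic for testing the overidentifying restrictions: evaluate the quadratic form at the estimate and compare it with a chi-square with r − k degrees of freedom. The post-estimation numbers below are invented placeholders for illustration.

```python
import numpy as np
from scipy.stats import chi2

# Invented post-estimation quantities for r = 3 moment conditions, k = 2 parameters.
N = 500
g_at_bN = np.array([0.01, -0.02, 0.015])   # sample moments evaluated at the estimate
VN = np.diag([1.0, 1.5, 2.0])              # consistent long-run covariance estimate

J = N * g_at_bN @ np.linalg.inv(VN) @ g_at_bN   # overidentification statistic
p_value = chi2.sf(J, df=3 - 2)                  # r - k degrees of freedom
```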

3 Implementation using the objective function curvature

While the formulas just produced can be applied directly, using consistent estimators of V and D in conjunction with the relevant normal distributions, it is also revealing to look at the curvature of the GMM objective function based on a quadratic form. Approximations (5) and (6) give guidance on how to do this.
For a parameter vector β, let VN(β) denote an estimator of the long-run covariance matrix V. Given an initial consistent estimator bN, suppose that VN(bN) is a consistent estimator of V and that

DN = ∂gN(bN)/∂β′ → D.

Then use of the selection matrix AN = DN′ [VN(bN)]−1 attains the efficiency bound for GMM estimators. This is the so-called two-step approach to GMM estimation. Repeating this procedure, we obtain the so-called iterative estimator. (There is no general argument that repeated iteration will converge.) In the remainder of this section we focus on a third approach, resulting in what we call the continuous-updating estimator. This is obtained by solving:
min_{β∈P} LN(β),  where LN(β) = N gN(β)′ [VN(β)]−1 gN(β).

Let bN denote the minimizer. Here the weighting matrix varies with β.
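The two-step and continuous-updating recipes can be sketched side by side on simulated data. Below, a scalar mean is estimated from two moment conditions; the data and moments are invented for illustration, and the covariance estimator uses a simple iid (centered) variant.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=3000)        # true parameter: mu = 0
N = len(x)

def f(x, mu):
    # r = 2 moment conditions for k = 1 parameter (unit variance treated as known)
    return np.column_stack([x - mu, x**2 - mu**2 - 1.0])

def objective(b, W):
    g = f(x, b[0]).mean(axis=0)
    return g @ W @ g

# Two-step: an arbitrary weighting matrix first, then W = [V_N(b_N)]^{-1}.
b1 = minimize(objective, [0.5], args=(np.eye(2),), method="Nelder-Mead").x[0]
VN = np.cov(f(x, b1).T, bias=True)         # estimate V at the first-step estimate
b2 = minimize(objective, [b1], args=(np.linalg.inv(VN),), method="Nelder-Mead").x[0]

# Continuous updating: the weighting matrix is recomputed at every candidate beta.
def LN(b):
    F = f(x, b[0])
    g = F.mean(axis=0)
    return N * g @ np.linalg.solve(np.cov(F.T, bias=True), g)

b_cue = minimize(LN, [b1], method="Nelder-Mead").x[0]
```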
Consider three alternative methods of inference that look at the global properties of the GMM objective LN(β):
  • (a) Construct the set {β ∈ P : LN(β) ≤ C}, where C is a critical value from the χ²(r) distribution.
  • (b) Construct the set {β ∈ P : LN(β) − LN(bN) ≤ C}, where C is a critical value from the χ²(k) distribution.
  • (c) Choose a prior π. Mechanically, treat −(1/2) LN(β) as a log-likelihood and compute the quasi-posterior

exp[−(1/2) LN(β)] π(β) / ∫ exp[−(1/2) LN(β̃)] dπ(β̃).
Method (a) is based on the left-hand side of (8). It was suggested and studied in Hansen, Heaton and Luttmer (1995) and Stock and Wright (2000). As emphasized by Stock and Wright, it avoids using a local identification condition (a condition that the matrix D have full column rank). On the other hand, it combines evidence about the parameter as reflected by the curvature of the objective with overall evidence about the model. A misspecified model will be reflected as an empty confidence interval.
Method (b) is based on the second term on the right-hand side of (8). By translating the objective function, evidence against the model is netted out. Of course it remains important to consider such evidence, because parameter inference may be hard to interpret for a misspecified model. The advantage of (b) is that the degrees of freedom of the chi-square distribution are reduced from r to k. Extensions of this approach to accommodate nuisance parameters were used by Hansen and Singleton (1996) and Hansen, Heaton and Luttmer (1995). The decomposition on the right-hand side of (8) presumes that the parameter is identified locally in the sense that D has full column rank, guaranteeing that D′V−1D is nonsingular. Kleibergen (2005) constructs an alternative decomposition based on a weaker notion of identification that can be used in making statistical inferences.
Method (c) was suggested by Chernozhukov and Hong (2003). It requires an integrability condition which will be satisfied by specifying a uniform distribution π over a compact parameter space. The resulting histograms can be sensitive to the choice of this set or more generally to the choice of π. All three methods explore the global shape of the objective function when making inferences. (The large sample justification remains local, however.)
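Methods (a) and (b) can be sketched by scanning the objective over a grid. The model, data and grid below are invented for illustration (one parameter, two moment conditions, iid covariance estimate).

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=400)          # true parameter: mu = 1
N = len(x)

def f(x, mu):
    # r = 2 moment conditions for k = 1 parameter (unit variance treated as known)
    return np.column_stack([x - mu, x**2 - mu**2 - 1.0])

def LN(mu):
    F = f(x, mu)
    g = F.mean(axis=0)
    Vmu = np.cov(F.T, bias=True)            # continuous updating: V re-estimated at mu
    return N * g @ np.linalg.solve(Vmu, g)

grid = np.linspace(0.0, 2.0, 401)
vals = np.array([LN(m) for m in grid])
b_hat = grid[vals.argmin()]

conf_a = grid[vals <= chi2.ppf(0.95, df=2)]               # method (a): chi-square(r)
conf_b = grid[vals - vals.min() <= chi2.ppf(0.95, df=1)]  # method (b): chi-square(k)
```

Method (a)'s set can be empty for a badly misspecified model, while method (b)'s set always contains the minimizer.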

4 Backing off from efficiency

In what follows we give two types of applications that are not based on efficient GMM estimation.
4.1 Calibration-verification
An efficient GMM estimator selects the best linear combination of a set of moment restrictions. Implicitly, a test of the over-identifying moment conditions examines whatever moment conditions are not used in estimation, which complicates the interpretation of the resulting outcome. Suppose instead there is one set of moment conditions in which we have more confidence and which we are willing to impose for the purposes of calibration or estimation. The remaining set of moment conditions is used for the purposes of verification or testing. The decision to use only a subset of the available moment conditions for estimation implies a corresponding loss in efficiency. See Christiano and Eichenbaum (1992) and Hansen and Heckman (1996) for a discussion of such methods for testing macroeconomic models.
To consider this estimation problem formally, partition the function f as

f′ = ( f[1]′, f[2]′ ),

where f[1] has r1 coordinates and f[2] has r − r1 coordinates. Suppose that r1 ≥ k and that β is estimated using an A matrix of the form

A = [ A1  0 ],

where A1 has dimension k × r1, and hence identification is based only on

E[f[1](xt, β0)] = 0.
This is the so-called calibration step. Let bN be the resulting estimator.
To verify or test the model we check whether gN[2](bN) is close to zero, as predicted by the moment implication

E[f[2](xt, β0)] = 0.
Partition the matrix D of expected partial derivatives conformably as

D′ = ( D1′, D2′ ),

where D1 is r1 × k and D2 is (r − r1) × k. Here we use limit approximation (6) to conclude that

√N gN[2](bN) ≈ √N gN[2](β0) − D2 (A1 D1)−1 A1 √N gN[1](β0)

(with A1 the k × r1 block of the selection matrix used in estimation),
which has a limiting normal distribution. A chi-square test can be constructed from a corresponding quadratic form of r − r1 asymptotically independent standard normally distributed random variables. (When r1 exceeds k it is possible to improve the asymptotic power by exploiting the long-run covariation between f[2](xt, β0) and the linear combinations of f[1](xt, β0) not used in estimation. This can be seen formally by introducing a new parameter γ0 = E[f[2](xt, β0)] and using the GMM formulas for efficient estimation of β0 and γ0.)
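A scalar sketch of this calibration-verification logic: calibrate the mean of simulated normal data from the first moment, then verify the implied second-moment condition, adjusting the variance of the test statistic for first-stage estimation error. The data and moments are invented, and the iid variance formula is used.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.normal(1.0, 1.0, size=1000)         # true mean 1, true variance 1
N = len(x)

# Calibration: estimate mu from E[x - mu] = 0 (here r1 = k = 1).
mu_hat = x.mean()

# Verification: test the extra condition E[x^2 - mu^2 - 1] = 0 at the estimate.
g2 = np.mean(x**2 - mu_hat**2 - 1.0)

# Adjust for first-stage error: with D1 = -1 and D2 = -2*mu, approximation (6)
# says sqrt(N) g2(b_N) behaves like the average of h_t = f2_t - 2*mu*f1_t.
h = (x**2 - mu_hat**2 - 1.0) - 2.0 * mu_hat * (x - mu_hat)
stat = N * g2**2 / h.var()                  # iid case: plain variance of h
p = chi2.sf(stat, df=1)                     # r - r1 = 1 degree of freedom
```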
4.2 Sequential estimation
Sequential estimation methods have a variety of econometric applications. For models of sample selection see Heckman (1976), and for related methods with generated regressors see Pagan (1984). For testing asset pricing models, see Cochrane (2001, chs 12 and 13).
To formulate this problem in a GMM setting, partition the parameter vector as

β′ = ( β[1]′, β[2]′ ),

where β[1] has k1 coordinates. Partition the function f as

f′ = ( f[1]′, f[2]′ ),

where f[1] has r1 coordinates and f[2] has r − r1 coordinates. Notice that the first coordinate block depends only on the first component of the parameter vector. Thus the matrix D is block lower triangular:

D = [ D11   0
      D21  D22 ].
A sequential estimation approach exploits the triangular structure of the moment conditions, as we now describe. The parameter β0[1] is estimable from the first partition of moment conditions. Given such an estimator, bN[1], β0[2] is estimable from the second partition of moment conditions. Estimation error in the first stage alters the accuracy of the second-stage estimation, as we now illustrate.
Assume now that r1 ≥ k1, and consider a selection matrix that is block diagonal:

A = [ A11   0
       0   A22 ],

where A11 has dimension k1 × r1 and A22 has dimension (k − k1) × (r − r1). It is now possible to estimate β0[1] using the equation system

A11 gN[1](β[1]) = 0,
or a method that is asymptotically equivalent to this. Let bN[1] denote the solution. This initial estimation may be done for simplicity, or because these moment conditions are embraced with more confidence. Given this estimate of β0[1], we seek an estimator bN[2] of β0[2] by solving

A22 gN[2](bN[1], β[2]) = 0.
To proceed, we use this partitioning and apply (5) to obtain the limiting distribution of the estimator bN[2]. Straightforward matrix calculations yield

√N (bN[2] − β0[2]) ≈ −(A22 D22)−1 A22 [ √N gN[2](β0) − D21 (A11 D11)−1 A11 √N gN[1](β0) ].   (9)

This formula captures explicitly the impact of the initial estimation of β0[1] on the subsequent estimation of β0[2]. When D21 is zero, the adjustment is unnecessary.
Consider next a (second-best) efficient choice of the selection matrix A22. Formula (9) looks just like formula (5), with A22 replacing A, D22 replacing D, and a particular linear combination of gN(β0) replacing gN(β0). The matrix used in this linear combination 'corrects' for the estimation error associated with the use of an estimator bN[1] instead of the unknown true value β0[1]. By imitating our previous construction of an asymptotically efficient estimator, we construct the (constrained) efficient choice of A22 given A11:

A22 = B22 D22′ [V[2]]−1
for some nonsingular matrix B22, where V[2] is the long-run covariance matrix of the adjusted moment conditions defined below. An efficient estimator can be implemented in the second stage by solving

min_{β[2]} N gN[2](bN[1], β[2])′ [VN[2]]−1 gN[2](bN[1], β[2])

for VN[2] given by a consistent estimator of the long-run covariance matrix V[2] of

f[2](xt, β0) − D21 (A11 D11)−1 A11 f[1](xt, β0),
or by some other method that selects (at least asymptotically) the same linear combinations of moment conditions to use in estimation. Thus we have a method that adjusts for the initial estimation of β0[1] while making efficient use of the moment conditions E[f[2](xt, β)] = 0.
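A minimal two-stage sketch on invented data: the mean is estimated first and then plugged into the variance moment condition. In this particular example D21 = E[∂f[2]/∂β[1]] = −2 E(x − μ) = 0 at the truth, so by formula (9) no first-stage adjustment is needed.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.5, 2.0, size=5000)         # true mu = 0.5, sigma^2 = 4

# Stage 1: estimate beta[1] = mu from f1(x, mu) = x - mu.
b1 = x.mean()

# Stage 2: estimate beta[2] = sigma^2 from f2(x, mu, s2) = (x - mu)^2 - s2,
# plugging in the first-stage estimate b1.
b2 = np.mean((x - b1)**2)

# Here D21 = E[d f2 / d mu] = -2 E[x - mu] = 0, so the first-stage estimation
# error does not inflate the asymptotic variance of b2 (formula (9)).
```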
As an aside, notice the following. Given an estimate bN[1], the criterion-based methods of statistical inference described in Section 3 can be adapted to making inferences in this second stage in a straightforward manner.

5 Conditional moment restrictions

The bound (7) presumes a finite number of moment conditions and characterizes how to use those conditions efficiently. If instead we start from the conditional moment restriction

E[f(xt, β0) | Ft−ℓ] = 0,

then in fact there are many moment conditions at our disposal: functions of variables in the conditioning information set Ft−ℓ can be used to multiply the coordinates of f and thereby extend the number of moment conditions. By allowing for these extra conditions, we can improve upon the asymptotic efficiency bound for GMM estimation. Analogous conditional moment restrictions arise in cross-sectional settings.
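The idea can be sketched with a conditional regression restriction: E[y − βx | x] = 0 implies E[z(x)(y − βx)] = 0 for any function z of x, so each instrument adds a usable moment condition. The data and instruments below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 2000
x = rng.standard_normal(T)
y = 1.5 * x + rng.standard_normal(T)        # E[y - beta*x | x] = 0 at beta = 1.5

# One conditional restriction generates many unconditional moments:
# E[z(x) * (y - beta*x)] = 0 for any function z of the conditioning variable.
Z = np.column_stack([np.ones(T), x, x**2])  # three instruments from one restriction
g = lambda b: (Z * (y - b * x)[:, None]).mean(axis=0)

# Using the single instrument z = x gives the just-identified estimator:
beta_iv = (x @ y) / (x @ x)
```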
For characterizations and implementations appropriate for cross-sectional data, see Chamberlain (1986) and Newey (1993); for characterizations and implementations in time series settings, see Hansen (1985; 1993) and West (2001). The characterizations are conceptually interesting, but reliable implementation is more challenging. A related GMM estimation problem is posed and studied by Carrasco and Florens (2000), in which there is a pre-specified continuum of moment conditions available for estimation.

6 Conclusion

GMM methods of estimation and inference are adaptable to a wide array of problems in economics. They are complementary to maximum likelihood methods and their Bayesian counterparts. Their large sample properties are easy to characterize. While their computational simplicity is sometimes a virtue, perhaps their most compelling use is in the estimation of partially specified models or of misspecified dynamic models designed to match a limited array of empirical targets.

I greatly appreciate comments from Lionel Melin, Monika Piazzesi, Grace Tsiang and Francisco Vazquez-Grande. This material is based upon work supported by the National Science Foundation under Award Number SES0519372.

Amemiya, T. 1974. The nonlinear two-stage least-squares estimator. Journal of Econometrics 2, 105–10.

Arellano, M. 2003. Panel Data Econometrics. New York: Oxford University Press.

Bontemps, C. and Meddahi, N. 2005. Testing normality: a GMM approach. Journal of Econometrics 124, 149–86.

Carrasco, M. and Florens, J.P. 2000. Generalization of GMM to a continuum of moment conditions. Econometric Theory 20, 797–834.

Chamberlain, G. 1986. Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics 34, 305–34.

Chernozhukov, V. and Hong, H. 2003. An MCMC approach to classical estimation. Journal of Econometrics 115, 293–346.

Christiano, L.J. and Eichenbaum, M. 1992. Current real business cycle theories and aggregate labor market fluctuations. American Economic Review 82, 430–50.

Cochrane, J. 2001. Asset Pricing. Princeton: Princeton University Press.

Cumby, R.E., Huizinga, J. and Obstfeld, M. 1983. Two-step two-stage least squares estimation in models with rational expectations. Journal of Econometrics 21, 333–5.

Eichenbaum, M.S., Hansen, L.P. and Singleton, K.J. 1988. A time series analysis of representative agent models of consumption and leisure choice under uncertainty. Quarterly Journal of Economics 103, 51–78.

Ghysels, E. and Hall, A. 2002. Editors' Introduction to JBES twentieth anniversary issue on generalized method of moments estimation. Journal of Business and Economic Statistics 20, 441.

Gordin, M.I. 1969. The central limit theorem for stationary processes. Soviet Mathematics Doklady 10, 1174–6.

Hall, A.R. 2005. Generalized Method of Moments. New York: Oxford University Press.

Hall, P. and Heyde, C.C. 1980. Martingale Limit Theory and Its Application. Boston: Academic Press.

Hansen, L.P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–54.

Hansen, L.P. 1985. A method for calculating bounds on asymptotic covariance matrices of generalized method of moments estimators. Journal of Econometrics 30, 203–38.

Hansen, L.P. 1993. Semiparametric efficiency bounds for linear time-series models. In Models, Methods and Applications of Econometrics: Essays in Honor of A.R. Bergstrom, ed. P.C.B. Phillips. Cambridge, MA: Blackwell.

Hansen, L.P. 2001. Method of moments. In International Encyclopedia of the Social and Behavioral Sciences. New York: Elsevier.

Hansen, L.P., Heaton, J. and Luttmer, E. 1995. Econometric evaluation of asset pricing models. Review of Financial Studies 8, 237–74.

Hansen, L.P. and Heckman, J.J. 1996. The empirical foundations of calibration. Journal of Economic Perspectives 10(1), 87–104.

Hansen, L.P. and Jagannathan, R. 1997. Assessing specification errors in stochastic discount factor models. Journal of Finance 52, 557–90.

Hansen, L.P. and Singleton, K.J. 1982. Generalized instrumental variables estimation of nonlinear rational expectations models. Econometrica 50, 1269–86.

Hansen, L.P. and Singleton, K.J. 1996. Efficient estimation of linear asset pricing models with moving average errors. Journal of Business and Economic Statistics 14, 53–68.

Hansen, L.P., Heaton, J.C., Lee, J. and Roussanov, N. 2007. Intertemporal substitution and risk aversion. In Handbook of Econometrics, vol. 6A, ed. J. Heckman and E. Leamer. Amsterdam: North-Holland.

Hayashi, F. and Sims, C. 1983. Nearly efficient estimation of time-series models with predetermined, but not exogenous, instruments. Econometrica 51, 783–98.

Heckman, J.J. 1976. The common structure of statistical methods of truncation, sample selection, and limited dependent variables and a simple estimator of such models. Annals of Economic and Social Measurement 5, 475–92.

Kleibergen, F. 2005. Testing parameters in GMM without assuming that they are identified. Econometrica 73, 1103–23.

Newey, W. 1993. Efficient estimation of models with conditional moment restrictions. In Handbook of Statistics, vol. 11, ed. G.S. Maddala, C.R. Rao, and H.D. Vinod. Amsterdam: North-Holland.

Newey, W. and McFadden, D. 1994. Large sample estimation and hypothesis testing. In Handbook of Econometrics, vol. 4, ed. R. Engle and D. McFadden. Amsterdam: North-Holland.

Newey, W.K. and West, K.D. 1987a. Hypothesis testing with efficient method of moments estimation. International Economic Review 28, 777–87.

Newey, W.K. and West, K.D. 1987b. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–8.

Ogaki, M. 1993. Generalized method of moments: econometric applications. In Handbook of Statistics, vol. 11, ed. G.S. Maddala, C.R. Rao and H.D. Vinod. Amsterdam: North-Holland.

Pagan, A.R. 1984. Econometric issues in the analysis of models with generated regressors. International Economic Review 25, 221–47.

Sargan, J.D. 1958. The estimation of economic relationships using instrumental variables. Econometrica 26, 393–415.

Sargan, J.D. 1959. The estimation of relationships with autocorrelated residuals by the use of instrumental variables. Journal of the Royal Statistical Society: Series B 21, 91–105.

Singleton, K.J. 2006. Empirical Dynamic Asset Pricing: Model Specification and Econometric Assessment. Princeton: Princeton University Press.

Stock, J.H. and Wright, J.H. 2000. GMM with weak identification. Econometrica 68, 1055–96.

West, K.D. 2001. On optimal instrumental variables estimation of stationary time series models. International Economic Review 42, 1043–50.


How to cite this article

Hansen, Lars Peter. "generalized method of moments estimation." The New Palgrave Dictionary of Economics, Second Edition. Eds. Steven N. Durlauf and Lawrence E. Blume. Palgrave Macmillan, 2008. doi:10.1057/9780230226203.0626
