CAES Lezione 3 - Elementi di Teoria della Stima
Transcript
Outline

1. Introduction
   - Basic Definitions
   - Classical and Bayesian estimation
2. Statistical Models
   - AR Model
   - MA Model
   - ARMA Model

Prof. Michele Scarpiniti
Dip. INFOCOM - "Sapienza" Università di Roma
http://ispac.ing.uniroma1.it/scarpiniti/index.htm
[email protected]
Rome, 10 March 2010

Introduction to Estimation Theory

In several applications the statistical properties (cdf, pdf, acf, PSD, etc.) of a stochastic process are not known, so it is very important to estimate such properties from measured data. Estimation theory deals with this kind of problem.

Given a SP $x[n]$, let $\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^T$ be a sequence of $N$ values of $x[n]$, and suppose we want to estimate a statistical parameter $\theta \in \Theta$, where $\Theta$ is the parameter space, using a function $h(\cdot)$, called the estimator, so that $\hat{\theta} = h(\mathbf{x})$.

In general the problem is the estimation of a set of $L$ unknown parameters $\boldsymbol{\theta} = [\theta[0], \theta[1], \ldots, \theta[L-1]]^T$ from a series of $N$ observations $\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^T$, by means of an estimation function or estimator $h(\cdot)$, such that $\hat{\boldsymbol{\theta}} = h(\mathbf{x})$. Summarizing:

1. $\boldsymbol{\theta} \in \Theta$ is the vector of the parameters that we want to estimate. It can be a random variable or an unknown deterministic constant;
2. $h(\mathbf{x})$ is the estimator (the rule that estimates the parameters from the observations);
3. $\hat{\boldsymbol{\theta}} = h(\mathbf{x})$ is the result of the estimation. It is always a RV.

The estimator is a RV that can be described by the sampling distribution $f_{\mathbf{x};\theta}(\mathbf{x};\theta)$. The sampling distribution gives information on the goodness of an estimator: a good estimator is concentrated around the true value of the parameter and has minimum variance.

If $\theta$ is a deterministic parameter, we have the classical estimation theory. In this case $\theta$ represents a parametric dependence of $f_{\mathbf{x};\theta}(\mathbf{x};\theta)$ on the measured data $\mathbf{x}$.

If $\theta$ is a stochastic parameter, characterized by its own pdf $f_{\theta}(\theta)$, which collects all the a priori knowledge and is known as the a priori pdf, we have the Bayesian estimation theory. The joint distribution can then be factored, using Bayes' theorem, as
$$f_{\mathbf{x},\theta}(\mathbf{x},\theta) = f_{\mathbf{x}|\theta}(\mathbf{x}|\theta)\, f_{\theta}(\theta) = f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, f_{\mathbf{x}}(\mathbf{x})$$
where $f_{\mathbf{x}|\theta}(\mathbf{x}|\theta)$ is the conditional pdf, which represents the knowledge taken from the data $\mathbf{x}$ given $\theta$.

Unbiased estimator

The sampling distribution $f_{\mathbf{x};\theta}(\mathbf{x};\theta)$ is not always known: in this case we can use the expectation $E\{\hat{\theta}\}$, the variance $\mathrm{var}\{\hat{\theta}\} = \sigma_{\hat{\theta}}^2$ and the mean square error $\mathrm{mse}\{\hat{\theta}\}$ as indirect measures of the goodness of the estimator.

An estimator is said to be unbiased if $E\{\hat{\theta}\} = \theta$; the difference
$$b(\hat{\theta}) \triangleq E\{\hat{\theta}\} - \theta$$
is defined as the bias.

Remark: the bias can be due to a systematic error, e.g. a measurement error. An unbiased estimator is not necessarily a "good" estimator.
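To make the bias definition concrete, here is a minimal Monte Carlo sketch in Python (not part of the original slides; the Gaussian model and all numeric values are illustrative assumptions). It measures the empirical bias of the two common sample-variance estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0       # variance of the underlying Gaussian process (assumed)
N = 10               # sample length
trials = 100_000     # Monte Carlo repetitions

# 'trials' independent samples of length N, one per row
X = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))
m = X.mean(axis=1, keepdims=True)

biased = np.sum((X - m) ** 2, axis=1) / N          # divides by N
unbiased = np.sum((X - m) ** 2, axis=1) / (N - 1)  # divides by N-1

print(f"E[biased]   ~ {biased.mean():.3f}  (true variance {true_var})")
print(f"E[unbiased] ~ {unbiased.mean():.3f}")
# The 1/N estimator shows a bias of about -true_var/N = -0.4,
# while the 1/(N-1) estimator is centered on the true value.
```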
Variance and MSE of an estimator

The variance of an estimator is defined as
$$\mathrm{var}(\hat{\theta}) \triangleq \sigma_{\hat{\theta}}^2 = E\{|\hat{\theta} - E\{\hat{\theta}\}|^2\}$$
which measures the dispersion of the pdf of $\hat{\theta}$ around its mean value.

The mean square error (MSE) of an estimator is defined as
$$\mathrm{mse}(\hat{\theta}) = E\{|\hat{\theta} - \theta|^2\}$$
where $\theta$ is the true value of the parameter. It measures the mean quadratic dispersion of the estimator from its true value. It can be decomposed as
$$\mathrm{mse}(\hat{\theta}) = \sigma_{\hat{\theta}}^2 + |b(\hat{\theta})|^2$$

Proof. Adding and subtracting $E\{\hat{\theta}\}$,
$$E\{|\hat{\theta} - \theta|^2\} = E\{|[\hat{\theta} - E\{\hat{\theta}\}] + [E\{\hat{\theta}\} - \theta]|^2\} = \underbrace{E\{|\hat{\theta} - E\{\hat{\theta}\}|^2\}}_{\sigma_{\hat{\theta}}^2} + \underbrace{|E\{\hat{\theta}\} - \theta|^2}_{|b(\hat{\theta})|^2}$$
where the cross term vanishes because $E\{\hat{\theta} - E\{\hat{\theta}\}\} = 0$.

Minimum variance unbiased (MVU) estimator

Ideally we would like an estimator with zero MSE. Unfortunately no such estimator exists: the estimator that minimizes the MSE generally depends on the unknown parameter itself, and is therefore not realizable. In this sense the best estimator is not the one with minimum MSE, but the one with zero bias and minimum variance, so that
$$\mathrm{mse}(\hat{\theta}) = \sigma_{\hat{\theta}}^2$$
This estimator is called minimum variance unbiased or MVU. Note that an MVU estimator does not always exist.

Remark: a good estimator should be unbiased and have minimum variance. These two properties are often in conflict: reducing the variance may increase the bias. This situation is known as the bias-variance trade-off.

Consistent estimator

An estimator is said to be weakly consistent if, increasing the length $N$ of the sample, we have
$$\lim_{N \to \infty} p\{|h(\mathbf{x}) - \theta| > \varepsilon\} = 0, \quad \forall \varepsilon > 0$$

An estimator is said to be strongly consistent if, increasing the length $N$ of the sample, we have
$$\lim_{N \to \infty} p\{h(\mathbf{x}) = \theta\} = 1$$

A sufficient condition for weak consistency is that, for large $N$,
$$\lim_{N \to \infty} E\{h(\mathbf{x})\} = \theta, \qquad \lim_{N \to \infty} \mathrm{var}\{h(\mathbf{x})\} = 0.$$
In this way the sampling distribution tends to an impulse around the true value of the parameter.

Confidence interval

A confidence interval (CI) is a particular kind of interval estimate of a parameter. Instead of estimating the parameter by a single value, one gives an interval that is likely to include the parameter. Moreover, as the data length $N$ increases, the sampling distribution tends to a Gaussian by the central limit theorem. Once the sampling distribution is known, it is possible to evaluate the probability of an interval $(-\Delta, \Delta)$. This interval, the confidence interval, indicates that the estimator $\hat{\theta}$ lies in the interval $(-\Delta, \Delta)$ around $\theta$ with probability $1 - \beta$, i.e. with confidence $(1 - \beta) \cdot 100\%$.
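The following minimal Python sketch (an illustration added here, not from the original slides) computes such a confidence interval for the sample mean, assuming Gaussian data and using the central-limit approximation quoted above; all numeric values are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N = 200
x = rng.normal(loc=3.0, scale=2.0, size=N)   # data with true mean 3.0

beta = 0.05                                  # 1 - beta = 95% confidence
m = x.mean()
# Standard error of the sample mean; the CLT justifies the Gaussian quantile.
se = x.std(ddof=1) / np.sqrt(N)
delta = stats.norm.ppf(1 - beta / 2) * se    # half-width Delta of the interval

print(f"mean estimate {m:.3f}, 95% CI ({m - delta:.3f}, {m + delta:.3f})")
# Over many repetitions, about 95% of such intervals contain the true mean.
```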
Classical and Bayesian estimation

Now we can analyze several methods for the estimation of the unknown parameters. As we have said, there exist two approaches to estimation theory:

Classical estimation theory, where the parameter $\theta$ is deterministic. In particular we will introduce:
1. Maximum likelihood or ML estimation.

Bayesian estimation theory, where the parameter $\theta$ is a random variable. In particular we will introduce:
1. Maximum a posteriori or MAP estimation;
2. Minimum mean square error or MMSE estimation;
3. Minimum absolute error or MAE estimation.

Maximum likelihood (ML) estimation

The maximum likelihood (ML) estimation consists in the determination of $\theta_{\mathrm{ML}}$ through the maximization of the sampling distribution $f_{\mathbf{x};\theta}(\mathbf{x};\theta)$, here called the likelihood function $L_{\theta}$:
$$L_{\theta} = f_{\mathbf{x};\theta}(\mathbf{x};\theta)$$
Note that if $f_{\mathbf{x};\theta}(\mathbf{x};\theta_1) > f_{\mathbf{x};\theta}(\mathbf{x};\theta_2)$ then $\theta_1$ is "more plausible" than $\theta_2$. In this sense the ML paradigm selects the estimate $\theta_{\mathrm{ML}}$ that is most plausible given the observations $\mathbf{x}$.

Usually one considers the natural logarithm of the likelihood, $\ln L_{\theta} = \ln f_{\mathbf{x};\theta}(\mathbf{x};\theta)$. The estimate is then
$$\theta_{\mathrm{ML}} = \arg\max_{\theta \in \Theta} \{\ln L_{\theta}\}$$
that is, $\theta_{\mathrm{ML}}$ solves the equation
$$\frac{\partial \ln f_{\mathbf{x};\theta}(\mathbf{x};\theta)}{\partial \theta} = 0$$

Maximum a posteriori (MAP) estimation

In the maximum a posteriori (MAP) estimation the parameter $\theta$ is characterized by an a priori pdf $f_{\theta}(\theta)$. The knowledge given by measures on the data modifies this probability, conditioning it on the data $\mathbf{x}$ itself: $f_{\theta|\mathbf{x}}(\theta|\mathbf{x})$ is known as the a posteriori pdf of $\theta$ given the measures $\mathbf{x}$.

The MAP estimation consists in the determination of the maximum of the a posteriori pdf $f_{\theta|\mathbf{x}}(\theta|\mathbf{x})$. Usually one considers the natural logarithm of this pdf, so that $\theta_{\mathrm{MAP}}$ solves
$$\frac{\partial \ln f_{\theta|\mathbf{x}}(\theta|\mathbf{x})}{\partial \theta} = 0$$
Now, by Bayes' theorem,
$$f_{\theta|\mathbf{x}}(\theta|\mathbf{x}) = \frac{f_{\mathbf{x}|\theta}(\mathbf{x}|\theta)\, f_{\theta}(\theta)}{f_{\mathbf{x}}(\mathbf{x})}$$
and because $f_{\mathbf{x}}(\mathbf{x})$ does not depend on $\theta$, $\theta_{\mathrm{MAP}}$ solves
$$\frac{\partial}{\partial \theta}\left(\ln f_{\mathbf{x}|\theta}(\mathbf{x}|\theta) + \ln f_{\theta}(\theta)\right) = 0$$

Minimum mean square error (MMSE) estimation

In the minimum mean square error (MMSE) estimation the target is the minimization of the MSE
$$\mathrm{mse}(\hat{\theta}) = E\{|h(\mathbf{x}) - \theta|^2\} = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} |h(\mathbf{x}) - \theta|^2 f_{\mathbf{x},\theta}(\mathbf{x},\theta)\, d\theta\, d\mathbf{x}$$
Remembering that $f_{\mathbf{x},\theta}(\mathbf{x},\theta) = f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, f_{\mathbf{x}}(\mathbf{x})$, we obtain the following function to be minimized
$$\mathrm{mse}(\hat{\theta}) = \int_{-\infty}^{\infty} f_{\mathbf{x}}(\mathbf{x}) \left[\int_{-\infty}^{\infty} |h(\mathbf{x}) - \theta|^2 f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta\right] d\mathbf{x}$$
Because both integrands are positive and the external integral does not depend on $h(\mathbf{x})$, we can minimize only the internal one:
$$\int_{-\infty}^{\infty} |h(\mathbf{x}) - \theta|^2 f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta$$
Differentiating with respect to $h(\mathbf{x})$ and setting the result to zero,
$$2\int_{-\infty}^{\infty} \left(h(\mathbf{x}) - \theta\right) f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta = 0$$
that is
$$h(\mathbf{x}) \int_{-\infty}^{\infty} f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta = \int_{-\infty}^{\infty} \theta\, f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta$$
and, because $\int_{-\infty}^{\infty} f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta = 1$, we finally obtain
$$\theta_{\mathrm{MMSE}} \triangleq h(\mathbf{x}) = \int_{-\infty}^{\infty} \theta\, f_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta = E\{\theta|\mathbf{x}\}$$
Hence the MMSE estimate is the expected value of $\theta$ conditioned on the data $\mathbf{x}$. It is usually a nonlinear function of the data, with the exception of Gaussian data: in that case $\theta_{\mathrm{MMSE}}$ is a linear function of $\mathbf{x}$.
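The following minimal sketch checks the last two statements numerically; it assumes a scalar Gaussian prior $\theta \sim N(\mu_0, \sigma_0^2)$ and a single observation $x = \theta + v$ with $v \sim N(0, \sigma_v^2)$, a standard textbook setup chosen here for illustration (all values are arbitrary). In this Gaussian case $E\{\theta|x\}$ has the closed, linear form used below:

```python
import numpy as np

rng = np.random.default_rng(5)
mu0, sigma0 = 2.0, 1.0     # a priori pdf of theta: N(mu0, sigma0^2)
sigma_v = 0.5              # observation noise std
trials = 200_000

theta = rng.normal(mu0, sigma0, trials)        # random parameter, one per trial
x = theta + rng.normal(0.0, sigma_v, trials)   # one noisy observation per trial

# Posterior mean for the Gaussian-Gaussian model: a linear function of x.
k = sigma0**2 / (sigma0**2 + sigma_v**2)
theta_mmse = mu0 + k * (x - mu0)
theta_ml = x                                   # ML estimate, ignoring the prior

print("MSE of MMSE estimator:", np.mean((theta_mmse - theta) ** 2))
print("MSE of ML estimator:  ", np.mean((theta_ml - theta) ** 2))
# Theoretical minimum MSE (posterior variance):
print("theoretical minimum:  ", sigma0**2 * sigma_v**2 / (sigma0**2 + sigma_v**2))
```

The MMSE estimator shrinks the observation toward the prior mean and achieves a lower MSE than the ML estimate, which here coincides with the raw observation.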
Minimum absolute error (MAE) estimation

The MSE is not the only possible cost function; other cost functions can be chosen. For example, widely used in the literature is the minimum absolute error or MAE
$$\mathrm{mae}(\hat{\theta}) = E\{|h(\mathbf{x}) - \theta|\}$$
The resulting estimate can be interpreted as the median of the a posteriori distribution. Summarizing:
- $\theta_{\mathrm{MAP}}$ corresponds to the maximum of the a posteriori distribution;
- $\theta_{\mathrm{MAE}}$ corresponds to the median of the a posteriori distribution;
- $\theta_{\mathrm{MMSE}}$ corresponds to the mean of the a posteriori distribution.

Linear minimum mean square error (MMSE) estimation

As we have remarked, the MMSE estimator is usually a nonlinear function of the data $\mathbf{x}$. If we want a linear estimator, we have to impose a linear constraint, i.e. the estimator must be a linear combination of the measures:
$$\theta^{*}_{\mathrm{MMSE}} \triangleq h(\mathbf{x}) = \sum_{i=0}^{N-1} h_i \cdot x[i]$$
where the coefficients $h_i$, called weights, can be estimated by minimizing the MSE:
$$h_{\mathrm{opt}}: \quad \frac{\partial}{\partial h_j}\, E\left\{\left|\theta - \sum_{i=0}^{N-1} h_i\, x[i]\right|^2\right\} = 0$$
Let $e = \theta - \theta^{*}_{\mathrm{MMSE}} = \theta - \sum_{i=0}^{N-1} h_i\, x[i]$; then, from $\partial E\{e^2\}/\partial h_j = 0$ for $j = 0, 1, \ldots, N-1$, we obtain
$$E\{e \cdot x[j]\} = 0$$
that is: the error $e$ is orthogonal to the data vector $\mathbf{x}$ (orthogonality principle).

The Cramér-Rao lower bound (CRLB)

The Cramér-Rao lower bound (CRLB) or information inequality expresses the minimum value of the variance that can be obtained in the estimation of the parameters $\boldsymbol{\theta}$. Given a vector of RVs and an unbiased estimator $\hat{\boldsymbol{\theta}} = h(\mathbf{x})$, characterized by the covariance matrix $\mathbf{C}_{\theta} = \mathrm{cov}(\hat{\boldsymbol{\theta}}) = E\{(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^T\}$, we can define the Fisher information matrix $\mathbf{J}$:
$$J(i,j) = -E\left\{\frac{\partial^2 \ln f_{\mathbf{x};\theta}(\mathbf{x};\boldsymbol{\theta})}{\partial \theta[i]\, \partial \theta[j]}\right\} \quad \text{for } i, j = 0, 1, \ldots, L-1$$
Then the Cramér-Rao lower bound (CRLB) is defined by the following inequality
$$\mathbf{C}_{\theta} \geq \mathbf{J}^{-1}$$
An estimator that attains this bound is called fully efficient, and it is a minimum variance unbiased (MVU) estimator too.

Often the Cramér-Rao lower bound is limited to the variances only. In this case the elements on the principal diagonal of $\mathrm{cov}(\hat{\boldsymbol{\theta}})$ have to satisfy the condition
$$\mathrm{var}(\hat{\theta}[i]) \geq \frac{1}{J(i,i)} \quad \text{for } i = 0, 1, \ldots, L-1$$
For a mono-dimensional RV we have
$$\mathrm{var}(\hat{\theta}) \geq \frac{1}{E\left\{\left(\frac{\partial \ln f_{\mathbf{x};\theta}(\mathbf{x};\theta)}{\partial \theta}\right)^2\right\}}$$
or, alternatively,
$$\mathrm{var}(\hat{\theta}) \geq \frac{1}{-E\left\{\frac{\partial^2 \ln f_{\mathbf{x};\theta}(\mathbf{x};\theta)}{\partial \theta^2}\right\}}$$

Statistical Models

Remark: an extremely powerful paradigm for the characterization of time series is to consider them as the output of an LTI filter driven by white noise.

Wold's theorem

A stationary random sequence $x[n]$ can be represented as the output of an LTI filter with impulse response $h[n]$ when the input is a white noise $\eta[n]$:
$$x[n] = \sum_{k=0}^{\infty} h[k]\, \eta[n-k]$$
Such a sequence is called a linear stochastic process or linear process. If $H(e^{j\omega})$ is the frequency response of $h[n]$, then the PSD of $x[n]$ is
$$R_{xx}(e^{j\omega}) = \left|H(e^{j\omega})\right|^2 \sigma_{\eta}^2$$
where $\sigma_{\eta}^2$ is the variance of the white noise $\eta[n]$.
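As a numerical check of Wold's representation (an added illustration; the FIR impulse response and all values below are arbitrary assumptions), the following sketch filters white noise and compares both the variance and the estimated PSD of the output with the relation $R_{xx}(e^{j\omega}) = |H(e^{j\omega})|^2 \sigma_{\eta}^2$:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(3)
sigma_eta = 1.2
eta = sigma_eta * rng.standard_normal(500_000)  # white noise eta[n]

h = np.array([1.0, 0.6, 0.3, 0.1])              # an arbitrary stable FIR h[n]
x = signal.lfilter(h, [1.0], eta)               # linear process x[n]

# Integrating R_xx over frequency gives var(x) = sigma_eta^2 * sum(h[k]^2).
print("measured var(x):  ", x.var())
print("theoretical value:", sigma_eta**2 * np.sum(h**2))

# Welch PSD estimate vs. |H|^2 * sigma_eta^2 (factor 2: welch is one-sided).
f, Pxx = signal.welch(x, fs=1.0, nperseg=4096)
_, H = signal.freqz(h, [1.0], worN=2 * np.pi * f)
ratio = Pxx[1:-1] / (2 * np.abs(H[1:-1]) ** 2 * sigma_eta**2)
print("mean PSD ratio (should be ~1):", ratio.mean())
```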
Autoregressive (AR) model

The autoregressive (AR) model of a time series is characterized by the following equation
$$x[n] = -\sum_{k=1}^{p} a[k]\, x[n-k] + \eta[n]$$
which describes an AR model of order $p$, indicated as AR($p$). The coefficients $\mathbf{a} = [a[1], a[2], \ldots, a[p]]$ are called the autoregressive parameters. The frequency response of this filter is (all-pole filter)
$$H(e^{j\omega}) = \frac{1}{1 + \sum_{k=1}^{p} a[k]\, e^{-j\omega k}}$$
Hence the PSD is
$$R_{xx}(e^{j\omega}) = \frac{\sigma_{\eta}^2}{\left|1 + \sum_{k=1}^{p} a[k]\, e^{-j\omega k}\right|^2}$$
It is possible to show that the autocorrelation function of the AR($p$) model satisfies the following equation
$$r[k] = \begin{cases} -\sum_{l=1}^{p} a[l]\, r[k-l] & k \geq 1 \\[4pt] -\sum_{l=1}^{p} a[l]\, r[l] + \sigma_{\eta}^2 & k = 0 \end{cases}$$
which can be rewritten in matrix form as
$$\begin{bmatrix} r[0] & r[1] & \cdots & r[p-1] \\ r[1] & r[0] & \cdots & r[p-2] \\ \vdots & \vdots & \ddots & \vdots \\ r[p-1] & r[p-2] & \cdots & r[0] \end{bmatrix} \begin{bmatrix} a[1] \\ a[2] \\ \vdots \\ a[p] \end{bmatrix} = -\begin{bmatrix} r[1] \\ r[2] \\ \vdots \\ r[p] \end{bmatrix}$$
In addition we have
$$\sigma_{\eta}^2 = r[0] + \sum_{k=1}^{p} a[k]\, r[k]$$
hence, if we know the acf coefficients $r[k]$ for $k = 0, 1, \ldots, p$, the AR parameters can be estimated from the previous equations, known as the Yule-Walker equations (a numerical sketch is given after the ARMA model below).

Example: given the AR process of first order
$$x[n] = -a[1]\, x[n-1] + \eta[n]$$
it is
$$r[k] = -a[1]\, r[k-1], \;\; k \geq 1 \quad \Rightarrow \quad r[k] = r[0]\,(-a[1])^{k}, \;\; k > 0$$
Then, from $\sigma_{\eta}^2 = r[0] + a[1]\, r[1]$, we obtain
$$r[k] = \frac{\sigma_{\eta}^2}{1 - a^2[1]}\, (-a[1])^{|k|}$$
hence
$$R_{xx}(e^{j\omega}) = \frac{\sigma_{\eta}^2}{\left|1 + a[1]\, e^{-j\omega}\right|^2}$$

Moving average (MA) model

The moving average (MA) model of a time series is characterized by the following equation
$$x[n] = \sum_{k=0}^{q} b[k]\, \eta[n-k]$$
which describes an MA model of order $q$, indicated as MA($q$). The coefficients $\mathbf{b} = [b[0], b[1], \ldots, b[q]]$ are called the moving average parameters. The frequency response of this filter is (all-zero filter)
$$H(e^{j\omega}) = \sum_{k=0}^{q} b[k]\, e^{-j\omega k}$$
Hence the PSD is
$$R_{xx}(e^{j\omega}) = \sigma_{\eta}^2 \left|\sum_{k=0}^{q} b[k]\, e^{-j\omega k}\right|^2$$
The autocorrelation function of the MA($q$) model is
$$r[k] = \begin{cases} \sigma_{\eta}^2 \sum_{l=0}^{q-|k|} b[l]\, b[l+|k|] & |k| \leq q \\[4pt] 0 & |k| > q \end{cases}$$

Autoregressive moving average (ARMA) model

The autoregressive moving average (ARMA) model of a time series is characterized by the following equation
$$x[n] = -\sum_{k=1}^{p} a[k]\, x[n-k] + \sum_{k=0}^{q} b[k]\, \eta[n-k]$$
which describes an ARMA model of order $(p, q)$, indicated as ARMA($p, q$), where $p$ is the degree of the denominator and $q$ the degree of the numerator of the transfer function, respectively. The PSD is
$$R_{xx}(e^{j\omega}) = \sigma_{\eta}^2 \left|H(z)\right|^2_{z=e^{j\omega}} = \sigma_{\eta}^2\, \frac{\left|b[0] + b[1]\, e^{-j\omega} + b[2]\, e^{-j2\omega} + \cdots + b[q]\, e^{-jq\omega}\right|^2}{\left|1 + a[1]\, e^{-j\omega} + a[2]\, e^{-j2\omega} + \cdots + a[p]\, e^{-jp\omega}\right|^2}$$
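The Yule-Walker equations translate directly into code. The following minimal sketch (an added illustration: the AR(2) coefficients and sample sizes are arbitrary assumptions) estimates the AR parameters and the noise variance from the sample autocorrelation of a synthetic process:

```python
import numpy as np
from scipy import signal
from scipy.linalg import solve_toeplitz

rng = np.random.default_rng(4)
# AR(2) process: x[n] = -a[1] x[n-1] - a[2] x[n-2] + eta[n]
a_true = np.array([-0.75, 0.5])
eta = rng.standard_normal(100_000)              # unit-variance white noise
x = signal.lfilter([1.0], np.concatenate(([1.0], a_true)), eta)

p = 2
# Biased sample autocorrelation r[0..p]
r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(p + 1)])

# Yule-Walker in matrix form: Toeplitz(r[0..p-1]) a = -r[1..p]
a_hat = solve_toeplitz(r[:p], -r[1:p + 1])
sigma2_hat = r[0] + np.dot(a_hat, r[1:p + 1])   # noise-variance equation

print("estimated a:       ", a_hat)             # close to a_true
print("estimated variance:", sigma2_hat)        # close to 1.0
```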
References

S.M. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall, Upper Saddle River, NJ, 1998.

D.G. Manolakis, V.K. Ingle, S.M. Kogon. Statistical and Adaptive Signal Processing. Artech House, Norwood, MA, 2005.

B. Widrow, S.D. Stearns. Adaptive Signal Processing. Prentice Hall, 1985.