==Properties== The basic properties below (and their names in bold) replicate or follow immediately from those of the [[Lebesgue integral]]. Note that the letters "a.s." stand for "[[almost surely]]", a notion central to the theory of the Lebesgue integral. Basically, one says that an inequality like <math>X \geq 0</math> is true almost surely when the probability measure assigns zero mass to the complementary event <math>\left\{ X < 0 \right\}.</math> * Non-negativity: If <math>X \geq 0</math> (a.s.), then <math>\operatorname{E}[ X] \geq 0.</math> {{anchor|Linearity}} * Linearity of expectation:<ref name=":1">{{Cite web|last=Weisstein|first=Eric W.|title=Expectation Value|url=https://mathworld.wolfram.com/ExpectationValue.html|access-date=2020-09-11|website=mathworld.wolfram.com|language=en}}</ref> The expected value operator (or ''expectation operator'') <math>\operatorname{E}[\cdot]</math> is [[linear operator|linear]] in the sense that, for any random variables <math>X</math> and <math>Y,</math> and a constant <math>a,</math> <math display="block">\begin{align} \operatorname{E}[X + Y] &= \operatorname{E}[X] + \operatorname{E}[Y], \\ \operatorname{E}[aX] &= a \operatorname{E}[X], \end{align} </math> :whenever the right-hand side is well-defined. By [[mathematical induction|induction]], this means that the expected value of the sum of any finite number of random variables is the sum of the expected values of the individual random variables, and the expected value scales linearly with a multiplicative constant.
Symbolically, for <math>N</math> random variables <math>X_{i}</math> and constants <math>a_{i} (1\leq i \leq N),</math> we have <math display="inline"> \operatorname{E}\left[\sum_{i=1}^{N}a_{i}X_{i}\right] = \sum_{i=1}^{N}a_{i}\operatorname{E}[X_{i}].</math> If we think of the set of random variables with finite expected value as forming a vector space, then the linearity of expectation implies that the expected value is a [[linear form]] on this vector space. * Monotonicity: If <math>X\leq Y</math> [[almost surely|(a.s.)]], and both <math>\operatorname{E}[X]</math> and <math>\operatorname{E}[Y]</math> exist, then <math>\operatorname{E}[X]\leq\operatorname{E}[Y].</math> {{pb}} The proof follows from linearity and the non-negativity property applied to <math>Z=Y-X,</math> since <math>Z\geq 0</math> (a.s.). * Non-degeneracy: If <math>\operatorname{E}[|X|]=0,</math> then <math>X=0</math> (a.s.). * If <math>X = Y</math> [[almost surely|(a.s.)]], then <math>\operatorname{E}[ X] = \operatorname{E}[ Y].</math> In other words, if X and Y are random variables that take different values with probability zero, then the expectation of X will equal the expectation of Y. * If <math>X=c</math> [[almost surely|(a.s.)]] for some real number {{mvar|c}}, then <math>\operatorname{E}[X] = c.</math> In particular, for a random variable <math>X</math> with well-defined expectation, <math>\operatorname{E}[\operatorname{E}[X]] = \operatorname{E}[X].</math> Since a well-defined expectation <math>\operatorname{E}[X]</math> is a single constant, taking its expectation simply returns that same constant, i.e. the original expected value.
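As an illustrative sketch (not part of the formal development), linearity can be checked by direct enumeration for two independent fair dice; the variable names and the choice of coefficients 2 and 3 are arbitrary:

```python
from fractions import Fraction
from itertools import product

# Two independent fair six-sided dice; each joint outcome has probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

# E[2X + 3Y] computed directly from the joint distribution...
lhs = sum(p * (2 * x + 3 * y) for x, y in outcomes)

# ...versus 2 E[X] + 3 E[Y] computed from the marginals.
ex = sum(Fraction(x, 6) for x in range(1, 7))  # E[X] = 7/2
ey = ex                                        # E[Y] = 7/2 by symmetry
rhs = 2 * ex + 3 * ey

assert lhs == rhs == Fraction(35, 2)           # 2*(7/2) + 3*(7/2) = 35/2
```

Note that linearity does not require independence; the same computation succeeds for any joint distribution with these marginals.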
* As a consequence of the formula {{math|{{!}}''X''{{!}} {{=}} ''X''<sup> +</sup> + ''X''<sup> −</sup>}} as discussed above, together with the [[triangle inequality]], it follows that for any random variable <math>X</math> with well-defined expectation, one has <math>|\operatorname{E}[X]| \leq \operatorname{E}|X|.</math> * If {{math|'''1'''<sub>''A''</sub>}} denotes the [[indicator function]] of an [[Event (probability theory)|event]] {{mvar|A}}, then {{math|E['''1'''<sub>''A''</sub>]}} is given by the probability of {{mvar|A}}. This is nothing but a different way of stating the expectation of a [[Bernoulli random variable]], as calculated in the table above. * Formulas in terms of CDF: If <math>F(x)</math> is the [[cumulative distribution function]] of a random variable {{mvar|X}}, then :<math display="block">\operatorname{E}[X] = \int_{-\infty}^\infty x\,dF(x),</math> :where the values on both sides are well defined or not well defined simultaneously, and the integral is taken in the sense of [[Lebesgue-Stieltjes integral|Lebesgue-Stieltjes]]. As a consequence of [[integration by parts]] as applied to this representation of {{math|E[''X'']}}, it can be proved that <math display="block"> \operatorname{E}[X] = \int_0^\infty (1-F(x))\,dx - \int^0_{-\infty} F(x)\,dx,</math> with the integrals taken in the sense of Lebesgue.{{sfnm|1a1=Feller|1y=1971|1loc=Section V.6}} As a special case, for any random variable {{mvar|X}} valued in the nonnegative integers {{math|{0, 1, 2, 3, ...}}}, one has <math display="block"> \operatorname{E}[X]=\sum _{n=0}^\infty \operatorname{P}(X>n),</math> :where {{mvar|P}} denotes the underlying probability measure. * Non-multiplicativity: In general, the expected value is not multiplicative, i.e.
<math>\operatorname{E}[XY]</math> is not necessarily equal to <math>\operatorname{E}[X]\cdot \operatorname{E}[Y].</math> If <math>X</math> and <math>Y</math> are [[independent random variables|independent]], then one can show that <math>\operatorname{E}[XY]=\operatorname{E}[X] \operatorname{E}[Y].</math> If the random variables are [[Dependent and independent variables|dependent]], then generally <math>\operatorname{E}[XY] \neq \operatorname{E}[X] \operatorname{E}[Y],</math> although in special cases of dependency the equality may hold. * [[Law of the unconscious statistician]]: The expected value of a measurable function of <math>X,</math> <math>g(X),</math> given that <math>X</math> has a probability density function <math>f(x),</math> is given by the [[inner product]] of <math>f</math> and <math>g</math>:<ref name=":1" /> <math display="block">\operatorname{E}[g(X)] = \int_{\R} g(x) f(x)\, dx .</math> This formula also holds in the multidimensional case, when <math>g</math> is a function of several random variables, and <math>f</math> is their [[Probability density function#Densities associated with multiple variables|joint density]].<ref name=":1" />{{sfnm|1a1=Papoulis|1a2=Pillai|1y=2002|1loc=Section 6-4}} === Inequalities=== [[Concentration inequalities]] control the likelihood of a random variable taking on large values. [[Markov's inequality]] is among the best-known and simplest to prove: for a ''nonnegative'' random variable {{mvar|X}} and any positive number {{mvar|a}}, it states that{{sfnm|1a1=Feller|1y=1968|1loc=Section IX.6|2a1=Feller|2y=1971|2loc=Section V.7|3a1=Papoulis|3a2=Pillai|3y=2002|3loc=Section 5-4|4a1=Ross|4y=2019|4loc=Section 2.8}} <math display="block"> \operatorname{P}(X\geq a)\leq\frac{\operatorname{E}[X]}{a}.
</math> If {{mvar|X}} is any random variable with finite expectation, then Markov's inequality may be applied to the random variable {{math|{{!}}''X''−E[''X'']{{!}}<sup>2</sup>}} to obtain [[Chebyshev's inequality]] <math display="block"> \operatorname{P}(|X-\text{E}[X]|\geq a)\leq\frac{\operatorname{Var}[X]}{a^2}, </math> where {{math|Var}} is the [[variance]].{{sfnm|1a1=Feller|1y=1968|1loc=Section IX.6|2a1=Feller|2y=1971|2loc=Section V.7|3a1=Papoulis|3a2=Pillai|3y=2002|3loc=Section 5-4|4a1=Ross|4y=2019|4loc=Section 2.8}} These inequalities are significant for their nearly complete lack of conditional assumptions. For example, for any random variable with finite variance, the Chebyshev inequality implies that there is at least a 75% probability of an outcome being within two [[standard deviation]]s of the expected value. However, in special cases the Markov and Chebyshev inequalities often give much weaker information than is otherwise available. For example, in the case of a fair die, Chebyshev's inequality says that the probability of rolling between 1 and 6 is at least 53%; in reality, the probability is of course 100%.{{sfnm|1a1=Feller|1y=1968|1loc=Section IX.6}} The [[Kolmogorov inequality]] extends the Chebyshev inequality to the context of sums of random variables.{{sfnm|1a1=Feller|1y=1968|1loc=Section IX.7}} The following three inequalities are of fundamental importance in the field of [[mathematical analysis]] and its applications to probability theory. * [[Jensen's inequality]]: Let {{math|''f'': ℝ → ℝ}} be a [[convex function]] and {{mvar|X}} a random variable with finite expectation. Then{{sfnm|1a1=Feller|1y=1971|1loc=Section V.8}} <math display="block"> f(\operatorname{E}(X)) \leq \operatorname{E} (f(X)). </math> :Part of the assertion is that the [[positive and negative parts|negative part]] of {{math|''f''(''X'')}} has finite expectation, so that the right-hand side is well-defined (possibly infinite).
Convexity of {{mvar|f}} can be phrased as saying that the value of {{mvar|f}} at the weighted average of ''two'' inputs under-estimates the same weighted average of the two outputs; Jensen's inequality extends this to the setting of completely general weighted averages, as represented by the expectation. In the special case that {{math|''f''(''x'') {{=}} {{!}}''x''{{!}}<sup>''t''/''s''</sup>}} for positive numbers {{math|''s'' < ''t''}}, one obtains the Lyapunov inequality{{sfnm|1a1=Billingsley|1y=1995|1pp=81,277}} <math display="block"> \left(\operatorname{E}|X|^s\right)^{1/s}\leq\left(\operatorname{E}|X|^t\right)^{1/t}. </math> :This can also be proved by the Hölder inequality.{{sfnm|1a1=Feller|1y=1971|1loc=Section V.8}} In measure theory, this is particularly notable for proving the inclusion {{math|L<sup>''t''</sup> ⊂ L<sup>''s''</sup>}} of [[Lp space|{{math|L<sup>''p''</sup> spaces}}]], in the special case of [[probability space]]s. * [[Hölder's inequality]]: if {{math|''p'' > 1}} and {{math|''q'' > 1}} are numbers satisfying {{math|''p''<sup> −1</sup> + ''q''<sup> −1</sup> {{=}} 1}}, then <math display="block"> \operatorname{E}|XY|\leq(\operatorname{E}|X|^p)^{1/p}(\operatorname{E}|Y|^q)^{1/q}. </math> : for any random variables {{mvar|X}} and {{mvar|Y}}.{{sfnm|1a1=Feller|1y=1971|1loc=Section V.8}} The special case of {{math|''p'' {{=}} ''q'' {{=}} 2}} is called the [[Cauchy–Schwarz inequality]], and is particularly well-known.{{sfnm|1a1=Feller|1y=1971|1loc=Section V.8}} * [[Minkowski inequality]]: given any number {{math|''p'' ≥ 1}}, for any random variables {{mvar|X}} and {{mvar|Y}} with {{math|E{{!}}''X''{{!}}<sup>''p''</sup>}} and {{math|E{{!}}''Y''{{!}}<sup>''p''</sup>}} both finite, it follows that {{math|E{{!}}''X'' + ''Y''{{!}}<sup>''p''</sup>}} is also finite and{{sfnm|1a1=Billingsley|1y=1995|1loc=Section 19}} <math display="block"> \Bigl(\operatorname{E}|X+Y|^p\Bigr)^{1/p}\leq\Bigl(\operatorname{E}|X|^p\Bigr)^{1/p}+\Bigl(\operatorname{E}|Y|^p\Bigr)^{1/p}.
</math> The Hölder and Minkowski inequalities can be extended to general [[measure space]]s, and are often given in that context. By contrast, the Jensen inequality is special to the case of probability spaces. ===Expectations under convergence of random variables=== In general, it is not the case that <math>\operatorname{E}[X_n] \to \operatorname{E}[X]</math> even if <math>X_n\to X</math> pointwise. Thus, one cannot interchange limits and expectation without additional conditions on the random variables. To see this, let <math>U</math> be a random variable distributed uniformly on <math>[0,1].</math> For <math>n\geq 1,</math> define a sequence of random variables :<math>X_n = n \cdot \mathbf{1}\left\{ U \in \left(0,\tfrac{1}{n}\right)\right\},</math> with <math>{\mathbf 1}\{A\}</math> being the indicator function of the event <math>A.</math> Then, it follows that <math>X_n \to 0</math> pointwise. But, <math>\operatorname{E}[X_n] = n \cdot \operatorname{P}\left(U \in \left(0, \tfrac{1}{n}\right) \right) = n \cdot \tfrac{1}{n} = 1</math> for each <math>n.</math> Hence, <math>\lim_{n \to \infty} \operatorname{E}[X_n] = 1 \neq 0 = \operatorname{E}\left[ \lim_{n \to \infty} X_n \right].</math> Analogously, for a general sequence of random variables <math>\{ Y_n : n \geq 0\},</math> the expected value operator need not be <math>\sigma</math>-additive, i.e. it may happen that :<math>\operatorname{E}\left[\sum^\infty_{n=0} Y_n\right] \neq \sum^\infty_{n=0}\operatorname{E}[Y_n].</math> An example is easily obtained by setting <math>Y_0 = X_1</math> and <math>Y_n = X_{n+1} - X_n</math> for <math>n \geq 1,</math> where <math>X_n</math> is as in the previous example. A number of convergence results specify exact conditions which allow one to interchange limits and expectations, as specified below.
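Before turning to those results, the counterexample with <math>X_n = n \cdot \mathbf{1}\{U \in (0,\tfrac{1}{n})\}</math> can be sketched numerically. The following Monte Carlo check (sample size, seed, and the test point <code>u</code> are arbitrary choices) estimates <math>\operatorname{E}[X_n] \approx 1</math> for every <math>n</math>, even though <math>X_n(u) \to 0</math> for each fixed <math>u</math>:

```python
import random

random.seed(0)

def estimate_mean(n, samples=200_000):
    """Monte Carlo estimate of E[X_n], where X_n = n * 1{U in (0, 1/n)}."""
    total = 0.0
    for _ in range(samples):
        u = random.random()
        if 0 < u < 1 / n:
            total += n
    return total / samples

# E[X_n] = n * P(U in (0, 1/n)) = 1 for every n ...
for n in (2, 10, 100):
    est = estimate_mean(n)
    assert abs(est - 1) < 0.15, est

# ... yet for any fixed u, X_n(u) = 0 as soon as 1/n < u,
# so X_n -> 0 pointwise (here checked at the arbitrary point u = 0.6).
u = 0.6
assert all(n * (0 < u < 1 / n) == 0 for n in (2, 10, 100))
```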
* [[Monotone convergence theorem]]: Let <math>\{X_n : n \geq 0\}</math> be a sequence of random variables, with <math>0 \leq X_n \leq X_{n+1}</math> (a.s.) for each <math>n \geq 0.</math> Furthermore, let <math>X_n \to X</math> pointwise. Then, the monotone convergence theorem states that <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[X].</math> {{pb}} Using the monotone convergence theorem, one can show that expectation indeed satisfies countable additivity for non-negative random variables. In particular, let <math>\{X_i\}^\infty_{i=0}</math> be non-negative random variables. It follows from the [[#Monotone convergence theorem|monotone convergence theorem]] that <math display="block"> \operatorname{E}\left[\sum^\infty_{i=0}X_i\right] = \sum^\infty_{i=0}\operatorname{E}[X_i]. </math> * [[Fatou's lemma]]: Let <math>\{ X_n \geq 0 : n \geq 0\}</math> be a sequence of non-negative random variables. Fatou's lemma states that <math display="block">\operatorname{E}[\liminf_n X_n] \leq \liminf_n \operatorname{E}[X_n].</math> {{pb}} '''Corollary.''' Let <math>X_n \geq 0</math> with <math>\operatorname{E}[X_n] \leq C</math> for all <math>n \geq 0.</math> If <math>X_n \to X</math> (a.s.), then <math>\operatorname{E}[X] \leq C.</math> {{pb}} The '''proof''' follows by observing that <math display="inline"> X = \liminf_n X_n</math> (a.s.) and applying Fatou's lemma. * [[Dominated convergence theorem]]: Let <math>\{X_n : n \geq 0 \}</math> be a sequence of random variables.
If <math>X_n\to X</math> [[pointwise convergence|pointwise]] (a.s.) and <math>|X_n|\leq Y</math> (a.s.) for some random variable <math>Y</math> with <math>\operatorname{E}[Y]<\infty,</math> then, according to the dominated convergence theorem, ** <math>\operatorname{E}|X| \leq \operatorname{E}[Y] <\infty</math>; ** <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[X];</math> ** <math>\lim_n\operatorname{E}|X_n - X| = 0.</math> * [[Uniform integrability]]: In some cases, the equality <math>\lim_n\operatorname{E}[X_n]=\operatorname{E}[\lim_n X_n]</math> holds when the sequence <math>\{X_n\}</math> is ''uniformly integrable.'' ===Relationship with characteristic function=== The probability density function <math>f_X</math> of a scalar random variable <math>X</math> is related to its [[characteristic function (probability)|characteristic function]] <math>\varphi_X</math> by the inversion formula: :<math>f_X(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\varphi_X(t) \, \mathrm{d}t.</math> For the expected value of <math>g(X)</math> (where <math>g:{\mathbb R}\to{\mathbb R}</math> is a [[Measurable function|Borel function]]), we can use this inversion formula to obtain :<math>\operatorname{E}[g(X)] = \frac{1}{2\pi} \int_{\mathbb R} g(x)\left[ \int_{\mathbb R} e^{-itx}\varphi_X(t) \, \mathrm{d}t \right]\,\mathrm{d}x.</math> If <math>\operatorname{E}[g(X)]</math> is finite, then, changing the order of integration in accordance with the [[Fubini theorem|Fubini–Tonelli theorem]], we get :<math>\operatorname{E}[g(X)] = \frac{1}{2\pi} \int_{\mathbb R} G(t) \varphi_X(t) \, \mathrm{d}t,</math> where :<math>G(t) = \int_{\mathbb R} g(x) e^{-itx} \, \mathrm{d}x</math> is the [[Fourier transform]] of <math>g(x).</math> The expression for <math>\operatorname{E}[g(X)]</math> also follows directly from the [[Plancherel theorem]].
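As a numerical sketch of the inversion formula, one can recover the density of a standard normal variable from its characteristic function <math>\varphi_X(t) = e^{-t^2/2}</math>, which is known in closed form. The truncation bound and grid size below are arbitrary choices; since <math>\varphi_X</math> is real and even, the imaginary part of <math>e^{-itx}\varphi_X(t)</math> integrates to zero and the integrand reduces to a cosine:

```python
import math

def inverted_density(x, t_max=10.0, steps=20_000):
    """Approximate f_X(x) = (1/(2*pi)) * int exp(-i t x) phi_X(t) dt for the
    standard normal, phi_X(t) = exp(-t**2/2), by the trapezoidal rule on
    [-t_max, t_max].  The imaginary part cancels, leaving cos(t*x)."""
    dt = 2 * t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = -t_max + k * dt
        w = 0.5 if k in (0, steps) else 1.0   # trapezoidal endpoint weights
        total += w * math.cos(t * x) * math.exp(-t * t / 2)
    return total * dt / (2 * math.pi)

# Compare against the closed-form standard normal density.
for x in (0.0, 1.0, 2.5):
    exact = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    assert abs(inverted_density(x) - exact) < 1e-4
```

The agreement is excellent here because the integrand decays like a Gaussian, so truncating at <math>|t| = 10</math> discards only a negligible tail.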