Expected value

==Definition==
As discussed above, there are several context-dependent ways of defining the expected value. The simplest and original definition deals with the case of finitely many possible outcomes, such as in the flip of a coin. With the theory of infinite series, this can be extended to the case of countably many possible outcomes. It is also very common to consider the distinct case of random variables dictated by (piecewise-)continuous [[probability density function]]s, as these arise in many natural contexts. All of these specific definitions may be viewed as special cases of the general definition based upon the mathematical tools of [[measure theory]] and [[Lebesgue integration]], which provide these different contexts with an axiomatic foundation and common language.

Any definition of expected value may be extended to define an expected value of a multidimensional random variable, i.e. a [[random vector]] {{mvar|X}}. It is defined component by component, as {{math|E[''X'']<sub>''i''</sub> {{=}} E[''X''<sub>''i''</sub>]}}. Similarly, one may define the expected value of a [[random matrix]] {{mvar|X}} with components {{math|''X''<sub>''ij''</sub>}} by {{math|E[''X'']<sub>''ij''</sub> {{=}} E[''X''<sub>''ij''</sub>]}}.
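The component-by-component rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <code>expectation_vector</code> and the two-coin example are hypothetical and not taken from the cited sources, and the random vector is assumed to have finitely many outcomes.

```python
from fractions import Fraction

def expectation_vector(outcomes, probs):
    """Componentwise expectation of a random vector with finitely many outcomes.

    outcomes: list of equal-length tuples (the possible vector values)
    probs:    list of probabilities, one per outcome, summing to 1
    """
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    dim = len(outcomes[0])
    # E[X]_i = E[X_i]: average each coordinate separately.
    return tuple(
        sum(p * x[i] for x, p in zip(outcomes, probs)) for i in range(dim)
    )

# Hypothetical example: two fair coin flips recorded as (first, second), heads = 1.
outcomes = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [Fraction(1, 4)] * 4
mean_vector = expectation_vector(outcomes, probs)
print(mean_vector)
```

Exact fractions are used so that the check on the probabilities does not suffer from floating-point rounding.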

===Random variables with finitely many outcomes===
Consider a random variable {{mvar|X}} with a ''finite'' list {{math|''x''<sub>1</sub>, ..., ''x''<sub>''k''</sub>}} of possible outcomes, each of which (respectively) has probability {{math|''p''<sub>1</sub>, ..., ''p''<sub>''k''</sub>}} of occurring. The expectation of {{mvar|X}} is defined as{{sfnm|1a1=Billingsley|1y=1995|1p=76}}
:<math>\operatorname{E}[X] =x_1p_1 + x_2p_2 + \cdots + x_kp_k.</math>

Since the probabilities must satisfy {{math|''p''<sub>1</sub> + ⋅⋅⋅ + ''p''<sub>''k''</sub> {{=}} 1}}, it is natural to interpret {{math|E[''X'']}} as a [[weighted average]] of the {{math|''x''<sub>''i''</sub>}} values, with weights given by their probabilities {{math|''p''<sub>''i''</sub>}}.

In the special case that all possible outcomes are [[equiprobable]] (that is, {{math|''p''<sub>1</sub> {{=}} ⋅⋅⋅ {{=}} ''p''<sub>''k''</sub>}}), the weighted average is given by the standard [[arithmetic mean|average]]. In the general case, the expected value takes into account the fact that some outcomes are more likely than others.
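As a minimal sketch of the finite definition, the weighted sum can be written directly in Python. The helper name <code>expectation</code> and the sample values are assumptions made for illustration; exact fractions are used so that the requirement that the probabilities sum to 1 can be checked without rounding error.

```python
from fractions import Fraction

def expectation(values, probs):
    """Return x_1*p_1 + ... + x_k*p_k for a finite list of outcomes."""
    if len(values) != len(probs):
        raise ValueError("need exactly one probability per outcome")
    if any(p < 0 for p in probs) or sum(probs) != 1:
        raise ValueError("probabilities must be nonnegative and sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Hypothetical outcomes with unequal probabilities (a weighted average).
values = [0, 10, 100]
probs = [Fraction(1, 2), Fraction(2, 5), Fraction(1, 10)]
weighted = expectation(values, probs)

# Equiprobable case: the result coincides with the arithmetic mean.
uniform = expectation(values, [Fraction(1, 3)] * 3)
print(weighted, uniform)
```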

====Examples====
[[File:Largenumbers.svg|thumb|An illustration of the convergence of sequence averages of rolls of a die to the expected value of 3.5 as the number of rolls (trials) grows]]

* Let <math>X</math> represent the outcome of a roll of a fair six-sided {{dice}}. More specifically, <math>X</math> will be the number of [[Pip (counting)|pips]] showing on the top face of the {{dice}} after the toss. The possible values for <math>X</math> are 1, 2, 3, 4, 5, and 6, all of which are equally likely with a probability of {{frac2|1|6}}. The expectation of <math>X</math> is
::<math>\operatorname{E}[X] = 1\cdot\frac16 + 2\cdot\frac16 + 3\cdot\frac16 + 4\cdot\frac16 + 5\cdot\frac16 + 6\cdot\frac16 = 3.5.</math>
:If one rolls the {{dice}} <math>n</math> times and computes the average ([[arithmetic mean]]) of the results, then as <math>n</math> grows, the average will [[almost surely]] [[Convergent sequence|converge]] to the expected value, a fact known as the [[strong law of large numbers]].

* The [[roulette]] game consists of a small ball and a wheel with 38 numbered pockets around the edge. As the wheel is spun, the ball bounces around randomly until it settles down in one of the pockets. Suppose the random variable <math>X</math> represents the (monetary) outcome of a $1 bet on a single number ("straight up" bet). If the bet wins (which happens with probability {{frac2|1|38}} in American roulette), the payoff is $35; otherwise the player loses the bet. The expected profit from such a bet will be
::<math>\operatorname{E}[\,\text{gain from }\$1\text{ bet}\,] = -\$1 \cdot \frac{37}{38} + \$35 \cdot \frac{1}{38} = -\$\frac{1}{19}.</math>
:That is, the expected value to be won from a $1 bet is −${{frac2|1|19}}. Thus, in 190 bets, the net loss will probably be about $10.
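Both examples can be checked with a short Python sketch. The exact values are computed with fractions; the simulation of die rolls is only an illustration of the law of large numbers, and the helper name <code>average_of_rolls</code> and the fixed seed are assumptions introduced here.

```python
import random
from fractions import Fraction

# Fair six-sided die: each face 1..6 has probability 1/6.
die_mean = sum(Fraction(face, 6) for face in range(1, 7))

# $1 straight-up bet in American roulette: lose $1 with probability 37/38,
# win $35 with probability 1/38.
roulette_mean = Fraction(37, 38) * (-1) + Fraction(1, 38) * 35

print(die_mean, roulette_mean, 190 * roulette_mean)

def average_of_rolls(n, rng):
    """Arithmetic mean of n simulated rolls of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so that runs are repeatable
for n in (10, 1_000, 100_000):
    print(n, average_of_rolls(n, rng))
```

The simulated averages fluctuate for small <code>n</code> and tend to settle near 3.5 as <code>n</code> grows.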

===Random variables with countably infinitely many outcomes===
Informally, the expectation of a random variable with a [[countable set|countably infinite set]] of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that
:<math>\operatorname{E}[X] = \sum_{i=1}^\infty x_i\, p_i,</math>
where {{math|''x''<sub>1</sub>, ''x''<sub>2</sub>, ...}} are the possible outcomes of the random variable {{mvar|X}} and {{math|''p''<sub>1</sub>, ''p''<sub>2</sub>, ...}} are their corresponding probabilities. In many non-mathematical textbooks, this is presented as the full definition of expected values in this context.{{sfnm|1a1=Ross|1y=2019|1loc=Section 2.4.1}}

However, there are some subtleties with infinite summation, so the above formula is not suitable as a mathematical definition. In particular, the [[Riemann series theorem]] of [[mathematical analysis]] illustrates that the value of certain infinite sums involving positive and negative summands depends on the order in which the summands are given. Since the outcomes of a random variable have no naturally given order, this creates a difficulty in defining expected value precisely.

For this reason, many mathematical textbooks only consider the case that the infinite sum given above [[absolute convergence|converges absolutely]], which implies that the infinite sum is a finite number independent of the ordering of summands.{{sfnm|1a1=Feller|1y=1968|1loc=Section IX.2}} In the alternative case that the infinite sum does not converge absolutely, one says the random variable ''does not have finite expectation.''{{sfnm|1a1=Feller|1y=1968|1loc=Section IX.2}}
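The order dependence described above can be observed numerically. The sketch below uses the alternating harmonic series, a standard example of a series that converges but not absolutely; it is chosen here purely for illustration and is not a probability distribution from the cited sources. The function names are hypothetical.

```python
import math

def alternating_harmonic(n_terms):
    """1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """Same terms, reordered: one positive term, then two negative terms.

    1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ...
    """
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1 / (2 * j - 1) - 1 / (4 * j - 2) - 1 / (4 * j)
    return total

usual = alternating_harmonic(100_000)
reordered = rearranged(100_000)
print(usual, math.log(2))
print(reordered, math.log(2) / 2)
```

The partial sums in the usual order approach ln 2, while the reordered partial sums approach half of that value, even though both use exactly the same summands.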

====Examples====
* Suppose <math>x_i = i</math> and <math>p_i = \tfrac{c}{i2^i}</math> for <math>i = 1, 2, 3, \ldots,</math> where <math>c = \tfrac{1}{\ln 2}</math> is the scaling factor which makes the probabilities sum to 1. Then we have <math display="block">\operatorname{E}[X] \,= \sum_i x_i p_i = 1(\tfrac{c}{2})
+ 2(\tfrac{c}{8}) + 3 (\tfrac{c}{24}) + \cdots
 \,= \, \tfrac{c}{2} + \tfrac{c}{4} + \tfrac{c}{8} + \cdots \,=\,  c \,=\, \tfrac{1}{\ln  2}.</math>
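This example can be verified numerically with a short sketch, assuming that truncating the series after 199 terms is accurate enough for double-precision arithmetic (the remaining terms are far below rounding error). The names <code>p</code>, <code>total_probability</code> and <code>partial_expectation</code> are introduced only for illustration.

```python
import math

c = 1 / math.log(2)  # scaling factor from the example

def p(i):
    """Probability of the outcome x_i = i."""
    return c / (i * 2 ** i)

# Truncated sums; terms beyond i = 199 are negligible in double precision.
total_probability = sum(p(i) for i in range(1, 200))
partial_expectation = sum(i * p(i) for i in range(1, 200))

print(total_probability)    # should be very close to 1
print(partial_expectation)  # should be very close to 1 / ln 2
```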

===Random variables with density===
Now consider a random variable {{mvar|X}} which has a [[probability density function]] given by a function {{mvar|f}} on the [[real number line]]. This means that the probability of {{mvar|X}} taking on a value in any given [[open interval]] is given by the [[integral]] of {{mvar|f}} over that interval. The expectation of {{mvar|X}} is then given by the integral{{sfnm|1a1=Papoulis|1a2=Pillai|1y=2002|1loc=Section 5-3|2a1=Ross|2y=2019|2loc=Section 2.4.2}}
:<math>\operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, dx.</math>
A general and mathematically precise formulation of this definition uses [[measure theory]] and [[Lebesgue integration]], and the corresponding theory of ''absolutely continuous random variables'' is described in the next section. The density functions of many common distributions are [[piecewise continuous]], and as such the theory is often developed in this restricted setting.{{sfnm|1a1=Feller|1y=1971|1loc=Section I.2}} For such functions, it is sufficient to only consider the standard [[Riemann integration]]. Sometimes ''continuous random variables'' are defined as those corresponding to this special class of densities, although the term is used differently by various authors.

Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of {{mvar|X}} is given by the [[Cauchy distribution]] {{math|Cauchy(0, π)}}, so that {{math|''f''(''x'') {{=}} (''x''<sup>2</sup> + π<sup>2</sup>)<sup>−1</sup>}}. It is straightforward to compute in this case that
:<math>\int_a^b xf(x)\,dx=\int_a^b \frac{x}{x^2+\pi^2}\,dx=\frac{1}{2}\ln\frac{b^2+\pi^2}{a^2+\pi^2}.</math>
The limit of this expression as {{math|''a'' → −∞}} and {{math|''b'' → ∞}} does not exist: if the limits are taken so that {{math|''a'' {{=}} −''b''}}, then the limit is zero, while if the constraint {{math|2''a'' {{=}} −''b''}} is taken, then the limit is {{math|ln(2)}}.
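The dependence on how the limits are taken can be seen by evaluating the closed-form expression above for large bounds. This is a minimal sketch; the function name <code>truncated_integral</code> is hypothetical.

```python
import math

def truncated_integral(a, b):
    """Closed form of the integral of x / (x^2 + pi^2) over [a, b]."""
    return 0.5 * math.log((b * b + math.pi ** 2) / (a * a + math.pi ** 2))

for b in (1e2, 1e4, 1e6):
    symmetric = truncated_integral(-b, b)     # bounds coupled by a = -b
    lopsided = truncated_integral(-b / 2, b)  # bounds coupled by 2a = -b
    print(b, symmetric, lopsided)
```

With symmetric bounds the value is zero for every <code>b</code>, while with the second coupling the values approach ln 2 ≈ 0.693 as <code>b</code> grows.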

To avoid such ambiguities, in mathematical textbooks it is common to require that the given integral [[converges absolutely]], with {{math|E[''X'']}} left undefined otherwise.{{sfnm|1a1=Feller|1y=1971|1p=5}} However, measure-theoretic notions as given below can be used to give a systematic definition of {{math|E[''X'']}} for more general random variables {{mvar|X}}.
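When the integral does converge absolutely, the formula {{math|E[''X''] {{=}} ∫ ''x f''(''x'') ''dx''}} can be approximated numerically. The following sketch makes several assumptions that are not part of the cited sources: it uses an [[exponential distribution]] with rate 2 (whose expectation is known to be 1/2) as the test density, a simple midpoint rule, and a cutoff at 40 beyond which the density is negligible. The function names are hypothetical.

```python
import math

def expectation_from_density(f, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of x * f(x) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width  # midpoint of the k-th subinterval
        total += x * f(x)
    return total * width

rate = 2.0  # assumed rate parameter of the illustrative density

def exponential_density(x):
    """Density of the exponential distribution with the rate chosen above."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

# The density vanishes for x < 0 and is negligible beyond the cutoff,
# so the integral is taken over [0, 40] only.
approx = expectation_from_density(exponential_density, 0.0, 40.0)
print(approx)
```

Truncating the range of integration is harmless here only because of absolute convergence; for the Cauchy density discussed above, the result would depend on the choice of cutoffs.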

===Arbitrary real-valued random variables===
All definitions of the expected value may be expressed in the language of [[measure theory]]. In general, if {{mvar|X}} is a real-valued [[random variable]] defined on a [[probability space]] {{math|(Ω, Σ, P)}}, then the expected value of {{mvar|X}}, denoted by {{math|E[''X'']}}, is defined as the [[Lebesgue integration|Lebesgue integral]]{{sfnm|1a1=Billingsley|1y=1995|1p=273}}
:<math>\operatorname{E} [X]  = \int_\Omega X\,d\operatorname{P}.</math>
Despite the newly abstract situation, this definition is extremely similar in nature to the very simplest definition of expected values, given above, as certain weighted averages. This is because, in measure theory, the value of the Lebesgue integral of {{mvar|X}} is defined via weighted averages of ''approximations'' of {{mvar|X}} which take on finitely many values.{{sfnm|1a1=Billingsley|1y=1995|1loc=Section 15}} Moreover, if given a random variable with finitely or countably many possible values, the Lebesgue theory of expectation is identical with the summation formulas given above. However, the Lebesgue theory clarifies the scope of the theory of probability density functions. A random variable {{mvar|X}} is said to be ''absolutely continuous'' if any of the following conditions are satisfied:
* there is a nonnegative [[measurable function]] {{mvar|f}} on the real line such that
::<math>\text{P}(X\in A)=\int_A f(x)\,dx,</math>
:for any [[Borel set]] {{mvar|A}}, in which the integral is Lebesgue.
* the [[cumulative distribution function]] of {{mvar|X}} is [[absolutely continuous]].
* for any Borel set {{mvar|A}} of real numbers with [[Lebesgue measure]] equal to zero, the probability of {{mvar|X}} being valued in {{mvar|A}} is also equal to zero.
* for any positive number {{math|ε}} there is a positive number {{math|δ}} such that: if {{mvar|A}} is a Borel set with Lebesgue measure less than {{math|δ}}, then the probability of {{mvar|X}} being valued in {{mvar|A}} is less than {{math|ε}}.
These conditions are all equivalent, although this is nontrivial to establish.{{sfnm|1a1=Billingsley|1y=1995|1loc=Theorems 31.7 and 31.8 and p. 422}} In this definition, {{mvar|f}} is called the ''probability density function'' of {{mvar|X}} (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration,{{sfnm|1a1=Billingsley|1y=1995|1loc=Theorem 16.13}} combined with the [[law of the unconscious statistician]],{{sfnm|1a1=Billingsley|1y=1995|1loc=Theorem 16.11}} it follows that
:<math>\operatorname{E}[X]\equiv\int_\Omega X\,d\operatorname{P}=\int_{\mathbb{R}}xf(x)\,dx</math>
for any absolutely continuous random variable {{mvar|X}}. The above discussion of continuous random variables is thus a special case of the general Lebesgue theory, due to the fact that every piecewise-continuous function is measurable.

[[File:Roland Uhl 2023 Charakterisierung des Erwartungswertes Bild1.svg|250px|right|border]]
The expected value of any real-valued random variable <math>X</math> can also be characterized on the graph of its [[cumulative distribution function]] <math>F</math> by an equality of areas. In fact, <math>\operatorname{E}[X] = \mu</math> for a real number <math>\mu</math> if and only if the two regions in the <math>x</math>-<math>y</math>-plane, described by
:<math>
x\le\mu,\;\, 0\le y\le F(x) \quad</math> or <math>\quad x\ge\mu,\;\, F(x)\le y\le 1
</math>
respectively, have the same finite area, i.e. if
:<math>
\int_{-\infty}^\mu F(x)\,dx = \int_\mu^\infty \big(1 - F(x)\big)\,dx
</math>
and both [[improper integral|improper Riemann integrals]] converge. Finally, this is equivalent to the representation
:<math>
\operatorname{E}[X] 
= \int_0^\infty \big(1 - F(x)\big)\,dx - \int_{-\infty}^0 F(x)\,dx,
</math>
also with convergent integrals.<ref>{{cite book |last1=Uhl |first1=Roland |title=Charakterisierung des Erwartungswertes am Graphen der Verteilungsfunktion |date=2023 |publisher=Technische Hochschule Brandenburg |doi=10.25933/opus4-2986 |doi-access=free}} pp. 2–4.</ref>
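Both characterizations in terms of the cumulative distribution function can be checked numerically. The sketch below assumes, purely for illustration, a [[normal distribution]] with mean 1.5 and standard deviation 1, a midpoint rule, and a cutoff of 20 in place of the infinite limits; none of these choices come from the cited source, and the function names are hypothetical.

```python
import math

def midpoint_integral(g, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(g(lower + (k + 0.5) * width) for k in range(steps))

mu, sigma = 1.5, 1.0  # assumed parameters of the illustrative distribution

def F(x):
    """Cumulative distribution function of the normal distribution above."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

cutoff = 20.0  # stands in for infinity; the tails beyond it are negligible

# E[X] = integral of (1 - F) over [0, inf) minus integral of F over (-inf, 0].
upper_tail = midpoint_integral(lambda x: 1.0 - F(x), 0.0, cutoff)
lower_tail = midpoint_integral(F, -cutoff, 0.0)
mean_from_cdf = upper_tail - lower_tail

# Equality of areas to the left and to the right of mu.
left_area = midpoint_integral(F, mu - cutoff, mu)
right_area = midpoint_integral(lambda x: 1.0 - F(x), mu, mu + cutoff)

print(mean_from_cdf, left_area, right_area)
```

The first printed value should be close to the assumed mean 1.5, and the two areas should agree up to numerical error.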

===Infinite expected values===
Expected values as defined above are automatically finite numbers. However, in many cases it is fundamental to be able to consider expected values of {{math|±∞}}. This is intuitive, for example, in the case of the [[St. Petersburg paradox]], in which one considers a random variable with possible outcomes {{math|''x''<sub>''i''</sub> {{=}} 2<sup>''i''</sup>}}, with associated probabilities {{math|''p''<sub>''i''</sub> {{=}} 2<sup>−''i''</sup>}}, for {{mvar|i}} ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has
<math display="block"> \operatorname{E}[X]= \sum_{i=1}^\infty x_i\,p_i  =2\cdot \frac{1}{2}+4\cdot\frac{1}{4} + 8\cdot\frac{1}{8}+ 16\cdot\frac{1}{16}+ \cdots = 1 + 1 + 1 + 1 + \cdots.</math>
It is natural to say that the expected value equals {{math|+∞}}.

There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral.{{sfnm|1a1=Billingsley|1y=1995|1loc=Section 15}} The first fundamental observation is that, whichever of the above definitions is followed, any ''nonnegative'' random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, the expected value can be defined as {{math|+∞}}. The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable {{mvar|X}}, one defines the [[positive and negative parts]] by {{math|''X''<sup> +</sup> {{=}} max(''X'', 0)}} and {{math|''X''<sup> −</sup> {{=}} −min(''X'', 0)}}. These are nonnegative random variables, and it can be directly checked that {{math|''X'' {{=}} ''X''<sup> +</sup> − ''X''<sup> −</sup>}}. Since {{math|E[''X''<sup> +</sup>]}} and {{math|E[''X''<sup> −</sup>]}} are both then defined as either nonnegative numbers or {{math|+∞}}, it is then natural to define:
<math display="block">
\operatorname{E}[X] = \begin{cases} \operatorname{E}[X^+] - \operatorname{E}[X^-] & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] < \infty;\\
+\infty  & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] < \infty;\\
-\infty & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] = \infty;\\
\text{undefined} & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] = \infty.
\end{cases}
</math>

According to this definition, {{math|E[''X'']}} exists and is finite if and only if {{math|E[''X''<sup> +</sup>]}} and {{math|E[''X''<sup> −</sup>]}} are both finite. Due to the formula {{math|{{!}}''X''{{!}} {{=}} ''X''<sup> +</sup> + ''X''<sup> −</sup>}}, this is the case if and only if {{math|E{{!}}''X''{{!}}}} is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations.
* In the case of the St. Petersburg paradox, one has {{math|''X''<sup> −</sup> {{=}} 0}} and so {{math|E[''X''] {{=}} +∞}} as desired.
* Suppose the random variable {{mvar|X}} takes values {{math|1, −2, 3, −4, ...}} with respective probabilities {{math|6π<sup>−2</sup>, 6(2π)<sup>−2</sup>, 6(3π)<sup>−2</sup>, 6(4π)<sup>−2</sup>, ...}}. Then it follows that {{math|''X''<sup> +</sup>}} takes value {{math|2''k''−1}} with probability {{math|6((2''k''−1)π)<sup>−2</sup>}} for each positive integer {{mvar|k}}, and takes value {{math|0}} with remaining probability. Similarly, {{math|''X''<sup> −</sup>}} takes value {{math|2''k''}} with probability {{math|6(2''k''π)<sup>−2</sup>}} for each positive integer {{mvar|k}} and takes value {{math|0}} with remaining probability. Using the definition for non-negative random variables, one can show that both {{math|E[''X''<sup> +</sup>] {{=}} ∞}} and {{math|E[''X''<sup> −</sup>] {{=}} ∞}} (see [[harmonic series (mathematics)|Harmonic series]]). Hence, in this case the expectation of {{mvar|X}} is undefined.
* Similarly, the Cauchy distribution, as discussed above, has undefined expectation.
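The four-way case distinction that defines {{math|E[''X'']}} from {{math|E[''X''<sup> +</sup>]}} and {{math|E[''X''<sup> −</sup>]}} can be sketched as a small Python function. This is an illustration only: the function names are hypothetical, infinity is represented by <code>math.inf</code>, and the undefined case is represented by <code>None</code>, which is a convention chosen here rather than one taken from the cited sources.

```python
import math
from fractions import Fraction

def extended_expectation(e_plus, e_minus):
    """Combine E[X+] and E[X-] into E[X], following the four cases above.

    Each argument is a nonnegative number or math.inf.
    Returns None when the expectation is undefined.
    """
    if e_plus < 0 or e_minus < 0:
        raise ValueError("E[X+] and E[X-] must be nonnegative")
    if math.isinf(e_plus) and math.isinf(e_minus):
        return None  # both parts infinite: undefined
    # Finite minus finite, or +inf / -inf when exactly one part is infinite.
    return e_plus - e_minus

def st_petersburg_partial_sum(n):
    """Sum of x_i * p_i = 2**i * 2**(-i) over i = 1, ..., n."""
    return sum(Fraction(2 ** i) * Fraction(1, 2 ** i) for i in range(1, n + 1))

# Every term equals 1, so the partial sums grow without bound.
print(st_petersburg_partial_sum(50))

print(extended_expectation(math.inf, 0.0))       # St. Petersburg: X- = 0
print(extended_expectation(math.inf, math.inf))  # both parts infinite
print(extended_expectation(2.5, 1.0))            # ordinary finite case
```

The partial sums of the St. Petersburg series equal the number of terms, which is why {{math|E[''X''<sup> +</sup>]}} is taken to be {{math|+∞}} in the first call.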