Probability 1

Chapter 3 of the Deep Learning Book is focused on Probability and Information Theory. This post is part 1 of a TLDR of that chapter.

Why Probability?


Random Variables


Probability Distributions

Discrete Random Variable and Probability Mass Function

Continuous Random Variable and Probability Density Function


Marginal Probability

\[\forall x \in \text{x}, P(\text{x}= x) = \sum_y{P(\text{x}= x, \text{y}=y)}\] \[\forall y \in \text{y}, P(\text{y}= y) = \sum_x{P(\text{x}= x, \text{y}=y)}\]
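As a quick sanity check of these sums, here is a minimal numpy sketch using a small hypothetical joint table over discrete x (rows) and y (columns); summing out the other variable gives each marginal:

```python
import numpy as np

# Hypothetical joint distribution P(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

p_x = joint.sum(axis=1)  # P(x) = sum_y P(x, y)
p_y = joint.sum(axis=0)  # P(y) = sum_x P(x, y)

print(p_x)        # approx [0.4, 0.6]
print(p_y)        # approx [0.35, 0.35, 0.3]
print(p_x.sum())  # approx 1.0 -- marginals of a valid joint still sum to 1
```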

Conditional Probability

\[P(\text{y} = y \vert \text{x} = x) = \dfrac{P(\text{y} = y, \text{x} = x)}{P(\text{x} = x)}\]
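Continuing the same hypothetical joint table, the conditional P(y | x) is simply the joint divided by the marginal of the variable being conditioned on:

```python
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],   # hypothetical P(x, y)
                  [0.25, 0.15, 0.20]])

p_x = joint.sum(axis=1, keepdims=True)  # P(x), shape (2, 1) for broadcasting
p_y_given_x = joint / p_x               # P(y | x) = P(y, x) / P(x)

print(p_y_given_x)
print(p_y_given_x.sum(axis=1))          # each row is a distribution, so approx [1, 1]
```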

Chain Rule of Conditional Probability

\[P(x^{(1)}, ..., x^{(n)}) = P(x^{(1)}) \ \prod_{i=2}^{n}{\ P(x^{(i)} \vert x^{(1)}, ..., x^{(i-1)})}\]

Using this equation we can write:

\[P(a, b, c) = P(a \vert b, c) \ P(b, c)\] \[P(b, c) = P(b \vert c) \ P(c)\] \[P(a, b, c) = P(a \vert b, c) \ P(b \vert c) \ P(c)\]
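A quick numerical check of this factorization, using a hypothetical random 2x2x2 joint over a, b, and c:

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()              # hypothetical P(a, b, c), axes ordered (a, b, c)

p_c = joint.sum(axis=(0, 1))      # P(c)
p_bc = joint.sum(axis=0)          # P(b, c)
p_b_given_c = p_bc / p_c          # P(b | c)
p_a_given_bc = joint / p_bc       # P(a | b, c)

reconstructed = p_a_given_bc * p_b_given_c * p_c
print(np.allclose(reconstructed, joint))  # True
```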

Independence and Conditional Independence

\[\forall x \in \text{x}, y \in \text{y}, p(\text{x}=x, \text{y}=y) = p(\text{x}=x) p(\text{y}=y)\] \[\forall x \in \text{x}, y \in \text{y}, z \in \text{z}, p(\text{x}=x, \text{y}=y \vert \text{z} = z) = p(\text{x}=x \vert \text{z} = z) p(\text{y}=y \vert \text{z} = z)\]
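The first condition is easy to test numerically: a discrete joint describes independent variables exactly when it equals the outer product of its own marginals. A sketch with hypothetical tables:

```python
import numpy as np

def is_independent(joint):
    """Check whether a 2-D discrete joint factorizes into its marginals."""
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return np.allclose(joint, np.outer(p_x, p_y))

independent_joint = np.outer([0.4, 0.6], [0.3, 0.5, 0.2])  # built as P(x) P(y)
dependent_joint = np.array([[0.10, 0.20, 0.10],
                            [0.25, 0.15, 0.20]])

print(is_independent(independent_joint))  # True
print(is_independent(dependent_joint))    # False
```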

Bayes’ Rule

\[P(x \vert y) = \frac{P(x) P(y \vert x)}{P(y)}\] \[P(x \vert y) = \frac{P(x) P(y \vert x)}{\sum_x{P(x) P(y \vert x)}}\]
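A small worked example with hypothetical numbers: a prior P(x) over two states, a likelihood table P(y | x), and the posterior obtained by normalizing prior times likelihood (the denominator is the marginal P(y) from the second form above):

```python
import numpy as np

prior = np.array([0.8, 0.2])        # hypothetical P(x)
likelihood = np.array([[0.9, 0.1],  # hypothetical P(y | x): rows index x, columns index y
                       [0.3, 0.7]])

y_observed = 1                                     # suppose we observe y = 1
unnormalized = prior * likelihood[:, y_observed]   # P(x) P(y | x)
posterior = unnormalized / unnormalized.sum()      # divide by P(y) = sum_x P(x) P(y | x)

print(posterior)        # approx [0.364, 0.636]
print(posterior.sum())  # approx 1.0
```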

Expectation, Variance and Covariance

Expectation

\[\mathbb{E}_{\text{x} \sim P}[f(x)] = \sum_{x}{f(x)P(x)}\] \[\mathbb{E}_{\text{x} \sim p}[f(x)] = \int{f(x)p(x)dx}\] \[\mathbb{E}_{\text{x}}[\alpha f(x) + \beta g(x)] = \alpha \mathbb{E}_{\text{x}}[f(x)] + \beta \mathbb{E}_{\text{x}}[g(x)]\]
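The discrete form and the linearity property can be verified directly; a minimal sketch with a hypothetical distribution P and functions f and g:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # states of a hypothetical discrete random variable
P = np.array([0.1, 0.4, 0.3, 0.2])   # P(x)

f = x ** 2          # f(x)
g = np.cos(x)       # g(x)
alpha, beta = 2.0, -1.0

E_f = np.sum(f * P)                           # E[f(x)] = sum_x f(x) P(x)
E_g = np.sum(g * P)
E_combo = np.sum((alpha * f + beta * g) * P)  # E[alpha f(x) + beta g(x)]

print(np.isclose(E_combo, alpha * E_f + beta * E_g))  # True -- expectation is linear
```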

Variance

\[Var(f(x)) = \mathbb{E}[\ (f(x) - \mathbb{E}[f(x)])^2 \ ]\]
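In code, variance is just the expectation of the squared deviation; reusing the hypothetical distribution from above:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical discrete states
P = np.array([0.1, 0.4, 0.3, 0.2])   # P(x)
f = x ** 2                           # f(x)

E_f = np.sum(f * P)                  # E[f(x)]
var_f = np.sum((f - E_f) ** 2 * P)   # E[(f(x) - E[f(x)])^2]

print(var_f)
```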

Covariance

\[Cov( f(x), g(y) ) = \mathbb{E}[\ (f(x) - \mathbb{E}[f(x)]) (g(y) - \mathbb{E}[g(y)])\ ]\]
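A sketch of the same idea on a hypothetical discrete joint, taking f and g to be the identity so this is just Cov(x, y):

```python
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],   # hypothetical P(x, y)
                  [0.25, 0.15, 0.20]])
x_vals = np.array([0.0, 1.0])           # states of x (rows)
y_vals = np.array([0.0, 1.0, 2.0])      # states of y (columns)

E_x = np.sum(x_vals * joint.sum(axis=1))   # E[x]
E_y = np.sum(y_vals * joint.sum(axis=0))   # E[y]

# Cov(x, y) = E[(x - E[x]) (y - E[y])], expectation taken under the joint
cov_xy = np.sum(joint * np.outer(x_vals - E_x, y_vals - E_y))
print(cov_xy)  # approx -0.02: nonzero, so x and y are not independent here
```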