Bipin's Bubble

Probability 2: Common Probability Distributions

Chapter 3 of the Deep Learning Book is focused on Probability and Information Theory. This post is part 2 of the TL;DR of that chapter, covering common probability distributions.

Bernoulli Distribution

\[P(\text{x} = 1) = \phi \\ P(\text{x} = 0) = 1 - \phi\] \[P(\text{x} = x) = \phi^x (1-\phi)^{(1-x)}\] \[\mathbb{E}_{\text{x}} [x] = \phi \\ \mathrm{Var}_{\text{x}}(x) = \phi (1- \phi)\]
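A quick numerical sanity check of these moments (a NumPy sketch; $\phi = 0.3$ is an assumed value for illustration):

```python
import numpy as np

# Bernoulli with parameter phi (value assumed for illustration)
phi = 0.3
rng = np.random.default_rng(0)

# A Bernoulli draw is a Binomial draw with n = 1
samples = rng.binomial(n=1, p=phi, size=100_000)

# PMF from the formula above: phi^x * (1 - phi)^(1 - x)
pmf = lambda x: phi ** x * (1 - phi) ** (1 - x)

# Sample mean approaches phi, sample variance approaches phi * (1 - phi)
```
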

Multinoulli Distribution (Categorical distribution)

A distribution over a single discrete variable with $k$ different states, parametrized by a vector $\mathbf{p} \in [0, 1]^{k-1}$; the probability of the $k$-th state is $1 - \mathbf{1}^T \mathbf{p}$.

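The multinoulli (categorical) distribution assigns a probability to each of $k$ discrete states. As a sketch (the probability vector is an assumed value for illustration), samples can be drawn with NumPy, and the empirical frequencies recover the parameters:

```python
import numpy as np

# Multinoulli over k = 3 states with probabilities p (assumed for illustration)
p = np.array([0.2, 0.5, 0.3])
rng = np.random.default_rng(1)

# One draw of 100k multinoulli trials, returned as per-state counts
counts = rng.multinomial(n=100_000, pvals=p)
freqs = counts / counts.sum()  # empirical frequencies approach p
```
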
Gaussian Distribution (Normal Distribution)

\[\mathcal{N}(x; \mu, \sigma^2) = \sqrt{\frac{1}{2 \pi \sigma^2}} \exp \left(-\frac{1}{2 \sigma^2} (x - \mu)^2 \right)\] \[\mathcal{N}(x; \mu, \beta^{-1}) = \sqrt{\frac{\beta}{2 \pi}} \exp \left(-\frac{\beta}{2} (x - \mu)^2 \right)\] \[\mathcal{N}(\mathbf{x}; \mathbf{\mu}, \mathbf{\Sigma}) = \sqrt{\frac{1}{(2 \pi)^n \det(\mathbf{\Sigma})}} \exp \left(-\frac{1}{2} (\mathbf{x} - \mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu}) \right)\]

Or, alternatively, in terms of the precision matrix $\mathbf{\beta} = \mathbf{\Sigma}^{-1}$:

\[\mathcal{N}(\mathbf{x}; \mathbf{\mu}, \mathbf{\beta}^{-1}) = \sqrt{\frac{\det(\mathbf{\beta})}{(2 \pi)^n}} \exp \left(-\frac{1}{2} (\mathbf{x} - \mathbf{\mu})^T \mathbf{\beta} (\mathbf{x} - \mathbf{\mu}) \right)\]
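A small sketch checking that the two univariate parametrizations above agree when $\beta = 1/\sigma^2$, and that the density integrates to 1 ($\mu$ and $\sigma^2$ are assumed values for illustration):

```python
import numpy as np

# Univariate Gaussian density, variance parametrization (mu, sigma2 assumed)
def gaussian_pdf(x, mu, sigma2):
    return np.sqrt(1.0 / (2 * np.pi * sigma2)) * np.exp(-((x - mu) ** 2) / (2 * sigma2))

# Same density in the precision parametrization, beta = 1 / sigma^2
def gaussian_pdf_precision(x, mu, beta):
    return np.sqrt(beta / (2 * np.pi)) * np.exp(-beta * (x - mu) ** 2 / 2)

x = np.linspace(-10.0, 10.0, 10_001)
dens = gaussian_pdf(x, mu=1.0, sigma2=2.0)

# Riemann-sum approximation of the total probability; should be ~1
area = np.sum(dens) * (x[1] - x[0])
```
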

Exponential and Laplace Distributions

\[p(x; \lambda) = \lambda \mathbf{1}_{x \ge 0} \exp(-\lambda x)\] \[\text{Laplace}(x; \mu, \gamma) = \frac{1}{2\gamma} \exp \left(-\frac{\vert x - \mu \vert}{\gamma}\right)\]
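Both are easy to sample from and sanity-check against their known moments: the exponential has mean $1/\lambda$, and the Laplace has mean $\mu$ and variance $2\gamma^2$. A sketch with assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Exponential(lambda): mean = 1 / lambda (NumPy's scale parameter is 1 / lambda)
lam = 2.0
exp_samples = rng.exponential(scale=1.0 / lam, size=100_000)

# Laplace(mu, gamma): mean = mu, variance = 2 * gamma^2
mu, gamma = 1.0, 0.5
lap_samples = rng.laplace(loc=mu, scale=gamma, size=100_000)
```
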

Dirac Distribution and Empirical Distribution

The Dirac delta $\delta(x - \mu)$ is not an ordinary function: it is zero everywhere except $x = \mu$, yet integrates to 1, so it concentrates all probability mass on a single point. The empirical distribution places mass $\frac{1}{m}$ on each of the $m$ data points:

\[p(x) = \delta(x - \mu)\] \[p(x) = \frac{1}{m} \sum_{i=1}^{m}\delta(x - x^{(i)})\]
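Sampling from the empirical distribution is just uniform resampling of the dataset, so its moments match the sample moments of the data. A sketch with a toy (assumed) dataset:

```python
import numpy as np

# Empirical distribution over a dataset: each point x^(i) gets mass 1/m.
data = np.array([0.0, 1.0, 1.0, 3.0])  # toy dataset, assumed for illustration
rng = np.random.default_rng(3)

# Drawing from the empirical distribution = uniform resampling with replacement
resampled = rng.choice(data, size=100_000, replace=True)
# The mean of the resampled values approaches the mean of the data
```
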

Mixture of Distributions

\[P(x) = \sum_i{P(c = i) P(x\vert c = i)}\]
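This sum-over-components form suggests ancestral sampling: first draw the component identity $c \sim P(c)$, then draw $x$ from the chosen component $P(x \vert c)$. A sketch for a two-component Gaussian mixture (weights, means, and standard deviations are assumed values):

```python
import numpy as np

rng = np.random.default_rng(4)
weights = np.array([0.7, 0.3])                     # P(c = i), assumed
means, stds = np.array([0.0, 5.0]), np.array([1.0, 1.0])  # components, assumed

n = 100_000
c = rng.choice(len(weights), size=n, p=weights)    # step 1: latent component ids
x = rng.normal(means[c], stds[c])                  # step 2: x | c

# Mixture mean is the weighted sum of component means: 0.7*0 + 0.3*5 = 1.5
```
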

Useful Properties of Common Functions

Some useful relations involving the logistic sigmoid $\sigma(x) = \frac{1}{1 + \exp(-x)}$ and the softplus $\zeta(x) = \log(1 + \exp(x))$:

\[\frac{d}{dx}\sigma(x) = \sigma(x)(1 - \sigma(x)) \\ 1 - \sigma(x) = \sigma(-x) \\ \log \sigma(x) = -\zeta(-x) \\ \frac{d}{dx}\zeta(x) = \sigma(x) \\ \zeta(x) - \zeta(-x) = x\]

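Two functions that recur throughout the book are the logistic sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$ and the softplus $\zeta(x) = \log(1 + e^x)$. A small numerical check of some of their standard identities:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.log1p(np.exp(x))  # zeta(x) = log(1 + exp(x))

x = np.linspace(-5.0, 5.0, 101)

# Identities checked below:
#   1 - sigma(x) = sigma(-x)
#   log sigma(x) = -zeta(-x)
#   zeta(x) - zeta(-x) = x
```
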
Change of variable

If $y = g(x)$, where $g$ is an invertible, continuous, and differentiable function, then:

\[\vert \, p_y(g(x)) \, dy \, \vert = \vert \, p_x(x) \, dx \, \vert\] \[p_y(y) = p_x(g^{-1}(y)) \left\vert \frac{\partial x}{\partial y} \right\vert\] \[p_x(x) = p_y(g(x)) \left\vert \frac{\partial g(x)}{\partial x} \right\vert\]
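The key point is the Jacobian factor: naively substituting $p_y(y) = p_x(g^{-1}(y))$ would not integrate to 1. A sketch checking this for $y = g(x) = 2x$ with $x \sim \text{Uniform}(0, 1)$, where the formula gives $p_y(y) = 1 \cdot \vert \frac{1}{2} \vert = \frac{1}{2}$ on $(0, 2)$:

```python
import numpy as np

# x ~ Uniform(0, 1), y = g(x) = 2x stretches the support to (0, 2)
rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, size=200_000)
y = 2.0 * x

# Histogram density of y should be flat at 1/2, not at p_x(x) = 1:
# the |dx/dy| = 1/2 factor compensates for the stretched interval.
hist, edges = np.histogram(y, bins=20, range=(0.0, 2.0), density=True)
```
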