Information Theory 1

Chapter 3 of the Deep Learning Book covers Probability and Information Theory. This post is part 4 of the TLDR for that chapter.


Introduction and definitions

Self-information of an event \(x\) and entropy of a distribution \(P\) (the expected self-information):

\[I(x) = -\log P(x)\]

\[H(x) = \mathbb{E}_{x \sim P}[\, I(x) \,] = -\mathbb{E}_{x \sim P}[\log P(x)]\]
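
As a quick sanity check, here is a small NumPy sketch (the distribution P below is a made-up example) that computes the self-information of each outcome and the entropy as the expected self-information, using the natural log (nats):

```python
import numpy as np

# Assumed example: a discrete distribution P over four outcomes.
P = np.array([0.5, 0.25, 0.125, 0.125])

# Self-information of each outcome, in nats (natural log).
I = -np.log(P)          # [0.693, 1.386, 2.079, 2.079]

# Entropy: the expected self-information under P.
H = np.sum(P * I)       # ~1.213 nats (1.75 bits with log base 2)

print(I, H)
```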

KL Divergence

\[D_{KL}(P \vert\vert Q) = \mathbb{E}_{x \sim P} \left[ \log \frac{P(x)}{Q(x)} \right] = \mathbb{E}_{x \sim P} \left[\, \log P(x) - \log Q(x) \,\right]\]
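
A minimal NumPy sketch of the KL divergence computed straight from this definition; P and Q below are made-up example distributions, and the two orderings give different values since the KL divergence is not symmetric:

```python
import numpy as np

# Assumed example distributions over the same four outcomes.
P = np.array([0.5, 0.25, 0.125, 0.125])
Q = np.array([0.4, 0.3, 0.2, 0.1])

# D_KL(P || Q) = E_{x~P}[log P(x) - log Q(x)]
kl_pq = np.sum(P * (np.log(P) - np.log(Q)))   # ~0.035 nats
kl_qp = np.sum(Q * (np.log(Q) - np.log(P)))   # ~0.037 nats

# KL divergence is not symmetric: D_KL(P||Q) != D_KL(Q||P) in general.
print(kl_pq, kl_qp)
```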

Cross Entropy

\[\begin{aligned} H(P, Q) &= H(P) + D_{KL}(P \vert\vert Q) \\ &= -\mathbb{E}_{x \sim P}[\log P(x)] + \mathbb{E}_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right] \\ &= -\mathbb{E}_{x \sim P}[\log Q(x)] \end{aligned}\]
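
A short NumPy sketch (same made-up P and Q as above, with P playing the role of the data distribution and Q the model distribution) checking numerically that the cross entropy equals the entropy of P plus the KL divergence:

```python
import numpy as np

# Assumed example: P is the data distribution, Q the model distribution.
P = np.array([0.5, 0.25, 0.125, 0.125])
Q = np.array([0.4, 0.3, 0.2, 0.1])

H_P  = -np.sum(P * np.log(P))                  # entropy H(P)
kl   =  np.sum(P * (np.log(P) - np.log(Q)))    # D_KL(P || Q)
H_PQ = -np.sum(P * np.log(Q))                  # cross entropy -E_{x~P}[log Q(x)]

# The decomposition H(P, Q) = H(P) + D_KL(P || Q) holds numerically.
print(np.isclose(H_PQ, H_P + kl))   # True
```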

Notes:

