Symbulate Documentation

Methods for Common Discrete and Continuous Distributions

Symbulate can function as an analytical distribution calculator to calculate theoretical probabilities, expected values, and quantiles for common discrete and continuous distributions.

Be sure to import Symbulate using the following commands.

In [1]:
from symbulate import *
%matplotlib inline

Probability density function (pdf)

The .pdf() method evaluates the probability density function of named distributions. If $f$ is the pdf of a named continuous Distribution, calling Distribution.pdf(x) returns the value $f(x)$.

Example. The Normal(0, 1) pdf evaluated at 0 is $f(0)=1/\sqrt{2\pi}$.

In [2]:
Normal(mean=0, sd=1).pdf(0)
Out[2]:
0.3989422804014327

For discrete random variables, the probability density function is also known as the probability mass function (pmf). If $p$ is the pmf of a named discrete Distribution, calling Distribution.pdf(x) returns the value $p(x)$.

Example. The Binomial(5, 0.4) pmf evaluated at 3 is $\binom{5}{3}0.4^3 0.6^2$.

In [3]:
Binomial(n=5, p=0.4).pdf(3)
Out[3]:
0.23039999999999999
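
As a cross-check, the two formulas above can be evaluated by hand. The following is a minimal sketch that uses only Python's standard library rather than Symbulate; the variable names are illustrative and not part of any Symbulate API.

```python
import math

# Standard Normal pdf at 0: f(0) = 1 / sqrt(2 * pi)
normal_pdf_at_0 = 1 / math.sqrt(2 * math.pi)

# Binomial(5, 0.4) pmf at 3: C(5, 3) * 0.4^3 * 0.6^2
binomial_pmf_at_3 = math.comb(5, 3) * 0.4**3 * 0.6**2

print(normal_pdf_at_0)
print(binomial_pmf_at_3)
```

Both values should agree with the .pdf() outputs above up to floating-point rounding.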

Cumulative distribution function (cdf)

Recall that if $X$ is a random variable defined on a probability space with probability measure $P$, then its cumulative distribution function, $F:\mathbb{R}\mapsto[0,1]$, is defined by $F(x) = P(X\le x)$. The .cdf() method evaluates the cumulative distribution function of named distributions. If $F$ is the cdf of a named Distribution, calling Distribution.cdf(x) returns the value $F(x)$.

Example. The Normal(0, 1) cdf evaluated at 2 is $\int_{-\infty}^2 \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz$.

In [4]:
Normal(mean=0, sd=1).cdf(2)
Out[4]:
0.97724986805182079

Example. The Binomial(5, 0.4) cdf evaluated at 3 is $\sum_{x=0}^3\binom{5}{x}0.4^x 0.6^{5-x}$.

In [5]:
Binomial(n=5, p=0.4).cdf(3)
Out[5]:
0.91295999999999999
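
The integral and the sum above can also be checked outside Symbulate. This sketch assumes only the Python standard library: `statistics.NormalDist` supplies the Normal cdf, and the helper `binomial_pmf` is a hypothetical name introduced here for illustration.

```python
import math
from statistics import NormalDist

# Standard Normal cdf at 2, F(2) = P(Z <= 2)
normal_cdf_at_2 = NormalDist(mu=0, sigma=1).cdf(2)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Binomial(5, 0.4) cdf at 3: sum of the pmf over x = 0, 1, 2, 3
binomial_cdf_at_3 = sum(binomial_pmf(x, 5, 0.4) for x in range(4))

print(normal_cdf_at_2)
print(binomial_cdf_at_3)
```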

Expected value (mean)

For a named Distribution, calling Distribution.mean() returns its expected value (mean).

Example. The expected value of a Gamma distribution with shape parameter $\alpha$ and rate parameter $\lambda$ is $\alpha/\lambda$.

In [6]:
Gamma(shape=10, rate=2).mean()
Out[6]:
5.0

Example. The expected value of a Binomial($n$, $p$) distribution is $np$.

In [7]:
Binomial(n=5, p=0.4).mean()
Out[7]:
2.0
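
For a discrete distribution, the expected value is the pmf-weighted sum of the possible values, so the formula $np$ can be verified directly. This is a standard-library sketch, not Symbulate code; `binomial_mean_from_pmf` is a hypothetical helper name.

```python
import math

def binomial_mean_from_pmf(n, p):
    """E(X) = sum over x of x * P(X = x) for X ~ Binomial(n, p)."""
    return sum(
        x * math.comb(n, x) * p**x * (1 - p)**(n - x)
        for x in range(n + 1)
    )

binomial_mean = binomial_mean_from_pmf(5, 0.4)  # compare with n * p = 2.0
gamma_mean = 10 / 2                             # shape / rate for Gamma(10, 2)

print(binomial_mean, gamma_mean)
```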

There are some technicalities regarding whether an expected value is defined or finite. If $E(X)$ is defined, it can be written as $E(X) = E(\max(X,0)) - E(-\min(X, 0))$. The expected value of a random variable is finite only if both $E(\max(X,0))$ and $E(-\min(X, 0))$ are finite.

The expected value is infinite if exactly one of $E(\max(X,0))$ and $E(-\min(X, 0))$ is infinite and the other is finite.

Example. The expected value of an $F$ distribution with $d_2$ degrees of freedom in the denominator is infinite if $d_2\le 2$.

In [8]:
F(dfN=10, dfD=2).mean()
Out[8]:
inf

The expected value of a random variable $X$ is undefined if both $E(\max(X, 0))$ and $E(-\min(X, 0))$ are infinite.

Example. The expected value of a Cauchy distribution is undefined.

In [9]:
Cauchy(loc=0, scale=1).mean()
Out[9]:
nan
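
To see why the Cauchy mean is undefined, consider the positive part of a standard Cauchy random variable. Integrating $x f(x)$ over $[0, T]$ with $f(x) = 1/(\pi(1+x^2))$ gives $\ln(1+T^2)/(2\pi)$, which grows without bound as $T\to\infty$; by symmetry the negative part behaves the same way. The sketch below simply evaluates that closed form for increasing $T$ (the function name is illustrative, and nothing here calls Symbulate).

```python
import math

def cauchy_truncated_positive_part(T):
    """Integral of x * f(x) over [0, T] for the standard Cauchy pdf
    f(x) = 1 / (pi * (1 + x^2)), using the closed form ln(1 + T^2) / (2 * pi)."""
    return math.log(1 + T**2) / (2 * math.pi)

# The truncated integral keeps growing as the cutoff T increases,
# so E(max(X, 0)) is infinite (and likewise E(-min(X, 0))).
for T in (10, 1e3, 1e6, 1e12):
    print(T, cauchy_truncated_positive_part(T))
```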

Variance (var) and standard deviation (sd)

For a named Distribution, calling Distribution.var() returns its variance and Distribution.sd() returns its standard deviation.

Example. The variance of a Gamma distribution with shape parameter $\alpha$ and rate parameter $\lambda$ is $\alpha/\lambda^2$.

In [10]:
Gamma(shape=10, rate=2).var()
Out[10]:
2.5

Example. The standard deviation of a Binomial($n$, $p$) distribution is $\sqrt{np(1-p)}$.

In [11]:
Binomial(n=5, p=0.4).sd()
Out[11]:
1.0954451150103321
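
The two formulas above are easy to evaluate by hand as a sanity check. This sketch uses plain Python arithmetic, with illustrative variable names.

```python
import math

# Gamma(shape=10, rate=2): variance = shape / rate^2
gamma_var = 10 / 2**2

# Binomial(n=5, p=0.4): sd = sqrt(n * p * (1 - p))
binomial_sd = math.sqrt(5 * 0.4 * (1 - 0.4))

print(gamma_var, binomial_sd)
```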

Similar to the expected value, technicalities can be encountered regarding whether a variance is defined and finite.

Example. For a Student's t distribution, the variance is undefined for 1 degree of freedom, infinite for 2 degrees of freedom, and finite for at least 3 degrees of freedom.

In [12]:
StudentT(df=1).var()
Out[12]:
nan
In [13]:
StudentT(df=2).var()
Out[13]:
inf
In [14]:
StudentT(df=3).var()
Out[14]:
3.0
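
The three cases can be summarized in a small function. This is a sketch under the standard result that a Student's t distribution with more than 2 degrees of freedom has variance $df/(df-2)$; `student_t_variance` is a hypothetical helper, not a Symbulate function.

```python
import math

def student_t_variance(df):
    """Variance of a Student's t distribution with df degrees of freedom."""
    if df > 2:
        return df / (df - 2)  # finite case
    if df > 1:
        return math.inf       # mean exists, but the variance is infinite
    return math.nan           # mean undefined, so the variance is undefined

for df in (1, 2, 3):
    print(df, student_t_variance(df))
```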

Quantile

For a named Distribution, calling Distribution.quantile(p) returns the $p$th quantile $Q(p)$.

Roughly, the value $x$ is the $p$th percentile (or quantile) of a distribution of a variable if $p$ percent of values of the variable are less than or equal to $x$. For example, saying that 630 is the 90th percentile of SAT Math scores means that 90% of scores are 630 or less.

The cumulative distribution function (cdf) of a random variable fills in the blank in the following: $x$ is the [blank] percentile. On the other hand, the quantile function, which is essentially the inverse cdf, fills in the blank in the following: the $p$th percentile is [blank].

If a cdf $F$ is continuous and strictly increasing on the range of possible values, then its inverse $F^{-1}:[0,1]\mapsto\mathbb{R}$ exists and is called the quantile (or percentile) function: $Q(p) = F^{-1}(p)$ for $p\in[0, 1]$. That is, if $x^*=Q(p)$ then $F(x^*)=p$.

Example. Scores on the SAT Math exam have an approximate Normal distribution with mean 500 and standard deviation 100. The 90th percentile of scores is about 630, meaning that about 90% of scores are 630 or less.

In [15]:
Normal(mean=500, sd=100).quantile(0.9)
Out[15]:
628.15515655446006
In [16]:
Normal(mean=500, sd=100).cdf(628.155)
Out[16]:
0.89999972524925842
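
The inverse relationship between the cdf and the quantile function can be checked with a round trip. This sketch assumes Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of the quantile function; it is independent of Symbulate.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

q90 = sat.inv_cdf(0.9)      # 90th percentile, Q(0.9)
round_trip = sat.cdf(q90)   # F(Q(0.9)) should recover 0.9

print(q90, round_trip)
```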

For many continuous distributions, the quantile function can be defined as the inverse cdf. However, in some cases the inverse cdf is not well defined. This is true in particular for discrete distributions, since the cdf of a discrete distribution is a step function. For example, for a Binomial(5, 0.5) distribution, $F(x) = 0.5$ for $2\le x <3$, so there is not a unique value $x^*$ which satisfies $F(x^*)=0.5$. Also, since $F(3) = 0.8125$, there is no value $x^*$ for which $F(x^*) = p$ when $0.5 < p < 0.8125$.

In [17]:
Binomial(5, 0.5).cdf(2)
Out[17]:
0.5
In [18]:
Binomial(5, 0.5).cdf(2.3)
Out[18]:
0.5
In [19]:
Binomial(5, 0.5).cdf(3)
Out[19]:
0.8125

To account for such situations in which the inverse cdf is not well defined, the quantile (or percentile) function $Q:[0,1]\mapsto\mathbb{R}$ is defined as

$$ Q(p) = \inf\{x: F(x) \ge p\}, \qquad \text{for } 0\le p \le 1. $$

When $F^{-1}$ exists, $Q=F^{-1}$.

Example. For a Binomial(5, 0.5) distribution, $F(x) = 0.5$ for $2\le x <3$, and $F(x)<0.5$ for $x < 2$, so $Q(0.5) = 2$. Also, since $F(3)=0.8125$, we have $Q(p) = 3$ for $0.5 < p \le 0.8125$.

In [20]:
Binomial(n=5, p=0.5).quantile(0.5)
Out[20]:
2.0
In [21]:
Binomial(n=5, p=0.5).quantile(0.6)
Out[21]:
3.0
In [22]:
Binomial(n=5, p=0.5).quantile(0.75)
Out[22]:
3.0
In [23]:
Binomial(n=5, p=0.5).quantile(0.8126)
Out[23]:
4.0
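
The definition $Q(p) = \inf\{x: F(x) \ge p\}$ can be implemented directly for a Binomial distribution by scanning the possible values in increasing order. This is a minimal standard-library sketch, not how Symbulate computes quantiles; `binomial_cdf` and `binomial_quantile` are hypothetical helper names, and the sketch assumes $0 < p \le 1$.

```python
import math

def binomial_cdf(x, n, p):
    """F(x) = P(X <= x) for X ~ Binomial(n, p); x may be any real number."""
    k = math.floor(x)
    if k < 0:
        return 0.0
    return sum(
        math.comb(n, j) * p**j * (1 - p)**(n - j)
        for j in range(min(k, n) + 1)
    )

def binomial_quantile(prob, n, p):
    """Smallest value x in 0..n with F(x) >= prob (assumes 0 < prob <= 1)."""
    for x in range(n + 1):
        if binomial_cdf(x, n, p) >= prob:
            return x
    return n  # guards against rounding when prob is 1

for prob in (0.5, 0.6, 0.75, 0.8126):
    print(prob, binomial_quantile(prob, 5, 0.5))  # 2, 3, 3, 4
```

Because the cdf is a step function, every $p$ in $(0.5, 0.8125]$ maps to the same quantile, 3.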

Median

Roughly, the median is the 50th percentile of a distribution. If a distribution has quantile function $Q$, then $Q(0.5)$ is the median. For a named Distribution, calling Distribution.median() returns its median.

Example. For a Normal distribution, the median and the mean are equal.

In [24]:
Normal(mean=500, sd=100).median()
Out[24]:
500.0

Example. As discussed above, the median of a Binomial(5, 0.5) distribution is $Q(0.5) = 2$. For a Binomial($n$, $p$) distribution, if the mean $np$ is a whole number, then the median is also equal to $np$.

In [25]:
Binomial(5, 0.5).median()
Out[25]:
2.0
In [26]:
Binomial(1000, 0.01).median()
Out[26]:
10.0

Example. The mean of a Cauchy distribution is undefined, but the median is equal to its location parameter.

In [27]:
Cauchy(loc=0, scale=1).median()
Out[27]:
0.0
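
The Cauchy median follows from its cdf, $F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x - \text{loc}}{\text{scale}}\right)$, whose inverse is $Q(p) = \text{loc} + \text{scale}\cdot\tan(\pi(p - \frac{1}{2}))$. The sketch below evaluates these standard closed forms with the Python standard library; the function names are illustrative and not part of Symbulate.

```python
import math

def cauchy_cdf(x, loc=0.0, scale=1.0):
    """F(x) for a Cauchy distribution with the given location and scale."""
    return 0.5 + math.atan((x - loc) / scale) / math.pi

def cauchy_quantile(p, loc=0.0, scale=1.0):
    """Q(p) = F^{-1}(p) for 0 < p < 1."""
    return loc + scale * math.tan(math.pi * (p - 0.5))

print(cauchy_quantile(0.5))                    # median of Cauchy(0, 1)
print(cauchy_quantile(0.5, loc=3.0, scale=2))  # median equals loc
```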