Symbulate Documentation

Probability Spaces

A probability space consists of a sample space of possible outcomes and a probability measure that specifies how to assign probabilities to events involving those outcomes. Several common probability spaces are available in Symbulate. Users can also define their own probability spaces.

  1. BoxModel: Define a simple box model probability space.
  2. Draw: Draw an outcome according to a probability model.
  3. ProbabilitySpace: Define more general probability spaces.
  4. Independent spaces: Combine independent probability spaces.

Be sure to import Symbulate during a session using the following commands.

In [1]:
from symbulate import *
%matplotlib inline

BoxModel

The probability space in many elementary situations can be defined via a "box model." To define a Symbulate BoxModel, enter a list representing the tickets in the box. For example, rolling a fair six-sided die could be represented as a box model with six tickets labeled 1 through 6.

In [2]:
die = [1, 2, 3, 4, 5, 6]
roll = BoxModel(die)

The list of numbers could also have been created using range() in Python. Remember that Python counts from 0 by default, so a start value of 1 must be given explicitly. Remember also that range gives you all the values up to, but not including, the last value, which is why the stop value is written as 6+1.

In [3]:
die = list(range(1, 6+1)) # this is just a list of the numbers 1 through 6
roll = BoxModel(die)
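
As a quick check of the range() behavior described above, the following plain-Python snippet (no Symbulate required) compares a few calls. The variable names too_short and from_zero are illustrative only.

```python
# range(start, stop) includes start but excludes stop.
too_short = list(range(1, 6))    # stops at 5, so the 6 is missing
die = list(range(1, 6 + 1))      # stops at 6, as intended
from_zero = list(range(6))       # starts at 0 when no start value is given

print(too_short)   # [1, 2, 3, 4, 5]
print(die)         # [1, 2, 3, 4, 5, 6]
print(from_zero)   # [0, 1, 2, 3, 4, 5]
```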

Draw

BoxModel itself just defines the model; it does not return any values. (The same is true for any probability space.) The .draw() method can be used to simulate one draw from the BoxModel (or any probability space).

In [4]:
roll.draw()
Out[4]:
5
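
To illustrate the distinction between defining a model and drawing from it, here is a minimal sketch using only the Python standard library. The class TinyBox is hypothetical and is not how Symbulate implements BoxModel; it only mimics the idea that creating the object stores the tickets, while each call to draw() simulates one new outcome.

```python
import random

class TinyBox:
    """Hypothetical stand-in for a box model: stores tickets, draws on request."""

    def __init__(self, tickets, seed=None):
        self.tickets = list(tickets)
        self._rng = random.Random(seed)  # seed is optional, for reproducibility

    def draw(self):
        # Each call simulates one new, equally likely draw from the box.
        return self._rng.choice(self.tickets)

box = TinyBox([1, 2, 3, 4, 5, 6], seed=0)  # defines the model only
draws = [box.draw() for _ in range(5)]     # five separate simulated draws
print(draws)
```

Creating box produces no outcome by itself; outcomes appear only when draw() is called.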

BoxModel options

  • box: A list of "tickets" to sample from.
  • size: How many tickets to draw from the box.
  • replace: True if the draws are made with replacement; False if without replacement.
  • probs: Probabilities that the tickets are selected. By default, all tickets are equally likely.
  • order_matters: True if different orderings of the same tickets drawn are counted as different outcomes; False if the order in which the tickets are drawn is irrelevant.

Multiple tickets can be drawn from the box using the size argument.

In [5]:
BoxModel(die, size=3).draw()
Out[5]:
(4, 4, 3)

By default BoxModel assumes equally likely tickets. This can be changed using the probs argument, by specifying a probability value for each ticket.

Example. Suppose 32% of Americans are Democrats, 27% are Republicans, and 41% are Independents. Five randomly selected Americans are surveyed about their political party affiliation.

This situation could be represented as sampling with replacement from a box with 100 tickets, of which 32 are labeled Democrat, 27 Republican, and 41 Independent, and from which 5 tickets are drawn. But rather than specifying a list of 100 tickets, we can just specify the three tickets and the corresponding probabilities with probs.

In [6]:
BoxModel(['D', 'R', 'I'], probs=[0.32, 0.27, 0.41], size=5).draw()
Out[6]:
('D', 'R', 'R', 'I', 'D')

The probs argument requires that the probabilities are already normalized to sum to 1. Non-normalized values can be handled by entering the tickets as a dictionary, specifying the label on each ticket and the number of tickets in the box with that label. Note that a dictionary is enclosed in braces {} rather than brackets [].

The following code is equivalent to the previous code which used the probs option.

In [7]:
BoxModel({'D': 32,'R': 27, 'I': 41}, size=5).draw()
Out[7]:
('I', 'I', 'D', 'R', 'R')
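
The equivalence between ticket counts and probabilities can be checked with plain Python. The sketch below does not use Symbulate; it writes out the 100-ticket box explicitly and, separately, normalizes the counts so that they sum to 1. The names explicit_box, probs, and proportions are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

counts = {'D': 32, 'R': 27, 'I': 41}

# Option 1: write out every ticket in the box explicitly.
explicit_box = [label for label, n in counts.items() for _ in range(n)]

# Option 2: normalize the counts so that they sum to 1.
total = sum(counts.values())
probs = {label: Fraction(n, total) for label, n in counts.items()}

# Proportion of each label among the explicit tickets.
proportions = {label: Fraction(n, len(explicit_box))
               for label, n in Counter(explicit_box).items()}

print(len(explicit_box))
print({label: float(p) for label, p in probs.items()})
print(proportions == probs)
```

Both descriptions assign the same probability to each label, which is why the dictionary form and the probs form describe the same box.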

By default BoxModel assumes sampling with replacement; each ticket is placed back in the box before the next ticket is selected. Sampling without replacement can be handled with replace=False. (The default is replace=True.)

Example. Two people are selected at random from Anakin, Bella, Frodo, Harry, Katniss to go on a quest.

In [8]:
BoxModel(['A','B','F','H','K'], size=2, replace=False).draw()
Out[8]:
('A', 'F')
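
The difference between the two sampling schemes can be sketched with the standard library's random module rather than Symbulate. Here random.choices stands in for sampling with replacement and random.sample for sampling without replacement; the seed is set only to make the sketch reproducible.

```python
import random

people = ['A', 'B', 'F', 'H', 'K']
rng = random.Random(0)  # seeded only so the sketch is reproducible

# With replacement: the same ticket can appear more than once.
with_replacement = tuple(rng.choices(people, k=2))

# Without replacement: the two tickets are always different.
without_replacement = tuple(rng.sample(people, k=2))

# Without replacement, it is impossible to draw more tickets than the box holds.
try:
    rng.sample(people, k=6)
    oversample_ok = True
except ValueError:
    oversample_ok = False

print(with_replacement, without_replacement, oversample_ok)
```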

Note that by default, BoxModel returns ordered outcomes, e.g. ('A', 'B') is distinct from ('B', 'A'). To return unordered outcomes, set order_matters=False.
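
To see how much the choice of ordered versus unordered outcomes changes the sample space, the following standard-library sketch counts both kinds of outcome for the quest example (two of five people, drawn without replacement). It uses itertools instead of Symbulate, and makes no claim about how Symbulate represents unordered outcomes internally.

```python
from itertools import combinations, permutations

people = ['A', 'B', 'F', 'H', 'K']

# Ordered outcomes: ('A', 'B') and ('B', 'A') are counted separately.
ordered = list(permutations(people, 2))

# Unordered outcomes: each pair of people is counted once.
unordered = list(combinations(people, 2))

print(len(ordered))    # 20
print(len(unordered))  # 10
```

With equally likely tickets, each ordered outcome has probability 1/20, while each unordered pair has probability 1/10.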

ProbabilitySpace

Symbulate has many common probability models built in. The ProbabilitySpace command allows for user-defined probability models. The first step in creating a probability space is to define a function that explains how to draw one outcome.

Example. Ten percent of all e-mail is spam. Thirty percent of spam e-mails contain the word "money", while 2% of non-spam e-mails contain the word "money". Suppose an e-mail contains the word "money". What is the probability that it is spam?

We can think of the sample space as consisting of pairs of the possible e-mail types (spam or not) and wordings (money or not), with the probability measure following the above specifications. First we draw from a BoxModel to determine the e-mail type. Then, depending on the result of the first draw, we draw from one of two BoxModels to determine the wording. The function spam_sim below encodes these specifications; note the use of .draw().

In [9]:
def spam_sim():
    email_type = BoxModel(["spam", "not spam"], probs=[.1, .9]).draw()
    if email_type == "spam":
        has_money = BoxModel(["money", "no money"], probs=[.3, .7]).draw()
    else:
        has_money = BoxModel(["money", "no money"], probs=[.02, .98]).draw()
    return email_type, has_money

A ProbabilitySpace can be created once the specifications of the simulation have been defined through a function.

In [10]:
P = ProbabilitySpace(spam_sim)
P.draw()
Out[10]:
('not spam', 'no money')
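
The question posed in the example can also be answered exactly with Bayes' rule, which gives a value to compare against simulation results. The sketch below uses only the standard library's fractions module; the variable names are illustrative.

```python
from fractions import Fraction

p_spam = Fraction(10, 100)                 # P(spam)
p_money_given_spam = Fraction(30, 100)     # P(money | spam)
p_money_given_not_spam = Fraction(2, 100)  # P(money | not spam)

# Joint probabilities of the two ways an e-mail can contain "money".
p_spam_and_money = p_spam * p_money_given_spam
p_not_spam_and_money = (1 - p_spam) * p_money_given_not_spam

# Law of total probability, then Bayes' rule.
p_money = p_spam_and_money + p_not_spam_and_money
p_spam_given_money = p_spam_and_money / p_money

print(p_spam_given_money)         # 5/8
print(float(p_spam_given_money))  # 0.625
```

Among many simulated outcomes from P that contain "money", the proportion that are spam should therefore be close to 0.625.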

Commonly used probability spaces

Symbulate has many commonly used probability spaces built in. Here are just a few examples.

In [11]:
Binomial(n=10, p=0.5).draw()
Out[11]:
6
In [12]:
Normal(mean=0, sd=1).draw()
Out[12]:
0.3619289993227831
In [13]:
mean_vector = [0, 1, 2]
cov_matrix = [[1.00, 0.50, 0.25],
              [0.50, 2.00, 0.00],
              [0.25, 0.00, 4.00]]

MultivariateNormal(mean = mean_vector, cov = cov_matrix).draw()
Out[13]:
(-0.22749679873889656, 1.6296631715075314, 0.79580835264536565)

Independent probability spaces

Independent probability spaces can be constructed by multiplying (* in Python) two probability spaces. The product * syntax reflects that, under independence, joint probabilities are products of marginal probabilities. For example, events $A$ and $B$ are independent if and only if $P(A\cap B) = P(A)P(B)$.

Multiple independent copies of a probability space can be created by raising a probability space to a power (** in Python).

Example. Roll a fair six-sided die and a fair four-sided die.

In [14]:
die6 = list(range(1, 6+1, 1))
die4 = list(range(1, 4+1, 1))
rolls = BoxModel(die6) * BoxModel(die4)
rolls.draw()
Out[14]:
(4, 2)
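
The product rule behind the * syntax can be checked by enumerating the outcomes of the two dice directly. This sketch uses itertools and fractions from the standard library rather than Symbulate, and the dictionary joint is an illustrative name.

```python
from fractions import Fraction
from itertools import product

die6 = list(range(1, 6 + 1))
die4 = list(range(1, 4 + 1))

# Every (six-sided roll, four-sided roll) pair in the product space.
pairs = list(product(die6, die4))

# Under independence, each joint probability is a product of marginals.
joint = {(a, b): Fraction(1, len(die6)) * Fraction(1, len(die4))
         for a, b in pairs}

print(len(pairs))           # 24
print(joint[(4, 2)])        # 1/24
print(sum(joint.values()))  # 1
```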

Example. A triple of independent outcomes.

In [15]:
(BoxModel(['H', 'T']) * Poisson(lam=2) * Exponential(rate=5)).draw()
Out[15]:
('H', 3, 0.17962033022575485)

Example. Four independent Normal(0,1) values.

In [16]:
P = Normal(mean=0, sd=1) ** 4
P.draw()
Out[16]:
(1.500833820300937,
 0.5960138782343144,
 -0.015339890428991629,
 -0.48063522961405397)

Infinitely many independent copies of a probability space can be created by raising the probability space to the inf power, i.e. ** inf.

Example. Infinitely many independent Normal(0, 1) values.

In [17]:
P = Normal(mean=0, sd=1) ** inf
P.draw()
Out[17]:
(0.9051545133819178, -0.6795117235056013, -0.18059828981272197, -0.16974084974970854, -0.7238363446156592, -0.8130618472750923, -0.5101024347016073, 1.9570595463860194, 1.9213150322634953, 1.55599067233092, '...')
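
One way to think about an outcome with infinitely many components is as a sequence whose values are produced only when requested. The sketch below illustrates that idea with a standard-library generator; the function normal_sequence is hypothetical, and this is not a description of how Symbulate stores such outcomes internally.

```python
import itertools
import random

def normal_sequence(mean=0.0, sd=1.0, seed=None):
    """Yield independent Normal(mean, sd) values forever, one at a time."""
    rng = random.Random(seed)
    while True:
        yield rng.gauss(mean, sd)

seq = normal_sequence(seed=2)  # seeded only so the sketch is reproducible

# Only the values that are actually requested are ever generated.
first_ten = list(itertools.islice(seq, 10))
next_value = next(seq)  # the 11th value, generated on demand

print(first_ten)
```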