Symbulate Documentation

Simulation tools

There are several tools available for simulating outcomes from a probability space (or values of random variables or random processes) and analyzing the results. In general, the methods below can be chained, one after the other, e.g. P.sim(1000).apply(sum).tabulate().

  1. Simulate: .sim() simulates outcomes of a probability model, or values of random variables or process
  2. Get: .get() returns the results of a particular realization of the simulation.
  3. Apply: .apply() applies a function to each outcome or value
  4. Tabulate: .tabulate() creates a table summary of simulated outcomes of a probability model or simulated values of random variables
  5. Filter: .filter() and its relatives create subsets of the values which satsify some criteria.
  6. Count: .count() and its relatives count the values which satsify some criteria.
  7. Summary: Examples using tabulate, filter, and count to manipulate simulation results.

Be sure to import Symbulate using the following commands.

In [1]:
from symbulate import *
%matplotlib inline

Simulate

The .draw() extension simulates one outcome from a probability space. Many draws can be simulated .sim(); the argument is the number of outcomes or values to simulate.

Example. Simulate 100 repetitions of rolling two fair six-sided dice; each repetition involves a pair of values.

In [2]:
die = list(range(1, 6+1)) # this is just a list of the number 1 through 6
roll = BoxModel(die, size = 2)
roll.sim(100)
Out[2]:
Index Result
0(3, 6)
1(1, 5)
2(6, 3)
3(5, 5)
4(3, 2)
5(6, 1)
6(5, 3)
7(5, 1)
8(6, 1)
......
99(3, 6)

Caution: Note that every time .sim() is called new values are simulated. Store simulated values as a variable in order to perform multiple operations in different lines of code on the same set of values.

Example. Ten percent of all e-mail is spam. Thirty percent of spam e-mails contain the word "money", while 2% of non-spam e-mails contain the word "money". Simulate the email status (spam or not) and wording (money or not) for 1000 emails.

In [3]:
def spam_sim():
    email_type = BoxModel(["spam", "not spam"], probs=[.1, .9]).draw()
    if email_type == "spam":
        has_money = BoxModel(["money", "no money"], probs=[.3, .7]).draw()
    else:
        has_money = BoxModel(["money", "no money"], probs=[.02, .98]).draw()
    return email_type, has_money

P = ProbabilitySpace(spam_sim)
sims = P.sim(1000)
sims
Out[3]:
Index Result
0('not spam', 'no money')
1('not spam', 'no money')
2('not spam', 'no money')
3('not spam', 'no money')
4('not spam', 'no money')
5('not spam', 'no money')
6('not spam', 'money')
7('not spam', 'no money')
8('not spam', 'no money')
......
999('not spam', 'no money')

Get

The outcome of a particular repetition of the simulation can be accessed with .get(). Recall that Python starts an index at 0, so .get(0) returns the first simulated value, .get(1) the second, etc.

In [4]:
die = list(range(1, 6+1)) # this is just a list of the number 1 through 6
roll = BoxModel(die, size = 2)
sims = roll.sim(4)
sims
Out[4]:
Index Result
0(5, 6)
1(4, 2)
2(3, 2)
3(2, 6)
In [5]:
sims.get(0)
Out[5]:
(5, 6)
In [6]:
sims.get(2)
Out[6]:
(3, 2)

Apply

Use .apply() to apply a function to each outcome of a simulation.

Example. Roll two fair six-sided dice and compute their sum.

In [7]:
die = list(range(1, 6+1)) # this is just a list of the number 1 through 6
roll = BoxModel(die, size = 2)
roll.sim(1000).apply(sum)
Out[7]:
Index Result
010
17
29
32
48
57
64
711
87
......
9997

User defined functions can also be applied.

Example. Ten cards labeled 1, 2, $\ldots$, 10 are shuffled and dealt one at a time. Find the the probability that the number on the card matches its position in the deal for at least one of the cards. (For example, a match occurs if card 3 is the third card dealt.)

In [8]:
n = 10
labels = list(range(n)) # remember, Python starts the index at 0, so the cards are labebeled 0, ..., 9
P = BoxModel(labels, size = n, replace = False)
sims = P.sim(10000)
sims
Out[8]:
Index Result
0(1, 4, 6, 9, 2, 3, 0, 7, 8, 5)
1(0, 2, 4, 7, 5, 6, 8, 9, 1, 3)
2(7, 5, 4, 8, 1, 0, 2, 9, 6, 3)
3(4, 0, 9, 3, 7, 8, 6, 2, 5, 1)
4(9, 7, 0, 8, 4, 6, 3, 2, 5, 1)
5(2, 7, 8, 3, 4, 5, 1, 9, 0, 6)
6(7, 9, 4, 3, 8, 0, 5, 6, 1, 2)
7(9, 4, 7, 3, 2, 5, 1, 8, 6, 0)
8(2, 1, 0, 6, 4, 7, 8, 3, 9, 5)
......
9999(4, 9, 6, 7, 8, 0, 2, 5, 1, 3)
In [9]:
def is_match(x):
    for i in range(n):
        if x[i] == labels[i]:
            return 'At least one match'
    return 'No match'
    
sims.apply(is_match)
Out[9]:
Index Result
0At least one match
1At least one match
2No match
3At least one match
4At least one match
5At least one match
6At least one match
7At least one match
8At least one match
......
9999No match

Tabulate

The outcomes of a simulation can be quickly tabulated using .tabulate(). Tabulate counts the number of times each outcome occurs among the simulated values. Use .tabulate(normalize=True) to find the proportion (relative frequency) of times each outcome occurs.

Example. Roll two fair six-sided.

In [10]:
die = list(range(1, 6+1, 1)) # this is just a list of the number 1 through 6
roll = BoxModel(die, size=2)
rolls = roll.sim(10000)
rolls.tabulate()
Out[10]:
Outcome Value
(1, 1)272
(1, 2)281
(1, 3)262
(1, 4)267
(1, 5)292
(1, 6)274
(2, 1)278
(2, 2)259
(2, 3)277
(2, 4)275
(2, 5)268
(2, 6)278
(3, 1)276
(3, 2)292
(3, 3)248
(3, 4)274
(3, 5)275
(3, 6)252
(4, 1)284
......
(6, 6)284
Total10000

Now sum the dice, and approximate the distribution of the sum using the normalize option.

In [11]:
rolls.apply(sum).tabulate(normalize=True)
Out[11]:
Outcome Value
20.0272
30.0559
40.0797
50.112
60.1419
70.1698
80.1395
90.1072
100.086
110.0524
120.0284
Total0.9999999999999999

Individual entries of the table can be referenced using .tabulate()[label] where label represents the value of interest.

In [12]:
rolls.tabulate()[(2,4)]
Out[12]:
275
In [13]:
roll_sum = rolls.apply(sum).tabulate(normalize=True)
roll_sum[10] + roll_sum[11] + roll_sum[12]
Out[13]:
0.1668

By default, tabulate only tabulates those outcomes which are among the simulated values, rather than all possible outcomes. An argument can be passed to .tabulate() to tabulate all outcomes in a given list.

In [14]:
die = list(range(1, 6+1, 1)) # this is just a list of the number 1 through 6
rolls = BoxModel(die).sim(5)
rolls.tabulate(die)
Out[14]:
Outcome Value
11
20
31
40
51
62
Total5
In [15]:
# Compare with
rolls.tabulate()
Out[15]:
Outcome Value
11
31
51
62
Total5

By default, the outcomes in the table produced by .tabulate() are in alphanumeric order. A list can be passed to .tabulate() to achieve a specified order.

In [16]:
BoxModel(['a', 'b', 1, 2, 3]).sim(10).tabulate([3, 'a', 2, 'b', 1])
Out[16]:
Outcome Value
32
a2
22
b2
12
Total10

Filter

You can get the subset of simulations equal to a particular value using .filter_eq().

In [17]:
Heads = BoxModel(['H','T']).sim(10000).filter_eq('H')
Heads
Out[17]:
Index Result
0H
1H
2H
3H
4H
5H
6H
7H
8H
......
5015H

Using len (length) with the filter functions is one way to count the simulated occurrences of outcomes which satisfy some criteria.

In [18]:
len(Heads)
Out[18]:
5016

In addition to .filter_eq(), the following filter functions can be used when the values are numerical.

  • .filter_neq() subsets the values not equal to the argument
  • .filter_lt() subsets the values less than the argument
  • .filter_leq() subsets the values less than or equal to the argument
  • .filter_gt() subsets the values greater than the argument
  • .filter_geq() subsets the values greater than or equal to the argument
In [19]:
die = list(range(1, 1+6)) # this is just a list of the number 1 through 6
sims = BoxModel(die, size=2).sim(1000).apply(sum)
len(sims.filter_geq(10)) / 1000
Out[19]:
0.162

You can also define your own custom filter function. Define a function that returns True for the outcomes you want to keep, and pass the function into .filter(). For example, the following code is equivalent to using .filter_geq(10).

In [20]:
def greater_than_or_equal_to_10(x):
    return x >= 10

len(sims.filter(greater_than_or_equal_to_10)) / 1000
Out[20]:
0.162

Count

You can count the number of simulated outcomes equal to a particular value using count_eq().

In [21]:
BoxModel(['H','T']).sim(10000).count_eq('H')
Out[21]:
4997
In [22]:
die = list(range(1, 6+1, 1)) # this is just a list of the number 1 through 6
roll = BoxModel(die, size = 2)
rolls = roll.sim(10000)
rolls.count_eq((2,4))
Out[22]:
304

In addition to .count_eq(), the following count functions can be used when the values are numerical.

  • .count_neq() counts the values not equal to the argument
  • .count_lt() counts the values less than the argument
  • .count_leq() counts the values less than or equal to the argument
  • .count_gt() counts the values greater than the argument
  • .count_geq() counts the values greater than or equal to the argument
In [23]:
rolls.apply(sum).count_geq(10) / 10000
Out[23]:
0.1632

You can also count the number of outcomes which satisfy some criteria specified by a user defined function. Define a function that returns True for the outcomes you want to keep, and pass the function into .count(). For example, the following code is equivalent to using .count_geq(10).

In [24]:
def greater_than_or_equal_to_10(x):
    return x >= 10

rolls.apply(sum).count(greater_than_or_equal_to_10) / 10000
Out[24]:
0.1632

Counting can also be accomplished by creating logical (True = 1, False = 0) values according to whether an outcome satisfies some criteria and then summing.

In [25]:
rollsums = rolls.apply(sum)
sum(rollsums >= 10) / 10000
Out[25]:
0.1632

Since a mean (average) is the sum of values divided by the number of values, changing sum to mean in the above method returns the relative frequency directly (without having to divide by the number of values).

In [26]:
mean(rollsums >= 10)
Out[26]:
0.1632

Summary

The tabulate, filter, and count functions in Symbulate allow for several ways of manipulating the outcomes of a simulation. As an example, here is a recap of some of the ways Symbulate can be used to estimate the probability that the sum of two fair six-sided dice is at least 10.

In [27]:
die = list(range(1, 6+1, 1)) # this is just a list of the number 1 through 6
roll = BoxModel(die, size = 2)
rolls = roll.sim(10000)
rollsums = rolls.apply(sum)
In [28]:
(rollsums.tabulate()[10] + rollsums.tabulate()[11] + rollsums.tabulate()[12]) /10000
Out[28]:
0.166
In [29]:
(rollsums.tabulate(normalize=True)[10] + rollsums.tabulate(normalize=True)[11] + rollsums.tabulate(normalize=True)[12])
Out[29]:
0.16599999999999998
In [30]:
len(rollsums.filter_geq(10)) / 10000
Out[30]:
0.166
In [31]:
rollsums.count_geq(10) / 10000
Out[31]:
0.166
In [32]:
sum(rollsums >= 10) / 10000
Out[32]:
0.166
In [33]:
mean(rollsums >= 10)
Out[33]:
0.166
In [34]:
rollsums = RV(roll, sum)
(rollsums >= 10).sim(10000).tabulate(normalize=True)
Out[34]:
Outcome Value
False0.8342
True0.1658
Total1.0
In [ ]: