[RPG] How to calculate conditional probabilities in AnyDice

anydice, statistics

While writing the addendum to this answer, which considers the relative value of skill vs. characteristic in the "3d20 system" of Neuroshima, I found myself wanting an answer to a deceptively simple question: how many skill points are needed to succeed if the lowest roll is a natural success, vs. if it's not? In other words, I basically wanted to plot the distributions of:

the middle roll of 3d20, given that the lowest roll is less than some given threshold x; and
the sum of the lowest and middle rolls, given that the lowest roll is at least x.

In statistics, this would be just a standard conditional probability distribution, e.g. $$p_x(y) = P(Y = y \mid X < x),$$ $$q_x(z) = P(X + Y = z \mid X \ge x),$$ where \$X\$ and \$Y\$ are (interdependent) random variables representing the lowest and the middle roll of 3d20 respectively. You could compute this easily just by taking the joint distribution of \$(X,Y)\$, dropping those cases where the condition (e.g. \$X < x\$) fails, rescaling the remaining probabilities so that they sum to 1 and then optionally summing over the conditioning variable \$X\$ to obtain the marginal distribution of \$Y\$ (or \$X + Y\$).
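
Written out, the recipe above is just the usual ratio definition of conditional probability. As a sketch in the notation already introduced, with \$i\$ (a symbol added here for illustration) ranging over the possible values of the lowest roll: $$p_x(y) = \frac{P(Y = y,\ X < x)}{P(X < x)} = \frac{\sum_{i < x} P(X = i,\ Y = y)}{\sum_{i < x} P(X = i)},$$ $$q_x(z) = \frac{P(X + Y = z,\ X \ge x)}{P(X \ge x)} = \frac{\sum_{i \ge x} P(X = i,\ Y = z - i)}{\sum_{i \ge x} P(X = i)}.$$ The numerators are the "drop the failing cases" step and the denominators are the rescaling step.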

Unfortunately, there seems to be no simple built-in way to do this in AnyDice. In fact, there doesn't even seem to be any way to answer simpler conditional probability questions like, say "what is the average sum of 3d6 if the sum rolled is even, vs. if it's odd?"

Hence this question: is there any way to calculate a conditional probability distribution in AnyDice, and if so, how?

Disclaimer: I realize that this question may be borderline off-topic for this site, as it's more of a programming / math question. That said, it did arise in an RPG-related context — specifically, while writing an answer here on RPG.SE — and I suspect the answer(s) may be useful to others using AnyDice to answer similar questions about other systems as well. I'll let the community decide if this Q&A should stay here or not.

Also, I did eventually manage to come up with a (slightly hacky but workable) solution to my problem on my own, so I've posted a self-answer below. That said, other answers are more than welcome too. If there's a better way to achieve this, I would very much like to know it.

Best Answer

Use the "empty die" result to disregard cases that don't meet criteria

If we want to completely disregard a certain subset of results, we can do this by using a function which returns the "empty die", d{}, for cases which do not meet our desired conditions.

The empty die d{} appears to be a special die which has no possible results and no associated probability. Consequently, if we define a function which returns this empty die for certain input cases, it is effectively removing those cases from the set of possible results, and the result distribution that comes back from the function is as if the unwanted cases were never invoked.

Here's a simple function which limits the received input to a set of allowed values and discards cases which don't satisfy that condition:

function: if X:n in RESTRICT:s {
  if X = RESTRICT { result: X }
  result: d{}
}

Given an input X, if X can be found in the sequence of allowed values RESTRICT, all is well and we return X; otherwise, we return d{}, assigning zero probability to that particular outcome. We can use this function to restrict a 3d6 roll to only odd or even values:

output [if 3d6 in {3,5,7,9,11,13,15,17}] named "3d6 if odd"
output [if 3d6 in {4,6,8,10,12,14,16,18}] named "3d6 if even"

And we get a result that looks like this:

[Figure: AnyDice graph of 3d6 restricted to odd or even]
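
For readers who want to cross-check the AnyDice output outside AnyDice, here is a minimal Python sketch of the same idea: enumerate every 3d6 roll, drop the sums that fail the condition, and rescale what is left. The helper names `conditional_distribution` and `mean` are made up for this illustration and are not part of AnyDice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def conditional_distribution(outcomes, keep):
    """Distribution of equally likely outcomes, restricted to those where keep(o) is true."""
    # Outcomes that fail the condition are simply never counted,
    # which plays the role of returning the empty die.
    counts = Counter(o for o in outcomes if keep(o))
    total = sum(counts.values())
    # Rescale so that the surviving probabilities sum to 1.
    return {value: Fraction(n, total) for value, n in sorted(counts.items())}


def mean(dist):
    """Expected value of a {value: probability} mapping."""
    return sum(value * p for value, p in dist.items())


# All 216 equally likely 3d6 rolls, reduced to their sums.
sums_3d6 = [sum(roll) for roll in product(range(1, 7), repeat=3)]

odd = conditional_distribution(sums_3d6, lambda s: s % 2 == 1)
even = conditional_distribution(sums_3d6, lambda s: s % 2 == 0)

print("mean of 3d6 if odd: ", mean(odd))
print("mean of 3d6 if even:", mean(even))
```

Both conditional means come out as exactly 21/2: the odd and even distributions are mirror images of each other around 10.5, so conditioning on parity changes the shape of the distribution but not its average.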

This obviously extends to more interesting cases, such as the Neuroshima rules given in the question. Here's a program which shows examples of those distributions:

function: INDEX:s at DICE:s if lowest less than MIN:n {
  if (#DICE@DICE >= MIN) { result: d{} }
  result: INDEX@DICE
}

function: INDEX:s at DICE:s if lowest at least MIN:n {
  if (#DICE@DICE < MIN) { result: d{} }
  result: INDEX@DICE
}

MIN: 10

output [2 at 3d20 if lowest less than MIN] named "Middle die of 3d20 if lowest die less than [MIN]"
output [{2,3} at 3d20 if lowest at least MIN] named "Middle and lowest die of 3d20 if lowest die at least [MIN]"

These functions first discard cases which don't meet the specified condition and then give us the values we care about from the remaining dice sequences.
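
As an independent check on the two functions above, the same conditional distributions can be computed by brute force. This is only a sketch in Python, not a translation of the AnyDice internals: it enumerates all 8000 ordered 3d20 rolls, sorts each one, and files it under whichever condition it satisfies. `THRESHOLD` stands in for `MIN`; the other names are invented for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

THRESHOLD = 10  # plays the role of MIN in the AnyDice program

middle_if_low = Counter()    # middle die, given lowest < THRESHOLD
low_plus_middle = Counter()  # lowest + middle, given lowest >= THRESHOLD

for roll in product(range(1, 21), repeat=3):
    lowest, middle, _highest = sorted(roll)
    if lowest < THRESHOLD:
        middle_if_low[middle] += 1
    else:
        low_plus_middle[lowest + middle] += 1


def normalise(counts):
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {value: Fraction(n, total) for value, n in sorted(counts.items())}


p = normalise(middle_if_low)    # corresponds to p_x(y) in the question
q = normalise(low_plus_middle)  # corresponds to q_x(z) in the question

print("P(lowest < threshold) =", Fraction(sum(middle_if_low.values()), 20**3))
for value, prob in p.items():
    print(f"middle = {value:2d}: {float(prob):.4f}")
```

Each roll lands in exactly one of the two counters, so the two conditions partition the sample space, and dividing by each counter's own total is the rescaling step.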

You could of course also approach the problem the other way round: define functions which map undesired results to a bogus value (like -1), then pipe the output through a filtering function at the end which strips out any results with the bogus value. That said, I think doing the filtering as early as possible is more efficient in AnyDice, and it will probably let you get away with running more complex programs or larger dice pools.
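
The "bogus value" variant can be sketched in Python as well, purely to show the shape of the two-stage approach. The sentinel `BOGUS` and the helpers `middle_or_bogus` and `strip_bogus` are hypothetical names, and the sketch says nothing about how AnyDice itself performs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BOGUS = -1  # sentinel; must not collide with any genuine result


def middle_or_bogus(roll, threshold):
    """Stage 1: tag rolls that fail the condition instead of discarding them."""
    lowest, middle, _highest = sorted(roll)
    return middle if lowest < threshold else BOGUS


def strip_bogus(counts):
    """Stage 2: drop the sentinel and rescale the remaining outcomes."""
    kept = {value: n for value, n in counts.items() if value != BOGUS}
    total = sum(kept.values())
    return {value: Fraction(n, total) for value, n in sorted(kept.items())}


tagged = Counter(middle_or_bogus(r, 10) for r in product(range(1, 21), repeat=3))
filtered = strip_bogus(tagged)

print("rolls tagged as bogus:", tagged[BOGUS])
print("probabilities sum to: ", sum(filtered.values()))
```

The end result is the same distribution as filtering early; the difference is only that the unwanted cases are carried along until the final step.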

Background

I hit upon this empty die trick while working on this answer to another question. Essentially, I wrote a simple function that would recursively reroll 4d6-droplow until it got an 8 or better, but I realised on inspection that the result distribution it returned didn't change no matter what I set the maximum function depth to.

In AnyDice, as the documentation says, exceeding the maximum function depth simply causes the function to return the empty die. From there I figured out that the empty die is essentially a zero-probability result which does not affect the final result distribution, and that we can return it on purpose (rather than accidentally, by exceeding the function depth) if we want to disregard some category of inputs!
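
The depth observation can be reproduced with a small model. The Python sketch below is an assumption-laden imitation rather than AnyDice's actual implementation: `reroll_below` (a made-up name) rerolls 4d6-drop-lowest totals under 8 for a limited number of nested calls and throws away whatever probability mass would have needed a deeper call, which is how the empty die behaves according to the description above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

MINIMUM = 8  # reroll any total below this

# Exact distribution of 4d6, drop lowest.
counts = Counter(sum(sorted(r)[1:]) for r in product(range(1, 7), repeat=4))
base = {value: Fraction(n, 6**4) for value, n in counts.items()}


def reroll_below(depth):
    """Unnormalised outcome weights when at most `depth` nested calls are allowed."""
    if depth == 0:
        return {}  # depth exhausted: contributes nothing, like the empty die
    deeper = reroll_below(depth - 1)
    fail = sum(p for value, p in base.items() if value < MINIMUM)
    result = {value: p for value, p in base.items() if value >= MINIMUM}
    for value, p in deeper.items():
        result[value] += fail * p  # mass recovered by rerolling a failed total
    return result


def normalise(dist):
    """Rescale weights so that they sum to 1."""
    total = sum(dist.values())
    return {value: p / total for value, p in sorted(dist.items())}


shallow = normalise(reroll_below(1))
deep = normalise(reroll_below(5))
print("same distribution at depth 1 and depth 5:", shallow == deep)
```

Every extra level of depth multiplies all surviving weights by the same factor, so after rescaling the distribution is identical at any depth. That is also why "reroll until 8 or better" is the same thing as conditioning on the total being at least 8.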

Related Solution: How to model this “Party Draft Pool” ability score generation method in AnyDice

Overall this seems to benefit the first player, but it depends how you measure it.

I thought I would model this programmatically, as that is a bit more flexible than using AnyDice. I've written a script which carries out the process a large number of times and averages the values for each player.

The main difficulty here is how you actually interpret the data: what makes one array of ability scores better than another one? There are multiple ways to judge this. I'll include some values for different methods.

Total of all ability scores
Player 1 has mean ability score value 72.95522.
Player 2 has mean ability score value 73.43975.
Player 3 has mean ability score value 73.64131.
Player 4 has mean ability score value 73.72761.
Verdict: Being later in the order is better

Total of all ability scores except the lowest one
Player 1 has mean ability score value 66.49585.
Player 2 has mean ability score value 65.71347.
Player 3 has mean ability score value 65.10333.
Player 4 has mean ability score value 64.57315.
Verdict: Being earlier in the order is better

Total points buy cost of all ability scores
(I've used invented points buy scores for numbers outside the normal allowed range: 18 = 19, 17 = 15, 16 = 12, 6-7 = -1, 4-5 = -2, 3 = -1)
Player 1 has mean ability score value 33.43838.
Player 2 has mean ability score value 31.80692.
Player 3 has mean ability score value 31.00389.
Player 4 has mean ability score value 30.64647.
Verdict: Being earlier in the order is better

Total points buy cost of all ability scores except the lowest one
Player 1 has mean ability score value 34.33727.
Player 2 has mean ability score value 31.88918.
Player 3 has mean ability score value 30.39722.
Player 4 has mean ability score value 29.45149.
Verdict: Being earlier in the order is much better
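
To make the first two scoring methods above concrete, here is a tiny Python sketch. The function names and the example array are invented for illustration and are not taken from the simulation data.

```python
def total(scores):
    """Value of an array as the plain sum of all six ability scores."""
    return sum(scores)


def total_without_lowest(scores):
    """Value of an array when one dump stat (the lowest score) is ignored."""
    return sum(sorted(scores)[1:])


example = [15, 14, 13, 12, 10, 8]  # hypothetical array of ability scores

print(total(example))                 # 72
print(total_without_lowest(example))  # 64
```

The two measures differ only in whether the lowest score counts, which is the distinction that flips the verdict between the first two sets of results.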

I've included my horrible amateur code here so you can try it out.

#!/usr/bin/perl

use strict;
use warnings;
use List::Util qw(sum);
use Data::Dumper;
use POSIX;
use 5.010;

my $die_size = 6;
my $number_of_dice = 4;
my $number_of_players = 4;
my $number_of_runs = 10000;

sub get_single_ability_score {
    my @rolls;
    for (1..$number_of_dice) {
        my $roll = 1 + int rand($die_size);
        push @rolls, $roll;
    }
    @rolls = sort {$a <=> $b} @rolls;
    for (1..$number_of_dice - 3) {
        shift @rolls;
    }
    my $ability_score = sum(@rolls);
    return $ability_score;
}

sub get_total_values {
    my @group_ability_scores;
    for (1..$number_of_players * 6) {
        push @group_ability_scores, get_single_ability_score();
    }
    @group_ability_scores = sort { $b <=> $a } @group_ability_scores;
    my @player_order = (1..$number_of_players);
    my @reverse_player_order = sort { $b <=> $a } @player_order;
    @player_order = (@player_order, @reverse_player_order, @player_order, @reverse_player_order, @player_order, @reverse_player_order);
    my @player_ability_scores;
    foreach my $player (@player_order) {
        my @ability_scores;
        my $chosen_ability_score = shift @group_ability_scores;
        push @ability_scores, $chosen_ability_score;
        push @{ $player_ability_scores[$player-1] }, @ability_scores;
    }
    my @total_values;
    foreach my $player (1..$number_of_players) {
        my @ability_scores = sort { $a <=> $b } @{ $player_ability_scores[$player-1] };
        my $total_value = 0;
        shift @ability_scores; # One dump stat is fine so discard the lowest ability score
        # Point-buy cost of each score. A hash lookup gives the same values as
        # the original given/when chain without relying on that experimental feature.
        my %point_buy_cost = (
            18 => 19, 17 => 15, 16 => 12, 15 =>  9,
            14 =>  7, 13 =>  5, 12 =>  4, 11 =>  3,
            10 =>  2,  9 =>  1,  8 =>  0,  7 => -1,
             6 => -2,  5 => -3,  4 => -4,  3 => -5,
        );
        foreach my $ability_score (@ability_scores) {
            #$total_value += $ability_score;                    # Uses the score as the value
            #$total_value += floor(($ability_score - 10) / 2);  # Uses the modifier as the value
            $total_value += $point_buy_cost{$ability_score} // 0;  # Uses the point-buy cost; scores outside 3-18 add nothing
        }
        push @total_values, $total_value;
    }
    return @total_values;
}

my @all_values;
for (1..$number_of_runs) {
    my @total_values = get_total_values();
    foreach my $player (1..$number_of_players) {
        push @{ $all_values[$player-1] }, $total_values[$player-1];
    }
}

for my $player (1..$number_of_players) {
    my $total_value;
    foreach my $value (@{ $all_values[$player-1] }) {
        $total_value += $value;
    }
    my $mean_value = $total_value / $number_of_runs;
    print "Player $player has mean ability score value $mean_value.\n";
}


Related Solutions

[RPG] How to execute code depending on a die value in AnyDice

[RPG] How to model this “Party Draft Pool” ability score generation method in AnyDice
