The fairness debate

In this post, I provide a statistical analysis of my hole cards in the 7553 hands I have played during my 10,000 to 1,000,000 Bankroll Challenge to see if the observed frequencies of the 169 non-equivalent starting hands are consistent with a randomly shuffled deck.

Null hypothesis and result

The null hypothesis for this analysis is as follows:

The Replay shuffling algorithm generates a deck that follows a uniform distribution in the sense that each permutation of the 52 cards is equally likely.
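The post does not describe Replay's actual shuffling algorithm. To make "each permutation is equally likely" concrete, here is a minimal sketch of a textbook shuffle that satisfies this null hypothesis when its random source is unbiased: the Fisher-Yates (Knuth) shuffle. The function name and the 4-card toy deck are illustrative choices. On 4 cards, all 4! = 24 orderings should appear about equally often.

```python
import random
from collections import Counter

def fisher_yates_shuffle(cards, rng):
    """Return a uniformly random permutation of `cards` (Fisher-Yates / Knuth shuffle)."""
    deck = list(cards)
    # Walk from the last position down; swap each position with a uniformly
    # chosen position at or below it.
    for i in range(len(deck) - 1, 0, -1):
        j = rng.randint(0, i)  # randint includes both endpoints
        deck[i], deck[j] = deck[j], deck[i]
    return deck

# Toy check on a 4-card deck: 4! = 24 orderings, each should show up roughly
# 120,000 / 24 = 5,000 times if the shuffle is uniform.
rng = random.Random(1)
counts = Counter(tuple(fisher_yates_shuffle("ABCD", rng)) for _ in range(120_000))
print(len(counts), min(counts.values()), max(counts.values()))
```

A 52-card deck works the same way, but with 52! orderings this kind of exhaustive tally is impossible. That is why the post tests a consequence of uniformity (the hole-card frequencies) instead.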

I use Pearson’s chi-squared test to check whether the observed frequencies of the 169 non-equivalent starting hands differ significantly from the frequencies expected under this null hypothesis. The test finds that the null hypothesis cannot be rejected at the standard 5% significance level, i.e., the frequencies of my starting hands are consistent with a uniformly shuffled deck.

Details below.

Combos

There are 1326 different starting hands (‘combos’) in Texas Hold’em, because there are 52 × 51 / 2 = 1326 ways to pick two cards from a 52-card deck. Under the null hypothesis, all 1326 combos are equally likely. If we look only at the ranks of our hole cards and whether they are suited, ignoring the specific suits, we end up with 169 non-equivalent starting hands: 13 pairs, 78 offsuit unpaired hands, and 78 suited hands. There are 6 combos of each pair, 12 combos of each offsuit unpaired hand, and 4 combos of each suited hand (13 × 6 + 78 × 12 + 78 × 4 = 1326). As a consequence, the probability of being dealt

  • a specific pair (e.g., AA) is 6/1326 = 0.45%,
  • two specific offsuit hole cards (e.g., AKo) is 12/1326 = 0.90%,
  • two specific suited hole cards (e.g., AKs) is 4/1326 = 0.30%.
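These combo counts can be checked by brute force. The sketch below enumerates every two-card combination of a 52-card deck and maps each one to its starting-hand class. The card notation ("Ah" for the ace of hearts, "T" for ten) and the `hand_class` helper are illustrative choices, not part of any poker library.

```python
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"  # ascending rank order
SUITS = "cdhs"

def hand_class(c1, c2):
    """Map two cards such as 'Ah', 'Kd' to one of the 169 classes, e.g. 'AKo'."""
    r1, s1 = c1[0], c1[1]
    r2, s2 = c2[0], c2[1]
    # Put the higher rank first so ('Kd', 'Ah') and ('Ah', 'Kd') share a class.
    if RANKS.index(r1) < RANKS.index(r2):
        r1, r2 = r2, r1
    if r1 == r2:
        return r1 + r2  # pair, e.g. 'AA'
    return r1 + r2 + ("s" if s1 == s2 else "o")

deck = [r + s for r in RANKS for s in SUITS]
combos = list(combinations(deck, 2))
classes = Counter(hand_class(a, b) for a, b in combos)

print(len(combos), len(classes))
print(classes["AA"], classes["AKo"], classes["AKs"])
for name in ("AA", "AKo", "AKs"):
    print(name, f"{classes[name] / len(combos):.2%}")
```

The same hypothetical `hand_class` helper could also tally observed frequencies, assuming the hand history records each pair of hole cards in this two-character notation.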

Expected frequencies

My sample consists of 7553 hands, so to get the expected frequencies for all 169 non-equivalent starting hands, I just need to multiply the above probabilities by 7553. Thus, the expected frequency (number of times among the 7553 hands) of being dealt

  • a specific pair (e.g., AA) is 34.2,
  • two specific offsuit hole cards (e.g., AKo) is 68.4,
  • two specific suited hole cards (e.g., AKs) is 22.8.
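The arithmetic is a one-liner per hand type. As a sanity check, the expected counts over all 169 classes should add back up to the 7553 hands. The smallest expected count (22.8) is also well above 5, the usual rule of thumb for trusting the chi-squared approximation. The variable names here are illustrative.

```python
N_HANDS = 7553
TOTAL_COMBOS = 1326
COMBOS_PER_CLASS = {"pair": 6, "offsuit": 12, "suited": 4}
CLASSES_PER_KIND = {"pair": 13, "offsuit": 78, "suited": 78}

# Expected count for one class = number of hands * (combos in class / 1326).
expected = {kind: N_HANDS * n / TOTAL_COMBOS for kind, n in COMBOS_PER_CLASS.items()}
for kind, e in expected.items():
    print(f"{kind}: {e:.1f}")

# Summing over all 13 + 78 + 78 = 169 classes recovers the sample size.
total = sum(CLASSES_PER_KIND[kind] * e for kind, e in expected.items())
print(round(total, 6))
```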

Observed frequencies

The following table shows the actual number of times each non-equivalent starting hand occurred in my sample.

Pearson’s chi-squared test

Pearson’s chi-squared test “is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance” (Wikipedia). The main outcome of a statistical test is the p-value. To quote Wikipedia:

the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis.
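That definition can be turned directly into a simulation. Draw many samples under the null hypothesis and compute Pearson's statistic X² = Σ (O - E)² / E for each one. The p-value is then the fraction of samples whose X² is at least as large as the observed one. This Monte Carlo approach is a substitute for illustration only, not necessarily how the p-value in this post was computed. The three-category counts in the usage line are made up and are not data from my sample.

```python
import random

def pearson_statistic(observed, expected):
    """Pearson's X^2: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def monte_carlo_p_value(observed, probs, trials=2000, seed=0):
    """Estimate P(X^2 >= observed X^2) by simulating samples under the null."""
    n = sum(observed)
    expected = [n * p for p in probs]
    x2_obs = pearson_statistic(observed, expected)
    rng = random.Random(seed)
    categories = range(len(probs))
    at_least_as_extreme = 0
    for _ in range(trials):
        # One simulated sample of n draws from the null distribution.
        counts = [0] * len(probs)
        for d in rng.choices(categories, weights=probs, k=n):
            counts[d] += 1
        if pearson_statistic(counts, expected) >= x2_obs:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

# Hypothetical example: 100 rolls of a supposedly fair three-sided die.
observed = [30, 25, 45]
print(pearson_statistic(observed, [100 / 3] * 3))
print(monte_carlo_p_value(observed, [1 / 3, 1 / 3, 1 / 3]))
```

With 169 categories and 7553 hands, the same function applies unchanged. It is just slower per trial.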

A null hypothesis is rejected if the p-value is smaller than a significance level chosen before running the test. Typical significance levels are 5% or 1%. Smaller values decrease the probability of rejecting a true null hypothesis (a type I error) but increase the probability of failing to reject a false one (a type II error).

For this analysis, I choose a standard 5% significance level.

Result

We have 169 non-equivalent starting hands as categories, and I have reported the expected and observed frequencies for each category above, so the test statistic has 169 - 1 = 168 degrees of freedom. Applying the chi-squared test yields a p-value of 78.8%. This is (much) larger than the 5% significance level, so we cannot reject the null hypothesis that the shuffling algorithm generates a uniformly distributed deck.
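Converting the Pearson statistic into a p-value requires the upper tail of the chi-squared distribution with 168 degrees of freedom. In practice, `scipy.stats.chisquare(f_obs, f_exp)` or `scipy.stats.chi2.sf(x, df)` does this in one call. As a dependency-free sketch, the tail probability equals the regularized upper incomplete gamma function Q(df/2, x/2), evaluated below with the classic series / continued-fraction pair described in Numerical Recipes. The function names are illustrative. The statistic passed in would come from the observed-frequency table, which is not reproduced here.

```python
import math

def _gamma_p_series(a, x):
    """Lower regularized gamma P(a, x) by its power series (use for x < a + 1)."""
    term = total = 1.0 / a
    for n in range(1, 100_000):
        term *= x / (a + n)
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _gamma_q_cont_frac(a, x):
    """Upper regularized gamma Q(a, x) by Lentz's continued fraction (use for x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 100_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_sf(x, df):
    """P(X >= x) for X ~ chi-squared(df), i.e. Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    if half_x < a + 1.0:
        return 1.0 - _gamma_p_series(a, half_x)
    return _gamma_q_cont_frac(a, half_x)

# The 95% quantile of chi-squared(1) should give a tail probability of 5%.
print(chi2_sf(3.841458820694124, df=1))
```

Calling `chi2_sf(x2, df=168)`, with `x2` computed from the 169 observed and expected counts, should reproduce the reported p-value, assuming the same (uncorrected) Pearson statistic was used.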
