Finding Equilibrium

BlackWidow · January 29, 2022, 10:58am

Well, you can assume that if you want to but it’s not necessary or beneficial for calculating the Nash equilibrium of the one-shot game.

Having strategies for P1 and P2 which over a sequence of games produce the following sequence of expected profits

Game	P1 SB	P2 SB	P1 SB	P2 SB	P1 SB	P2 SB	…
P1 expected profit including paid blinds	0	0	0	0	0	0	…
P2 expected profit including paid blinds	0	0	0	0	0	0	…

is not the same as strategies which produce this sequence:

Game	P1 SB	P2 SB	P1 SB	P2 SB	P1 SB	P2 SB	…
P1 expected profit including paid blinds	0.5	-0.5	0.5	-0.5	0.5	-0.5	…
P2 expected profit including paid blinds	-0.5	0.5	-0.5	0.5	-0.5	0.5	…

Both give zero EV (including paid blinds) in the long run, but of course if a player can make 0.5 on average in the SB (even against optimal play by the BB), then they should use that strategy rather than the one that gives 0.

BlackWidow · January 29, 2022, 11:32am

Without going through all the calculations in detail, here are the most important numbers for my candidate Nash equilibrium described above:

Player 1 (Player 1 does not play a mixed strategy, so action EV = averaged action EV)
Action EV of calling with an A: 10/3
Action EV of calling with a K: 1/3
Action EV of folding a Q: 0

Thus:

Initial spot EV of P1: 1/3 * 10/3 + 1/3 * 1/3 + 1/3 * 0 = 11/9 > 1

Player 2

If P1 folds (probability 1/3):
Spot EV: 3 (the initial pot)

If P1 calls (probability 2/3):
Action EV of raising an A: 2/3 * 4 + 1/3 * 6 = 14/3
Action EV of checking back a K: 0
Action EV of raising a Q: 0
Action EV of checking back a Q: 0
Average action EV of a Q: 0

To compute the spot EV, we need to compute the probability that P2 has an A in this spot. Since P2 can have an A only if P1 has a K and P2 is equally likely to have an A or a Q if P1 has a K, the probability that P2 has an A in this spot is 25%. Thus:

Spot EV: 25% * 14/3 + 75% * 0 = 7/6

So before P1’s first action, P2’s average spot EV is

1/3 * 3 + 2/3 * 7/6 = 16/9 < 2

In conclusion, with these strategies, the player in the SB/Button wins on average 2/9 over the initial $1 SB and the player in the BB accordingly loses 2/9 relative to their initial $2 BB.
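As a quick sanity check on the arithmetic above, here is a minimal Python sketch that recomputes both spot EVs with exact fractions. The action EVs are taken directly from the numbers quoted in this post; the variable names are made up for illustration.

```python
from fractions import Fraction as F

# Action EVs quoted above for Player 1 (A, K, Q are equally likely).
p1_action_ev = {"A": F(10, 3), "K": F(1, 3), "Q": F(0)}
p1_spot_ev = sum(F(1, 3) * ev for ev in p1_action_ev.values())

# Player 2: P1 folds a Q (probability 1/3), otherwise calls (probability 2/3).
p2_ev_after_fold = F(3)                        # P2 collects the initial pot
p2_raise_ace_ev = F(2, 3) * 4 + F(1, 3) * 6    # action EV of raising an A
# Given a call, P1 holds a K half the time, and then P2 holds an A half the time.
p_ace_given_call = F(1, 2) * F(1, 2)
p2_ev_after_call = p_ace_given_call * p2_raise_ace_ev  # K and Q contribute 0
p2_spot_ev = F(1, 3) * p2_ev_after_fold + F(2, 3) * p2_ev_after_call

print(p1_spot_ev, p2_spot_ev, p1_spot_ev + p2_spot_ev)  # 11/9 16/9 3
```

The two spot EVs add up to the initial pot of 3, as they should in a zero-sum game without rake.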

SunPowerGuru · January 29, 2022, 11:56am

I would rather not assume anything.

How does the concept of a mixed strategy apply to a “one-shot” game? Doesn’t “mixed” by definition, imply multiple iterations?

How does one exploit a one-shot game? Is it even possible to exploit a weakness you have never detected, or once detected, the game has ended? Without the possibility of exploitation, what does “unexploitable” even mean?

I’m going to bed, have fun kids.

Craig_Anthony · January 29, 2022, 12:58pm

I’ve been reading this for over a week now. Doesn’t seem like a candidate for Nash Equilibrium but what do I know ….

BlackWidow · January 29, 2022, 11:28pm

A mixed strategy is just a strategy which can randomize the choice of actions taken in each node of the decision tree. As such, it can also be applied in a single game; just throw a die or use a random number generator to decide which action to take.
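To make the "random number generator" idea concrete, here is a small Python sketch of drawing one action from a mixed strategy for a single decision. The function name and the 1/3 vs. 2/3 mix are hypothetical and only serve as an illustration; they are not the equilibrium frequencies of the game discussed above.

```python
import random

def choose_action(mixed_strategy, rng=random):
    """Pick one action from a dict {action: probability} for a single decision."""
    actions = list(mixed_strategy)
    weights = [mixed_strategy[a] for a in actions]
    # random.choices draws with the given relative weights; k=1 means one draw.
    return rng.choices(actions, weights=weights, k=1)[0]

# Hypothetical mix: raise one third of the time, check otherwise.
strategy = {"raise": 1 / 3, "check": 2 / 3}
print(choose_action(strategy))  # either "raise" or "check"
```

Even if the game is played only once, the strategy is still the probability table, not the single action that happened to be drawn.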

If you know (parts of) your opponent’s strategy, then you can try to construct exploitative counter-strategies. If you don’t know it at all, then you can’t exploit your opponent in a one-shot game and it is probably a good idea to follow the Nash strategy which guarantees that you can’t be exploited yourself.

Detecting exploitable weaknesses is indeed based on observing betting patterns, frequencies, and bet sizes over many hands. How to implement weakness detection and to construct effective counter-strategies is another very interesting question, but it’s not directly relevant for the construction of Nash equilibria.

SunPowerGuru · January 30, 2022, 5:42pm

I don’t think so. Player 1, who starts in the SB, should use that strategy, because he oscillates between winning and breaking even. If the series ends when he is up, he will show a small profit.

P2, on the other hand, oscillates between losing and breaking even, so he would be better off using the all zeros strategy. Doing so, he has no chance of booking a losing session.

This, of course, is even more correct if you are only playing 1 hand.

And thanks for taking the time to explain basic terms like “mixed strategy.” I do understand the terms, my point was that they are more or less meaningless in a one hand game.

SunPowerGuru · January 30, 2022, 6:10pm

OK, so I go to the store and buy a can marked “Mixed Nuts.” I take it home and, to my great dismay, discover that the can contains one peanut. One peanut is not “mixed nuts.”

I buy 3 more cans of “Mixed Nuts,” and find that one contains 1 cashew, one contains 1 almond, and the last one contains 1 pecan. None of these cans, in isolation, contains mixed nuts. It’s only when we view them in the larger context of a group that they can be called “mixed.”

Now Schrödinger would probably argue that each unopened can contains an indeterminate superposition of all possible nuts, and that opening each can caused the probabilistic wave function to collapse into a stable, determinate eigenstate.

If he tried to pull that with me, I would say, “Yeah, whatever scooter,” roll my eyes, then eat the cashew before someone else grabbed it.

And yes, I do know that I should just leave this whole thing alone, I really do, but I am having fun with it, so meh.

BlackWidow · February 22, 2022, 11:44pm

Game 2: The preflop raise/shove/fold game

I will present here another toy game. This is a slight modification of a game described in Will Tipton’s excellent book “Expert Heads Up No Limit Hold’em” (Volume 1).

2 Players in the SB and BB paying 0.5 bb and 1 bb blinds, respectively.
Both players have initial stack sizes of S bb (before posting the blinds).
The deck is a standard 52 card deck and both players are dealt two cards from the deck at random.
The betting structure is fixed as follows:
- The SB can either fold or raise to R bb; here, R is some given number and can’t be controlled by the SB for the purposes of this game.
- If the SB raises to R, then the BB can either fold or raise all-in.
- If the BB raises all-in, then the SB can either fold or call. If the SB calls, flop, turn, and river are dealt without further betting rounds, and the hand goes to showdown.

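The betting structure can be visualized in a decision tree as follows:

Node A (SB to act)
- Fold: Node B, the SB’s stack is S-0.5.
- Raise to R: Node C (BB to act)
  - Fold: Node D, the SB’s stack is S+1.
  - All-in: Node E (SB to act)
    - Fold: Node F, the SB’s stack is S-R.
    - Call: Node G, both players are all-in and the hand goes to showdown.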

My aim in this post is to derive the SB’s optimal strategy in terms of maximizing the expected value of their stack after the end of the hand, given the strategy of the BB. In other words, I want to find the SB’s maximally exploitative strategy given the strategy of the BB. In later posts, I will try to describe an approximate Nash equilibrium for this game.

Note: In this post, I will follow Will Tipton’s usage of the term EV as the expected value of a player’s stack after the end of the hand. This usage of the term EV is somewhat nonstandard but quite convenient for computations.

I first need to describe the BB’s strategy. The BB’s only decision point is in Node C, where they can either fold or go all-in. I will assume here that the BB shoves at Node C with the following hands, which contain 526 combos or 39.67% of all combos:
[Image: BB shoving range]

The basic idea is now to work backwards through the tree and compute the SB’s EV at each node:

At nodes where the hand has ended because one of the players has folded (Nodes B, D, and F), the SB’s stack is deterministic and given in the tree diagram above.
At nodes where both players are all-in (Node G), the pot is 2S and the SB will win 2S*EQ on average, where EQ denotes the equity of the SB’s specific hand against the BB’s shoving range. Thus, the SB’s EV in Node G is 2S*EQ.
At nodes where the SB is to act (Nodes A and E), the SB seeks to maximize their EV and hence chooses the action which has the largest EV. Thus, the SB’s EV in those nodes will be the maximum of the SB’s EVs in all child nodes.
At nodes where the BB is to act (Node C), the SB’s EV is the weighted average of the EVs in all child nodes, weighted by the frequencies with which the BB takes different actions.

Let’s start with the leaf nodes:

Node B:
If the SB open-folds, their stack will be S-0.5 (giving up the small blind).

Node D:
If the SB raises and the BB folds, the SB’s stack will be S+1 (winning the big blind).

Node F:
If the SB raises, the BB shoves, and the SB in turn folds, the SB’s stack will be S-R.

Node G:
If both players are all-in, the SB’s EV will be 2S*EQ, as explained above.

Having the SB’s EV determined in all leaf nodes, we can now work backwards through the remaining nodes:

Node E:
The SB chooses to call if 2S * EQ is larger than S-R or, equivalently, if EQ > (S-R)/(2S). For example, if the initial raise is to R=2.5 bb, then the minimum equity required to call as a function of the initial stack size S looks as follows:
[Image: minimum equity required to call as a function of the initial stack size S, for R=2.5]

Let’s assume in the following that the initial raise is to R=2.5 and that the initial stacks are S=10. Then the minimum equity needed to make the SB prefer calling over folding at Node E is 37.5%. To find the hands which have at least 37.5% equity against the BB’s shoving range, we can look at the equity matrix against that shoving range:

It turns out that the following hands all have at least 37.5% equity against the BB’s shoving range:
[Image: SB calling range]

(This range contains 718 combos or 54.15% of all combos.)

The SB’s EV in Node E is thus

S-R = 7.5 for all hands that fold.
2S * EQ for all hands that call, where EQ is the equity of the SB’s specific hand and can be read off from the equity matrix above.

We can also write this as

EV(Node E) = max(S-R, 2S*EQ).
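The Node E decision can be sketched in a few lines of Python. The function names are made up for illustration; the formulas are the ones stated above, with EV meaning the SB's expected stack after the hand.

```python
def min_call_equity(S, R):
    """Equity at which calling (2*S*EQ) is worth exactly as much as folding (S-R)."""
    return (S - R) / (2 * S)

def ev_node_e(eq, S, R):
    """SB's expected final stack at Node E when taking the better of fold and call."""
    return max(S - R, 2 * S * eq)

print(min_call_equity(10, 2.5))   # 0.375
print(ev_node_e(0.30, 10, 2.5))   # 7.5  (below 37.5% equity, so fold)
print(ev_node_e(0.50, 10, 2.5))   # 10.0 (above 37.5% equity, so call)
```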

Node C:
This is a decision node for the BB. Thus, the SB’s EV is simply the weighted average of the child node EVs, weighted by the frequencies with which the BB takes the two actions. Recall that we assume that the BB is shoving p=39.67% of the time in Node C. Thus, the SB’s EV in Node C is

EV(Node C) = (1-p)*EV(Node D) + p*EV(Node E) = (1-p)*(S+1) + p*max(S-R, 2S*EQ).

Node A:
Finally, the SB needs to decide which hands to open-raise with and which hands to open-fold. To maximize their EV, the SB open-raises any hand for which EV(Node C) > EV(Node B) = S-0.5 and open-folds all other hands. For the specific values of R=2.5 and S=10, it turns out that EV(Node C) > EV(Node B) always, so that the SB’s optimal strategy in Node A is to open-raise 100% of their hands. Indeed, the following graph shows the SB’s EVs in Nodes B, C, and E as a function of the SB’s equity against the BB’s shoving range:

We see that EV(Node C) lies above EV(Node B) no matter what the SB’s EQ is. Thus, it’s optimal to open-raise 100% of hands and

EV(Node A) = EV(Node C) = (1-p)*(S+1) + p*max(S-R, 2S*EQ).
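Here is a rough Python sketch of the whole backward induction, under the same simplifying assumption as above that the BB shoves with a fixed frequency p regardless of the SB's hand. The function name and the grid of equity values are illustrative choices only.

```python
def sb_ev(eq, S=10.0, R=2.5, p=0.3967):
    """Work backwards through the tree; return (EV at Node A, best action at Node A)."""
    ev_b = S - 0.5                     # Node B: SB open-folds
    ev_d = S + 1                       # Node D: BB folds to the raise
    ev_e = max(S - R, 2 * S * eq)      # Node E: SB folds (Node F) or calls (Node G)
    ev_c = (1 - p) * ev_d + p * ev_e   # Node C: weighted by the BB's shoving frequency
    action = "raise" if ev_c > ev_b else "fold"
    return max(ev_b, ev_c), action

# Check every equity from 0% to 100% in 1% steps.
results = [sb_ev(e / 100) for e in range(101)]
worst = min(ev for ev, _ in results)
print(all(action == "raise" for _, action in results))  # True
print(worst > 10.0 - 0.5)                               # True
```

Even a hand with zero equity does better by raising (and folding to a shove) than by open-folding for these values of S, R, and p. With a different p the check can fail, so the "raise 100%" result is specific to these parameters.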

Summary
For the given parameters R=2.5 and S=10 and the given BB strategy, the SB’s optimal strategy can be summarized as follows:

The SB open-raises 100% of hands.
Against a BB shove, the SB calls with all hands that have at least 37.5% equity against the BB shoving range.

Outlook
These strategies do not form a Nash equilibrium. The BB can now in turn try to employ a maximally exploitative strategy against this SB strategy. This will likely include an increase of the BB shoving frequency. Then, the SB can again find another maximally exploitative strategy against this new BB strategy and so on… until some equilibrium or approximate equilibrium is reached and no player can unilaterally improve their EV anymore by changing their strategy.

Yorunoame · February 23, 2022, 4:40am

Wow… what a post. I suspect I’ll re-read this quite a few times. Thank you for taking the time to put this together.

johnlittle · February 23, 2022, 9:13am

What a challenge for me. I have to sort out the syntax and semantics. It will definitely take a while, but I am looking forward to reading more posts like this. Thanks a lot.

Craig_Anthony · February 23, 2022, 2:31pm

Wow !!! I need a new brain processor. Information overload. Job well done

BlackWidow · February 23, 2022, 5:33pm

Thanks, it’s good to see that some of you find this interesting. I’m currently working through Tipton’s book. Going through slightly different examples and writing the arguments and results down in my own words is very useful for making sure that I really understand everything.

Gulf_gator · February 23, 2022, 6:35pm

BW, your work on the SB’s strategy is well done and impressive. It’s above my pay grade, but it reminds me of my dad, who worked with the invisible from youth to retirement. In later years, when he had a problem in his GE design lab that needed a solution, he would leave a pen and notepad by the bed and doze off thinking of the snag or hiccup he had encountered. Often during the night or first thing in the morning he would write the solution down while it was still a fresh memory. Truly all the best. GG

Dorkupine · February 23, 2022, 7:28pm

Ugh. It’s important to read to the end. I’m reading through this thinking, “That ‘open-raise every hand’ decision is dependent on the assumptions you’ve made for R, S, and p. If p changes, that may not be right anymore…” Then lo and behold, here you are saying just that. I should know better than to doubt…

Rather than assuming some back-and-forth exploitative dance to get to equilibrium, maybe the better problem to solve is this: what value of p maximizes the BB’s EV against the SB strategy that is optimal for that value of p (given values of R and S)? One could take your analysis to define a function f(p) = SB EV, then solve for the value of p that gives the minimum SB EV (which should correspond to the maximum BB EV, since it’s a zero-sum game with no rake). Since it’s really the BB’s single p decision in Node C that seems to drive the whole thing, the BB should be able to go directly to the equilibrium point where nothing the SB does can improve their outcome.

Or am I missing something?

BlackWidow · February 23, 2022, 7:56pm

Card removal effects

Before going any further, it is worth pointing out that the EV calculations in my first post on the preflop raise/shove/fold game are actually not quite accurate. Specifically, there is a subtle problem with the statement that the BB is shoving p=39.67% of the time in Node C.

Why is this not quite accurate? The reason is that the SB knows their own 2 cards. Consequently, the BB cannot hold those 2 cards and this affects the number of combos in the BB’s shoving range. Thus, the BB’s shoving frequency is not quite independent of the SB’s hole cards. For example:

If the SB holds 32o, that blocks 3 combos each of 33 and 22, 3 combos each of A2o and A3o, and 1 combo each of A3s, A2s, and K2s in the BB’s shoving range. This is a total of 15 combos out of 526 combos in the unblocked shoving range. The total number of all combos is reduced from 1326 to 1225. Thus, the effective shoving probability of the BB is then (526-15)/1225 = 41.71%, which is about 2 percentage points higher than the unblocked range of 39.67% of all combos.
If the SB holds AKo, that blocks a lot more combos, namely 79 out of 526. Thus, the effective shoving probability of the BB is then (526-79)/1225 = 447/1225 = 36.49%. This is more than 3 percentage points lower than the unblocked frequency!
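The two examples above can be reproduced with a short Python sketch. The blocked-combo counts (15 and 79) and the range size (526) are taken from the post as given; the function name is made up for illustration.

```python
from math import comb

TOTAL_COMBOS = comb(52, 2)       # 1326 two-card combos in a full deck
REMAINING_COMBOS = comb(50, 2)   # 1225 combos once the SB's two cards are removed

def effective_shove_freq(range_combos, blocked_combos):
    """BB's shoving frequency given that the SB's hole cards block some combos."""
    return (range_combos - blocked_combos) / REMAINING_COMBOS

print(round(526 / TOTAL_COMBOS, 4))              # 0.3967 (ignoring card removal)
print(round(effective_shove_freq(526, 15), 4))   # 0.4171 (SB holds 32o)
print(round(effective_shove_freq(526, 79), 4))   # 0.3649 (SB holds AKo)
```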

These subtleties are important to get to an accurate solution, but they also make any manual computations so much harder. Thus, to not obfuscate the key ideas, I will ignore these card removal effects for the time being.

BlackWidow · February 23, 2022, 8:18pm

That’s an interesting idea. I think the main problem is that f(p) also depends on the SB’s specific hand equity against the BB’s shoving range. Now, one could perhaps average over all holdings and get the SB’s EV “before cards are dealt”. But even then the function would still depend on the specific shoving range chosen by the BB, not only on the frequency p.

For reasonable stack and raise sizes, it is indeed possible to find the approximate shoving frequency p using the so-called indifference principle. I am getting ahead of myself here a bit, but the key idea is that at equilibrium, it is expected that the SB will open-fold and raise-fold with positive frequencies. For this to be in equilibrium, the SB needs to be indifferent between open-folding a hand and raise-folding a hand (otherwise, the SB would always choose the option with higher EV). Now,

EV(open-fold) = S-0.5
EV(raise-fold) = (1-p)*(S+1) + p*(S-R)

where p is again the shoving frequency of the BB. Setting these two EVs equal and solving for p yields

p = 1.5/(1+R).

For example, if the SB is min-raising (R=2), then the BB should shove p=50% of all hands at equilibrium.
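Spelled out, expanding EV(raise-fold) gives S + 1 - p*(1+R), and setting that equal to S - 0.5 leaves p*(1+R) = 1.5. A minimal Python sketch (function names are illustrative, card removal is ignored) confirms that the two EVs coincide at that p:

```python
def indifference_shove_freq(R):
    """BB shoving frequency that makes the SB indifferent between open-fold and raise-fold."""
    return 1.5 / (1 + R)

def ev_open_fold(S):
    return S - 0.5

def ev_raise_fold(S, R, p):
    return (1 - p) * (S + 1) + p * (S - R)

p = indifference_shove_freq(2.0)
print(p)                                         # 0.5
print(ev_open_fold(10), ev_raise_fold(10, 2.0, p))  # 9.5 9.5
```

Note that S cancels out of the equation, so under these assumptions the indifference frequency depends only on the raise size R.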

Dorkupine · February 23, 2022, 8:37pm

Yeah… I’m lost again. Here I was thinking I was kinda understanding this and maybe even could add a little value, but it turns out I’m just a poser who’s in over his head after all. (But you knew that already)

BluffaloKing · May 15, 2022, 7:37pm

Yes, a lot of the art of poker is in subtly breaking the Nash equilibrium like boiling a frog. Opponents don’t realize that they are being gamed until they find themselves in boiling water with no handrail.

For example, I was playing in a tournament last night and we were down to four players, but only three would get the money.

There is a small stack and he is timid, and clearly the game plan for the other three players should be to eliminate the small stack, and to avoid giving him opportunities to treble or quadruple his stack. So when he is in the blinds, one player should attack him and the others fold unless they have premium hands that are unfoldable.

However, if one player attacks the small stack in the BB, there may be a betrayal of trust: the SB reraises, knocks the early raiser out of the pot, loses to the small stack, but still makes an overall profit on the pot, since the opening raise is greater than the small stack’s entire stack.

I had a situation last night like that, but eventually the small stack succumbed.

Now you have three players left and all get paid, but it is clear that whoever succeeds in eliminating another player will have the largest stack and be the favorite to take the big prize. On the other hand, getting all-in and losing could get you knocked out in third place.

And so it goes on, seemingly forever, as the blinds rise. It becomes folly to call any raise from the BB, because the price of losing is so high. Until this happens:

Suddenly the whole game is changed and the loser in that pot can no longer afford to steal pots, and soon after that hand he crashes out.

Then you are on to heads-up, and it is a whole new game. With the blinds so high, it becomes suicide to not call preflop raises, or to limp from the button, or to fold preflop. Once you see an opponent’s pattern of play (how they respond to preflop raises, how much they fold preflop, which hands they raise, and which hands they limp and fold), they are roadkill.

It is all about constantly adjusting to equilibrium. In RP 6-max tournaments, before you get to the final table, you may have to play three-handed for a considerable time before you play 6-handed. Many players fail to adjust their strategy when they make the final table, and then again after each opponent is eliminated, which alters the point of equilibrium.

But shhh! Don’t tell anyone.

Craig_Anthony · May 15, 2022, 9:10pm

Hahahahahaha, I won’t tell a soul
