Recall that Thursday’s class will be the Enigma talk by David J. Saltman in PHS 218. [Details]

To ensure that everyone is keeping up with the readings, everyone should post a comment (to this post) about the Rubin and Watson reading [reading post]. Your comment should either: (1) state three surprising things you learned from the paper (which may not include anything a previous comment has mentioned), (2) state one surprising/interesting thing you learned from the paper and ask a clear question based on something in the paper, or (3) provide a good answer to a question an earlier comment posed.

You should post your comment by Thursday (and should feel free, of course, to follow up with additional comments later).

I find the “fictitious play” strategy interesting. It seems like the bots we design for the project (assuming they implement learning) can just play against themselves, regardless of what other teams may be doing, and eventually reach a good-enough epsilon-equilibrium strategy. The paper seems to discuss this only for 2-player poker bots. Is it practical, or has it been done, for multiplayer poker games?

I will post an interesting finding in a separate post. First, my question.

I am not sure that I understand the IHR metric. Why count only 1/2 of the ties in the calculation? If I want to know my hand’s rank, shouldn’t I just count the ties at full value?

Doing this gives the probability that the hand will not lose (or will at least tie, if you look at it that way). I find that more meaningful. Any thoughts?
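To make the two metrics concrete, here is a minimal sketch (my own construction, matching the survey’s definition as I read it): IHR counts ties at half value, while the proposed “probability of not losing” counts them at full value. The counts below are hypothetical enumeration results, not numbers from the paper.

```python
def ihr(ahead, tied, behind):
    # IHR: ties count half, because a two-way tie wins half the pot
    total = ahead + tied + behind
    return (ahead + tied / 2) / total

def not_lose(ahead, tied, behind):
    # the proposed variant: a tie counts the same as a win
    total = ahead + tied + behind
    return (ahead + tied) / total

# hypothetical counts from enumerating all opponent holdings:
ahead, tied, behind = 600, 200, 200
print(ihr(ahead, tied, behind))       # 0.7
print(not_lose(ahead, tied, behind))  # 0.8
```

The two numbers diverge exactly when ties are common, which is where the choice of metric matters.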

More insights to come.

-Ben R.

I think it’s incredibly useful to note how effective targeting the weakest opponents can be as described with AKI-RealBot. It must be pretty simple to add such functionality, even if it’s just a matter of identifying the worst player, to whatever strategy is being used to implement the AI. For games involving more than 2 people I bet it increases income quite a bit.

The Counterfactual Regret Minimization (CFR) algorithm works surprisingly well and seems incredibly promising for future work. What about CFR, if anything, guarantees that the strategies approach an equilibrium?
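As I understand it, the guarantee comes from the link between regret and equilibrium: in a two-player zero-sum game, if both players’ average regret goes to zero, their average strategies form an epsilon-equilibrium, and the regret-matching update drives regret to zero. A toy self-play sketch of regret matching on rock-paper-scissors (my own construction, not code from the paper) shows the average strategies drifting toward the uniform equilibrium:

```python
import random

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff (antisymmetric)

def strategy_from_regrets(regrets):
    # play each action in proportion to its positive cumulative regret
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def train(iterations, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strat_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from_regrets(regrets[p]) for p in range(2)]
        acts = [rng.choices(range(ACTIONS), weights=strats[p])[0] for p in range(2)]
        for p in range(2):
            opp = acts[1 - p]
            # counterfactual payoffs: what each action would have earned
            u = [PAYOFF[a][opp] for a in range(ACTIONS)]
            for a in range(ACTIONS):
                regrets[p][a] += u[a] - u[acts[p]]
                strat_sum[p][a] += strats[p][a]
    # the *average* strategy is what approaches equilibrium (1/3, 1/3, 1/3)
    return [[s / iterations for s in strat_sum[p]] for p in range(2)]

avg = train(50_000)
print([round(x, 2) for x in avg[0]])  # each entry near 1/3
```

Full CFR applies this same update at every information set of the game tree, with payoffs weighted by the probability of reaching that set; the sketch is just the single-decision core of the idea.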

Does anyone know what resources are used to identify the computational limitations of current strategies? I feel like many of these problems could be easily adapted and run on a supercomputer, though it may be hard to get time on a system like that.

I really have the same feeling about fictitious play as Matt. It seems very counterintuitive to build a bot by training it against itself… and it beats Sparbot and Vexbot? Really?

I also doubt the effectiveness of the frequentist best response (FBR) strategy. The naive idea is to first train against the opponent to learn an opponent model. However, the opponent’s opponent is simply picked to be a 50-50 raise/call player, which I think will hurt the accuracy of the learning. Besides, an easy counter-strategy would be to record the opponent’s decisions over some window of time; once the counter-strategy notices a sudden change in its opponent’s strategy (presumably when the training stage ends), it changes its own strategy too.

One last thing to complain about, as we complained on our way to the last home game: almost every paper talks about using linear programming to solve for the Nash equilibrium of poker (at least in the related work section), and some also mention bucketing hands or reducing the rounds of betting to simplify the problem, but no paper actually gives any details. I am not a math person and I don’t get any of it.

@Ben Rodes: Because ties give you half the pot, it makes sense to rank the strategy as being half-good :) (assuming a 2-player game here)

@Peter: I was also a little surprised by the success at heavily exploiting just one player. But given that success, I would assume that every bot in the competition will now try this, and the return from the exploited players will be divided up among those new bots. As such, the extensive success of AKI-RealBot may no longer be observed in future competitions.

I have what might be a dumb question: could somebody explain to me the difference between fictitious play and the simulation-based bots they described on page 8?

One interesting point I found, which will not count: Weikeng and I have listed this paper as the 2nd reference paper for our class….

OK, based on Matt and Yuchen’s questions, I feel that “fictitious play” should be strictly weaker than an ordinary genetic algorithm. I checked the papers mentioned, and they implemented fictitious play just like an evolutionary process without a crossover stage. The bot only iteratively adjusts its own key parameters and judges whether each adjustment is “good.” It is not clear to me why it performs so well.
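For what it’s worth, classic fictitious play doesn’t adjust parameters by trial and error at all: each player simply plays a pure best response to the opponent’s empirical action frequencies so far, and in two-player zero-sum games the empirical mixtures provably converge to an equilibrium (Robinson, 1951). A toy sketch on matching pennies (my own construction, not from the papers) shows both mixtures converging to the 50/50 equilibrium with no crossover or mutation anywhere:

```python
PAYOFF = [[1, -1], [-1, 1]]  # row player's payoff in matching pennies

def best_response(opp_counts, payoff_rows):
    # expected payoff of each pure action against the empirical opponent mix
    evs = [sum(row[b] * opp_counts[b] for b in range(2)) for row in payoff_rows]
    return max(range(2), key=lambda a: evs[a])

def fictitious_play(rounds):
    # column player's payoff, indexed [their action][row's action]
    col_payoff = [[-PAYOFF[a][b] for a in range(2)] for b in range(2)]
    counts = [[1, 1], [1, 1]]  # smoothed empirical action counts per player
    for _ in range(rounds):
        row_act = best_response(counts[1], PAYOFF)
        col_act = best_response(counts[0], col_payoff)
        counts[0][row_act] += 1
        counts[1][col_act] += 1
    # empirical frequencies are the players' learned mixed strategies
    return [[c / sum(side) for c in side] for side in counts]

mix = fictitious_play(10_000)
print(mix)  # both mixtures close to [0.5, 0.5]
```

Actual play cycles forever; it is only the running averages that converge, which may be why the method looks so counterintuitive.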

I agree with Samee’s idea about Ben’s question. But what about multiple players? The number of possible cases grows exponentially, and it is not possible to compute IHR by enumeration.

Also, one thing I found useful is how to apply Monte Carlo simulation to poker. Poker is a very complicated game, and different areas of the sample space should be sampled with different weights. Their selective-sampling simulation, which dynamically maintains a weight table, is very interesting.
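A hedged sketch of what such a weight table might look like (the class and method names here are my own invention, not the paper’s): keep a weight for every candidate opponent holding, sample simulated opponent hands in proportion to those weights, and bump weights when showdowns reveal what kinds of holdings the opponent actually plays.

```python
import random
from itertools import combinations

def make_deck():
    return [r + s for r in "23456789TJQKA" for s in "cdhs"]

class OpponentModel:
    def __init__(self, dead_cards):
        live = [c for c in make_deck() if c not in dead_cards]
        self.holdings = list(combinations(live, 2))       # e.g. 50 choose 2 = 1225
        self.weights = {h: 1.0 for h in self.holdings}    # start uniform

    def sample(self, rng):
        # weighted draw of one candidate pair of opponent hole cards
        ws = [self.weights[h] for h in self.holdings]
        return rng.choices(self.holdings, weights=ws)[0]

    def observe_showdown(self, holding, factor=2.0):
        # crude update: a holding revealed at showdown (and, in a fuller
        # model, holdings like it) becomes more plausible for this opponent
        if holding in self.weights:
            self.weights[holding] *= factor

rng = random.Random(1)
model = OpponentModel(dead_cards={"As", "Kd"})  # e.g. our own hole cards
model.observe_showdown(("Qc", "Qd"))
print(model.sample(rng))  # one weighted draw of two opponent hole cards
```

A real implementation would generalize each showdown observation (e.g. “this opponent plays medium pairs aggressively”) rather than reweighting a single exact holding, but the weight-table mechanics are the same.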

One question I had was regarding “implied pot odds,” which are mentioned in footnote #4 on page 6. While I understand the concept of pot odds, I don’t understand “implied pot odds.” Maybe someone can clear it up for me? Here’s a link to more info: http://en.wikipedia.org/wiki/Pot_odds#Implied_pot_odds

I thought the section on simulation-based poker agents (3.3) was particularly interesting. In this section the authors describe how Loki-2 would evaluate the expected values of various decisions. The procedure works by considering the bot’s hole cards and then simulating a large number of hands based on the opponents holding random cards. However, Billings made the observation that this approach is not very effective precisely because the opponent is assigned random cards. This introduces a bias, since at a given point in the game certain random cards (like 7-2 unsuited) are extremely unlikely to be the opponent’s hole cards. In order to eliminate such biases, Billings proposed “selective-sampling” simulation. In this technique, a bias is deliberately introduced when choosing the formerly fully random cards with which to evaluate. Every time an opponent shows their cards (i.e., in a showdown), this information is used to weight the probabilities with which random hands are chosen during the simulation.

I thought there were a few interesting things that could be added to this selective-sampling approach. First, it is likely a poor choice to start out with totally even weights for generating our random cards. Certain information is known a priori, and we should use it; we should therefore start these weights at sensible initial values to avoid losing a lot while the weights are adjusted. Second, I think adding an extra “dimension” to this evaluation strategy would work well. Billings’ selective-sampling simulation strategy does weight the random hands chosen based on what cards the opponent is likely to hold, but we could also weight based on things like the opponent’s chip count and other criteria. For example, an opponent may be more likely to wait for “very good” hole cards before betting heavily when they have few chips remaining.

Regarding implied pot odds, the idea is to estimate what you expect to be able to win if your draw (or bluffing card) comes through. This is especially important for no-limit games, where there is the possibility of taking your opponent’s entire stack on any hand. Here’s an example (roughly based on a hand from our last home game) where considering only pot odds would suggest folding, but the implied pot odds make calling a much better option. Suppose two players are left in the hand, and both have large (>3000) stacks. Player A has 58 and the board is 4AK7. The pot has 160, and Player B bets 20. Player A would need to pay 20 to call, for a chance at a 200-chip pot, so the call is profitable on pot odds alone only if Player A wins more than 20/200 = 10% of the time. Player A believes Player B has a strong A, possibly even AA or AK. This means the only chance to win is to make the inside straight with a 6, so Player A has 4 outs: 4/(52−6) ≈ 0.087 chance to make the draw. Since 0.087 < 0.1, a simple pot odds analysis suggests Player A should fold. But if the straight does come, it is very well disguised (especially since Player B may assume Player A would follow the usual guidance that only fools draw to inside straights), and Player B will have a hard time folding his strong hand to a big bet. So there is a reasonable chance that Player A will make much more than 200 chips when the straight comes. An implied odds calculation would attempt to estimate this amount, to decide whether it is a good call to go for a draw that is not profitable according to just the current size of the pot.
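The arithmetic of that example, as a quick check (a sketch with the numbers from the hand above; it is Player A, the one on the draw, who faces the fold/call decision):

```python
call = 20
pot_if_called = 200      # 160 in the pot + B's 20 bet + A's 20 call
outs, unseen = 4, 46     # four sixes among the 46 unseen cards

break_even = call / pot_if_called   # 0.1: equity needed on raw pot odds
draw_prob = outs / unseen           # about 0.087
print(draw_prob < break_even)       # True: raw pot odds say fold

# Implied odds flip the question: how big must the *effective* pot be,
# counting chips won later when the straight hits, for the call to
# break even?  call / draw_prob = 20 * 46 / 4 = 230 chips.
print(round(call / draw_prob))      # 230
```

So Player A only needs to expect an extra ~30 chips of future winnings when the 6 arrives for the call to be correct, which deep stacks make very plausible.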

@Yuchen I think this link helps explain the connection between game theory and linear programming.

http://www.slideshare.net/leingang/lesson-35-game-theory-and-linear-programming

More comments to come

Good point, Samee, about my IHR question. I guess the key thing to remember is that IHR is an immediate hand strength, not the probability of winning or losing, so it is its own metric, or so it would seem. I don’t really see the use, though. Again, if I have a hand, I just want to know my probability of not losing, because after all, I might as well treat a tie like a win since there is no harm done.

-Ben

Post 2 of ???

I know someone has mentioned this before, but I want to talk about it a little more. The paper mentions a bot that does really well against weak opponents. This suggests an interesting strategy: not a strategy to win, but to finish among the top few players. In tournament play the top players receive a payout, so it could be profitable.

In no-limit poker this strategy is even more effective, since a larger stack provides more power in the game. The strategy would be to focus on the weaker players, build up a leading stack, and use the larger stack to bully the stronger players. A strong player should be able to beat this strategy (at least I would hope so), but the goal is not necessarily to win, just to place.

Of course, this is all conjecture.

@samee I think in fictitious play “both players are aware of the strategy of the other,” while the simulation-based approach doesn’t have this condition.

Looks like I am a little late to the party. I thought the fact that the simulation-based strategy (SBSPoki) didn’t perform well was enlightening. It seems that in theory it should work very well, but it’s likely impossible to perform enough simulations within the time constraints. I also thought the success of fictitious play was surprising; this again seems like a situation where you would have to simulate much more than seems reasonable to reach a good bot. Finally, I thought the case-based lazy learning system mentioned at the end was interesting because it seemed to be the best approximation of what a human player actually does. The comparison to old cases seems to match the “intuition” or “gut feeling” that people get when they see certain hands.

@Ben: I’m not sure where you are going with that. As Samee pointed out, it might not be successful during a bot tournament, and I don’t imagine a professional tournament would have a weak enough player to exploit. It may, however, be a good way to make money at casinos or in on-line poker.

This implies you are a really tight player.

Thanks for that example! That really cleared up implied pot odds.

The section on formula-based strategies defined the concept of negative potential, but I don’t know why it is not used anywhere in the formula. In addition, I can’t see the connection between pot odds and the winning confidence a player needs in order to call, as mentioned at the end of the pot odds section. I think it could be an improvement to compute the hand potentials considering the effects of early folds.

The basic idea of the simulation-based game-tree search technique seems straightforward. But I believe the hard part is assigning reasonable probability weights to the action edges of the tree, especially for opponents’ actions, so that the simulation results reflect reality more accurately. A critical point seems to be that the simulation-based approach relies heavily on a correct opponent model for any meaningful outcome. I wish the review had elaborated more on this.

The game-theoretic approach seems to be the ultimate weapon for this problem. I feel that with the rapidly increasing computing power of the cloud, an equilibrium strategy for 2-player (very) limited Texas Hold’em could be computed in the foreseeable future. However, it remains very difficult to generalize this approach to no-limit and multi-player games without substantial lossy abstraction. I also found the concept of epsilon-equilibrium refreshing, but I found no clue in the survey about how the prior work guaranteed the epsilon value. Is it just an empirical value concluded from experimental runs, without any formal proof? It would be really disappointing if that were the case.

My big question comes from the Range of Skill algorithm. It claims (page 17, first paragraph of section 4.6.2) that each bot created by the Range of Skill algorithm beats its previous self by at least epsilon, so the sequence has finite length. I don’t see why that is the case. By the theory of exploitive bots, it seems entirely possible to find a circular sequence of bots in which each one beats its predecessor by exploiting a specific weakness.

The CFR example described on page 18 requires the two players to begin by playing a perfect-information game against each other, which I think is very unlikely to happen in practice. Also, one player could deliberately choose a bad strategy in order to fool the opponent.

The incomplete-information game-tree search example described on pages 22-23 builds an opponent model by constructing showdown histograms. However, if the game terminates before a showdown, no information about the opponent’s hand is available. In a no-limit game, I think the money in the pot should also be considered in the opponent model.

Since the paper surveys a large amount of current work, it does not fully explain each concept, which makes it hard for me to understand everything in it. The concept of near-equilibrium strategies, however, sounds interesting to me. It says a near-equilibrium can be derived for stochastic games when a complete Nash equilibrium cannot be computed, or when it is known that an exact Nash equilibrium is out of reach. In this case we may derive an epsilon-Nash equilibrium, in which no player can gain more than epsilon by deviating. And the Range of Skill algorithm can be used to approach this near-equilibrium.

Then, what would be a good (or achievable) value of epsilon? How can it be computed? Why is it a good value?

One thing I have serious doubts about is what the authors say about “fictitious play.” Fictitious play produces near-equilibrium agents, which implies these agents can still be exploited, at least by epsilon. Vexbot is an exploitive bot, so how could Adam, a near-equilibrium agent, have a constant win rate against Vexbot?

Another question is about exploitive agents. If the opponent is constantly, randomly changing strategies, how could an exploitive agent construct a model of the opponent? What’s more, an exploitive agent could itself be exploited: the opponent might keep one strategy for a while to let the exploitive agent construct a model of it, then change strategies, exploiting the exploitive agent’s response. So the exploitive agent is passive, always one step behind a player that is constantly and deliberately changing strategies.

I assume you mean the strategy is tight, but I don’t see why. Care to elaborate?

My argument is that I need to know the probability that I won’t lose. I can then decide whether it is prudent to bluff, which could be based on this metric or on some other probability. Because I haven’t ruled out bluffing, I am not necessarily a tight player (or at least this strategy isn’t).

The point is, how do I know when I am bluffing? I know when I am bluffing based on the strength of my cards. The strength, in my view, is the probability of not losing. I see no reason to consider tying hands “half as good.”

For example, if I told you your probability of winning a game is 25%, you might hesitate to bet as much. But if I told you your probability of not losing is 50%, you might bet even more, taking advantage of the fact that you won’t lose anything if there is indeed a tie.

Ben

@Brielin: You are right; if you are in a bot tournament and all bots make use of this strategy, it may be a wash. It may depend on how you identify weak players, though. I believe the system described in the paper identified the weakest opponent as the one who had lost the most chips to the player in question. If you use this metric and all bots adopt this strategy, then each bot will have a different idea of which bot is the weakest. While this should all work out in the end, or at least I think it should, it may not be optimal.

Anyway, that was a tangent. What I really wanted to say is that I just find the strategy interesting; where it is useful is a different question. Certainly, as you mentioned, ripping people off in online tournaments with real people is one environment where this strategy may prove most effective. Or, as a strategy just for myself, it may be useful at our next poker night.

@Yan: for your first question, an explanation can be found in:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.2001&rep=rep1&type=pdf

According to our understanding, the original formula should be:

EHS = IHS * (1 - NegativePotential) + (1 - IHS) * PositivePotential

and they say they ignore the negative part because “we generally want to bet when we currently have the best hand,” so negative potential “is not very important.” I do not totally agree with this idea, but I guess they tried both and the formula without the negative part works better.
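The formula above, as a tiny function (a sketch; in practice IHS and the potentials would come from enumeration or simulation over opponent holdings and future board cards):

```python
def effective_hand_strength(ihs, pos_potential, neg_potential=0.0):
    # EHS = IHS * (1 - NPot) + (1 - IHS) * PPot;
    # the simplified version in the papers just sets NPot = 0
    return ihs * (1 - neg_potential) + (1 - ihs) * pos_potential

# A mediocre made hand with a strong draw still rates highly:
print(effective_hand_strength(0.5, pos_potential=0.4))  # 0.7

# Including negative potential pulls the estimate back down:
print(effective_hand_strength(0.5, pos_potential=0.4, neg_potential=0.2))
```

You can see from the structure why dropping negative potential only inflates EHS: it removes the term that discounts hands likely to be overtaken.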


I think only the example requires perfect knowledge, but I could be misunderstanding all of this. Based on how bots using CFR have done in competitions, you have to admit the idea appears to be powerful. Since I don’t think bots in competition are likely to freely reveal their hole cards, variations of CFR must work with imperfect information.

-Ben R.

Regarding linear programming:

First of all, a definition of the subject: linear programming is the study of how to maximize or minimize the value of a linear function over a region of its domain defined by linear constraints. For example, I may wish to determine the minimum value of a linear function of two variables inside some polygon in the plane. Solving a linear program means determining that minimum value and the point in the region at which it is attained.

For a zero-sum game like poker, the linear program is easy to understand and easy to solve (using a computer program). Basically, player 1 is attempting to minimize the payoff for player 2, and player 2 is attempting to maximize that payoff. The number of dimensions for the player 1 program will be the number of pure strategies for player 1 plus one (the payoff); the number of dimensions for player 2 will be the number of pure strategies for player 2 (plus one). As an example, in the AKQ game, the “pure strategies” would be 3-tuples, where each element of the tuple can take on two values (check/bet, call/fold), giving a total of 2^3 = 8 pure strategies for each player, so the linear program would be in nine dimensions for both players. The region of space on which the program is solved will be determined by the payoff matrix for player 2 (which is the negative of the matrix for player 1) and the expected payoff for the whole game. The solution will give the mixed equilibrium for each player and the expected payoff.
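To make the shape of that program concrete, here is the standard zero-sum formulation in my own notation (not taken from the paper): let A be the maximizing player’s payoff matrix and x her mixed strategy over pure strategies; she maximizes her guaranteed payoff v:

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{subject to}\quad
  & \sum_{i} x_i A_{ij} \;\ge\; v
      \qquad \text{for every opponent pure strategy } j, \\
  & \sum_{i} x_i = 1, \qquad x_i \ge 0 .
\end{aligned}
```

The variables are the entries of x plus v, which is exactly the “number of pure strategies plus one” dimension count above; the opponent’s program is the minimizing dual of this one.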

Notice, though, that the number of dimensions and constraints will increase exponentially with the size of the game tree. For the simple AKQ game, there will be 9 dimensions; for a more complicated game like the pre-flop part of our project, there will be millions of dimensions. Thus, it is necessary to reduce the number of constraints and dimensions by creating abstractions; some abstractions are “lossless” like suit equivalence, others are “lossy” in the sense that the LP solution will not be an exact equilibrium, but some approximation. Luckily, the algorithms that solve linear programs run in polynomial time in the size of the program; thus, you only need to worry about keeping the program size small enough.

There are other approaches to the LP solution, like giving the program in sequence form (which the paper discusses), that will reduce the size of the linear program. This is not quite as straightforward, though, so I won’t try to explain it in a blog post comment (however, the paper does give a good reference for it). Even so, as the authors point out, the sequence form still results in very large linear programs, and so some abstractions are needed.

With regard to “cloud computing”: I do not think that this will help solve linear programs in any meaningful way, since linear programming falls in the class of P-complete problems. For anyone not familiar, P-complete problems are believed to be inherently sequential: unless NC = P, there will be little advantage to using a parallel computer to solve them (in the worst case). Now, it may be that the specific linear programs needed to compute equilibrium solutions to games like poker are not worst-case instances and could benefit from being parallelized, which would be nice, but I would be careful about making any assumptions here without some proof.

The goal is not to “not lose”; it is to win, and to win as much and as often as possible. Tying is not winning: if you and I would tie at a showdown, you would prefer that I fold so that you can take the whole pot. Ties reduce your expected payoff, not as much as losses do, but the goal is to maximize the payoff, not to break even.
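A tiny EV check makes the half-weighting concrete (my numbers, not from the paper): at a showdown for a pot P, you collect P on a win and P/2 on a two-way tie, so a tie is worth exactly half a win, which is why IHR-style metrics weight ties by 1/2.

```python
def showdown_ev(p_win, p_tie, pot):
    # expected chips collected at showdown: full pot on a win,
    # half the pot on a two-way tie, nothing on a loss
    return p_win * pot + p_tie * (pot / 2)

# "25% win + 25% tie" is worth the same as "37.5% win", not "50% win":
print(showdown_ev(0.25, 0.25, 100))   # 37.5
print(showdown_ev(0.375, 0.0, 100))   # 37.5
```

Treating the 25%-tie hand as a 50% “not lose” hand would overstate its value by a third.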

Good to know linear programming is P-complete.

Thanks, Ben! This is a great explanation.

I found it surprising that two computers are teaching each other. How long before they become self-aware?