RPS.bot: Opponent Modeling Starter Code
Starter Code
We provide Python starter code that you can use to build RPS bots.
The code by default plays uniformly at random, which guarantees a breakeven result in expectation (and won’t be very fun).
The main actions are in the get_action function:

def get_action(self, *, match_clock):
    return random.choice([RockAction(), PaperAction(), ScissorsAction()])
Variables
The profit so far in the match is stored in:
self.my_profit
The history of your actions and your opponent's actions is stored in the following array, where each entry is a tuple of (my_action, their_action):
self.history
Functions
def __init__(self):
- Initializes profit self.my_profit
- Initializes the history array self.history

def handle_results(self, *, my_action, their_action, my_payoff, match_clock):
- This is run at the end of every game
- It updates the history with my_action and their_action
- It updates the profit with my_payoff
- match_clock is used to see how much of your 30-second total time is remaining (this can be ignored)

def get_action(self, *, match_clock):
- This is the main function where you run your strategy and return an action
- You can return RockAction(), PaperAction(), or ScissorsAction()
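Putting these pieces together, the bookkeeping behaves roughly like the minimal sketch below. This is only a sketch based on the descriptions above: the actual starter class name (MyBot here) and any base class may differ, and RockAction, PaperAction, and ScissorsAction are assumed to be provided by the starter code.

import random

class MyBot:  # Hypothetical class name; the actual starter class may differ
    def __init__(self):
        self.my_profit = 0  # Running profit for the match
        self.history = []   # List of (my_action, their_action) tuples

    def handle_results(self, *, my_action, their_action, my_payoff, match_clock):
        # Called after every game: record both actions and update profit
        self.history.append((my_action, their_action))
        self.my_profit += my_payoff

    def get_action(self, *, match_clock):
        # Default behavior: play uniformly at random
        return random.choice([RockAction(), PaperAction(), ScissorsAction()])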
Beating simple known opponents
It’s a lot easier to beat your opponents when you know their strategy! Though unrealistic, this is still a worthwhile first exercise for thinking about which strategies beat which opponent strategies and how to code them.
Rockbot
Opponent strategy: 100% Rock
Our counter-strategy: 100% Paper
Our best EV: +1 per game
Our counter-strategy in code:
def get_action(self, *, match_clock):
    return PaperAction()
r532bot
Opponent strategy: 50% Rock, 30% Paper, 20% Scissors
Our counter-strategy: 100% Paper
Our best EV: 0.5(1) + 0.3(0) + 0.2(-1) = 0.3 per game
Our counter-strategy in code:
def get_action(self, *, match_clock):
    return PaperAction()
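To double-check this arithmetic, or to find the best response against any other fixed mix, you can compute the expected value of each pure action directly. Here is a small standalone sketch (the R/P/S labels and payoff table are just for illustration, not part of the starter code):

# Expected value of each of our pure actions against a fixed opponent mix
# (payoffs from our perspective: win = +1, tie = 0, loss = -1)
opponent_mix = {"R": 0.5, "P": 0.3, "S": 0.2}
payoff = {  # payoff[ours][theirs]
    "R": {"R": 0, "P": -1, "S": 1},
    "P": {"R": 1, "P": 0, "S": -1},
    "S": {"R": -1, "P": 1, "S": 0},
}
for ours in "RPS":
    ev = sum(prob * payoff[ours][theirs] for theirs, prob in opponent_mix.items())
    print(ours, round(ev, 2))  # R: -0.1, P: 0.3, S: -0.2 -> Paper is the best response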
RPSbot
Opponent strategy: RPSRPSRPS…
Our counter-strategy: PSRPSRPSR…
Our best EV: +1 per game
Our counter-strategy in code:
def get_action(self, *, match_clock):
    moves = [PaperAction(), ScissorsAction(), RockAction()]
    move_index = len(self.history) % 3  # Use history length to cycle
    return moves[move_index]
Mimicbot
Opponent strategy: Copy our last action
Our counter-strategy:
- If our last action was Rock, play Paper
- If our last action was Paper, play Scissors
- If our last action was Scissors, play Rock
Our best EV: +1 per game (after 1st)
Our counter-strategy code:
def get_action(self, *, match_clock):
    if self.history:  # After the 1st game
        our_last_move = self.history[-1][0]  # Find our last move
        if isinstance(our_last_move, RockAction):  # If Rock
            return PaperAction()  # Play Paper
        elif isinstance(our_last_move, PaperAction):  # If Paper
            return ScissorsAction()  # Play Scissors
        else:  # If Scissors
            return RockAction()  # Play Rock
    else:  # 1st game play randomly
        return random.choice([RockAction(), PaperAction(), ScissorsAction()])
Beatprevbot
Opponent strategy: Beat our last action
Opponent strategy code:
def get_action(self, *, match_clock):
    if self.history:  # After the 1st game
        last_opponent_move = self.history[-1][1]  # Find last opponent move
        if isinstance(last_opponent_move, RockAction):  # If Rock
            return PaperAction()  # Play Paper
        elif isinstance(last_opponent_move, PaperAction):  # If Paper
            return ScissorsAction()  # Play Scissors
        else:  # If Scissors
            return RockAction()  # Play Rock
    else:  # 1st game play randomly
        return random.choice([RockAction(), PaperAction(), ScissorsAction()])
Our counter-strategy:
- If our last action was Rock, play Scissors
- If our last action was Paper, play Rock
- If our last action was Scissors, play Paper
Our best EV: +1 per game (after 1st)
Our counter-strategy code:
def get_action(self, *, match_clock):
    if self.history:  # After the 1st game
        our_last_move = self.history[-1][0]  # Find our last move
        if isinstance(our_last_move, RockAction):  # If Rock
            return ScissorsAction()  # Play Scissors
        elif isinstance(our_last_move, PaperAction):  # If Paper
            return RockAction()  # Play Rock
        else:  # If Scissors
            return PaperAction()  # Play Paper
    else:  # 1st game play randomly
        return random.choice([RockAction(), PaperAction(), ScissorsAction()])
There could be more levels to this! If you assume that your opponent knows that you are going to do this, then they might play the next level up and instead of “beat previous”, go to “beat the move that beats the move that beats previous”.
- We play Paper → “beat previous” plays Scissors → “beat the move that beats the move that beats previous” plays Paper
- We play Rock → “beat previous” plays Paper → “beat the move that beats the move that beats previous” plays Rock
- We play Paper → “beat previous” plays Scissors → “beat the move that beats the move that beats previous” plays Paper
This turns out to be mimicbot, which we saw in the previous section! And more levels are possible…
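One way to see why: “the move that beats X” is a three-step cycle (Rock → Paper → Scissors → Rock), so applying it three times lands back on the original move. A quick standalone check, using a hypothetical beat helper that is not part of the starter code:

BEAT = {RockAction: PaperAction, PaperAction: ScissorsAction, ScissorsAction: RockAction}

def beat(action):
    # Hypothetical helper: return the move that beats the given action
    return BEAT[type(action)]()

# Applying "beat" three times cycles back to the original move, so
# "beat the move that beats the move that beats previous" just replays our previous move
previous = PaperAction()
print(type(beat(beat(beat(previous)))).__name__)  # PaperAction -- same behavior as mimicbot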
Freqbot
Opponent strategy: Beat our most frequent overall action
Opponent strategy code:
def get_action(self, *, match_clock):
    if self.history:  # After the 1st game
        opponent_moves = [move[1] for move in self.history]

        # Count the occurrences of each move
        move_counts = {}  # Count each R/P/S
        for move in opponent_moves:
            if move in move_counts:
                move_counts[move] += 1  # Add to counter
            else:
                move_counts[move] = 1  # Start counter

        # Find the move with the highest count
        most_common_move = None
        highest_count = 0
        for move, count in move_counts.items():
            if count > highest_count:
                most_common_move = move
                highest_count = count

        if isinstance(most_common_move, RockAction):
            return PaperAction()
        elif isinstance(most_common_move, PaperAction):
            return ScissorsAction()
        else:  # most_common_move is ScissorsAction
            return RockAction()
    else:  # 1st game play randomly
        return random.choice([RockAction(), PaperAction(), ScissorsAction()])
Our counter-strategy:
- If our most frequent action was Rock, play Scissors
- If our most frequent action was Paper, play Rock
- If our most frequent action was Scissors, play Paper
Our best EV: +1 per game (after 1st)
Our counter-strategy code: Exercise for the reader!
General strategies
In poker, it’s common to start by attempting to play the game theory optimal (GTO) strategy and then, as you see weaknesses in opponents, deviate accordingly. If you see your opponent folds too much, then you can bluff more.
Here are a few possible general strategic ideas for RPS:
- Make a rules-based agent that plays a strategy that seems good
- Detect a single opponent strategy and exploit it
- Build an ensemble of opponent detectors and exploits that target various types of opponents
- Use random (game theory optimal) play to:
  - Break even forever
  - Start with random and then update after you learn more about your opponent
  - Use it as a backup plan to switch to if your current strategy is not going well
- Assume by default that the opponent is playing approximately GTO and then detect deviations and play accordingly
- Attempt to predict the next move of each opponent using a general algorithm
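As a rough sketch of that last idea, here is one simple “general” predictor: guess that the opponent’s next move will be their most common move over a recent window, and play the counter. The 20-game window and the 5-game random warm-up are arbitrary choices for illustration, not part of the starter code:

def get_action(self, *, match_clock):
    if len(self.history) < 5:  # Not enough data yet: play randomly
        return random.choice([RockAction(), PaperAction(), ScissorsAction()])

    # Predict the opponent's next move as their most common move in the last 20 games
    recent = [their_action for _, their_action in self.history[-20:]]
    counts = {"R": 0, "P": 0, "S": 0}
    for move in recent:
        if isinstance(move, RockAction):
            counts["R"] += 1
        elif isinstance(move, PaperAction):
            counts["P"] += 1
        else:
            counts["S"] += 1
    predicted = max(counts, key=counts.get)

    # Play the action that beats the predicted move
    if predicted == "R":
        return PaperAction()
    elif predicted == "P":
        return ScissorsAction()
    else:
        return RockAction()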
Breakeven unless you’re playing Rockbot
You could check whether the bot you’re playing has only ever played Rock. If so, you always play Paper; if not, you play randomly. Then you will break even (in expectation) against everyone else, but crush the Rock-only bot.
Here’s how to execute this strategy:
def get_action(self, *, match_clock):
    if self.history:  # Check if the opponent has played at least once
        # Get all of the opponent's moves
        opponent_moves = [move[1] for move in self.history]

        # Count how many times the opponent played Rock
        rock_count = sum(1 for move in opponent_moves if isinstance(move, RockAction))

        # If opponent has only played Rock, we play Paper
        if rock_count == len(opponent_moves):
            return PaperAction()

    # If it's the first move or opponent hasn't only played Rock, play randomly
    return random.choice([RockAction(), PaperAction(), ScissorsAction()])
Random play backup plan
Assuming a setting with suboptimal bots, breaking even by playing randomly won’t be the most rewarding strategy, but could be valuable to consider in some situations. Here is an example of using random play in combination with another strategy.
Suppose that you have a theory that a winning strategy is to beat the action that the opponent played two games ago.
You could try something like this:
- If you’re winning or tying, then play to beat the action from two games ago 80% of the time and play randomly 20% of the time so that you aren’t too predictable
- If you’re losing and down by more than 10 units (i.e. maybe this strategy is not working well), then always play randomly
def get_action(self, *, match_clock):
    # If losing, play randomly
    if self.my_profit < -10:
        return random.choice([RockAction(), PaperAction(), ScissorsAction()])

    # If winning or tying, and there are at least 2 rounds of history
    if len(self.history) >= 2:
        # 80% chance to play based on opponent's action from 2 rounds ago
        if random.random() < 0.8:
            two_rounds_ago = self.history[-2][1]  # Opponent's action from 2 rounds ago
            if isinstance(two_rounds_ago, RockAction):
                return PaperAction()
            elif isinstance(two_rounds_ago, PaperAction):
                return ScissorsAction()
            else:  # ScissorsAction
                return RockAction()

    # In all other cases (including 1st 2 rounds), play randomly
    return random.choice([RockAction(), PaperAction(), ScissorsAction()])