Monte Carlo Tree Search (MCTS) is an algorithm used in decision-making and search problems. The algorithm builds a search tree of possible moves and outcomes, exploring the tree by simulating a number of games to determine the best moves. MCTS balances exploration of unexplored nodes and exploitation of nodes that have shown promise in the past. The algorithm selects the best move to make from the current game state based on the statistics collected during the simulation phase.

# Algorithm

- Initialize
- POMDP problem
- Root node
- Hyperparameters
- $d$ = search depth
- $m$ = search simulations before choosing best action
- $c$ = exploration constant
- $U$ = value function estimate method, rollout simulator

**For $m$ simulations, do:**

**Selection**:- If first simulation:
**Expand**root node creating child nodes for all possible actions (or modification).- Return value estimate from root node using rollout

- If not first simulation:
- Select best action according to UCB metric
- $Q(s,a)+cN(s,a)logN(s) $

- Select best action according to UCB metric

- If first simulation:
**Expansion**:- If node has no children and is not terminal expand it.
- Create next state, $s_{′}$ using transition distribution and chosen action, creating new unexplored nodes for each possible action from $s_{′}$.
- The transition distribution won’t necessarily yield the same $s_{′}$ next time this action is taken from $s$.

- If node has children and not at max depth, return to
**Selection**

- If node has no children and is not terminal expand it.
**Rollout**:- When max depth is reached, get value estimate using rollout and policy. This is the leaf node.

**Backprop**:- Recursively propagate value estimate at leaf node backwards through tree, updating value estimate $Q$ at each parent using current reward + discounted future reward.

# Code

Based on [1] with additional explanation.

# Modifications

- (Double) Progressive Widening
- Limit the number of actions considered from state $s$
- Limit the number of states resulting from action $a$

- Other rollout methods

## References

[1] M. J. Kochenderfer, T. A. Wheeler, and K. H. Wray, Algorithms for decision making. Cambridge, Massachusetts: The MIT Press, 2022.