博弈类DP
MinMax算法
Coin in a Line 1
There are n coins in a line. Two players take turns to take one or two coins from right side until there are no more coins left. The player who take the last coin wins.
Could you please decide the first play will win or lose?
Example
n = 1, return true. n = 2, return true. n = 3, return false. n = 4, return true. n = 5, return true.
分析
State: dp[i] 表示,现在剩i个硬币,先手取硬币的人最后的输赢情况。只表示先手的状态可以简化方程。
博弈类DP把 后手层的状态放到先手层,这样就不用区分当前是先手层还是后手层了。
往左边是拿一个,右边是拿两个
00000 先手拿
1 / \ 2
0000 000 后手
/ \ / \
000 00 00 0
f t t t
n-2 n-3 n-4
为了让先手赢,必须走向一个分支,这个分支的任何一个情况都是先手赢,
比如往右走,因为我们假设后手肯定回去走让自己赢的那个分支。
transfer:
dp[i] = (dp[i - 2] && dp[i - 3]) || (dp[i - 3] && dp[i - 4])
跳过后手层
init:
dp[0] = false
dp[1] = true
dp[2] = true
dp[3] = true
code
public class Solution {
public boolean firstWillWin(int n) {
// write your code here
boolean[] dp = new boolean[n + 1]; // cache, 相当于hash的value
int[] flag = new int[n + 1]; // 是否访问过,相当于hash的key
return search(n, dp, flag);
}
public boolean search(int i, boolean[] dp, int[] flag) {
if(flag[i] != 0) {
return dp[i];
}
flag[i] = 1;
if(i <= 0) {
dp[0] = false;
} else if(i == 1) {
dp[1] = true;
} else if(i == 2) {
dp[2] = true;
} else if(i == 3) {
dp[3] = false;
} else {
dp[i] = (search(i - 2, dp, flag) && search(i - 3, dp, flag))
|| (search(i - 3, dp, flag) && search(i - 4, dp, flag));
}
return dp[i];
}
}
Coins in a Line II
There are n coins with different value in a line. Two players take turns to take one or two coins from left side until there are no more coins left. The player who take the coins with the most value wins.
Could you please decide the first player will win or lose?
Example: Given values array A = [1,2,2], return true. Given A = [1,2,4], return false.
分析
MaxMin算法
state: dp[i]
还剩i个硬币,先手取硬币的人,最多能取得多少价值
Answer:dp[n] > sum / 2
first player win
5,1,2,10 先手拿
5 / \ 5,1
1,2,10 2,10 后手
1 / \1,2 2/ \2,10
2,10 10 10 0
Function:
dp[i] = max(min(dp[i-2], dp[i-3])+coin[n-i]), (min(dp[i-3],dp[i-4])+coin[n-i]+coin[n-i+1])
min的意义是后手希望让先手取得最小的价值
Init:
dp[0] = 0
dp[1] = coin[i - 1]
dp[2] = coin[i - 2] + coin[i - 1]
dp[3] = coin[i - 2] + coin[i - 3]
public boolean firstWillWin(int[] values) {
// write your code here
if(values.length <= 2) return true;
// winner will get value large than half
int sum = 0;
for(int value : values) {
sum += value;
}
int[] dp = new int[values.length + 1];
Arrays.fill(dp, -1);
int n = values.length;
dp[0] = 0;
dp[1] = values[n - 1];
dp[2] = values[n - 1] + values[n - 2];
return 2 * dfs(values, dp, n) > sum;
}
private int dfs(int[] values, int[] dp, int remain) {
if(remain < 0) return 0;
if(dp[remain] != - 1) return dp[remain];
int n = values.length;
dp[remain] = Math.max(
Math.min(dfs(values, dp, remain - 2),
dfs(values, dp, remain - 3)) + values[n - remain],
Math.min(dfs(values, dp, remain - 3),
dfs(values, dp, remain - 4)) + values[n - remain] + values[n - remain + 1]
);
return dp[remain];
}
Coins in a Line III (Snapchat)
There are n coins in a line. Two players take turns to take a coin from one of the ends of the line until there are no more coins left. The player with the larger amount of money wins.
Could you please decide the first player will win or lose?
Example: Given array A = [3,2,2], return true. Given array A = [1,2,4], return true. Given array A = [1,20,4], return false.
分析
和上一题区别最大的是 dp[][]
,因为我们不能再固定一端,用一个维度来表示剩下硬币构成的数组,因为从两端取硬币的时候,总共的区间数量 (start, end) 是 O(n^2),需要用两个维度表示。
博弈类DP把 后手层的状态放到先手层,这样就不用区分当前是先手层还是后手层了。
state: 只定义先手的状态
dp[i]
还剩i个硬币时,不好定义,因为可以从左取硬币,也可以从右取硬币,不能区分开来这两种状态。
取完一个硬币后还剩一个连续的区间,所以可以用一个区间来表示,可以区分出每一个状态。
dp[i][j]
表示现在还剩第i到第j个硬币的时候,先手取硬币的人最后最多的硬币的价值。
function:
left = min(dp[i + 2][j], dp[i + 1][j - 1]) + coin[i]
right = min(dp[i][j - 2], dp[i + 1][j - 1]) + coin[j]
dp[i][j] = max(left, right)
init:
dp[i][i] = coin[i]
dp[i][i+1] = max(coin[i], coin[i+1])
answer:
dp[0][n - 1]
code
public class Solution {
public boolean firstWillWin(int[] values) {
// write your code here
if(values == null || values.length == 0) {
return false;
}
int n = values.length;
// dp[i][j] 表示现在还剩第i到第j个硬币的时候,先手取硬币的人最后最多的硬币的价值。
int[][] dp = new int[n][n];
boolean[][] flag = new boolean[n][n];
int sum = 0;
for(int value : values) {
sum += value;
}
return sum < 2 * dfs(values, dp, 0, n - 1, flag);
}
public int dfs(int[] values, int[][] dp, int i, int j, boolean[][] flag) {
// 避免重复计算
if(flag[i][j]) {
return dp[i][j];
}
flag[i][j] = true;
if(i > j) {
dp[i][j] = 0;
} else if(i == j) {
dp[i][j] = values[i];
} else if(i + 1 == j) {
dp[i][j] = Math.max(values[i], values[j]);
} else {
// 后手的策略是让先手取到更少的价值
int goLeft = Math.min(dfs(values, dp, i + 2, j, flag), dfs(values, dp, i + 1, j - 1, flag)) + values[i];
int goRight = Math.min(dfs(values, dp, i + 1, j - 1, flag), dfs(values, dp, i, j - 2, flag)) + values[j];
dp[i][j] = Math.max(goLeft, goRight);
}
return dp[i][j];
}
}