You are given a chess board (of arbitrary size $M \times N$) where each square has a certain non-negative value. A single rook is placed on the board. The rook can move horizontally or vertically as in regular chess. However, it can only move in this fashion to squares that have a strictly lower value than the square it starts from. Given the rook’s starting square, find the length of the longest path it can take on the board.
Here’s a sample board of size $6 \times 5$ –
2 | 3 | 1 | 4 | 5 |
---|---|---|---|---|
6 | 7 | 9 | 2 | 1 |
3 | 5 | 7 | 1 | 9 |
8 | 8 | 8 | 7 | 6 |
9 | 4 | 2 | 5 | 8 |
2 | 1 | 4 | 7 | 9 |
Let’s say the rook is located at coordinates $(3, 0)$.
2 | 3 | 1 | 4 | 5 |
---|---|---|---|---|
6 | 7 | 9 | 2 | 1 |
3 | 5 | 7 | 1 | 9 |
♖ | 8 | 8 | 7 | 6 |
9 | 4 | 2 | 5 | 8 |
2 | 1 | 4 | 7 | 9 |
Given its starting square is of value $8$, it can only move (vertically or horizontally) to squares that are of value strictly lower than $8$.
✅ | 3 | 1 | 4 | 5 |
---|---|---|---|---|
✅ | 7 | 9 | 2 | 1 |
✅ | 5 | 7 | 1 | 9 |
♖ | ❌ | ❌ | ✅ | ✅ |
❌ | 4 | 2 | 5 | 8 |
✅ | 1 | 4 | 7 | 9 |
The objective is to find the longest path the rook can take on the board. It suffices to simply return the length.
This immediately felt like a dynamic programming problem. From a given state (of where the rook is located), you can call an imaginary routine $\text{longestPathFrom(rook)}$ to find the length of the longest path from that state. This is independent of how you arrived at that state. Thus, we can formulate a preliminary recurrence relation as –
\[\text{longestPathFrom(state)} = 1 + \text{longestPathFrom}(\text{state}')\]for all states that are immediately reachable from the current state. Given how the rook moves, these immediately reachable squares are those on the same rank or file as the rook could be moved to in a single move (provided their value was strictly less than the value of the rook’s square).
However, there may be multiple ways of reaching the same square, and some may be longer than others. Hence, we need to modify the recurrence relation to –
\[\text{longestPathFrom(state)} = \max_{\text{state}'\ \in \text{ reachable}}\left(0, 1 + \text{longestPathFrom}(\text{state}')\right)\]A simple algorithm to implement this is –
$M$: the number of rows in the board
$N$: the number of columns in the board
$\text{values}[M][N]$: the values of each cell in the board that govern reachability
$(i, j)$: the (row, column) of the cell from which we want to find the longest path
Initialize $\text{dp}[i][j] \leftarrow \varnothing, \forall i \in [1, M], j \in [1, N]$

$\text{longestPathFrom}(i, j)$
$\quad$if $\text{dp}[i][j] == \varnothing$ then
$\qquad d_{\max} \leftarrow \max_{i' :\ \text{values}[i'][j] < \text{values}[i][j]}\left(0, 1 + \text{longestPathFrom}(i', j)\right)$
$\qquad d_{\max} \leftarrow \max\left(d_{\max}, \max_{j' :\ \text{values}[i][j'] < \text{values}[i][j]}\left(1 + \text{longestPathFrom}(i, j')\right)\right)$
$\qquad \text{dp}[i][j] \leftarrow d_{\max}$
$\quad$endif
return $\text{dp}[i][j]$
Here’s the algorithm implemented as a full solution in Python –
```python
def print_matrix(matrix):
    print('\n'.join(['\t'.join([str(cell) for cell in row]) for row in matrix]))


def max_dist(values, to_i, to_j, dp):
    """
    max_dist(i, j) returns the max length path from coordinates (i, j)
    """
    if dp[to_i][to_j] is None:
        curr_max_dist = 0  # default unexplored max distance (no reachable squares)
        for k in range(len(values[0])):
            if values[to_i][k] < values[to_i][to_j]:
                curr_max_dist = max(curr_max_dist, 1 + max_dist(values, to_i, k, dp))
        for k in range(len(values)):
            if values[k][to_j] < values[to_i][to_j]:
                curr_max_dist = max(curr_max_dist, 1 + max_dist(values, k, to_j, dp))
        dp[to_i][to_j] = curr_max_dist
    return dp[to_i][to_j]


if __name__ == '__main__':
    with open('input.txt', 'r') as inputfile:
        n, m = map(int, inputfile.readline().split())
        values = [[] for _ in range(n)]
        for i in range(n):
            values[i] = list(map(int, inputfile.readline().split()))
        u, v = map(int, inputfile.readline().split())
    dp = [[None] * m for _ in range(n)]
    ans = max_dist(values, u, v, dp)
    # print_matrix(dp)
    print(ans)
```
Here are the contents of a sample `input.txt` file you can use –

```
6 5
2 3 1 4 5
6 7 9 2 1
3 5 7 1 9
8 8 8 7 6
9 4 2 5 8
2 1 4 7 9
3 0
```
The correct answer is $7$. Here’s another one –

```
6 5
4 4 4 4 4
5 5 5 5 5
6 6 6 6 5
7 7 7 6 5
8 8 7 6 5
9 8 7 6 5
5 0
```
The correct answer is $5$.
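The memoized recursion can also be exercised without any file I/O. Here’s a self-contained sketch of the same algorithm (using `functools.lru_cache` in place of the explicit `dp` table; the function name is mine) run against both sample boards:

```python
from functools import lru_cache


def longest_rook_path(values, start_i, start_j):
    """Length of the longest strictly-decreasing rook path from (start_i, start_j)."""
    m, n = len(values), len(values[0])

    @lru_cache(maxsize=None)
    def f(i, j):
        best = 0  # no reachable square => path length 0
        for k in range(n):  # moves along the same row
            if values[i][k] < values[i][j]:
                best = max(best, 1 + f(i, k))
        for k in range(m):  # moves along the same column
            if values[k][j] < values[i][j]:
                best = max(best, 1 + f(k, j))
        return best

    return f(start_i, start_j)


board1 = [[2, 3, 1, 4, 5],
          [6, 7, 9, 2, 1],
          [3, 5, 7, 1, 9],
          [8, 8, 8, 7, 6],
          [9, 4, 2, 5, 8],
          [2, 1, 4, 7, 9]]
board2 = [[4, 4, 4, 4, 4],
          [5, 5, 5, 5, 5],
          [6, 6, 6, 6, 5],
          [7, 7, 7, 6, 5],
          [8, 8, 7, 6, 5],
          [9, 8, 7, 6, 5]]
print(longest_rook_path(board1, 3, 0))  # 7
print(longest_rook_path(board2, 5, 0))  # 5
```

Both results match the answers stated above.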
Fighting game tournaments traditionally use double elimination brackets. In single elimination brackets, you would keep playing matches until you lose, after which you’re eliminated from the tournament. In a double elimination tournament, you’re instead put into a loser’s bracket (or lower bracket). If you win here, you’re paired off with the losers from the winner’s bracket (or upper bracket). It’s only if you lose a second time that you’re eliminated from the tournament. This gives competitors a second chance and makes for a more satisfying experience given fighting games can be quite volatile, and a loss might not represent a major difference in skill.
To estimate the amount of time required to run a double elimination (DE) bracket with $N$ entrants, we can calculate how many rounds we’d have to run. This ignores practical issues: competitors cannot instantaneously teleport to the next available station once their match is over, and a match takes a variable amount of time to complete (say, $t_\text{match}$ on average). However, if we have the number of rounds $R$, we can estimate the total time taken to run a tournament $T_\text{total}$ as
\[T_\text{total} = R \times t_\text{match}\]Let’s first assume we have $N=2^m$ entrants. How a DE bracket works can be summarized by the table below. The entry under each bracket shows how many entrants it has, describing the flow of competitors.
Round | Upper Bracket | Lower Bracket |
---|---|---|
Round 1 | $2^m \to (2^{m-1}, 2^{m-1})$ | $-$ |
Round 2 | $2^{m-1} \to (2^{m-2}, 2^{m-2})$ | $2^{m-1} \to (2^{m-2}, -)$ |
Round 3 | $-$ | $2^{m-2} + 2^{m-2} \to (2^{m-2}, -)$ |
Round 4 | $2^{m-2} \to (2^{m-3}, 2^{m-3})$ | $2^{m-2} \to (2^{m-3}, -)$ |
Round 5 | $-$ | $2^{m-3} + 2^{m-3} \to (2^{m-3}, -)$ |
$\vdots$ | $\vdots$ | $\vdots$ |
Winner’s Final | $2 \to (1, 1)$ | $2 \to (1, 1)$ |
Loser’s Final | $-$ | $1 + 1 \to (1, -)$ |
Grand Final | $2 \to (1, -)$ | |
Grand Final Reset | $-$ | $2 \to (1, -)$ |
The general pattern is easy to see. Skipping the first round, for every round in the upper bracket there must be $2$ rounds in the lower bracket - one to cut the lower bracket down to the number of losers dropping from the upper bracket, and one to play the new losers off against them. Including the initial round, grand finals and possible reset, we have $2$ additional rounds plus a possible extra. Thus, we get -
\[R = \left\{\begin{array}{lr} 2 \times (\left\lceil\lg N\right\rceil - 1) + 2, \text{ no grand finals reset} \\ 2 \times (\left\lceil\lg N\right\rceil - 1) + 3, \text{ grand finals reset} \end{array}\right\}\]For $N = 16$ as in the figure above, this formula correctly predicts $R = 9$. For $N = 11$ as in this tournament, this formula still correctly predicts $R = 8$ since the higher seeded entrants are usually given byes into the next round, which evens out the round count.
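The round-count formula translates directly into a few lines of code (a quick sketch; the function name is mine):

```python
import math


def de_rounds(entrants: int, reset: bool = False) -> int:
    """Number of rounds in a double elimination bracket, per the formula above."""
    base = 2 * (math.ceil(math.log2(entrants)) - 1) + 2
    return base + 1 if reset else base


print(de_rounds(16, reset=True))  # 9
print(de_rounds(11))              # 8
```

These match the round counts quoted for the $N = 16$ (with reset) and $N = 11$ (without) examples.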
DocMassacre is a senior electrical engineer at Raytheon Technologies who’s also an avid Tekken player (and holds a doctorate!). He posted a series of tweets exploring the probability of taking an entire tournament assuming a fixed probability of winning any particular set against anyone. In particular, he makes four claims -
Of course, I wanted to write a script myself to find these numbers and cross-check them. DocMassacre used a simple Monte Carlo simulation^{4} to find these numbers, but I believe we can do this analytically using some combinatorics.
Given a series of matches between two competitors, say guessing correctly on a coin flip, what’s the probability of one competitor reaching $N$ wins first? The motivation behind solving this problem is that we can model the probability of winning a single game (Tekken games are usually FT3 rounds), and a single match (Tekken matches are usually FT2 or FT3) in this fashion.
The number of losses incurred before clinching the set is a random variable $X$ that follows the negative binomial distribution $X \sim \text{NB}(n, p)$, where $n$ is the $N$ in our “FT{N}”, and $p$ is the probability of winning each game (or round, or match).
If we have $n$ wins, we want to distribute $l \in [0, n)$ losses among them (but none after the final win, since the game would be over). This is a fairly standard combinatorics problem. If we want to distribute $n$ identical objects into $r$ groups, such that each group can have $0$ or more ($\leq n$) objects, then the number of ways to do so is ${n+r-1 \choose r-1}$. Let’s say we have $l$ losses. We want to distribute those $l$ losses between our $n - 1$ wins (which forms $n - 1 + 1 = n$ groups). The number of ways to do so is thus ${l+n-1 \choose n-1}$.
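This count is easy to sanity-check by brute-force enumeration (a quick sketch; the helper name is mine):

```python
from itertools import product
from math import comb


def count_sequences(n: int, l: int) -> int:
    """Count win/loss sequences with exactly n wins and l losses whose final game is the n-th win."""
    return sum(1 for seq in product('WL', repeat=n + l)
               if seq[-1] == 'W' and seq.count('W') == n)


print(count_sequences(3, 2), comb(2 + 3 - 1, 3 - 1))  # 6 6
```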
Given the probability of winning a game as $p$, we can find the probability of winning a FT{N} as -
\[\Pr[\text{FT}\{N\}] = \sum_{l=0}^{n - 1} p^n (1-p)^{l} {l+n-1 \choose n-1}\]I wrote a Python script to implement this formula. I also coded up DocMassacre’s Monte Carlo simulation to provide a comparison, using 1000 trials.
```python
import math
import random
from typing import Callable

random.seed(1)

FTN_FN = Callable[[int, float, int], float]
TRIALS = 10**3
ROUNDS_PER_GAME = 2  # number of rounds to win in a game, to win the game
FT2_GAMES = 2  # number of games to win in an FT2 set, to win the set
FT3_GAMES = 2  # number of games to win in an FT3 set, to win the set


def ftn_prob(n: int, win_prob: float) -> float:
    prob: float = 0.0
    for losses in range(0, n):
        prob += (win_prob ** n) * ((1 - win_prob) ** losses) * math.factorial(n - 1 + losses) / (math.factorial(n - 1) * math.factorial(losses))
    return prob


def sim_prob(n: int, win_prob: float, trials: int = TRIALS) -> float:
    m: int = 0
    for _ in range(0, trials):
        r = 0
        for _ in range(n + 1):
            r += 1 if (random.random() < win_prob) else -1
        m += 1 if (r > 0) else 0
    return m / trials


def ft1_prob(ftn_fn: FTN_FN, round_win_prob: float) -> float:
    return ftn_fn(ROUNDS_PER_GAME, round_win_prob)


def ft2_prob(ftn_fn: FTN_FN, round_win_prob: float) -> float:
    return ftn_fn(FT2_GAMES, ft1_prob(ftn_fn, round_win_prob))


def ft3_prob(ftn_fn: FTN_FN, round_win_prob: float) -> float:
    return ftn_fn(FT3_GAMES, ft1_prob(ftn_fn, round_win_prob))  # TODO: change back to 3


def n_man_bracket_prob(ftn_fn: FTN_FN, n: int, round_win_prob: float) -> float:
    ft2s = math.floor(math.log2(n)) - 1
    ft3s = 1
    no_loss_prob = (ft2_prob(ftn_fn, round_win_prob) ** ft2s) * (ft3_prob(ftn_fn, round_win_prob) ** ft3s)
    one_loss_prob = 0  # TODO:
    total_prob = no_loss_prob + one_loss_prob
    return total_prob


if __name__ == '__main__':
    n: int = 64
    for round_win_prob in (x * 0.01 for x in list(range(0, 70, 5)) + list(range(70, 101))):
        print(f'{round_win_prob:.2f}\t{n_man_bracket_prob(ftn_prob, n, round_win_prob):.10f}\t{n_man_bracket_prob(sim_prob, n, round_win_prob):.10f}')
```
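As an extra sanity check on the closed form, the FT{N} win probability can also be computed by exhaustive recursion over (wins, losses) match states; the two should agree to floating-point precision. A sketch (function names are mine):

```python
from math import comb


def ftn_closed(n: int, p: float) -> float:
    """Closed-form FT{n} win probability, same formula as ftn_prob above."""
    return sum((p ** n) * ((1 - p) ** l) * comb(l + n - 1, n - 1) for l in range(n))


def ftn_recursive(n: int, p: float, wins: int = 0, losses: int = 0) -> float:
    """P(reaching n wins before n losses), by exhaustive recursion over match states."""
    if wins == n:
        return 1.0
    if losses == n:
        return 0.0
    return (p * ftn_recursive(n, p, wins + 1, losses)
            + (1 - p) * ftn_recursive(n, p, wins, losses + 1))


# e.g. FT2 at p = 0.6: p^2 + 2 p^2 (1 - p) = 0.648
for n in (2, 3, 5):
    assert abs(ftn_closed(n, 0.6) - ftn_recursive(n, 0.6)) < 1e-9
```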
Here are the results -
Here are some comparisons to what DocMassacre found -
Question | DocMassacre’s answer | Analytical answer | Monte Carlo answer |
---|---|---|---|
^{1} | $0.018$ | $0.015$ | $0.011$ |
^{2} | $0.71$ | $0.71$ | $\sim 0.71$ |
^{3} | $0.0007$ | $0.0005$ | $0.0006$ |
^{5} | $0.000001$ | $0.000003$ | $0.000002$ |
The values are fairly similar, with the simulated values more or less agreeing with DocMassacre’s results (he might have used more trials). The simulations consistently overestimate the probability compared to the analytical values, though, especially for lower win probabilities (up to $\approx 0.7$).
We can modify the script to account for double elimination brackets as is used in actual Tekken tournaments. I will explore this later, however.
https://twitter.com/luria_justin/status/1627749352373424134 ↩ ↩^{2} ↩^{3}
https://twitter.com/luria_justin/status/1627786360189775872 ↩ ↩^{2}
https://twitter.com/luria_justin/status/1627786892648259585 ↩ ↩^{2}
https://twitter.com/luria_justin/status/1628032344190853126 ↩
https://twitter.com/luria_justin/status/1627833348461678593 ↩
Breaking Bad is one of the finest television shows I’ve seen. I recently binge-watched the entire five seasons of the show. In doing so, I noticed the unique presentation of the opening credits where cast members have an elemental symbol from the periodic table highlighted in their name, reflecting the importance of chemistry in the show (and also calling back to the design of the title). I wondered how the show runners might have done this, and thought about writing a program to do this “elementalization” automatically.
From observing how these elements are highlighted in the cast names, I made the following observations -

- The same symbol may be assigned to more than one person (e.g., `Ar` is assigned to both actor Aaron Paul and producer Stewart A. Lyons)
- A symbol cannot span the space between words in a name (e.g., `Ar` cannot be assigned to Cynthia Rodgers)

The problem then is to assign elemental symbols (e.g. `H`, `Li`, `Na`) to a list of cast member names if that symbol exists in their name in a valid way, which matches my earlier observations. Some names might have multiple possible assignments, and others might have none. The program should return all possible valid assignments for each name.
As presented, the problem is easy to solve. Simply iterate through the list of names and, for each, check whether every possible elemental symbol is present in it, and return that list. I manually compiled a list of elemental symbols in the periodic table from Ptable, which you can find here. Since case is unimportant, all cast member names and elemental symbols are preprocessed to lowercase for simplicity. There might be names in which an element can be assigned to multiple places in the name; however, I chose to leave this choice to the person responsible for the credits. The relevant snippet in Python is given below -
```python
from typing import List, Dict
from collections import defaultdict


def assign_elements(names: List[str], elements: List[str]) -> Dict[str, List[str]]:
    assignment = defaultdict(list)
    for name in names:
        for element in elements:
            if element in name:
                assignment[name].append(element)
    return assignment
```
For the first-billed cast from the Pilot, we get the following output -
```python
{
    'bryan cranston': ['b', 'c', 'n', 'o', 's', 'cr', 'br', 'y', 'ra'],
    'anna gunn': ['n', 'na', 'u'],
    'aaron paul': ['n', 'o', 'p', 'ar', 'au', 'pa', 'u'],
    'dean norris': ['n', 'o', 's', 'i', 'no'],
    'betsy brandt': ['be', 'b', 'n', 's', 'br', 'y', 'nd', 'ra', 'ts'],
    'rj mitte': ['te', 'i'],
    'max arciniega': ['c', 'n', 'ar', 'ni', 'ga', 'in', 'i'],
    'john koyama': ['h', 'n', 'o', 'k', 'y', 'am'],
    'steven michael quezada': ['h', 'c', 'n', 's', 'v', 'te', 'i', 'u'],
    'marius stan': ['n', 's', 'ar', 'i', 'ta', 'u'],
    'aaron hill': ['h', 'n', 'o', 'ar', 'i'],
    'gregory chase': ['h', 'c', 'o', 's', 'as', 'se', 'y', 're'],
    'carmen serano': ['c', 'n', 'o', 's', 'ar', 'ca', 'se', 'er', 'ra', 'no'],
    'evan bobrick': ['b', 'c', 'n', 'o', 'k', 'v', 'br', 'i'],
    'roberta marquez seret': ['be', 'b', 'o', 's', 'ar', 'se', 'er', 'ta', 're', 'u']
}
```
We could stop here. However, it was interesting to me to think about the additional constraint of restricting an element from being used for more than one cast member. It might become repetitive if common symbols like `H`, `S` and `O` are reused. What would an algorithm obeying this additional constraint look like?
We cannot simply use a greedy algorithm of assigning the first available symbol to a name. Consider two hypothetical names Br and B. Br can use the symbols `B` and `Br`, while B can only use the symbol `B`. If we first assign `B` to Br, then we won’t have any symbol for B, even though we would have had one had we used `Br` in the first place. We could think of backtracking and trying different assignments if we encounter an impossibility, but we wouldn’t know whether there actually is an impossibility until we exhaust all options, which could take a long time.
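The Br/B failure mode takes only a few lines to demonstrate (a sketch using the hypothetical names from above; the function name is mine):

```python
def greedy_assign(names, elements):
    """Assign each name the first still-unused element found in it."""
    used, assignment = set(), {}
    for name in names:
        for ele in elements:
            if ele in name and ele not in used:
                assignment[name] = ele
                used.add(ele)
                break
    return assignment


# 'br' grabs 'b' first, leaving nothing for 'b'
print(greedy_assign(['br', 'b'], ['b', 'br']))  # {'br': 'b'}
```

Reversing the order of names (or elements) happens to rescue this example, but no fixed greedy order works for every input.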
We can instead model this problem using a bipartite graph. On one side, we have a set of nodes, each containing the name of a cast member. On the other side, we have another set of nodes, each containing an elemental symbol. Using the previous solution, we can construct edges between these two sets, connecting a name to every elemental symbol that can be assigned to it. We need to choose a subset of these edges such that no two chosen edges share an endpoint (so we never assign two elements to the same name, nor the same element to two names), and every name-node lies on a chosen edge where possible (because we don’t want to leave a name unassigned if we can help it).
This is exactly the problem of finding a maximum matching in a bipartite graph. There are many well-known algorithms to solve this. We will use the Hopcroft-Karp algorithm as implemented in the `networkx` package.
```python
from typing import List, Dict

import networkx as nx


def assign_unique_elements(names: List[str], elements: List[str]) -> Dict[str, str]:
    graph = nx.Graph()
    graph.add_nodes_from(names, bipartite=0)
    graph.add_nodes_from(elements, bipartite=1)
    graph.add_edges_from([(name, ele) for ele in elements for name in names if ele in name])
    return nx.bipartite.maximum_matching(graph, top_nodes=names)
```
The output using the cast from the Pilot again is -
```python
{
    'gregory chase': 'h',
    'aaron hill': 'n',
    'anna gunn': 'na',
    'bryan cranston': 'b',
    'aaron paul': 'o',
    'roberta marquez seret': 'be',
    'dean norris': 's',
    'evan bobrick': 'c',
    'betsy brandt': 'br',
    'steven michael quezada': 'v',
    'max arciniega': 'ar',
    'marius stan': 'i',
    'carmen serano': 'ca',
    'rj mitte': 'te',
    'john koyama': 'k',
}
```
I can imagine the show runners might want to focus the more “dramatic” elemental symbols (i.e., the two-character ones) on the first-billed cast displayed first (e.g., assigning `Br` to Bryan Cranston instead of Betsy Brandt). I can also imagine one might want to iterate through multiple assignments to find something more appropriate than the first one returned (I don’t know how the minds of TV show creators work). Refer to Uno, T. (1997), which proposes an algorithm for this, since it isn’t directly implemented in `networkx`.
The usage of a classic graph algorithm to solve a problem I noticed in the world was satisfying to me. I can think of a few additional questions to ask.
This paper defines a metric to measure the local interpretability of a model - the number of operations the model requires to make a prediction. The metric is validated against local interpretability, which is measured with a human-subject study. Users were asked to predict the output of a model given an input, and to predict the output given a counterfactual input, i.e., a slight modification of the input. The results show that the operation count metric is correlated with local interpretability, and that decision trees and logistic regression models are more interpretable than neural networks.
The metric defined to act as a proxy for interpretability is the runtime operation count. Effectively, it is the number of operations a human must carry out in order to simulate a single prediction of the model. This was measured by instrumenting the prediction operation of existing trained models in Python’s `scikit-learn` package.
The authors ran a crowdsourced experiment $(N=1000)$ using the Prolific platform. Participants were asked to predict the output of a model on an input (by replicating the actual calculations involved), and then asked to predict the output again for a slightly modified version of the input. Three model types were used - decision trees (DT), logistic regression (LR) and neural networks (NN). They were trained using `scikit-learn`. Training details are omitted here.
Users were trained in how to perform these calculations for each model type using a small fill-in-the-blank exercise conducted before the actual prediction task. The models were trained using synthetic data to avoid the effects of domain knowledge on prediction.
The survey was administered through Qualtrics. Hypotheses for the study were pre-registered on the Open Science Framework.
Based on the number of correct responses for the two tasks of simulation and counterfactual prediction for each of the three models, three hypotheses regarding local interpretability were tested - $DT > NN$, $DT > LR$, $LR > NN$. p-values and confidence intervals were calculated for each using Fisher’s Exact Test.
The Fisher Exact Test gives exact p-values for $2 \times 2$ contingency tables, where samples from $2$ different treatments can be classified in $2$ different ways. In the case of this paper, an input from $2$ different models is classified into correct or incorrect based on the user’s response. Fisher’s Exact Test then tells us how likely the particular assignment of values to each cell are, with the null hypothesis being independence of all categories. It is a special case of the chi-squared test but works for all sample sizes.
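For a $2 \times 2$ table, the test is simple enough to sketch from first principles using hypergeometric probabilities (function names are mine; in practice `scipy.stats.fisher_exact` provides the same test):

```python
from math import comb


def table_prob(a, b, c, d):
    """Hypergeometric probability of one specific 2x2 table with the given margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all same-margin tables at most as likely as the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = table_prob(a, b, c, d)
    p = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        px = table_prob(x, row1 - x, col1 - x, (n - row1) - (col1 - x))
        if px <= p_obs * (1 + 1e-9):  # small slack for float comparison
            p += px
    return p


print(fisher_exact_two_sided(5, 5, 5, 5))    # ~1.0: the balanced table is the most likely
print(fisher_exact_two_sided(10, 0, 0, 10))  # tiny: such a split is very unlikely under independence
```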
The results show that decision trees are more simulatable and “what-if” locally explainable than logistic regression or neural network models. However, the results did not find evidence for logistic regression to be more locally interpretable than neural network models.
Visual comparison of the plots of accuracy vs. operation count across all three tasks (simulatability, “what-if” local explainability and local interpretability, the last of which simply measures how many users got both of the previous tasks right), shows that they are correlated. Also graphed was the time taken vs. the operation count, and the accuracy vs. the time taken.
The authors point out how this metric may be used to assess the interpretability of a model without a user study.
This is a neat paper which attempts to quantitatively validate a hypothesis that most people would lend credence to. Operation count is of course a very crude way to approximate interpretability, and is infeasible for the mammoth network sizes that power today’s models. The paper also doesn’t attempt to answer questions regarding interpretability for different models with similar operation counts. Is a large decision tree equally as interpretable as a small neural network? The graphs seem to show that the mean operation counts across the three model types are significantly different. I would have liked to see how interpretability is affected when the operation counts for different model types are similar.
I also question whether brute calculation of the model to produce outputs is how most people would try to interpret a model. One can probably build some intuition from the feature weights and response curves to be able to make accurate predictions for the smaller model types without having to actually simulate the prediction function on paper. Nevertheless, it is a simple and good enough lower bound to start asking questions about interpretability.
This paper identifies various human cognitive biases from literature that could have an impact in the interpretation of rule-based explanations of AI systems. For each bias identified, it provides an explanation, its hypothesized impact on rule interpretation and a debiasing technique informed by literature. The paper offers a set of recommendations for rule-learning algorithms and software to help reduce the chances of users misinterpreting their rules. Finally, the paper concludes with a discussion of its limitations and future work.
The authors conducted a literature review of cognitive science and rule-based learning research. Using personal judgment, they identified the relevant biases and debiasing techniques, and hypothesized the impact each bias might have had on the rule interpretation.
A summary of the full list of biases identified, as well as their impact on rule interpretation and associated debiasing technique can be seen in the figure from the paper below.
The authors extensively discuss the limitations of their work. First, the need for validation of these biases through human-subject experiments. The biases have been identified as relating to the interpretation of rules based on the authors’ judgment. This needs to be experimentally validated. There is also a lack of research in validating these biases in the context of rule interpretation.
Next, the authors discuss the effect of domain knowledge in interpreting rules, as well as individual differences between users that can cause rules to be interpreted differently. They also point to the lack of research on the efficacy of debiasing techniques and the need to explore them further.
Lastly, they discuss extending the scope of this research beyond biases to misinterpretation arising from cognitive load and remembrance. They also discuss applying these principles to forms of explanation beyond logical rules.
This is a very well written paper, and a much needed one based on the lack of human-centred evaluation of explanations I identified in previous papers I’ve summarized. It states its assumptions and limitations very clearly, and is well-researched, with each bias and debiasing technique drawing on a lot of relevant papers in cognitive science. The directions for future work are analyzed well, and the biases themselves seem convincing in that I can visualize myself falling for them when interpreting rules.
I see the logical next step for the paper to be investigating the effect of specific biases on rule interpretation. For example, take the bias related to the misinterpretation of AND. The debiasing technique suggested is to use AND to connect propositions and not categories (don’t use it to say Linda is a bank teller AND active in the feminist movement, say Linda is a bank teller and Linda is active in the feminist movement). The effect of the bias and debiasing technique could be quantitatively explored in a rule-based interpretation context. The same study methodology could be applied to all biases and debiasing techniques presented in the paper. This would give us an insight into which biases we should worry most about when designing explainability techniques for AI systems.
The theme of this talk was about how ML and RL can be used by game designers to make the process of designing and testing games more painless and efficient. The speakers hypothesize that ML/RL will play an increasing role in game dev and will move from spectacular, one-off demos (e.g., AlphaGo, AlphaStar, OpenAI Five) to routine deployment. They recognize that game design goals are difficult to specify as reward functions or labels. They identify three key ways in which ML can become more accessible to game designers, and highlight one work for each.
ML and RL agents are black boxes. Their policy-making and inner workings need to be interpretable by game designers so that they can appropriately respond to the agents’ observed behaviour.
The speakers highlight a portion of the work by Broll et al. (2019), in which the authors attempt to create imitation learning components to imitate human-like behaviours in Minecraft. Evaluating the “human-likeness” of behaviour is time-consuming, since one has to manually examine many different rollouts (episodes of play by the agent). The authors came up with a visualization tool for a single rollout which could indicate the actions that would have been taken (counterfactuals) had the agent been in a different location. This could be done with any state dimension, not just location. Playing through the rollout frame-by-frame, the speaker noted that the actions stayed relatively stable when the enemies were far away from the agent, but changed sharply and erratically when the enemies surrounded the player. This indicated that such situations were perhaps too few in the training set, and that more should be added.
Broadly, there need to be game design-centric visualization tools to help game designers understand how agents behave. This could either inform how the agent behaviour needs to be changed, if the agents are themselves part of the game, or how the game itself needs to be changed, if the agents are being used merely for playtesting.
AI agents are deployed in games for specific purposes. It would give a great deal of creative expression to game designers if the behaviour of powerful agents could be customized to suit their game design goals. Designers need to be provided with easily-used “knobs” which can tweak the behaviour of agents in meaningful ways.
The speakers highlight the work by Zhan et al. (2020). In it, they apply the concept of “labeling functions” from weak supervision to the problem of imitation learning. They use designer-provided labeling functions which classify a basketball player’s trajectory on court using three metrics and three styles within each metric (e.g., destination - close by, medium, far away). The idea is to take these style settings as additional input and use them to generate a trajectory. The labeling functions are applied to the generated trajectory to produce an error against the requested settings, which is backpropagated to update the policy.
Broadly, there needs to be more focus on providing easy-to-use tools for meaningfully changing the behaviour of trained RL agents while not having to significantly alter the agent itself (e.g., train a whole new agent). The speakers speculate that perhaps these control knobs requiring designer input offloads too much work to the designer, and consider the possibility of having more abstract control knobs.
The interaction loop between AI agents and game designers is hard to study. Agents may be deployed for applications like playtesting, but the combination of how designers interpret and influence the agents needs to be better observed.
The speakers present some preliminary efforts towards this by introducing a game environment geared towards studying automated game balancing. This is 0 A.D., an open-source real-time strategy game akin to Age of Empires that allows for writing custom bots to explore the balance of the game. Pitting bots against each other over many games will reveal effective strategies, which inform the designer of how they may wish to change the game.
The speakers conclude by mentioning the biggest opportunity of this research area as being able to do automated game design and playtesting. They mention the biggest challenge as doing player modeling in rich game environments.
The theme of this talk was how AI can be used to co-create with humans. The first half of the talk demonstrated this with two co-creative experiences developed by the speaker to improvise dance and theatre. The second half explored how AI co-creation could be applied to game design, with the speaker addressing the challenges in applying RL to it, and how one of his works addresses a particular challenge.
AI is already used as part of creative processes to -
It can be applied to games by -
The first possibility has been explored using two creative installations which use improv to create emergent dance and theatre.
LuminAI does partner dance. Based on a user’s dance moves, it will generate gestures that are contextual and appropriate to the human performer. A challenge in implementing it was the knowledge-authoring bottleneck of designing various gestures. The solution to it was to learn them from users, learn the sequencing of these actions, and use procedural generation to generate variants of these actions when none are available. The takeaway is to learn from users to avoid the knowledge-authoring bottleneck.
The Robot Improv Circus is a VR game where a participant plays the Props game - they are given a prop and must use it in a way that is surprising and humorous, drawing on the abstract shape of the prop and various shared knowledge between the performer and the audience.
The challenge with this is that the action-set is very open-ended. Selecting an action in a time-sensitive manner from this large pool is tricky even for humans. The speaker’s group solved this challenge by first creating a gesture dataset from humans interacting with a fixed set of props. They trained a conditional VAE to generate variants of these actions. They used computational models of creativity drawn from literature to evaluate each action’s novelty, unexpectedness and quality. They then selected actions based on how well they fit a predetermined creative arc through the dimension-space defined by the aforementioned metrics.
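One way to picture the last step is as nearest-neighbour selection in the metric space. The sketch below is purely illustrative (not the speaker's implementation): it assumes each candidate action has been scored on novelty, unexpectedness and quality (normalized to $[0,1]$), and picks the action closest to the point the predetermined creative arc prescribes at each moment. The action names and arc values are invented for the example.

```python
import math

def pick_action(candidates, target):
    """candidates: {action_name: (novelty, unexpectedness, quality)};
    target: the (n, u, q) point the creative arc prescribes right now.
    Returns the action whose metrics lie closest to the target."""
    return min(candidates, key=lambda a: math.dist(candidates[a], target))

# A hypothetical arc ramping from safe/expected toward novel/surprising.
arc = [(0.2, 0.1, 0.9), (0.5, 0.5, 0.7), (0.9, 0.8, 0.5)]

# Invented candidate gestures with assumed metric scores.
candidates = {
    "sip_prop_like_teacup":   (0.2, 0.1, 0.9),
    "balance_prop_on_head":   (0.6, 0.5, 0.6),
    "prop_becomes_spaceship": (0.9, 0.9, 0.4),
}

print([pick_action(candidates, t) for t in arc])
# -> ['sip_prop_like_teacup', 'balance_prop_on_head', 'prop_becomes_spaceship']
```

The actual system evaluates generated (not hand-listed) actions, but the "follow a target trajectory through metric space" selection logic is the same idea.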
The second half of the talk focused on how human-AI co-creation could be applied to game design and development. The speaker first addressed the challenges that real-world game AI developers face when using ML and RL techniques in their games. In the accompanying AIIDE 2020 paper, the speaker interviewed multiple such developers and thematically analyzed their survey responses. A subset of these challenges are -
The speaker explored the first challenge in work done by a mentee of theirs. They explored how a designer could provide style rewards to an agent to get it to obey a desired style in its behaviour. Issues with reward hacking were solved using potential-based reward shaping. This enabled expressive control over RL agents.
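For context, potential-based reward shaping adds $F(s, s') = \gamma\Phi(s') - \Phi(s)$ to the environment reward for a designer-chosen potential function $\Phi$; Ng, Harada and Russell (1999) showed this leaves the optimal policy unchanged, which is why it sidesteps reward hacking. A minimal sketch on a toy 1-D chain (my own illustration, not the mentee's implementation):

```python
# Potential-based reward shaping on a toy 1-D chain with goal state 10.
# The potential phi and gamma are illustrative choices, not from the talk.
GAMMA = 0.99

def phi(state: int) -> float:
    """Designer-chosen potential: higher (less negative) near the goal."""
    return -abs(10 - state)

def shaped_reward(state: int, next_state: int, env_reward: float) -> float:
    """Environment reward plus the shaping term gamma*phi(s') - phi(s)."""
    return env_reward + GAMMA * phi(next_state) - phi(state)

# Moving toward the goal earns a shaping bonus, moving away a penalty;
# under the discounted return the phi terms telescope, so the optimal
# policy is provably unchanged.
assert shaped_reward(3, 4, 0.0) > 0 > shaped_reward(4, 3, 0.0)
```

Because the shaping terms cancel along any trajectory's discounted return, the agent cannot "farm" the style reward by looping, which is the failure mode the talk described.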
The speaker concluded by presenting the biggest challenge of this area: doing more human-centered research into these techniques. The biggest opportunity is getting co-creation tools out of labs and into the hands of creators.
This is an (almost) hour-long video containing excellent advice on how to write an academic research paper. It mainly deals with how to structure the content of your paper, with an emphasis on keeping the reader in mind. I'll summarize the 7 key points from the talk, with a lengthier explanation following.
Don’t wait: write
Don’t follow the typical research process of idea > research > paper. Instead, do idea > paper > research. Let the process of writing the paper guide your research. Writing your ideas down crystallises them, forcing you to think more deeply about your idea and revealing possibilities you did not consider. It also creates a shareable artifact that you can use to invite collaboration and critique.
Identify your key idea
Imagine your paper to be a virus whose goal is to infect the mind of your reader with one key idea. This idea should be something useful and re-usable that stays with the reader after they’ve finished reading your paper.
Tell a story
Imagine explaining your research to a friend at a whiteboard. You would probably explain the problem first, describe why it is interesting and unsolved, talk about your idea and how it works (with details and data), then compare it with other approaches. Remember that you start losing readers from the first page onwards, so keep your paper engaging.
Nail your contributions
Ensure you focus on your contributions in the 1st page - list them in bullet points and make them refutable. Use forward references to link to where you justify the contributions later in the paper.
Related work: later
Related work can act as a barrier between your idea and the reader’s mind. It is better to include it at the end after you’ve already explained your idea simply. Use them earlier if they help explain your solution path better (e.g. if you’re building off of prior work), but don’t list them for the sake of adding alternative approaches or examples. Give credit freely (it is not a zero-sum game).
Put your readers first (examples)
Your problem is a labyrinth; in your research, you’ve encountered many dead ends before hitting on the solution. You need to lead your readers comfortably to the solution. Address obvious objections your reader might have. Eschew technical-sounding prose, since it will make your reader feel sleepy or stupid. A reader should be able to take something away from your paper even if they skip the details.
Listen to your readers
Get your paper read by as many friendly guinea pigs as possible. Explain carefully what you want (confusion is better feedback than spelling errors). Value first-time readers and make it easy for them to give you feedback. Reviews are gold dust; people are donating their time to write a review for you. Take criticism positively.
This was one of the best pieces of advice from the talk for me, and is something I’ve also slowly begun to see the benefit of. When writing my AML paper, I frequently found unjustified claims or methodological shortcomings that provided directions for literature review or software tests. Doing this right from the start of research is invaluable to getting your ideas down on paper, having a living document which you keep updating with your new insights and readings, and being able to have something to show for all your brainstorming.
Dr. Jones also advocates for submitting work in progress to appropriate publication venues. The main thrust behind this advice was to crystallise your ideas by the process of writing them down as a paper.
Dr. Jones advises that your paper should contain one single key idea that you want to communicate to your reader. He uses the analogy of the idea being a virus that your paper should infect your reader’s mind with. The main thrust behind this advice is to structure your paper around a single, central idea; one that is useful and re-usable, that will stick with readers after they’ve finished reading the paper.
He also encourages researchers not to be intimidated about the kind of ideas they feel they need to have to write a paper. You don’t need an idea you think is “brilliant” before you start writing; it is okay to write a paper about a “simple” idea. If, as part of writing the paper, you realize that the idea is indeed unviable, that’s okay. More often than not, you’ll find some complexity that you hadn’t foreseen, and end up with a decent bit of research. Write and communicate about your ideas no matter how insignificant they seem to you.
Dr. Jones asks us to visualize explaining the core idea of our paper to a friend at a whiteboard. One could imagine following the same process in the paper as well. He emphasizes the importance of hooking your readers from the first page itself, since the number of readers who will stick with your paper drops off significantly after the first page.
You should begin with introducing your problem and summarizing your contributions towards solving it. Dr. Jones advises against the common trope of explaining a much larger problem than the one you’re trying to solve. As an example, don’t waste words explaining work done in solving the (very general) problem of bugs in programs. Explain instead the problem of identifying and removing a specific type of bug.
Your contributions are what inform a reader as to whether they want to continue reading your paper. You need to advertise them clearly, in bullet points, with forward references to section numbers where you explain those contributions (or justify the claims you’ve made). Dr. Jones emphasises the need for these contributions to be refutable. For example, don’t say that you “described the WizWoz system”. Instead, say that you “give the syntax and semantics of a language that supports concurrent processes (Section 3)”. Remember that the first page is all the time and words you’ve got to get your reader hooked.
Related work is frequently used to pad the paper with the appearance of scholarliness by cramming into it as many examples of alternative approaches as possible. Frequently, these citations are very cursory, serving only to provide examples of something, and don’t really indicate that you’ve read the paper. It is better to first explain the core idea of your paper simply, and include the related work section at the end. Having too many citations in your prose distracts the reader from what you actually want to convey.
The objective of your paper should be to create a path for the reader from their current state of knowledge to your idea. References should be included if they make creating this path easier. For example, if your work builds off of an existing work in the field, it makes sense to use it to aid your explanations. Any citation should be accompanied with a value-judgment, explaining how this work relates to your paper (in a way that helps the reader appreciate your core idea).
Don’t include citations to make them look bad, and your work look good by contrast. Giving credit to others is not a zero-sum game where you lose face by giving due credit. Acknowledge inspirations for the approach you’re taking. Lastly, acknowledge weaknesses in your paper (towards the end).
Make sure to always keep your reader in mind when writing your paper, and make decisions while asking yourself what would make things more understandable for your readers. Imagine the problem you’re trying to solve as a labyrinth. In your research, you’ve traveled down various promising corridors, only to be met with obstacles or dead ends. After sustained effort, you’ve managed to hit upon the exit. Your job should be to guide the reader along a much shorter, simpler path directly to the exit.
To do so, ask yourself about potential objections or alternative, seemingly simpler approaches your reader might have and address these pre-emptively. Be careful with using impressive, technical-sounding paragraphs lest you make your reader feel sleepy or stupid. Remember that conveying the intuition behind your paper is primary, and that the reader should be able to take this intuition away even if they skip the details of your paper. Use examples.
In communicating the essence of your paper, don’t skip out on the details that make for reproducibility. A potential concern is simplifying your paper to such an extent that a reader begins to doubt the value of your contributions. Balance readability with showing your reader the failed solution pathways that indicate how the problem is more complex than it seems. You can use your own research or rely on the work of others for this.
You should get your paper read by as many “friendly guinea pigs” as possible, and as soon as possible. You want to also explain to them what you actually want as feedback - “I got confused by this paragraph” is much more valuable than “there is a spelling mistake here”.
Remember that any reader can only read your paper for the first time once, so getting valuable feedback from them is crucial. Make it as easy as possible for them to give you feedback, especially when they might be uncomfortable. Confusing paragraphs where the reader might feel stupid for not understanding things should be addressed.
Getting feedback from experts is very valuable. Share your work with them when you’re nearly done (i.e. you have a completed draft), and especially if you’ve cited them. Ask them if you’ve described their work fairly, and you’ll very likely receive very helpful responses. These experts are likely to be referees for conferences you might submit your work to anyway, so getting early feedback from them is a big plus.
Lastly, treat every piece of feedback you receive like gold dust. Be truly grateful for every bit of praise and criticism. Remember that readers and reviewers are donating their time on this Earth to you. The hour it took them to read your paper (or write a review) put them literally an hour closer to death. Respect their time and their feedback. Even if the feedback seems hostile or “stupid”, ask yourself how you could rewrite your paper so that this seemingly “stupid” reviewer can still understand it.
This work presents a method to automatically uncover user-interpretable strategies for solving a logic-based puzzle game called Nonograms. The method uses a DSL to describe pattern-condition-action rules which can be applied to solve states in the game. Sound rules in this format are uncovered using an SMT solver, and the rule sets are optimized for coverage and conciseness.
Nonograms is a popular deductive logic puzzle similar to Sudoku or Kakuro. It involves a grid with integer hints provided for each row and column. The goal is to fill in squares such that the lengths of the contiguous runs of filled squares in each row and column match that line’s hints. Solving involves examining the constraints on the blocks, both from the integer hints and from what has already been filled in, and deducing which blocks must be filled next.
The Luna Story series by Floralmong is an excellent mobile app for trying out Nonogram puzzles.
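The kind of line-level deduction these rules capture can be made concrete with a brute-force sketch (my own illustration, not the paper's SMT-based method): enumerate every valid filling of a single line and keep the cells that are filled in all of them. These are exactly the deductions a sound rule is allowed to make.

```python
def line_solutions(length, hints):
    """Yield every filling of a line (tuple of 0/1) whose runs of 1s
    match the hint sequence exactly. Brute force; fine for short lines."""
    def runs(cells):
        out, n = [], 0
        for c in cells:
            if c:
                n += 1
            elif n:
                out.append(n)
                n = 0
        if n:
            out.append(n)
        return out

    for bits in range(2 ** length):
        cells = tuple((bits >> i) & 1 for i in range(length))
        if runs(cells) == list(hints):
            yield cells

def forced_cells(length, hints):
    """Indices filled in *every* valid solution -- the cells any sound
    pattern-condition-action rule may deduce."""
    sols = list(line_solutions(length, hints))
    return [i for i in range(length) if all(s[i] for s in sols)]

# Classic "overlap" deduction: a single hint of 7 in a 10-cell line
# forces the middle four cells, wherever the run is placed.
print(forced_cells(10, [7]))  # -> [3, 4, 5, 6]
```

The paper's contribution is synthesizing compact, human-readable rules that reach these same conclusions without enumerating all solutions.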
The DSL is designed to be human-interpretable, and to encapsulate strategies for solving Nonograms in the form of a pattern-condition-action rule.
The pattern contains constructs that allow parts of a state (a line in a Nonogram level, its integer hints and currently filled/unfilled blocks) to be referenced later. It is basically a system to allow binding the state to the given pattern. It is designed to include Nonogram-state concepts like hints, blocks and gaps (sort of like state features). The bindings themselves have levels of generality.
The condition describes when the action described in the rule can be applied. The action as is designed in the paper only allows for filling in a block.
Finding interpretable rules in this DSL proceeds in 3 steps -
Given a training set consisting of Nonogram states $s$, an SMT solver is used (along with a specification of the Nonogram rules) to calculate the maximally filled state $t$ reachable from $s$. These transitions $\langle s, t \rangle$ are the output of this stage.
In this phase, an SMT solver is used to find a DSL program which covers the transition $\langle s, t \rangle$ and is sound with respect to the game rules. This means that the rule’s condition holds for the transition and the action obeys the game rules. Limitations of the SMT-based search, such as requiring a finite bound on program size, are addressed by iteratively increasing the program size during the search.
The generated rules are modified to make them more general and concise. The former is achieved by modifying the patterns and conditions of rules by brute-force enumeration. Patterns are modified by replacing bindings with their more general versions and checking if the rule is still sound. Conditions are modified by synthesizing a new program that covers strictly more states than the generated one.
The latter goal of conciseness is achieved by using a designer-provided cost function which provides a quantitative measure of the complexity of a rule. New rules are synthesized and kept if their cost is less than that of the current rule.
The rules obtained from rule synthesis are pruned by selecting a subset of $k$ rules which best cover the states in the training set. Here, coverage is measured by the total number of cells filled.
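A natural way to sketch this pruning step is greedy set cover over the cells each rule fills; this is my own stand-in, and the paper's exact selection procedure may differ. Rule names and coverage sets below are invented for the example.

```python
def select_rules(rule_coverage, k):
    """Greedily pick up to k rules maximizing distinct cells covered.
    rule_coverage maps rule name -> set of (state, cell) pairs that the
    rule fills across the training set."""
    chosen, covered = [], set()
    candidates = dict(rule_coverage)
    for _ in range(k):
        best = max(candidates, key=lambda r: len(candidates[r] - covered),
                   default=None)
        # Stop once no remaining rule adds new coverage.
        if best is None or not (candidates[best] - covered):
            break
        covered |= candidates.pop(best)
        chosen.append(best)
    return chosen, covered

coverage = {
    "overlap": {(0, 3), (0, 4), (0, 5)},
    "edge":    {(1, 0), (1, 1)},
    "glue":    {(0, 4), (0, 5)},  # strictly subsumed by "overlap"
}
rules, cells = select_rules(coverage, k=2)
print(rules)  # -> ['overlap', 'edge']
```

Note that "glue" is never picked even with a larger `k`, since everything it covers is already covered; measuring coverage in filled cells (as the paper does) makes such redundant rules drop out automatically.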
The test data is obtained from commercial Nonogram puzzle books and digital games. The training data is presumably obtained from a subset of this data with restricted line lengths. Crucially, all puzzles can be solved by considering one line at a time and don’t require any guesswork. Individual states were obtained from the process of solving each puzzle using the SMT solver.
The paper does not mention the cost function used.
The authors encoded control rules in the DSL using strategies sourced from puzzle books and strategy guides for Nonograms. These serve as a benchmark for evaluating the rules recovered by the system.
The system was able to recover 9 out of the 14 control rules. The authors hypothesize that with slight modifications, the system would be able to recover the missing rules as well. They note that the existing system recovered rules which covered many of the same states as the missing control rules.
The coverage of the control rule set and learned rule set are compared. Coverage is the number of cells covered by applying the rules in the set to the transitions in the test set. The learned rule set covers nearly $98\%$ of the transitions covered by both together.
This notion of coverage is not intuitive. I would think coverage would measure the proportion of transitions to which a rule set applies. The notion in the paper is a measure of how many cells are correctly covered using the rule set across a variety of states, and is more a measure of the effectiveness of a rule set.
The authors clarify their goal as not generating human-interpretable strategies, but generating strategies that humans are likely to use. Possible lines of investigation involve using player solution traces for cost estimation.
They discuss how their work ties into puzzle game generation tools.
Lastly, they discuss the applicability of this work to other puzzle games.
This is an excellent paper, one that I’ve read earlier, and was responsible for instigating my current project. The background material provided is excellently detailed, and helped me identify my current project as being situated in automated game analysis as well.
The domain used doesn’t seem to have a lot of potential for strategy, given that the authors restricted themselves to Nonogram puzzles which could be solved line-by-line. Extending it to Sudoku while preserving that constraint would be very restrictive, since most decent Sudoku puzzles require some amount of guesswork and backtracking and cannot be solved using only deduction. This would impact the availability of an oracle to generate transitions (solved states).
The state space representation is also very simple, since we only need a single line. However, using more sophisticated state spaces for other domains would necessitate a better DSL, which is a design task.
There is a lack of assessing the interpretability of the uncovered strategies. A robust user evaluation would help alleviate concerns. Overall, the evaluation of the uncovered rules is rather simplistic. It is possible that the rules don’t make very much semantic sense to a human, or aren’t as intuitive, despite the increased “coverage”. Providing some examples of the learned rules would help develop a qualitative understanding of the type of rules learned, and conducting the aforementioned human trial would add quantitative support.
This method involves generating a dataset of game states and the associated next best move, and finding rules (DSL programs) which map onto them. Finding the next best move might be simple in logic-based games like Nonograms and Sudoku where a solver can practically find the best squares to fill (even with guesswork), but this is not so clear in games like chess, where the best move is dependent on who’s making it (i.e., the strength of the engine). Perhaps we could in fact generate transitions using a particular engine and design a DSL to describe “chess strategies” and learn them from the training set. I believe the DSL design is going to be a major bottleneck.
This paper presents a technique to perform hierarchical reinforcement learning. It presents a method to train a policy to use previously learned skills to learn more complex skills. The skills can be interpreted in natural language. The authors use a custom Minecraft environment built using Project Malmo as their testbed. They conduct an ablation study using learning curves to demonstrate the impact of their algorithm choices. They compare their hierarchical policy with a “flat” policy to demonstrate its generalizability. They demonstrate its interpretability by displaying the hierarchical plan for several tasks in a tree-based format.
The method presented in the paper trains a policy which learns to solve tasks. Tasks (denoted by $g$) are “skills” which the agent can learn, and each has its associated reward function $R(s, g)$. In the paper, tasks are constrained to a specific template $\langle \text{skill}, \text{item} \rangle$ e.g., $\langle \text{find}, \text{blue} \rangle$ which denote object manipulation tasks. Each task can thus be represented in natural language.
The model is trained to solve tasks in a hierarchical fashion. It first learns to solve “base” level tasks using the environment action set. The task set is then augmented with “higher-order” tasks which can ostensibly be solved using the lower-order task policies, and a “higher-order” policy is trained on this new task set, which constrains it to use as “actions” the tasks from the lower-order policy. This process is repeated until all tasks have been learned. Figure 1 in the paper shows how the task “stack blue” is learned by composing the lower-order tasks “get blue”, “find blue” and “put blue”.
Concretely, the model defines the global policy of a level in the hierarchy (denoted by $\pi_k$, with $k$ being the level) using the following sub-policies,
The global policy $\pi_k(a_t|s_t,g)$ can thus be denoted compactly as -
\[e_t \sim \pi_k^{\text{sw}}(e_t|s_t,g) \tag{switch policy}\] \[g'_t \sim \pi_k^{\text{instr}} (g'_t|s_t,g) \tag{instruction policy}\] \[a_t \sim \pi_k(a_t|s_t,g) = \pi_{k-1}(a_t|s_t,g'_t)^{(1-e_t)}\,\pi_{k}^{\text{aug}}(a_t|s_t,g)^{e_t} \tag{global policy}\]The global policy is augmented with an STG (spatio-temporal grammar) to capture temporal relationships between tasks. If a particular order of task selection has been used in the past, the STG can provide it as a prior to the other policies. Concretely, the STG learns the distribution of an alternate Markov chain over the sequence of $\langle e_t, g'_t \rangle$ tuples conditioned on a specific task, giving us the transition probabilities $\rho_k(e_t, g'_t|e_{t-1}, g'_{t-1}, g)$ and the distribution of $\langle e_0, g'_0 \rangle$ as $q_k(e_0, g'_0|g)$. These are used to augment the switch and instruction policies of the respective level.
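The switch/instruction decomposition can be sketched in a few lines of toy code (my own illustration, not the paper's implementation; the policies here are random/hard-coded stand-ins for learned networks). At level $k$, the switch decides whether to act directly with the augmented flat policy ($e_t = 1$) or to delegate an instruction $g'$ to the level-$(k-1)$ policy ($e_t = 0$):

```python
import random

random.seed(0)  # determinism for the example

class HierarchicalPolicy:
    """Toy stand-in for pi_k: either acts with its own flat policy or
    delegates a sub-task instruction to the policy one level below."""

    def __init__(self, level, lower=None):
        self.level, self.lower = level, lower

    def switch(self, state, goal):
        # pi_k^sw: the base level has nothing to delegate to, so e_t = 1.
        return random.random() < 0.5 if self.lower else True

    def instruction(self, state, goal):
        # pi_k^instr: pick a previously learned sub-task, following the
        # <skill, item> template assumed in the paper.
        return ("find", goal[1])

    def augmented(self, state, goal):
        # pi_k^aug: flat policy over primitive environment actions
        # (action names invented for the sketch).
        return random.choice(["move", "turn", "attack", "use"])

    def act(self, state, goal):
        if self.switch(state, goal):              # e_t = 1: act directly
            return self.augmented(state, goal)
        g_prime = self.instruction(state, goal)   # e_t = 0: delegate g'
        return self.lower.act(state, g_prime)

base = HierarchicalPolicy(level=0)
top = HierarchicalPolicy(level=1, lower=base)
action = top.act(state={}, goal=("stack", "blue"))
print(action)  # always bottoms out in a primitive action
```

However deep the hierarchy, `act` always bottoms out in a primitive action, which is exactly what the exponent form of the global policy expresses: exactly one of the two factors is active at each step.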
Learning this policy proceeds in two phases. First, tasks are learned only by sampling tasks from the previous task set, ensuring that the global policy learns to connect tasks to previously learned tasks. In the next phase, it is allowed to sample the full action set, allowing it to discover new ways to complete tasks.
All policies are optimized using advantage actor-critic (A2C). The advantage functions used are provided in the paper. Other training details to stabilize learning and increase episode efficiency are also provided. The STG is trained using simple MLE.
The authors use a custom room in Minecraft built using the Malmo platform as their environment. They define the tasks as “find X”, “get X”, “put X” and “stack X” with 6 different block colours, amounting to 24 different tasks. Each task has its own reward function also defined.
The authors adopt a specific skill-acquisition order but stress that any alternate order would not invalidate the conclusions.
The authors perform an ablation study with different versions of their model without certain features and a baseline flat policy. They plot the learning curves for each of these models to compare average reward gained and convergence rates. They also demonstrate the efficacy of their 2-phase curriculum learning using this method.
The authors compare the performance of their policy vs. a flat policy trained on the first level of tasks (i.e., “get X”). They use two rooms with differing number of objects for this.
Lastly, the authors present a visualization of the plans of several tasks generated by their policy. They use it to claim interpretability.
The ablation study confirms that the algorithm and training choices do indeed provide higher reward compared to the flat policy and improve convergence rate. Their policy far outperforms the flat policy, demonstrating the generalization capability of the model.
The authors admit the reliance on “weak human supervision” to define the order of skills to be learned in each training stage, with future work involving how to increase the task set automatically.
This work is situated more in hierarchical RL than in XRL. The “explanations” or interpretability of the policy was very briefly touched on in the paper, and the bulk of the focus was more on demonstrating performance improvement. Regarding interpretability, it does seem like a good tool for showing the hierarchy of the learned policy. However, I question whether a domain will have tasks which neatly fit the $\langle \text{skill}, \text{item} \rangle$ template used here, or whether the tasks will have a sensible hierarchy among them. We might have the policy learn a hierarchy for tasks which make no sense to a human (which isn’t a bad thing, since it helps debug the system).
I’m not very familiar with the field of hierarchical RL. It seems like a major authoring burden to define a set of useful tasks, and an appropriate hierarchy of them for learning. The tasks need to be composed in terms of the base action sets, and, if we want interpretability, need to be translated for humans.
Regarding changing the order of the task sets, the authors simply mention -
One may also alter the order, and the main conclusions shall still hold.
I would have liked to see the effects of reversing the order used in the paper on training and performance.