The Sniff! - Lab¶
In this assignment we take on chemotactic exploration. We’ll compare two of our random agents, Levy and diffusion, with a gradient searcher that operates like E. coli (the simple model, anyway).
There are two sections. First we examine exploration for a single target with a variable scent in an open field. Second, we play with a maze.
Install and import needed modules¶
# Install explorationlib and gym-maze
!pip install --upgrade git+https://github.com/parenthetical-e/explorationlib
!pip install --upgrade git+https://github.com/MattChanTK/gym-maze.git
# Import misc
import shutil
import glob
import os
import copy
import sys
# Vis - 1
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Exp
from explorationlib.run import experiment
from explorationlib.util import select_exp
from explorationlib.util import load
from explorationlib.util import save
# Agents
from explorationlib.agent import DiffusionDiscrete
from explorationlib.agent import TruncatedLevyDiscrete
from explorationlib.agent import GradientDiffusionDiscrete
# Env
from explorationlib.local_gym import ScentGrid
from explorationlib.local_gym import ScentMazeEnv
from explorationlib.local_gym import create_grid_scent
from explorationlib.local_gym import create_maze_scent
# Vis - 2
from explorationlib.plot import plot_position2d
from explorationlib.plot import plot_length_hist
from explorationlib.plot import plot_length
from explorationlib.plot import plot_targets2d
from explorationlib.plot import plot_scent_grid
# Score
from explorationlib.score import total_reward
# Pretty plots
%matplotlib inline
%config InlineBackend.figure_format='retina'
%config IPCompleter.greedy=True
plt.rcParams["axes.facecolor"] = "white"
plt.rcParams["figure.facecolor"] = "white"
plt.rcParams["font.size"] = "16"
# Dev
%load_ext autoreload
%autoreload 2
Section 1 - singular scent¶
How much faster can smell get you there?
Background: the model of scent-following in our sniff agent (aka GradientDiffusionDiscrete) is as simple as can be.
When the scent gradient is positive, meaning you are going “up” the gradient, the probability of turning is set to p_pos.
When the gradient is negative, the turning probability is set to p_neg. (See the code below for an example.)
If the agent “decides” to turn, the new direction is chosen uniformly at random.
The length of travel before the next turn decision is sampled from an exponential distribution, just like in DiffusionDiscrete.
Note: in this lab the open field and the maze are defined on a discrete (integer) grid. In previous labs we worked with a continuous field.
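To make the rule concrete, here is a minimal sketch of that decision logic in plain NumPy. It is an illustration only, not explorationlib’s implementation; the names turn_probability and sniff_step are made up for this example.
import numpy as np
rng = np.random.default_rng(42)
def turn_probability(gradient, p_pos=0.5, p_neg=1.0):
    # Up the scent gradient: turn rarely (p_pos). Down the gradient: turn often (p_neg).
    return p_pos if gradient > 0 else p_neg
def sniff_step(gradient, num_actions=4, scale=2.0, p_pos=0.5, p_neg=1.0):
    # One turn decision of the simple chemotaxis rule described above (illustrative only)
    if rng.random() < turn_probability(gradient, p_pos, p_neg):
        action = rng.integers(num_actions)  # new heading, uniform random
    else:
        action = None  # keep the current heading
    length = rng.exponential(scale)  # travel length before the next decision
    return action, length
# A positive gradient keeps the agent on course more often than a negative one
print(sniff_step(gradient=+0.3))
print(sniff_step(gradient=-0.3))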
Question 1.1¶
Make a blind guess for how much better the sniffing agent will be. Will the other random agents ever come close? Answer this for both the open field, with a single target, and the maze, with its walls, barriers, and dead ends.
# Write your answer here, as a comment
The name of the env for this section is ScentGrid. As with adding targets, adding a scent is a separate step from creating the env. See the example code below.
Example - run 1 experiment and visualize some of its results.¶
# Experiment settings
num_experiments = 1
num_steps = 1000
p_neg = 1
p_pos = 0.5
scent_sigma = 10
# Env
detection_radius = 1
min_length = 1
max_length = 10
env = ScentGrid(mode="discrete")
boundary = (100, 100)
target = (5,5)
coord, scent = create_grid_scent(boundary, amplitude=1, sigma=scent_sigma)
env.add_scent(target, 1, coord, scent)
# Agents
diff = DiffusionDiscrete(min_length=min_length, scale=1)
levy2 = TruncatedLevyDiscrete(min_length=min_length, max_length=max_length, exponent=2)
sniff = GradientDiffusionDiscrete(num_actions=4, min_length=min_length, scale=2, p_neg=p_neg, p_pos=p_pos)
# Cleanup
for path in glob.glob("data/test4_*.pkl"):
    os.remove(path)
# !
levy2_exp = experiment(
f"data/test4_levy.pkl",
levy2,
env,
num_steps=num_steps,
num_experiments=num_experiments,
dump=False,
split_state=True,
)
diff_exp = experiment(
f"data/test4_diff.pkl",
diff,
env,
num_steps=num_steps,
num_experiments=num_experiments,
dump=False,
split_state=True,
)
sniff_exp = experiment(
f"data/test4_sniff.pkl",
sniff,
env,
num_steps=num_steps,
num_experiments=num_experiments,
dump=False,
split_state=True,
)
Plot the scent
Note: the axes are in matrix space, not grid space. Use this to get a sense of how high and wide the scent is.
plot_scent_grid(env)
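If you prefer numbers to a picture, you can also inspect the scent array returned by create_grid_scent directly (this assumes, as the plotting above suggests, that scent behaves like a 2D NumPy array):
# Quick numeric check of the scent field (illustrative; assumes a NumPy array)
scent_arr = np.asarray(scent)
print("scent shape:", scent_arr.shape)
print("peak scent:", scent_arr.max())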
Plot the walk (in grid space)
plot_boundary = (100, 100)
num_experiment = 0
ax = plot_position2d(
select_exp(levy2_exp, num_experiment),
boundary=plot_boundary,
label="Levy2",
color="purple",
alpha=0.6,
figsize=(3, 3),
)
ax = plot_position2d(
select_exp(diff_exp, num_experiment),
boundary=plot_boundary,
label="Diffusion",
color="brown",
alpha=0.6,
ax=ax,
)
ax = plot_position2d(
select_exp(sniff_exp, num_experiment),
boundary=plot_boundary,
label="Sniff",
color="green",
alpha=0.6,
ax=ax,
)
ax = plot_targets2d(
env,
boundary=plot_boundary,
color="black",
alpha=1,
label="Targets",
ax=ax,
)
Total reward
print(f'Levy - {np.sum(select_exp(levy2_exp, num_experiment)["exp_reward"])}')
print(f'Diff - {np.sum(select_exp(diff_exp, num_experiment)["exp_reward"])}')
print(f'Sniff - {np.sum(select_exp(sniff_exp, num_experiment)["exp_reward"])}')
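If you want the same tallies from the helper imported above, total_reward gives one score per experiment; it is used the same way in the analysis helper code at the end of this notebook.
# Same scores via the explorationlib.score helper (one value per experiment)
print("Levy -", total_reward(levy2_exp))
print("Diff -", total_reward(diff_exp))
print("Sniff -", total_reward(sniff_exp))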
Question 1.2¶
In the example above, p_pos was 0.5 and p_neg was 1.0. This means that even when the gradient was positive, the walker would change direction half the time anyway. It also means that every time the gradient was negative, the explorer would change direction.
Do these parameter choices seem optimal to you?
Make a best guess for how to improve them, if you think they can be improved. Explain your choice.
# Write your answer here, as a comment
Question 1.3¶
Test your hypothesis from Question 1.2. Use total reward as your metric, and the code above to get started.
Note: leave the scent_sigma parameter set to 10.
Were you right?
What is the best set of p_pos and p_neg values that you can find?
# Write your code here
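If a scaffold helps, here is one hedged starting point: loop over a few candidate (p_pos, p_neg) pairs, build a fresh sniff agent for each, and compare total reward. The pairs and the names candidates, sniff_i, exp_i, and results_13 are placeholders for this sketch, not an answer.
# Starter sketch: sweep a few (p_pos, p_neg) pairs (values are just guesses)
candidates = [(0.25, 1.0), (0.5, 1.0), (0.5, 0.75)]
results_13 = {}
for p_pos_i, p_neg_i in candidates:
    sniff_i = GradientDiffusionDiscrete(
        num_actions=4, min_length=min_length, scale=2,
        p_neg=p_neg_i, p_pos=p_pos_i
    )
    exp_i = experiment(
        f"data/test4_sniff_{p_pos_i}_{p_neg_i}.pkl",
        sniff_i,
        env,
        num_steps=num_steps,
        num_experiments=num_experiments,
        dump=False,
        split_state=True,
    )
    results_13[(p_pos_i, p_neg_i)] = np.sum(total_reward(exp_i))
print(results_13)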
# Write your answers here, as a comment
Question 1.4¶
In Question 1.3 we held scent_sigma fixed at 10. If we vary scent_sigma over (1, 2, 5, 10), do you think this will change your best set of p_pos and p_neg?
Guess first, then test.
# Write your answers here, as a comment
# Write your code here
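One hedged way to structure the test: rebuild the ScentGrid for each sigma, then reuse your Question 1.3 sweep inside the loop. The skeleton below only rebuilds the env (the names sigma_i, env_i, coord_i, and scent_i are made up); the inner sweep is left to you.
# Starter sketch: rebuild the scent for each sigma; add your p_pos/p_neg sweep inside
for sigma_i in (1, 2, 5, 10):
    env_i = ScentGrid(mode="discrete")
    coord_i, scent_i = create_grid_scent(boundary, amplitude=1, sigma=sigma_i)
    env_i.add_scent(target, 1, coord_i, scent_i)
    # ... run your Question 1.3 sweep against env_i here ...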
Question 1.5¶
Was your hypothesis in Question 1.4 right?
What are the best sets of p_pos and p_neg you can find for the four scent_sigma values in Question 1.4?
# Write your answers here, as a comment
Section 2 - a-maze-zing¶
I have modified an existing maze env to have a scent. Here is an example of it, as a gif.
The agent starts at the top (red), and tries to find the exit (blue) in the bottom right. In our version we can add a scent field to the exit. This can help our sniffer solve the maze more quickly, or at least that is what we guess should happen in principle.
Example - one maze experiment¶
# Experiment settings
num_experiments = 1
num_steps = 50000
p_neg = 1
p_pos = 0.5
scent_sigma = 5
# Env
detection_radius = 1
min_length = 1
max_length = 10
# Env
boundary = (10, 10)
env = ScentMazeEnv(maze_size=boundary)
coord, scent = create_maze_scent(boundary, amplitude=1, sigma=scent_sigma)
env.add_scent(scent)
# Agents
diff = DiffusionDiscrete(num_actions=4, min_length=min_length, scale=1)
levy2 = TruncatedLevyDiscrete(num_actions=4, min_length=min_length, max_length=max_length, exponent=2)
sniff = GradientDiffusionDiscrete(num_actions=4, min_length=min_length, scale=2, p_neg=p_neg, p_pos=p_pos)
# Cleanup
for path in glob.glob("data/test4_*.pkl"):
    os.remove(path)
# !
levy2_exp = experiment(
f"data/test4_levy.pkl",
levy2,
env,
num_steps=num_steps,
num_experiments=num_experiments,
dump=False,
split_state=True,
)
diff_exp = experiment(
f"data/test4_diff.pkl",
diff,
env,
num_steps=num_steps,
num_experiments=num_experiments,
dump=False,
split_state=True,
)
sniff_exp = experiment(
f"data/test4_sniff.pkl",
sniff,
env,
num_steps=num_steps,
num_experiments=num_experiments,
dump=False,
split_state=True,
)
Plot the experiment.
Note: unfortunately our standard plotting methods “flip” the axes when displaying the maze, so it looks like we begin in the middle and finish at the top right. Don’t let this distract you. Nothing important has changed.
plot_boundary = (10, 10)
num_experiment = 0
ax = plot_position2d(
select_exp(levy2_exp, num_experiment),
boundary=plot_boundary,
label="Levy",
color="purple",
alpha=0.6,
figsize=(3, 3),
)
ax = plot_position2d(
select_exp(diff_exp, num_experiment),
boundary=plot_boundary,
label="Diff",
color="brown",
alpha=0.6,
ax=ax,
)
ax = plot_position2d(
select_exp(sniff_exp, num_experiment),
boundary=plot_boundary,
label="Sniff",
color="green",
alpha=0.6,
ax=ax,
)
Total reward
Note: positive values are good; negative values are bad. If you can’t get positive values, try increasing num_steps.
print(f'Levy - {np.sum(select_exp(levy2_exp, num_experiment)["exp_reward"])}')
print(f'Diff - {np.sum(select_exp(diff_exp, num_experiment)["exp_reward"])}')
print(f'Sniff - {np.sum(select_exp(sniff_exp, num_experiment)["exp_reward"])}')
Question 2.1¶
If we set scent_sigma to 5 for the maze, do you think the best p_pos and p_neg that you found in Question 1.5 will also be best for the maze task?
Why?
# Write your answers here, as a comment
Question 2.2¶
Do you think that any values of p_pos and p_neg will cause the sniffer to outperform the other two explorers (Levy and Diffusion)? Explain your answer.
# Write your answers here, as a comment
Question 2.3¶
To find an approximate answer to Question 2.2, run 100 experiments with three sets of p_pos and p_neg values. But first, explain your choice for each.
# Write your answers here, as a comment
# Write your code here
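If a scaffold helps, the hedged sketch below re-runs the maze comparison at num_experiments=100 for one (p_pos, p_neg) pair; repeat it for each of your three sets, then reuse the analysis helper code that follows (it expects results named levy2_exp, diff_exp, and sniff_exp). The pair shown is a placeholder, not a recommendation.
# Starter sketch: 100 maze experiments for one (p_pos, p_neg) set; repeat for your other sets
num_experiments = 100
p_pos, p_neg = 0.5, 1.0  # placeholder values; substitute your own
sniff = GradientDiffusionDiscrete(
    num_actions=4, min_length=min_length, scale=2, p_neg=p_neg, p_pos=p_pos
)
levy2_exp = experiment(
    "data/test4_levy.pkl", levy2, env,
    num_steps=num_steps, num_experiments=num_experiments,
    dump=False, split_state=True,
)
diff_exp = experiment(
    "data/test4_diff.pkl", diff, env,
    num_steps=num_steps, num_experiments=num_experiments,
    dump=False, split_state=True,
)
sniff_exp = experiment(
    "data/test4_sniff.pkl", sniff, env,
    num_steps=num_steps, num_experiments=num_experiments,
    dump=False, split_state=True,
)
# Now run the analysis helper code below on these results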
# Analysis helper code:
# Results, names, and colors
results = [levy2_exp, diff_exp, sniff_exp]
names = ["Levy", "Diff", "Sniff"]
colors = ["purple", "brown", "green"]
# Score by total reward
scores = []
for name, res, color in zip(names, results, colors):
    r = total_reward(res)
    scores.append(r)
# Dists
for (name, s, c) in zip(names, scores, colors):
    plt.hist(s, label=name, color=c, alpha=0.5, bins=20)
plt.legend()
plt.xlabel("Score")
plt.tight_layout()
sns.despine()
# Tabulate
m, sd = [], []
for (name, s, c) in zip(names, scores, colors):
    m.append(np.mean(s))
    sd.append(np.std(s))
# Plot means
fig = plt.figure(figsize=(3, 3))
plt.bar(names, m, yerr=sd, color="black", alpha=0.6)
plt.ylabel("Score")
plt.tight_layout()
sns.despine()