Bio-Inspired Neural Networks for OMR & Rheotaxis
EPFL Master’s thesis
At a glance
This thesis explores how bio-inspired artificial neural networks can be used to improve the behavior of a neuromechanical zebrafish simulation (simZFish), with a focus on two tightly coupled behaviors:
- Optomotor Response (OMR): visually evoked turning and forward swimming that stabilize the fish's position relative to moving visual scenes.
- Rheotaxis: station-keeping in flowing water (tendency to orient and swim against the current).
The starting point is simZFish, a highly realistic and biologically grounded control architecture, but one that relies heavily on hand-tuning many neural parameters. This work proposes learning-based alternatives that preserve biological structure while drastically improving scalability, reproducibility, and adaptability.
Why this matters: From “hand-tuned circuits” to scalable discovery tools
Hand-tuned neural circuits can reproduce animal behavior with impressive fidelity, but they come with practical limitations:
- Tuning is time-consuming and becomes impractical as networks grow;
- It is hard to explore hypotheses that require systematic parameter sweeps;
- Not all neuron activations are known from biology, so some parameter choices are necessarily arbitrary.
This thesis treats simZFish as an experimental platform where we can:
1) Keep the biological interpretability of structured circuits, while
2) Introduce learning frameworks that can fill unknown gaps and optimize behavior under controlled conditions.
The platform: simZFish as an embodied brain–body–environment system
SimZFish integrates a complete chain from perception to actuation:
- Visual sensing (camera-based retina model),
- Motion-direction processing (direction-selective stages),
- Pretectal and hindbrain representations that separate turning/forward commands,
- Central Pattern Generators (CPGs) and motor pathways to drive tail motion,
- Embodiment and hydrodynamics in a river-like environment.
This matters because OMR and rheotaxis are not merely the outputs of a controller: they emerge from the interaction of neural dynamics, body mechanics, and the environment.
Two complementary learning routes
1) Supervised learning: learn bio-inspired networks from activation targets
The supervised-learning framework focuses on learning a specific sub-circuit that maps pretectal activity to motor-command neurons:
- Input: early pretectum (ePT) neurons
- Hidden: late pretectum (lPT) neurons
- Output: nMLF and aHB neurons (brainstem / motor-command related)
Two architectures are trained to mirror two hand-designed variants:
- Model 1: 8 → 16 → 4 (corresponding to simZFish 1.0)
- Model 2: 8 → 24 → 4 (corresponding to simZFish 2.0)
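The two architectures above are small feedforward networks, so their shape is easy to sketch. The following minimal numpy version (sigmoid units as a stand-in for rate-like activations; the activation choice and initialization are illustrative assumptions, only the 8 → 16/24 → 4 layer sizes come from the thesis) shows the structure:

```python
import numpy as np

def make_mlp(sizes, seed=0):
    """Initialize (weight, bias) pairs for a feedforward net of given layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with sigmoid units, giving rate-like activations in (0, 1)."""
    for W, b in params:
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    return x

# Model 1 (simZFish 1.0): 8 ePT inputs -> 16 lPT hidden -> 4 nMLF/aHB outputs
model1 = make_mlp([8, 16, 4])
# Model 2 (simZFish 2.0): 8 ePT inputs -> 24 lPT hidden -> 4 nMLF/aHB outputs
model2 = make_mlp([8, 24, 4])

out = forward(model1, np.ones(8))  # 4 motor-command activations
```

Keeping the hidden layer the same width as its hand-designed counterpart is what lets learned hidden units be compared neuron-for-neuron with the reference circuit.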
Training data are generated by presenting diverse moving-stripe stimuli and collecting input/output activations from a reference controller. Two training strategies are considered:
- Output supervision only: constrain the output neurons.
- Hidden + output supervision: additionally constrain hidden-layer activations (stronger biological anchoring).
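The two supervision strategies differ only in which activations enter the loss. A hedged sketch (plain MSE and the `hidden_weight` knob are illustrative assumptions, not the thesis's exact objective):

```python
import numpy as np

def supervision_loss(pred_hidden, pred_out, target_hidden, target_out,
                     hidden_weight=1.0):
    """MSE on output activations; optionally add MSE on hidden activations.

    hidden_weight=0 (or target_hidden=None) gives output-only supervision;
    hidden_weight>0 additionally anchors the hidden layer to reference
    lPT activations, enforcing stronger biological correspondence.
    """
    loss = np.mean((pred_out - target_out) ** 2)
    if hidden_weight > 0 and target_hidden is not None:
        loss += hidden_weight * np.mean((pred_hidden - target_hidden) ** 2)
    return loss
```

With output-only supervision the hidden layer is free to find any internal code; with the extra term it is pulled toward the reference circuit's lPT activity.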
Main takeaway: supervised learning reproduces target activations very accurately and yields stable solutions, but it cannot outperform the reference when the dataset itself is derived from that reference: at best it approximates what it was trained on.
2) Reinforcement learning: optimize behavior directly (with modular priors)
The reinforcement-learning framework is designed to be modular: the user can vary what the agent “knows” (observation space) and what it can control (action space), which directly corresponds to injecting different degrees of biological realism.
Algorithms: PPO and SAC (implemented in a Webots-based training loop).
Observation spaces (examples):
- Idealized baseline: position, heading, velocity (easy but less biological)
- Biologically grounded: direction-selective (DS) signals in early pretectum (harder but closer to sensorimotor processing)
Action spaces (examples):
- Bias/drive motor-command pathways (e.g., nMLF/aHB-like control signals)
- Adjust CPG parameters to modulate gaits
- Directly control tail motors (a more fine-grained but harder control problem)
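The modularity described above can be realized as a registry of interchangeable observation encoders, each mapping simulator state to an agent observation. A minimal sketch (the `ModuleSpec` structure, function names, and state-dict keys are hypothetical illustrations, not the thesis's API):

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class ModuleSpec:
    """One pluggable observation module: its dimensionality and encoder."""
    dim: int
    fn: Callable[[dict], np.ndarray]

def obs_idealized(state):
    # Idealized baseline: position, heading, forward velocity.
    return np.array([state["x"], state["y"], state["heading"], state["v"]])

def obs_ds_pretectum(state):
    # Biologically grounded: direction-selective (ePT-like) channels only.
    return np.asarray(state["ds_channels"], dtype=float)

OBS_SPACES = {
    "idealized": ModuleSpec(4, obs_idealized),
    "ds_pretectum": ModuleSpec(8, obs_ds_pretectum),
}
```

Action spaces can be registered the same way (motor-command biases, CPG parameters, or raw tail torques), so one experiment configuration names an (observation, action) pair and thereby fixes the degree of biological realism.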
Reward shaping: encourages station-keeping (stay near a target position) and alignment against flow (stable upstream heading), with smooth penalties to support learning.
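A shaped reward of this kind combines a smooth station-keeping term with a smooth upstream-alignment term. One possible form (the exponential/cosine shaping and weights are illustrative assumptions, not the thesis's exact reward):

```python
import numpy as np

def rheotaxis_reward(pos, target, heading, flow_dir, w_pos=1.0, w_align=1.0):
    """Smooth shaped reward: stay near the target, face upstream.

    heading and flow_dir are angles in radians; upstream is opposite
    to the flow direction. Both terms are bounded in [0, 1] and
    differentiable in the state, which eases RL credit assignment.
    """
    dist = np.linalg.norm(np.asarray(pos) - np.asarray(target))
    station = np.exp(-dist)                            # 1 at target, decays smoothly
    upstream = flow_dir + np.pi                        # desired heading
    align = 0.5 * (1.0 + np.cos(heading - upstream))   # 1 when facing upstream
    return w_pos * station + w_align * align
```

Smooth, bounded terms like these avoid the sparse-reward problem: the agent receives a gradient-like signal even far from the target, instead of a single success/failure bonus.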
Main takeaway: RL can produce controllers that achieve more convincing rheotaxis/OMR behaviors than hand-tuned baselines. Furthermore, it can generalize across flow conditions, especially when the action space allows fine motor control and the observations provide sufficient state information.
Experimental evaluation: OMR + rheotaxis in a simulated river
To validate learned networks behaviorally (not just neuron-wise), the thesis defines a river-based evaluation suite inspired by prior simZFish experiments:
- Turning OMR: initialize the fish with different headings and measure whether it turns to align appropriately.
- Turning OMR (multi-trial): repeat for two headings across multiple trials to assess robustness.
- Forward OMR + rheotaxis (varying flow): initialize upstream and test station-keeping under different water speeds.
- Forward OMR + rheotaxis (multi-trial): robustness under repeated trials at a fixed flow.
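The multi-trial protocols above reduce to a simple loop: run episodes from each initial heading and score the fraction that end aligned. A sketch of that harness (the `run_trial` callback, success tolerance, and target heading of 0 are hypothetical stand-ins for one Webots episode and its success criterion):

```python
import numpy as np

def heading_error(heading, target_heading):
    """Wrapped absolute angular error in [0, pi]."""
    d = (heading - target_heading + np.pi) % (2.0 * np.pi) - np.pi
    return abs(d)

def evaluate_turning_omr(run_trial, init_headings, n_trials=10, tol=0.2):
    """Success rate per initial heading: fraction of trials ending aligned.

    run_trial(h0) -> final heading after one simulated episode.
    """
    results = {}
    for h0 in init_headings:
        successes = sum(
            heading_error(run_trial(h0), 0.0) < tol for _ in range(n_trials))
        results[h0] = successes / n_trials
    return results
```

Because the harness only sees a `run_trial` callback, the same protocol applies unchanged to hand-tuned, supervised, and RL-trained controllers.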
These experiments allow controlled comparisons between:
- Hand-tuned controllers,
- Supervised-trained controllers (activation-matching),
- RL-trained controllers (behavior-optimized).
Key outcomes and lessons learned
- Supervised learning is excellent for reproducing known activation patterns and for building interpretable, biologically constrained surrogates. However, its ceiling is limited by the dataset source.
- Reinforcement learning is the most powerful tool for behavioral optimization because it avoids requiring labeled neural activation datasets and can adapt to new conditions.
- Modularity matters: swapping observation/action spaces provides a principled way to encode biological assumptions and ask “what minimal signals/control pathways are sufficient for the behavior?”
- Practical insight: baseline rhythmicity (CPGs) can dramatically simplify learning when sensory signals do not directly reveal global state, while direct motor control can be superior when state information is explicit.
Acknowledgements
Source of all images: “Enhancing Robotic Behavior through Bio-Inspired Artificial Neural Networks: A Study of Optomotor Response and Rheotaxis in Robotic Fish”, Luca Zunino