Bio-Inspired Neural Networks for OMR & Rheotaxis
EPFL Master’s thesis
At a glance
This thesis explores how bio-inspired artificial neural networks can be used to improve the behavior of a neuromechanical zebrafish simulation (simZFish), with a focus on two tightly coupled behaviors:
- Optomotor Response (OMR): visually evoked turning and forward swimming that stabilize the fish's position relative to moving visual scenes.
- Rheotaxis: station-keeping in flowing water (tendency to orient and swim against the current).
The starting point is simZFish, a highly realistic and biologically grounded control architecture, but one that relies heavily on hand-tuning many neural parameters. This work proposes learning-based alternatives that preserve biological structure while drastically improving scalability, reproducibility, and adaptability.
Why this matters: From “hand-tuned circuits” to scalable discovery tools
Hand-tuned neural circuits can reproduce animal behavior with impressive fidelity, but they come with practical limitations:
- Tuning is time-consuming and becomes impractical as networks grow;
- It is hard to explore hypotheses that require systematic parameter sweeps;
- Not all neuron activations are known from biology, so some parameter choices are necessarily arbitrary.
This thesis treats simZFish as an experimental platform where we can:
1) Keep the biological interpretability of structured circuits, while
2) Introduce learning frameworks that can fill unknown gaps and optimize behavior under controlled conditions.
The platform: simZFish as an embodied brain–body–environment system
SimZFish integrates a complete chain from perception to actuation:
- Visual sensing (camera-based retina model),
- Motion-direction processing (direction-selective stages),
- Pretectal and hindbrain representations that separate turning/forward commands,
- Central Pattern Generators (CPGs) and motor pathways to drive tail motion,
- Embodiment and hydrodynamics in a river-like environment.
This matters because OMR and rheotaxis are not merely the outputs of a controller: they emerge from the interaction of neural dynamics, body mechanics, and the environment.
Two complementary learning routes
1) Supervised learning: learn bio-inspired networks from activation targets
The supervised-learning framework focuses on learning a specific sub-circuit that maps pretectal activity to motor-command neurons:
- Input: early pretectum (ePT) neurons
- Hidden: late pretectum (lPT) neurons
- Output: nMLF and aHB neurons (brainstem / motor-command related)
Two architectures are trained to mirror two hand-designed variants:
- Model 1: 8 → 16 → 4 (corresponding to simZFish 1.0)
- Model 2: 8 → 24 → 4 (corresponding to simZFish 2.0)
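The two architectures above are small feedforward networks, so their shape is easy to sketch. The following minimal numpy version (sigmoid units as a stand-in for rate-like activations; the activation choice and initialization are illustrative assumptions, only the 8 → 16/24 → 4 layer sizes come from the thesis) shows the structure:

```python
import numpy as np

def make_mlp(sizes, seed=0):
    """Initialize (weight, bias) pairs for a feedforward net of given layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with sigmoid units, giving rate-like activations in (0, 1)."""
    for W, b in params:
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    return x

# Model 1 (simZFish 1.0): 8 ePT inputs -> 16 lPT hidden -> 4 nMLF/aHB outputs
model1 = make_mlp([8, 16, 4])
# Model 2 (simZFish 2.0): 8 ePT inputs -> 24 lPT hidden -> 4 nMLF/aHB outputs
model2 = make_mlp([8, 24, 4])

out = forward(model1, np.ones(8))  # 4 motor-command activations
```

Keeping the hidden layer the same width as its hand-designed counterpart is what lets learned hidden units be compared neuron-for-neuron with the reference circuit.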
Training data are generated by presenting diverse moving-stripe stimuli and collecting input/output activations from a reference controller. Two training strategies are considered:
- Output supervision only: constrain the output neurons.
- Hidden + output supervision: additionally constrain hidden-layer activations (stronger biological anchoring).
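The two supervision strategies differ only in which activations enter the loss. A hedged sketch (plain MSE and the `hidden_weight` knob are illustrative assumptions, not the thesis's exact objective):

```python
import numpy as np

def supervision_loss(pred_hidden, pred_out, target_hidden, target_out,
                     hidden_weight=1.0):
    """MSE on output activations; optionally add MSE on hidden activations.

    hidden_weight=0 (or target_hidden=None) gives output-only supervision;
    hidden_weight>0 additionally anchors the hidden layer to reference
    lPT activations, enforcing stronger biological correspondence.
    """
    loss = np.mean((pred_out - target_out) ** 2)
    if hidden_weight > 0 and target_hidden is not None:
        loss += hidden_weight * np.mean((pred_hidden - target_hidden) ** 2)
    return loss
```

With output-only supervision the hidden layer is free to find any internal code; with the extra term it is pulled toward the reference circuit's lPT activity.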
Main takeaway: supervised learning reproduces target activations very accurately and yields stable solutions, but it cannot outperform the reference when the dataset itself is derived from that reference: at best it approximates what it was trained on.
2) Reinforcement learning: optimize behavior directly (with modular priors)
The reinforcement-learning framework is designed to be modular: the user can vary what the agent “knows” (observation space) and what it can control (action space), which directly corresponds to injecting different degrees of biological realism.
Algorithms: PPO and SAC (implemented in a Webots-based training loop).
Observation spaces (examples):
- Idealized baseline: position, heading, velocity (easy but less biological)
- Biologically grounded: direction-selective (DS) signals in early pretectum (harder but closer to sensorimotor processing)
Action spaces (examples):
- Bias/drive motor-command pathways (e.g., nMLF/aHB-like control signals)
- Adjust CPG parameters to modulate gaits
- Directly control tail motors (a more fine-grained but harder control problem)
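The modularity described above can be realized as a registry of interchangeable observation encoders, each mapping simulator state to an agent observation. A minimal sketch (the `ModuleSpec` structure, function names, and state-dict keys are hypothetical illustrations, not the thesis's API):

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class ModuleSpec:
    """One pluggable observation module: its dimensionality and encoder."""
    dim: int
    fn: Callable[[dict], np.ndarray]

def obs_idealized(state):
    # Idealized baseline: position, heading, forward velocity.
    return np.array([state["x"], state["y"], state["heading"], state["v"]])

def obs_ds_pretectum(state):
    # Biologically grounded: direction-selective (ePT-like) channels only.
    return np.asarray(state["ds_channels"], dtype=float)

OBS_SPACES = {
    "idealized": ModuleSpec(4, obs_idealized),
    "ds_pretectum": ModuleSpec(8, obs_ds_pretectum),
}
```

Action spaces can be registered the same way (motor-command biases, CPG parameters, or raw tail torques), so one experiment configuration names an (observation, action) pair and thereby fixes the degree of biological realism.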
Reward shaping: encourages station-keeping (stay near a target position) and alignment against flow (stable upstream heading), with smooth penalties to support learning.
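A shaped reward of this kind combines a smooth station-keeping term with a smooth upstream-alignment term. One possible form (the exponential/cosine shaping and weights are illustrative assumptions, not the thesis's exact reward):

```python
import numpy as np

def rheotaxis_reward(pos, target, heading, flow_dir, w_pos=1.0, w_align=1.0):
    """Smooth shaped reward: stay near the target, face upstream.

    heading and flow_dir are angles in radians; upstream is opposite
    to the flow direction. Both terms are bounded in [0, 1] and
    differentiable in the state, which eases RL credit assignment.
    """
    dist = np.linalg.norm(np.asarray(pos) - np.asarray(target))
    station = np.exp(-dist)                            # 1 at target, decays smoothly
    upstream = flow_dir + np.pi                        # desired heading
    align = 0.5 * (1.0 + np.cos(heading - upstream))   # 1 when facing upstream
    return w_pos * station + w_align * align
```

Smooth, bounded terms like these avoid the sparse-reward problem: the agent receives a gradient-like signal even far from the target, instead of a single success/failure bonus.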
Main takeaway: RL can produce controllers that achieve more convincing rheotaxis/OMR behaviors than hand-tuned baselines. Furthermore, it can generalize across flow conditions, especially when the action space allows fine motor control and the observations provide sufficient state information.
Experimental evaluation: OMR + rheotaxis in a simulated river
To validate learned networks behaviorally (not just neuron-wise), the thesis defines a river-based evaluation suite inspired by prior simZFish experiments:
- Turning OMR: initialize the fish with different headings and measure whether it turns to align appropriately.
- Turning OMR (multi-trial): repeat for two headings across multiple trials to assess robustness.
- Forward OMR + rheotaxis (varying flow): initialize upstream and test station-keeping under different water speeds.
- Forward OMR + rheotaxis (multi-trial): robustness under repeated trials at a fixed flow.
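The multi-trial protocols above reduce to a simple loop: run episodes from each initial heading and score the fraction that end aligned. A sketch of that harness (the `run_trial` callback, success tolerance, and target heading of 0 are hypothetical stand-ins for one Webots episode and its success criterion):

```python
import numpy as np

def heading_error(heading, target_heading):
    """Wrapped absolute angular error in [0, pi]."""
    d = (heading - target_heading + np.pi) % (2.0 * np.pi) - np.pi
    return abs(d)

def evaluate_turning_omr(run_trial, init_headings, n_trials=10, tol=0.2):
    """Success rate per initial heading: fraction of trials ending aligned.

    run_trial(h0) -> final heading after one simulated episode.
    """
    results = {}
    for h0 in init_headings:
        successes = sum(
            heading_error(run_trial(h0), 0.0) < tol for _ in range(n_trials))
        results[h0] = successes / n_trials
    return results
```

Because the harness only sees a `run_trial` callback, the same protocol applies unchanged to hand-tuned, supervised, and RL-trained controllers.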
These experiments allow controlled comparisons between:
- Hand-tuned controllers,
- Supervised-trained controllers (activation-matching),
- RL-trained controllers (behavior-optimized).
Key outcomes and lessons learned
- Supervised learning is excellent for reproducing known activation patterns and for building interpretable, biologically constrained surrogates. However, its ceiling is limited by the dataset source.
- Reinforcement learning is the most powerful tool for behavioral optimization because it avoids requiring labeled neural activation datasets and can adapt to new conditions.
- Modularity matters: swapping observation/action spaces provides a principled way to encode biological assumptions and ask “what minimal signals/control pathways are sufficient for the behavior?”
- Practical insight: baseline rhythmicity (CPGs) can dramatically simplify learning when sensory signals do not directly reveal global state, while direct motor control can be superior when state information is explicit.
Acknowledgements
Source of all images: “Enhancing Robotic Behavior through Bio-Inspired Artificial Neural Networks: A Study of Optomotor Response and Rheotaxis in Robotic Fish”, Luca Zunino