Short summary 9.5.25
Here I tested SAV on the inverted pendulum balance task.
As the ET, I compared the reward distribution over a moving window with one that was sampled at the beginning.
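
A minimal sketch of such a trigger, to make the comparison concrete. These notes do not say which statistic was used, so the two-sample Kolmogorov-Smirnov distance here is an assumption, and `RewardShiftTrigger`, `window_size`, and `threshold` are hypothetical names and values, not the ones from the experiment.

```python
from bisect import bisect_right
from collections import deque


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of the two samples."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


class RewardShiftTrigger:
    """Fires when recent rewards drift away from a fixed reference sample.

    Assumption: the distance measure (KS) and the fixed threshold are
    illustrative choices, not taken from the original experiment.
    """

    def __init__(self, reference_rewards, window_size=100, threshold=0.3):
        self.reference = list(reference_rewards)  # sampled once at the beginning
        self.window = deque(maxlen=window_size)   # moving window of recent rewards
        self.threshold = threshold

    def update(self, reward):
        """Add one reward; return True if the window differs from the reference."""
        self.window.append(reward)
        if len(self.window) < self.window.maxlen:
            return False  # not enough recent samples to compare yet
        return ks_statistic(self.reference, self.window) > self.threshold


# Toy usage: constant reward 1.0, then the reward drops to 0.0.
trigger = RewardShiftTrigger(reference_rewards=[1.0] * 20, window_size=5, threshold=0.5)
flags = [trigger.update(r) for r in [1.0] * 5 + [0.0] * 5]
print(flags.index(True))  # index of the first update that fires
```

In this toy run the trigger fires once more than half of the window holds post-change rewards. A larger `window_size` smooths out noise but delays the trigger.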
-
Without retraining, performance was bad after the changes.
-
Not stopping training led to problems with collapsing results in the stationary areas (probably too few samples of bad actions).
-
The time to detect depended on the size of the change: larger change -> faster detection.
-
Relearning was triggered and resulted in bad behavior at the beginning of the training. No transfer was done, so this was expected.