The main idea of my project is that training the MARS stream with two streams lets MARS capture spatio-temporal information. The initial fusion method between the RGB stream