Blog: Students as Reinforcement Learning Agents Part 1
What I witness trying to teach entry level university physics labs.
So I teach as a graduate student introductory physics labs to STEM majors which gives me an unique vintage point into students and how they mimic agents in Reinforcement Learning. Any behaviors that I describe will be partial due to the structure of the course, partial due to the students. What someone may see at a different institution could very drastically from the things that I witness. Now the way physics labs are set up follows something more inline of this article where students are asked to answer an overall question rather than follow some step-by-step instruction manual. This has lead to some very interesting behavior.
Local minimums versus Global minimums
So this concept should be familiar to those who deal with optimization of any sort, that the agent gets stuck on a local minimum rather than finding the global minimum. Rather than continue searching for better options, the agent stays at the local minimum and therefor the suboptimal solution. An example I have of this is a lab where my students were supposed to construct a “clock” from a pendulum, i.e. they needed to manipulate the pendulum to achieve a certain measurable period. They could change, although I did not explicitly tell them these things, the starting angle, the length of the string, or the object of the pendulum. I did however tell them ahead of time that I expected from prior experience that a period of 0.5 to 2 seconds which comes from the theoretical equation based on small angles. So what did a majority of my students do? They created graphs of the change in angle versus observed period with angles from 0 to 90 degrees. Did they observe the pendulum’s period change? Yes. Was the possible periods that they collected within the range I had give? Also yes. However, the possible periods that they gave was only a small subset (think 1.5 to 1.75 seconds periods) of the range that I gave them. The students had narrowed down on a method of changing the period but they failed to explore the other options available to them and became stuck at a global minimum.
More to come
I’ll keep going over things parallels that show up in the following weeks.