# Blog

ProjectBlog: Students as Reinforcement Learning Agents Part 1

### Blog: Students as Reinforcement Learning Agents Part 1

What I witness trying to teach entry level university physics labs.

### Foreword

So I teach as a graduate student introductory physics labs to STEM majors which gives me an unique vintage point into students and how they mimic agents in Reinforcement Learning. Any behaviors that I describe will be partial due to the structure of the course, partial due to the students. What someone may see at a different institution could very drastically from the things that I witness. Now the way physics labs are set up follows something more inline of this article where students are asked to answer an overall question rather than follow some step-by-step instruction manual. This has lead to some very interesting behavior.

### Local minimums versus Global minimums

So this concept should be familiar to those who deal with optimization of any sort, that the agent gets stuck on a local minimum rather than finding the global minimum. Rather than continue searching for better options, the agent stays at the local minimum and therefor the suboptimal solution. An example I have of this is a lab where my students were supposed to construct a “clock” from a pendulum, i.e. they needed to manipulate the pendulum to achieve a certain measurable period. They could change, although I did not explicitly tell them these things, the starting angle, the length of the string, or the object of the pendulum. I did however tell them ahead of time that I expected from prior experience that a period of 0.5 to 2 seconds which comes from the theoretical equation based on small angles. So what did a majority of my students do? They created graphs of the change in angle versus observed period with angles from 0 to 90 degrees. Did they observe the pendulum’s period change? Yes. Was the possible periods that they collected within the range I had give? Also yes. However, the possible periods that they gave was only a small subset (think 1.5 to 1.75 seconds periods) of the range that I gave them. The students had narrowed down on a method of changing the period but they failed to explore the other options available to them and became stuck at a global minimum.

### More to come

I’ll keep going over things parallels that show up in the following weeks.

Source: Artificial Intelligence on Medium