Blog: Students as Reinforcement Learning Agents Part 2
How students respond to grading and randomness.
Exploratory Physics Labs
The labs that I supervise as a critic are fairly open ended. They extend over a couple of weeks and generally involve a test such as constructing a clock with a given period from a pendulum. Students have ample time to play with the system to construct models, and ideally they would try a couple of models. I am supposed to grade two distinct parts: the process they used to arrive at their model, and how well the model's prediction matches a new measurement. Ideally, the students should be able to tell whether their process is wrong or whether their model is not good enough for the system they are studying. This means that my grading is essentially two separate reward functions that are combined into a total grade. This tends to lead to some interesting behavior in my students.
Linear Models in Physics
Students almost always try a linear model first. It is definitely the fastest to make in Excel using the LINEST function. That said, physics is rarely linear without some manipulation. Will simple linear models sometimes work out? Yes, but only sporadically, since the range of the data needs to be small enough that the first-order Taylor series expansion is still fairly accurate at the prediction point. For instance, instead of using the traditional small-angle result that the period is proportional to the square root of the string length, it is possible to treat the pendulum period as linearly related to the string length over a narrow range of lengths.
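To make the Taylor series point concrete, here is a minimal sketch that compares the small-angle pendulum period with its first-order expansion in string length. The value of g, the function names, and the lengths in the loop are my own assumptions for illustration, not numbers from the labs.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def period(length_m):
    """Small-angle pendulum period, T = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length_m / G)

def linear_period(length_m, fit_length_m):
    """First-order Taylor expansion of period() about fit_length_m."""
    t0 = period(fit_length_m)
    slope = t0 / (2 * fit_length_m)  # dT/dL = pi / sqrt(g*L) = T / (2*L)
    return t0 + slope * (length_m - fit_length_m)

def relative_error(length_m, fit_length_m):
    """Relative error of the linear model at length_m."""
    exact = period(length_m)
    return abs(linear_period(length_m, fit_length_m) - exact) / exact

# Hypothetical lab: the model is built around a 0.5 m string,
# then used to predict at lengths further and further away.
for length in (0.55, 0.75, 1.0, 2.0):
    err = relative_error(length, fit_length_m=0.5)
    print(f"L = {length:.2f} m, relative error = {err:.2%}")
```

Close to the length where the model was built, the linear prediction is nearly indistinguishable from the square root law. The error grows steadily as the prediction point moves away, which is why a linear fit only works out some of the time.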
Grading as a Reward Function
For the most part, the students are using a greedy learning algorithm, or at least an epsilon-greedy one with a very small epsilon. They learn the process and documentation with linear models when it is easy and get very close to perfect scores on their process section. Thus, as with greedy or other low-exploration learning algorithms, a lot of students do not vary their approach in later labs. When the linear models fail, they don't change, either because they know linear models can work or because they don't want to put in the effort to change models. Perhaps if the semester were long enough my students would learn to use nonlinear models, but alas, here we are at the end of the semester.
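As a rough illustration of that behavior, here is a toy epsilon-greedy simulation of one student choosing a model for each lab. Everything in it is an assumption made for the sketch: the grade values, the even split between process and prediction, the point at which linear models stop working, and the ten-lab semester are all hypothetical.

```python
import random

MODELS = ["linear", "nonlinear"]

def grade(model, lab):
    """Hypothetical total grade in [0, 1]: half process, half prediction."""
    process = 0.5  # assume the process is documented well either way
    if model == "linear":
        # assume linear models only predict well in the first three labs
        prediction = 0.5 if lab < 3 else 0.0
    else:
        prediction = 0.5
    return process + prediction

def run_semester(epsilon, n_labs=10, seed=0):
    """Return the model an epsilon-greedy student picks for each lab."""
    rng = random.Random(seed)
    value = {m: 0.0 for m in MODELS}  # running average grade per model
    count = {m: 0 for m in MODELS}
    choices = []
    for lab in range(n_labs):
        if rng.random() < epsilon:
            choice = rng.choice(MODELS)                   # explore
        else:
            choice = max(MODELS, key=lambda m: value[m])  # exploit
        reward = grade(choice, lab)
        count[choice] += 1
        value[choice] += (reward - value[choice]) / count[choice]
        choices.append(choice)
    return choices

for eps in (0.0, 0.05, 0.5):
    picks = run_semester(eps)
    print(f"epsilon = {eps}: nonlinear chosen in {picks.count('nonlinear')} of {len(picks)} labs")
```

With epsilon at zero, the simulated student never leaves the linear model, because its early grades keep its average above the untried alternative. Only a larger epsilon, or a much longer semester, gives the nonlinear model a real chance of being tried.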