First, you give your input (x) to the network. Inside the network, the input is combined with randomly initialized parameters to calculate linear functions (y = ax + b). To let the network represent any kind of shape (not just straight lines), a non-linear function (e.g. ReLU) is applied after every linear function, typically except the last one. In the end, you get one or more numbers out that should be close to the real value (y). That was the forward pass. To train the network, we also need a backward pass. In the backward pass, we take the error between the prediction and the real value and calculate its gradient, that is, its derivative with respect to every parameter. An even simpler way to picture it: we nudge every parameter up and down one by one and move it in the direction that reduces the difference between the prediction and the real value. (In practice the derivatives are calculated with the chain rule instead of by actually nudging each parameter, but the resulting direction is the same.) That’s it. You keep repeating this and updating the parameters until they generalize well to different inputs.
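To make the loop described above concrete, here is a minimal sketch in plain Python. It takes the "nudge every parameter one by one" picture literally and estimates each derivative with a small up/down change (finite differences). Real frameworks compute the same gradient with backpropagation instead, so treat this as an illustration of the idea, not of how a library does it. The function names, the parameter layout, the toy target y = |x|, the hidden size, and the learning rate are all assumptions made up for this example.

```python
import random


def relu(z):
    # The non-linear function applied after each hidden linear function.
    return z if z > 0 else 0.0


def forward(params, x):
    # Hypothetical flat layout: [a_1..a_H, b_1..b_H, w_1..w_H, c]
    # Hidden unit j computes relu(a_j * x + b_j); the output is a linear
    # combination of the hidden units plus a bias c.
    hidden = (len(params) - 1) // 3
    a = params[:hidden]
    b = params[hidden:2 * hidden]
    w = params[2 * hidden:3 * hidden]
    c = params[-1]
    return sum(w[j] * relu(a[j] * x + b[j]) for j in range(hidden)) + c


def loss(params, data):
    # Mean squared difference between prediction and real value.
    return sum((forward(params, x) - y) ** 2 for x, y in data) / len(data)


def numerical_gradient(params, data, eps=1e-5):
    # Nudge each parameter up and down, one at a time, and measure how
    # the loss changes (central finite difference).
    grad = []
    for i in range(len(params)):
        up = params[:]
        up[i] += eps
        down = params[:]
        down[i] -= eps
        grad.append((loss(up, data) - loss(down, data)) / (2 * eps))
    return grad


def train(data, hidden=4, steps=500, lr=0.01, seed=0):
    rng = random.Random(seed)
    # Randomly initialized parameters.
    params = [rng.uniform(-1, 1) for _ in range(3 * hidden + 1)]
    history = [loss(params, data)]
    for _ in range(steps):
        grad = numerical_gradient(params, data)
        # Move every parameter in the direction that reduces the loss.
        params = [p - lr * g for p, g in zip(params, grad)]
        history.append(loss(params, data))
    return params, history


# Toy data: inputs between -2 and 2, real value y = |x| (not a straight line).
data = [(k / 4.0, abs(k / 4.0)) for k in range(-8, 9)]
params, history = train(data)
print("loss before training:", history[0])
print("loss after training: ", history[-1])
```

The loss after training should be lower than before. How close the fit gets depends on the random starting parameters, for example a hidden unit whose ReLU outputs zero for every input receives no gradient and never learns. This sketch also only measures the error on the training data; checking that the parameters generalize would need separate inputs the network has not seen.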