Since our model measures in the Z basis, the purpose of the Hadamard gates is to initialize every qubit in a state that is not biased towards Z=+1 or Z=-1. Indeed, applied to a qubit that starts in |0>, H prepares the |+> state, for which the expectation value of Z is zero.
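To make this concrete, here is a minimal single-qubit sketch in plain Python. It is not tied to the model's actual framework or code. It assumes the qubit starts in |0>, and the helper names (`apply_gate`, `expectation_z`) are made up for this illustration.

```python
import math

# Single-qubit state as [amplitude of |0>, amplitude of |1>].
# Assumption: the qubit starts in |0>, as is the usual default.
ZERO = [1.0, 0.0]

# Hadamard gate as a 2x2 matrix.
S = 1.0 / math.sqrt(2.0)
H = [[S, S], [S, -S]]


def apply_gate(gate, state):
    """Multiply a 2x2 gate matrix by a 2-component state vector."""
    return [sum(gate[i][j] * state[j] for j in range(2)) for i in range(2)]


def expectation_z(state):
    """<Z> = P(0) - P(1), since Z is +1 on |0> and -1 on |1>."""
    return abs(state[0]) ** 2 - abs(state[1]) ** 2


plus = apply_gate(H, ZERO)          # H|0> = |+>
z_without_h = expectation_z(ZERO)   # fully biased towards Z=+1
z_with_h = expectation_z(plus)      # unbiased: zero up to rounding

print("<Z> without H:", z_without_h)
print("<Z> with H:   ", z_with_h)
```

Without the H gate the qubit is a Z eigenstate with `<Z> = 1`. With it, both outcomes are equally likely and `<Z>` vanishes.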
What would happen if we removed the first layer of H gates? My guess is that the final result would be similar, but training might take longer because of the different initial condition. This is only a guess, though: the best way to find out is to try it and see how it goes.