Hey Qben, very good question. In an ideal world, the two cases are strictly equivalent. However, the trick is that you have to take into account when each piece of data arrives. For instance, when some features arrive, you first make a prediction, but the corresponding label might only arrive later on. Until it does, the model cannot learn from that sample, so every prediction made in the meantime comes from a model that has not seen it. Therefore, you need to simulate the arrival delay in order to faithfully reproduce an online production scenario. I wrote a blog post about this here: https://maxhalford.github.io/blog/online-learning-evaluation/. Feel free to tell me if something isn't clear :)
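To make the delay idea concrete, here is a minimal sketch in plain Python. It is not taken from the blog post or from any library: `RunningMean`, `delayed_progressive_mae`, the `(t, x, y)` stream format and the fixed `delay` are all names and assumptions made up for illustration. Each label is held back until `t + delay`, and only then is it scored and learned from.

```python
import heapq


class RunningMean:
    """Hypothetical toy online model: predicts the mean of the labels seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def delayed_progressive_mae(stream, model, delay):
    """Mean absolute error when each label is revealed `delay` time units late.

    Assumes `stream` yields (t, x, y) tuples sorted by arrival time t.
    """
    pending = []  # min-heap of (reveal_time, seq, x, y, y_pred)
    abs_err, n = 0.0, 0

    def reveal():
        nonlocal abs_err, n
        _, _, x_old, y_old, y_pred = heapq.heappop(pending)
        abs_err += abs(y_old - y_pred)  # score the prediction made back then
        n += 1
        model.learn_one(x_old, y_old)  # the model only learns now

    for seq, (t, x, y) in enumerate(stream):
        # Reveal every label whose arrival time has passed.
        while pending and pending[0][0] <= t:
            reveal()
        # Features arrive: predict first, keep the label hidden until t + delay.
        # seq breaks ties so that x (e.g. a dict) is never compared.
        heapq.heappush(pending, (t + delay, seq, x, y, model.predict_one(x)))

    # End of stream: reveal whatever is still pending.
    while pending:
        reveal()

    return abs_err / n if n else 0.0


stream = [(0, {}, 1.0), (1, {}, 2.0), (2, {}, 3.0)]
mae_no_delay = delayed_progressive_mae(stream, RunningMean(), delay=0)
mae_delayed = delayed_progressive_mae(stream, RunningMean(), delay=5)
```

With `delay=0` each label is available before the next sample arrives, so this reduces to the usual predict-then-learn loop. With `delay=5` no label shows up before the stream ends, so all three predictions come from an untrained model and the error is higher.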