Hey Karthik, I am glad you liked this :))
Now coming to your doubt: to train the model with every possible combination of Anchor, Positive and Negative, we would need to follow the traditional approach of enumerating all training triplets. But this is obviously highly infeasible, because we'd have a huge training set, which in turn requires heavy computation.
To make the training lightweight (fast convergence, minimal computation) and at the same time effective (allowing the model to learn all the required information), we go with a method usually known as "hard example mining" (or hard triplet mining).
In hard mining, we choose training examples in such a way that we have an Anchor, a Hard Positive (a positive example, but with the maximum possible distance/deviation from the Anchor) and a Hard Negative (a negative example, but with the minimum possible distance/deviation from the Anchor).
This significantly reduces the amount of training data and at the same time allows the model to learn all the critical information it needs to differentiate between two faces.
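To make this concrete, here's a minimal sketch of hard triplet selection in NumPy. It's an illustration, not the exact FaceNet implementation: given a batch of embeddings and their identity labels, it picks for each anchor the farthest same-identity sample (hard positive) and the closest different-identity sample (hard negative). The function name `hard_triplets` is just my own for this example.

```python
import numpy as np

def hard_triplets(embeddings, labels):
    """For each anchor, pick the hardest positive (same label, max distance)
    and the hardest negative (different label, min distance)."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distance matrix, shape (n, n)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = labels[:, None] == labels[None, :]
    triplets = []
    for a in range(len(labels)):
        pos_mask = same[a].copy()
        pos_mask[a] = False          # the anchor itself is not a positive
        neg_mask = ~same[a]
        if not pos_mask.any() or not neg_mask.any():
            continue                 # no valid triplet for this anchor
        # Hard positive: same identity, farthest from the anchor
        p = np.where(pos_mask)[0][np.argmax(dist[a][pos_mask])]
        # Hard negative: different identity, closest to the anchor
        ng = np.where(neg_mask)[0][np.argmin(dist[a][neg_mask])]
        triplets.append((a, p, ng))
    return triplets

# Tiny batch: three faces of person 0, one face of person 1
emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 0.0], [0.5, 0.0]])
labels = [0, 0, 0, 1]
print(hard_triplets(emb, labels))
```

In a real training loop this selection is typically done "online", per mini-batch, on the current embeddings, so the hard examples change as the model improves.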