I'm getting confused when it comes to model validation.
What I've done for 6 different algorithms:
--> Split my dataset 75/25 (training/test); the test set I left untouched.
--> With the training set I did the following:
- Split it into 4 outer folds and ran a nested cross-validation, with a repeated (five times) tenfold cross-validation as the inner loop. Hyperparameters were tuned by random search with 10 iterations. (leave one out strategy)
- Extracted the metrics (ROC curves, accuracy, specificity, etc.) and got the parameters of the best model.
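For reference, here is a minimal sketch of the procedure described above. It assumes scikit-learn, which the steps above do not name, and uses synthetic data and a logistic regression as stand-ins for my real dataset and the six algorithms. The search space and all variable names (`inner_cv`, `outer_cv`, `search`, `nested`) are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RandomizedSearchCV,
    RepeatedStratifiedKFold,
    StratifiedKFold,
    cross_validate,
    train_test_split,
)

# Synthetic stand-in data (assumption: the real dataset is not shown).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 75/25 split; the test part is set aside and not used anywhere below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Inner loop: tenfold CV repeated five times, used only for tuning.
inner_cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
# Outer loop: 4 folds, used only to estimate performance.
outer_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Random search with 10 sampled candidates (hypothetical search space).
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": list(np.logspace(-3, 2, 20))},
    n_iter=10,
    scoring="roc_auc",
    cv=inner_cv,
    random_state=0,
)

# Nested CV: the whole search is re-run inside each outer training fold,
# and each outer held-out fold is scored by the model tuned without it.
nested = cross_validate(
    search, X_train, y_train, cv=outer_cv, scoring=["roc_auc", "accuracy"]
)
print("outer-fold ROC AUC:", nested["test_roc_auc"])
print("outer-fold accuracy:", nested["test_accuracy"])
```

In this sketch the outer scores describe the tuning procedure as a whole; each outer fold may end up with different best parameters.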
Now this is the problem:
I still have an untouched test set (from the split at the beginning). What should I do with it? Should I apply the best model to it directly and check the performance, or retrain the best model with the best parameters on the whole training set and then evaluate it on the test set?
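To make the second option concrete, this is roughly what I mean, again as a sketch that assumes scikit-learn with synthetic data and a stand-in model. The names `search`, `final_model`, `test_auc` and `test_acc` are hypothetical. With `refit=True`, `RandomizedSearchCV` retrains the best parameter setting on all of the data passed to `fit`, which here is the whole training set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    train_test_split,
)

# Synthetic stand-in data (assumption: the real dataset is not shown).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune on the training set only (hypothetical search space).
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": list(np.logspace(-3, 2, 20))},
    n_iter=10,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    refit=True,  # retrain the best setting on all of X_train
    random_state=0,
)
search.fit(X_train, y_train)
final_model = search.best_estimator_

# Single evaluation on the untouched test set.
test_auc = roc_auc_score(y_test, final_model.predict_proba(X_test)[:, 1])
test_acc = accuracy_score(y_test, final_model.predict(X_test))
print(search.best_params_, test_auc, test_acc)
```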
Or is everything wrong here?
Question from: https://stackoverflow.com/questions/65649231/how-to-correctly-validate-a-machine-learning-model