I am trying to use GPT-2 for an Arabic text classification task, as follows:
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2ForSequenceClassification.from_pretrained(model_path,
                                                      num_labels=len(lab2ind))
However, when I use the tokenizer, it converts the Arabic characters to symbols like this:
'?ù??aù??±'
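One possible explanation, assuming the symbols come from inspecting tokens (for example via `tokenizer.tokenize`) rather than from decoded text: GPT-2 uses byte-level BPE, which first encodes the text as UTF-8 and then maps every byte to a printable stand-in character. Each Arabic letter takes two UTF-8 bytes, so it shows up as two unrelated-looking symbols. The sketch below reimplements that byte-to-character table using only the standard library, to show that the mapping is reversible. The helper names `to_byte_level` and `from_byte_level` are hypothetical and are not part of `transformers`.

```python
def bytes_to_unicode():
    """Build the byte -> printable character table used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (control characters, space, etc.) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


byte_encoder = bytes_to_unicode()
byte_decoder = {ch: b for b, ch in byte_encoder.items()}


def to_byte_level(text):
    # Hypothetical helper: how the text looks after the byte-level mapping.
    return "".join(byte_encoder[b] for b in text.encode("utf-8"))


def from_byte_level(symbols):
    # Hypothetical helper: reverse the mapping and decode the bytes as UTF-8.
    return bytes(byte_decoder[ch] for ch in symbols).decode("utf-8")


text = "مرحبا"
symbols = to_byte_level(text)
print(symbols)                   # unreadable stand-in characters, two per Arabic letter
print(from_byte_level(symbols))  # the original Arabic text
```

If this is the cause, the symbols are only an internal representation and nothing is lost. In `transformers`, `tokenizer.decode(ids)` or `tokenizer.convert_tokens_to_string(tokens)` should apply the reverse mapping and return the Arabic text. Any remaining `?` characters would then more likely come from the console or file encoding used to display the output.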