Do we need to provide an initial alignment for the inputs during training, in addition to the actual decoded label sequence?
Question from: https://stackoverflow.com/questions/65882518/ctc-alignment-for-video-sequence-recognition