I'm making a Windows desktop application that needs to transcribe videos, and I'm looking for a good free API to help me do that. I've searched a lot, but most of the APIs I've found have poor accuracy.
Google's Speech-to-Text API has state-of-the-art accuracy, a simple interface, and client libraries in many languages. You get 60 minutes free per month.
Link: https://cloud.google.com/speech-to-text/
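To give an idea of what this looks like in practice, here is a rough Python sketch, not a definitive implementation. It assumes the `google-cloud-speech` package is installed, that credentials are already configured for your Google Cloud project, and that `ffmpeg` is on your PATH to pull the audio track out of the video. The function names `build_ffmpeg_command`, `extract_audio`, and `transcribe_wav` are my own, and the 16 kHz mono WAV format is just a common choice for speech recognition.

```python
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that extracts 16 kHz mono 16-bit PCM audio."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # 16-bit PCM (LINEAR16)
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the extraction fails."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)


def transcribe_wav(wav_path: str, language_code: str = "en-US") -> str:
    """Send a short WAV clip to Speech-to-Text and return the transcript."""
    # Imported here so the helpers above work without the client library.
    from google.cloud import speech

    client = speech.SpeechClient()
    with open(wav_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,  # must match the ffmpeg "-ar" value above
        language_code=language_code,
    )
    response = client.recognize(config=config, audio=audio)
    # Each result carries ranked alternatives; take the top one.
    return " ".join(r.alternatives[0].transcript for r in response.results)
```

You would call `extract_audio("clip.mp4", "clip.wav")` and then `transcribe_wav("clip.wav")`. As far as I know, the synchronous `recognize` call only accepts short clips (about a minute), so for full-length videos you would need to split the audio or look at the long-running variant in the docs.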
If you want an online API that is totally free, you most likely will not find one.
If you are willing to go offline, you will probably have to come up with a custom solution using the weights of some openly available deep learning model. Read some papers on state-of-the-art transcription models and see if any of the weights are available on GitHub. Keep in mind that performing such a task offline is very computationally expensive, and might require a GPU to give you results in a reasonable amount of time.
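As one possible illustration of the offline route, here is a minimal sketch that uses the openly released Whisper weights through the `openai-whisper` Python package. This is my own pick of model, not the only option, and it assumes `openai-whisper`, `torch`, and `ffmpeg` are installed. The helper names (`format_timestamp`, `segments_to_srt`, `transcribe_offline`) are hypothetical. It uses the GPU when one is available and falls back to the CPU otherwise, then turns the timed segments into SRT subtitle text.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Convert dicts with 'start', 'end', 'text' keys into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_offline(media_path: str, model_name: str = "base"):
    """Transcribe a media file locally and return the timed segments."""
    # Imported here so the formatting helpers work without these packages.
    import torch
    import whisper

    # Assumption: use the GPU if PyTorch can see one, else fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(model_name, device=device)
    result = model.transcribe(media_path)  # decodes the file via ffmpeg
    return result["segments"]
```

Something like `print(segments_to_srt(transcribe_offline("clip.mp4")))` would then print subtitles for the video. Larger model names should be more accurate but slower, which is where the GPU starts to matter.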