Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I am developing a speech recognition system from scratch using Octave. I am trying to detect phonemes by detecting differences in frequency. So far I have read in a wav file, organized the values into blocks, and applied fft to the overall data. Afterwards, I plot the new data with plot(abs(real(fft(q)))), which creates this graph: [image: fft graph]

How could I get the frequency values (the peaks of the graph)?



1 Answer

If you don't have access to findpeaks, the basic premise behind how it works is that, for each point in your signal, it looks at a three-element window centred on that point and checks whether the centre of the window is larger than the elements to its left and right. You want to be able to find both positive and negative peaks, so you'd need to check the absolute value.

As such, what you can do is make two additional signals that shift the signal to the left by 1 and to the right by 1. When we do this, we actually start checking for peaks at the second element of your signal, to make room for looking to the left, and we keep checking up to the second-last element, to make room for looking to the right. We are therefore checking for peaks on a version of the signal that is N - 2 elements long, where N is the length of your signal. To create the left-shifted signal, we extract from the first element up to the third-last element. To create the right-shifted signal, we extract from the third element up to the last element. The original signal simply has its first and last elements removed.

Therefore, by checking for peaks this way, we lose the first and last points of your data, but that should be acceptable, as there most likely won't be any peaks at the very beginning or the very end. After creating all of these signals, simply use logical indexing to see whether the values in the original signal (without the first and last elements) are larger than the values in the other two signals at the corresponding positions.

As such, supposing your signal was stored in f, you would do the following:

f1 = abs(f(2:end-1)); %// Original signal
f2 = abs(f(1:end-2)); %// Left shift
f3 = abs(f(3:end)); %// Right shift

idx = find(f1 > f2 & f1 > f3) + 1; %// Get the locations of where we find our peaks
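To make the comparison concrete, here is a small hand-checkable sketch of the same idea. The toy vector x and the names c, l, r and peak_idx are made up for illustration and are not part of the original signal.

```octave
% Hypothetical toy signal, chosen so the peaks are easy to spot by eye.
x = [0 2 1 -3 0 4 4 1];

c = abs(x(2:end-1));   % centre elements (positions 2 .. N-1)
l = abs(x(1:end-2));   % left neighbour of each centre element
r = abs(x(3:end));     % right neighbour of each centre element

% A centre element counts as a peak only when it is strictly larger
% than both of its neighbours.
peak_idx = find(c > l & c > r) + 1;

disp(peak_idx)         % positions 2 and 4, where |2| and |-3| beat both neighbours
```

Note that the flat top at positions 6 and 7 (the two 4s) is not reported, because neither value is strictly larger than the other. If plateaus matter for your data, the strict comparison would need to be relaxed.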

idx will contain the index locations of where the peaks occur. Bear in mind that we started searching for peaks at the second position, so you need to add 1 to account for this shift. If you wanted to find the actual time (or frequency in your case) values, you would just use idx to index into the time (or frequency) array that was used to generate your signal. As such, let's use an artificial case where I generate a sinusoid from 0 to 3 seconds with a frequency of 1 Hz. Therefore:

t = 0 : 0.01 : 3;
f = sin(2*pi*t);

Now, if we ran the above code with this signal, we'd find the location of our peaks. We can then use these locations to index into t and f and plot the signal as well as where we have detected our peaks. Therefore:

plot(t, f, t(idx), f(idx), 'r.')

This is what I get:

[Figure: the sinusoid plotted against t, with each detected peak marked by a red dot]
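If you want to check the result without looking at a plot, the pieces above can be put together into one script that prints the peak times. This is only a sketch that reuses the same t and f; peak_times is a name introduced here for illustration.

```octave
t = 0 : 0.01 : 3;        % time axis from the example above
f = sin(2*pi*t);         % 1 Hz sinusoid

f1 = abs(f(2:end-1));    % centre elements
f2 = abs(f(1:end-2));    % left neighbours
f3 = abs(f(3:end));      % right neighbours
idx = find(f1 > f2 & f1 > f3) + 1;   % +1 undoes the shift from dropping the first element

peak_times = t(idx);     % times at which |f| has a local maximum
printf("%d peaks at t = %s\n", numel(idx), mat2str(peak_times, 3));
```

Because the absolute value is taken, both the crests and the troughs of the sinusoid are reported, so this should list six peaks, at 0.25, 0.75, 1.25, 1.75, 2.25 and 2.75 seconds.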

Bear in mind that this is a very simple way of detecting peaks, but it is essentially what is done in findpeaks. The code above finds every local peak, so on the graph in your question it would find dozens of them, because there are local maxima all over your spectrum. You probably want to determine where the strong peaks are located. What people usually do is use a threshold that says how large a peak must be before it counts as valid. As such, you can enforce a threshold and do something like this:

thresh = ... ; %// Define threshold here
idx = find(f1 > f2 & f1 > f3 & f1 > thresh) + 1; %// Get the locations of where we find our peaks

For the graph in your question, you may want to set this so that you only find peaks whose magnitude is larger than, say, 10.
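To turn peak locations in an FFT into frequency values, you also need a frequency axis to index into. The following is a minimal sketch under stated assumptions: the sampling rate fs, the block length N, the synthetic two-tone test signal and the relative threshold are all made up for illustration, so substitute the values from your own wav file. It also takes abs of the complex FFT output directly, which is a different choice from the abs(real(...)) in the question.

```octave
fs = 1000;                      % assumed sampling rate in Hz (hypothetical)
N  = 1000;                      % assumed number of samples in the block
t  = (0:N-1) / fs;              % time axis
x  = sin(2*pi*50*t) + 0.5*sin(2*pi*120*t);   % synthetic test signal, not real speech

X    = fft(x);
half = 1 : floor(N/2) + 1;      % keep DC up to Nyquist; for real input the rest mirrors it
mag  = abs(X(half));            % magnitude spectrum
freq = (half - 1) * fs / N;     % frequency in Hz of each retained bin

thresh = 0.1 * max(mag);        % assumed rule of thumb, tune this for real data

m1 = mag(2:end-1);              % centre elements
m2 = mag(1:end-2);              % left neighbours
m3 = mag(3:end);                % right neighbours
idx = find(m1 > m2 & m1 > m3 & m1 > thresh) + 1;

peak_freqs = freq(idx);         % frequencies of the strong peaks
disp(peak_freqs)                % the two tones, 50 and 120 Hz
```

Only the first half of the spectrum is searched, since for a real-valued signal the second half repeats the same peaks mirrored. Bin k (counting from 1) corresponds to (k - 1) * fs / N Hz, which is what freq holds.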


There are a lot of other things that findpeaks does, such as filtering out noisy peaks and some other robustness measures. If you want to use findpeaks, you need to make sure that you install the signal package. You can use pkg install from the Octave command prompt to do that. Specifically, try this:

pkg install -forge signal

Once you install the signal package, you can load it into the Octave environment by doing:

pkg load signal

If you have to install dependencies, it'll tell you when you try to install the signal package. Check out this link for more details: https://www.gnu.org/software/octave/doc/interpreter/Installing-and-Removing-Packages.html

mkoctfile stands for making / compiling an Octave file. If you don't have mkoctfile, make sure you have the most recent version of Octave installed. To keep things simple, I recommend installing either Homebrew or MacPorts and getting Octave that way. Once it is installed, you should be able to get mkoctfile working. If you still can't, you may need to have a compatible compiler installed. The easy approach is to install the Command Line Developer Tools from Xcode. Go to this link, then go to Additional Tools.

Good luck!

