Transforming a source speaker’s speech so that it sounds like that of a
target speaker is usually referred to as voice morphing, voice
transformation, or voice conversion. Using linear transformations
estimated from time-aligned parallel training data, it transforms the
spectral envelope of the source speaker to match that of the target
speaker. As image morphing is analogous in nature, i.e.
the source face smoothly changes its shape and texture into the target
face, speech morphing should likewise change the source voice smoothly
into another while keeping the shared characteristics of the starting
and ending signals. Pitch and spectral envelope information are
intertwined in a speech signal and need to be separated. Cepstral
analysis is usually employed to separate them.
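
The cepstral separation described above can be sketched roughly as follows. This is an illustrative sketch only, not the method of any particular morphing system: the function names, the Hamming window, the 70-400 Hz pitch search range, and the liftering cutoff are all assumptions chosen for the example. The low-quefrency part of the real cepstrum is kept as a smoothed log spectral envelope, and the strongest high-quefrency peak is read as the pitch period.

```python
import numpy as np


def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)  # avoid log(0)
    return np.fft.irfft(log_mag, n=len(frame))


def separate_pitch_and_envelope(frame, sample_rate, fmin=70.0, fmax=400.0):
    """Return (pitch in Hz, log spectral envelope) for one voiced frame.

    fmin and fmax are assumed bounds on the pitch, not values from the text.
    """
    n = len(frame)
    cep = real_cepstrum(frame)

    # Quefrency (in samples) corresponds to a period, so fmax gives the
    # shortest period searched and fmin the longest.
    q_min = int(sample_rate / fmax)
    q_max = int(sample_rate / fmin)

    # Envelope: keep only quefrencies below q_min. The real cepstrum is
    # symmetric, so the mirrored tail is kept as well.
    lifter = np.zeros(n)
    lifter[:q_min] = 1.0
    lifter[n - q_min + 1:] = 1.0
    log_envelope = np.fft.rfft(cep * lifter).real

    # Pitch: the largest cepstral peak inside the allowed period range.
    period = q_min + int(np.argmax(cep[q_min:q_max]))
    return sample_rate / period, log_envelope
```

The sketch assumes the frame is voiced and long enough to hold several pitch periods. A practical system would also need a voiced/unvoiced decision, which is omitted here.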
17
Oct 08