CV01 – A Physics Approach to Vocaloid Technology

Written By Koko Liu

Hatsune Miku

Kagamine Rin

Kagamine Len

Vocaloid is singing-synthesizer software built on the idea of synthesizing human vocals for song. It is best known through its charismatic, anime-styled virtual idols; most people are familiar with the characters distributed by Crypton Future Media, Inc., namely Hatsune Miku, Kagamine Rin, Kagamine Len, Megurine Luka, KAITO, and MEIKO. The vocal synthesis technology behind the Vocaloid software is highly advanced, and its lengthy development traces the historical progression of sound technology.

Megurine Luka

MEIKO

KAITO

Music can be reduced to the physical phenomenon of waves. In more technical terms, sound is a vibration propagated as a wave through a transmission medium such as a gas, liquid, or solid. The vibration creates alternating areas of high and low pressure in the medium. For a given speed of sound, the distance between successive high-pressure areas (the wavelength) fixes the frequency of the wave, while the size of the swing between the pressure maxima and minima fixes its amplitude. Frequency determines the pitch of the sound, and amplitude determines its volume. Physics alone, however, does not fully capture what music is, so the subject can also be approached from the perspective of music theory.
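As a minimal sketch of the relationship between wavelength, frequency, and amplitude, the snippet below models a pure tone as a sinusoidal pressure variation. The speed of sound (343 m/s, roughly dry air at room temperature) and the function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed: dry air at about 20 degrees Celsius


def wavelength(frequency_hz, speed=SPEED_OF_SOUND):
    """Distance between successive pressure maxima, in metres."""
    return speed / frequency_hz


def pressure(t, frequency_hz, amplitude):
    """Pressure deviation of a pure tone at time t (in seconds)."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t)


middle_c = 261.63  # Hz
print(wavelength(middle_c))      # wavelength of middle C in metres
print(wavelength(2 * middle_c))  # doubling the frequency halves the wavelength
```

Raising `frequency_hz` shortens the wavelength and raises the pitch, while raising `amplitude` only scales the pressure swing, which is heard as a louder sound.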

In music, specific frequencies are labelled as notes. A sound with a frequency of 261.63 Hz (waves per second) is widely adopted as middle C. Categorizing frequencies into notes allows musicians to interpret and express music with ease. Although the assignment of note names to frequencies is somewhat arbitrary, the intervals between notes are not. For example, if the frequency of one note is twice the frequency of another, the interval between them is called an octave. In this sense, music is essentially a composition of frequency ratios.
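The 261.63 Hz figure for middle C can be reproduced with a short calculation. The sketch below assumes 12-tone equal temperament with A4 tuned to 440 Hz, a common convention that the text above does not itself specify; the helper name `note_frequency` is hypothetical.

```python
A4_HZ = 440.0  # assumed tuning reference (concert pitch)
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_frequency(name, octave):
    """Frequency of a note in 12-tone equal temperament."""
    # Semitone distance from A4; each semitone multiplies frequency by 2**(1/12).
    semitones = NOTE_NAMES.index(name) - NOTE_NAMES.index("A") + 12 * (octave - 4)
    return A4_HZ * 2 ** (semitones / 12)


print(round(note_frequency("C", 4), 2))                 # middle C -> 261.63
print(note_frequency("A", 5) / note_frequency("A", 4))  # one octave -> 2.0
print(note_frequency("G", 4) / note_frequency("C", 4))  # a fifth, close to 3/2
```

Under this tuning an octave is exactly a 2:1 ratio, while other intervals such as the fifth only approximate simple fractions like 3:2.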

Despite the correspondence between frequencies and notes, if musical instruments produced a perfectly pure tone at the exact frequency of each note, they would lose their artistic essence. In reality, instruments create a much more chaotic wave than a simple one. Instruments of different sizes, materials, and shapes are used to manipulate this wave. Musicians can also use different techniques on the same instrument to manipulate the chaos further. The chaos that differentiates sounds of the same frequency and amplitude is known as timbre.

The idea that sufficiently modifying a sound's timbre can imitate another sound of identical frequency and amplitude was tested by Christian Kratzenstein in 1779. He built five resonators, similar to organ pipes, in specific distorted shapes. As air vibrations passed through these uniquely shaped pipes, they generated relatively clear pronunciations of the five vowels A, E, I, O, and U.

Simplification of Complex Waveforms via Fourier Analysis

When two notes of an interval are played at the same time as a chord, their sound waves add together. More generally, simple waves of differing frequencies and amplitudes can be summed to build up a complex sound, and a periodic sound can conversely be broken down into such a sum, known as a Fourier series. Working with this mathematical description allows the fundamental components of a sound wave to be altered individually. In essence, it gives far more freedom to change the specific qualities of a sound than a set of acoustic pipes does.
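A rough sketch of this idea: the code below builds two tones with the same fundamental frequency but different harmonic amplitudes (that is, different timbres), then recovers those amplitudes by correlating the signal with sines and cosines, which amounts to evaluating single terms of a discrete Fourier transform. The sample rate, the harmonic amplitudes, and all function names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def additive_tone(fundamental_hz, harmonic_amps, n_samples):
    """Sum sine waves at integer multiples of the fundamental."""
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        value = sum(
            amp * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
            for k, amp in enumerate(harmonic_amps)
        )
        samples.append(value)
    return samples


def harmonic_amplitude(samples, frequency_hz):
    """Estimate the amplitude of one sine component by correlation."""
    re = sum(s * math.cos(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
             for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


# Two hypothetical timbres sharing a 200 Hz fundamental, one second each.
mellow = additive_tone(200, [1.0, 0.2, 0.05], SAMPLE_RATE)
bright = additive_tone(200, [1.0, 0.8, 0.6], SAMPLE_RATE)

for k in (1, 2, 3):
    print(k,
          round(harmonic_amplitude(mellow, 200 * k), 3),
          round(harmonic_amplitude(bright, 200 * k), 3))
```

Both tones have the same pitch, yet the analysis step tells them apart by the strength of their second and third harmonics. The one-second length is chosen so that every harmonic completes a whole number of cycles, which keeps the amplitude estimates clean.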

The Voder was created in 1939 with two source sounds: a buzzing tone and a hissing sound, imitating the vibration of the vocal cords and the sound of air being pushed out of the lungs, each with its own timbre. These sounds were then passed through a bank of electric filters, which controlled what was being pronounced as well as the tone in which it was pronounced. Bell Labs, the organization that developed the Voder, went on to make the famous recording of “Daisy Bell” on the IBM 7094 in 1961. The IBM 7094 is often referred to as “The Grandfather of Hatsune Miku.”

The Source Filter Model

Because sounds can be modified so freely, one idea is to take inspiration from how the human voice is produced. This forms the basis of the source-filter model, which generates a source sound of a given pitch and passes it through a filter that imitates the vocal tract.
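The source-filter idea can be illustrated with a very small sketch. Here the source is a train of pulses at the desired pitch, and the "vocal tract" is a cascade of two-pole resonators, one per formant. This is a textbook-style simplification under stated assumptions, not the method used by any particular synthesizer; the sample rate, formant values, and function names are all hypothetical.

```python
import math

SAMPLE_RATE = 16000  # Hz, assumed


def impulse_train(pitch_hz, n_samples):
    """Source: one pulse per pitch period, a crude stand-in for vocal-cord buzz."""
    period = round(SAMPLE_RATE / pitch_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz):
    """Filter: a two-pole resonator that emphasizes frequencies near center_hz."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)  # pole radius
    theta = 2 * math.pi * center_hz / SAMPLE_RATE        # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize(pitch_hz, formants, n_samples):
    """Pass the pitched source through one resonator per formant."""
    signal = impulse_train(pitch_hz, n_samples)
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz)
    return signal


# Illustrative (center, bandwidth) pairs in Hz, loosely "ah"-like; not measured data.
AH_FORMANTS = [(700, 100), (1100, 120)]
voiced = synthesize(pitch_hz=200, formants=AH_FORMANTS, n_samples=SAMPLE_RATE // 2)
```

Changing `pitch_hz` alters only the source, while changing the formant list alters only the filter. That separation is what lets the model set the pitch and the vowel quality independently.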

The source-filter model provided flexible and smooth vocal synthesis, but its sound quality depended heavily on how well the initial source sound was synthesized and how accurately the voice data was analyzed. This dependence frequently led to muffled output, limiting its appeal to music producers. Adding to these drawbacks, the model was typically too complex or resource-intensive for the average producer working on a personal computer. It was clear that the technology required further research and development.

So, when will this technology be available for commercial use?

Vocaloid Characters from Crypton Future Media, Inc.

On March 3rd, 2004, the first two Vocaloids, LEON and LOLA, were released by the British company Zero-G Limited. Unfortunately, the voices were not well received, as they retained a rather robotic sound. Crypton Future Media, Inc. then released two Vocaloids of its own, MEIKO and KAITO, with improved vocal quality and a more user-friendly interface. They were also given character illustrations as a marketing technique. Nonetheless, from a commercial perspective, the Vocaloid project remained unsuccessful.

It was not until 2007 that things began to take a turn. Crypton Future Media and Yamaha released Hatsune Miku (初音ミク), with “Hatsu” being Japanese for “first”, “ne” meaning “sound”, and “Miku” meaning “future”. Thus, the name means “the first sound from the future”. Codenamed CV01 (Character Voice 01), Hatsune Miku became a near-instant success. With Saki Fujita (藤田咲) as her voice provider, Hatsune Miku featured much smoother and clearer vocals than the Vocaloids that came before her.

Vocaloid is not the name of the synthesis process; rather, it refers to the software program made by Yamaha that lets users input lyrics and a melody, from which it generates vocals. Users may then “tune” these vocals, adjusting how robotic or realistic they sound. Through this process, popular songs such as “Senbonzakura” by Kurousa-P, “Hibana” by DECO*27, and “Love Trial” by 40mP were created by their respective producers. The software has grown immensely popular, becoming a revolutionary internet phenomenon.

Breakout International
