How Hatsune Miku Is Different From AI Voices

If you've been around the internet long enough, you may be at least passingly familiar with the virtual idol Hatsune Miku. Since the character's introduction in 2007, she has become an enduring staple of online pop culture, particularly in the anime and independent music scenes. Miku's voice has been featured in thousands of songs across platforms like YouTube and Nicovideo, and though she is just a projected animated character, she regularly sells out live venues with her singing and dancing, backed by live bands.

Hatsune Miku is even slated to perform at Coachella 2024, an appearance originally planned for 2020 before the COVID-19 pandemic derailed it. Despite the character's long-standing popularity, though, some people have raised concerns about the authenticity of her voice.

Controversies have cropped up in recent years over AI voice duplication systems being used to copy the voices of singers and characters without permission, and because Miku's voice is synthesized, some have drawn comparisons between the two. Putting aside the fact that Miku was on the scene well before this became an issue, the systems that power her voice are actually quite different from those used in AI voice duplication.

The VOCALOID software

Obviously, Hatsune Miku is not a real person. Her voice is synthesized using a program called VOCALOID, which Yamaha first unveiled in 2003. VOCALOID is a music-making tool that lets users assemble songs not just with synthesized instruments but with synthesized voices as well.

The voices come from voicebanks: large collections of short vocal sounds in a particular language, such as syllables and phonemes. In the VOCALOID Editor, you enter a melody and assign lyrics to its notes, and the software strings together the matching sounds from the voicebank at the right pitches to produce a sung vocal line.
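To make the idea concrete, here is a toy sketch of concatenative synthesis, the general approach of stitching pre-recorded sounds together and retuning them to follow a melody. It is an illustration under heavy assumptions, not how VOCALOID actually works: Yamaha's engine is proprietary and does far more, such as smoothing the transitions between sounds. The voicebank here is hypothetical, with plain sine waves standing in for recordings of a singer; the syllables (`ka`, `ra`, `o`, `ke`) and the `render` function are invented for the example, and pitch is shifted by naive resampling.

```python
import math

SAMPLE_RATE = 8000  # samples per second (kept low for the toy example)
RECORDED_HZ = 220.0  # assumption: every syllable was "recorded" at A3


def fake_recording(base_hz, seconds=0.25):
    """Hypothetical stand-in for a recorded syllable: a plain sine wave.
    A real voicebank would hold actual recordings of a singer."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * base_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical voicebank: one short "recording" per syllable.
VOICEBANK = {syl: fake_recording(RECORDED_HZ) for syl in ["ka", "ra", "o", "ke"]}


def midi_to_hz(note):
    """Convert a MIDI note number to a frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def render_note(samples, target_hz, seconds):
    """Retune a syllable by naive resampling: step through the recording
    faster or slower, looping it until the note's duration is filled."""
    ratio = target_hz / RECORDED_HZ
    n_out = int(SAMPLE_RATE * seconds)
    return [samples[int(i * ratio) % len(samples)] for i in range(n_out)]


def render(score):
    """score: list of (syllable, midi_note, seconds) tuples, sung in order.
    Each note pulls its syllable from the voicebank and is appended to the output."""
    audio = []
    for syllable, note, seconds in score:
        audio.extend(render_note(VOICEBANK[syllable], midi_to_hz(note), seconds))
    return audio


# A made-up four-note phrase: lyrics attached to notes, as in a piano roll.
score = [("ka", 69, 0.5), ("ra", 71, 0.5), ("o", 72, 0.5), ("ke", 69, 1.0)]
audio = render(score)
print(len(audio), "samples =", len(audio) / SAMPLE_RATE, "seconds")
```

Running it prints `20000 samples = 2.5 seconds`. The user supplies the melody and lyrics; the voicebank only supplies the raw sounds. Real recordings retuned this crudely would sound noticeably artificial, which hints at why commercial engines rely on much more careful signal processing.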

There are actually quite a few voicebanks available for VOCALOID. Hatsune Miku is just one of the Japanese-speaking ones, and she wasn't even created by Yamaha: Miku was developed by a third-party company, Crypton Future Media, for the second generation of the software, VOCALOID2. There are first- and third-party voices in various languages, including English, Spanish, Chinese, and Korean, as well as different voice tones suited to various genres of music. If you've got the right voicebank and some musical know-how, you can make pretty much any kind of song from scratch.

Assembling voicebanks

AI voice duplication works by feeding recordings of a particular person's voice into an algorithm, which picks up their tone, speech patterns, and little nuances. The algorithm then uses this information to make what is essentially an educated guess at how that person would sound saying something new. Depending on the quantity and quality of the samples, apps like Podcastle Pro can create realistic-sounding voices.

VOCALOID voicebanks, on the other hand, are created by painstakingly recording a singer's individual sounds and storing each one separately. Hatsune Miku's voice, for instance, is based on the voice of Japanese voice actress Saki Fujita. To provide her voice for the voicebank, Fujita had to stand in a recording booth and individually sound out every syllable and phoneme that might be needed to craft lyrics. She has been called back to repeat this work each time the voicebank is updated, such as for the English version released in 2013.

The clear difference between AI voice duplication and VOCALOID is that while both are forms of voice synthesis, the latter still keeps a significant human component. The voicebank just provides the sounds; it's up to the user to string them together into something halfway resembling music. This is why VOCALOID songs usually list the user as the actual creator, with something like "featuring Hatsune Miku" in the title or credits.

The human element

The major concern with AI voice duplication is that it could be used to effectively steal the voices of actors and prominent individuals, whether for silly karaoke covers on YouTube or for outright malicious purposes. This unlicensed use of people's voices remains a concern both in and out of the entertainment industry.

VOCALOID, and by extension Hatsune Miku, cannot be used for these purposes. Unlike the people whose voices are cloned without permission, voice providers like Saki Fujita are compensated for their work: Yamaha, Crypton, and other VOCALOID-related companies enter into agreements with the voice actors. Music made with VOCALOID also has a signature robotic sound that is clearly distinct from a real singer, so while users can make their own music with it, it couldn't completely replace the singer it was sampled from, at least not without an astronomical amount of work.

If you're concerned that supporting a virtual idol is the same as validating the theft of voices, don't be. It's a completely different technology, intended purely for entertainment and designed to turn a singing voice into a digital instrument.