LIAISE's logo

Current Challenges & LIAISE's Solutions


July 1, 2024

Current Obstacles in AI Video Dubbing

When dubbing videos using AI, one of the key challenges is ensuring that the dubbed audio matches the timing of the original video. This synchronization is crucial to maintaining the natural flow of the video and ensuring that the audio matches the lip movements of the characters on screen.

Character Length and Timing

1. Original Language (Sentence 1):

  • Example: the English sentence in an "English to Chinese" dub.
  • The original sentence has a certain number of characters and takes a specific amount of time to speak.

2. Translated Language (Sentence 2):

  • Example: the English sentence in a "Chinese to English" dub.
  • The translated sentence may have a different number of characters and may take more or less time to speak than the original.

Problems Arising from Character Length Differences

  • Longer Translations: If the translated sentence is longer than the original, it will naturally take more time to be spoken. This creates a mismatch in timing, as the AI needs to fit a longer sentence into the same time frame as the shorter original sentence.
  • Shorter Translations: Conversely, if the translated sentence is shorter, it can be spoken at a natural speed, but it finishes early and leaves gaps or pauses before the next line begins.

Challenges in AI Video Dubbing

1. Timing Synchronization:

The AI must ensure that the translated audio fits within the exact timing of the original audio. If the original audio is 5 seconds long, the translated audio must also be 5 seconds long, even if the sentences differ in length.

2. Natural Flow and Lip Sync:

In video dubbing, it's important that the spoken words match the lip movements of the characters on screen. Differences in sentence length can disrupt this synchronization, making the dubbing look and sound unnatural.

3. Speeding and Pausing:

To match the timing, the AI might have to speed up longer translations or introduce unnatural pauses in shorter translations. Both of these adjustments can negatively impact the overall quality of the dubbing.
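
The trade-off between speeding up and pausing can be made concrete with a small calculation. The sketch below is illustrative only and is not LIAISE's implementation: the names `FitPlan` and `plan_fit` and the 1.25x speed-up cap are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FitPlan:
    speed_factor: float  # playback-rate multiplier applied to the dubbed audio
    pause_s: float       # silence left after the dubbed audio ends
    overflow_s: float    # time the dub still runs past the original slot


def plan_fit(original_s: float, translated_s: float,
             max_speedup: float = 1.25) -> FitPlan:
    """Plan how a translated clip fits into the original clip's time slot.

    max_speedup is a hypothetical limit beyond which speech sounds rushed.
    """
    if original_s <= 0 or translated_s <= 0:
        raise ValueError("durations must be positive")
    if translated_s > original_s:
        # Longer translation: speed it up, but never beyond the cap.
        speed = min(translated_s / original_s, max_speedup)
        overflow = max(translated_s / speed - original_s, 0.0)
        return FitPlan(speed, 0.0, overflow)
    # Shorter (or equal) translation: natural speed, leftover time is a pause.
    return FitPlan(1.0, original_s - translated_s, 0.0)


print(plan_fit(5.0, 6.0))   # moderate speed-up is enough
print(plan_fit(5.0, 4.0))   # natural speed, trailing pause
print(plan_fit(5.0, 10.0))  # cap reached, some overflow remains
```

A 6-second translation fits a 5-second slot at 1.2x speed, while a 10-second translation still overruns the slot after hitting the cap. That second case is where audio-only adjustments stop being enough.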


Character counts below include spaces and punctuation.

1. English to Chinese

  • "Hello, how are you?" (19 characters) to "你好,你好嗎?" (7 characters) Difference: 12
  • "This is a test sentence." (24 characters) to "這是一個測試句子。" (9 characters) Difference: 15
  • "The quick brown fox jumps over the lazy dog." (44 characters) to "快速的棕色狐狸跳過懶狗。" (12 characters) Difference: 32

2. Chinese to English

  • "你好嗎?" (4 characters) to "How are you?" (12 characters) Difference: 8
  • "這是一個測試。" (7 characters) to "This is a test." (15 characters) Difference: 8
  • "快速的棕色狐狸跳過懶狗。" (12 characters) to "The quick brown fox jumps over the lazy dog." (44 characters) Difference: 32

In these examples, if the English sentence takes 5 seconds to speak, the Chinese sentence will have to be adjusted to fit within the same 5 seconds. If not properly synchronized, the AI may speak too quickly or slowly, affecting the natural pacing of the video.
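
Character gaps like these are easy to measure programmatically. The following is a minimal sketch; the helper name `char_gap` is an assumption for illustration. Python's `len` counts every Unicode code point, including spaces and punctuation, so the totals depend on that counting convention. Character count is also only a rough proxy for spoken duration, since one Chinese character typically carries more speech than one Latin letter.

```python
def char_gap(source: str, target: str) -> tuple[int, int, int]:
    """Return (source length, target length, absolute difference)."""
    return len(source), len(target), abs(len(source) - len(target))


pairs = [
    ("Hello, how are you?", "你好,你好嗎?"),
    ("This is a test sentence.", "這是一個測試句子。"),
    ("The quick brown fox jumps over the lazy dog.", "快速的棕色狐狸跳過懶狗。"),
]

for source, target in pairs:
    src_len, tgt_len, gap = char_gap(source, target)
    print(f"{source!r}: {src_len} -> {tgt_len} characters, difference {gap}")
```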

LIAISE's Solution

  • Frame Addition: If the translated text requires more time than the original video, LIAISE will seamlessly add extra frames to ensure everything fits perfectly.
  • Speed Adjustment: When adding frames isn't enough, LIAISE can modify the speed of the video segment. By slowing down or speeding up the video, we ensure the translated content matches the original length as closely as possible.
  • Additionally, to further enhance translation accuracy, we offer a verification step before the translation process begins. This allows users to confirm the original context, ensuring the system can provide the most precise translation.
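
To illustrate how frame addition and speed adjustment could combine, here is a rough sketch. It is not LIAISE's actual algorithm: the function name `plan_video_segment`, the 30 fps default, and the cap of 15 added frames are all assumptions made for the example.

```python
import math


def plan_video_segment(original_s: float, dubbed_s: float,
                       fps: float = 30.0,
                       max_added_frames: int = 15) -> tuple[int, float]:
    """Return (frames_to_add, video_speed_factor) for one video segment.

    frames_to_add extends the segment; video_speed_factor is a playback-rate
    multiplier (below 1.0 slows the video down, above 1.0 speeds it up).
    max_added_frames is a hypothetical limit on how much padding is acceptable.
    """
    if original_s <= 0 or dubbed_s <= 0 or fps <= 0:
        raise ValueError("durations and fps must be positive")

    # Frames needed to cover the extra dubbed time (rounded to avoid float noise).
    needed = max(0, math.ceil(round((dubbed_s - original_s) * fps, 6)))
    added = min(needed, max_added_frames)
    extended_s = original_s + added / fps

    if needed > added or dubbed_s < original_s:
        # Padding alone is not enough (or the dub is shorter): retime the video
        # so the extended segment lasts exactly as long as the dubbed audio.
        return added, extended_s / dubbed_s
    return added, 1.0


print(plan_video_segment(5.0, 5.2))  # small gap: extra frames only
print(plan_video_segment(5.0, 6.0))  # large gap: frames plus slow-down
print(plan_video_segment(5.0, 4.0))  # shorter dub: speed the video up
```

In this sketch padding is tried first and retiming covers only the remainder, which keeps the speed change as small as possible.
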
Our platform utilizes Automatic Speech Recognition (ASR) to accurately understand spoken language and Text to Speech (TTS) to transform text into natural, human-like speech.

Copyright © 2024. Made by LIAISE dev team.