
Project scope
Categories
Information technology, Software development, Machine learning
Skills
Russian language, JavaScript (programming language), lip sync, Python (programming language), application programming interface (API), Korean language, German language, Kaldi, front-end design, French language

We're looking for an intern who understands JavaScript to update our UNOMi 3D Lip-Sync plugin. This is a great opportunity to learn how a startup works and gain experience.
PROJECT OVERVIEW
Languages to Add
- Mandarin, Japanese, Hindi, Arabic, Spanish, Portuguese, German, French, Korean, Russian
Tech Stack
- ASR & Alignment: Kaldi
- Backend: Python + Node.js/JavaScript
- Frontend: Angular
- Phoneme Processing: G2P tools (e.g., Espeak, Phonetisaurus), IPA-based mappings
PHASE 1: Planning & Resources (Week 1)
• Language Resource Audit: Identify available Kaldi recipes and G2P tools for each target language
• Model Selection: Choose between pre-trained or training new Kaldi models
• Viseme Set Design: Create or map a universal viseme set for multilingual phoneme integration
• Timeline Finalization: Confirm team bandwidth and resource allocation
PHASE 2: Model Setup & G2P (Weeks 2–4)
• Setup Kaldi Environments: Spin up Docker or server instances per language
• Integrate Pretrained Models or Begin Training: Load pretrained models (e.g., CommonVoice, Aishell, GlobalPhone) or begin training with transcribed data
• G2P Mapping: Integrate Espeak/Phonetisaurus or language-specific G2P models (see the sketch after this phase's task list)
• Test G2P Conversions: Validate phoneme output against expected transcription with native speakers, where possible
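As a starting point for the G2P step, here is a minimal sketch of pulling IPA phoneme strings from espeak-ng on the command line. It assumes espeak-ng is installed and on the PATH; the voice codes are illustrative and should be checked against `espeak-ng --voices`, and Phonetisaurus or a language-specific G2P model could sit behind the same function.

```python
import subprocess

# Illustrative mapping from our language codes to espeak-ng voices.
# Assumption: actual voice names must be verified with `espeak-ng --voices`.
ESPEAK_VOICES = {"fr": "fr", "de": "de", "es": "es", "ru": "ru", "ko": "ko"}

def g2p_espeak(text: str, lang: str) -> str:
    """Return an IPA phoneme string for `text` using espeak-ng (-q = no audio output)."""
    voice = ESPEAK_VOICES[lang]
    result = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "-v", voice, text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Quick sanity check; outputs should still be reviewed with native speakers.
    print(g2p_espeak("bonjour tout le monde", "fr"))
    print(g2p_espeak("guten Morgen", "de"))
```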
PHASE 3: Phoneme Alignment & Backend Processing (Weeks 5–6)
• Forced Alignment: Use Kaldi to align audio + transcript to phoneme timing for each language
• Standardize Output: Normalize output format (JSON, SRT, XML) for lip-sync engine compatibility (see the JSON sketch after this phase's task list)
• Python Middleware Update: Update backend to handle phoneme inputs from each language and pass to animation engine
• JavaScript Integration: Update the JS logic for language detection and ASR model switching (a backend-side registry sketch follows below)
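To make the "Standardize Output" and middleware tasks concrete, here is a minimal sketch that converts a phone-level CTM file (the format emitted by Kaldi's ali-to-phones --ctm-output, assuming phone IDs have already been mapped to labels via phones.txt) into a JSON structure the animation engine could consume. The field names (phone, start, end) are our own convention, not a Kaldi standard, and the file path in the example is hypothetical.

```python
import json

def ctm_to_json(ctm_path: str, lang: str) -> str:
    """Convert a phone-level CTM (utt channel start dur phone) to normalized JSON."""
    segments = []
    with open(ctm_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue  # skip malformed lines
            utt, _channel, start, dur, phone = parts[:5]
            segments.append({
                "utterance": utt,
                "phone": phone,
                "start": round(float(start), 3),
                "end": round(float(start) + float(dur), 3),
            })
    payload = {"language": lang, "phones": segments}
    return json.dumps(payload, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    print(ctm_to_json("align/fr/phones.ctm", lang="fr"))  # hypothetical path
```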
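The JavaScript layer handles language detection and selection in the UI; on the backend, the middleware needs a matching lookup to pick which Kaldi model to decode with. Below is a minimal Python sketch of such a registry; the model directories and the resolve_model helper are hypothetical and depend on how the Phase 2 models are packaged.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AsrModel:
    lang: str
    model_dir: str      # hypothetical path to a Kaldi model directory
    sample_rate: int

# Illustrative registry; real entries depend on the models chosen in Phase 2.
MODEL_REGISTRY = {
    "es": AsrModel("es", "models/kaldi/es_commonvoice", 16000),
    "zh": AsrModel("zh", "models/kaldi/zh_aishell", 16000),
    "ar": AsrModel("ar", "models/kaldi/ar_mgb", 16000),
}

def resolve_model(lang_code: str) -> AsrModel:
    """Pick the ASR model for a language code; fail loudly for unsupported ones."""
    try:
        return MODEL_REGISTRY[lang_code]
    except KeyError:
        raise ValueError(f"No ASR model registered for language '{lang_code}'")
```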
PHASE 4: Frontend & Viseme Mapping (Weeks 7–8)
• Viseme Map Expansion: Add mappings from phonemes (IPA or language-specific) to viseme shapes (see the sketch after this phase's task list)
• Angular Component Update: Expand Angular UI to support language selection or detection
• Integrate Viseme Timings: Sync timing data with animation engine (possibly via WebSocket or event stream)
• User Testing (Internal): Run internal tests on lip sync output per language
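For the viseme map expansion, here is a minimal Python sketch of the idea: collapse IPA phonemes into a small set of mouth shapes and carry the alignment timings through, merging consecutive segments that share a viseme. The viseme names and phoneme groupings below are illustrative placeholders, not UNOMi's actual rig targets, and would need per-language review.

```python
# Illustrative IPA-to-viseme groupings; the real map should be reviewed per
# language, since the same symbol can be realized differently across languages.
VISEME_MAP = {
    "viseme_MBP": {"m", "b", "p"},          # closed lips
    "viseme_FV":  {"f", "v"},               # lip-teeth contact
    "viseme_AA":  {"a", "ɑ", "æ"},          # open jaw
    "viseme_O":   {"o", "ɔ", "u", "w"},     # rounded lips
    "viseme_EE":  {"i", "e", "ɛ", "j"},     # spread lips
}

# Invert for fast phoneme lookup.
PHONE_TO_VISEME = {p: v for v, phones in VISEME_MAP.items() for p in phones}

def phones_to_visemes(phones, rest_viseme="viseme_REST"):
    """Map aligned phones [{'phone', 'start', 'end'}, ...] to viseme keyframes."""
    keyframes = []
    for seg in phones:
        viseme = PHONE_TO_VISEME.get(seg["phone"], rest_viseme)
        # Merge consecutive segments that land on the same viseme.
        if keyframes and keyframes[-1]["viseme"] == viseme:
            keyframes[-1]["end"] = seg["end"]
        else:
            keyframes.append({"viseme": viseme, "start": seg["start"], "end": seg["end"]})
    return keyframes
```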
PHASE 5: QA, Optimization & Launch (Weeks 9–10)
• QA Pass (External Review): Collect feedback from native speakers for each language
• Optimize Latency: Tune backend processing (Kaldi decode/align) for sub-5-second performance where possible (a timing sketch follows this list)
• Final Bug Fixes: Polish frontend and backend features based on QA feedback
• Deploy: Roll out support to production and announce multilingual support
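To know whether the sub-5-second target is being met, each backend stage needs to be timed individually before it can be tuned. A small sketch of how that could be instrumented follows; the stage names and the commented pipeline calls are hypothetical.

```python
import time
from contextlib import contextmanager

STAGE_TIMINGS = {}

@contextmanager
def timed_stage(name: str):
    """Record wall-clock time per pipeline stage (e.g., decode, align, viseme map)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        STAGE_TIMINGS[name] = time.perf_counter() - start

# Example usage around hypothetical pipeline calls:
# with timed_stage("kaldi_decode"):
#     run_decode(audio_path)
# with timed_stage("forced_align"):
#     run_align(audio_path, transcript)
# print(STAGE_TIMINGS, "total:", sum(STAGE_TIMINGS.values()))
```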
DELIVERABLES BY PHASE:
• Phase 1: Timeline, toolchain finalized, phoneme–viseme map draft
• Phase 2: All G2P modules ready, ASR models integrated
• Phase 3: JSON-formatted phoneme timings output from Kaldi
• Phase 4: Frontend updated, 3D characters lip-syncing to new languages
• Phase 5: Fully functioning, optimized, and tested UNOMi with multilingual support
ADDITIONAL NOTES
- Pretrained Kaldi models:
  - Spanish, French, German: CommonVoice
  - Mandarin: AISHELL, THCHS-30
  - Arabic: MGB Challenge
  - Russian: VoxForge or RUSLANA
  - Hindi: MUCS 2021 Challenge dataset
  - Japanese/Korean: You may need to use Mozilla's DeepSpeech, OpenAI's Whisper, or train your own models
- Alternative ASR/G2P fallback: If Kaldi support is weak for some languages, Whisper or Vosk may fill the gap (see the sketch below).
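If Whisper ends up as the fallback for Japanese or Korean, its word-level timestamps can stand in for Kaldi word alignment as a first pass (a G2P step would still be needed to get phoneme-level timing). Below is a minimal sketch using the openai-whisper package; the model size and word_timestamps support depend on the installed version, so treat it as an assumption to verify.

```python
import whisper  # openai-whisper package (assumption: installed via `pip install openai-whisper`)

def whisper_word_timings(audio_path: str, lang: str = "ja"):
    """Transcribe with Whisper and return word-level timings as a fallback
    when no Kaldi recipe is available for the language."""
    model = whisper.load_model("small")  # model size is an assumption; tune for latency
    result = model.transcribe(audio_path, language=lang, word_timestamps=True)
    words = []
    for segment in result["segments"]:
        for w in segment.get("words", []):
            words.append({"word": w["word"].strip(), "start": w["start"], "end": w["end"]})
    return words
```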
MENTORSHIP & SUPPORT
• Providing specialized, in-depth knowledge and general industry insights for a comprehensive understanding.
• Sharing knowledge of the specific technical skills, techniques, and methodologies required for the project.
• Direct involvement in project tasks, offering guidance, and demonstrating techniques.
• Providing access to the necessary tools, software, and resources required for project completion.
• Scheduled check-ins to discuss progress, address challenges, and provide feedback.
About the company
UNOMi is innovative, easy-to-use software for animators. UNOMi reduces the production time and budget for developing content by 30% to 70%. It does this by automatically syncing 2D and 3D mouth poses to the voice-over recordings an artist or animator creates for each character. We understand the pain involved in producing quality animated content, and we've created the perfect tool to help with the process. It normally takes an animator about a day to animate one character talking for 30 seconds, but with UNOMi that can be done in seconds.
UNOMi's top mission is to solve the greatest challenges facing animators today. With the technology available today, there is no reason animators should still struggle to tell their stories.