A high school student's innovative approach to audio decomposition has sparked an engaging discussion in the tech community about the current state and challenges of music transcription technology. The project, while initially described as source separation, has highlighted important distinctions in audio processing terminology and revealed the complexity of converting audio to sheet music.
Clarifying the Technology
The community discussion revealed an important distinction between audio source separation and what the project actually accomplishes. Rather than performing stem separation (isolating individual instruments from a mixed track), the project focuses on pitch detection and instrument classification using Fourier transforms and envelope analysis.
As one commenter put it: "Audio Source Separation I think is the general term used in research. It is often applied to musical audio though, where you want to do stem separation - that's source separation where you want to isolate audio stems, a term referring to audio from related groups of signals, e.g. drums (which can contain multiple individual signals, like one for each drum/cymbal)." [https://news.ycombinator.com/item?id=42098491]
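To make the distinction concrete, here is a minimal sketch of the two signal-processing ideas mentioned above: reading a pitch off a Fourier transform and tracking a note's amplitude envelope. It is not the project's actual code. The function names `dominant_frequency` and `amplitude_envelope`, the frame size, and the synthetic test tone are all assumptions chosen for illustration, and the approach only handles a single clean note.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest bin in the magnitude spectrum."""
    window = np.hanning(len(signal))            # taper the edges to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[0] = 0.0                           # ignore the DC component
    return float(freqs[np.argmax(spectrum)])

def amplitude_envelope(signal, frame_size=512):
    """Return the peak absolute amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_size
    frames = np.abs(signal[: n_frames * frame_size]).reshape(n_frames, frame_size)
    return frames.max(axis=1)

# Hypothetical input: one second of a decaying 440 Hz sine, a crude plucked note.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t) * np.exp(-3.0 * t)

pitch = dominant_frequency(tone, sample_rate)
envelope = amplitude_envelope(tone)
print(f"estimated pitch: {pitch:.1f} Hz, envelope frames: {len(envelope)}")
```

Picking the single strongest bin works for one isolated tone but not for chords or overlapping instruments, which is part of why this kind of analysis is a different task from stem separation.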
Current State of Music Transcription
Commenters noted that automatic music transcription has become a significant subfield of deep learning and music information retrieval. For solo piano specifically, they described current models as highly accurate. Multi-track transcription of complex arrangements, however, remains challenging.
Technical Challenges
Several technical limitations were identified by the community:
- Instrument physics variations: The same instrument can produce different harmonic spectra depending on playing intensity
- Complex arrangements: Experimental music with unconventional playing techniques can produce unpredictable results
- Score interpretation: Converting MIDI to proper musical notation involves complex cultural and contextual rules
- Duration and velocity accuracy: While pitch and onset detection work well, note duration and intensity remain challenging
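The score interpretation and duration points can be illustrated with a small sketch. Mapping a detected frequency to a MIDI note number is a fixed formula (with A4 = 440 Hz as MIDI note 69), but turning that into notation requires choices the audio does not contain. The helper names, the sharps-only spelling table, and the sixteenth-note grid below are assumptions for illustration, not something taken from the project or the discussion.

```python
import math

# Assumption: spell every accidental as a sharp. Real notation would choose
# C# or Db depending on the key, which is one of the contextual rules above.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Map a frequency to the nearest MIDI note number (A4 = 69)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

def midi_to_name(midi_note):
    """Return a note name with octave, using the convention that MIDI 60 is C4."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"

def quantize_beats(duration_s, tempo_bpm, grid=0.25):
    """Snap a duration in seconds to the nearest multiple of `grid` beats.

    With quarter-note beats, grid=0.25 is a sixteenth-note grid (an assumption).
    """
    beats = duration_s * tempo_bpm / 60.0
    return max(grid, round(beats / grid) * grid)

note = frequency_to_midi(261.63)          # roughly middle C
name = midi_to_name(note)
length = quantize_beats(0.48, tempo_bpm=120)
print(note, name, length)
```

The quantization step assumes the tempo is already known and constant. A slightly misjudged note duration can snap to the wrong grid value, which hints at why duration errors matter more for readable notation than for MIDI playback.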
Industry Solutions
The community highlighted several existing solutions in this space:
- Commercial DAWs (Digital Audio Workstations) are increasingly incorporating stem separation features
- Google's MT3 project for multi-track music transcription
- Meta's Demucs for source separation
- Specialized tools like RipX and Stemroller
The discussion emphasized that while significant progress has been made in this field, particularly for single-instrument transcription, creating accurate multi-instrument transcriptions remains a complex challenge requiring sophisticated approaches beyond basic signal processing.
Source: Audio Decomposition
Source: Hacker News Discussion