High School Student's Audio Decomposition Project Sparks Discussion on Music Transcription Technology

BigGo Editorial Team

A high school student's audio decomposition project has sparked a discussion in the tech community about the current state and challenges of music transcription technology. Although the project was initially described as source separation, the discussion drew out important distinctions in audio processing terminology and showed how hard it is to convert audio into sheet music.

Clarifying the Technology

The community discussion revealed an important distinction between audio source separation and what the project actually accomplishes. Rather than performing stem separation (isolating individual instruments from a mixed track), the project focuses on pitch detection and instrument classification using Fourier transforms and envelope analysis.
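As a rough illustration of Fourier-based pitch detection, the sketch below estimates the pitch of a synthetic tone by picking the strongest frequency bin. It is not the project's code: the function names, the 8 kHz sample rate, the 1024-sample frame, and the naive DFT (used here only to stay dependency-free, where real code would normally call an FFT library) are all assumptions made for the example.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (a stand-in for an FFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags

def estimate_pitch(samples, sample_rate):
    """Return the centre frequency (Hz) of the strongest non-DC bin."""
    n = len(samples)
    # A Hann window reduces spectral leakage from the frame edges.
    windowed = [
        x * 0.5 * (1 - math.cos(2 * math.pi * t / (n - 1)))
        for t, x in enumerate(samples)
    ]
    mags = dft_magnitudes(windowed)
    peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / n

def freq_to_midi(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

# Hypothetical test signal: a pure 440 Hz sine, not real instrument audio.
SAMPLE_RATE = 8000
N = 1024
tone = [math.sin(2 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(N)]

pitch = estimate_pitch(tone, SAMPLE_RATE)  # accurate to about one bin (~7.8 Hz)
note = freq_to_midi(pitch)
print(pitch, note)
```

On real instruments the strongest bin is not always the fundamental, since an overtone can carry more energy, so a peak picker like this is only a starting point.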

"Audio Source Separation I think is the general term used in research. It is often applied to musical audio though, where you want to do stem separation - that's source separation where you want to isolate audio stems, a term referring to audio from related groups of signals, e.g. drums (which can contain multiple individual signals, like one for each drum/cymbal)." [https://news.ycombinator.com/item?id=42098491]

Current State of Music Transcription

The discussion revealed that automatic music transcription has become a significant subfield of deep learning and music information retrieval. For piano transcription specifically, the technology has reached impressive accuracy levels. However, multi-track transcription for complex arrangements remains challenging.

Technical Challenges

Several technical limitations were identified by the community:

Instrument physics variations: The same instrument can produce different harmonic spectra depending on playing intensity
Complex arrangements: Experimental music with unconventional playing techniques can produce unpredictable results
Score interpretation: Converting MIDI to proper musical notation involves complex cultural and contextual rules
Duration and velocity accuracy: While pitch and onset detection work well, note duration and intensity remain challenging
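The last point can be illustrated with a small envelope-analysis sketch. It assumes a synthetic decaying note and a simple threshold rule on the RMS envelope; the function names, the 10 ms frame size, and the two thresholds are hypothetical choices for the example, not anything described in the discussion. The onset is the same under both thresholds, while the estimated note end moves substantially.

```python
import math

def amplitude_envelope(samples, frame_size):
    """RMS amplitude of each non-overlapping frame."""
    env = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return env

def note_span(envelope, threshold_ratio):
    """Return (onset_frame, offset_frame) relative to a fraction of the peak.

    Onset is the first frame at or above the threshold; offset is the first
    later frame that falls below it (or the end of the signal).
    """
    threshold = threshold_ratio * max(envelope)
    onset = next(i for i, e in enumerate(envelope) if e >= threshold)
    offset = next(
        (i for i in range(onset, len(envelope)) if envelope[i] < threshold),
        len(envelope),
    )
    return onset, offset

# Hypothetical signal: 0.25 s of silence, then a 200 Hz tone that decays
# exponentially, loosely imitating a plucked or struck note.
SAMPLE_RATE = 8000
FRAME = 80  # 10 ms per frame
ONSET_SAMPLE = 2000
signal = []
for t in range(2 * SAMPLE_RATE):
    if t < ONSET_SAMPLE:
        signal.append(0.0)
    else:
        secs = (t - ONSET_SAMPLE) / SAMPLE_RATE
        signal.append(math.exp(-3.0 * secs) * math.sin(2 * math.pi * 200.0 * secs))

env = amplitude_envelope(signal, FRAME)
onset_tight, offset_tight = note_span(env, 0.5)  # strict threshold
onset_loose, offset_loose = note_span(env, 0.1)  # lenient threshold
print(onset_tight, offset_tight, onset_loose, offset_loose)
```

Because the note fades gradually, there is no single frame where it clearly ends, so the reported duration depends on the threshold chosen, whereas the sharp attack pins the onset down.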

Industry Solutions

The community highlighted several existing solutions in this space:

Commercial DAWs (Digital Audio Workstations) are increasingly incorporating stem separation features
Google's MT3 project for multi-track music transcription
Meta's Demucs for source separation
Specialized tools like RipX and StemRoller

The discussion emphasized that while significant progress has been made in this field, particularly for single-instrument transcription, creating accurate multi-instrument transcriptions remains a complex challenge requiring sophisticated approaches beyond basic signal processing.

Sources: Audio Decomposition; Hacker News Discussion