Advanced Acoustic Analysis and Classification of Instrumental Sounds in High-Dimensional Feature Spaces
Abstract
This thesis analyzes the classification of isolated instrumental sounds using a classical machine-learning method, the Support Vector Machine (SVM). Drawing on the Philharmonia Sound Samples library, the study curates a dataset of 13,533 professionally recorded monophonic clips from 19 instruments (average duration 1.91 s, sampling rate 44.1 kHz). Each clip is represented by a 32-dimensional feature vector composed of 13 MFCCs, 12 chroma features, and 7 spectral-contrast features. An SVM with a Radial Basis Function (RBF) kernel was selected for its strong performance in high-dimensional feature spaces, its computational efficiency, and its good generalization. Other traditional classifiers (e.g., k-nearest neighbors, KNN) were considered but judged less suitable for this feature representation and problem scale. The dataset was split 80:20 with stratification (10,826 training and 2,707 test samples). Hyperparameters were tuned with grid search, and the selected configuration reached 98.96% accuracy on the held-out test set, with a macro-averaged precision of about 98.5%, recall of about 97.2%, and F1 score of about 97.6%. Error analysis showed that most misclassifications occur between acoustically similar instruments, such as the violin and viola, and between closely related plucked string instruments; these distinctions can be challenging for human listeners as well. The thesis documents the feature-engineering decisions, methodological rationale, evaluation metrics, and a reproducible implementation using librosa and scikit-learn on Google Colab. We conclude that a compact, carefully tuned SVM pipeline built on complementary timbral and harmonic features can achieve near-state-of-the-art results on clean, monophonic datasets.
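A minimal scikit-learn sketch of the evaluation protocol described above: a stratified 80:20 split, an RBF-kernel SVM tuned by grid search, and macro-averaged precision, recall, and F1. It runs on synthetic data from `make_classification` as a stand-in for the real 32-dimensional features. The `StandardScaler` step, the parameter grid, and the 5-fold cross-validation are assumptions for illustration, not the configuration reported in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 32-dim feature matrix (NOT the Philharmonia data)
X, y = make_classification(n_samples=2000, n_features=32, n_informative=20,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# Stratified 80:20 split keeps class proportions the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Standardization matters for RBF kernels, which depend on Euclidean distances
pipe = Pipeline([("scaler", StandardScaler()), ("svm", SVC(kernel="rbf"))])

# Hypothetical search space over the RBF hyperparameters C and gamma
param_grid = {"svm__C": [1, 10, 100], "svm__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # cross-validates on the training split only

y_pred = search.predict(X_test)
acc = accuracy_score(y_test, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="macro")
print(search.best_params_)
print(f"accuracy={acc:.4f} macro-P={prec:.4f} macro-R={rec:.4f} macro-F1={f1:.4f}")
```

With `test_size=0.2`, scikit-learn rounds the test-set size up, so 13,533 clips split into 10,826 training and 2,707 test samples, consistent with the counts reported in the abstract. Macro averaging weights every instrument equally, which is why macro recall and F1 can sit below overall accuracy when a few classes are harder or less frequent.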