Ashvala

Hi, I am Ashvala Vinay!

I am a PhD student at Georgia Tech's Center for Music Technology. I am currently working with Prof. Alexander Lerch on problems related to evaluating generative audio synthesizers.

I graduated from Berklee College of Music in 2016. After that, I worked on personal projects and helped get two startups off the ground as a software engineer, before earning my Master's in Music Technology at GTCMT.

I play in a live coding band called Merge Conflict with my good buddy, Ian Clester.

On this page, I share my work, along with posts about music, technology, and my progress through my PhD.


Publications


AQUATk: An Audio Assessment Toolkit

Ashvala Vinay, Alexander Lerch

ISMIR 2023, Late Breaking Demo

Abstract: Recent advancements in Neural Audio Synthesis (NAS) have outpaced the development of standardized evaluation methodologies and tools. To bridge this gap, we introduce AquaTk, an open-source Python library specifically designed to simplify and standardize the evaluation of NAS systems. AquaTk offers a range of audio quality metrics, including a unique Python implementation of the basic PEAQ algorithm, and operates in multiple modes to accommodate various user needs.

Download
BibTex
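As a rough illustration of the kind of reference-based evaluation AquaTk is built to streamline, here is a minimal sketch of two simple objective metrics (SNR and log-spectral distance) computed with NumPy and SciPy. This is not AquaTk's API, just the general shape of the problem; the function names and parameters are illustrative.

```python
# Hypothetical sketch of reference-based audio metrics (not the AquaTk API).
import numpy as np
from scipy.signal import stft

def snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Signal-to-noise ratio of an estimate against a reference signal, in dB."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))

def log_spectral_distance(reference: np.ndarray, estimate: np.ndarray,
                          sr: int = 16000, n_fft: int = 1024) -> float:
    """Root-mean-square distance between log-magnitude spectrograms."""
    _, _, ref_spec = stft(reference, fs=sr, nperseg=n_fft)
    _, _, est_spec = stft(estimate, fs=sr, nperseg=n_fft)
    ref_db = 20.0 * np.log10(np.abs(ref_spec) + 1e-12)
    est_db = 20.0 * np.log10(np.abs(est_spec) + 1e-12)
    return float(np.sqrt(np.mean((ref_db - est_db) ** 2)))

if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 1.0, sr, endpoint=False)
    reference = np.sin(2 * np.pi * 440 * t)             # clean reference tone
    estimate = reference + 0.01 * np.random.randn(sr)   # stand-in "synthesized" output
    print(f"SNR: {snr_db(reference, estimate):.1f} dB")
    print(f"LSD: {log_spectral_distance(reference, estimate, sr):.2f} dB")
```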

The Impact of Salient Musical Features in a Hybrid Recommendation System for a Sound Library

Jason Smith, Ashvala Vinay, Jason Freeman

Joint Proceedings of the ACM IUI Workshops, 2023

Abstract: EarSketch is an online learning environment that teaches coding and music concepts through the computational manipulation of sounds selected from a large sound library. It features sound recommendations based on acoustic similarity and co-usage with a user's current sound selection in order to encourage exploration of the library. However, students have reported that the recommended sounds do not complement their current projects in terms of two areas: musical key and rhythm. We aim to improve the relevance of these recommendations through the inclusion of these two musically related features. This paper describes the addition of key signature and beat extraction to the EarSketch sound recommendation model in order to improve the musical compatibility of the recommendations with the sounds in a user’s project. Additionally, we present an analysis of the effects of these new recommendation strategies on user exploration and usage of the recommended sounds. The results of this analysis suggest that the addition of explicitly musically-relevant attributes increases the coverage of the sound library among sound recommendations as well as the sounds selected by users. It reflects the importance of including multiple musical attributes when building recommendation systems for creative and open-ended musical systems.

Download
BibTex
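The two features added to the recommendation model are key signature and beat. Below is a hedged sketch, assuming librosa is available, of how such features might be extracted from a sound clip; it uses a common Krumhansl-Kessler-style template correlation for key and librosa's beat tracker for tempo, and is illustrative rather than the EarSketch implementation.

```python
# Illustrative key and tempo extraction with librosa (not the EarSketch pipeline).
import numpy as np
import librosa

# Krumhansl-Kessler major/minor key profiles, a standard template for key estimation.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key(y: np.ndarray, sr: int) -> str:
    """Estimate a key label by correlating mean chroma with rotated key profiles."""
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    best_key, best_corr = "C major", -np.inf
    for shift in range(12):
        rotated = np.roll(chroma, -shift)  # align candidate tonic to index 0
        for name, profile in (("major", MAJOR), ("minor", MINOR)):
            corr = np.corrcoef(rotated, profile)[0, 1]
            if corr > best_corr:
                best_key, best_corr = f"{PITCHES[shift]} {name}", corr
    return best_key

def estimate_tempo(y: np.ndarray, sr: int) -> float:
    """Estimate tempo (BPM) with librosa's beat tracker."""
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    return float(np.atleast_1d(tempo)[0])

if __name__ == "__main__":
    y, sr = librosa.load(librosa.example("trumpet"))  # bundled example clip
    print(estimate_key(y, sr), f"~{estimate_tempo(y, sr):.0f} BPM")
```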

Evaluating Generative Audio Systems and Their Metrics

Ashvala Vinay, Alexander Lerch

ISMIR 2022

Abstract: Recent years have seen considerable advances in audio synthesis with deep generative models. However, the state-of-the-art is very difficult to quantify; different studies often use different evaluation methodologies and different metrics when reporting results, making a direct comparison to other systems difficult if not impossible. Furthermore, the perceptual relevance and meaning of the reported metrics are in most cases unknown, prohibiting any conclusive insights with respect to practical usability and audio quality. This paper presents a study that investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and with (ii) a listening study. The results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.

Download
BibTex
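One family of objective metrics used for generative audio compares statistics of embedding distributions, e.g. the Fréchet distance between Gaussians fitted to embeddings of real and generated audio. Here is a minimal sketch of that distance computation, assuming the embeddings have already been extracted by some pretrained model (the embedding step is omitted, and this is not tied to the paper's exact experimental setup).

```python
# Sketch: Fréchet distance between two sets of audio embeddings
# (assumes embeddings were extracted elsewhere by a pretrained model).
import numpy as np
from scipy import linalg

def frechet_distance(real_emb: np.ndarray, fake_emb: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets of shape (n, dim)."""
    mu_r, mu_f = real_emb.mean(axis=0), fake_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_f = np.cov(fake_emb, rowvar=False)
    # Matrix square root of the covariance product; discard tiny imaginary parts.
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(1000, 128))   # placeholder "real" embeddings
    fake = rng.normal(0.3, 1.1, size=(1000, 128))   # placeholder "generated" embeddings
    print(f"Frechet distance: {frechet_distance(real, fake):.3f}")
```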

Mind the Beat: Detecting Audio Onsets from EEG Recordings of Music Listening

Ashvala Vinay, Alexander Lerch, Grace Leslie

ICASSP 2021

Abstract: We propose a deep learning approach to predicting audio event onsets in electroencephalogram (EEG) recordings made as users listen to music. We use a publicly available dataset containing ten contemporary songs and concurrently recorded EEG. We generate a sequence of onset labels for the songs in our dataset and train neural networks (a fully connected network (FCN) and a recurrent neural network (RNN)) to parse one-second windows of input EEG and predict one-second windows of onsets in the audio. We compare our RNN to both the standard spectral-flux-based novelty function and the FCN, and find that the RNN produced results reflecting its ability to generalize better than the other methods. Since there are no pre-existing works on this topic, the numbers presented in this paper may serve as useful benchmarks for future approaches to this research problem.

Download
BibTex
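The spectral-flux novelty function used as a baseline in this comparison can be computed from the audio alone. Below is a minimal sketch with NumPy and SciPy; the thresholds and window sizes are illustrative defaults, and the EEG-side networks are not reproduced here.

```python
# Sketch of a spectral-flux onset novelty function (the audio baseline, not the EEG models).
import numpy as np
from scipy.signal import stft, find_peaks

def spectral_flux(y: np.ndarray, sr: int, n_fft: int = 2048, hop: int = 512) -> np.ndarray:
    """Half-wave-rectified frame-to-frame increase in spectral magnitude, normalized to [0, 1]."""
    _, _, spec = stft(y, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(spec)
    diff = np.diff(mag, axis=1)
    flux = np.sum(np.maximum(diff, 0.0), axis=0)
    return flux / (flux.max() + 1e-12)

def onset_times(y: np.ndarray, sr: int, hop: int = 512) -> np.ndarray:
    """Pick peaks in the novelty curve and convert frame indices to seconds."""
    flux = spectral_flux(y, sr, hop=hop)
    peaks, _ = find_peaks(flux, height=0.3, distance=max(1, int(0.05 * sr / hop)))
    return peaks * hop / sr

if __name__ == "__main__":
    sr = 22050
    t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
    clicks = np.zeros_like(t)
    clicks[::sr // 2] = 1.0                      # a click every 0.5 s
    y = clicks + 0.005 * np.random.randn(t.size)
    print(np.round(onset_times(y, sr), 2))       # approximate onset times in seconds
```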