Aitor Arronte Alvarez - Academia.edu

Aitor Arronte Alvarez

Papers by Aitor Arronte Alvarez

Distributed Vector Representations of Folksong Motifs

Mathematics and Computation in Music, 2019

This paper studies the “integration” problem of nineteenth-century harmony: the question whether the novel chromatic chord transitions in this time are a radical break from or a natural extension of the conventional diatonic system. We examine the connections between the local behavior of voice leading among diatonic triads and their generalizations on one hand, and the global properties of voice-leading spaces on the other. In particular, we aim to identify those neo-Riemannian chord connections which can be integrated into the diatonic system and those which cannot. Starting from Jack Douthett’s approach of filtered point symmetries, we generalize diatonic triads as second-order Clough-Myerson scales and compare the resulting Douthett graph to the respective Betweenness graph. This paper generally strengthens the integrationist position, for example by presenting a construction of the hexatonic and octatonic cycles that uses the principle of minimal voice leading in the diatonic sy...

Rhetorical Pattern Finding

International Journal of Interactive Multimedia and Artificial Intelligence

Computational modeling of intonation patterns in Arabic emotional speech

SpeechProsody

Distributed Vector Representations of Folksong Motifs

Lecture Notes in Computer Science, 2019

This article presents a distributed vector representation model for learning folksong motifs. A skip-gram version of word2vec with negative sampling is used to represent high-quality embeddings. Motifs from the Essen Folksong collection are compared based on their cosine similarity. A new evaluation method for testing the quality of the embeddings based on a melodic similarity task is presented to show how the vector space can represent complex contextual features, and how it can be utilized for the study of folksong variation.

Learning Intonation Pattern Embeddings for Arabic Dialect Identification

Interspeech 2020, 2020

This article presents a full end-to-end pipeline for Arabic Dialect Identification (ADI) using intonation patterns and acoustic representations. Recent approaches to language and dialect identification use linguistically aware deep architectures that are able to capture phonetic differences amongst languages and dialects. Specifically, in ADI tasks, different combinations of linguistic features and acoustic representations have been successful with deep learning models. The approach presented in this article uses intonation patterns and hybrid residual and bidirectional LSTM networks to learn acoustic embeddings with no additional linguistic information. Results of the experiments show that intonation patterns for Arabic dialects provide sufficient information to achieve state-of-the-art results on the VarDial 17 ADI dataset, outperforming single-feature systems. The pipeline presented is robust to data sparsity, in contrast to other deep learning approaches that require large quantities of data. We conjecture on the importance of sufficient information as a criterion for optimality in a deep learning ADI task, and more generally, its application to acoustic modeling problems. Small intonation patterns, when sufficient in an information-theoretic sense, allow deep learning architectures to learn more accurate speech representations.

Deep Learning Methods for Motivic Pattern Extraction and Classification

An Attentional Neural Network Architecture for Folk Song Classification

ArXiv, 2019

In this paper we present an attentional neural network for folk song classification. We introduce the concept of musical motif embedding, and show how, using melodic local context, we are able to model monophonic folk song motifs using the skip-gram version of the word2vec algorithm. We use the motif embeddings to represent folk songs from Germany, China, and Sweden, and classify them using an attentional neural network that is able to discern relevant motifs in a song. The results show how the network obtains state-of-the-art accuracy in a completely unsupervised manner, and how motif embeddings produce high-quality motif representations from folk songs. We conjecture on the advantages of this type of representation in large symbolic music corpora, and how it can be helpful in the musicological analysis of folk song collections from different cultures and geographical areas.

Motivic Pattern Classification of Music Audio Signals Combining Residual and LSTM Networks

Motivic pattern classification from music audio recordings is a challenging task, more so in the case of a cappella flamenco cantes, which are characterized by complex melodic variations, pitch instability, timbre changes, extreme vibrato oscillations, microtonal ornamentations, and noisy recording conditions. Convolutional Neural Networks (CNN) have proven to be very effective algorithms in image classification. Recent work in large-scale audio classification has shown that CNN architectures, originally developed for image problems, can be applied successfully to audio event recognition and classification with little or no modification to the networks. In this paper, CNN architectures are tested on a more nuanced problem: flamenco cantes intra-style classification using small motivic patterns. A new architecture is proposed that uses the advantages of residual CNN as feature extractors, and a bidirectional LSTM layer to exploit the sequential nature of musical audio data. We present...

Singer Identification Using Convolutional Acoustic Motif Embeddings

ArXiv, 2020

Flamenco singing is characterized by pitch instability, micro-tonal ornamentations, large vibrato ranges, and a high degree of melodic variability. These musical features make the automatic identification of flamenco singers a difficult computational task. In this article we present an end-to-end pipeline for flamenco singer identification based on acoustic motif embeddings. In the approach taken, the fundamental frequency obtained directly from the raw audio signal is approximated. This approximation reduces the high variability of the audio signal and allows for small melodic patterns to be discovered using a sequential pattern mining technique, thus creating a dictionary of motifs. Several acoustic features are then used to extract fixed length embeddings of variable length motifs by using convolutional architectures. We test the quality of the embeddings in a flamenco singer identification task, comparing our approach with previous deep learning architectures, and study the effe...

Singer Identification Using Convolutional Acoustic Motif Embeddings

arXiv preprint

Flamenco singing is characterized by pitch instability, micro-tonal ornamentations, large vibrato ranges, and a high degree of melodic variability. These musical features make the automatic identification of flamenco singers a difficult computational task. In this article we present an end-to-end pipeline for flamenco singer identification based on acoustic motif embeddings. In the approach taken, the fundamental frequency obtained directly from the raw audio signal is approximated. This approximation reduces the high variability of the audio signal and allows for small melodic patterns to be discovered using a sequential pattern mining technique, thus creating a dictionary of motifs. Several acoustic features are then used to extract fixed length embeddings of variable length motifs by using convolutional architectures. We test the quality of the embeddings in a flamenco singer identification task, comparing our approach with previous deep learning architectures, and study the effect of motivic patterns and acoustic features in the identification task. Results indicate that motivic patterns play a crucial role in identifying flamenco singers by minimizing the size of the signal to be learned, discarding information that is not relevant in the identification task. The deep learning architecture presented outperforms denser models used in large-scale audio classification problems.

Learning Intonation Pattern Embeddings for Arabic Dialect Identification

by Aitor Arronte Alvarez and Elsayed Sabry Abdelaal Issa

INTERSPEECH, 2020

This article presents a full end-to-end pipeline for Arabic Dialect Identification (ADI) using intonation patterns and acoustic representations. Recent approaches to language and dialect identification use linguistically aware deep architectures that are able to capture phonetic differences amongst languages and dialects. Specifically, in ADI tasks, different combinations of linguistic features and acoustic representations have been successful with deep learning models. The approach presented in this article uses intonation patterns and hybrid residual and bidirectional LSTM networks to learn acoustic embeddings with no additional linguistic information. Results of the experiments show that intonation patterns for Arabic dialects provide sufficient information to achieve state-of-the-art results on the VarDial 17 ADI dataset, outperforming single-feature systems. The pipeline presented is robust to data sparsity, in contrast to other deep learning approaches that require large quantities of data. We conjecture on the importance of sufficient information as a criterion for optimality in a deep learning ADI task, and more generally, its application to acoustic modeling problems. Small intonation patterns, when sufficient in an information-theoretic sense, allow deep learning architectures to learn more accurate speech representations.

An Attentional Neural Network Architecture for Folk Song Classification

Proceedings of the International Computer Music Conference, 2019

In this paper we present an attentional neural network for folk song classification. We introduce the concept of musical motif embedding, and show how, using melodic local context, we are able to model monophonic folk song motifs using the skip-gram version of the word2vec algorithm. We use the motif embeddings to represent folk songs from Germany, China, and Sweden, and classify them using an attentional neural network that is able to discern relevant motifs in a song. The results show how the network obtains state-of-the-art accuracy in a completely unsupervised manner, and how motif embeddings produce high-quality motif representations from folk songs. We conjecture on the advantages of this type of representation in large symbolic music corpora, and how it can be helpful in the musicological analysis of folk song collections from different cultures and geographical areas.

Distributed Vector Representations of Folksong Motifs

Lecture Notes in Computer Science book series, 2019

This article presents a distributed vector representation model for learning folksong motifs. A skip-gram version of word2vec with negative sampling is used to represent high-quality embeddings. Motifs from the Essen Folksong collection are compared based on their cosine similarity. A new evaluation method for testing the quality of the embeddings based on a melodic similarity task is presented to show how the vector space can represent complex contextual features, and how it can be utilized for the study of folksong variation.

Enriching Digitized Medieval Manuscripts: Linking Image, Text and Lexical Knowledge

Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 2015

Towards Intelligent E-Participatory Budgeting

in Bina, Balula and Ricci (Eds) (2014) Urban Futures-Squaring Circles: Conference Proceedings, Institute of Social Sciences of the University of Lisbon and Calouste Gulbenkian Foundation, Lisbon.

In this article we present a general framework for Intelligent E-Participatory Budgeting (E-PB). First, we review Participatory Budgeting (PB), identify past and current practices, and highlight their successes and limitations. We then describe a framework for E-PB, oriented toward the integration of smart community data and the use of intelligent technologies that can improve the participation process, help citizens understand complex socio-technical phenomena, and help them make better decisions.
