Tag: nlp

End-to-end neural networks for subvocal speech recognition

My final project for Stanford CS 224S was on subvocal speech recognition. This was my last paper at Stanford, and it draws on everything I learned during a whirlwind of CS grad school undertaken without a CS undergraduate major. Pol Rosello provided the topic; he and I contributed equally to the paper.

We describe the first approach to end-to-end, session-independent subvocal automatic speech recognition from involuntary facial and laryngeal muscle movements detected by surface electromyography. We leverage character-level recurrent neural networks and the connectionist temporal classification (CTC) loss. We attempt to address the challenges posed by a lack of data, including poor generalization, through data augmentation of electromyographic signals, a specialized multi-modal architecture, and regularization. We show results indicating reasonable qualitative performance on test-set utterances, and we describe promising avenues for future work in this direction.
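The core training objective is easy to sketch. Below is a minimal PyTorch sketch of a character-level bidirectional recurrent network trained with the CTC loss over EMG feature frames; the feature dimension, alphabet size, and layer widths are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: character-level BiLSTM + CTC over EMG feature frames.
# All sizes (N_FEATURES, N_CHARS, hidden units) are illustrative assumptions.
import torch
import torch.nn as nn

N_FEATURES = 40   # assumed per-frame EMG feature dimension
N_CHARS = 28      # assumed alphabet: 26 letters + space + CTC blank (index 0)

class SubvocalCTC(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(N_FEATURES, 128, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * 128, N_CHARS)

    def forward(self, x):                     # x: (batch, time, N_FEATURES)
        h, _ = self.rnn(x)
        return self.proj(h).log_softmax(-1)   # (batch, time, N_CHARS)

model = SubvocalCTC()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

x = torch.randn(4, 200, N_FEATURES)          # a fake batch of EMG frames
targets = torch.randint(1, N_CHARS, (4, 12)) # fake character targets
input_lengths = torch.full((4,), 200, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)

log_probs = model(x).transpose(0, 1)         # CTCLoss wants (time, batch, chars)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```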

Hybrid Word-Character Neural Machine Translation for Arabic

My final project for Stanford CS 224N was on hybrid word-character machine translation for Arabic.

Traditional models of neural machine translation make the assumption, false in general but roughly true in English, that words are essentially equivalent to units of meaning. Morphologically rich languages break this assumption. We implement, in TensorFlow, a hybrid translation model that backs off unknown words to a representation built from their constituent characters; applying the model to Arabic translation, we approach state-of-the-art performance in the weeks allotted for a class project.
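As a rough illustration of the backoff idea (the project itself was implemented in TensorFlow), here is a minimal PyTorch sketch in which a word marked unknown is re-embedded from the final state of a character-level LSTM; the class name, sizes, and unknown-token convention are all assumptions for illustration.

```python
# Illustrative sketch of word-character backoff: known words come from a
# word embedding table; unknown words are composed from their characters.
import torch
import torch.nn as nn

class CharBackoffEmbedding(nn.Module):
    def __init__(self, n_words, n_chars, dim=256, unk_id=0):
        super().__init__()
        self.unk_id = unk_id
        self.word_emb = nn.Embedding(n_words, dim)
        self.char_emb = nn.Embedding(n_chars, 64)
        self.char_rnn = nn.LSTM(64, dim, batch_first=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch,)   char_ids: (batch, max_word_len)
        word_vecs = self.word_emb(word_ids)
        _, (h, _) = self.char_rnn(self.char_emb(char_ids))
        char_vecs = h[-1]                          # final hidden state per word
        is_unk = (word_ids == self.unk_id).unsqueeze(-1)
        return torch.where(is_unk, char_vecs, word_vecs)
```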

Automatically assessing Integrative Complexity

My final project for Stanford CS 224U was on automatically assessing integrative complexity. I drew on work I’d previously done demonstrating the ongoing value of this political psychology construct, but I had not previously tried to code for it automatically. The code is available on GitHub.

Integrative complexity is a construct from political psychology that measures semantic complexity in discourse. Although this metric has proven useful in predicting violence and understanding elections, it is very time-consuming for analysts to assess. We describe a theory-driven automated system that improves the state of the art for this task from Pearson’s r = 0.57 to r = 0.73 by framing the task as ordinal regression, leveraging dense vector representations of words, and developing syntactic and semantic features that go beyond lexical phrase matching. Our approach is less labor-intensive and more transferable than the previous state of the art for this task.
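One standard way to frame a 1-to-7 scale like integrative complexity as ordinal regression is the cumulative binary decomposition of Frank and Hall (2001): train one classifier per threshold for P(y > k) and sum the probabilities into an expected score, which can then be correlated with human codings. The sketch below illustrates that framing only; the random features are placeholders for the system's actual syntactic and semantic features, and none of this is the project's exact code.

```python
# Sketch of ordinal regression via cumulative binary decomposition
# (Frank & Hall 2001): K-1 classifiers, one per threshold P(y > k).
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.stats import pearsonr

LEVELS = list(range(1, 8))  # integrative complexity is scored 1..7

def fit_ordinal(X, y):
    # One binary classifier per threshold k, predicting whether y > k.
    return {k: LogisticRegression(max_iter=1000).fit(X, (y > k).astype(int))
            for k in LEVELS[:-1]}

def predict_ordinal(models, X):
    # Expected score: E[y] = 1 + sum over thresholds of P(y > k).
    p_gt = np.column_stack([models[k].predict_proba(X)[:, 1]
                            for k in LEVELS[:-1]])
    return LEVELS[0] + p_gt.sum(axis=1)

# Toy usage with random "features" standing in for the real ones:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(1, 8, size=200)
models = fit_ordinal(X, y)
scores = predict_ordinal(models, X)
print("Pearson's r:", pearsonr(scores, y)[0])
```

Returning a continuous expected score rather than a hard class label fits the evaluation metric here, since Pearson's r rewards getting close on an ordered scale.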

Automatic sign language identification

My final project for Stanford CS 231N was on automatically identifying sign languages from publicly licensed YouTube clips. For this project I learned from scratch about working with neural networks, computer vision, and video data.

Automatic processing of sign languages has only recently become able to advance beyond the toy problem of fingerspelling recognition. In just the last few years, we have leaped forward in sign language theory, effective computer vision practice, and the large-scale availability of data. This project achieves better-than-human performance on sign language identification, and it releases a dataset and benchmark for future work on the topic. It is intended as a precursor to sign language machine translation.
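One simple architecture for clip-level identification, sketched below with illustrative sizes, classifies individual frames with a small CNN and averages the per-frame predictions over the sampled frames of a clip. This is a hedged stand-in to show the frame-based framing, not the project's actual model.

```python
# Sketch: per-frame CNN classifier, predictions averaged over a clip's frames.
# The architecture and sizes are illustrative assumptions.
import torch
import torch.nn as nn

N_LANGS = 3  # ASL, BSL, DGS

class FrameCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, N_LANGS)

    def forward(self, clips):                 # clips: (batch, frames, 3, H, W)
        b, f = clips.shape[:2]
        frames = clips.flatten(0, 1)          # treat frames as one big batch
        logits = self.classifier(self.features(frames).flatten(1))
        return logits.view(b, f, N_LANGS).mean(1)  # average over frames

model = FrameCNN()
clips = torch.randn(2, 5, 3, 112, 112)        # 2 clips, 5 frames each
print(model(clips).shape)                     # torch.Size([2, 3])
```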

Identifying sign languages from video: SLANG-3k

As I haven’t yet created a permanent place to host the dataset I collected for my most recent class project, I’m hanging it here for now. SLANG-3k is an uncurated corpus of 3000 clips, 15 seconds each, of people signing in American Sign Language, British Sign Language, and German Sign Language, intended as a public benchmark dataset for sign language identification in the wild. Using 5 frames per clip, I was able to achieve accuracies of around 0.66/0.67. More details can be found in the paper and poster created for CS 231N, Convolutional Neural Networks for Visual Recognition.
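For a sense of the preprocessing involved, here is a hedged OpenCV sketch of sampling 5 evenly spaced frames from a clip; the filename is hypothetical, and the project's real pipeline may have sampled frames differently.

```python
# Sketch: sample n evenly spaced frames from a video with OpenCV.
# Illustrative preprocessing only, not necessarily what the paper used.
import cv2
import numpy as np

def sample_frames(path, n_frames=5):
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in np.linspace(0, total - 1, n_frames, dtype=int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames  # list of HxWx3 BGR arrays

# frames = sample_frames("clip0001.mp4")  # hypothetical filename
```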

Many thanks to everyone who helped with this project — and most especially to the anonymous survey respondents who received only warm fuzzies as compensation for taking the time to help with this early-stage research.