2022
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop (390 authors, detailed contributions in paper). arXiv preprint.
→ [paper] [arXiv]
-
Reducing conversational agents' overconfidence through linguistic calibration
Sabrina J. Mielke, Arthur Szlam, Emily Dinan, Y-Lan Boureau. TACL (to be presented at NAACL 2022).
→ [paper] [arXiv] [MIT Press] [slides] [slides (no animations)] [talk on YouTube] [talk (download)]
-
UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova. arXiv preprint.
→ [paper] [arXiv]
2021
-
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, Matthias Gallé, Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, Samson Tan. arXiv preprint.
→ [paper] [arXiv]
-
SIGTYP 2021 Shared Task: Robust Spoken Language Identification
Elizabeth Salesky*, Badr M. Abdullah*, Sabrina J. Mielke*, Elena Klyachko, Oleg Serikov, Edoardo Maria Ponti, Ritesh Kumar, Ryan Cotterell, Ekaterina Vylomova. SIGTYP 2021.
→ [paper] [entry in proceedings] [arXiv]
-
SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages
Tiago Pimentel, Maria Ryskina, Sabrina J. Mielke, Shijie Wu, Eleanor Chodroff, Brian Leonard, Garrett Nicolai, Yustinus Ghanggo Ate, Salam Khalifa, Nizar Habash, Charbel El-Khaissi, Omer Goldman, Michael Gasser, William Lane, Matt Coler, Arturo Oncevay, Jaime Rafael Montoya Samame, Gema Celeste Silva Villegas, Adam Ek, Jean-Philippe Bernardy, Andrey Shcherbakov, Aziyana Bayyr-ool, Karina Sheifer, Sofya Ganieva, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Andrew Krizhanovsky, Natalia Krizhanovsky, Clara Vania, Sardana Ivanova, Aelita Salchak, Christopher Straughn, Zoey Liu, Jonathan North Washington, Duygu Ataman, Witold Kieraś, Marcin Woliński, Totok Suhardijanto, Niklas Stoehr, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Richard J. Hatcher, Emily Prud’hommeaux, Ritesh Kumar, Mans Hulden, Botond Barta, Dorina Lakatos, Gábor Szolnok, Judit Ács, Mohit Raj, David Yarowsky, Ryan Cotterell, Ben Ambridge, Ekaterina Vylomova. SIGMORPHON 2021.
→ [paper] [entry in proceedings]
2020
-
Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!
Suzanna Sia, Ayush Dalmia, Sabrina J. Mielke. EMNLP 2020.
→ [paper] [entry in proceedings] [arXiv]
-
SIGTYP 2020 Shared Task: Prediction of Typological Features
Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo M. Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein. SIGTYP 2020.
→ [paper] [entry in proceedings] [arXiv]
-
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden. SIGMORPHON 2020.
→ [paper] [entry in proceedings] [arXiv]
-
It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information
Emanuele Bugliarello, Sabrina J. Mielke, Antonios Anastasopoulos, Ryan Cotterell, Naoaki Okazaki. ACL 2020.
→ [paper] [unofficial slides] [entry in proceedings] [arXiv]
-
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset
Brian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin, Keith Hall. LREC 2020.
→ [paper] [entry in proceedings] [arXiv]
-
UniMorph 3.0: Universal Morphology
Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ernštreits, Yuval Pinter, Cassandra L. Jacobs, Ryan Cotterell, Mans Hulden, David Yarowsky. LREC 2020.
→ [paper] [entry in proceedings]
2019
-
The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection
Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden. SIGMORPHON 2019.
→ [paper] [entry in proceedings] [arXiv]
-
What Kind of Language Is Hard to Language-Model?
Sabrina J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner. ACL 2019.
→ [paper] [slides] [entry in proceedings] [arXiv]
-
Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology
Ran Zmigrod, Sabrina J. Mielke, Ryan Cotterell, Hanna Wallach. ACL 2019.
→ [paper] [slides] [entry in proceedings] [arXiv]
-
Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model
Sabrina J. Mielke and Jason Eisner. AAAI 2019.
→ [paper] [slides] [poster] [arXiv] [AAAI] [summary webpage] [reversible tokenization code]
2018
-
The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection
Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden. CoNLL-SIGMORPHON 2018.
→ [paper] [entry in proceedings] [arXiv]
-
A Structured Variational Autoencoder for Contextual Morphological Inflection
Lawrence Wolf-Sonkin*, Jason Naradowsky*, Sabrina J. Mielke*, Ryan Cotterell*. ACL 2018.
→ [paper] [entry in proceedings] [arXiv] [code/data]
-
Unsupervised Disambiguation of Syncretism in Inflected Lexicons
Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner. NAACL 2018.
→ [paper] [poster] [entry in proceedings] [arXiv] [code/data]
-
Are All Languages Equally Hard to Language-Model?
Ryan Cotterell, Sabrina J. Mielke, Jason Eisner, Brian Roark. NAACL 2018.
→ [paper] [poster] [entry in proceedings] [arXiv]
-
UniMorph 2.0: Universal Morphology
Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden. LREC 2018.
→ [paper] [entry in proceedings] [arXiv]
-
Incident-driven machine translation and name tagging for low-resource languages
Ulf Hermjakob, Qiang Li, Daniel Marcu, Jonathan May, S. J. Mielke, Nima Pourdamghani, Michael Pust, Xing Shi, Kevin Knight, Tomer Levinboim, Kenton Murray, David Chiang, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Heng Ji. Machine Translation (Springer Journal).
→ [entry on SpringerLink]
2017
-
Soft matching of terminals for syntactic parsing (Master's thesis)
→ [thesis] [slides]
-
Using hybrid grammars for machine translation (research project)
→ [report] [slides]
2016
-
Empirically dissecting Count-based State Merging (research project talk)
→ [slides]
-
Let's not be clever: simple pre- and post-processing tricks in machine translation (internship presentation)
→ [slides]
2015
-
Transition-based dependency parsing using neural networks (seminar presentation on “A Fast and Accurate Dependency Parser using Neural Networks”, Chen and Manning, 2014)
→ [report] [slides]
-
Extracting and binarizing probabilistic linear context-free rewriting systems (Bachelor's thesis)
→ [thesis] [slides]