Tianjian Li
Center for Language and Speech Processing, Johns Hopkins University
Hi👋, I’m Tianjian! I’m a 1st/3rd year PhD student in Computer Science at Johns Hopkins University, proudly advised by Prof. Daniel Khashabi. Previously, I completed my Master’s degree in Computer Science at JHU, where I had the privilege of working closely with my wonderful advisors, Kenton Murray and Philipp Koehn, focusing on multilingual NLP.
My research interests lie at the intersection of machine learning and natural language processing, with a particular focus on one question: how can we better leverage our vast amount of data beyond simply feeding it into our models during training? To this end, I am currently working on measuring various properties of data (e.g., quality) and curating data for training and aligning our language models.
I prefer solutions that are simple, generalizable, and theoretically sound.
If you have anything to share with me, please feel free to contact me through my email: tli104 at jhu.edu
news
Date | News |
---|---|
Oct 4, 2024 | New preprint on how to train on heavily imbalanced datasets!! |
Apr 7, 2024 | I will be staying at Johns Hopkins University for my PhD, working with Prof. Daniel Khashabi! |
Jan 15, 2024 | Error Norm Truncation has been accepted to ICLR 2024 (spotlight)!! |
Nov 8, 2023 | New blog post on the latest advances in balanced training for Multilingual Machine Translation! |
Oct 2, 2023 | New preprint on truncating noisy data for training text generation models!! |
selected publications
- preprint
- preprint
- ICLR: Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models. In The Twelfth International Conference on Learning Representations (Spotlight - Top 5%), 2024
- ACL: Why Does Zero-shot Cross-lingual Generation Fail? An Explanation and A Solution. In Proceedings of the 2023 Annual Meeting of the Association for Computational Linguistics (Findings), Jul 2023