[Defense] Named Entity Recognition on Social Media
Monday, November 14, 2022
3:30 pm - 5:15 pm
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Shuguang Chen
will defend his dissertation
Named Entity Recognition on Social Media
Abstract
As social media platforms (e.g., Twitter, Facebook, and Snapchat) grow in popularity, more and more people create, share, and exchange information and ideas in these virtual spaces every day. This creates a growing demand for tools and resources that automate the processing of social media text. However, user-generated text on social media tends to be ambiguous and hard to interpret: it is often very short and contains many misspellings and language variations, which makes it difficult for machine learning systems to perform well. Moreover, social media is considered a low-resource domain, with relatively little data available for building machine learning systems, and annotating data is time-consuming, labor-intensive, and dependent on domain knowledge and experts. This dissertation aims to present novel methods to mitigate performance degradation and improve the robustness of named entity recognition (NER) systems on social media. To mitigate performance degradation, I study text and image information extraction and fusion to adapt NER systems to multimodal social media environments. I also propose exploiting trending topics to mitigate the impact of temporal drift. To improve model robustness, I investigate data augmentation techniques that increase the size and diversity of the data used to train NER systems, and I propose new methods for transferring data across domains based on textual patterns (e.g., style and noise). Additionally, given that social media is a low-resource domain, I propose adversarial attack methods that audit model robustness by creating adversarial examples to identify potential vulnerabilities of NER systems.
The methods presented in this dissertation are meant to make NER systems resilient to performance degradation, robust under various conditions, and reliable in noisy social media environments, in the hope of benefiting downstream natural language processing tasks such as information extraction, question answering, and machine reading comprehension.
Monday, November 14, 2022
3:30PM - 5:15PM CT
Online via
Dr. Thamar Solorio, dissertation advisor
Faculty, students and the general public are invited.
