Natural language processing for addiction research: a practical primer
About
Background: Tools from artificial intelligence are promising for health research related to people who use drugs. Natural language processing (NLP) applies machine learning to research tasks that rely on unstructured text data. For example, NLP can be used to classify death certificate data or autopsy reports to identify the substances involved in a death, to analyze the sentiment of social media posts to gauge public perceptions, or to monitor administrative and clinical records for epidemiological surveillance.
Methods: This workshop will introduce basic concepts in NLP and machine learning and provide an interactive tutorial. The interactive component will use a dataset of more than 100,000 literal-text U.S. medical examiner records (i.e., free-text descriptions that include cause of death, contributing factors, injury description, and demographic information). All analyses will be performed in R and RStudio, which are free and open-source software. The dataset will be provided, and the presenters will walk through code that attendees can follow and run on their own computers. Basic typing skills are necessary, but extensive coding knowledge is not.
Results: Attendees will learn to implement NLP tools to identify specific substances implicated in a death, categorize deaths by the drug class involved, validate model performance, and adapt the approach for use with other datasets. The workshop will also discuss sources of text data suitable for NLP, including publicly available data, administrative records, online forums, and social media posts, along with special considerations for using each to address questions related to drug use or drug policy.
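To make the first of these tasks concrete, here is a minimal sketch of lexicon-based substance identification, drug-class categorization, and a precision/recall check against hand labels. It is an illustration only: the workshop itself uses R, the abstract does not specify its method, and the lexicon, spelling variants, and function names below are hypothetical. The example records are invented, not taken from the workshop dataset.

```python
import re

# Hypothetical lexicon: substance -> (drug class, spelling variants).
# Illustrative assumption only; not the workshop's actual term list.
LEXICON = {
    "fentanyl": ("opioid", ["fentanyl", "fentanil"]),
    "heroin": ("opioid", ["heroin"]),
    "cocaine": ("stimulant", ["cocaine"]),
    "methamphetamine": ("stimulant", ["methamphetamine", "meth"]),
}

# One case-insensitive, whole-word pattern per substance, so that
# "meth" does not fire on "methadone".
PATTERNS = {
    name: re.compile(
        r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    for name, (_, variants) in LEXICON.items()
}


def find_substances(text):
    """Return the set of lexicon substances mentioned in a free-text record."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def find_classes(text):
    """Return the set of drug classes implied by the substances found."""
    return {LEXICON[name][0] for name in find_substances(text)}


def precision_recall(predicted, actual):
    """Compare predicted labels with hand-assigned labels for one record."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented example record, written in the style of a cause-of-death field.
record = "ACUTE FENTANYL AND COCAINE TOXICITY"
found = find_substances(record)
classes = find_classes(record)
```

A simple word list like this misses misspellings and abbreviations it has not seen, which is one reason to validate against hand-labeled records before trusting the counts.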
Conclusions: Attendees will be able to train a natural language processing pipeline that uses machine learning and then evaluate how well it classifies substances in a large, real-world dataset. Beyond the workshop demonstration, attendees will gain a general understanding of the challenges and opportunities in applying NLP approaches to research questions on addiction, drugs, and drug policy.
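The train-then-evaluate workflow described here can be sketched in a few lines. The abstract does not say which model the workshop uses, so a small bag-of-words naive Bayes classifier stands in as an assumption. The class name, helper functions, and toy records are all hypothetical, and the workshop itself works in R rather than Python.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real records need more cleaning."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Share of held-out records whose predicted label matches the hand label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Invented toy records; not drawn from the workshop dataset.
train_texts = [
    "acute fentanyl toxicity",
    "heroin and fentanyl intoxication",
    "cocaine toxicity",
    "methamphetamine and cocaine intoxication",
]
train_labels = ["opioid", "opioid", "stimulant", "stimulant"]
test_texts = ["fentanyl intoxication", "cocaine use"]
test_labels = ["opioid", "stimulant"]

model = NaiveBayes().fit(train_texts, train_labels)
held_out_accuracy = accuracy(model, test_texts, test_labels)
```

The key design point is that the evaluation records are kept separate from the training records, so the score reflects performance on text the model has not seen.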