Associate Professor
In “Applied Information Extraction in Python,” you will learn how to extract useful information from free-text data, the unstructured string data people produce when they type. Information you might extract includes names of people or organizations, locations such as cities and zip codes, and other elements such as stock prices or clinical diagnoses. Free text is found everywhere, from magazine articles to social media posts, and can be complex to analyze.
In this course, you’ll use applied machine learning and text-mining techniques to analyze free-text data. Working with real-world data from business, politics, and healthcare, you will learn how to identify named entities and classify them by type. You’ll develop multiple approaches to recognizing and extracting named entities and attributes of interest, ranging from regular expressions to neural network models. Finally, you’ll explore Transformer models, including large language models (LLMs) such as those behind ChatGPT, to extract information from large datasets.
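To illustrate the rule-based end of that range, here is a minimal sketch, not taken from the course materials, that pulls US zip codes and dollar-formatted prices out of a sentence with regular expressions and tags a couple of known names with a small dictionary (a "gazetteer"). The patterns, the entity labels (`ORG`, `LOC`, `ZIP`, `PRICE`), and the `GAZETTEER` contents are all illustrative assumptions.

```python
import re

# Illustrative patterns (assumptions, not the course's own code):
# a 5-digit US zip code, optionally followed by a 4-digit extension
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
# a dollar amount such as $412.35 or $1,250.50
PRICE_RE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

# A tiny hypothetical gazetteer mapping known names to entity types
GAZETTEER = {
    "University of Michigan": "ORG",
    "Ann Arbor": "LOC",
}


def extract(text):
    """Return (entity_text, type) pairs found by gazetteer lookup and regexes."""
    results = []
    # Exact-match lookup of known names; re.escape treats each name literally
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            results.append((m.group(), label))
    # Pattern-based extraction of zip codes and prices
    results += [(m.group(), "ZIP") for m in ZIP_RE.finditer(text)]
    results += [(m.group(), "PRICE") for m in PRICE_RE.finditer(text)]
    return results


text = "The University of Michigan is in Ann Arbor, MI 48109; shares closed at $1,250.50."
print(extract(text))
```

Rules like these are precise but only find the patterns and names they were written for; learned models such as the neural network and Transformer approaches mentioned above aim to generalize beyond a fixed list.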
This is the final course in “More Applied Data Science with Python,” a four-course series focused on helping you apply advanced data science techniques using Python. It is recommended that all learners complete the following courses from the Applied Data Science with Python Specialization: Introduction to Data Science in Python, Applied Machine Learning in Python, and Applied Text Mining in Python.
Welcome to Applied Information Extraction in Python, part of the More Applied Data Science with Python specialization. This course explores techniques for extracting structured information from text using rule-based methods, machine learning, neural networks, and transformer models. You will gain hands-on experience building information extraction pipelines across diverse application domains.
This abbreviated syllabus description was created with the help of AI tools and reviewed by staff. The full syllabus is available to those who enroll in the course.
Module 1: Information Extraction
Module 2: Named Entity Recognition (NER)
Module 3: Neural Network Models
Module 4: Transformers Transform Information Extraction
Learners must complete all graded assignments. Each module includes an assessment worth 5% of your final grade and a programming assignment worth 20% of your final grade.
Course content developed by U-M faculty and managed by the university. Faculty titles and affiliations are updated periodically.
Advanced Level
Learners will benefit from having exposure to machine learning in Python, as well as completing the courses of this series in sequential order.