Applied Information Extraction in Python

What You'll Learn

Develop skills to process and interpret information presented in free-text data.
Identify the major classes of named entity recognition (NER) in different domains such as business, politics, and healthcare.
Implement, with guidance, state-of-the-art machine learning techniques for NER.
Compare, contrast, and select between multiple machine learning and deep learning approaches for NER.

4 Modules

24 Hours

6 hrs per module (approx.)

About Applied Information Extraction in Python

In “Applied Information Extraction in Python,” you will learn how to extract useful information from free-text data, the unstructured text that people produce when they type. Information found in free-text data includes names of people or organizations, location details such as cities and zip codes, and other elements like stock prices or clinical diagnoses. Free-text data is found everywhere, from magazine articles to social media posts, and can be complex to analyze.
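As a rough illustration of what extracting formatted information can look like, the sketch below pulls zip codes and dollar amounts out of a sentence with regular expressions. The sample sentence, the two patterns, and the function name extract_fields are assumptions made for this example; they are not taken from the course materials.

```python
import re

# Hypothetical sample sentence, invented for this illustration.
text = "Acme Corp. of Ann Arbor, MI 48109 closed at $12.50 on Monday."

# Five-digit US zip code, optionally followed by a four-digit extension.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Dollar amount such as $12.50 or $1,200 (a simplified, assumed format).
PRICE_PATTERN = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")


def extract_fields(s):
    """Return every zip code and dollar amount found in the string."""
    return {
        "zip_codes": ZIP_PATTERN.findall(s),
        "prices": PRICE_PATTERN.findall(s),
    }


result = extract_fields(text)
print(result)
```

Patterns like these work well for fields with a fixed shape, but they cannot tell a zip code from any other five-digit number, which is one reason the later modules turn to learned models.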

In this course, you’ll use applied machine learning and text-mining techniques to analyze free-text data. You will learn how to identify named entities and tag them with the appropriate entity types, using real-world data from business, politics, and healthcare. You’ll develop multiple approaches to recognize and extract named entities and attributes of interest from free-text data, ranging from regular expressions to neural network models. Finally, you’ll explore Transformer models and large language models (LLMs), such as ChatGPT, to extract information from large datasets.

This is the final course in “More Applied Data Science with Python,” a four-course series focused on helping you apply advanced data science techniques using Python. It is recommended that all learners complete the following courses from the Applied Data Science with Python Specialization: Introduction to Data Science in Python, Applied Machine Learning in Python, and Applied Text Mining in Python.

Skills You'll Gain

What You'll Earn

Certificate of Completion: Certificates of completion acknowledge knowledge acquired upon completion of a non-credit course or program.

Experience Type: 100% Online
Format: Self-Paced
Series: More Applied Data Science with Python
Subject: Computer Science, Data Science
Platform: Coursera

Welcome Message

Welcome to Applied Information Extraction in Python, part of the More Applied Data Science with Python specialization. This course explores techniques for extracting structured information from text using rule-based methods, machine learning, neural networks, and transformer models. You will gain hands-on experience building information extraction pipelines across diverse application domains.

This abbreviated syllabus description was created with the help of AI tools and reviewed by staff. The full syllabus is available to those who enroll in the course.

Course Schedule

Module 1: Information Extraction

Video: Welcome to Information Extraction
Reading: MADSwPy Certificate Roadmap
Reading: Course Syllabus
Reading: Introduction to Jupyter Notebook
Discussion Prompt: Meet Other Learners
Reading: Help Us Learn About You
Video: What is Information Extraction?
Video: Information Extraction in Different Domains
Graded Assignment: Knowledge Check: Introduction to Information Extraction
Video: Extracting Formatted Information
Reading: Regular Expressions in Detail
Video: Lookup Based Extraction
Graded Assignment: Knowledge Check: Rule-Based Approaches to Information Extraction
Video: Demo: Using Regular Expressions & Examining Output
Ungraded Lab: Jupyter Notebook Practice on Basic NLP and Rule-Based Extraction
Video: Assignment 1 Introduction: Formatting & Normalizing Data with Regular Expressions
Graded: Build an Information Extraction Pipeline for Template/List-Based Fields
Graded: Module 1 Assignment

Module 2: Named Entity Recognition (NER)

Video: What is Named Entity Recognition (NER)?
Graded Assignment: Knowledge Check: Named Entities and Named Entity Recognition
Reading: BIO Encoding for Named Entity Labels
Reading: BILOU Encoding for Named Entity Labels
Reading: Machine Learning Fundamentals: How Machines Learn to Label Named Entities
Video: NER as a Sequence Classification Task
Graded Assignment: Knowledge Check: Setting up NER as a Machine Learning Task
Reading: Markov Chain and Hidden Markov Models
Video: Fundamentals of Markov Chain Models
Video: Hidden Markov Models (HMMs)
Reading: Training Hidden Markov Models: How HMMs Learn to Assign Labels
Reading: The Math Behind HMMs: How Probabilities Power Sequence Labeling
Video: Conditional Random Fields (CRFs)
Graded Assignment: Knowledge Check: Hidden Markov Models (HMMs)
Video: Demo: CRF Model Training
Ungraded Lab: Jupyter Notebook Practice on Training CRFs
Video: Assignment 2 Introduction: Implementing a CRF Model
Graded: Build an Information Extraction Pipeline for CRF Based Extraction
Graded: Module 2 Assignment
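
The BIO encoding named in this module's readings marks each token as the beginning of an entity (B), inside an entity (I), or outside any entity (O). The sketch below shows one way to produce such tags from entity spans. The function name bio_encode, the span format, and the sample sentence are assumptions made for this example, not the course's own code.

```python
def bio_encode(tokens, entities):
    """Convert entity spans into one BIO tag per token.

    entities is a list of (start, end, label) tuples over token indices,
    with end exclusive. Spans are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Hypothetical sentence and annotations, invented for this illustration.
tokens = ["Jane", "Doe", "works", "at", "Acme", "Corp"]
entities = [(0, 2, "PER"), (4, 6, "ORG")]

tags = bio_encode(tokens, entities)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Framing NER this way turns entity extraction into assigning one label per token, which is the form of input that sequence models such as HMMs and CRFs expect.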

Module 3: Neural Network Models

Video: Introduction to Deep Learning
Reading: Understanding Deep Learning: A Shift From Rules to Representation
Graded Assignment: Knowledge Check: What Is Deep Learning?
Video: Neural Network Models
Reading: Activation Functions: How Deep Learning Models Make Decisions
Graded Assignment: Knowledge Check: What Are Neural Network Models?
Video: Deep Neural Network Models
Reading: Understanding Deep Neural Network Models: How Depth Enables Learning at Scale
Graded Assignment: Knowledge Check: Deeper Neural Networks
Reading: Building an Information Extraction Pipeline with BiLSTMs and CRFs
Ungraded Lab: Jupyter Notebook Practice on Training LSTMs
Video: Demo: Configuring the Bi-Directional LSTM
Video: Assignment 3 Introduction: Building an Information Extraction Pipeline with BiLSTMs and CRFs
Graded: Build an Information Extraction Pipeline using Deep Neural Networks
Graded: Module 3 Assignment

Module 4: Transformers Transform Information Extraction

Video: Language Models (LMs)
Video: Large Language Models (LLMs)
Graded Assignment: Knowledge Check: Language Models
Video: Transformers
Reading: Recent Advances in GPTs
Graded Assignment: Knowledge Check: What Are Transformers?
Reading: Building an Information Extraction Pipeline with Transformers and LLMs
Role Play: Design an Information Extraction System for Sports News Reports
Video: Assignment 4 Introduction: Build an Information Extraction Pipeline using Transformers
Video: Course Wrap-Up
Reading: Continue Your Journey and Earn a Master of Applied Data Science Degree Online
Reading: Course Post-Survey
Graded: Build an Information Extraction Pipeline using Transformers
Graded: Module 4 Assignment

Grading Policy

Learners must complete all graded assignments. Each module has an assessment worth 5% of the final grade and a programming assignment worth 20% of the final grade.
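
The weights above can be checked with a few lines of arithmetic. This sketch assumes four modules, each with exactly one 5% assessment and one 20% programming assignment, as the policy states. The function name final_grade and the sample scores are hypothetical.

```python
MODULES = 4
ASSESSMENT_WEIGHT = 5    # percent of the final grade, per module
ASSIGNMENT_WEIGHT = 20   # percent of the final grade, per module

# Total weight across the course: 4 * (5 + 20) = 100 percent.
total_weight = MODULES * (ASSESSMENT_WEIGHT + ASSIGNMENT_WEIGHT)


def final_grade(assessment_scores, assignment_scores):
    """Return the final grade in percent.

    Each argument holds one score per module, given as a fraction
    between 0 and 1.
    """
    if len(assessment_scores) != MODULES or len(assignment_scores) != MODULES:
        raise ValueError("expected one score per module")
    return (ASSESSMENT_WEIGHT * sum(assessment_scores)
            + ASSIGNMENT_WEIGHT * sum(assignment_scores))


# Hypothetical learner: full marks on assessments, half marks on assignments.
example = final_grade([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5])
print(total_weight, example)
```

Under these assumptions the programming assignments account for 80% of the final grade and the assessments for the remaining 20%.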

V.G. Vinod Vydiswaran

Associate Professor

Course content developed by U-M faculty and managed by the university. Faculty titles and affiliations are updated periodically.

Advanced Level

Learners will benefit from prior exposure to machine learning in Python and from completing the courses in this series in order.

Individuals

This experience is available to individual learners on the following platforms:

U-M Community

Free access is only available to current U-M students, alumni, faculty, and staff.

Organizations

Special pricing and tailored programming bundles available for organizational partners.
