Develop skills to process and interpret information presented in free-text data.
Identify the major classes of named entities targeted by named entity recognition (NER) in different domains such as business, politics, and healthcare.
Implement, with guidance, state-of-the-art machine learning techniques for NER.
Compare, contrast, and select between multiple machine learning and deep learning approaches for NER.
4 Modules
24 Hours
6 hrs per module (approx.)
About Applied Information Extraction in Python
In “Applied Information Extraction in Python,” you will learn how to extract useful information from free-text data, the unstructured text that people produce when they write or type. Information found in free text includes names of people or organizations, location information such as cities and zip codes, and other elements like stock prices or clinical diagnoses. Free-text data is found everywhere, from magazine articles to social media posts, and can be complex to analyze.
In this course, you’ll use applied machine learning and text-mining techniques to analyze free-text data. You will learn how to identify named entities and tag them with the appropriate entity types, using real-world data from business, politics, and healthcare. You’ll develop multiple approaches to recognize and extract named entities and attributes of interest from free-text data, ranging from regular expressions to neural network models. Finally, you’ll explore Transformer-based models, including Large Language Models (LLMs) such as those behind ChatGPT, to extract information from large datasets.
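As a rough illustration of the regular-expression end of that range, the sketch below pulls zip codes and dollar amounts out of a sentence. It is not taken from the course materials: the sample text, the two patterns, and the `extract_fields` helper are all assumptions made for this example, and the patterns cover only simple US-style formats.

```python
import re

# Hypothetical sample sentence; not taken from the course materials.
text = "Acme Corp. of Ann Arbor, MI 48109 closed at $12.50 on Friday."

# Assumed pattern: five-digit US ZIP code with an optional four-digit extension.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Assumed pattern: dollar amount such as $12.50, $1200, or $1,200.00.
PRICE_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")


def extract_fields(s):
    """Return every ZIP code and dollar amount found in the string."""
    return {
        "zip_codes": ZIP_PATTERN.findall(s),
        "prices": PRICE_PATTERN.findall(s),
    }


print(extract_fields(text))
```

Rules like these are easy to read and debug, but they only catch formats someone anticipated in advance, which is one reason to move on to learned models for entities such as person or organization names.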
This is the final course in “More Applied Data Science with Python,” a four-course series focused on helping you apply advanced data science techniques using Python. It is recommended that all learners complete the following courses from the Applied Data Science with Python Specialization: Introduction to Data Science in Python, Applied Machine Learning in Python, and Applied Text Mining in Python.
Skills You'll Gain
Data Manipulation
Information Extraction
Machine Learning
Python For Data Analysis
Python (Programming Language)
Text Extraction
Text Processing
What You'll Earn
Certificate of Completion:
Certificates of completion acknowledge knowledge acquired upon completion of a non-credit course or program.
Welcome to Applied Information Extraction in Python, part of the More Applied Data Science with Python specialization. This course explores techniques for extracting structured information from text using rule-based methods, machine learning, neural networks, and transformer models. You will gain hands-on experience building information extraction pipelines across diverse application domains.
This abbreviated syllabus description was created with the help of AI tools and reviewed by staff. The full syllabus is available to those who enroll in the course.
Course Schedule
Module 1: Information Extraction
Video: Welcome to Information Extraction
Reading: MADSwPy Certificate Roadmap
Reading: Course Syllabus
Reading: Introduction to Jupyter Notebook
Discussion Prompt: Meet Other Learners
Reading: Help Us Learn About You
Video: What is Information Extraction?
Video: Information Extraction in Different Domains
Graded Assignment: Knowledge Check: Introduction to Information Extraction
Video: Extracting Formatted Information
Reading: Regular Expressions in Detail
Video: Lookup Based Extraction
Graded Assignment: Knowledge Check: Rule-Based Approaches to Information Extraction
Video: Demo: Using Regular Expressions & Examining Output
Ungraded Lab: Jupyter Notebook Practice on Basic NLP and Rule-Based Extraction
Video: Assignment 1 Introduction: Formatting & Normalizing Data with Regular Expressions
Graded: Build an Information Extraction Pipeline for Template/List-Based Fields
Graded: Module 1 Assignment
Module 2: Named Entity Recognition (NER)
Video: What is Named Entity Recognition (NER)?
Graded Assignment: Knowledge Check: Named Entities and Named Entity Recognition
Reading: BIO Encoding for Named Entity Labels
Reading: BILOU Encoding for Named Entity Labels
Reading: Machine Learning Fundamentals: How Machines Learn to Label Named Entities
Video: NER as a Sequence Classification Task
Graded Assignment: Knowledge Check: Setting up NER as a Machine Learning Task
Reading: Markov Chain and Hidden Markov Models
Video: Fundamentals of Markov Chain Models
Video: Hidden Markov Models (HMMs)
Reading: Training Hidden Markov Models: How HMMs Learn to Assign Labels
Reading: The Math Behind HMMs: How Probabilities Power Sequence Labeling
Reading: Building an Information Extraction Pipeline with BiLSTMs and CRFs
Ungraded Lab: Jupyter Notebook Practice on Training LSTMs
Video: Demo: Configuring the Bi-Directional LSTM
Video: Assignment 3 Introduction: Building an Information Extraction Pipeline with BiLSTMs and CRFs
Graded: Build an Information Extraction Pipeline using Deep Neural Networks
Graded: Module 3 Assignment
Module 4: Transformers Transform Information Extraction
Video: Language Models (LMs)
Video: Large Language Models (LLMs)
Graded Assignment: Knowledge Check: Language Models
Video: Transformers
Reading: Recent Advances in GPTs
Graded Assignment: Knowledge Check: What Are Transformers?
Reading: Building an Information Extraction Pipeline with Transformers and LLMs
Role Play: Design an Information Extraction System for Sports News Reports
Video: Assignment 4 Introduction: Build an Information Extraction Pipeline using Transformers
Video: Course Wrap-Up
Reading: Continue Your Journey and Earn a Master of Applied Data Science Degree Online
Reading: Course Post-Survey
Graded: Build an Information Extraction Pipeline using Transformers
Graded: Module 4 Assignment
Grading Policy
Learners must complete all graded assignments. Each module has an assessment worth 5% of your final grade and a programming assignment worth 20% of your final grade.