Module IV. Modeling Data
22 Messy Data
Foundations of Data Science
Table of contents
22.1 Low Signal to Noise
22.2 Unbalanced Data
22.3 Outliers
22.4 Missing Data