Data Science: A Practical and Philosophical Introduction

An introduction to the theory and practice of data science, using Python, SQL, and R

Brendan Shea, PhD

Welcome to Data Science: A Practical and Philosophical Introduction, an open-access textbook designed to teach the fundamentals of data science through hands-on practice combined with philosophical insight. The textbook includes Jupyter notebooks with interactive exercises, real-world data examples, and practice with tools such as SQL, Python, and statistical libraries.

Chapters

0. Introduction to Colab

Open In Colab
This chapter introduces you to Google Colab, the platform you will use to run the Python code in this textbook. You’ll learn the basics of working in Colab, including writing code, adding Markdown cells, and running Jupyter notebooks.

1. Organizing Data

Open In Colab
In this chapter, you’ll explore different ways to organize your data, covering topics such as directory structures, file naming conventions, and an introduction to data wrangling using Python and Pandas. Examples include organizing a dataset of game scores.
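As a rough illustration of the chapter’s themes, the sketch below pairs a consistent file-naming helper with a Pandas reshape of game scores. The date-prefixed naming scheme, the helper name `tidy_filename`, and the score data are all hypothetical, not taken from the chapter.

```python
from datetime import date
import re

import pandas as pd


def tidy_filename(description: str, run_date: date, ext: str = "csv") -> str:
    """Build an ISO-date-prefixed, lowercase, underscore-separated file name (hypothetical convention)."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{run_date.isoformat()}_{slug}.{ext}"


# Hypothetical game-score data in "wide" form: one column per game.
wide = pd.DataFrame({
    "player": ["Ana", "Ben"],
    "game_1": [10, 7],
    "game_2": [12, 9],
})

# Reshape to "long" (tidy) form: one row per player per game.
long = wide.melt(id_vars="player", var_name="game", value_name="score")
totals = long.groupby("player")["score"].sum()

print(tidy_filename("Game Scores (Raw)", date(2024, 1, 15)))
print(totals)
```

Prefixing names with an ISO date (`YYYY-MM-DD`) makes files sort chronologically in a directory listing, and the long layout produced by `melt` is usually easier to filter and aggregate than the wide one.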

2. Types of Data

Open In Colab
This chapter introduces different data types (e.g., categorical, ordinal, numerical), with hands-on examples using datasets such as demographic surveys. You’ll use Pandas to identify and manage data types, convert between them, and understand how Pandas infers them.
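A minimal sketch of how Pandas can represent numerical, categorical, and ordinal data, assuming a small hypothetical survey whose columns all arrive as text:

```python
import pandas as pd

# Hypothetical survey responses; every column arrives as text, as it often does from a CSV.
survey = pd.DataFrame({
    "age": ["34", "27", "45"],
    "region": ["North", "South", "North"],
    "satisfaction": ["low", "high", "medium"],
})

# Numerical: convert text to numbers (unparseable entries would become NaN).
survey["age"] = pd.to_numeric(survey["age"], errors="coerce")

# Nominal categorical: categories with no natural order.
survey["region"] = survey["region"].astype("category")

# Ordinal: categories with a meaningful order.
survey["satisfaction"] = pd.Categorical(
    survey["satisfaction"], categories=["low", "medium", "high"], ordered=True
)

print(survey.dtypes)
print(survey[survey["satisfaction"] >= "medium"])
```

Declaring `ordered=True` is what lets a comparison such as `>= "medium"` follow the low < medium < high order rather than alphabetical order.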

3. Minecrafting Our Data

Open In Colab
In this Minecraft-inspired chapter, you will practice extracting, transforming, and loading (ETL) data, using SQLite and Pandas from Python. You’ll build a small SQLite database of village data and prepare it for later analysis.
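The ETL steps can be sketched in a few lines, assuming hypothetical village records and column names (the chapter’s own data and schema may differ). It uses Python’s built-in `sqlite3` module together with Pandas’ `DataFrame.to_sql`:

```python
import sqlite3

import pandas as pd

# Extract: hypothetical raw village records (in practice these might come from a file).
raw = pd.DataFrame({
    "village": [" Plains ", "desert", "Taiga"],
    "population": ["12", "7", "n/a"],
})

# Transform: trim and title-case names, coerce population to numbers ("n/a" becomes NaN).
clean = raw.assign(
    village=raw["village"].str.strip().str.title(),
    population=pd.to_numeric(raw["population"], errors="coerce"),
)

# Load: write to an in-memory SQLite database (pass a file path instead to persist it).
conn = sqlite3.connect(":memory:")
clean.to_sql("villages", conn, index=False)

rows = conn.execute(
    "SELECT village, population FROM villages WHERE population IS NOT NULL ORDER BY village"
).fetchall()
print(rows)
```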

4. Data Cleaning

Open In Colab
In this chapter, you’ll learn data cleaning techniques, including handling missing data, detecting outliers, and ensuring data consistency. You’ll work with real-world datasets, such as survey responses, and use Pandas to transform messy data into a clean, usable form.
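A minimal sketch of the three cleaning tasks named above, on hypothetical survey data. Median imputation and the 1.5 × IQR outlier rule are common conventions chosen here for illustration, not necessarily the methods the chapter uses:

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses with typical problems.
responses = pd.DataFrame({
    "state": ["MN", "mn ", "Minn.", "WI", None],
    "hours_online": [3.0, np.nan, 4.0, 2.5, 40.0],
})

# Consistency: normalize spelling variants to one code (this mapping is an assumption).
responses["state"] = (
    responses["state"].str.strip().str.upper().replace({"MINN.": "MN"})
)

# Missing data: fill numeric gaps with the median (one option among several).
responses["hours_online"] = responses["hours_online"].fillna(
    responses["hours_online"].median()
)

# Outliers: flag values more than 1.5 * IQR beyond the first or third quartile.
q1, q3 = responses["hours_online"].quantile([0.25, 0.75])
iqr = q3 - q1
responses["outlier"] = ~responses["hours_online"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(responses)
```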

5. Write Better Queries

Open In Colab
Learn how to write efficient SQL queries to access and manipulate relational databases. You’ll use SQLite to practice writing queries against datasets such as a movie database, and learn how to keep queries performing well as datasets grow.
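A small sketch of two habits that tend to keep SQLite queries fast: filtering in SQL with a parameterized query, and indexing the filtered column. The `movies` table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, rating REAL);
    INSERT INTO movies (title, year, rating) VALUES
        ('Movie A', 1999, 8.1),
        ('Movie B', 2010, 7.4),
        ('Movie C', 2010, 6.2);
    -- An index on the filtered column lets SQLite avoid scanning every row.
    CREATE INDEX idx_movies_year ON movies(year);
    """
)

# Parameterized query: select only the needed columns and filter in SQL, not in Python.
query = "SELECT title, rating FROM movies WHERE year = ? ORDER BY rating DESC"
results = conn.execute(query, (2010,)).fetchall()
print(results)

# Ask SQLite how it plans to run the query; the plan should mention idx_movies_year.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, (2010,)):
    print(row)
```

`EXPLAIN QUERY PLAN` shows whether SQLite will search an index or scan the whole table; the exact wording of its output varies between SQLite versions.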

6. Descriptive Statistics

Open In Colab
This chapter covers the fundamentals of descriptive statistics. You’ll use Python libraries such as NumPy and Pandas to calculate measures of central tendency and variability, with examples drawn from weather observations and student test scores.
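For example, the usual measures of central tendency and variability can be computed with Pandas as follows (the scores are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical student test scores.
scores = pd.Series([72, 85, 85, 90, 68])

summary = {
    # Central tendency
    "mean": scores.mean(),
    "median": scores.median(),
    "mode": scores.mode().iloc[0],  # mode() can return several values; take the first
    # Variability
    "range": scores.max() - scores.min(),
    "variance": scores.var(),  # sample variance (ddof=1), the Pandas default
    "std": scores.std(),
}
population_variance = np.var(scores.to_numpy())  # NumPy defaults to ddof=0

print(summary)
print(population_variance)
```

Because Pandas’ `var` and `std` default to the sample formulas (`ddof=1`) while NumPy’s `np.var` and `np.std` default to the population formulas (`ddof=0`), the two libraries can report different values for the same data.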

7. Inferential Statistics

Open In Colab
This chapter introduces inferential statistics, including hypothesis testing, confidence intervals, and statistical tests such as t-tests and chi-squared tests. You’ll apply these concepts to data such as medical trial results and election polling data, using SciPy and Statsmodels.
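As one concrete case, here is a normal-approximation (Wald) 95% confidence interval for a polling proportion. The chapter itself uses SciPy and Statsmodels; this sketch relies only on Python’s standard library so that it is self-contained, and the poll numbers are hypothetical:

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical poll: 520 of 1,000 respondents favor candidate A.
low, high = proportion_ci(520, 1000)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

With these numbers the interval runs from about 0.489 to 0.551. Because it includes 0.5, this poll alone would not establish a majority at the 95% level. The Wald interval is the simplest option; alternatives such as the Wilson interval (available, among other methods, through Statsmodels’ `proportion_confint`) behave better for small samples or proportions near 0 or 1.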

8. Analysis

Open In Colab
In this chapter, you’ll analyze datasets using techniques such as linear regression, correlation analysis, and classification. The examples focus on economic data and sports analytics, using tools such as Pandas, Statsmodels, and Scikit-learn.
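A minimal sketch of correlation and simple linear regression using NumPy alone; the payroll and wins figures are hypothetical sports-analytics data:

```python
import numpy as np

# Hypothetical data: team payroll (millions) vs. season wins.
payroll = np.array([50.0, 80.0, 110.0, 140.0, 170.0])
wins = np.array([62.0, 75.0, 80.0, 88.0, 95.0])

# Pearson correlation coefficient (off-diagonal entry of the correlation matrix).
r = np.corrcoef(payroll, wins)[0, 1]

# Least-squares line: wins ~ slope * payroll + intercept.
slope, intercept = np.polyfit(payroll, wins, deg=1)

print(f"r = {r:.3f}, wins ~ {slope:.3f} * payroll + {intercept:.1f}")
```

`np.polyfit` returns only the fitted coefficients; Statsmodels’ `OLS` additionally reports standard errors, confidence intervals, and p-values, which matter when interpreting the fit.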

9. Reports

Open In Colab
This chapter covers best practices for creating professional reports. You’ll learn how to use Markdown, Jupyter notebooks, and LaTeX to format and present your analysis effectively. Examples include preparing reports on company sales data.
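One way to turn analysis results into a report is to generate the Markdown text directly from the data. The sketch below assumes hypothetical quarterly sales figures and a hand-written helper, `to_markdown_table`; Pandas also offers `DataFrame.to_markdown`, but that method requires the optional `tabulate` package:

```python
import pandas as pd

# Hypothetical quarterly sales figures.
sales = pd.DataFrame({"quarter": ["Q1", "Q2", "Q3"], "revenue": [12000, 15500, 14250]})


def to_markdown_table(df: pd.DataFrame) -> str:
    """Render a DataFrame as a simple Markdown pipe table without extra dependencies."""
    header = "| " + " | ".join(df.columns) + " |"
    divider = "| " + " | ".join("---" for _ in df.columns) + " |"
    rows = ["| " + " | ".join(str(v) for v in row) + " |" for row in df.itertuples(index=False)]
    return "\n".join([header, divider, *rows])


best = sales.loc[sales["revenue"].idxmax(), "quarter"]
report = "\n\n".join([
    "## Quarterly Sales Summary",
    f"Total revenue: ${int(sales['revenue'].sum()):,}. Best quarter: {best}.",
    to_markdown_table(sales),
])
print(report)
```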

10. Data Dashboards

Open In Colab
Learn to create interactive data dashboards using Dash and Plotly. In this chapter, you’ll build a simple dashboard that visualizes data from a fictional retail company, allowing users to interact with the data and generate custom views.
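Dash apps generally pair a layout (components such as dropdowns and graphs) with callbacks that rebuild a figure when an input changes. The sketch below assumes hypothetical retail sales data and keeps the filtering logic in a plain function so it can be tested without a running server; the `build_app` helper is illustrative, and it needs `dash` and `plotly` installed:

```python
import pandas as pd

# Hypothetical monthly sales for a fictional retail company.
SALES = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "sales": [100, 120, 90, 130],
})


def filter_sales(region: str) -> pd.DataFrame:
    """Return the rows to chart for the region a user selects."""
    return SALES[SALES["region"] == region]


def build_app():
    """Assemble a one-dropdown, one-graph Dash app (hypothetical helper)."""
    # Imported here so this module still loads where Dash is not installed.
    import plotly.express as px
    from dash import Dash, Input, Output, dcc, html

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Dropdown(options=sorted(SALES["region"].unique()), value="North", id="region"),
        dcc.Graph(id="sales-graph"),
    ])

    @app.callback(Output("sales-graph", "figure"), Input("region", "value"))
    def update_graph(region):
        # Dash calls this whenever the dropdown value changes.
        return px.bar(filter_sales(region), x="month", y="sales", title=f"Sales: {region}")

    return app
```

To try it locally, call `build_app().run(debug=True)` and open the address Dash prints in a browser.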

11. Data Governance

Open In Colab
This chapter covers the critical topic of data governance, including issues related to data privacy, ethics, and compliance. You’ll explore realistic data governance challenges faced by a fictional DNA testing company (set in the world of the X-Men), using a PostgreSQL database.
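Pseudonymization is one common privacy safeguard: direct identifiers are replaced with tokens before data is shared for analysis. The chapter works in PostgreSQL; this Python sketch is only an illustration, and the key, field names, and records are hypothetical:

```python
import hashlib
import hmac

# Hypothetical key; in practice it would live in a secrets manager, never in code.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(customer_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same input always maps to the same token, so records can still be joined,
    but tokens cannot be computed or matched to IDs without the key.
    """
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical genetic-testing records.
records = [
    {"customer_id": "C-1001", "marker": "X-gene: present"},
    {"customer_id": "C-1002", "marker": "X-gene: absent"},
]
shared = [{**r, "customer_id": pseudonymize(r["customer_id"])} for r in records]
print(shared)
```

Pseudonymized data is not anonymous: anyone holding the key can recompute the token for a known ID and re-link the records, so the key needs the same protection as the original identifiers.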

Additional Notebooks

  • Make Shire House Data
    Open In Colab
    This supplementary notebook provides a guide for generating synthetic data for a hypothetical Shire House business. You’ll use Python to simulate customer data and prepare it for analysis.
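A minimal sketch of seeded synthetic data generation, in case it helps to see the idea before opening the notebook. The column names, value ranges, and name list are hypothetical, not the notebook’s actual schema:

```python
import random

import pandas as pd


def make_customers(n: int, seed: int = 42) -> pd.DataFrame:
    """Generate n synthetic customer records with a fixed seed for reproducibility."""
    rng = random.Random(seed)  # local generator, so results do not depend on global state
    first_names = ["Frodo", "Sam", "Rosie", "Merry", "Pippin"]
    return pd.DataFrame({
        "customer_id": range(1, n + 1),
        "name": [rng.choice(first_names) for _ in range(n)],
        "visits_per_month": [rng.randint(0, 10) for _ in range(n)],
        "avg_spend": [round(rng.uniform(5, 50), 2) for _ in range(n)],
    })


customers = make_customers(100)
print(customers.head())
```

Fixing the seed means every run produces the same dataset, which keeps later analyses reproducible.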

License

This open-access textbook is licensed under the MIT License. For more details, refer to the LICENSE file in this repository.


We hope you find this textbook useful, and we encourage you to work through the chapters interactively using the provided Colab links.

A Note on the Use of AI Tools. These chapters were initially developed as the “generative AI” explosion took off (starting with OpenAI’s GPT-3), and I’ve had fun experimenting with many of these tools (including successive versions of ChatGPT, Google Gemini, Claude, Mistral, Copilot, and others) to help turn my (voluminous, but often unorganized) lecture notes into something resembling a proper book. My experience with these tools has been generally positive, and I think they could someday do at least some of the work done by traditional editors and publishing houses (I say this as a former editor at an academic press!). I’m less convinced they are going to immediately replace the actual writer or programmer, though, as producing quality, meaningful output still takes a fair amount of expertise (and effort!).

About the Author

Brendan Shea, PhD, is Professor of Philosophy and Computer Science at Rochester Community and Technical College and a Resident Fellow at the Minnesota Center for Philosophy of Science at the University of Minnesota-Twin Cities. He also serves as the Public Member of the Institutional Biosafety Committee at Mayo Clinic-Rochester. His main research and teaching interests lie in the philosophy of science, data modeling, applied ethics, and in the areas where these overlap (such as bioethics and the ethics of artificial intelligence). You can find out more about his research here: https://philpeople.org/profiles/brendan-shea.