Chapter 5: Relational Schema Design (The Covert Spy Academy)#
Database and SQL Through Pop Culture | Brendan Shea, PhD
Welcome to the exciting world of database design. In this chapter, we move through the full process of creating a database from scratch. We begin with conceptual modeling, where we identify the key entities, attributes, and relationships in our spy academy database. From there, we move to logical modeling, where keys, normalization, and join tables help us avoid redundancy and protect data integrity. Finally, we move into physical modeling, where we use SQL to create, modify, and delete tables in SQLite.
Learning Outcomes#
By the end of this chapter, you will be able to:
Distinguish conceptual, logical, and physical database design.
Translate business rules into entities, attributes, relationships, and ERDs.
Choose primary keys and foreign keys and resolve many-to-many relationships with join tables.
Normalize tables to 1NF, 2NF, and 3NF by identifying repeating groups, partial dependencies, and transitive dependencies.
Explain the benefits and trade-offs of normalization and avoid common design mistakes.
Choose appropriate data types and constraints, including how SQLite type affinity differs from stricter SQL typing.
Use SQL DDL to create and drop tables with keys and column constraints.
Keywords: database design, conceptual modeling, logical modeling, physical modeling, entities, attributes, relationships, ERD, primary key, foreign key, normalization, 1NF, 2NF, 3NF, join tables, data types, constraints, CREATE TABLE, DROP TABLE
Brendan’s Lecture#
from IPython.display import YouTubeVideo
YouTubeVideo('6C2KECb9ZwE', width=800, height=500)
Conceptual Design at the Covert Academy#
Welcome to the Covert Academy, the world’s premier educational institution for aspiring secret agents! Our state-of-the-art facility is hidden beneath the streets of London and equipped with the latest in espionage technology. We offer a wide range of classes to train our students in the arts of surveillance, infiltration, disguise, and more.
To keep track of our students and classes, we need a well-designed database. Let’s walk through the process of conceptual design for this database.
What is Conceptual Design?#
Conceptual design is the first step in creating a database. It involves formulating business rules, which define how the database should work and what constraints it should have. It also involves creating a preliminary list of entities and attributes, which represent the key concepts and data elements in the system. Finally, it involves creating a preliminary Entity-Relationship Diagram (ERD) to visually represent the entities and their relationships.
Formulating Business Rules#
Business rules are concise, unambiguous statements that define or constrain some aspect of the database. They are derived through a combination of descriptive processes, such as interviewing stakeholders to understand their needs and requirements, and normative processes, which involve making decisions about how the system should work.
For the Covert Academy, we have the following business rules:
Each student must be enrolled in at least one class, but may be enrolled in multiple classes.
Each class must have at least one student enrolled, but may have many students.
To write effective business rules:
Keep them concise and unambiguous
Ensure they are testable (you can determine if the system is complying with the rule)
Involve all relevant stakeholders in formulating and reviewing them
Prioritize them (some may be must-haves, others may be nice-to-haves)
Preliminary List of Entities and Attributes#
Based on our business rules, we can identify our preliminary entities, which are the key concepts or objects in our system, and their attributes, which are the data elements that describe each entity.
For the Covert Academy, our entities and attributes are:
Student
Name
Codename
Nationality
Specialization
Class
Name
Description
Instructor
Location
Note that at this stage, we don’t include data types, identifier columns, or join tables. We’re just focusing on the core entities and their key attributes.
Preliminary ERD#
Finally, we can create a preliminary ERD to visualize our entities and their relationship. Here’s what it looks like for the Covert Academy:
This diagram shows that there is a many-to-many relationship between STUDENT and CLASS (a student can enroll in many classes, a class can have many students).
We’ll refine this ERD in the next step, logical modeling, where we’ll resolve the many-to-many relationship with a join table. But this preliminary ERD gives us a good starting point to visualize our system.
What is Logical Modeling?#
In the previous section, we laid the conceptual foundation for the Covert Academy’s database. Now it’s time to take that conceptual model and transform it into a logical model using the relational model.
Logical modeling is the process of taking a conceptual model and adapting it to fit a specific logical data model. In our case, we’ll be using the relational model, which organizes data into tables (also known as relations) with rows (also known as tuples) and columns (also known as attributes).
In the relational model:
Each table should represent a single entity or concept
Each row in a table represents a specific instance of that entity
Each column in a table represents an attribute of that entity
Each table should have a primary key, a unique identifier for each row
Resolving Many-to-Many Relationships#
In our conceptual model, we had a many-to-many relationship between STUDENT and CLASS. However, in the relational model, we can’t directly represent many-to-many relationships. Instead, we need to introduce a join table.
A join table is a table that breaks down a many-to-many relationship into two one-to-many relationships. It does this by having foreign keys to both of the original tables.
For the Covert Academy, our join table will be called ENROLLMENT. It will have the following structure:
ENROLLMENT
StudentID (Foreign Key to STUDENT)
ClassID (Foreign Key to CLASS)
EnrollmentDate
Now, instead of a direct many-to-many relationship, we have:
A one-to-many relationship between STUDENT and ENROLLMENT
A one-to-many relationship between CLASS and ENROLLMENT
Choosing Primary Keys#
Each table in our database needs a primary key. A primary key is a unique identifier for each row in a table. There are two main options for primary keys:
Natural Key: A natural key is a key that uses one of the existing attributes of the entity. For example, we could use a student’s email as the primary key for the STUDENT table. Natural keys can be convenient because they don’t require an additional column. However, they can also be problematic if the natural key ever needs to change.
Surrogate Key: A surrogate key is an artificial key that is created specifically to be the primary key. It’s usually a simple integer or a universally unique identifier (UUID). Surrogate keys are often preferable because they are guaranteed to be unique and they never need to change.
For the Covert Academy, we’ll use surrogate keys for all of our tables. We’ll call these ID columns.
Updated ERD#
With these changes in mind, here’s our updated ERD:
The tables are as follows:
STUDENT (ID (Primary Key), Name, Codename, Nationality, Specialization)
CLASS (ID (Primary Key), Name, Description, Instructor, Location)
ENROLLMENT (StudentID (Foreign Key to STUDENT), ClassID (Foreign Key to CLASS), EnrollmentDate)
Logical Modeling: Normalization#
Normalization helps you organize tables so each fact is stored once and relationships stay clear.
Why normalize?#
A single table that mixes agents, missions, and handler contacts looks convenient, but it creates:
Redundant data
Update anomalies
Insertion anomalies
Deletion anomalies
Using the SPECTRE Academy example, repeating handler phone numbers or storing many skills in one field makes the design hard to maintain.
First Normal Form (1NF)#
A table is in 1NF when:
Each column holds one value per row.
Repeating groups are removed.
Each row can be identified with a key.
Before: an Agents table stores Skills as "Espionage, Combat, Driving".
After: split the design into:
AGENT(AgentID, Name, Codename)AGENT_SKILLS(AgentID, Skill)
That change makes each skill searchable, sortable, and easy to validate.
1NF Checklist and Tradeoffs#
Common warning signs:
comma-separated lists in one column
contact fields that mix phone numbers and emails
location or address fields that hide multiple facts in one string
Benefits of 1NF:
simpler queries
cleaner updates
better indexing
stronger data integrity
Sometimes teams denormalize for reporting or performance, but that should be a deliberate choice with a clear reason.
Second Normal Form (2NF)#
Once a database achieves First Normal Form, the next step is to evaluate it for Second Normal Form compliance. Second Normal Form builds upon 1NF by addressing a specific type of data dependency.
A table is in Second Normal Form when:
The table is already in First Normal Form.
Every non-key attribute in the table is fully functionally dependent on the entire primary key.
The concept of functional dependency means that one attribute’s value determines another attribute’s value. A full functional dependency exists when an attribute depends on the entire primary key, not just part of it.
Recognizing Partial Dependencies#
Let’s examine a table from our spy training academy that tracks agent certifications:
AgentID |
SkillID |
CertificationDate |
SkillName |
SkillDifficulty |
|---|---|---|---|---|
007 |
S1 |
2023-01-15 |
Lockpicking |
Advanced |
007 |
S2 |
2023-02-01 |
Surveillance |
Expert |
008 |
S1 |
2023-03-10 |
Lockpicking |
Advanced |
In this table, the primary key consists of both AgentID and SkillID together. Let’s analyze the dependencies:
CertificationDate depends on both AgentID and SkillID.
SkillName depends only on SkillID.
SkillDifficulty depends only on SkillID.
The partial dependencies in this table violate Second Normal Form because SkillName and SkillDifficulty depend only on SkillID, which is just part of the primary key.
Converting to 2NF#
To achieve Second Normal Form, we separate the attributes that depend on only part of the primary key into their own table:
AGENT_CERTIFICATIONS table:
AgentID |
SkillID |
CertificationDate |
|---|---|---|
007 |
S1 |
2023-01-15 |
007 |
S2 |
2023-02-01 |
008 |
S1 |
2023-03-10 |
SKILLS table:
SkillID |
SkillName |
SkillDifficulty |
|---|---|---|
S1 |
Lockpicking |
Advanced |
S2 |
Surveillance |
Expert |
Another 2NF Example#
Consider a table tracking mission equipment assignments:
MissionID |
AgentID |
EquipmentID |
MissionLocation |
EquipmentName |
EquipmentType |
|---|---|---|---|---|---|
M1 |
007 |
E1 |
Venice |
Laser Watch |
Gadget |
M1 |
007 |
E2 |
Venice |
Grappling Hook |
Tool |
M2 |
008 |
E1 |
Paris |
Laser Watch |
Gadget |
This table has several problems:
MissionLocation depends only on MissionID.
EquipmentName and EquipmentType depend only on EquipmentID.
To achieve 2NF, we create three separate tables:
MISSION_EQUIPMENT table:
MissionID |
AgentID |
EquipmentID |
|---|---|---|
M1 |
007 |
E1 |
M1 |
007 |
E2 |
M2 |
008 |
E1 |
MISSIONS table:
MissionID |
MissionLocation |
|---|---|
M1 |
Venice |
M2 |
Paris |
EQUIPMENT table:
EquipmentID |
EquipmentName |
EquipmentType |
|---|---|---|
E1 |
Laser Watch |
Gadget |
E2 |
Grappling Hook |
Tool |
Benefits of Second Normal Form#
Moving to Second Normal Form provides several important benefits:
Data consistency improves because information about each entity is stored in only one place.
Update operations become more reliable because related data is properly grouped together.
Storage efficiency increases because the database eliminates redundant data storage.
Data modifications become safer because changes to one aspect of the data won’t affect unrelated aspects.
Common 2NF Mistakes to Avoid#
Database designers should watch for these common issues when working with Second Normal Form:
Composite keys are sometimes overlooked when analyzing dependencies, leading to incomplete normalization.
Transitive dependencies might be missed when focusing only on partial dependencies.
Over-normalization can occur when splitting tables unnecessarily, making the database harder to understand.
In the next section, we’ll explore Third Normal Form and examine how it addresses transitive dependencies to further improve database design.
Third Normal Form and Beyond#
Third Normal Form (3NF) removes transitive dependencies. A transitive dependency happens when a non-key field depends on another non-key field instead of directly on the key.
Example:
AGENT(AgentID, Name, HandlerID, HandlerName, HandlerRank)
Here, HandlerName and HandlerRank really depend on HandlerID, not AgentID.
To fix it, split the design into:
AGENTS(AgentID, Name, HandlerID)HANDLERS(HandlerID, HandlerName, HandlerRank)
This reduces duplication and keeps handler updates in one place.
Practical Normalization Strategy#
Use this workflow when reviewing a design:
Get each table into 1NF.
Remove partial dependencies to reach 2NF.
Remove transitive dependencies to reach 3NF.
Stop and evaluate before adding more complexity.
Beyond 3NF:
BCNF strengthens the key rules.
4NF handles multivalued dependencies.
5NF handles rare join-dependency problems.
In practice, most production schemas stop at 3NF and only denormalize later for a measured performance or reporting need.
Quiz: Normalize This#
Interactive activity: Open this standalone page to use the activity in the Jupyter Book or from Colab.
Physical Modeling - Bringing the Database to Life#
We’ve conceptually designed our database and transformed it into a logical model using the relational model and normalization techniques. Now it’s time for the exciting part - actually creating our database!
Physical modeling is the process of taking the logical model and implementing it in a specific database management system (DBMS). This involves defining the actual tables, columns, data types, and constraints in the database using the Data Definition Language (DDL) of the chosen DBMS.
For the Covert Academy’s database, we’ll be using SQLite (which is what we’ve used in the class so far).
SQLite is a lightweight, file-based DBMS. It’s a popular choice for many applications because:
It’s serverless (the database is just a file)
It’s self-contained (no external dependencies)
It’s cross-platform (works on all major operating systems)
It’s open-source and free to use
SQLite, like most modern DBMSs, uses SQL (Structured Query Language) for defining and manipulating databases.
ANSI SQL Data Types: Big Picture#
The ANSI/ISO SQL standard gives database systems a shared vocabulary for types, but each DBMS implements that standard a little differently.
Common type families#
Numeric:
INTEGER,SMALLINT,BIGINT,DECIMAL,REALCharacter:
CHAR,VARCHAR,TEXTTemporal:
DATE,TIME,TIMESTAMPBinary:
BINARY,VARBINARY,BLOBOther common types:
BOOLEAN,JSON,UUID
Why the standard matters#
improves portability between database systems
helps teams talk about schema choices precisely
makes column intent clearer during design
A good rule is to choose the smallest type that accurately represents the data and matches how you plan to query it.
SQLite Type Affinity in Practice#
SQLite is more flexible than many other databases. Instead of enforcing types rigidly, it uses five storage classes:
NULLINTEGERREALTEXTBLOB
Declared column types map to type affinity:
INTEGERaffinityTEXTaffinityREALaffinityBLOBaffinityNUMERICaffinity
That means SQLite will often convert values when it can. For example, '8' may be stored as an integer in a numeric column, while 7 inserted into a text column may be stored as '7'.
The design takeaway is simple: declare types clearly even in SQLite, because the schema communicates your intent to both the database and future readers.
Practical Database Design for Beginners#
When you’re creating your first database for an intelligence operation (or any application), choosing the right data types is one of the most important decisions you’ll make. Here are some straightforward guidelines:
Choosing the Right Data Type#
Keep it simple when starting out. Begin with these basic types for most of your needs:
INTEGER for IDs, counts, and whole numbers
TEXT for names, descriptions, and any text content
REAL for measurements and calculations
DATE/TIME for timestamps and calendar dates
BOOLEAN for yes/no or true/false situations
Match the data type to what you’re actually storing. Think about:
What kind of information is this? (number, text, date, etc.)
How large could this value get? (small integers vs. large integers)
Do I need decimal precision? (whole numbers vs. decimals)
Will I need to sort or calculate with this field?
Examples of Basic Data Type Decisions#
For an agent management system, you might choose:
agent_id: INTEGER - Simple, efficient for primary keys
code_name: TEXT - Variable length names like “Bond” or “Black Widow”
clearance_level: INTEGER - Simple numbers from 1-10
active_status: BOOLEAN - Either active (1) or inactive (0)
recruitment_date: DATE - When they joined
For mission tracking:
mission_id: INTEGER - Sequential identifier
mission_name: TEXT - Operation names like “Skyfall”
location: TEXT - “London, UK” or “Abandoned Warehouse”
start_date: DATE - When the mission begins
budget: DECIMAL(10,2) - Precise financial tracking (e.g., $250,000.75)
SQLite’s Simple Approach for Beginners#
SQLite makes data types easier for beginners. Unlike other databases, SQLite is flexible about data types. This means:
You can store a number in a TEXT column (it will convert automatically)
You can store text in an INTEGER column (it will try to convert if possible)
You don’t need to worry as much about exact type matching
This table shows what actually happens when you insert different values into SQLite columns:
Declared Column Type |
Value Inserted |
What Actually Gets Stored |
|---|---|---|
INTEGER |
42 |
INTEGER (42) |
INTEGER |
“42” |
INTEGER (42) |
INTEGER |
“Agent 42” |
TEXT (“Agent 42”) |
TEXT |
42 |
TEXT (“42”) |
TEXT |
“007” |
TEXT (“007”) |
REAL |
99 |
REAL (99.0) |
REAL |
“99.7” |
REAL (99.7) |
Common Beginner Mistakes to Avoid#
Using TEXT for everything. While SQLite allows this, it’s not a good practice because:
You lose the ability to do proper numeric sorting (TEXT “10” comes before “2”)
Mathematical operations won’t work properly
It uses more storage space than necessary
Choosing types that are too small. For example:
Using SMALLINT for agent IDs when you might eventually have more than 32,767 agents
Using VARCHAR(20) for names when some might be longer
Ignoring date formatting. When storing dates as TEXT, they should be in YYYY-MM-DD format for proper sorting and comparison.
Forgetting about NULL values. Decide whether each column should allow NULL values (unknown/missing information) or require data.
Simple Performance Tips#
Make smart choices for smoother operations:
Use INTEGER for primary keys and foreign keys when possible
Use appropriate field sizes (don’t use TEXT for single-character fields)
Put your most frequently searched fields first in your table design
Consider adding indexes to columns you search often
When to Keep It Simple vs. When to Be Specific#
For most beginner projects, stick to these guidelines:
Keep it simple when:
Building your first database
Creating a prototype
Working on small projects
The data structure is unlikely to change much
Be more specific when:
Dealing with financial or scientific data
Building systems that will grow very large
Working with multiple connected systems
Performance is critical to your application
Conclusion: Data Types#
Understanding data types is an important first step in database design. For beginners, start with the basic types (INTEGER, TEXT, REAL) and gradually introduce more specific types as your needs and skills grow.
Remember that SQLite’s flexible approach makes it forgiving for beginners, but developing good habits around data type selection will serve you well as you advance to more complex database systems and larger projects. The right data types ensure your intelligence operations (or any application) run efficiently, maintain data integrity, and scale effectively as your needs grow.
The CREATE TABLE Statement#
Now that we know about data types, we can start creating our tables! In SQL, we use the CREATE TABLE statement for this.
The general syntax is:
CREATE TABLE table_name (
column1 datatype constraint,
column2 datatype constraint,
...
PRIMARY KEY (one or more columns)
);
For example, let’s create the STUDENT table:
%%sql
DROP TABLE IF EXISTS STUDENT;
CREATE TABLE STUDENT (
ID INTEGER PRIMARY KEY, -- The primary key, an auto-incrementing integer
Name VARCHAR(100) NOT NULL, -- The student's name, cannot be NULL
Codename VARCHAR(50), -- The student's codename, can be NULL
Nationality VARCHAR(50), -- The student's nationality
Specialization VARCHAR(100) -- The student's specialization
);
This creates a table named STUDENT with five columns. Note the use of ANSI standard data types and the comments explaining each column.
We can create the CLASS and ENROLLMENT tables similarly:
%%sql
DROP TABLE IF EXISTS CLASS;
DROP TABLE IF EXISTS ENROLLMENT;
CREATE TABLE CLASS (
ID INTEGER PRIMARY KEY, -- The primary key
Name VARCHAR(100) NOT NULL, -- The class name, cannot be NULL
Description VARCHAR(200), -- The class description
Instructor VARCHAR(100), -- The class instructor
Location VARCHAR(100) -- The class location
);
CREATE TABLE ENROLLMENT (
StudentID INTEGER, -- Foreign key to STUDENT table
ClassID INTEGER, -- Foreign key to CLASS table
EnrollmentDate DATE, -- The date of enrollment
PRIMARY KEY (StudentID, ClassID), -- Composite primary key
FOREIGN KEY (StudentID) REFERENCES STUDENT(ID), -- Foreign key constraint
FOREIGN KEY (ClassID) REFERENCES CLASS(ID) -- Foreign key constraint
);
Note how in the ENROLLMENT table, we define a composite primary key (StudentID, ClassID) and also specify the foreign key constraints. Let’s now use PRAGMA to see how our tables appear:
%%sql
PRAGMA table_info(STUDENT);
| cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|
| 0 | ID | INTEGER | 0 | None | 1 |
| 1 | Name | VARCHAR(100) | 1 | None | 0 |
| 2 | Codename | VARCHAR(50) | 0 | None | 0 |
| 3 | Nationality | VARCHAR(50) | 0 | None | 0 |
| 4 | Specialization | VARCHAR(100) | 0 | None | 0 |
%%sql
PRAGMA table_info(CLASS);
| cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|
| 0 | ID | INTEGER | 0 | None | 1 |
| 1 | Name | VARCHAR(100) | 1 | None | 0 |
| 2 | Description | VARCHAR(200) | 0 | None | 0 |
| 3 | Instructor | VARCHAR(100) | 0 | None | 0 |
| 4 | Location | VARCHAR(100) | 0 | None | 0 |
%%sql
PRAGMA table_info(ENROLLMENT);
| cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|
| 0 | StudentID | INTEGER | 0 | None | 1 |
| 1 | ClassID | INTEGER | 0 | None | 2 |
| 2 | EnrollmentDate | DATE | 0 | None | 0 |
Column Constraints#
Before we start inserting data into our database, let’s discuss some important concepts that help maintain the integrity and consistency of our data: constraints.
Column constraints are rules applied to individual columns in a table. They restrict the type of data that can be stored in a column. The most common constraints are:
CHECK: Ensures that a column’s value satisfies a boolean expression.
DEFAULT: Specifies a default value for a column when no value is provided.
NOT NULL: Ensures that a column cannot have a NULL value.
UNIQUE: Ensures that each value in a column is unique across the whole table.
Let’s see how we can apply these constraints to our tables.
The DROP TABLE Statement#
But first, let’s discuss how to drop a table. The DROP TABLE statement is used to remove a table definition and all its data. The syntax is simple:
DROP TABLE table_name;
This statement is irreversible, so use it with caution!
Now, let’s recreate our tables with some constraints.
Recreating the STUDENT Table#
%%sql
DROP TABLE IF EXISTS STUDENT; -- Drop the table if it already exists
CREATE TABLE STUDENT (
ID INTEGER PRIMARY KEY,
Name VARCHAR(100) NOT NULL,
Codename VARCHAR(50) UNIQUE, -- Codenames must be unique
Nationality VARCHAR(50) DEFAULT 'Unknown', -- Default nationality is 'Unknown'
Specialization VARCHAR(100),
Age INTEGER CHECK (Age >= 18) -- Students must be at least 18 years old
);
Here, we’ve added a UNIQUE constraint to the Codename column, a DEFAULT constraint to the Nationality column, and a CHECK constraint to ensure that all students are at least 18 years old.
Recreating the CLASS Table#
For the CLASS table, we’ve added a NOT NULL constraint to the Instructor column and a CHECK constraint to ensure that the StartDate is always before the EndDate.
%%sql
DROP TABLE IF EXISTS CLASS;
CREATE TABLE CLASS (
ID INTEGER PRIMARY KEY,
Name VARCHAR(100) NOT NULL,
Description VARCHAR(200),
Instructor VARCHAR(100) NOT NULL, -- Every class must have an instructor
Location VARCHAR(100),
StartDate DATE,
EndDate DATE,
CHECK (StartDate < EndDate) -- The start date must be before the end date
);
Recreating the ENROLLMENT Table#
For the ENROLLMENT table, we’ve added a DEFAULT constraint to set the EnrollmentDate to the current date if no value is provided.
%%sql
DROP TABLE IF EXISTS ENROLLMENT;
CREATE TABLE ENROLLMENT (
StudentID INTEGER,
ClassID INTEGER,
EnrollmentDate DATE DEFAULT CURRENT_DATE, -- Default is the current date
PRIMARY KEY (StudentID, ClassID),
FOREIGN KEY (StudentID) REFERENCES STUDENT(ID),
FOREIGN KEY (ClassID) REFERENCES CLASS(ID)
);
And there we have it! Our Covert Academy database is now set up with constraints to ensure data integrity.
ERD: Physical Model#
The DROP TABLE IF EXISTS Syntax#
One more thing to note: in our previous examples, we used the syntax DROP TABLE IF EXISTS instead of just DROP TABLE. What does this do?
DROP TABLE IF EXISTS is a safer way to drop tables. If the specified table doesn’t exist, DROP TABLE would normally throw an error. However, with IF EXISTS, the statement doesn’t throw an error if the table doesn’t exist; it simply does nothing.
This is useful in scripts where you want to ensure that a table doesn’t exist before creating it, but you don’t want the script to fail if the table doesn’t exist to begin with.
ALTER TABLE: Modifying Your Database Structure#
As your database evolves, you may find yourself needing to modify the structure of existing tables. Maybe you need to add a new column, change a column’s data type, or add a new constraint. This is where the ALTER TABLE statement comes in.
The ALTER TABLE statement allows you to change the structure of an existing table without deleting and recreating it. This is useful because it preserves the existing data in the table.
Adding a New Column#
To add a new column to an existing table, you use the ADD COLUMN clause:
ALTER TABLE table_name
ADD COLUMN new_column_name datatype constraint;
For example, let’s say we want to add a ‘DateOfBirth’ column to our STUDENT table:
%%sql
ALTER TABLE STUDENT
ADD COLUMN DateOfBirth DATE;
We can know view the table schema to ensure it has been added.
%%sql
PRAGMA table_info(STUDENT);
| cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|
| 0 | ID | INTEGER | 0 | None | 1 |
| 1 | Name | VARCHAR(100) | 1 | None | 0 |
| 2 | Codename | VARCHAR(50) | 0 | None | 0 |
| 3 | Nationality | VARCHAR(50) | 0 | 'Unknown' | 0 |
| 4 | Specialization | VARCHAR(100) | 0 | None | 0 |
| 5 | Age | INTEGER | 0 | None | 0 |
| 6 | DateOfBirth | DATE | 0 | None | 0 |
Dropping a Column#
We can drop (remove) a column using the drop table keyword:
ALTER TABLE table_name
DROP COLUMN column_name
Let’s now drop the DateOfBirth column.
%%sql
ALTER TABLE STUDENT
DROP COLUMN DateOfBirth;
Renaming Tables and Columns#
We can also rename tables and columns. For examples, let’s suppose we prefer table names like Students, Classes, and Enrollments.
%%sql
ALTER TABLE STUDENT
RENAME TO Students;
%%sql
ALTER TABLE CLASS
RENAME TO Classes;
%%sql
ALTER TABLE ENROLLMENT
RENAME TO Enrollments;
Modifying a Column’s Definition#
To change a column’s data type in databases like Oracle, MySQL, or Postgres, you use the MODIFY COLUMN clause (or ALTER COLUMN in other databases):
ALTER TABLE table_name
MODIFY COLUMN column_name new_datatype;
For example, if you wanted to change the ‘Specialization’ column in the STUDENT table to have a maximum length of 200 characters:
ALTER TABLE STUDENT
MODIFY COLUMN Specialization VARCHAR(200);
You can also do something similar in these database to add constraints:
ALTER TABLE STUDENT
ADD CONSTRAINT CHK_Age CHECK (Age < 100);
However, SQLite has limited support for ALTER TABLE. While you can rename, add, or drop columns, you can’t directly modify a column’s datatype or constraints. To achieve this in SQLite, you need to use a workaround:
Create a new column with the desired constraints (or new data type).
Copy the data from the old column to the new column.
Drop the old column.
Rename the new column to the original name
For example, here’s how we can add a constraint to make sure age is between 0 and 100.
%%sql
-- Step 1: Add a new column with the desired constraints
ALTER TABLE Students ADD COLUMN Age_NEW INTEGER CHECK (Age_NEW < 100);
-- Step 2: Copy the data from the old column to the new column
-- Not needed here, since our table is empty!
UPDATE Students SET Age_NEW = Age;
-- Step 3: Drop the old column
ALTER TABLE Students DROP COLUMN Age;
-- Step 4: Rename the new column to the original name
ALTER TABLE Students RENAME COLUMN Age_NEW TO Age;
Chapter Summary#
Database design is a structured process that unfolds in three main phases:
Conceptual Modeling#
Business Rules: Establish clear, testable statements that define database constraints and behavior
Entity Identification: Identify key entities (like students and classes) and their attributes
Preliminary ERD: Create a visual representation of entities and their relationships
Logical Modeling#
Relational Model: Organize data into tables with rows and columns
Keys: Implement primary keys (unique identifiers) and foreign keys (relationship links)
Many-to-Many Relationships: Resolve using join tables (e.g., ENROLLMENT connecting STUDENT and CLASS)
Normalization:
First Normal Form (1NF): Eliminate repeating groups and ensure atomic values
Second Normal Form (2NF): Remove partial dependencies on the primary key
Third Normal Form (3NF): Eliminate transitive dependencies between non-key attributes
Physical Modeling#
SQL Implementation: Translate logical models into actual database structures
Data Types: Select appropriate types (INTEGER, VARCHAR, DATE, etc.) for each column
DDL Commands:
CREATE TABLE: Define new tables with columns and constraints
DROP TABLE: Remove existing tables
ALTER TABLE: Modify table structure (add/drop columns, rename tables)
Constraints: Implement rules to maintain data integrity
NOT NULL: Require values in specified columns
UNIQUE: Ensure distinct values across rows
CHECK: Validate data against specific conditions
DEFAULT: Provide fallback values
PRIMARY KEY/FOREIGN KEY: Establish uniqueness and relationships
Database design is an iterative process requiring careful planning and a deep understanding of the data relationships. Proper design ensures data integrity, reduces redundancy, and provides an efficient structure for data storage and retrieval that meets business requirements.
Lab: SQL Data Definition Langauge Practice#
Practice with the Loop of the Recursive Dragon#
Sharpen what you just learned with a chapter-matched review set in the Loop of the Recursive Dragon — an adaptive review game with multiple question types and RPG-style mechanics, built for this book.
Glossary#
Use this list as a quick review sheet for the chapter.
Design process#
Conceptual model — A high-level picture of the data and how it relates, with no technology details.
Logical model — A detailed design that captures attributes, keys, and constraints without committing to a specific DBMS.
Physical model — The design as implemented in a specific database system.
Business rule — A statement that defines or constrains how the business works.
Descriptive process — A design approach that begins from existing data and refines its structure.
Normative process — A design approach that begins from requirements and builds the structure from scratch.
Keys and relationships#
PRIMARY KEY— A constraint that uniquely identifies each row in a table.FOREIGN KEY (...) REFERENCES ...— A constraint that points to a key in another table.col_name datatype PRIMARY KEY— A column definition that declares the primary key while creating the table.Natural key — A real-world attribute, or set of attributes, that uniquely identifies a row.
Surrogate key — An artificial identifier used in place of a natural key.
Normalization#
Normalization — The process of organizing data to reduce redundancy and improve integrity.
First normal form (1NF) — A normalization level that removes repeating groups and keeps values atomic.
Second normal form (2NF) — A normalization level that removes partial dependency on a composite key.
Third normal form (3NF) — A normalization level that removes transitive dependencies between non-key columns.
ERD elements#
Entities (Crow’s foot ERD) — The main objects or concepts drawn as boxes in an ER diagram.
Attributes (Crow’s foot ERD) — The properties listed inside an entity box.
Relationships (Crow’s foot ERD) — The lines that connect entities to show how they relate.
SQL data definition#
CREATE TABLE— Creates a new table in the database.DROP TABLE— Removes a table and all of its data.DROP TABLE IF EXISTS— Removes a table only if it already exists.ALTER TABLE ... ADD COLUMN ...— Adds a new column to an existing table.ALTER TABLE ... DROP COLUMN ...— Removes a column from an existing table.ALTER TABLE ... RENAME TO ...— Renames an existing table.
Constraints#
NOT NULL— A constraint that prevents empty values in a column.UNIQUE— A constraint that requires every value in a column to be distinct.CHECK— A constraint that requires each row to satisfy a condition.
Data types#
INTEGER— Stores whole numbers.REAL— Stores floating-point numbers.TEXT— A variable-length text data type.CHAR(n)— A fixed-length text data type.VARCHAR(n)— A variable-length text data type with a maximum length.DATE— Stores a calendar date.TIME— Stores a time of day.BLOB— A large binary value, such as an image or file.JSONB— PostgreSQL’s binary JSON data type.
Standards#
ANSI — The American National Standards Institute, which helps define SQL standards.
ISO — The International Organization for Standardization, which also publishes SQL standards.