Chapter 5: Relational Schema Design (The Covert Spy Academy)

Contents

Open In Colab

Chapter 5: Relational Schema Design (The Covert Spy Academy)#

Database and SQL Through Pop Culture | Brendan Shea, PhD

Welcome to the exciting world of database design. In this chapter, we move through the full process of creating a database from scratch. We begin with conceptual modeling, where we identify the key entities, attributes, and relationships in our spy academy database. From there, we move to logical modeling, where keys, normalization, and join tables help us avoid redundancy and protect data integrity. Finally, we move into physical modeling, where we use SQL to create, modify, and delete tables in SQLite.

Learning Outcomes#

By the end of this chapter, you will be able to:

  1. Distinguish conceptual, logical, and physical database design.

  2. Translate business rules into entities, attributes, relationships, and ERDs.

  3. Choose primary keys and foreign keys and resolve many-to-many relationships with join tables.

  4. Normalize tables to 1NF, 2NF, and 3NF by identifying repeating groups, partial dependencies, and transitive dependencies.

  5. Explain the benefits and trade-offs of normalization and avoid common design mistakes.

  6. Choose appropriate data types and constraints, including how SQLite type affinity differs from stricter SQL typing.

  7. Use SQL DDL to create and drop tables with keys and column constraints.

Keywords: database design, conceptual modeling, logical modeling, physical modeling, entities, attributes, relationships, ERD, primary key, foreign key, normalization, 1NF, 2NF, 3NF, join tables, data types, constraints, CREATE TABLE, DROP TABLE

Brendan’s Lecture#

from IPython.display import YouTubeVideo
YouTubeVideo('6C2KECb9ZwE', width=800, height=500)

Conceptual Design at the Covert Academy#

Welcome to the Covert Academy, the world’s premier educational institution for aspiring secret agents! Our state-of-the-art facility is hidden beneath the streets of London and equipped with the latest in espionage technology. We offer a wide range of classes to train our students in the arts of surveillance, infiltration, disguise, and more.

To keep track of our students and classes, we need a well-designed database. Let’s walk through the process of conceptual design for this database.

What is Conceptual Design?#

Conceptual design is the first step in creating a database. It involves formulating business rules, which define how the database should work and what constraints it should have. It also involves creating a preliminary list of entities and attributes, which represent the key concepts and data elements in the system. Finally, it involves creating a preliminary Entity-Relationship Diagram (ERD) to visually represent the entities and their relationships.

Formulating Business Rules#

Business rules are concise, unambiguous statements that define or constrain some aspect of the database. They are derived through a combination of descriptive processes, such as interviewing stakeholders to understand their needs and requirements, and normative processes, which involve making decisions about how the system should work.

For the Covert Academy, we have the following business rules:

  1. Each student must be enrolled in at least one class, but may be enrolled in multiple classes.

  2. Each class must have at least one student enrolled, but may have many students.

To write effective business rules:

  • Keep them concise and unambiguous

  • Ensure they are testable (you can determine if the system is complying with the rule)

  • Involve all relevant stakeholders in formulating and reviewing them

  • Prioritize them (some may be must-haves, others may be nice-to-haves)

Preliminary List of Entities and Attributes#

Based on our business rules, we can identify our preliminary entities, which are the key concepts or objects in our system, and their attributes, which are the data elements that describe each entity.

For the Covert Academy, our entities and attributes are:

Student

  • Name

  • Codename

  • Nationality

  • Specialization

Class

  • Name

  • Description

  • Instructor

  • Location

Note that at this stage, we don’t include data types, identifier columns, or join tables. We’re just focusing on the core entities and their key attributes.

Preliminary ERD#

Finally, we can create a preliminary ERD to visualize our entities and their relationship. Here’s what it looks like for the Covert Academy:

This diagram shows that there is a many-to-many relationship between STUDENT and CLASS (a student can enroll in many classes, a class can have many students).

We’ll refine this ERD in the next step, logical modeling, where we’ll resolve the many-to-many relationship with a join table. But this preliminary ERD gives us a good starting point to visualize our system.

What is Logical Modeling?#

In the previous section, we laid the conceptual foundation for the Covert Academy’s database. Now it’s time to take that conceptual model and transform it into a logical model using the relational model.

Logical modeling is the process of taking a conceptual model and adapting it to fit a specific logical data model. In our case, we’ll be using the relational model, which organizes data into tables (also known as relations) with rows (also known as tuples) and columns (also known as attributes).

In the relational model:

  • Each table should represent a single entity or concept

  • Each row in a table represents a specific instance of that entity

  • Each column in a table represents an attribute of that entity

  • Each table should have a primary key, a unique identifier for each row

Resolving Many-to-Many Relationships#

In our conceptual model, we had a many-to-many relationship between STUDENT and CLASS. However, in the relational model, we can’t directly represent many-to-many relationships. Instead, we need to introduce a join table.

A join table is a table that breaks down a many-to-many relationship into two one-to-many relationships. It does this by having foreign keys to both of the original tables.

For the Covert Academy, our join table will be called ENROLLMENT. It will have the following structure:

ENROLLMENT

  • StudentID (Foreign Key to STUDENT)

  • ClassID (Foreign Key to CLASS)

  • EnrollmentDate

Now, instead of a direct many-to-many relationship, we have:

  • A one-to-many relationship between STUDENT and ENROLLMENT

  • A one-to-many relationship between CLASS and ENROLLMENT

Choosing Primary Keys#

Each table in our database needs a primary key. A primary key is a unique identifier for each row in a table. There are two main options for primary keys:

  1. Natural Key: A natural key is a key that uses one of the existing attributes of the entity. For example, we could use a student’s email as the primary key for the STUDENT table. Natural keys can be convenient because they don’t require an additional column. However, they can also be problematic if the natural key ever needs to change.

  2. Surrogate Key: A surrogate key is an artificial key that is created specifically to be the primary key. It’s usually a simple integer or a universally unique identifier (UUID). Surrogate keys are often preferable because they are guaranteed to be unique and they never need to change.

For the Covert Academy, we’ll use surrogate keys for all of our tables. We’ll call these ID columns.

Updated ERD#

With these changes in mind, here’s our updated ERD:

The tables are as follows:

STUDENT (ID (Primary Key), Name, Codename, Nationality, Specialization)

CLASS (ID (Primary Key), Name, Description, Instructor, Location)

ENROLLMENT (StudentID (Foreign Key to STUDENT), ClassID (Foreign Key to CLASS), EnrollmentDate)

Logical Modeling: Normalization#

Normalization helps you organize tables so each fact is stored once and relationships stay clear.

Why normalize?#

A single table that mixes agents, missions, and handler contacts looks convenient, but it creates:

  1. Redundant data

  2. Update anomalies

  3. Insertion anomalies

  4. Deletion anomalies

Using the SPECTRE Academy example, repeating handler phone numbers or storing many skills in one field makes the design hard to maintain.

First Normal Form (1NF)#

A table is in 1NF when:

  1. Each column holds one value per row.

  2. Repeating groups are removed.

  3. Each row can be identified with a key.

Before: an Agents table stores Skills as "Espionage, Combat, Driving".

After: split the design into:

  • AGENT(AgentID, Name, Codename)

  • AGENT_SKILLS(AgentID, Skill)

That change makes each skill searchable, sortable, and easy to validate.

1NF Checklist and Tradeoffs#

Common warning signs:

  • comma-separated lists in one column

  • contact fields that mix phone numbers and emails

  • location or address fields that hide multiple facts in one string

Benefits of 1NF:

  • simpler queries

  • cleaner updates

  • better indexing

  • stronger data integrity

Sometimes teams denormalize for reporting or performance, but that should be a deliberate choice with a clear reason.

Second Normal Form (2NF)#

Once a database achieves First Normal Form, the next step is to evaluate it for Second Normal Form compliance. Second Normal Form builds upon 1NF by addressing a specific type of data dependency.

A table is in Second Normal Form when:

  1. The table is already in First Normal Form.

  2. Every non-key attribute in the table is fully functionally dependent on the entire primary key.

The concept of functional dependency means that one attribute’s value determines another attribute’s value. A full functional dependency exists when an attribute depends on the entire primary key, not just part of it.

Recognizing Partial Dependencies#

Let’s examine a table from our spy training academy that tracks agent certifications:

AgentID

SkillID

CertificationDate

SkillName

SkillDifficulty

007

S1

2023-01-15

Lockpicking

Advanced

007

S2

2023-02-01

Surveillance

Expert

008

S1

2023-03-10

Lockpicking

Advanced

In this table, the primary key consists of both AgentID and SkillID together. Let’s analyze the dependencies:

  1. CertificationDate depends on both AgentID and SkillID.

  2. SkillName depends only on SkillID.

  3. SkillDifficulty depends only on SkillID.

The partial dependencies in this table violate Second Normal Form because SkillName and SkillDifficulty depend only on SkillID, which is just part of the primary key.

Converting to 2NF#

To achieve Second Normal Form, we separate the attributes that depend on only part of the primary key into their own table:

AGENT_CERTIFICATIONS table:

AgentID

SkillID

CertificationDate

007

S1

2023-01-15

007

S2

2023-02-01

008

S1

2023-03-10

SKILLS table:

SkillID

SkillName

SkillDifficulty

S1

Lockpicking

Advanced

S2

Surveillance

Expert

Another 2NF Example#

Consider a table tracking mission equipment assignments:

MissionID

AgentID

EquipmentID

MissionLocation

EquipmentName

EquipmentType

M1

007

E1

Venice

Laser Watch

Gadget

M1

007

E2

Venice

Grappling Hook

Tool

M2

008

E1

Paris

Laser Watch

Gadget

This table has several problems:

  1. MissionLocation depends only on MissionID.

  2. EquipmentName and EquipmentType depend only on EquipmentID.

To achieve 2NF, we create three separate tables:

MISSION_EQUIPMENT table:

MissionID

AgentID

EquipmentID

M1

007

E1

M1

007

E2

M2

008

E1

MISSIONS table:

MissionID

MissionLocation

M1

Venice

M2

Paris

EQUIPMENT table:

EquipmentID

EquipmentName

EquipmentType

E1

Laser Watch

Gadget

E2

Grappling Hook

Tool

Benefits of Second Normal Form#

Moving to Second Normal Form provides several important benefits:

  1. Data consistency improves because information about each entity is stored in only one place.

  2. Update operations become more reliable because related data is properly grouped together.

  3. Storage efficiency increases because the database eliminates redundant data storage.

  4. Data modifications become safer because changes to one aspect of the data won’t affect unrelated aspects.

Common 2NF Mistakes to Avoid#

Database designers should watch for these common issues when working with Second Normal Form:

  1. Composite keys are sometimes overlooked when analyzing dependencies, leading to incomplete normalization.

  2. Transitive dependencies might be missed when focusing only on partial dependencies.

  3. Over-normalization can occur when splitting tables unnecessarily, making the database harder to understand.

In the next section, we’ll explore Third Normal Form and examine how it addresses transitive dependencies to further improve database design.

Third Normal Form and Beyond#

Third Normal Form (3NF) removes transitive dependencies. A transitive dependency happens when a non-key field depends on another non-key field instead of directly on the key.

Example:

AGENT(AgentID, Name, HandlerID, HandlerName, HandlerRank)

Here, HandlerName and HandlerRank really depend on HandlerID, not AgentID.

To fix it, split the design into:

  • AGENTS(AgentID, Name, HandlerID)

  • HANDLERS(HandlerID, HandlerName, HandlerRank)

This reduces duplication and keeps handler updates in one place.

Practical Normalization Strategy#

Use this workflow when reviewing a design:

  1. Get each table into 1NF.

  2. Remove partial dependencies to reach 2NF.

  3. Remove transitive dependencies to reach 3NF.

  4. Stop and evaluate before adding more complexity.

Beyond 3NF:

  • BCNF strengthens the key rules.

  • 4NF handles multivalued dependencies.

  • 5NF handles rare join-dependency problems.

In practice, most production schemas stop at 3NF and only denormalize later for a measured performance or reporting need.

Quiz: Normalize This#

Interactive activity: Open this standalone page to use the activity in the Jupyter Book or from Colab.

Open the database normalization quiz

Physical Modeling - Bringing the Database to Life#

We’ve conceptually designed our database and transformed it into a logical model using the relational model and normalization techniques. Now it’s time for the exciting part - actually creating our database!

Physical modeling is the process of taking the logical model and implementing it in a specific database management system (DBMS). This involves defining the actual tables, columns, data types, and constraints in the database using the Data Definition Language (DDL) of the chosen DBMS.

For the Covert Academy’s database, we’ll be using SQLite (which is what we’ve used in the class so far).

SQLite is a lightweight, file-based DBMS. It’s a popular choice for many applications because:

  1. It’s serverless (the database is just a file)

  2. It’s self-contained (no external dependencies)

  3. It’s cross-platform (works on all major operating systems)

  4. It’s open-source and free to use

SQLite, like most modern DBMSs, uses SQL (Structured Query Language) for defining and manipulating databases.

ANSI SQL Data Types: Big Picture#

The ANSI/ISO SQL standard gives database systems a shared vocabulary for types, but each DBMS implements that standard a little differently.

Common type families#

  • Numeric: INTEGER, SMALLINT, BIGINT, DECIMAL, REAL

  • Character: CHAR, VARCHAR, TEXT

  • Temporal: DATE, TIME, TIMESTAMP

  • Binary: BINARY, VARBINARY, BLOB

  • Other common types: BOOLEAN, JSON, UUID

Why the standard matters#

  • improves portability between database systems

  • helps teams talk about schema choices precisely

  • makes column intent clearer during design

A good rule is to choose the smallest type that accurately represents the data and matches how you plan to query it.

SQLite Type Affinity in Practice#

SQLite is more flexible than many other databases. Instead of enforcing types rigidly, it uses five storage classes:

  • NULL

  • INTEGER

  • REAL

  • TEXT

  • BLOB

Declared column types map to type affinity:

  • INTEGER affinity

  • TEXT affinity

  • REAL affinity

  • BLOB affinity

  • NUMERIC affinity

That means SQLite will often convert values when it can. For example, '8' may be stored as an integer in a numeric column, while 7 inserted into a text column may be stored as '7'.

The design takeaway is simple: declare types clearly even in SQLite, because the schema communicates your intent to both the database and future readers.

Practical Database Design for Beginners#

When you’re creating your first database for an intelligence operation (or any application), choosing the right data types is one of the most important decisions you’ll make. Here are some straightforward guidelines:

Choosing the Right Data Type#

Keep it simple when starting out. Begin with these basic types for most of your needs:

  • INTEGER for IDs, counts, and whole numbers

  • TEXT for names, descriptions, and any text content

  • REAL for measurements and calculations

  • DATE/TIME for timestamps and calendar dates

  • BOOLEAN for yes/no or true/false situations

Match the data type to what you’re actually storing. Think about:

  • What kind of information is this? (number, text, date, etc.)

  • How large could this value get? (small integers vs. large integers)

  • Do I need decimal precision? (whole numbers vs. decimals)

  • Will I need to sort or calculate with this field?

Examples of Basic Data Type Decisions#

For an agent management system, you might choose:

  • agent_id: INTEGER - Simple, efficient for primary keys

  • code_name: TEXT - Variable length names like “Bond” or “Black Widow”

  • clearance_level: INTEGER - Simple numbers from 1-10

  • active_status: BOOLEAN - Either active (1) or inactive (0)

  • recruitment_date: DATE - When they joined

For mission tracking:

  • mission_id: INTEGER - Sequential identifier

  • mission_name: TEXT - Operation names like “Skyfall”

  • location: TEXT - “London, UK” or “Abandoned Warehouse”

  • start_date: DATE - When the mission begins

  • budget: DECIMAL(10,2) - Precise financial tracking (e.g., $250,000.75)

SQLite’s Simple Approach for Beginners#

SQLite makes data types easier for beginners. Unlike other databases, SQLite is flexible about data types. This means:

  • You can store a number in a TEXT column (it will convert automatically)

  • You can store text in an INTEGER column (it will try to convert if possible)

  • You don’t need to worry as much about exact type matching

This table shows what actually happens when you insert different values into SQLite columns:

Declared Column Type

Value Inserted

What Actually Gets Stored

INTEGER

42

INTEGER (42)

INTEGER

“42”

INTEGER (42)

INTEGER

“Agent 42”

TEXT (“Agent 42”)

TEXT

42

TEXT (“42”)

TEXT

“007”

TEXT (“007”)

REAL

99

REAL (99.0)

REAL

“99.7”

REAL (99.7)

Common Beginner Mistakes to Avoid#

Using TEXT for everything. While SQLite allows this, it’s not a good practice because:

  • You lose the ability to do proper numeric sorting (TEXT “10” comes before “2”)

  • Mathematical operations won’t work properly

  • It uses more storage space than necessary

Choosing types that are too small. For example:

  • Using SMALLINT for agent IDs when you might eventually have more than 32,767 agents

  • Using VARCHAR(20) for names when some might be longer

Ignoring date formatting. When storing dates as TEXT, they should be in YYYY-MM-DD format for proper sorting and comparison.

Forgetting about NULL values. Decide whether each column should allow NULL values (unknown/missing information) or require data.

Simple Performance Tips#

Make smart choices for smoother operations:

  • Use INTEGER for primary keys and foreign keys when possible

  • Use appropriate field sizes (don’t use TEXT for single-character fields)

  • Put your most frequently searched fields first in your table design

  • Consider adding indexes to columns you search often

When to Keep It Simple vs. When to Be Specific#

For most beginner projects, stick to these guidelines:

  • Keep it simple when:

    • Building your first database

    • Creating a prototype

    • Working on small projects

    • The data structure is unlikely to change much

  • Be more specific when:

    • Dealing with financial or scientific data

    • Building systems that will grow very large

    • Working with multiple connected systems

    • Performance is critical to your application

Conclusion: Data Types#

Understanding data types is an important first step in database design. For beginners, start with the basic types (INTEGER, TEXT, REAL) and gradually introduce more specific types as your needs and skills grow.

Remember that SQLite’s flexible approach makes it forgiving for beginners, but developing good habits around data type selection will serve you well as you advance to more complex database systems and larger projects. The right data types ensure your intelligence operations (or any application) run efficiently, maintain data integrity, and scale effectively as your needs grow.

The CREATE TABLE Statement#

Now that we know about data types, we can start creating our tables! In SQL, we use the CREATE TABLE statement for this.

The general syntax is:

CREATE TABLE table_name (
    column1 datatype constraint,
    column2 datatype constraint,
    ...
    PRIMARY KEY (one or more columns)
);

For example, let’s create the STUDENT table:

%%sql
DROP TABLE IF EXISTS STUDENT;

CREATE TABLE STUDENT (
    ID INTEGER PRIMARY KEY,  -- The primary key, an auto-incrementing integer
    Name VARCHAR(100) NOT NULL,  -- The student's name, cannot be NULL
    Codename VARCHAR(50),  -- The student's codename, can be NULL
    Nationality VARCHAR(50),  -- The student's nationality
    Specialization VARCHAR(100)  -- The student's specialization
);
Running query in 'sqlite:///design_covert_academy.db'

This creates a table named STUDENT with five columns. Note the use of ANSI standard data types and the comments explaining each column.

We can create the CLASS and ENROLLMENT tables similarly:

%%sql
DROP TABLE IF EXISTS CLASS;
DROP TABLE IF EXISTS ENROLLMENT;

CREATE TABLE CLASS (
    ID INTEGER PRIMARY KEY,  -- The primary key
    Name VARCHAR(100) NOT NULL,  -- The class name, cannot be NULL
    Description VARCHAR(200),  -- The class description
    Instructor VARCHAR(100),  -- The class instructor
    Location VARCHAR(100)  -- The class location
);

CREATE TABLE ENROLLMENT (
    StudentID INTEGER,  -- Foreign key to STUDENT table
    ClassID INTEGER,  -- Foreign key to CLASS table
    EnrollmentDate DATE,  -- The date of enrollment
    PRIMARY KEY (StudentID, ClassID),  -- Composite primary key
    FOREIGN KEY (StudentID) REFERENCES STUDENT(ID),  -- Foreign key constraint
    FOREIGN KEY (ClassID) REFERENCES CLASS(ID)  -- Foreign key constraint
);
Running query in 'sqlite:///design_covert_academy.db'

Note how in the ENROLLMENT table, we define a composite primary key (StudentID, ClassID) and also specify the foreign key constraints. Let’s now use PRAGMA to see how our tables appear:

%%sql
PRAGMA table_info(STUDENT);
Running query in 'sqlite:///design_covert_academy.db'
cid name type notnull dflt_value pk
0 ID INTEGER 0 None 1
1 Name VARCHAR(100) 1 None 0
2 Codename VARCHAR(50) 0 None 0
3 Nationality VARCHAR(50) 0 None 0
4 Specialization VARCHAR(100) 0 None 0
%%sql
PRAGMA table_info(CLASS);
Running query in 'sqlite:///design_covert_academy.db'
cid name type notnull dflt_value pk
0 ID INTEGER 0 None 1
1 Name VARCHAR(100) 1 None 0
2 Description VARCHAR(200) 0 None 0
3 Instructor VARCHAR(100) 0 None 0
4 Location VARCHAR(100) 0 None 0
%%sql
PRAGMA table_info(ENROLLMENT);
Running query in 'sqlite:///design_covert_academy.db'
cid name type notnull dflt_value pk
0 StudentID INTEGER 0 None 1
1 ClassID INTEGER 0 None 2
2 EnrollmentDate DATE 0 None 0

Column Constraints#

Before we start inserting data into our database, let’s discuss some important concepts that help maintain the integrity and consistency of our data: constraints.

Column constraints are rules applied to individual columns in a table. They restrict the type of data that can be stored in a column. The most common constraints are:

  1. CHECK: Ensures that a column’s value satisfies a boolean expression.

  2. DEFAULT: Specifies a default value for a column when no value is provided.

  3. NOT NULL: Ensures that a column cannot have a NULL value.

  4. UNIQUE: Ensures that each value in a column is unique across the whole table.

Let’s see how we can apply these constraints to our tables.

The DROP TABLE Statement#

But first, let’s discuss how to drop a table. The DROP TABLE statement is used to remove a table definition and all its data. The syntax is simple:

DROP TABLE table_name;

This statement is irreversible, so use it with caution!

Now, let’s recreate our tables with some constraints.

Recreating the STUDENT Table#

%%sql
DROP TABLE IF EXISTS STUDENT;  -- Drop the table if it already exists

CREATE TABLE STUDENT (
    ID INTEGER PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Codename VARCHAR(50) UNIQUE,  -- Codenames must be unique
    Nationality VARCHAR(50) DEFAULT 'Unknown',  -- Default nationality is 'Unknown'
    Specialization VARCHAR(100),
    Age INTEGER CHECK (Age >= 18)  -- Students must be at least 18 years old
);
Running query in 'sqlite:///design_covert_academy.db'

Here, we’ve added a UNIQUE constraint to the Codename column, a DEFAULT constraint to the Nationality column, and a CHECK constraint to ensure that all students are at least 18 years old.

Recreating the CLASS Table#

For the CLASS table, we’ve added a NOT NULL constraint to the Instructor column and a CHECK constraint to ensure that the StartDate is always before the EndDate.

%%sql
DROP TABLE IF EXISTS CLASS;

CREATE TABLE CLASS (
    ID INTEGER PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Description VARCHAR(200),
    Instructor VARCHAR(100) NOT NULL,  -- Every class must have an instructor
    Location VARCHAR(100),
    StartDate DATE,
    EndDate DATE,
    CHECK (StartDate < EndDate)  -- The start date must be before the end date
);
Running query in 'sqlite:///design_covert_academy.db'

Recreating the ENROLLMENT Table#

For the ENROLLMENT table, we’ve added a DEFAULT constraint to set the EnrollmentDate to the current date if no value is provided.

%%sql
DROP TABLE IF EXISTS ENROLLMENT;

CREATE TABLE ENROLLMENT (
    StudentID INTEGER,
    ClassID INTEGER,
    EnrollmentDate DATE DEFAULT CURRENT_DATE,  -- Default is the current date
    PRIMARY KEY (StudentID, ClassID),
    FOREIGN KEY (StudentID) REFERENCES STUDENT(ID),
    FOREIGN KEY (ClassID) REFERENCES CLASS(ID)
);
Running query in 'sqlite:///design_covert_academy.db'

And there we have it! Our Covert Academy database is now set up with constraints to ensure data integrity.

ERD: Physical Model#

The DROP TABLE IF EXISTS Syntax#

One more thing to note: in our previous examples, we used the syntax DROP TABLE IF EXISTS instead of just DROP TABLE. What does this do?

DROP TABLE IF EXISTS is a safer way to drop tables. If the specified table doesn’t exist, DROP TABLE would normally throw an error. However, with IF EXISTS, the statement doesn’t throw an error if the table doesn’t exist; it simply does nothing.

This is useful in scripts where you want to ensure that a table doesn’t exist before creating it, but you don’t want the script to fail if the table doesn’t exist to begin with.

ALTER TABLE: Modifying Your Database Structure#

As your database evolves, you may find yourself needing to modify the structure of existing tables. Maybe you need to add a new column, change a column’s data type, or add a new constraint. This is where the ALTER TABLE statement comes in.

The ALTER TABLE statement allows you to change the structure of an existing table without deleting and recreating it. This is useful because it preserves the existing data in the table.

Adding a New Column#

To add a new column to an existing table, you use the ADD COLUMN clause:

ALTER TABLE table_name
ADD COLUMN new_column_name datatype constraint;

For example, let’s say we want to add a ‘DateOfBirth’ column to our STUDENT table:

%%sql
ALTER TABLE STUDENT
ADD COLUMN DateOfBirth DATE;
Running query in 'sqlite:///design_covert_academy.db'

We can know view the table schema to ensure it has been added.

%%sql
PRAGMA table_info(STUDENT);
Running query in 'sqlite:///design_covert_academy.db'
cid name type notnull dflt_value pk
0 ID INTEGER 0 None 1
1 Name VARCHAR(100) 1 None 0
2 Codename VARCHAR(50) 0 None 0
3 Nationality VARCHAR(50) 0 'Unknown' 0
4 Specialization VARCHAR(100) 0 None 0
5 Age INTEGER 0 None 0
6 DateOfBirth DATE 0 None 0

Dropping a Column#

We can drop (remove) a column using the drop table keyword:

ALTER TABLE table_name
DROP COLUMN column_name

Let’s now drop the DateOfBirth column.

%%sql
ALTER TABLE STUDENT
DROP COLUMN DateOfBirth;
Running query in 'sqlite:///design_covert_academy.db'

Renaming Tables and Columns#

We can also rename tables and columns. For examples, let’s suppose we prefer table names like Students, Classes, and Enrollments.

%%sql
ALTER TABLE STUDENT
RENAME TO Students;
Running query in 'sqlite:///design_covert_academy.db'
%%sql
ALTER TABLE CLASS
RENAME TO Classes;
Running query in 'sqlite:///design_covert_academy.db'
%%sql
ALTER TABLE ENROLLMENT
RENAME TO Enrollments;
Running query in 'sqlite:///design_covert_academy.db'

Modifying a Column’s Definition#

To change a column’s data type in databases like Oracle, MySQL, or Postgres, you use the MODIFY COLUMN clause (or ALTER COLUMN in other databases):

ALTER TABLE table_name
MODIFY COLUMN column_name new_datatype;

For example, if you wanted to change the ‘Specialization’ column in the STUDENT table to have a maximum length of 200 characters:

ALTER TABLE STUDENT
MODIFY COLUMN Specialization VARCHAR(200);

You can also do something similar in these database to add constraints:

ALTER TABLE STUDENT
ADD CONSTRAINT CHK_Age CHECK (Age < 100);

However, SQLite has limited support for ALTER TABLE. While you can rename, add, or drop columns, you can’t directly modify a column’s datatype or constraints. To achieve this in SQLite, you need to use a workaround:

  1. Create a new column with the desired constraints (or new data type).

  2. Copy the data from the old column to the new column.

  3. Drop the old column.

  4. Rename the new column to the original name

For example, here’s how we can add a constraint to make sure age is between 0 and 100.

%%sql
-- Step 1: Add a new column with the desired constraints
ALTER TABLE Students ADD COLUMN Age_NEW INTEGER CHECK (Age_NEW < 100);

-- Step 2: Copy the data from the old column to the new column
-- Not needed here, since our table is empty!
UPDATE Students SET Age_NEW = Age;

-- Step 3: Drop the old column
ALTER TABLE Students DROP COLUMN Age;

-- Step 4: Rename the new column to the original name
ALTER TABLE Students RENAME COLUMN Age_NEW TO Age;
Running query in 'sqlite:///design_covert_academy.db'

Chapter Summary#

Database design is a structured process that unfolds in three main phases:

Conceptual Modeling#

  • Business Rules: Establish clear, testable statements that define database constraints and behavior

  • Entity Identification: Identify key entities (like students and classes) and their attributes

  • Preliminary ERD: Create a visual representation of entities and their relationships

Logical Modeling#

  • Relational Model: Organize data into tables with rows and columns

  • Keys: Implement primary keys (unique identifiers) and foreign keys (relationship links)

  • Many-to-Many Relationships: Resolve using join tables (e.g., ENROLLMENT connecting STUDENT and CLASS)

  • Normalization:

    • First Normal Form (1NF): Eliminate repeating groups and ensure atomic values

    • Second Normal Form (2NF): Remove partial dependencies on the primary key

    • Third Normal Form (3NF): Eliminate transitive dependencies between non-key attributes

Physical Modeling#

  • SQL Implementation: Translate logical models into actual database structures

  • Data Types: Select appropriate types (INTEGER, VARCHAR, DATE, etc.) for each column

  • DDL Commands:

    • CREATE TABLE: Define new tables with columns and constraints

    • DROP TABLE: Remove existing tables

    • ALTER TABLE: Modify table structure (add/drop columns, rename tables)

  • Constraints: Implement rules to maintain data integrity

    • NOT NULL: Require values in specified columns

    • UNIQUE: Ensure distinct values across rows

    • CHECK: Validate data against specific conditions

    • DEFAULT: Provide fallback values

    • PRIMARY KEY/FOREIGN KEY: Establish uniqueness and relationships

Database design is an iterative process requiring careful planning and a deep understanding of the data relationships. Proper design ensures data integrity, reduces redundancy, and provides an efficient structure for data storage and retrieval that meets business requirements.

Lab: SQL Data Definition Langauge Practice#

Practice with the Loop of the Recursive Dragon#

Sharpen what you just learned with a chapter-matched review set in the Loop of the Recursive Dragon — an adaptive review game with multiple question types and RPG-style mechanics, built for this book.

Launch the Chapter 5 review set →

Glossary#

Use this list as a quick review sheet for the chapter.

Design process#

  • Conceptual model — A high-level picture of the data and how it relates, with no technology details.

  • Logical model — A detailed design that captures attributes, keys, and constraints without committing to a specific DBMS.

  • Physical model — The design as implemented in a specific database system.

  • Business rule — A statement that defines or constrains how the business works.

  • Descriptive process — A design approach that begins from existing data and refines its structure.

  • Normative process — A design approach that begins from requirements and builds the structure from scratch.

Keys and relationships#

  • PRIMARY KEY — A constraint that uniquely identifies each row in a table.

  • FOREIGN KEY (...) REFERENCES ... — A constraint that points to a key in another table.

  • col_name datatype PRIMARY KEY — A column definition that declares the primary key while creating the table.

  • Natural key — A real-world attribute, or set of attributes, that uniquely identifies a row.

  • Surrogate key — An artificial identifier used in place of a natural key.

Normalization#

  • Normalization — The process of organizing data to reduce redundancy and improve integrity.

  • First normal form (1NF) — A normalization level that removes repeating groups and keeps values atomic.

  • Second normal form (2NF) — A normalization level that removes partial dependency on a composite key.

  • Third normal form (3NF) — A normalization level that removes transitive dependencies between non-key columns.

ERD elements#

  • Entities (Crow’s foot ERD) — The main objects or concepts drawn as boxes in an ER diagram.

  • Attributes (Crow’s foot ERD) — The properties listed inside an entity box.

  • Relationships (Crow’s foot ERD) — The lines that connect entities to show how they relate.

SQL data definition#

  • CREATE TABLE — Creates a new table in the database.

  • DROP TABLE — Removes a table and all of its data.

  • DROP TABLE IF EXISTS — Removes a table only if it already exists.

  • ALTER TABLE ... ADD COLUMN ... — Adds a new column to an existing table.

  • ALTER TABLE ... DROP COLUMN ... — Removes a column from an existing table.

  • ALTER TABLE ... RENAME TO ... — Renames an existing table.

Constraints#

  • NOT NULL — A constraint that prevents empty values in a column.

  • UNIQUE — A constraint that requires every value in a column to be distinct.

  • CHECK — A constraint that requires each row to satisfy a condition.

Data types#

  • INTEGER — Stores whole numbers.

  • REAL — Stores floating-point numbers.

  • TEXT — A variable-length text data type.

  • CHAR(n) — A fixed-length text data type.

  • VARCHAR(n) — A variable-length text data type with a maximum length.

  • DATE — Stores a calendar date.

  • TIME — Stores a time of day.

  • BLOB — A large binary value, such as an image or file.

  • JSONB — PostgreSQL’s binary JSON data type.

Standards#

  • ANSI — The American National Standards Institute, which helps define SQL standards.

  • ISO — The International Organization for Standardization, which also publishes SQL standards.