Chapter 7: Schema Abstraction with Views (The Pop Culture Music Archive)

Chapter 7: Schema Abstraction with Views (The Pop Culture Music Archive)#

Database and SQL Through Pop Culture | Brendan Shea, PhD#

This chapter introduces Views and Common Table Expressions (CTEs) in SQLite using a Rolling Stone / Spotify music dataset. You will see how Views simplify queries, support aggregation, and control access to sensitive data, and how CTEs break complex queries into readable pieces. The chapter closes with a look at data governance and GDPR in a music ranking web app.

Learning Outcomes#

By the end of this chapter, you will be able to:

Explain what a view is and when it is useful for presentation, reuse, or access control.
Create and query views that format, filter, and aggregate data.
Use views to limit exposure of sensitive data and provide selective access to results.
Distinguish standard, temporary, and materialized views and describe common use cases.
Explain when data can and cannot be modified through a view.
Use Common Table Expressions to break complex queries into readable steps.
Explain how views, data classification, and data loss prevention support data governance and privacy requirements such as GDPR.

Keywords: SQLite, views, temporary views, materialized views, CTEs, data presentation, aggregation, data governance, data loss prevention, GDPR

Brendan’s Lecture#

from IPython.display import YouTubeVideo
YouTubeVideo('UYlAo8F9wco', width=800, height=500)

Data Dictionary for “Rolling Stone’s Greatest Albums DB.”#

Before diving into views, let’s get familiar with the data. The data dictionary below describes each table. (Author’s note: I built this database manually, starting from Rolling Stone’s greatest-albums lists and adding metadata pulled from the Spotify Application Programming Interface (API).)

Table: `artists`#

This table holds information about the artists featured in the Rolling Stone lists.

Column Name	Data Type	Description
`artist_id`	STRING	Unique identifier for each artist.
`Artist`	TEXT	Name of the artist.
`followers`	INTEGER	Number of followers the artist has on Spotify.
`genres`	TEXT	Comma-separated list of genres associated with the artist.
`popularity`	INTEGER	Spotify popularity score, from 0 to 100.

Table: `albums`#

This table holds information about the albums in the Rolling Stone lists across years.

Column Name	Data Type	Description
`album_id`	STRING	Unique identifier for each album.
`Album`	TEXT	Title of the album.
`artist_id`	STRING	Identifier for the artist who created the album.
`Year`	INTEGER	Year the album was released.
`rank_2003`	INTEGER	Position of the album in the 2003 list.
`rank_2012`	INTEGER	Position of the album in the 2012 list.
`rank_2020`	INTEGER	Position of the album in the 2020 list.
`rank_2023`	INTEGER	Position of the album in the 2023 list.
Foreign Key		FOREIGN KEY (`artist_id`) REFERENCES `artists`(`artist_id`)

Table: `tracks`#

This table holds information about the tracks on each album.

Column Name	Data Type	Description
`track_id`	STRING	Unique identifier for each track.
`track_name`	TEXT	Name of the track.
`album_id`	STRING	Identifier for the album that the track belongs to.
`track_number`	INTEGER	Position of the track within the album.
`duration_ms`	INTEGER	Duration of the track in milliseconds.
Foreign Key		FOREIGN KEY (`album_id`) REFERENCES `albums`(`album_id`)

Examining the “Head” of the Data#

Before trying to design our own queries, let’s take a look at the head of the data (that is, the first few rows). We can do this using LIMIT.

%%sql
SELECT * FROM artists LIMIT 5;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	artist_id	Artist	followers	genres	popularity
0	3koiLjNrgRTNbOwViDipeA	Gaye, Marvin	5313182	['classic soul', 'motown', 'neo soul', 'northern soul', 'quiet storm', 'soul']	71
1	3oDbviiivRWhXwIE8hxkVV	Beach Boys	4499081	['baroque pop', 'classic rock', 'folk rock', 'mellow gold', 'psychedelic rock', 'rock', 'singer-songwriter', 'sunshine pop']	71
2	5hW4L92KnC6dX9t7tYM4Ve	Mitchell, Joni	1461857	['canadian singer-songwriter', 'folk', 'folk rock', 'singer-songwriter']	59
3	7guDJrEfX3qb6FEbdPA5qi	Wonder, Stevie	6427524	['motown', 'soul']	73
4	3WrFJ7ztbogyGnTHbHJFl2	Beatles	27469908	['british invasion', 'classic rock', 'merseybeat', 'psychedelic rock', 'rock']	82

%%sql
SELECT * FROM albums LIMIT 5;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	album_id	Album	artist_id	Year	rank_2003	rank_2012	rank_2020	rank_2023
0	7qD9oSiS3xFJsw2oSV2akp	What's Going On	3koiLjNrgRTNbOwViDipeA	1971	6	6	1	1
1	2CNEkSE8TADXRT2AzcEt1b	Pet Sounds	3oDbviiivRWhXwIE8hxkVV	1966	2	2	2	2
2	1vz94WpXDVYIEGja8cjFNa	Blue	5hW4L92KnC6dX9t7tYM4Ve	1971	30	30	3	3
3	6YUCc2RiXcEKS9ibuZxjt0	Songs in the Key of Life	7guDJrEfX3qb6FEbdPA5qi	1976	57	57	4	4
4	0ETFjACtuP2ADo6LFhL6HN	Abbey Road	3WrFJ7ztbogyGnTHbHJFl2	1969	14	14	5	5

%%sql
SELECT * FROM tracks LIMIT 5;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	track_id	track_name	album_id	track_number	duration_ms
0	4cgLt753cDfsBAbpOcGnZi	What's Going On - Detroit Mix	7qD9oSiS3xFJsw2oSV2akp	1	247840
1	0lz97gXzH4DD822L62kLeK	What's Happening Brother - Detroit Mix	7qD9oSiS3xFJsw2oSV2akp	2	163146
2	0oQV9CKXruLbVUTn76eBnr	Flyin' High (In The Friendly Sky) - Detroit Mix	7qD9oSiS3xFJsw2oSV2akp	3	228693
3	3KAbXV1OSwthoke6BqR0y5	Save The Children - Detroit Mix	7qD9oSiS3xFJsw2oSV2akp	4	241626
4	1J3WjiLbPAdW0mSBTxoHQF	God Is Love - Detroit Mix	7qD9oSiS3xFJsw2oSV2akp	5	106733

A few things stand out from the schema:

String primary keys. artist_id, album_id, and track_id are strings, not integers. Spotify likely uses string identifiers because they are globally unique identifiers (GUIDs) — keys that can be generated independently on different machines without coordination. GUIDs are common in distributed databases where many systems write at once.
Multi-genre artists. The genres column stores several values in one column, comma-separated and wrapped in square brackets like a JSON array. That’s efficient storage, but it means querying genres needs string parsing rather than simple equality.
Ranks across years. The albums table has four ranking columns — one per Rolling Stone list (2003, 2012, 2020, 2023). The lists evolved over time, and this schema captures that history.
Track duration in milliseconds. Storing duration in duration_ms is precise but not human-friendly. We will build a view that converts it to minutes and seconds.

What is a VIEW?#

A view is a virtual table built from a SELECT statement. You can query a view the same way you query a regular table. Behind the scenes, the database runs the saved SELECT each time the view is accessed.

The general syntax for creating a view in SQL is:

CREATE [TEMP | TEMPORARY] VIEW [IF NOT EXISTS] view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

What each piece does:

TEMP or TEMPORARY (optional) — creates a view that only exists for the current session.
IF NOT EXISTS (optional) — avoids an error if a view with that name already exists.
view_name — the name of the view. Same naming rules as a table.
AS — precedes the SELECT that defines the view.
SELECT — the query that produces the view’s rows. It can pull from one or more tables and include conditions, joins, and aggregations.

Once a view exists, you can query it like a table. The result set is regenerated each time the view is accessed, based on the latest data in the underlying tables.

Here’s a simple example of creating a view:

%%sql
DROP VIEW IF EXISTS album_info;

CREATE VIEW album_info AS
SELECT
  Albums.album,
  Albums.year,
  Artists.artist
FROM Albums
  JOIN Artists ON Albums.artist_id = Artists.artist_id;

Running query in 'sqlite:///views_rs_greatest_albums.db'

In this example, we create a view named album_info that combines information from the albums and artists tables. The view includes the Album and Year columns from the albums table and the Artist column from the artists table. The tables are joined based on the artist_id foreign key relationship.

Now, whenever we query the album_info view, it will return the result set as defined by the SELECT statement, providing a convenient way to access the combined album and artist information.

%%sql
SELECT * FROM album_info LIMIT 5;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	Album	Year	Artist
0	What's Going On	1971	Gaye, Marvin
1	Pet Sounds	1966	Beach Boys
2	Blue	1971	Mitchell, Joni
3	Songs in the Key of Life	1976	Wonder, Stevie
4	Abbey Road	1969	Beatles

Views offer several benefits:

They simplify complex queries by hiding them behind a single, reusable name.
They add a layer of abstraction over the underlying tables.
They restrict access to specific columns or rows, which helps with security.
They give applications a consistent, central way to read shared data.

The next few sections explore practical uses of views: presenting data, aggregating data, and securing data — all using the music dataset.

Views for Presenting Data#

Views provide a powerful way to present data in a more user-friendly and meaningful manner. By creating views, you can customize the appearance of the data, combine information from multiple tables, and present it in a way that is easier to understand and interpret. Let’s look at a few examples using the music dataset.

Greatest albums, 2023 version only#

Suppose you want to create a view that displays only the albums that appeared in the 2023 version of Rolling Stone’s greatest albums list. Here’s how you can achieve that:

%%sql
DROP VIEW IF EXISTS greatest_albums_2023;

CREATE VIEW greatest_albums_2023 AS
SELECT
  Albums.album,
  Artists.artist,
  Albums.year,
  Albums.rank_2023
FROM Albums
JOIN Artists ON Albums.artist_id = artists.artist_id
WHERE rank_2023 IS NOT NULL
ORDER BY rank_2023;

Running query in 'sqlite:///views_rs_greatest_albums.db'

In this example, the greatest_albums_2023 view retrieves the Album, Artist, Year, and rank_2023 columns from the albums and artists tables. The tables are joined based on the artist_id foreign key relationship. The WHERE clause filters the results to include only the albums that have a non-null value in the rank_2023 column. Finally, the ORDER BY clause sorts the albums based on their 2023 ranking.

Now, you can query the greatest_albums_2023 view to retrieve the list of albums from the 2023 version of Rolling Stone’s list:

%%sql
SELECT * FROM greatest_albums_2023 LIMIT 10;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	Album	Artist	Year	rank_2023
0	What's Going On	Gaye, Marvin	1971	1
1	Pet Sounds	Beach Boys	1966	2
2	Blue	Mitchell, Joni	1971	3
3	Songs in the Key of Life	Wonder, Stevie	1976	4
4	Abbey Road	Beatles	1969	5
5	Nevermind	Nirvana	1991	6
6	Rumours	Fleetwood Mac	1977	7
7	Purple Rain	Prince and the Revolution	1984	8
8	Blood on the Tracks	Dylan, Bob	1975	9
9	The Miseducation of Lauryn Hill	Hill, Lauryn	1998	10

Presenting track information with reading Minutes and Seconds#

Let’s say you want to present the track information in a more readable format by converting the duration from milliseconds to minutes and seconds. Here’s how you can create a view for that:

%%sql
DROP VIEW IF EXISTS track_info;

CREATE VIEW track_info AS
SELECT
  Tracks.track_name,
  Albums.album,
  Tracks.track_number,
       PRINTF('%d:%02d', duration_ms / 1000 / 60, duration_ms / 1000 % 60) AS duration
FROM Tracks
  JOIN Albums ON Tracks.album_id = Albums.album_id;

Running query in 'sqlite:///views_rs_greatest_albums.db'

In this view, the track_info retrieves the track_name, album_id, track_number, and duration columns from the tracks table. The printf function is used to format the duration_ms column into a string representation of minutes and seconds (MM:SS format). The duration_ms value is divided by 1000 to convert it from milliseconds to seconds, and then the minutes and seconds are extracted using integer division and modulo operations.

You can now query the track_info view to retrieve the track information with the formatted duration:

%%sql
SELECT * FROM track_info LIMIT 5;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	track_name	Album	track_number	duration
0	What's Going On - Detroit Mix	What's Going On	1	4:07
1	What's Happening Brother - Detroit Mix	What's Going On	2	2:43
2	Flyin' High (In The Friendly Sky) - Detroit Mix	What's Going On	3	3:48
3	Save The Children - Detroit Mix	What's Going On	4	4:01
4	God Is Love - Detroit Mix	What's Going On	5	1:46

Albums with biggest change from 2003 to 2023#

To identify the albums that had the biggest change in ranking from 2003 to 2023, you can create a view that calculates the difference between the rankings:

%%sql
DROP VIEW IF EXISTS biggest_rank_change;

CREATE VIEW biggest_rank_change AS
SELECT
  Albums.album,
  Artists.artist,
  Albums.year,
  Albums.rank_2003,
  Albums.rank_2023,
       rank_2003 - rank_2023 AS rank_change
FROM
  Albums
  JOIN Artists ON Albums.artist_id = Artists.artist_id
WHERE
  rank_2003 IS NOT NULL
  AND rank_2023 IS NOT NULL
ORDER BY ABS(rank_change) DESC
LIMIT 10;

Running query in 'sqlite:///views_rs_greatest_albums.db'

In this view, the biggest_rank_change retrieves the Album, Artist, rank_2003, rank_2023, and rank_change columns. The rank_change column is calculated by subtracting the rank_2023 from the rank_2003. The WHERE clause filters the results to include only albums that have non-null values for both rank_2003 and rank_2023. The ORDER BY clause sorts the albums based on the absolute value of the rank_change in descending order. Finally, the LIMIT clause retrieves only the top 10 albums with the biggest rank change.

You can query the biggest_rank_change view to see the albums with the most significant changes in ranking between 2003 and 2023:

%%sql
SELECT * FROM biggest_rank_change

Running query in 'sqlite:///views_rs_greatest_albums.db'

	Album	Artist	Year	rank_2003	rank_2023	rank_change
0	Voodoo	D'Angelo	2000	480	28	452
1	Aquemini	OutKast	1998	500	49	451
2	The Blueprint	Jay-Z	2001	456	50	406
3	Kid A	Radiohead	2000	420	20	400
4	Enter the Wu_Tang: 36 Chambers	Wu Tang Clan	1993	382	27	355
5	Illmatic	Nas	1994	396	44	352
6	Live Through This	Hole	1994	458	106	352
7	Maggot Brain	Funkadelic	1971	478	136	342
8	The Score	Fugees	1996	469	134	335
9	Heart Like a Wheel	Ronstadt, Linda	1975	163	490	-327

This view has some limitations (for example, it doesn’t account for albums that fell out of the rankings altogether). However, it gives a good sense of how critics views have changed over time. For example, many of the albums the had the biggest increases in rank (by D’Angelo, Jay-Z, Radiohead, and Outkast) were still relatively “new” in 2003. However, looking back, it appears they have “stood the test of time.”

Views for Aggregating Data#

In addition to presenting data in a more meaningful way, views can also be used to aggregate data and provide summarized information. Aggregation involves performing calculations or applying functions across multiple rows of data to derive summarized results. Let’s explore a couple of examples of using views for data aggregation in our music database.

Artists with the Most Albums in the Database#

To find out which artists have the most albums in the database, we can create a view that counts the number of albums for each artist and orders the results by the album count in descending order. We’ll also include their 2024 popularity (according to Spotify).

%%sql
DROP VIEW IF EXISTS artist_album_count;

CREATE VIEW Artist_Album_Count AS
SELECT
  Ar.Artist,
  COUNT(*) AS Album_Count,
  Ar.popularity AS '2024_Popularity'
FROM
  Albums Al
  JOIN Artists Ar ON Al.artist_id = Ar.artist_id
GROUP BY Ar.Artist
ORDER BY Album_Count DESC;

Running query in 'sqlite:///views_rs_greatest_albums.db'

The view is named Artist_Album_Count and selects two columns: Ar.Artist (the artist name) and COUNT(*) (the count of albums for each artist).
The FROM clause specifies the Albums table as the main table and joins it with the Artists table using the artist_id foreign key relationship.
The GROUP BY clause groups the results by Ar.Artist, so that the COUNT(*) function calculates the number of albums for each unique artist.
The ORDER BY clause sorts the results based on the Album_Count column in descending order, so artists with the most albums appear at the top.

To retrieve the results, you can simply query the Artist_Album_Count view:

%%sql
SELECT * FROM Artist_Album_Count LIMIT 10;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	Artist	Album_Count	2024_Popularity
0	Dylan, Bob	11	69
1	Rolling Stones	10	77
2	Beatles	10	82
3	Springsteen, Bruce	9	75
4	Who, The	7	65
5	Bowie, David	7	75
6	West, Kanye	6	90
7	Marley, Bob	6	80
8	John, Elton	6	81
9	Young, Neil	5	67

Average Song Length by Artists#

We can also create an average song length by artist, to see which artists have the longest (or shortest songs). For example, let’s create a view that sorts by longest song length

%%sql
DROP VIEW IF EXISTS Average_Song_Length;

CREATE VIEW Average_Song_Length AS
SELECT
  Artists.artist,
  -- Average song length minutes
  ROUND(AVG(Tracks.duration_ms) / 60000,2) AS avg_song_length,
  COUNT(Tracks.track_id) AS num_songs
FROM
  Tracks
  JOIN Albums ON Tracks.album_id = Albums.album_id
  JOIN Artists ON Albums.artist_id = Artists.artist_id
GROUP BY Artists.artist
HAVING num_songs > 5
ORDER BY avg_song_length DESC;

Running query in 'sqlite:///views_rs_greatest_albums.db'

This query creates a view named Average_Song_Length that calculates the average song length in minutes and the number of songs for each artist. The view does six things:

Joins Tracks, Albums, and Artists using their foreign keys (album_id and artist_id).
Computes the average song length per artist by dividing duration_ms by 60,000 (to convert milliseconds to minutes) and rounding to two decimal places. Aliased as avg_song_length.
Counts the number of songs per artist with COUNT(track_id), aliased as num_songs.
Groups the rows by artist name.
Filters with HAVING to keep only artists with more than five songs in the database.
Sorts the result by avg_song_length in descending order.

We can now query the view:

%%sql
SELECT * FROM Average_Song_Length LIMIT 5;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	Artist	avg_song_length	num_songs
0	Allman Brothers Band	11.24	7
1	Yes	9.26	7
2	Davis, Miles	7.58	45
3	War	7.34	6
4	King Sunny Ade	7.15	10

Views for Securing Data#

Views can also serve as a security mechanism. By exposing only certain columns or rows, you can control who sees what — especially when the underlying table contains Personally Identifiable Information (PII).

PII is any data that can identify a specific person, directly or indirectly. Common examples include names, email addresses, phone numbers, Social Security numbers, and financial account numbers.

Protecting PII matters for three reasons:

Legal compliance. Laws like GDPR and HIPAA require organizations to protect personal data. Violations can mean significant fines.
Ethical responsibility. Mishandled PII can lead to identity theft and financial fraud for real people.
Trust and reputation. A breach damages a company’s brand and erodes customer confidence.

By creating views that hide sensitive columns, you can let users access the data they need without exposing the rest.

Fake data table: Musician Info#

First, let’s create a Musician_Info table that includes PII so we can demonstrate the technique:

%%sql
DROP TABLE IF EXISTS Musician_Info;
CREATE TABLE Musician_Info (
  musician_id INTEGER PRIMARY KEY,
  name TEXT,
  email TEXT,
  phone TEXT,
  address TEXT,
  ssn TEXT,
  bank_account TEXT
);

INSERT INTO Musician_Info (name, email, phone, address, ssn, bank_account)
VALUES
  ('Freddie Mercurious', 'freddie@queen.com', '123-456-7890',
    '1 Bohemian Rhapsody Lane', '123-45-6789', '1234567890'),
  ('Elton Johnsmith', 'elton@rocketman.com', '987-654-3210',
    '42 Yellow Brick Road', '987-65-4321', '9876543210');

SELECT * FROM Musician_Info;

Running query in 'sqlite:///views_rs_greatest_albums.db'

2 rows affected.

	musician_id	name	email	phone	address	ssn	bank_account
0	1	Freddie Mercurious	freddie@queen.com	123-456-7890	1 Bohemian Rhapsody Lane	123-45-6789	1234567890
1	2	Elton Johnsmith	elton@rocketman.com	987-654-3210	42 Yellow Brick Road	987-65-4321	9876543210

In this example, the Musician_Info table contains PII columns like ssn and bank_account, which should not be accessible to all users.

Creating Views to Selectively Hide PII#

To protect PII, we can create a view that exposes only the non-sensitive columns:

%%sql
CREATE VIEW Musician_Public_Info AS
SELECT musician_id, name, email, phone, address
FROM Musician_Info;

Running query in 'sqlite:///views_rs_greatest_albums.db'

The Musician_Public_Info view selects only the musician_id, name, email, phone, and address columns from the Musician_Info table, excluding the sensitive PII columns ssn and bank_account.

Now, instead of granting access to the entire Musician_Info table, you can grant users access to only the Musician_Public_Info view. This way, users can retrieve the public information about musicians without being able to see the PII.

%%sql
SELECT * FROM Musician_Public_Info;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	musician_id	name	email	phone	address
0	1	Freddie Mercurious	freddie@queen.com	123-456-7890	1 Bohemian Rhapsody Lane
1	2	Elton Johnsmith	elton@rocketman.com	987-654-3210	42 Yellow Brick Road

The query above will return only the non-sensitive columns defined in the view.

Types of Views and When to Use Them#

SQL supports three types of views: default views, temporary views, and materialized views. Each one serves a different purpose.

Default views#

Default views (also called simple or regular views) are the most common type. They’re created with CREATE VIEW and stored in the database schema alongside the tables. They are recomputed every time they are queried, so they always reflect the current data.

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Example use case. A music streaming app needs a clean view of song details that combines Songs, Albums, and Artists. A default view called Song_Info gives developers a single object to query instead of writing the same join repeatedly.

Temporary views#

Temporary views (also called ephemeral views) are created with CREATE TEMPORARY VIEW. They live in a separate temporary database and are dropped automatically when the connection closes. They are invisible to other connections.

CREATE TEMPORARY VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Example use case. A one-time music festival needs a reporting dashboard built from ticket sales data. Temporary views can hold the complex calculations needed for each report without cluttering the main schema, and they disappear when the festival ends.

Materialized views#

A materialized view stores its result in a separate table. Unlike default views, which run their query each time, a materialized view is refreshed on a schedule or on demand. They are essentially a cached snapshot.

-- not supported natively in SQLite
CREATE MATERIALIZED VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

-- Refresh the materialized view
REFRESH MATERIALIZED VIEW view_name;

Example use case. A music analytics platform reports artist popularity based on millions of streaming events. A materialized view can precompute the popularity metrics so dashboards load quickly. The view gets refreshed nightly to reflect new streams.

SQLite does not have built-in support for materialized views. The workaround is to create a regular table and populate it from a query. Most enterprise-scale systems (PostgreSQL, MySQL, Oracle, SQL Server) support materialized views natively.

Choosing the right type#

A few factors guide the choice:

Complexity of the query. Heavier queries benefit more from caching.
Update frequency. Frequently changing data argues for default views.
Performance requirements. Hot dashboards may need materialized views.
Scope of the data. Session-specific work fits temporary views.

Default views handle most cases. Temporary views are useful for isolated, session-scoped work. Materialized views shine when query cost is high and the underlying data changes slowly.

Modifying Data in Views#

In SQLite, views are read-only by default. You cannot run INSERT, UPDATE, or DELETE directly against a view. (You can drop a view with DROP VIEW, but that removes the view, not its data.)

The reason is that a view is just a saved query. If the query involves joins, aggregations, or calculations, the database often cannot figure out how to translate a change to the view back into changes on the underlying tables.

You can work around this with triggers. A trigger is a database object that runs automatically in response to a specific event — INSERT, UPDATE, or DELETE — on a table or view. Triggers can intercept operations against a view and redirect them to the right rows in the underlying table.

Consider the Musician_Public_Info view from the previous section:

CREATE VIEW Musician_Public_Info AS
SELECT musician_id, name, email, phone, address
FROM Musician_Info;

To allow inserts through this view, you can create a trigger on the view that forwards new rows to the underlying Musician_Info table.

Here’s how to create a trigger that enables INSERT against the Musician_Public_Info view:

import sqlite3

with sqlite3.connect('rs_greatest_albums.db') as conn:
    conn.execute('DROP TRIGGER IF EXISTS Insert_Musician_Public_Info;')
    try:
        conn.execute("""
        INSERT INTO Musician_Public_Info (name, email, phone, address)
        VALUES ('Taylor Drift', 'taylor@example.com', '123-456-7890', '456 Elm St');
        """)
        conn.commit()
    except sqlite3.OperationalError as exc:
        print(f'Expected view error: {exc}')

Expected view error: cannot modify Musician_Public_Info because it is a view

%%sql
DROP TRIGGER IF EXISTS Insert_Musician_Public_Info;

-- Trigger for INSERT operation
CREATE TRIGGER Insert_Musician_Public_Info
INSTEAD OF INSERT ON Musician_Public_Info
FOR EACH ROW
BEGIN
    INSERT INTO Musician_Info (musician_id, name, email, phone, address)
    VALUES (NEW.musician_id, NEW.name, NEW.email, NEW.phone, NEW.address);
END;

Running query in 'sqlite:///views_rs_greatest_albums.db'

Now, we should be able to INSERT data into this table:

%%sql
-- Insert a new musician into the view
INSERT INTO Musician_Public_Info (name, email, phone, address)
VALUES ('Taylor Drift', 'taylor@example.com', '123-456-7890', '456 Elm St');

SELECT * FROM Musician_Public_Info;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	musician_id	name	email	phone	address
0	1	Freddie Mercurious	freddie@queen.com	123-456-7890	1 Bohemian Rhapsody Lane
1	2	Elton Johnsmith	elton@rocketman.com	987-654-3210	42 Yellow Brick Road
2	3	Taylor Drift	taylor@example.com	123-456-7890	456 Elm St

These operations are intercepted by the triggers and applied to the Musician_Info table.

A few things to keep in mind:

Performance. Triggers add overhead and can slow down writes, especially on large datasets or complex views.
Data integrity. Triggers should handle every column the view exposes. Views spanning multiple tables may need several triggers and careful logic to keep the underlying data consistent.
Other databases. Enterprise systems usually have better native support for updatable views than SQLite does.

Common Table Expressions#

A Common Table Expression (CTE) is a temporary named result set that lives inside a single SQL statement. CTEs help you break a complex query into smaller, named parts. They are useful when:

You want to reference the same subquery more than once in a single query.
You want to make a long query easier to read.

Basic syntax#

WITH cte_name AS (
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition
)
SELECT *
FROM cte_name;

How it works:

WITH starts the CTE definition.
cte_name is the name you give the CTE; the main query references it by name.
The parenthesized SELECT is the CTE’s body — it produces the temporary result set.
The main query, after the CTE, uses cte_name as if it were a regular table.

Example: “Time” in album titles, 2000–2020#

Let’s create a CTE for albums released between 2000 and 2020, then filter that CTE for titles that include the word “time”.

%%sql
-- THe Common Table Expression Starts Here
WITH recent_albums AS (
    SELECT
      Albums.album,
      Artists.artist,
      Albums.year
    FROM Albums
    JOIN Artists ON Albums.artist_id = Artists.artist_id
    WHERE Albums.year BETWEEN 2000 AND 2020
)

-- The Main Query Starts Here
SELECT *
FROM recent_albums
WHERE recent_albums.album LIKE '%time%'
ORDER BY year DESC
LIMIT 10;

Running query in 'sqlite:///views_rs_greatest_albums.db'

	album	artist	year
0	Modern Times	Dylan, Bob	2006
1	All Time Greatest Hits	Lynn, Loretta	2002
2	Time (The Revelator)	Welch, Gillian	2001

The query uses a CTE named recent_albums that pulls albums released between 2000 and 2020 along with the corresponding artist names. The CTE is defined with WITH, naming the result set and providing the SELECT that builds it.

The main query then treats recent_albums like a regular table. It selects all the CTE’s columns, filters to album names containing the word “time”, sorts by year in descending order, and limits the output to the top 10 rows.

Using a CTE here pays off in three ways:

It splits the query into smaller, more understandable parts.
It lets the same temporary result set be referenced multiple times within the same query if needed.
It encourages a modular style that makes longer SQL easier to maintain.

Case Study: Data Governance in a Music Ranking Web App#

Imagine a web app called MusicRank that lets users keep personal rankings of their favorite albums and share them with a community of music fans. The features:

Users create accounts, log in, and manage their profiles.
Users search for albums, add them to a personal ranking, and assign scores.
The app combines rankings from all users to produce global popularity scores.
Users connect with friends, share rankings, and discover music through their network.

Behind the scenes, the app stores everything in a database (SQLite for this case study) and uses Python to handle the web layer and the database interactions.

With that kind of platform comes responsibility. Developers have to think carefully about data governance — the rules and practices that keep user data safe, private, and compliant with the law. The rest of this chapter walks through three pieces of that picture:

Data loss prevention policies.
Data classification.
Global regulations like GDPR.

Good data governance is not only about avoiding fines. It is about building an app users can trust.

Graphic: UML for Music Web App#

Data Loss Prevention Policies#

When building a web app like MusicRank, you need ways to protect sensitive data from unauthorized access and accidental loss. That is the job of data loss prevention (DLP).

DLP refers to the strategies, practices, and tools that keep sensitive data from being lost, stolen, or misused. DLP policies help organizations protect their data assets and stay compliant with regulations.

Three DLP areas are worth a closer look in our music app.

1. Data encryption#

Encryption turns plaintext data into coded text that only the right key can decode. It protects data from anyone who manages to read it without permission.

Example. When a user creates an account in MusicRank, their password should be encrypted (hashed, more specifically) before it is stored. Even if the database is leaked, attackers cannot read the original passwords.

In a SQLite-based app, encryption is usually handled in the application layer. Python provides libraries such as cryptography and pycryptodome that supply solid algorithms and key management. The app encrypts sensitive fields before writing them to the database and decrypts them on the way out.

2. Access controls#

Access controls decide who can see which data and what they can do with it. Without access controls, every user could see every record.

Example. MusicRank uses role-based access control (RBAC). Regular users can only view and modify their own rankings. Administrators have extra privileges to manage the platform. Roles map to permissions, and permissions decide who gets through.

Python web frameworks like Flask and Django provide the building blocks: authentication, sessions, and authorization checks. When a request arrives, the framework checks the user’s role and decides whether the request is allowed.

3. Data backups#

Backups are copies of the data made to protect against accidents, hardware failure, and attacks. Regular backups let you recover the system when something goes wrong.

Example. Suppose the MusicRank server crashes and the database file becomes corrupted. With recent backups in place, the team restores from the latest copy and only loses a small window of data.

In a SQLite-based app, backups are typically automated with Python scripts: copy the database file to a separate location on a schedule and store it in secure cloud storage. In a disaster, restoring is as simple as putting the backup back in place.

SQLite leans on Python and the operating system for these capabilities. Larger systems like PostgreSQL and MySQL have richer native support for encryption, access controls, and backup. With Python libraries and good habits, though, a SQLite app can still meet most DLP needs.

Data Classification#

A key step in data governance is classifying data by sensitivity and regulatory requirements. Classification helps you understand what data you hold and what kind of protection it needs.

Four common classification categories matter to a music ranking app:

Personally Identifiable Information (PII). Data that can identify a person directly or indirectly. Examples: names, email addresses, phone numbers, social security numbers. MusicRank collects PII when users sign up — at minimum names, email addresses, and profile pictures. PII needs encryption in transit and at rest, restricted access by role, and clear consent from users about how it will be used.
Protected Health Information (PHI). A subset of PII tied to a person’s health, healthcare services, or healthcare payments. In the United States, PHI is regulated by the Health Insurance Portability and Accountability Act (HIPAA). MusicRank doesn’t directly deal with PHI, but if it ever integrated with a health service or collected any health-related signal, HIPAA requirements would kick in.
Payment Card Industry Data Security Standard (PCI-DSS). A set of rules that protects credit card data during transactions and storage. If MusicRank added e-commerce — concert ticket sales, paid subscriptions — PCI-DSS would apply. Compliance usually means using a vetted third-party payment gateway, encrypting card data, and following strict access-control practices.
Sensitive business data. Internal information that gives the company its edge — proprietary algorithms, partnership agreements, financial details. For MusicRank, this might include the ranking algorithm itself. Protect this category with strong access controls, monitoring, and employee training.

Best practices#

A short checklist for handling classified data in a web app:

Run a data inventory to catalog what you collect.
Write clear classification policies and rules for each category.
Enforce access controls based on roles and data sensitivity.
Encrypt sensitive data both at rest and in transit.
Train every developer and employee on the handling rules.

With classification in place, security controls can be matched to the actual risk of each piece of data.

Discussion Questions#

Why does data governance matter for a web app like MusicRank? What can go wrong without it?
How do encryption, access controls, and backups work together to prevent data loss and unauthorized access? Give a concrete example of each from MusicRank.
Suppose MusicRank adds concert ticket sales. What new data classifications come into play, and how would you handle them?
As a developer, how would you make MusicRank GDPR-compliant? What steps would you take for consent, user rights, and privacy by design?
If MusicRank suffered a data breach, what would you do to limit damage and meet GDPR’s notification requirements?
How would you stay current on data protection laws as MusicRank expands into new countries?
Beyond technical measures, what policies and employee training would help build a culture of privacy and security inside the team?

Chapter Summary#

Views are virtual tables that simplify complex queries and provide a user-friendly interface for data retrieval
Views enable data aggregation, access control, and enhance data security
SQLite supports three types of views: default views, temporary views, and materialized views (simulated)
Common Table Expressions (CTEs) are temporary named result sets within a single SQL statement
CTEs break down complex queries into manageable parts, improving code readability and maintainability
Views and CTEs can be used together to create powerful and efficient database solutions
Data governance, including data loss prevention and compliance with regulations like GDPR, is crucial for web applications handling sensitive user data
Implementing data governance requires a holistic approach involving technical measures, organizational policies, and employee training

Lab: Practice Your SQL#

Practice with the Loop of the Recursive Dragon#

Sharpen what you just learned with a chapter-matched review set in the Loop of the Recursive Dragon — an adaptive review game with multiple question types and RPG-style mechanics, built for this book.

Launch the Chapter 7 review set →

Glossary#

Use this list as a quick review sheet for the chapter.

Views and CTEs#

View — A virtual table whose contents come from a saved query.
CREATE VIEW v AS query — SQL that creates a virtual table from a query result.
DROP VIEW IF EXISTS view_name — SQL that removes a view only if it already exists.
Temporary view — A view that lasts only for the current session or transaction.
Materialized view — A stored copy of a query result that can be refreshed when the underlying data changes.
Common Table Expression (CTE) — A temporary named result set defined inside a single query using WITH.
WITH cte_name AS (inner_query) outer_query — SQL syntax that defines a CTE and then uses it in a later query.

Data governance and security#

Data dictionary — Metadata that describes the structure and constraints of a database.
Data classification — Grouping data by sensitivity, value, or handling rules.
Data governance — The overall management of data availability, usability, integrity, and security.
Data Loss Prevention (DLP) — Practices that stop sensitive data from leaving approved channels.
Encryption — Turning data into a protected form that unauthorized users cannot easily read.
Role-based access control (RBAC) — Restricting access based on the role assigned to a user.
Personally Identifiable Information (PII) — Data that can identify a specific person.
Personal Health Information (PHI) — Health-related information that can be tied to a person.
Informed consent — Permission given by someone who understands the likely effects of agreeing.
Globally unique identifier (GUID) — A 128-bit value used to identify something uniquely across systems.
Distributed database — A database stored across multiple computers rather than on a single machine.

Compliance frameworks#

GDPR — The European Union regulation governing data protection and privacy.
PCI-DSS — The security standard that applies to handling payment card data.
Right to be forgotten — A GDPR right that lets people have their personal data erased in certain cases.
Right to data breach notification — A GDPR right that requires individuals to be told when their personal data has been breached.
Right to rectify — A GDPR right that lets people correct inaccurate personal data.

Chapter 7: Schema Abstraction with Views (The Pop Culture Music Archive)

Contents

Chapter 7: Schema Abstraction with Views (The Pop Culture Music Archive)#

Database and SQL Through Pop Culture | Brendan Shea, PhD#

Learning Outcomes#

Brendan’s Lecture#

Data Dictionary for “Rolling Stone’s Greatest Albums DB.”#

Table: artists#

Table: albums#

Table: tracks#

Examining the “Head” of the Data#

What is a VIEW?#

Views for Presenting Data#

Greatest albums, 2023 version only#

Presenting track information with reading Minutes and Seconds#

Albums with biggest change from 2003 to 2023#

Views for Aggregating Data#

Artists with the Most Albums in the Database#

Average Song Length by Artists#

Views for Securing Data#

Fake data table: Musician Info#

Creating Views to Selectively Hide PII#

Types of Views and When to Use Them#

Default views#

Temporary views#

Materialized views#

Choosing the right type#

Modifying Data in Views#

Common Table Expressions#

Basic syntax#

Example: “Time” in album titles, 2000–2020#

Case Study: Data Governance in a Music Ranking Web App#

Graphic: UML for Music Web App#

Data Loss Prevention Policies#

1. Data encryption#

2. Access controls#

3. Data backups#

Data Classification#

Best practices#

Global Regulations (GDPR)#

Discussion Questions#

Chapter Summary#

Lab: Practice Your SQL#

Practice with the Loop of the Recursive Dragon#

Glossary#

Views and CTEs#

Data governance and security#

Compliance frameworks#

Table: `artists`#

Table: `albums`#

Table: `tracks`#