Chapter 7: Schema Abstraction with Views (The Pop Culture Music Archive)#
Database and SQL Through Pop Culture | Brendan Shea, PhD#
This chapter introduces Views and Common Table Expressions (CTEs) in SQLite using a Rolling Stone / Spotify music dataset. You will see how Views simplify queries, support aggregation, and control access to sensitive data, and how CTEs break complex queries into readable pieces. The chapter closes with a look at data governance and GDPR in a music ranking web app.
Learning Outcomes#
By the end of this chapter, you will be able to:
Explain what a view is and when it is useful for presentation, reuse, or access control.
Create and query views that format, filter, and aggregate data.
Use views to limit exposure of sensitive data and provide selective access to results.
Distinguish standard, temporary, and materialized views and describe common use cases.
Explain when data can and cannot be modified through a view.
Use Common Table Expressions to break complex queries into readable steps.
Explain how views, data classification, and data loss prevention support data governance and privacy requirements such as GDPR.
Keywords: SQLite, views, temporary views, materialized views, CTEs, data presentation, aggregation, data governance, data loss prevention, GDPR
Brendan’s Lecture#
from IPython.display import YouTubeVideo
YouTubeVideo('UYlAo8F9wco', width=800, height=500)
Data Dictionary for “Rolling Stone’s Greatest Albums DB.”#
Before diving into views, let’s get familiar with the data. The data dictionary below describes each table. (Author’s note: I built this database manually, starting from Rolling Stone’s greatest-albums lists and adding metadata pulled from the Spotify Application Programming Interface (API).)
Table: artists#
This table holds information about the artists featured in the Rolling Stone lists.
Column Name |
Data Type |
Description |
|---|---|---|
|
STRING |
Unique identifier for each artist. |
|
TEXT |
Name of the artist. |
|
INTEGER |
Number of followers the artist has on Spotify. |
|
TEXT |
Comma-separated list of genres associated with the artist. |
|
INTEGER |
Spotify popularity score, from 0 to 100. |
Table: albums#
This table holds information about the albums in the Rolling Stone lists across years.
Column Name |
Data Type |
Description |
|---|---|---|
|
STRING |
Unique identifier for each album. |
|
TEXT |
Title of the album. |
|
STRING |
Identifier for the artist who created the album. |
|
INTEGER |
Year the album was released. |
|
INTEGER |
Position of the album in the 2003 list. |
|
INTEGER |
Position of the album in the 2012 list. |
|
INTEGER |
Position of the album in the 2020 list. |
|
INTEGER |
Position of the album in the 2023 list. |
Foreign Key |
FOREIGN KEY ( |
Table: tracks#
This table holds information about the tracks on each album.
Column Name |
Data Type |
Description |
|---|---|---|
|
STRING |
Unique identifier for each track. |
|
TEXT |
Name of the track. |
|
STRING |
Identifier for the album that the track belongs to. |
|
INTEGER |
Position of the track within the album. |
|
INTEGER |
Duration of the track in milliseconds. |
Foreign Key |
FOREIGN KEY ( |
Examining the “Head” of the Data#
Before trying to design our own queries, let’s take a look at the head of the data (that is, the first few rows). We can do this using LIMIT.
%%sql
SELECT * FROM artists LIMIT 5;
| artist_id | Artist | followers | genres | popularity | |
|---|---|---|---|---|---|
| 0 | 3koiLjNrgRTNbOwViDipeA | Gaye, Marvin | 5313182 | ['classic soul', 'motown', 'neo soul', 'northern soul', 'quiet storm', 'soul'] | 71 |
| 1 | 3oDbviiivRWhXwIE8hxkVV | Beach Boys | 4499081 | ['baroque pop', 'classic rock', 'folk rock', 'mellow gold', 'psychedelic rock', 'rock', 'singer-songwriter', 'sunshine pop'] | 71 |
| 2 | 5hW4L92KnC6dX9t7tYM4Ve | Mitchell, Joni | 1461857 | ['canadian singer-songwriter', 'folk', 'folk rock', 'singer-songwriter'] | 59 |
| 3 | 7guDJrEfX3qb6FEbdPA5qi | Wonder, Stevie | 6427524 | ['motown', 'soul'] | 73 |
| 4 | 3WrFJ7ztbogyGnTHbHJFl2 | Beatles | 27469908 | ['british invasion', 'classic rock', 'merseybeat', 'psychedelic rock', 'rock'] | 82 |
%%sql
SELECT * FROM albums LIMIT 5;
| album_id | Album | artist_id | Year | rank_2003 | rank_2012 | rank_2020 | rank_2023 | |
|---|---|---|---|---|---|---|---|---|
| 0 | 7qD9oSiS3xFJsw2oSV2akp | What's Going On | 3koiLjNrgRTNbOwViDipeA | 1971 | 6 | 6 | 1 | 1 |
| 1 | 2CNEkSE8TADXRT2AzcEt1b | Pet Sounds | 3oDbviiivRWhXwIE8hxkVV | 1966 | 2 | 2 | 2 | 2 |
| 2 | 1vz94WpXDVYIEGja8cjFNa | Blue | 5hW4L92KnC6dX9t7tYM4Ve | 1971 | 30 | 30 | 3 | 3 |
| 3 | 6YUCc2RiXcEKS9ibuZxjt0 | Songs in the Key of Life | 7guDJrEfX3qb6FEbdPA5qi | 1976 | 57 | 57 | 4 | 4 |
| 4 | 0ETFjACtuP2ADo6LFhL6HN | Abbey Road | 3WrFJ7ztbogyGnTHbHJFl2 | 1969 | 14 | 14 | 5 | 5 |
%%sql
SELECT * FROM tracks LIMIT 5;
| track_id | track_name | album_id | track_number | duration_ms | |
|---|---|---|---|---|---|
| 0 | 4cgLt753cDfsBAbpOcGnZi | What's Going On - Detroit Mix | 7qD9oSiS3xFJsw2oSV2akp | 1 | 247840 |
| 1 | 0lz97gXzH4DD822L62kLeK | What's Happening Brother - Detroit Mix | 7qD9oSiS3xFJsw2oSV2akp | 2 | 163146 |
| 2 | 0oQV9CKXruLbVUTn76eBnr | Flyin' High (In The Friendly Sky) - Detroit Mix | 7qD9oSiS3xFJsw2oSV2akp | 3 | 228693 |
| 3 | 3KAbXV1OSwthoke6BqR0y5 | Save The Children - Detroit Mix | 7qD9oSiS3xFJsw2oSV2akp | 4 | 241626 |
| 4 | 1J3WjiLbPAdW0mSBTxoHQF | God Is Love - Detroit Mix | 7qD9oSiS3xFJsw2oSV2akp | 5 | 106733 |
A few things stand out from the schema:
String primary keys.
artist_id,album_id, andtrack_idare strings, not integers. Spotify likely uses string identifiers because they are globally unique identifiers (GUIDs) — keys that can be generated independently on different machines without coordination. GUIDs are common in distributed databases where many systems write at once.Multi-genre artists. The
genrescolumn stores several values in one column, comma-separated and wrapped in square brackets like a JSON array. That’s efficient storage, but it means querying genres needs string parsing rather than simple equality.Ranks across years. The
albumstable has four ranking columns — one per Rolling Stone list (2003, 2012, 2020, 2023). The lists evolved over time, and this schema captures that history.Track duration in milliseconds. Storing duration in
duration_msis precise but not human-friendly. We will build a view that converts it to minutes and seconds.
What is a VIEW?#
A view is a virtual table built from a SELECT statement. You can query a view the same way you query a regular table. Behind the scenes, the database runs the saved SELECT each time the view is accessed.
The general syntax for creating a view in SQL is:
CREATE [TEMP | TEMPORARY] VIEW [IF NOT EXISTS] view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
What each piece does:
TEMPorTEMPORARY(optional) — creates a view that only exists for the current session.IF NOT EXISTS(optional) — avoids an error if a view with that name already exists.view_name— the name of the view. Same naming rules as a table.AS— precedes theSELECTthat defines the view.SELECT— the query that produces the view’s rows. It can pull from one or more tables and include conditions, joins, and aggregations.
Once a view exists, you can query it like a table. The result set is regenerated each time the view is accessed, based on the latest data in the underlying tables.
Here’s a simple example of creating a view:
%%sql
DROP VIEW IF EXISTS album_info;
CREATE VIEW album_info AS
SELECT
Albums.album,
Albums.year,
Artists.artist
FROM Albums
JOIN Artists ON Albums.artist_id = Artists.artist_id;
In this example, we create a view named album_info that combines information from the albums and artists tables. The view includes the Album and Year columns from the albums table and the Artist column from the artists table. The tables are joined based on the artist_id foreign key relationship.
Now, whenever we query the album_info view, it will return the result set as defined by the SELECT statement, providing a convenient way to access the combined album and artist information.
%%sql
SELECT * FROM album_info LIMIT 5;
| Album | Year | Artist | |
|---|---|---|---|
| 0 | What's Going On | 1971 | Gaye, Marvin |
| 1 | Pet Sounds | 1966 | Beach Boys |
| 2 | Blue | 1971 | Mitchell, Joni |
| 3 | Songs in the Key of Life | 1976 | Wonder, Stevie |
| 4 | Abbey Road | 1969 | Beatles |
Views offer several benefits:
They simplify complex queries by hiding them behind a single, reusable name.
They add a layer of abstraction over the underlying tables.
They restrict access to specific columns or rows, which helps with security.
They give applications a consistent, central way to read shared data.
The next few sections explore practical uses of views: presenting data, aggregating data, and securing data — all using the music dataset.
Views for Presenting Data#
Views provide a powerful way to present data in a more user-friendly and meaningful manner. By creating views, you can customize the appearance of the data, combine information from multiple tables, and present it in a way that is easier to understand and interpret. Let’s look at a few examples using the music dataset.
Greatest albums, 2023 version only#
Suppose you want to create a view that displays only the albums that appeared in the 2023 version of Rolling Stone’s greatest albums list. Here’s how you can achieve that:
%%sql
DROP VIEW IF EXISTS greatest_albums_2023;
CREATE VIEW greatest_albums_2023 AS
SELECT
Albums.album,
Artists.artist,
Albums.year,
Albums.rank_2023
FROM Albums
JOIN Artists ON Albums.artist_id = artists.artist_id
WHERE rank_2023 IS NOT NULL
ORDER BY rank_2023;
In this example, the greatest_albums_2023 view retrieves the Album, Artist, Year, and rank_2023 columns from the albums and artists tables. The tables are joined based on the artist_id foreign key relationship. The WHERE clause filters the results to include only the albums that have a non-null value in the rank_2023 column. Finally, the ORDER BY clause sorts the albums based on their 2023 ranking.
Now, you can query the greatest_albums_2023 view to retrieve the list of albums from the 2023 version of Rolling Stone’s list:
%%sql
SELECT * FROM greatest_albums_2023 LIMIT 10;
| Album | Artist | Year | rank_2023 | |
|---|---|---|---|---|
| 0 | What's Going On | Gaye, Marvin | 1971 | 1 |
| 1 | Pet Sounds | Beach Boys | 1966 | 2 |
| 2 | Blue | Mitchell, Joni | 1971 | 3 |
| 3 | Songs in the Key of Life | Wonder, Stevie | 1976 | 4 |
| 4 | Abbey Road | Beatles | 1969 | 5 |
| 5 | Nevermind | Nirvana | 1991 | 6 |
| 6 | Rumours | Fleetwood Mac | 1977 | 7 |
| 7 | Purple Rain | Prince and the Revolution | 1984 | 8 |
| 8 | Blood on the Tracks | Dylan, Bob | 1975 | 9 |
| 9 | The Miseducation of Lauryn Hill | Hill, Lauryn | 1998 | 10 |
Presenting track information with reading Minutes and Seconds#
Let’s say you want to present the track information in a more readable format by converting the duration from milliseconds to minutes and seconds. Here’s how you can create a view for that:
%%sql
DROP VIEW IF EXISTS track_info;
CREATE VIEW track_info AS
SELECT
Tracks.track_name,
Albums.album,
Tracks.track_number,
PRINTF('%d:%02d', duration_ms / 1000 / 60, duration_ms / 1000 % 60) AS duration
FROM Tracks
JOIN Albums ON Tracks.album_id = Albums.album_id;
In this view, the track_info retrieves the track_name, album_id, track_number, and duration columns from the tracks table. The printf function is used to format the duration_ms column into a string representation of minutes and seconds (MM:SS format). The duration_ms value is divided by 1000 to convert it from milliseconds to seconds, and then the minutes and seconds are extracted using integer division and modulo operations.
You can now query the track_info view to retrieve the track information with the formatted duration:
%%sql
SELECT * FROM track_info LIMIT 5;
| track_name | Album | track_number | duration | |
|---|---|---|---|---|
| 0 | What's Going On - Detroit Mix | What's Going On | 1 | 4:07 |
| 1 | What's Happening Brother - Detroit Mix | What's Going On | 2 | 2:43 |
| 2 | Flyin' High (In The Friendly Sky) - Detroit Mix | What's Going On | 3 | 3:48 |
| 3 | Save The Children - Detroit Mix | What's Going On | 4 | 4:01 |
| 4 | God Is Love - Detroit Mix | What's Going On | 5 | 1:46 |
Albums with biggest change from 2003 to 2023#
To identify the albums that had the biggest change in ranking from 2003 to 2023, you can create a view that calculates the difference between the rankings:
%%sql
DROP VIEW IF EXISTS biggest_rank_change;
CREATE VIEW biggest_rank_change AS
SELECT
Albums.album,
Artists.artist,
Albums.year,
Albums.rank_2003,
Albums.rank_2023,
rank_2003 - rank_2023 AS rank_change
FROM
Albums
JOIN Artists ON Albums.artist_id = Artists.artist_id
WHERE
rank_2003 IS NOT NULL
AND rank_2023 IS NOT NULL
ORDER BY ABS(rank_change) DESC
LIMIT 10;
In this view, the biggest_rank_change retrieves the Album, Artist, rank_2003, rank_2023, and rank_change columns. The rank_change column is calculated by subtracting the rank_2023 from the rank_2003. The WHERE clause filters the results to include only albums that have non-null values for both rank_2003 and rank_2023. The ORDER BY clause sorts the albums based on the absolute value of the rank_change in descending order. Finally, the LIMIT clause retrieves only the top 10 albums with the biggest rank change.
You can query the biggest_rank_change view to see the albums with the most significant changes in ranking between 2003 and 2023:
%%sql
SELECT * FROM biggest_rank_change
| Album | Artist | Year | rank_2003 | rank_2023 | rank_change | |
|---|---|---|---|---|---|---|
| 0 | Voodoo | D'Angelo | 2000 | 480 | 28 | 452 |
| 1 | Aquemini | OutKast | 1998 | 500 | 49 | 451 |
| 2 | The Blueprint | Jay-Z | 2001 | 456 | 50 | 406 |
| 3 | Kid A | Radiohead | 2000 | 420 | 20 | 400 |
| 4 | Enter the Wu_Tang: 36 Chambers | Wu Tang Clan | 1993 | 382 | 27 | 355 |
| 5 | Illmatic | Nas | 1994 | 396 | 44 | 352 |
| 6 | Live Through This | Hole | 1994 | 458 | 106 | 352 |
| 7 | Maggot Brain | Funkadelic | 1971 | 478 | 136 | 342 |
| 8 | The Score | Fugees | 1996 | 469 | 134 | 335 |
| 9 | Heart Like a Wheel | Ronstadt, Linda | 1975 | 163 | 490 | -327 |
This view has some limitations (for example, it doesn’t account for albums that fell out of the rankings altogether). However, it gives a good sense of how critics views have changed over time. For example, many of the albums the had the biggest increases in rank (by D’Angelo, Jay-Z, Radiohead, and Outkast) were still relatively “new” in 2003. However, looking back, it appears they have “stood the test of time.”
Views for Aggregating Data#
In addition to presenting data in a more meaningful way, views can also be used to aggregate data and provide summarized information. Aggregation involves performing calculations or applying functions across multiple rows of data to derive summarized results. Let’s explore a couple of examples of using views for data aggregation in our music database.
Artists with the Most Albums in the Database#
To find out which artists have the most albums in the database, we can create a view that counts the number of albums for each artist and orders the results by the album count in descending order. We’ll also include their 2024 popularity (according to Spotify).
%%sql
DROP VIEW IF EXISTS artist_album_count;
CREATE VIEW Artist_Album_Count AS
SELECT
Ar.Artist,
COUNT(*) AS Album_Count,
Ar.popularity AS '2024_Popularity'
FROM
Albums Al
JOIN Artists Ar ON Al.artist_id = Ar.artist_id
GROUP BY Ar.Artist
ORDER BY Album_Count DESC;
The view is named
Artist_Album_Countand selects two columns:Ar.Artist(the artist name) andCOUNT(*)(the count of albums for each artist).The
FROMclause specifies theAlbumstable as the main table and joins it with theArtiststable using theartist_idforeign key relationship.The
GROUP BYclause groups the results byAr.Artist, so that theCOUNT(*)function calculates the number of albums for each unique artist.The
ORDER BYclause sorts the results based on theAlbum_Countcolumn in descending order, so artists with the most albums appear at the top.
To retrieve the results, you can simply query the Artist_Album_Count view:
%%sql
SELECT * FROM Artist_Album_Count LIMIT 10;
| Artist | Album_Count | 2024_Popularity | |
|---|---|---|---|
| 0 | Dylan, Bob | 11 | 69 |
| 1 | Rolling Stones | 10 | 77 |
| 2 | Beatles | 10 | 82 |
| 3 | Springsteen, Bruce | 9 | 75 |
| 4 | Who, The | 7 | 65 |
| 5 | Bowie, David | 7 | 75 |
| 6 | West, Kanye | 6 | 90 |
| 7 | Marley, Bob | 6 | 80 |
| 8 | John, Elton | 6 | 81 |
| 9 | Young, Neil | 5 | 67 |
Average Song Length by Artists#
We can also create an average song length by artist, to see which artists have the longest (or shortest songs). For example, let’s create a view that sorts by longest song length
%%sql
DROP VIEW IF EXISTS Average_Song_Length;
CREATE VIEW Average_Song_Length AS
SELECT
Artists.artist,
-- Average song length minutes
ROUND(AVG(Tracks.duration_ms) / 60000,2) AS avg_song_length,
COUNT(Tracks.track_id) AS num_songs
FROM
Tracks
JOIN Albums ON Tracks.album_id = Albums.album_id
JOIN Artists ON Albums.artist_id = Artists.artist_id
GROUP BY Artists.artist
HAVING num_songs > 5
ORDER BY avg_song_length DESC;
This query creates a view named Average_Song_Length that calculates the average song length in minutes and the number of songs for each artist. The view does six things:
Joins
Tracks,Albums, andArtistsusing their foreign keys (album_idandartist_id).Computes the average song length per artist by dividing
duration_msby 60,000 (to convert milliseconds to minutes) and rounding to two decimal places. Aliased asavg_song_length.Counts the number of songs per artist with
COUNT(track_id), aliased asnum_songs.Groups the rows by artist name.
Filters with
HAVINGto keep only artists with more than five songs in the database.Sorts the result by
avg_song_lengthin descending order.
We can now query the view:
%%sql
SELECT * FROM Average_Song_Length LIMIT 5;
| Artist | avg_song_length | num_songs | |
|---|---|---|---|
| 0 | Allman Brothers Band | 11.24 | 7 |
| 1 | Yes | 9.26 | 7 |
| 2 | Davis, Miles | 7.58 | 45 |
| 3 | War | 7.34 | 6 |
| 4 | King Sunny Ade | 7.15 | 10 |
Views for Securing Data#
Views can also serve as a security mechanism. By exposing only certain columns or rows, you can control who sees what — especially when the underlying table contains Personally Identifiable Information (PII).
PII is any data that can identify a specific person, directly or indirectly. Common examples include names, email addresses, phone numbers, Social Security numbers, and financial account numbers.
Protecting PII matters for three reasons:
Legal compliance. Laws like GDPR and HIPAA require organizations to protect personal data. Violations can mean significant fines.
Ethical responsibility. Mishandled PII can lead to identity theft and financial fraud for real people.
Trust and reputation. A breach damages a company’s brand and erodes customer confidence.
By creating views that hide sensitive columns, you can let users access the data they need without exposing the rest.
Fake data table: Musician Info#
First, let’s create a Musician_Info table that includes PII so we can demonstrate the technique:
%%sql
DROP TABLE IF EXISTS Musician_Info;
CREATE TABLE Musician_Info (
musician_id INTEGER PRIMARY KEY,
name TEXT,
email TEXT,
phone TEXT,
address TEXT,
ssn TEXT,
bank_account TEXT
);
INSERT INTO Musician_Info (name, email, phone, address, ssn, bank_account)
VALUES
('Freddie Mercurious', 'freddie@queen.com', '123-456-7890',
'1 Bohemian Rhapsody Lane', '123-45-6789', '1234567890'),
('Elton Johnsmith', 'elton@rocketman.com', '987-654-3210',
'42 Yellow Brick Road', '987-65-4321', '9876543210');
SELECT * FROM Musician_Info;
| musician_id | name | phone | address | ssn | bank_account | ||
|---|---|---|---|---|---|---|---|
| 0 | 1 | Freddie Mercurious | freddie@queen.com | 123-456-7890 | 1 Bohemian Rhapsody Lane | 123-45-6789 | 1234567890 |
| 1 | 2 | Elton Johnsmith | elton@rocketman.com | 987-654-3210 | 42 Yellow Brick Road | 987-65-4321 | 9876543210 |
In this example, the Musician_Info table contains PII columns like ssn and bank_account, which should not be accessible to all users.
Creating Views to Selectively Hide PII#
To protect PII, we can create a view that exposes only the non-sensitive columns:
%%sql
CREATE VIEW Musician_Public_Info AS
SELECT musician_id, name, email, phone, address
FROM Musician_Info;
The Musician_Public_Info view selects only the musician_id, name, email, phone, and address columns from the Musician_Info table, excluding the sensitive PII columns ssn and bank_account.
Now, instead of granting access to the entire Musician_Info table, you can grant users access to only the Musician_Public_Info view. This way, users can retrieve the public information about musicians without being able to see the PII.
%%sql
SELECT * FROM Musician_Public_Info;
| musician_id | name | phone | address | ||
|---|---|---|---|---|---|
| 0 | 1 | Freddie Mercurious | freddie@queen.com | 123-456-7890 | 1 Bohemian Rhapsody Lane |
| 1 | 2 | Elton Johnsmith | elton@rocketman.com | 987-654-3210 | 42 Yellow Brick Road |
The query above will return only the non-sensitive columns defined in the view.
Types of Views and When to Use Them#
SQL supports three types of views: default views, temporary views, and materialized views. Each one serves a different purpose.
Default views#
Default views (also called simple or regular views) are the most common type. They’re created with CREATE VIEW and stored in the database schema alongside the tables. They are recomputed every time they are queried, so they always reflect the current data.
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example use case. A music streaming app needs a clean view of song details that combines Songs, Albums, and Artists. A default view called Song_Info gives developers a single object to query instead of writing the same join repeatedly.
Temporary views#
Temporary views (also called ephemeral views) are created with CREATE TEMPORARY VIEW. They live in a separate temporary database and are dropped automatically when the connection closes. They are invisible to other connections.
CREATE TEMPORARY VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example use case. A one-time music festival needs a reporting dashboard built from ticket sales data. Temporary views can hold the complex calculations needed for each report without cluttering the main schema, and they disappear when the festival ends.
Materialized views#
A materialized view stores its result in a separate table. Unlike default views, which run their query each time, a materialized view is refreshed on a schedule or on demand. They are essentially a cached snapshot.
-- not supported natively in SQLite
CREATE MATERIALIZED VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
-- Refresh the materialized view
REFRESH MATERIALIZED VIEW view_name;
Example use case. A music analytics platform reports artist popularity based on millions of streaming events. A materialized view can precompute the popularity metrics so dashboards load quickly. The view gets refreshed nightly to reflect new streams.
SQLite does not have built-in support for materialized views. The workaround is to create a regular table and populate it from a query. Most enterprise-scale systems (PostgreSQL, MySQL, Oracle, SQL Server) support materialized views natively.
Choosing the right type#
A few factors guide the choice:
Complexity of the query. Heavier queries benefit more from caching.
Update frequency. Frequently changing data argues for default views.
Performance requirements. Hot dashboards may need materialized views.
Scope of the data. Session-specific work fits temporary views.
Default views handle most cases. Temporary views are useful for isolated, session-scoped work. Materialized views shine when query cost is high and the underlying data changes slowly.
Modifying Data in Views#
In SQLite, views are read-only by default. You cannot run INSERT, UPDATE, or DELETE directly against a view. (You can drop a view with DROP VIEW, but that removes the view, not its data.)
The reason is that a view is just a saved query. If the query involves joins, aggregations, or calculations, the database often cannot figure out how to translate a change to the view back into changes on the underlying tables.
You can work around this with triggers. A trigger is a database object that runs automatically in response to a specific event — INSERT, UPDATE, or DELETE — on a table or view. Triggers can intercept operations against a view and redirect them to the right rows in the underlying table.
Consider the Musician_Public_Info view from the previous section:
CREATE VIEW Musician_Public_Info AS
SELECT musician_id, name, email, phone, address
FROM Musician_Info;
To allow inserts through this view, you can create a trigger on the view that forwards new rows to the underlying Musician_Info table.
Here’s how to create a trigger that enables INSERT against the Musician_Public_Info view:
import sqlite3
with sqlite3.connect('rs_greatest_albums.db') as conn:
conn.execute('DROP TRIGGER IF EXISTS Insert_Musician_Public_Info;')
try:
conn.execute("""
INSERT INTO Musician_Public_Info (name, email, phone, address)
VALUES ('Taylor Drift', 'taylor@example.com', '123-456-7890', '456 Elm St');
""")
conn.commit()
except sqlite3.OperationalError as exc:
print(f'Expected view error: {exc}')
Expected view error: cannot modify Musician_Public_Info because it is a view
%%sql
DROP TRIGGER IF EXISTS Insert_Musician_Public_Info;
-- Trigger for INSERT operation
CREATE TRIGGER Insert_Musician_Public_Info
INSTEAD OF INSERT ON Musician_Public_Info
FOR EACH ROW
BEGIN
INSERT INTO Musician_Info (musician_id, name, email, phone, address)
VALUES (NEW.musician_id, NEW.name, NEW.email, NEW.phone, NEW.address);
END;
Now, we should be able to INSERT data into this table:
%%sql
-- Insert a new musician into the view
INSERT INTO Musician_Public_Info (name, email, phone, address)
VALUES ('Taylor Drift', 'taylor@example.com', '123-456-7890', '456 Elm St');
SELECT * FROM Musician_Public_Info;
| musician_id | name | phone | address | ||
|---|---|---|---|---|---|
| 0 | 1 | Freddie Mercurious | freddie@queen.com | 123-456-7890 | 1 Bohemian Rhapsody Lane |
| 1 | 2 | Elton Johnsmith | elton@rocketman.com | 987-654-3210 | 42 Yellow Brick Road |
| 2 | 3 | Taylor Drift | taylor@example.com | 123-456-7890 | 456 Elm St |
These operations are intercepted by the triggers and applied to the Musician_Info table.
A few things to keep in mind:
Performance. Triggers add overhead and can slow down writes, especially on large datasets or complex views.
Data integrity. Triggers should handle every column the view exposes. Views spanning multiple tables may need several triggers and careful logic to keep the underlying data consistent.
Other databases. Enterprise systems usually have better native support for updatable views than SQLite does.
Common Table Expressions#
A Common Table Expression (CTE) is a temporary named result set that lives inside a single SQL statement. CTEs help you break a complex query into smaller, named parts. They are useful when:
You want to reference the same subquery more than once in a single query.
You want to make a long query easier to read.
Basic syntax#
WITH cte_name AS (
SELECT column1, column2, ...
FROM table_name
WHERE condition
)
SELECT *
FROM cte_name;
How it works:
WITHstarts the CTE definition.cte_nameis the name you give the CTE; the main query references it by name.The parenthesized
SELECTis the CTE’s body — it produces the temporary result set.The main query, after the CTE, uses
cte_nameas if it were a regular table.
Example: “Time” in album titles, 2000–2020#
Let’s create a CTE for albums released between 2000 and 2020, then filter that CTE for titles that include the word “time”.
%%sql
-- THe Common Table Expression Starts Here
WITH recent_albums AS (
SELECT
Albums.album,
Artists.artist,
Albums.year
FROM Albums
JOIN Artists ON Albums.artist_id = Artists.artist_id
WHERE Albums.year BETWEEN 2000 AND 2020
)
-- The Main Query Starts Here
SELECT *
FROM recent_albums
WHERE recent_albums.album LIKE '%time%'
ORDER BY year DESC
LIMIT 10;
| album | artist | year | |
|---|---|---|---|
| 0 | Modern Times | Dylan, Bob | 2006 |
| 1 | All Time Greatest Hits | Lynn, Loretta | 2002 |
| 2 | Time (The Revelator) | Welch, Gillian | 2001 |
The query uses a CTE named recent_albums that pulls albums released between 2000 and 2020 along with the corresponding artist names. The CTE is defined with WITH, naming the result set and providing the SELECT that builds it.
The main query then treats recent_albums like a regular table. It selects all the CTE’s columns, filters to album names containing the word “time”, sorts by year in descending order, and limits the output to the top 10 rows.
Using a CTE here pays off in three ways:
It splits the query into smaller, more understandable parts.
It lets the same temporary result set be referenced multiple times within the same query if needed.
It encourages a modular style that makes longer SQL easier to maintain.
Case Study: Data Governance in a Music Ranking Web App#
Imagine a web app called MusicRank that lets users keep personal rankings of their favorite albums and share them with a community of music fans. The features:
Users create accounts, log in, and manage their profiles.
Users search for albums, add them to a personal ranking, and assign scores.
The app combines rankings from all users to produce global popularity scores.
Users connect with friends, share rankings, and discover music through their network.
Behind the scenes, the app stores everything in a database (SQLite for this case study) and uses Python to handle the web layer and the database interactions.
With that kind of platform comes responsibility. Developers have to think carefully about data governance — the rules and practices that keep user data safe, private, and compliant with the law. The rest of this chapter walks through three pieces of that picture:
Data loss prevention policies.
Data classification.
Global regulations like GDPR.
Good data governance is not only about avoiding fines. It is about building an app users can trust.
Graphic: UML for Music Web App#
Data Loss Prevention Policies#
When building a web app like MusicRank, you need ways to protect sensitive data from unauthorized access and accidental loss. That is the job of data loss prevention (DLP).
DLP refers to the strategies, practices, and tools that keep sensitive data from being lost, stolen, or misused. DLP policies help organizations protect their data assets and stay compliant with regulations.
Three DLP areas are worth a closer look in our music app.
1. Data encryption#
Encryption turns plaintext data into coded text that only the right key can decode. It protects data from anyone who manages to read it without permission.
Example. When a user creates an account in MusicRank, their password should be encrypted (hashed, more specifically) before it is stored. Even if the database is leaked, attackers cannot read the original passwords.
In a SQLite-based app, encryption is usually handled in the application layer. Python provides libraries such as cryptography and pycryptodome that supply solid algorithms and key management. The app encrypts sensitive fields before writing them to the database and decrypts them on the way out.
2. Access controls#
Access controls decide who can see which data and what they can do with it. Without access controls, every user could see every record.
Example. MusicRank uses role-based access control (RBAC). Regular users can only view and modify their own rankings. Administrators have extra privileges to manage the platform. Roles map to permissions, and permissions decide who gets through.
Python web frameworks like Flask and Django provide the building blocks: authentication, sessions, and authorization checks. When a request arrives, the framework checks the user’s role and decides whether the request is allowed.
3. Data backups#
Backups are copies of the data made to protect against accidents, hardware failure, and attacks. Regular backups let you recover the system when something goes wrong.
Example. Suppose the MusicRank server crashes and the database file becomes corrupted. With recent backups in place, the team restores from the latest copy and only loses a small window of data.
In a SQLite-based app, backups are typically automated with Python scripts: copy the database file to a separate location on a schedule and store it in secure cloud storage. In a disaster, restoring is as simple as putting the backup back in place.
SQLite leans on Python and the operating system for these capabilities. Larger systems like PostgreSQL and MySQL have richer native support for encryption, access controls, and backup. With Python libraries and good habits, though, a SQLite app can still meet most DLP needs.
Data Classification#
A key step in data governance is classifying data by sensitivity and regulatory requirements. Classification helps you understand what data you hold and what kind of protection it needs.
Four common classification categories matter to a music ranking app:
Personally Identifiable Information (PII). Data that can identify a person directly or indirectly. Examples: names, email addresses, phone numbers, social security numbers. MusicRank collects PII when users sign up — at minimum names, email addresses, and profile pictures. PII needs encryption in transit and at rest, restricted access by role, and clear consent from users about how it will be used.
Protected Health Information (PHI). A subset of PII tied to a person’s health, healthcare services, or healthcare payments. In the United States, PHI is regulated by the Health Insurance Portability and Accountability Act (HIPAA). MusicRank doesn’t directly deal with PHI, but if it ever integrated with a health service or collected any health-related signal, HIPAA requirements would kick in.
Payment Card Industry Data Security Standard (PCI-DSS). A set of rules that protects credit card data during transactions and storage. If MusicRank added e-commerce — concert ticket sales, paid subscriptions — PCI-DSS would apply. Compliance usually means using a vetted third-party payment gateway, encrypting card data, and following strict access-control practices.
Sensitive business data. Internal information that gives the company its edge — proprietary algorithms, partnership agreements, financial details. For MusicRank, this might include the ranking algorithm itself. Protect this category with strong access controls, monitoring, and employee training.
Best practices#
A short checklist for handling classified data in a web app:
Run a data inventory to catalog what you collect.
Write clear classification policies and rules for each category.
Enforce access controls based on roles and data sensitivity.
Encrypt sensitive data both at rest and in transit.
Train every developer and employee on the handling rules.
With classification in place, security controls can be matched to the actual risk of each piece of data.
Global Regulations (GDPR)#
Web apps usually have users from many countries, which means developers must know and follow global data protection laws. The most influential recent regulation is the General Data Protection Regulation (GDPR), introduced by the European Union.
GDPR took effect on May 25, 2018. It applies to any organization that collects, processes, or stores personal data about people in the EU, no matter where the organization itself is located. Its goals are to give individuals more control over their data and to require organizations to handle that data responsibly and transparently.
Four GDPR areas matter to a music ranking app:
Data collection and consent.
You must have a legal basis for collecting and processing personal data. Usually that means explicit consent from the user.
At signup, the app must clearly explain what data is collected, why, and how it will be used. Users must be able to give or withhold consent for different purposes.
Consent has to be freely given, specific, informed, and unambiguous. Withdrawing consent must be as easy as giving it.
Individual rights.
GDPR grants users specific rights: access to their data, the right to correct inaccurate data, the right to erasure (the “right to be forgotten”), and the right to data portability.
MusicRank must support these rights with concrete features — profile editing, account deletion, data export, and so on.
Privacy by design and default.
Data protection must be built into the system from the start, not bolted on later. The default settings must be the safest ones.
Practically, this means collecting only what you need, using pseudonymization or anonymization where possible, and protecting whatever you keep.
Breach notification.
When a breach poses a risk to users, GDPR requires the organization to notify both the relevant supervisory authority and the affected individuals — quickly.
MusicRank needs monitoring and alerting in place, an incident response plan, and clear channels for telling users when something has gone wrong.
GDPR is one example among many. Other regions have their own rules — the California Consumer Privacy Act (CCPA) in the United States, the Personal Information Protection Law (PIPL) in China, and so on. Developers need to track the regulations that apply wherever their users live and design the app to meet them all.
Discussion Questions#
Why does data governance matter for a web app like MusicRank? What can go wrong without it?
How do encryption, access controls, and backups work together to prevent data loss and unauthorized access? Give a concrete example of each from MusicRank.
Suppose MusicRank adds concert ticket sales. What new data classifications come into play, and how would you handle them?
As a developer, how would you make MusicRank GDPR-compliant? What steps would you take for consent, user rights, and privacy by design?
If MusicRank suffered a data breach, what would you do to limit damage and meet GDPR’s notification requirements?
How would you stay current on data protection laws as MusicRank expands into new countries?
Beyond technical measures, what policies and employee training would help build a culture of privacy and security inside the team?
Chapter Summary#
Views are virtual tables that simplify complex queries and provide a user-friendly interface for data retrieval
Views enable data aggregation, access control, and enhance data security
SQLite supports three types of views: default views, temporary views, and materialized views (simulated)
Common Table Expressions (CTEs) are temporary named result sets within a single SQL statement
CTEs break down complex queries into manageable parts, improving code readability and maintainability
Views and CTEs can be used together to create powerful and efficient database solutions
Data governance, including data loss prevention and compliance with regulations like GDPR, is crucial for web applications handling sensitive user data
Implementing data governance requires a holistic approach involving technical measures, organizational policies, and employee training
Lab: Practice Your SQL#
Practice with the Loop of the Recursive Dragon#
Sharpen what you just learned with a chapter-matched review set in the Loop of the Recursive Dragon — an adaptive review game with multiple question types and RPG-style mechanics, built for this book.
Glossary#
Use this list as a quick review sheet for the chapter.
Views and CTEs#
View — A virtual table whose contents come from a saved query.
CREATE VIEW v AS query— SQL that creates a virtual table from a query result.DROP VIEW IF EXISTS view_name— SQL that removes a view only if it already exists.Temporary view — A view that lasts only for the current session or transaction.
Materialized view — A stored copy of a query result that can be refreshed when the underlying data changes.
Common Table Expression (CTE) — A temporary named result set defined inside a single query using
WITH.WITH cte_name AS (inner_query) outer_query— SQL syntax that defines a CTE and then uses it in a later query.
Data governance and security#
Data dictionary — Metadata that describes the structure and constraints of a database.
Data classification — Grouping data by sensitivity, value, or handling rules.
Data governance — The overall management of data availability, usability, integrity, and security.
Data Loss Prevention (DLP) — Practices that stop sensitive data from leaving approved channels.
Encryption — Turning data into a protected form that unauthorized users cannot easily read.
Role-based access control (RBAC) — Restricting access based on the role assigned to a user.
Personally Identifiable Information (PII) — Data that can identify a specific person.
Personal Health Information (PHI) — Health-related information that can be tied to a person.
Informed consent — Permission given by someone who understands the likely effects of agreeing.
Globally unique identifier (GUID) — A 128-bit value used to identify something uniquely across systems.
Distributed database — A database stored across multiple computers rather than on a single machine.
Compliance frameworks#
GDPR — The European Union regulation governing data protection and privacy.
PCI-DSS — The security standard that applies to handling payment card data.
Right to be forgotten — A GDPR right that lets people have their personal data erased in certain cases.
Right to data breach notification — A GDPR right that requires individuals to be told when their personal data has been breached.
Right to rectify — A GDPR right that lets people correct inaccurate personal data.