Session 6

The Theory and Practice of SQL


CS 3140
Database Systems


Opening Exercise

For Homework 2, Problem 2 you converted an ER model with an entity named Ship and a unary relationship named SisterOf into two relations:

    Ship    (name, yearLaunched)
    SisterOf(shipName, sisterName)
After last week, you know how to implement these relations in SQL.

Write the SQL commands to:

  1. create the Ship table, and
  2. insert a tuple for the USS Iowa, which was launched in 1942.



Opening Solution

We are using SQLite as our DBMS, so I'll fire it up in order to create Ship and begin to populate it:

    $ sqlite3
    sqlite> CREATE TABLE Ship(
       ...>    name VARCHAR(30),
       ...>    yearLaunched SMALLINT,
       ...>    PRIMARY KEY (name)
       ...>    );
    sqlite> INSERT INTO Ship VALUES('USS Iowa', 1942);

We could leave the datatypes off, or use another numeric type for yearLaunched. The default number type INT is 32 bits, which is much larger than we need to represent a year, so I went with the 16-bit SMALLINT.

Could we do this differently?

Now all we need is more ship data and a information about sister ships. Then we will be able to do this:

    sqlite> SELECT sisterName from SisterOf
       ...>  WHERE shipName='USS Iowa';
    USS Missouri
    USS New Jersey
    USS Wisconsin



Where Are We?

Last week, we learned the basics of SQL and SQLite. These tools give us the ability to implement our database models in executable code.

Some of you also learned a little bit about using Unix...

For similar reasons, we will study the theory that underlies SQL (we start today!) and the way that a DBMS works under the hood, later in the semester.

Today, we will summarize what we learned about each, extend our knowledge a bit, and consolidate our learning with a couple of high-level ideas.



Where We Are: SQL

SQL is the most common language for creating and querying relational databases. It consists of two sub-languages: a data definition language for creating and manipulating schemas, and a data manipulation language for querying the database and modifying the data in the tables.

We used two commands from the data definition language last week: the CREATE command to define a new table, and the DROP command to delete a table's schema from the database.

CREATE has many different forms. The instruction to create the Ship table above:

    sqlite> CREATE TABLE Ship(
       ...>    name VARCHAR(30),
       ...>    yearLaunched SMALLINT,
       ...>    PRIMARY KEY (name)
       ...>    );
specifies types for the table's attributes and a key for the table. Both are optional. It turns out that there are other ways to specify a table's key, which we will eventually see.

The CREATE command exemplifies a common phenomenon in SQL: there are often multiple ways to perform the same action or specify the same data.

We also learned two commands from SQL's data manipulation language: INSERT, which adds a tuple to a table, and SELECT, which queries the database for data matching some condition.

    sqlite> INSERT INTO Ship VALUES('USS Iowa', 1942);
    ...
    sqlite> SELECT sisterName from SisterOf
       ...>  WHERE shipName='USS Iowa';
    USS Missouri
    USS New Jersey
    USS Wisconsin

Like CREATE, the SELECT command takes multiple forms. We can specify the attributes we want to see in an answer or use * to see all the attributes. The WHERE clause is optional. When present, though, WHERE can refer to attributes in two different tables:

    sqlite> SELECT Customer.name FROM Customer, Address
       ...> WHERE Address.city='Cedar Falls'
       ...>       AND Address.owner=Customer.ssNo;
    John Smith
    Eugene Wallingford

SELECT is a powerful mini-language all on its own. It allows us to do declarative programming: rather than describe a procedure that computes the answer we seek, we state the kind of data we want to see. Our DBMS -- in this case, SQLite -- is charged with figuring out the procedure. Ideally, we would like for the DBMS to construct an efficient procedure. This is a challenge for the implementors of our DBMSes.

At the end of the session, we will begin our study of the theory that underlies SQL and makes it possible for a DBMS to construct such an efficient procedure to answer a query: relational algebra.



Where We Are: SQLite

Last week we also learned about SQLite, a popular embedded database management system. It is the DBMS that reads and executes our SQL commands.

SQLite provides a dictionary of "dot" commands for controlling the DBMS. These are distinct from SQL's commands, which are common to most relational database systems. They start with a period and do not end with a semicolon like SQL's commands.

Some SQLite commands operate at the system level. .quit is the one command of this form that we have used. At the system level, we can access and close databases using .open and .save .

Other SQLite commands operate at the level of a database's tables and schemas. We have used these:

We used .schema and .dump to examine individual tables:

    sqlite> .schema Ship
    CREATE TABLE Ship(
        name VARCHAR(30),
        yearLaunched SMALLINT,
        PRIMARY KEY (name)
        );

    sqlite> .dump Ship
    PRAGMA foreign_keys=OFF;
    BEGIN TRANSACTION;
    CREATE TABLE Ship(
        name VARCHAR(30),
        yearLaunched SMALLINT,
        PRIMARY KEY (name)
        );
    INSERT INTO Ship VALUES('USS Iowa',1942);
    INSERT INTO Ship VALUES('USS New Jersey',1942);
    ...
    INSERT INTO Ship VALUES('Hindenburg',1936);
    INSERT INTO Ship VALUES('Graf Zeppelin II',1938);
    COMMIT;

If we omit the table on either command, SQLite will perform the action for all the tables in the database!

Let's learn one new SQLite dot command today that can immediately make us much more efficient SQL programmers: .read .

Last session, we learned that an SQLite database is a single, portable file. This makes it easy for me to share, say, sister-ships.db, the sister ships database from our opening exercise. It contains typed schema for the Ship and SisterOf tables plus data for several groups of sister ships that I learned about on Wikipedia. That's the database I've been in using in my examples today.

Is that the only way to share an SQL database?

In the course of working through last week's sessions, you may have reached a point where working in the SQLite interpreter seemed limiting. Maybe you made some typos and wanted a quick way to re-run a batch of commands all at once to recreate a table and its tuples. That's certainly what we are used to being able to do as programmers.

We can use the .read command to load a batch of SQL statements from a file into SQLite. Suppose that I define the schema for Ship and SisterOf in SQL:

    CREATE TABLE Ship(
        name VARCHAR(30),
        yearLaunched SMALLINT,
        PRIMARY KEY (name)
        );

    CREATE TABLE SisterOf(
        shipName VARCHAR(30),
        sisterName VARCHAR(30),
        PRIMARY KEY (shipName, sisterName)
        );
and store them in a file named sister-ships-schema.sql. I can use .read to load the file, executing all of the statements it contains:
    $ sqlite3
    sqlite> .read sister-ships-schema.sql
    sqlite> .tables
    Ship      SisterOf
    sqlite> .schema Ship
    CREATE TABLE Ship(
        name VARCHAR(30),
        yearLaunched SMALLINT,
        PRIMARY KEY (name)
        );

Beautiful. This enables us to write SQL programs, load them, and execute them. Once I have a working database, sharing the database file itself is efficient. But while I'm experimenting and debugging my SQL code, it can be really handy to work with the SQL code itself.

Even better, I can share with you my SQL program. Here is sister-ships.sql, which contains both the create and insert commands I used to create the tables and insert all the data. Download it, load it into SQLite, and play with it!

Challenge Questions: Look at all the insert commands in that file. Isn't there a lot of duplication in the program? How can we eliminate the duplication? What effect would that have on the queries we write against the database?

As we saw last session SQLite -- like most implementations of SQL -- do not implement all the features of any SQL standard. This is true even of expensive, high-end professional DBMSes such as Oracle. We'll do our best to avoid SQLite's missing features this semester and point out the ones we do encounter.

Interestingly, SQL does not implement the entire theory on which it is based. Even so, SQL is incredibly useful for our needs. Let's begin to explore the theory so that we can then go on to explore more SQL and understand it better.



An Introduction to Relational Algebra

We begin our study of operations on relations by learning about relational algebra, a special-purpose algebra that consists of only a few simple but powerful ways to construct new relations from existing relations.

Some early prototypes of relational DBMSes used relational algebra itself as a query language, but no one does that now. Like many mathematical theories, it is not suitable for most people's daily needs. Instead, we use SQL, a query language that is based on relational algebra.

Most SQL programs are syntactic abstractions of expressions in the relational algebra. That is, they use more convenient names and grammar than the algebra, but they can be translated directly into it. Even more, when a DBMS processes an SQL query, the first thing it usually does is translate the expression into relational algebra, or some other very similar internal representation, before optimizing it.

These are all good reasons to begin our study of relational database programming with an invertigation of relational algebra.

Sidenote: If you'd like to learn more about the idea of syntactic sugar and how language processors evaluate the syntactic abstractions in a language, consider taking Programming Languages and Paradigms.

First, let's consider why we need a specialized query language at all. Why not use Java or Python or C++ to write our database programs? The answer may surprise you: relational algebra is most useful to us because it is less powerful than a traditional programming language.

We can certainly do things in Java and Python that we can't so in standard SQL. For example, we cannot write an SQL quesy to determine if the number of tuples in a relation even or odd. It is so easy to do so in any procedural programming language that an student in Intro to Computing can do it.

Limiting our query language in a way that allows it only to express relational queries offers a two big advantages. The first is ease of programming. It can be much easier to write an SQL query than the equivalent procedural program, because its operators allow us to describe a result rather the process for computing the result. The second is performance. Because the relational algebra is small and well-understood mathematically, the DBMS can manipulate our queries to produce highly optimized code.

Sidenote: If you'd like to learn more about the idea of optimization and how compilers translate code into efficient executables, consider taking Translation of Programming Languages.

Thus, relational algebra offers improvements at both programming time and at run time. Such advantages help to explain the popularity of SQL as a database programming language.

What is an algebra? An algebra consists of a set of values and a set of operators. (Our data types are usually part of an algebra.) The algebra of "arithmetic" consists of numeric values in the form of variables and constants, along with the common operations of addition, subtraction, multiplication, and division. Any algebra allows us to build expressions by applying operators to operands and other expressions.

In relational algebra, the values are relations, which are sets of tuples. There are constants (specific relations) and variables that stand in for specifix relations.

There are four broad classes of operation in relational algebra:

We refer to expressions in relational algebra as queries.

This is the foundation of relational algebra, which is the foundation of the SQL, the query language of the relational data model.

Next, we go forward.



Review of Homework 2

Let's take a look at Homework 2.

Problem 1.    Under ordinary ER modeling, representing leagues, teams, and players would be difficult. Players with the same name can play on different teams, and teams with the same name can play in different leagues. That would make it impossible to have unique keys in our relations. But with weak entities and supporting entities and relationships, a weak entity can inherit part of its key from the supporting entity.

leagues, weak teams, and weak players

Problem 2.    This problem gave you a chance to work with a unary relationship and see that it's not really any different than a binary relationship.

    Ship    (name, yearLaunched)
    SisterOf(shipName, sisterName)
When the same entity plays different roles in the same relationship, we want to carefully choose attribute names in the relationship.

Problem 3.    This problem gave you a chance to convert a weak entity into a relation. Customer and Flight are ordinary entities, so they generate standard relations:

    Customer(ssNo, name, addr, phone)
    Flight  (number, day, aircraft)
Booking is a weak entity supported by both, so it needs the keys from both to create a unique key for bookings:
    Booking (customer, flightNo, flightDay, row, seat)
We do not have to create relations for the toCust and toFlt relationships. Booking already contains the keys from Customer and Flight. This will be true for all weak entities and their supporting relationships.

Problem 4.    What have you learned...

~~~~~

Exam 1 is your online activity for the week. I plan to post it tomorrow. It will be a timed exam through eLearning. I will give you a window of a couple of days to take the exam, so that you can do it at a time that is most convenient to you.



Eugene Wallingford ..... wallingf@cs.uni.edu ..... September 24, 2020