Introduction to RDBMS and MySQL

 

 

Topics:

  1. Introduction.
  2. Relational Databases.
  3. Data Types.
  4. Database Creation.
  5. NULL's.
  6. Dropping a table.
  7. SQL-DML commands (Insert, Select, Update, Delete).
  8. Describing a table.
  9. Running SQL commands from a file (Scripting)

 

Introduction:

In the mid 1970s SQL was developed under the name SEQUEL at the IBM San Jose research Facilities to be the data manipulation language for IBM's prototype relational model DBMS, System R. In 1980, the language was renamed SQL to avoid confusion with an unrelated hardware product called SEQUEL. SQL is used as the data manipulation language for IBM's current production offering in the relational DBMS arena, DB2. Most relational DBMSs, including MySQL, use a version of SQL as a data manipulation language.

In this chapter you also will learn how to assign data types for columns in the database; learn about a special type of value, called a null value; and see how such values are handled during database creation. Finally, you will learn how to load a database by adding data to the tables that are created.

A relational database is essentially a collection of tables. A relational database is perceived by the user as being just such a collection. (The phrase "perceived by the user" simply indicates that this is how things appear' to the user, not what the DBMS is actually doing behind the scenes,) You might wonder why this model is not called the "table" model, or something similar if a database is a collection of tables. Formally, these tables are called relations, and this is where the model gets its name.

Entities, Attributes, and Relationships:

There are some terms and concepts that are very important for you to know in the database environment. The terms entity, attribute, and relationship are fundamental when discussing databases. An entity is like a noun; it is a person, place, thing, or event. The entities of interest to a sales database, for example, are such things as customers, orders, and sales reps. The entities that are of interest to a school include students, faculty, and classes; a real estate agency is interested in clients, houses, and agents; and a used car dealer is interested in vehicles, customers, and manufacturers.

An attribute is a property of an entity. The term is used here exactly as it is used in everyday English. For the entity person, for example, the list of attributes might include such things as eye color and height. For a customer table, the attributes of interest are such things as first name, last name, address, city, and so on.

The final key term is relationship. A relationship is the association between entities. For example, there is an association between customers and sales reps. A sales rep is associated with all of his or her customers, and a customer is associated with his or her sales rep. Technically, you say that a sales rep is related to all of his or her customers, and a customer is related to his or her sales rep.

This particular relationship is called a one-to-many relationship because one sales rep is associated with many customers, but each customer is associated with only one sales rep. (In this type of relationship, the word many is used in a way that is different from everyday English; it might not always mean a large number. In this context, for example, the term many means that a sales rep may be associated with any number of customers. That is, one sales rep can be associated with zero, one, or more customers.)

How does a DBMS that follows the relational model handle entities, attributes of entities, and relationships between entities? Entities and attributes are fairly simple. Each entity has its own table. In a typical sales database, there is one table for sales reps, a separate table for customers, and so on. The attributes of an entity become the columns in the table. In the table for sales reps, for example, there is a column for the sales rep number, columns for the sales reps' names, and so on.

What about relationships? In a typical sales database there is a one-to-many relationships between sales reps and customers (each sales rep is related to the many customers he or she represents, and each customer is related to the one sales rep who represents the customer). How is this relationship implemented in a relational model database? The answer is by using common columns in two or more tables.  The CUSTOMER_ID column of the CUSTOMER table and the CUSTOMER_ID Column of the PURCHASES table are used to implement the relationship between purchases and customers. Given a CUSTOMER_ID, you can use these columns to determine all the products that customer has purchased.

 

With this background, a relation is essentially a two-dimensional table. There are certain restrictions that should be placed on relations. Each column should have a unique name, and entries within each column should "match" this column name. For example, if the column name is PRICE, all entries in that column must be price. Also, each row should be unique. After all, if two rows are identical, the second row doesn't provide any new information. For maximum flexibility, the order of the columns and rows should be immaterial. Finally, the table should be as simple as possible. To do this you can restrict each position to a single entry by not allowing multiple entries (or repeating groups) in an individual location in the table.

These ideas lead to the following definitions:

Definition: A relation is a two-dimensional table in which the entries in the table are:
1. Single-valued (each location in the table contains a single entry)
2. Each column has a distinct name (or attribute name)
3. All values in a column are values of the same attribute
4. The order of the rows and columns is immaterial
5. Each row contains a unique value. 
From that definition, you can say that a relational database is a collection of relations.

 

There is a commonly accepted shorthand representation of the structure of a relational database. You can write the name of the table and then, within parentheses, list all the columns (fields) in the table. For example, the database for Premiere Products can be written as follows:

CUSTOMER (CUSTOMER_ID, LAST_NAME, FIRST_NAME, DOB, PHONE)

PRODUCT_TYPES (PRODUCT_TYPE_ID, NAME)

PRODUCTS (PRODUCT_ID, PRODUCT_TYPE_ID, NAME, DESCRIPTION, PRICE)

PURCHASES (PRODUCT_ID, CUSTOMER_ID, QUANTITY)

EMPLOYEES (EMPLOYEE_ID, MANAGER_ID, FIRST_NAME, LAST_NAME, TITLE, SALARY)

Notice that there is some duplication of names the CUSTOMER_ID column appears in both the PURCHASES table and in the CUSTOMER table. If you write CUSTOMER_ID, how would a user of the DBMS or the DBMS itself know which CUSTOMER_ID column you are referring to? You need a way to associate the correct table with the column name. One common approach to this problem is to write both the table name and the column name, separated by a period. Thus, the CUSTOMER_ID column in the CUSTOMER table is written as CUSTOMER. CUSTOMER_ID; the CUSTOMER_ID column in the PURCHASES table is written as PURCHASES.CUSTOMER_ID. This technique of including the table name with the column name is known as qualifying the names. It is always acceptable to qualify data names, even if there is no possibility of confusion. If confusion can arise, however, it is essential to qualify the names.

The primary key of a table (or relation) is the column, or collection of columns, that uniquely identifies a given row. In the CUSTOMER table, for example, the CUSTOMER_ID uniquely identifies a given row. (Customer 6 occurs in only one row of the table, for instance.) Thus, the CUSTOMER_ID column is the table's primary key. Primary Keys usually are indicated by underlining the column (or collection of columns) that contains the primary key. The complete shorthand representation for this database is as follows, where the underlined column name indicates the table's primary key:

CUSTOMER (CUSTOMER_ID, LAST_NAME, FIRST_NAME, DOB, PHONE)

PRODUCT_TYPES (PRODUCT_TYPE_ID, NAME)

PRODUCTS (PRODUCT_ID, PRODUCT_TYPE_ID, NAME, DESCRIPTION, PRICE)

PURCHASES (PRODUCT_ID, CUSTOMER_ID, QUANTITY)

EMPLOYEES (EMPLOYEE_ID, MANAGER_ID, FIRST_NAME, LAST_NAME, TITLE, SALARY)

 

Why does the primary key for the PURCHASES table consist of two columns, instead of one?

No single column in the PURCHASES table uniquely identifies a given row. Two columns, PRODUCT_ID and CUSTOMER_ID, are required to create a unique row. When a table's primary key is made up of two or more columns, this is known as a compound or composite key.

A Note on Primary keys: Whenever possible, primary keys should be made up of an internally generated integer value. This is because primary keys once created for a row, should NEVER be changed. They should never be changed, because a change to the primary key of say the customer would also require a change to all of the tables that are related to it, such as orders. This is known as an Update Anamoly (an update to one table requires an update to one or more additional tables). This internally generated primary key is often referred to as a surrogate key.

Although, something like social security number on it?s face value look?s like a good primary key, it is not. Because it?s accurate value is outside the control of the DBMS, it can require a change if say the number was incorrectly inputted. If data is entered before the correction, we wind up with an Update Anamoly. A social security number does however make for a good Candiate Key. A candidate key is a value that has the potential to be used as a primary key, but is not the best choice.

Basic SQL DATA-TYPES (More MySQL data types covered in Chapter 5):

    Data-Type (Maximum Size)

 

Before you begin loading and accessing data in a table, you must describe the layout of each table to be contained in the database.

Describe the layout of the CUSTOMER table to the DBMS.

The SQL Create Command:
The SQL command used to describe the layout of a table is CREATE TABLE. The CREATE TABLE command is followed by the name of the table to be created, and then in parentheses by the names and data types of the columns that comprise the table. The data type indicates the type of data that can be contained in the column (for example, characters, numbers, or dates) as well as the maximum number of characters or digits. The rules for naming tables and columns vary slightly from one version of SQL to another. If you have any questions about naming tables or columns, consult your system's manual or your DBMS's online Help system. Some typical naming conventions are as follows:

Syntax:

CREATE TABLE <table-name> (column-name data-type[, column-name data-type], ?);

Example:

CREATE TABLE customer  (customer_id     INT PRIMARY KEY,
     last_name                  VARCHAR(50) NOT NULL,
     first_name                  VARCHAR(50) NOT NULL,
     dob                   DATE,
     phone                     VARCHAR(20));

 

Table Creation Rules:

1. The name cannot be longer than 18 characters.

2. The name must start with a letter.

3. The name can contain letters, numbers, and underscores (_).

4. The name cannot contain spaces. The names used in this text should work for any SQL implementation.

NULLS:
Occasionally, when a row is created a value for one or more of it's columns may not be known or may not be applicable. For example when creating a customer record the new customer may not have a work number. In SQL you handle this problem by using a special value to represent the situation. This special value is called a null data value, or simply a null. For each column in the table, you can choose whether to allow nulls.

Example:

CREATE TABLE customers
    (customer_id     INT PRIMARY KEY AUTO_INCREMENT,
     last_name                  VARCHAR(50) NOT NULL,
     first_name                  VARCHAR(50) NOT NULL,
     dob                   DATE,
     phone                     VARCHAR(20));

In the above example the columns with the NOT NULL (customer_id, last_name, first_name) cannot contain null values, the row that do not have the NOT NULL (dob, phone) can contain null values.

 

Dropping a Table:

Sometimes you will need to delete a table in a database, because it is no longer needed. The command to drop a table is drop table <table-name>.

Example:

DROP TABLE customers;

 

Note: Although I am using upper-case to denote SQL commands, this is not a requirement. It is simple a standard that was adopted so that SQL commands would stand out from Non-SQL commands. For example when SQL is being embedded in languages such as C or C++ which uses lower-case commands, SQL will standout.

 

The SQL-INSERT Command:

The command to create new rows in a table is the Insert statement. The values in an insert statement must be enclosed in single quotes (') if they are a CHAR, VARCHAR, TEXT data-type. If they are NUMERIC data-type, they do not need to be enclosed in quotes. For example even though zip_code contains only number, because it is declared as a CHAR type, it must be enclosed in single quotes.

Syntax:

INSERT INTO <table-name> [(col1, col2, col3, ?, colN)]
        VALUES (value1, value2, value3, ? valueN);

The part that is between brackets "[(col1, col2, col3, ?, colN)]" is optional, but it is highly recommended that you include this part because without out the column names, the values will simply be put into the table first value to first column, second value to second column and so forth. If you include the name of the columns, the insert statement will match the first value to first column you specify, the second value to the second column you specify, and so forth. This is particularly important, because at the time of writing the SQL statement, the rows and columns may match up properly without specifying the columns, but what very often happens is the DBA will add to a table. When this happens if you didn?t specify the columns your program will "Crash and Burn" unless you go to every INSERT statement in your program that is affected and make the necessary change. Given the size of many programs, this could be quite an undertaking. If you do specify the columns your program will continue to work without modifications. For this reason most companies will require you to include the column name in your INSERT statements. For this class they are also required, ANY submission of an INSERT statement without the column-names being specified will be returned to you as Unacceptable.

Examples:

1) Column-names NOT specified.

INSERT INTO customers VALUES (1, 'Jones', 'Mary', '1980-01-01', '555-1212');

2) Column-names specified.

INSERT INTO customers (last_name, first_name)
    VALUES    ('Smith', 'John');

Now let's assume the DBA (database administrator) is instructed to add a column after first for the middle initial. Our table now looks like this.

customers (customer_id, last_name, first_name, mid_init, dob, phone)

In this case the first SQL-Statement (Column-names NOT specified) will fail, because the insert statement will try to put '01-JAN-1980' into mid_init, city into address, state into city, zip_code into state (This will fail because zip is mid_init is CHAR(1) and the value is more than one character).  Next you will attempt to put an invalid date value '555-1212' into dob.

However the Second SQL-Statement (Column-names specified) will still work, because we specified which values go into which columns.

Inserting NULL:

Because we only defined customer_id, last, and first as NOT NULL columns, those three columns are the only three for which we must specify values.

Example:

INSERT INTO customers (customer_id, last_name, first_name)
        VALUES (2, 'Martin', 'Elyse');

The above is fine, because the other columns where not specified as NOT NULL, so no value is required for them.

 

Viewing data (SQL-SELECT):

The SQL command to retrieve and view data from a database is the SELECT statement.

Syntax:

SELECT [table|alias.]column-name1[, column-name2[, column-nameN]] | *

FROM table-name1 [alias][, table-name2 [alias][, table-nameN [alias]]]

[WHERE <logical condition> [AND|OR <logical-condintion]]

[ORDER BY column1[,column2[, columnN]]]

[GROUP BY column1[,column2[, columnN]]

[HAVING <logical condition> [AND|OR <logical-condintion]]

]

Examples:

  1. The simplest SQL-Statement.

    SELECT * FROM customers;

    Will return all the rows and columns from the "sales_rep" table and display them in a spreadsheet like format.

  2. A simple where clause.

    SELECT * FROM customers WHERE customer_id = 1;

    Will return only the row for customer number 1.

  3. Specify the columns to view and the order.

SELECT last_name, first_name FROM customers ORDER BY last_name;

Will return just the last name and first name from customer and we will order them by last name.

We will talk more about the other aspects of the SELECT statement, when we get into Advanced SQL in the next chapter and beyond.

Updating Data in a table (The SQL-UPDATE command):

The command in SQL for updating a table is the UPDATE command.

Syntax:

UPDATE table-name
SET column1 = value1[, column2 = value2[, columnN = valueN]]
[WHERE <logical condition> [AND|OR <logical-condintion]]

Example:

UPDATE customers
SET phone = '555-1234';

Will change everyone's phone number to '555-1212', probably not what we wanted to do.

UPDATE customers
SET phone = '555-1234'
WHERE customer_id = 2;

Will only change the phone number for customer 2.

Deleting rows of data (The SQL-DELETE command):

The command to delete rows of data from a table is the DELETE command.

Syntax:

DELETE FROM table-name
[WHERE <logical condition> [AND|OR <logical-condintion]]

 

Example:

DELETE FROM customers
WHERE customer_id = 2;

DELETE FROM customers;

This will remove ALL rows from the customers table, usually not a desired result, so most DELETE commands will usually have a WHERE clause.

Viewing the structure of an already existing table (SQL-DESCRIBE command):

The CREATE TABLE command clearly shows the structure of the table. The command indicates all the fields, data-types, and lengths of the fields. The CREATE TABLE command also indicates which fields cannot accept nulls.

When you work with an existing table you may forget the layout of the table, it is necessary to have a way to list out the structure of an existing table, to do this we have the SQL-DESCRIBE command.

Syntax:

SQL> DESCRIBE customer

MySQL Scripting:

Many times it is desired to save our SQL-statements to re-execute at a later date, or we may want to group all of our SQL-Statements together, such as saving all of our CREATE TABLE statements that we used to create our database. MySQL has command line interpreter called MySQL Interactive which allows us to run all of our SQL commands both interactively and from a file. MySQL actually does a lot more than this, as we will learn as the class progresses. SQL-Script files can be created with any editor, such as notepad. It is considered a standard to use a ".sql" extension for all SQL script files.

Once you log into MySQL interactively you execute a script file by prefixing it with "source" command.

Example:

MySQL> source world.sql