database

: The Relational Model

Purpose: Concepts, terminology. Express queries in relational algebra and relational calculus. Translate between relational algebra and SQL (Lesson 6).

e.g. Student, Courses, Grades relations (tables)

An attribute in an entity-relationship diagram has a domain, that is, a datatype and a range of valid values. For example, zip code is a five-digit positive number.

A Cartesian product of domains is a set containing all possible combinations of elements, one from each domain. For example, a chessboard presents the Cartesian product of the rows (a-h) with the columns (1-8), with one square for each combination b5, h2, etc.

A tuple is an element of a Cartesian product.

A relation is a subset of a Cartesian product. For example, the set of valid values for (city, state, zip code) form a relation. Note that this relation contains information, whereas the Cartesian product City x State x Zip contains no information.

In a relation, there are

no duplicate attribute names

no duplicate tuples

In a relation

degree = number of attributes

cardinality = number of tuples

order of attributes has no significance

order of tuples has no sigificance

Note: it is because the columns of a relation are named with distinct names that the order of the columns has no significance.

Note: a result set is a relation in which the both the columns and the tuples are ordered.

Note: a table is a physical representation of a relation. In a table, whether it is in a document on a hard drive, some sequence of columns and tuples will generally be evident.

A relational schema is the set of attributes (with their domains) in a relation.

Relational variables: when we write “a relation R”, we are using R as a variable which represents a specific relational schema and could take as its values or instances any table that is consistent with the relational schema.

Relational Algebra: five fundamental operations, two standard derived operations

· select(R, proposition) S(R, p)

· project(R, attribute set) P(R, A)

· union(R1, R2) R1+R2

· difference(R1, R2) R1-R2

· product(R1, R2) R1*R2

· join(R1, R2, comparison) J(R1, R2, p)

= S(R1*R2, p)

· divide(R1, R2) R1 / R2

= P(R1- (P(R1, att(R1) \ att(R2))*R2) - R1), att(R1) \ att(R2))

Aggregate functions in SQL can be represented in relational algebra by allowing the projection operator to project onto aggregate functions on attributes which are themselves projected out.

A proposition is a Boolean expression which evaluates to True or False for each tuple without requiring any data except that provided by the tuple. For example, on Grades relation, “grade > B” is a proposition. A proposition does not contain any “for all” or “there exists” quantifiers.

Joins come in various flavors:

· Theta join, equi-join (a particular type of theta join)

· Natural join, outer join, inner join

· Semi-join

Division—for example, identify all students who have taken all courses.

Relational Calculus (tuple oriented):

R = { FQAttSet | predicate }

where “FQAttSet” stands for a set of fully-qualified attribute names.

Relational Calculus (domain oriented):

R = { AttributeSet | predicate }

A predicate is a Boolean expression which may contain attributes not in the set “FQAttSet”. All such attributes will be quantified by a “for all” or “there exists”. For example, to get a list of names of student with at least one “A”, we take

FQAttSet = Student.name

predicate = “there exists Grade.courseID such that

Student.courseID = Grade.courseID and

Grade.grade = A”

Note closure: every operation in relational algebra or relational calculus produces a relation.

Fundamental Theorem: Given a set of relations RSet = {R1, ..., Rn}, a relation R can be derived from RSet by relational algebra if and only if R can be derived from RSet by relational calculus.

Keys in a Relation R

· A superkey is a set of attributes whose values uniquely identifies each tuple in any instance of R

· A candidate key is a superkey which does not contain another superkey

· A primary key is a specific candidate key

· An alternate key is a candidate key other than the primary key

· A foreign key is a set of attributes in R which matches a candidate key in a relation S, where S could be R

Note that a key is defined with respect to a relational variable, not an instance of the variable. Identification of a key must be supported by business rules.

No instance of R can establish that a set of attributes is a superkey for R. For example, it may happen that student names in a specific course are all different, but this would not imply that the student-name attribute is a superkey for a classlist relation. On the other hand, a single class in which two students have identical names is sufficient to establish that the student-name attribute is not a superkey.

A view is a derived relation. A view can always be defined by a query on base relations. The purpose of a view is to give restricted access to the database for both user convenience and data security.

The principle issue with a view is whether or not it permits updating the base relations from which it is derived. Updates are allowed if the defining query involves a single base relation and contains a candidate key of base relation.

A relational database is a set of normalized relations (Lesson 4).

Integrity in a relational database

· Entity Integrity—no attribute of a primary key can be Null

· Referential Integrity—a foreign key is either wholly Null or has a match

Nulls may be used to indicate incomplete data. The Interim Report 75-02-08 to the ANSI X3 (SPARC Study Group 1975) lists 14 different kinds of incomplete data that could appear as the result of queries or as attribute values. These types include overflows, underflows, errors and other problems trying to represent the real world within the limits of a computer.

not valid for this individual (e.g., maiden name of male employee)
valid, but does not yet exist for this individual (e.g., married name of female, unmarried employee).
exists, but not permitted to be logically stored (e.g. religion of this employee).
exists, but not knowable for this individual (e.g., last efficiency rating of an employee who worked for another company).
exists, but not yet logically stored for this individual (e.g. medical history of newly hired employee)

6. logically stored but subsequently logically deleted

logically stored, but not yet available.
available, but undergoing change (may be no longer valid).

change begun, but new values not yet computed
change incomplete, committed values are part new, part old, may be inconsistent
change incomplete, but part new values not yet committed
change complete but new values not yet committed.

available, but of suspect validity (unreliable)

possible failure in conceptual data acquisition
possible failure in internal data maintenance

available, but invalid

not too bad
too bad

secured for this class of conceptual data
secured for this individual object
secured at this time
derived from null conceptual data (any of the above).

Few Software Development Knowledge

 What are the benefits of OOD? o Object oriented design facilitates “Reusability”, “Extensibility” and “Modularity” of the software. We can divide the software into many components/classes so that “Abstraction” can be done on the changing part and non changing part to support the “changeability”. Plus Inheritance, Composition and polymorphism will always help in implementation.  Explain how the Garbage Collector works in C# o The GC of the C# (.Net rather) is a cool component in .NET framework. Being a C++ programmer, I truly understand the importance of it. I don’t need to worry about allocating /freeing the memory once I use it in .net. In c#, all the object memory management is done in “Managed heap. GC collects the unused memory based on metadata information which gives the memory layout of the created object. It performs the collection in two phases. Mark -> GC first finds the root of each object and traverse to the bottom of it adding each object and make a graph of it. It...

Salesforce Ten Not Ben

Search This Blog

database

: The Relational Model

Integrity in a relational database

Comments

Popular posts from this blog

MS interview

Amazon interview question Collection

Few Software Development Knowledge