Share
Explore BrainMass

Key benefit provided by the relational model over previous data

1. What is the key benefit provided by the relational model over previous data models?
2. What are three types of physical data dependence described by Codd in his paper?
3. What are the good/bad choices the first two RDB systems made?
4. How the two systems finally united to the current relational database systems. Think about query language, system model, choices on storage, index, query evaluation, etc. What do you think are the choices made in commercial database, why? (Consider 80/20 rule).
5. Use the cost formula to explain why blocked access or pre-fetching is good exercise.
6. Considering a join operation between two tables, one M pages, another N pages, running on a buffer pool with K pages. What is the number of I/Os using nested-loop join, block-nested loop join. What if a hot set algorithm is used or the DBMIN algorithm is used? What if the replacement policy is LRU or MRU?
7. Consider a 3-level tree structure index with maximum 64 entries per internal node. On a range query that retrieve 1000-1200 entries. What is the number of I/Os if the index is B-tree. What is the number of I/Os if the index is B+ tree. What is the number of I/Os if the index is a cluster index?
8. We say that the basic idea behind index is partition and labeling. What is the partition and labeling of a trie structure?
9. write the exhaustive node-split algorithm for R-tree, then analyze the complexity of the algorithm
10. Analyze the complexity of the node-split algorithms in the R*-tree paper. Compare the algorithms to the corresponding ones in the R-tree paper, illustrate under what circumstance the algorithms in the R*-tree paper will out-perform the ones in the R-tree paper, and under what circumstance they will not.
11. In a Z-curve on a 1024 x 1024 grid, the point with coordinates (1,1) is first, the point (1,2) is second, the point (2,1) is third, and so on. What are the coordinates of the 45th point? Of the 17,945th point?
12. Given you a set of nodes (in 2D space), that belongs to one R-tree node that is to be splitted, what is the results of the splitting, using the three algorithms (as presented in the R-tree paper) respectively?
13. Consider a scenario in which we want to use GIST to model the string matching operation, where containment relationship is a fussy sub-string (with gaps). Please specify what the key functions should be defined.
14. Given a table that has 10 columns, 100K tuples expended across 1000 pages, if we project on two columns, with duplicate elimination, and assume that there are 50 unique values on the combination of the two columns, what is number of I/Os in the best/worse cases?
15. Given two tables R1 and R2, one has 1000 tuples, 100 pages, the other has 2000 tuples, 20 pages. Perform join on the two tables with predicate "R1.a > 2 and R1.a = R2.b", what join algorithm will you use, if (1) there is no index at all. (2) there is B+ tree index on R1.a (3) there is B+ tree index on R2.b. (4) there is cluster B+ tree index on R1.a. (5) there is join index on R1.a and R2.b.
16. What does an operator do in the "open" phase? Consider operator sort, nested-loop join, block-nested-loop join, file scan, index scan.
17. What are the possible physical operator(s) that implement a join? Which are blocked, which are pipelined?
18. Give an example of ((A join B) join C) where the output of a sort-merge join for (A join B) can be fed as input to a simple merge join with C, with no intermediate sorting required. A,B,C are relations -- you pick the attributes and the join conditions to make the above example work out.
19. You have a database with a primary index on ID for Employee and Dept, and secondary index on Name for the Employee relation only. You have access methods available for a Nested Loops Join, Nested Indexed Loop Join, and a Grace Hash Join. Generate alternative plans for the following query:
SELECT E.Name
FROM Employee E, Dept D
WHERE E.Dept = D.ID
AND D.Name = "Toys"
20. If you are to design an application-style benchmark for a text-oriented XML database, how would you configure the data set and what are the queries you may ask?

Attachments

Solution Preview

Hello

Please find the solution in the attached file.

1.
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar Codd.
Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate.

Advantages of the relational model:

* It is extensively studied, proven in practice, and based on a formal theoretical model. Almost all of the things that are known about it are actually proven as mathematical theorems. The data manipulation paradigm is based on first order logic and is in full support of DatabaseIsRepresenterOfFacts.
* It offers an abstracted view of data. It was among the first major application of abstraction as a way to manage software complexity. It basically abstracts the physical structure of data storage, from the logical structure of data.
* It offers a declarative interface (relational calculus) for the specification of data manipulation, that is actually translated to an efficient (sometimes the most efficient) implementation, given a physical data layout and within reasonable heuristic limits.

2.
Physical data independence-- The ability to modify the physical scheme without causing application programs to be rewritten. Modifications at this level are usually to improve performance .Three types of Physical data independence
are----
- the capacity to change the internal schema without
having to change the conceptual (or external) schema
- internal schema may change to improve the performance
(e.g., creating additional access structure)
- easier to achieve logical data independence, because
application programs are dependent on logical structures

3.
A database management system (DBMS) is the software which controls the storage, retrieval, deletion, security, and integrity of data within a database. An RDBMS is a DBMS which manages a relational database. A relational database stores data in tables. Tables are organized into columns, and each column stores one type of data (integer, real number, character strings, date, ...). The data for a single "instance" of a table is stored as a row. For example, the Customer table would have columns such as CustomerNumber, FirstName, and Surname, and a row within that table would be something like {1701, "James", "Kirk"}.

The GOODS of a relational database system:

1.You use tables to create the relation.
2.Inside the tables are primary and alternative keys.
3.The primary keys create the relation between the tables with the alternative keys.
4.Accessing large results sets in smaller portions enables your application to display initial results earlier, increasing response time.
5.Performance is improved when a portion of a result set is required because less data is transmitted across the network.

The BADS
1.You can lose information if the keys over lap in the requests made by the creator.
2.Application developers need to understand that the underlying data can change between the times that data records are accessed via the cursor: previously retrieved records may have been deleted, records may have been inserted into previously retrieved portions of the result set, or previously retrieved records may have been modified.
3.Not all cursors are created equal. Some cursors only allow forward scrolling.
4.Cursors are a resource drain on the database because they are memory intensive.

4.
A database can be understood as a collection of related files. How those files are related depends on the model used. Early models included the hierarchical model (where files are related in a parent/child manner, with each child file having at most one parent file), and the network model (where files are related as owners and members, similar to the network model except that each member file can have more than one owner).

The relational database model was a huge step forward, as it allowed files to be related by means of a common field. In order to relate any two files, they simply need to have a common field, which makes the model extremely flexible.

The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.
5.
a)Regular Access Prefetching -- Tagged Next Block Lookahead,Exploit sequential access pattern
b)Stride ...

Solution Summary

The key benefits provided by the relational model over previous data are examined. How the two systems finally unite to the current relational database system is determined.

$2.19