Distinguish between the primary key and a candidate key. Provide an example of each.
What requirements must a two-dimensional table satisfy in order to be a relation? Provide an example of a table that is a relation.
The primary key is the candidate key that the database designer selects to identify the rows in a table. It is typically chosen with an eye to the amount of foreign key data it will generate, since its values are copied into every table that references it. The primary key serves four distinct purposes: identifying the rows of the table, representing those rows in relationships, organizing the storage of the relation, and supporting indexes that improve operational performance. A good choice of primary key is often an internally generated key "that has no meaning outside of the database system" (Chapple, 2012). Such a key is typically implemented as an auto-number column that serves as the record identifier, with the system generating the next value in sequence each time a user creates a new record.
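The distinction can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `employee` table and its `employee_id`, `email`, and `name` columns are hypothetical and do not come from the original answer. The auto-generated `employee_id` plays the role of the primary key, while `email` is a second candidate key that is unique but was not chosen as primary.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate primary key
        email       TEXT NOT NULL UNIQUE,               -- candidate key, not chosen as primary
        name        TEXT NOT NULL
    )
    """
)

# The primary key value is generated by the database, not supplied by the user.
conn.execute("INSERT INTO employee (email, name) VALUES (?, ?)", ("ann@example.com", "Ann"))
conn.execute("INSERT INTO employee (email, name) VALUES (?, ?)", ("bob@example.com", "Bob"))
ids = [row[0] for row in conn.execute("SELECT employee_id FROM employee ORDER BY employee_id")]

# A candidate key must also identify rows uniquely, so a repeated email is rejected.
try:
    conn.execute("INSERT INTO employee (email, name) VALUES (?, ?)", ("ann@example.com", "Ann B."))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(ids, duplicate_rejected)
```

Either column could identify a row, which is what makes both of them candidate keys. The surrogate key is the one other tables would reference through foreign keys.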
The candidate ...
This solution discusses primary and candidate keys, as well as the requirements that a two-dimensional table must satisfy in order to be a relation. References are provided for both questions.
Key benefit provided by the relational model over previous data models
1. What is the key benefit provided by the relational model over previous data models?
2. What are three types of physical data dependence described by Codd in his paper?
3. What good and bad choices did the first two RDB systems make?
4. How did the two systems finally converge into the current relational database systems? Think about query language, system model, and choices on storage, indexes, query evaluation, etc. What choices do you think were made in commercial databases, and why? (Consider the 80/20 rule.)
5. Use the cost formula to explain why blocked access or pre-fetching is good practice.
6. Consider a join operation between two tables, one of M pages and the other of N pages, running on a buffer pool of K pages. What is the number of I/Os using nested-loop join? Using block-nested-loop join? What if a hot set algorithm or the DBMIN algorithm is used? What if the replacement policy is LRU or MRU?
7. Consider a 3-level tree-structured index with a maximum of 64 entries per internal node, and a range query that retrieves 1000-1200 entries. What is the number of I/Os if the index is a B-tree? If it is a B+ tree? If it is a clustered index?
8. We say that the basic idea behind an index is partitioning and labeling. What are the partitioning and labeling of a trie structure?
9. Write the exhaustive node-split algorithm for the R-tree, then analyze the complexity of the algorithm.
10. Analyze the complexity of the node-split algorithms in the R*-tree paper. Compare the algorithms to the corresponding ones in the R-tree paper. Illustrate under what circumstances the algorithms in the R*-tree paper will outperform the ones in the R-tree paper, and under what circumstances they will not.
11. In a Z-curve on a 1024 x 1024 grid, the point with coordinates (1,1) is first, the point (1,2) is second, the point (2,1) is third, and so on. What are the coordinates of the 45th point? Of the 17,945th point?
12. Given a set of nodes (in 2D space) that belong to one R-tree node that is to be split, what are the results of the split using each of the three algorithms presented in the R-tree paper?
13. Consider a scenario in which we want to use GiST to model the string matching operation, where the containment relationship is a fuzzy sub-string (with gaps). Please specify which key functions should be defined.
14. Given a table that has 10 columns and 100K tuples spread across 1000 pages, if we project on two columns with duplicate elimination, and assume that there are 50 unique values for the combination of the two columns, what is the number of I/Os in the best and worst cases?
15. Given two tables R1 and R2, where one has 1000 tuples on 100 pages and the other has 2000 tuples on 20 pages, perform a join on the two tables with the predicate "R1.a > 2 and R1.a = R2.b". Which join algorithm would you use if (1) there is no index at all; (2) there is a B+ tree index on R1.a; (3) there is a B+ tree index on R2.b; (4) there is a clustered B+ tree index on R1.a; (5) there is a join index on R1.a and R2.b?
16. What does an operator do in the "open" phase? Consider the operators sort, nested-loop join, block-nested-loop join, file scan, and index scan.
17. What are the possible physical operators that implement a join? Which are blocking, and which are pipelined?
18. Give an example of ((A join B) join C) where the output of a sort-merge join for (A join B) can be fed as input to a simple merge join with C, with no intermediate sorting required. A,B,C are relations -- you pick the attributes and the join conditions to make the above example work out.
19. You have a database with a primary index on ID for both the Employee and Dept relations, and a secondary index on Name for the Employee relation only. You have access methods available for a Nested Loops Join, a Nested Indexed Loop Join, and a Grace Hash Join. Generate alternative plans for the following query:
FROM Employee E, Dept D
WHERE E.Dept = D.ID
AND D.Name = "Toys"
20. If you were to design an application-style benchmark for a text-oriented XML database, how would you configure the data set, and what queries might you ask?
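The ordering described in question 11 can be sketched as bit de-interleaving. This is a minimal sketch, not a definitive statement of the intended convention: it assumes a plain bit-interleaved Z-curve in which the lowest bit of the position goes to the second coordinate, an assumption chosen only because it reproduces the three points given in the question. The function name `z_curve_point` and its `bits` parameter are hypothetical.

```python
from typing import Tuple


def z_curve_point(n: int, bits: int = 10) -> Tuple[int, int]:
    """Return the 1-based coordinates of the n-th point (1-based) on a Z-curve.

    Assumes a 2**bits x 2**bits grid (bits=10 gives 1024 x 1024) and an
    ordering in which even-numbered bits of the position belong to the
    second coordinate and odd-numbered bits to the first coordinate.
    """
    if not 1 <= n <= 1 << (2 * bits):
        raise ValueError("n is outside the grid")
    index = n - 1  # switch to a 0-based position
    first = second = 0
    for i in range(bits):
        second |= ((index >> (2 * i)) & 1) << i      # even bits
        first |= ((index >> (2 * i + 1)) & 1) << i   # odd bits
    return first + 1, second + 1  # back to 1-based coordinates


# The first three positions match the points listed in question 11.
print(z_curve_point(1), z_curve_point(2), z_curve_point(3))  # (1, 1) (1, 2) (2, 1)
```

Under a different interleaving convention, the two coordinates would simply swap roles.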