nnf.p000066600003200000027000003330000547220560200130310ustar00liddlpstudent00000000000000.nr LL 6.5i
.ll 6.5i
.nr PO 1i
.po 1i
.nr PS 12
.ps 12
.nr VS 16
.vs 16
.EH ''''
.OH ''''
.EF ''%''
.OF ''%''
.EQ
delim %%
gsize 12
define subset '\(sb'
define notsubset '\(sb back 35 size +2 /~'
define subsetorequal '\(ib'
define notsubsetorequal '\(ib back 35 size +2 /~'
define semijoin '| back 30 down 2 times'
define join '| back 59 down 3 times back 67 |'
define oj '\(ci back 80 times'
define oj '\(ci back 80 times'
define loj 'C back 50 times back 38 |'
define notin '\(mo back 30 /~'
define includes '\(sp'
define includesorequal '\(ip'
define in '\(mo'
define ninter 'up 25 inter'
define nunion 'up 25 union'
define smallunion 'size 7 up 25 union'
define smallinter 'size 7 up 25 inter'
define nsum 'back 65 sum'
define square '\(sq'
define dash '\(em'
define md '- back 10 down 30 *'
define <-> '<- back 20 ->'
define notFD '-> back 38 /~'
define |= '| back 55 = back 50 ='
define ->> '-> back 20 ->'
define w->> '-> back 20 -> back 90 down 35 w'
define => 'roman {up 7 = back 30 up 7 = back 30 up 7 = back 70 >}'
define tilde-not 'size +8 down 35 "~"'
define <=> 'roman {< back 70 up 7 = back 30 up 7 = back 30 up 7 = back 30 up 7 = back 70 >}'
define |- '| back 29 - back 25 -'
define there-exists 'roman {fwd 25 | back 95 up 44 - back 52 up 9 - back 54 down 23 - back 10 ~}'
define not '\(no'
define /= '\(!='
define != '\(!='
define one-half '\(12'
define division '\(di'
define unknown '| back 81 _'
define emptyset '\(es'
define down-arrow '\(da'
define up-arrow '\(ua'
define smallcircle 'down 20 \(de'
define star '\(**'
define infinity '\(if'
define notimplies 'roman {= back 40 = back 70 / >}~'
define left-co '- back 20 - back 20 ('
define right-co ') back 18 - back 20 - back 67 up 5 roman >'
define disj '{roman {back 20 \e / back 20 ~}}'
define conj '{roman {/ fwd 9 up 5 \e}}'
define line '^ down 8 \(br ^'
define for-all '{roman {\e back 3 / back 48 up 16 size -2 - back 10 ~}}'
define Nquant 'size +1 | back 34 / back 33 size +1 | back 60 ~'
.EN
.ce
.B
.ps 14
AN IMPROVED NORMAL FORM FOR NESTED RELATIONS
.ps 12
.sp
.I
.ce 3
Wai Yin Mok\(dg
.FS
\(dg Much of the work on this paper
was done while this author was at Hong Kong Polytechnic.
.FE
Yiu-Kai Ng
David W. Embley
.R
.sp
.ce 2
Brigham Young University
Provo, Utah, USA
.sp 2
.ce
.I
ABSTRACT
.sp
.QP
We give a new normal form for nested relations that has significantly better
properties than nested normal forms suggested in earlier papers
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]].
Whereas earlier definitions allow redundancy in some cases, our new
Nested Normal Form (NNF) definition does not. Indeed, we show that our
NNF definition is equivalent to consistent nested relation schemes that have no
potential redundancy.
In addition, our NNF fully uses semantic constraints imposed by
functional dependencies. This permits greater attribute clustering and
leads to more flexibility in nested relation design.
These improvements present a strong case for the adoption of
NNF as defined here.
.QP
Keywords: Normalization theory, nested relations, nested normal form,
data redundancy, database design.
.sp
.NH
Introduction
.PP
Although normalization theory for flat relations has a long research history,
its extension to nested relations is much more recent.
Partition Normal Form (PNF)
.[ [
Roth Korth Silberschatz
.]],
which guarantees expected properties for
nesting and unnesting and for keys of nested relations, has been well accepted.
Indeed, nested relations are sometimes defined such that only PNF relations are
allowed
.[ [
Abiteboul Bidoit
.],
.[
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]],
and for
.[ [
Abiteboul Bidoit
.]]
the definition predates PNF. Nested Normal Form (NNF)
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]],
however, which is designed to
remove redundancy and the update anomalies that accompany
redundancy, has not been widely accepted.
.PP
There are some difficulties in working out normal forms for nested relation
schemes.
Indeed, with respect to an arbitrary set of functional dependencies (FDs) and
multivalued dependencies (MVDs),
none of the definitions for NNF in the literature totally prevents redundancy.
In addition they all restrict attribute clustering more than is necessary.
Our purpose in writing this paper is to propose a definition for NNF that
completely prevents redundancy with respect to any given set of FDs and MVDs
and, at the same time, permits greater flexibility
in clustering attributes into nested relation schemes.
.PP
It appears that
NNF was originally devised as an MVD analogy to Third Normal Form
(3NF) because, as defined in
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]],
NNF has neither partial nor transitive MVDs.
There are, however, some additional considerations, and in attempting to
address these additional considerations, these earlier definitions of NNF
have both gone too far and not far enough. They go too far because they
prevent some schemes from being generated that can never have redundant
relations and thus overly restrict desirable flexibility. They do not go far
enough because they permit some generated schemes that allow redundant relations
even when all partial and transitive MVDs as defined in
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]]
have been removed.
.PP
What makes NNF definitions complex is that FDs must be considered as well,
and they cannot simply be treated as MVDs. In
.[ [
Ozsoyoglu Yuan normalization
.]]
and
.[ [
Roth Korth nested normal form
.]]
the need to consider FDs was recognized, and improvements were made by adding
FD semantics. These improvements, however, are insufficient.
It turns out, as we show in this paper, that rather than try to improve
these NNF definitions, a better approach is to
abandon the 3NF point of view about partial and transitive dependencies
and start over from the point of view of redundancy removal.
Here, we take this redundancy-removal point of view and
argue that our proposed NNF definition achieves the desired result. It
disallows nested relation schemes that permit valid relations with
redundancy, and it does not overly restrict attribute clustering.
.PP
We present our arguments leading to these results as follows.
In Section 2, we provide our basic definitions for nested relations. Like
.[ [
Abiteboul Bidoit
.],
.[
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]],
we define our nested relations to be in PNF. In Section 2
we also give carefully specified redundancy definitions.
As illustrations for our redundancy definitions, we give examples to
show that none of the earlier definitions of NNF fully prevents redundancy.
.PP
In Section 3, we present our NNF definition, which consists of three conditions
that must be satisfied. As we illustrate our definition,
we also compare it to earlier NNF definitions and show that ours provides
greater flexibility in how attributes may be clustered in nested relation
schemes.
.PP
We present a theorem in Section 4 guaranteeing that the first two conditions
of our NNF definition prevent redundancy. In Section 5, we investigate the
converse of this theorem. We are able to show that if a nested
relation scheme is consistent with a given set of MVDs and FDs and
there is no potential redundancy,
then the nested relation scheme satisfies the first two conditions of
our NNF definition.
.PP
The third condition of our NNF definition
has nothing to do with redundancy, but instead removes an unwanted
structural characteristic of nested relations, namely embedded nested relations
with at most one tuple, which we call singleton buckets.
This is in harmony with earlier definitions of NNF
that also disallow singleton buckets
.[ [
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]].
In Section 6 we prove that our third condition does indeed prevent singleton
buckets.
.PP
In Section 7 we present our conclusions. We also mention and reference some
related work that further illuminates some of the properties of our NNF
definition. The discussion includes some work we have done on algorithms
for generating NNF schemes, especially for practical cases and cases
where the universal relation assumption may not hold. It also includes
some work we have completed on unifying normalization theory in which
we show that Fourth Normal Form (4NF), as defined by Fagin
.[ [
Fagin new normal form
.]],
is a special case of our NNF definition,
and that Boyce-Codd Normal Form (BCNF) is also a special case when
we limit the dependencies to FDs.
.NH
Basic Definitions and Properties
.NH 2
Nested Relations
.PP
A nested relation allows each tuple component to be either atomic or
another nested relation, which may itself be nested several levels deep.
As in
.[ [
Abiteboul Bidoit
.],
.[
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]],
we are only interested in nested relations that are in PNF.
Thus, in a nested relation, there can never be distinct
tuples that agree on the atomic attributes of either the nested relation itself
or of any nested relation embedded within it
.[ [
Atzeni DeAntonellis Relational Database Theory
.]].
.sp
.LP
\fBDefinition 1.\fR
Let %U% be a set of attributes.
A \fInested relation scheme\fR is recursively defined as follows:
.IP (1)
If %X% is a nonempty subset of %U%, then %X% is a
\fInested relation scheme\fR over the set of attributes %X%.
.IP (2)
If %X%, %X sub 1%, ..., %X sub n% are pairwise disjoint, nonempty subsets of
%U%, and %R sub 1%, ..., %R sub n% are nested relation schemes over
%X sub 1%, ..., %X sub n% respectively,
then %X% %(R sub 1 )*% ... %(R sub n )*% is a \fInested relation scheme\fR over
%XX sub 1% ... %X sub n%. %square%
.sp
.LP
\fBDefinition 2.\fR
Let %R% be a nested relation scheme over a nonempty set of
attributes %Z%. Let the domain of an attribute %A~in~Z% be denoted by %dom(A)%.
A \fInested relation over R\fR is recursively defined as follows:
.IP (1)
If %R% has the form %X% where %X% is a set of attributes {%A sub 1%, ...,
%A sub n%}, %n~>=~1%, then %r% is a \fInested relation over
R\fR if \fIr\fR is a (possibly empty) set of functions
{%t sub 1%, ..., %t sub m%} where each function %t sub i%, %1~<=~i~<=~m%,
maps %A sub j% to an element in %dom(A sub j )%, %1~<=~j~<=~n%.
.IP (2)
If %R% has the form %X% %(R sub 1 )*% ... %(R sub m )*%, %m~>=~1%, where %X%
is a set of attributes {%A sub 1%, ..., %A sub n%}, %n~>=~1%, then %r% is
a \fInested relation over R\fR if
.RS
.IP (a)
\fIr\fR is a (possibly empty) set of functions {%t sub 1%, ...,
%t sub p%} where each function %t sub i%, %1~<=~i~<=~p%, maps %A sub j%
to an element in %dom(A sub j )%, %1~<=~j~<=~n%, and maps %R sub k%
to a nested relation over %R sub k%, %1~<=~k~<=~m%, and
.IP (b)
%t sub i~in~r% and %t sub j~in~r% and %t sub i (X)% = %t sub j (X)% implies
%t sub i% %=% %t sub j%, %1~<=~i,~j~<=~p%.
.RE
.LP
Each function of a nested relation %r% over nested relation scheme %R%
is a \fInested tuple\fR of %r%. %square%
.sp
.LP
\fBExample 1.\fR
Figure 1 shows a nested relation. Its scheme is
\fIDept Chair\fR (\fIProf\fR (\fIHobby\fR)* (\fIMatriculation\fR (\fIStudent\fR
(\fIInterest\fR)* )* )* )* and it contains two nested tuples. As in
.[ [
Abiteboul Bidoit
.]],
we draw a bucket for each embedded nested relation. Each bucket also contains
nested tuples of its own. For example, <\fIYoung\fR, {\fIChess\fR,
\fISoccer\fR}> and <\fIBarker\fR, {\fISkiing\fR}> are nested tuples
in the first bucket under the embedded nested relation scheme
\fIStudent\fR (\fIInterest\fR)*.
Notice that, as required, PNF is satisfied. Thus, the values
for the atomic attributes, \fIDept Chair\fR, differ, and in each bucket
the atomic values differ. %square%
.sp
.LP
\fBDefinition 3.\fR
Let %R% be a nested relation scheme. Let %r% be a nested relation on %R%.
The \fItotal unnesting\fR of %r% is recursively defined as follows:
.IP (1)
If %R% has the form %X%, then %r% is the total unnesting of %r%.
.IP (2)
If %R% has the form %X% %(R sub 1 )*% ... %(R sub n )*%, where %X sub i%
is the set of attributes in %R sub i%, %1~<=~i~<=~n%,
then the total unnesting of
%r% = { %t% | there exists a nested tuple %u~in~r% such that %t(X)% =
%u(X)% and %t(X sub i )% is a
tuple in the total unnesting of %u(R sub i )%, %1~<=~i~<=~n%. } %square%
.sp
.LP
\fBDefinition 4.\fR
Let %R% be a nested relation scheme. Let %r% be a nested relation on %R%.
Let %t% be a nested tuple of %r%.
The \fItotal unnesting\fR of %t% is defined as the total unnesting of
%q%, where %q% is a nested relation containing the single nested tuple %t%.
%square%
.sp
.LP
\fBExample 2.\fR
Figure 2 shows the total unnesting of the nested relation in Figure 1.
Observe that the first two tuples in the total unnesting contain the
total unnesting of the nested tuple <\fIYoung\fR, {\fIChess\fR, \fISoccer\fR}>.
%square%
.NH 2
Scheme Trees
.PP
We can graphically represent a nested relation scheme by a tree, called a
scheme tree.
A scheme tree captures the logical structure of a nested relation scheme
and explicitly represents a set of multivalued dependencies (MVDs).
Scheme trees have been used for earlier definitions of NNF
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]].
We use them here for the same purpose.
.sp
.LP
\fBDefinition 5.\fR
A \fIscheme tree\fR %T% corresponding to a nested relation scheme
%R% is recursively defined as follows:
.IP (1)
If %R% has the form %X%, then %T% is a single node scheme tree whose
root node is the set of attributes %X%.
.IP (2)
If %R% has the form %X% %(R sub 1 )*% ... %(R sub n )*%, then
the root node of %T% is the set of attributes %X%, and
a child of the root of %T% is the root of the scheme tree %T sub i%,
where %T sub i% is the corresponding scheme tree for the nested
relation scheme %R sub i%, %1~<=~i~<=~n%. %square%
.sp
.PP
The one-to-one correspondence between a scheme tree and a nested
relation scheme along with the definition of a nested relation
scheme impose several properties on a scheme tree. Let %T% be a scheme tree.
We denote the set of attributes in %T% by %Aset(T)%. Observe that the atomic
attributes of a nested relation scheme, at any level of nesting,
constitute a node in a scheme tree.
Observe further that since Definition 1 requires nonempty sets of
attributes, every node in %T% consists of a nonempty
set of attributes. Furthermore, since the sets of attributes corresponding
to nodes in %T% are pairwise disjoint and include all the attributes of %T%,
the nodes in %T% are pairwise disjoint, and their union is %Aset(T)%.
.PP
Let %N% be a node in %T%. Notationally, %Ancestor(N)% denotes
the union of attributes in all ancestors of %N%, including %N%.
Similarly, %Descendent(N)% denotes the union of attributes in all
descendants of %N%, including %N%.
.PP
In a scheme tree %T% each edge %(V,~W)%, where %V% is the parent of %W%,
denotes an MVD %Ancestor(V)% %->>% %Descendent(W)%. Notationly, we use
%MVD(T)% to denote the union of all the MVDs represented by the edges in %T%.
By construction, each MVD in %MVD(T)% is satisfied in the total unnesting of any
nested relation for %T%. Since FDs are also of
interest, we use %FD(T)% to denote any set of FDs equivalent to all FDs %X~->~Y%
implied by a given set of FDs and MVDs over a set of attributes %U% such that
%Aset(T)~subsetorequal~U% and %XY~subsetorequal~Aset(T)%. (Note that because
a set of FDs %F% together with a set of MVDs %M% can imply FDs not implied by
%F% alone, %FD(T)%, in general,
is not equivalent to the set of FDs in %F% whose left-hand side is
a subset of %Aset(T)% and whose right-hand side is restricted to %Aset(T)%.)
.sp
.LP
\fBExample 3.\fR
Figure 3 shows the scheme tree %T% for the scheme of the nested relation in
Figure 1. Figure 3 also gives the set of attributes in %Aset(T)% and
the set of dependencies in %MVD(T)%. Observe that each of the MVDs in
%MVD(T)% is satisfied in the unnested relation in Figure 2. %square%
.NH 2
Data Redundancy
.PP
Data redundancy is a concern in database design. Redundant data
leads to higher storage and access cost. It also leads to update anomalies,
forcing multiple copies of the same data value to be updated when one copy
changes, and it leads to data inconsistency if all copies do not agree.
.PP
Except in rare cases, such as
.[ [
Vincent Srinivasan
.]],
papers and text books on normalization fail to provide rigorous definitions
for redundancy and thus also fail to prove that normalization removes redundancy
as expected. Offered instead, are motivating examples to illustrate redundancy
removal. Thus, in the vast body of research literature on normalization,
we mostly only have rigorous syntactic justifications for
normalization; what we are missing are rigorous semantic justifications.
Besides only providing for syntactic characterizations, a
danger of not treating redundancy formally is that
the examples may be misleading. Indeed, as we show below, the
definition for 4NF found in most text books does not detect potential
redundancy for all
cases even though some readers of these books are led to believe that it does.
.PP
In creating definitions for redundancy, we should try to find
simple and intuitive characterizations, but creating a
simple and intuitive definition for redundancy is more difficult than one might
at first think. Any definition will involve a sophisticated statement,
and there are many possible approaches one might use. Our notion of redundancy
is based on the idea that an atomic value %v% in a nested or
flat relation is redundant if we can erase %v%, and then from what remains
and from a single FD or MVD that holds, determine what %v% must have been.
.PP
The way we define ``holds'' is important. Here, we adapt Fagin's definition
.[ [
Fagin new normal form
.]],
and we explain it thoroughly before proceeding with our definition of
redundancy.
.sp
.LP
\fBDefinition 6.\fR
Let %U% be a set of attributes. Let %M% be a set of MVDs over %U% and
%F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)~subsetorequal~U%.
An MVD %X~->>~Y% \fIholds\fR for %T% with respect
to %M% and %F% if %X~subsetorequal~Aset(T)% and
there exists a set of attributes %Z~subsetorequal~U% such that
%Y~=~Z~smallinter~Aset(T)% and %M~smallunion~F% implies %X~->>~Z%
on %U%. An FD %X~->~Y% \fIholds\fR for %T% with respect
to %M% and %F% if %XY~subsetorequal~Aset(T)% and %M~smallunion~F% implies
%X~->~Y% on %U%. %square%
.sp
.PP
This definition is motivated by the following Lemma, which is Theorem 5 in
.[ [
Fagin new normal form
.]].
For the sake of completeness and because we later use its corollary,
we include a proof here.
.LP
\fBLemma 1.\fR
(Fagin
.[ [
Fagin new normal form
.]])
Let %U% be a set of attributes and %R~subsetorequal~U%.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %X~subsetorequal~R%, %Z~subsetorequal~U%, and %Y% = %Z~smallinter~R%.
If %M~smallunion~F% implies %X~->>~Z% on %U%, then
%M~smallunion~F% implies %X~->>~Y% on %R%.
.LP
\fBProof.\fR
Let %r sub U% be a relation over %U% that satisfies %X~->>~Z% and let
%r sub R% = %pi sub R (r sub U )%. Let %t sub 1% and %t sub 2% be
tuples in %r sub R% such that %t sub 1 (X)% = %t sub 2 (X)%.
Since %r sub R% = %pi sub R (r sub U )%, there exist tuples
%t' sub 1% and %t' sub 2% in %r sub U% such that
%t' sub 1 (R)% = %t sub 1% and %t' sub 2 (R)% = %t sub 2%.
Since %r sub U% satisfies %X~->>~Z%,
there exists a tuple %t' sub 3% in %r sub U% such that %t' sub 3 (XZ)% =
%t' sub 1 (XZ)% and %t' sub 3 (U~-~(XZ))% = %t' sub 2 (U~-~(XZ))%.
Consider the tuple %t sub 3% in %r sub R% where %t sub 3% = %t' sub 3 (R)%.
Since %Y~subsetorequal~Z%, %t sub 3 (XY)% = %t sub 1 (XY)% and
%t sub 3 (R~-~(XY))% = %t sub 2 (R~-~(XY))%.
Hence, %r sub R% satisfies %X~->>~Y%. %square%
.sp
.LP
\fBCorollary.\fR
If %X~->>~Y% holds for %R%, then there
exists a set of attributes %Z~subsetorequal~U% such that
%Y% = %Z~smallinter~R% and %M~smallunion~F% implies %X~->>~Z%.
.LP
\fBProof.\fR
Suppose not, then for every set of attributes %Z~subsetorequal~U%,
%Y% %/=% %Z~smallinter~R% or %M~smallunion~F% does not imply %X~->>~Z%.
But %Y% = %R~smallinter~Z%, so for every set of attributes %Z~subsetorequal~U%,
%M~smallunion~F% does not imply %X~->>~Z%. However, let %Z% = %X%, and since
%X~->>~X% is trivial, %M~smallunion~F% implies %X~->>~Z% %dash% a contradiction.
%square%
.sp
.LP
\fBExample 4.\fR
Figure 4 shows a given set of attributes %U% and a given set of FDs %F% over %U%
and a given set of MVDs %M% over %U%.
All the FDs in %F% in Figure 4 hold for the scheme tree %T% in Figure 3, as
do all the FDs implied by %M~smallunion~F%.
Not all the MVDs in %M% hold for %T%. In particular, neither %Hobby% %->>%
%Hobby-Equipment% nor %Prof% %->>% %Hobby% %Hobby-Equipment% hold for %T%.
Since %Hobby% %Hobby-Equipment% %smallinter% %Aset(T)% = %Hobby%, however,
%Prof% %->>% %Hobby% does hold for %T%.
Although %Prof% %->>% %Hobby% does hold for %T%, observe that it
is not implied by %M~smallunion~F% on %U%. %square%
.sp
.PP
As illustrated in Example 4, certain
MVDs hold for a relation scheme even when they are not implied by a given
set of FDs and MVDs. It is those that hold that are of interest to us.
.sp
.PP
We now return to our task of defining redundancy. Since our definition
depends on the validity of a nested relation, however,
we must first define what it means
for a relation to be valid for a given set of FDs and MVDs.
.sp
.LP
\fBDefinition 7.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that
%Aset(T)~subsetorequal~U%. Let %r% be a nested relation on %T%.
Nested relation %r% is \fIvalid\fR with respect to %M~smallunion~F% if in the
total unnesting of %r%, every FD and every MVD that holds for %T% with
respect to %M% and %F% is satisfied. %square%
.sp
.PP
We now define redundancy. The definition has two parts: FD redundancy and
MVD redundancy. The pre-conditions for both FD redundancy and MVD redundancy
are identical except for a reference to an FD in one case and an MVD in the
other case. Rather than repeat the lengthy pre-conditions, we combine
both cases and use the symbol ``%md%'' to represent either an FD or an MVD.
.sp
.LP
\fBDefinition 8.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)~subsetorequal~U%. Let
%XY~subsetorequal~Aset(T)%, and let %X~md~Y% be an FD or an MVD
that holds for %T% with respect to %M% and %F% and has
an attribute %A~in~Y% and %A~notin~X%.
Let %r% be a non-empty nested relation on %T%
that is valid with respect to %M~smallunion~F%.
Let %S% be the subtree of %T% that contains %A% as an atomic attribute, and
let %s sub 1%, ..., %s sub n% be the nested relations over %S% in %r%.
Let %u sub 1% and %u sub 2% be distinct nested tuples of
%s sub i% and %s sub j% respectively, %1~<=~i,j~<=~n%,
such that %u sub 1 (A)% = %v%, %u sub 2 (A)% = %v'%, and %v% = %v'%.
(Note that %i% = %j% is possible so that %s sub i% and %s sub j% may either be
the same nested relation under %S% or may be different nested relations
under %S%.) Let %t sub 1% and %t sub 2% be distinct tuples in the
total unnesting of %r% such that %t sub 1 (Aset(S))% and %t sub 2 (Aset(S))%
are tuples in the total unnesting of %u sub 1% and %u sub 2% respectively.
.IP \(bu
FD redundancy, when %X~md~Y% is %X~->~Y%:
If %t sub 1 (X)% = %t sub 2 (X)%, then atomic value %v%
is a \fIredundant atomic value\fR in %r% caused by %X~->~Y%.
.IP \(bu
MVD redundancy, when %X~md~Y% is %X~->>~Y%:
If %t sub 1 (X)% = %t sub 2 (X)%, %t sub 1 (Y)% = %t sub 2 (Y)%, and
%t sub 1 (Z)% %/=% %t sub 2 (Z)% where %Z% = %Aset(T)~-~(XY)%, then atomic
value %v% is a \fIredundant atomic value\fR in %r% caused by %X~->>~Y%.
%square%
.sp
.LP
\fBExample 5.\fR
Let %Student% %->% %Dept% and consider the nested relation and its
total unnesting in Figure 5. Both appearances of %Math% are redundant
in both the nested and unnested relation.
We can see this formally as follows. Let %t sub 1% be the third tuple and
%t sub 2% be the last tuple in the unnested relation.
Now we have %t sub 1 (Student)% = %t sub 2 (Student)%, and thus, %Math%
in the third tuple of the unnested relation is redundant. Since %Math% in the
third tuple of the unnested relation comes from the first nested tuple in
the nested relation, %Math% in the first nested tuple of the nested relation
is redundant. By reversing %t sub 1% and %t sub 2%, we can formally see that
%Math% in the second
nested tuple of the nested relation and in the last tuple of the unnested
relation are also redundant. %square%
.sp
.PP
It is possible for a value not to be redundant in a nested relation and yet
be redundant in the total unnesting of the relation. Indeed, this is often the
reason we create nested relations %dash% to remove redundant values.
.sp
.LP
\fBExample 6.\fR
Suppose %Student% %->% %Interest% and we allow students to have multiple
majors. Now consider the nested relation and its total unnesting in
Figure 6. Observe that %Skiing% is redundant in the unnested relation.
However, in the nested relation, %Skiing% is
\fInot\fR redundant because it appears only once. %square%
.sp
.LP
\fBExample 7.\fR
Let %Prof% %->>% %Dept% %Student% and %Prof% %->>% %Hobby% %Hobby-Equipment%
and consider the flat relation in Figure 7. Since the scheme of the
relation in Figure 7 is %Prof% %Student% %Hobby% and
%Prof% %->>% %Dept% %Student% and %Prof% %->>% %Hobby% %Hobby-Equipment%,
by Lemma 1 %Prof% %->>% %Student% and %Prof% %->>% %Hobby% hold for the scheme
%Prof% %Student% %Hobby%.
But now, all the data values under %Student% and %Hobby% are redundant, as can
be formally seen by appropriately picking two distinct tuples and choosing which
attribute and value to consider.
For example, let %t sub 1% be the first tuple and %t sub 2% be the second
tuple, then %Young% in %t sub 1% is redundant since
%t sub 1 (Prof)% = %t sub 2 (Prof)%,
%t sub 1 (Student)% = %t sub 2 (Student)%,
and %t sub 1 (Hobby)% %/=% %t sub 2 (Hobby)%.
.LP
As an aside,
we observe here that by the common definition of 4NF found in most text books
(e.g.,
.[ [
Korth Silberschatz 1991
.],
.[
Maier
.]])
the relation scheme in
Figure 7 is in 4NF. This is because no nontrivial MVD, given or implied,
applies to the scheme, where applies means that the set of attributes
that constitutes the MVD is a subset of the scheme. In particular, for
Example 7, neither %Prof% %->>% %Student% nor %Prof% %->>% %Hobby% is
implied by or is in the given set of MVDs {%Prof% %->>% %Dept% %Student%,
%Prof% %->>% %Hobby% %Hobby-Equipment%}. According to
the original definition given by Fagin
.[ [
Fagin new normal form
.]],
however, the relation scheme in
Figure 7 is not in 4NF. Fagin's definition not only considers all nontrivial
MVDs that are given or implied (without regard to the scheme under
consideration), but also the MVDs that hold when the scheme is considered.
%square%
.sp
.LP
\fBExample 8.\fR
To show an example of a nested relation (with embedded relations) that has
redundancy caused by an MVD, we present one more example of redundancy.
Let %Prof% %->>% %Hobby% and %Hobby% %->>% %Prof% and consider the nested
relation and its total unnesting in Figure 8. Based on the
MVD %Hobby% %->>% %Course%, which holds for the nested relation scheme in
Figure 8, all the values under %Course%
in the nested relation are redundant. We can formally see, for example,
that the last %Discrete~Str~II% value under %(Course)*% is redundant by letting
%t sub 1% be the last tuple and
%t sub 2% be the 4th tuple in the unnested relation. Thus,
%t sub 1 (Hobby)% = %t sub 2 (Hobby)%,
%t sub 1 (Course)% = %t sub 2 (Course)%,
and %t sub 1 (Prof)% %/=% %t sub 2 (Prof)%. %square%
.sp
.PP
Definition 8 tells us what it means for an individual atomic value to be
redundant in a nested relation for an FD or MVD that holds. Our next definition
ties together the notion of a redundant data value in a nested relation and
the notion of
a nested relation scheme that permits valid nested relations that contain
redundancy. It is this definition that allows us to later show that our
definition for NNF prevents redundancy.
.sp
.LP
\fBDefinition 9.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)~subsetorequal~U%.
%T% is said to have \fIpotential redundancy\fR with respect to %M~smallunion~F%
if there exists a redundant atomic value in any valid nested relation
for %T% caused by either
an FD or an MVD that holds for %T% with respect to %M% and %F%. %square%
.sp
.LP
\fBExample 9.\fR
Since the nested relations in Figures 5, 7 and 8 all have redundancy,
the nested relation schemes in Figures 5, 7 and 8 are all said to have
potential redundancy with respect to the FDs and MVDs given for the examples.
%square%
.NH
Nested Normal Form
.PP
We motivate the need for a new definition for NNF by making two
observations about the examples we have presented.
First, if we are given the FDs and MVDs in Figure 4,
none of the earlier definitions of NNF allow the scheme of the nested relation
in Figure 1, which is also equivalent to the scheme tree %T%
in Figure 3. They therefore do not allow the nested relation in Figure 1
even though it is a good clustering for this application and has no redundancy.
For a scheme tree %T% to be in the NNF as defined in all the earlier definitions
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]],
%T% must satisfy four conditions. It turns out that
%T% in Figure 3 does not satisfy the
fourth condition for any of these previous NNF definitions.
One requirement of the fourth condition insists that the root of a
scheme tree be the left-hand side of a reduced nontrivial MVD, but
all (implied) MVDs with \fIDept Chair\fR as the left-hand side are
trivial. This is not the only violation, but the other technicalities
are more difficult to explain without the full details.
.PP
Second, all the earlier definitions of NNF
.[ [
Ozsoyoglu Yuan new normal form
.],
.[
Ozsoyoglu Yuan normalization
.],
.[
Roth Korth nested normal form
.]]
allow the scheme of the nested relation in Figure 8, but, as pointed out in
Example 8, the nested relation has redundancy. We can see that the earlier
definitions allow the scheme of the nested relation in Figure 8 as follows.
Let %T% be the scheme tree for the nested relation scheme in Figure 8,
and assume we are given the set of MVDs, %M% = {%Prof% %->>% %Hobby%,
%Hobby% %->>% %Prof%}, and the empty set of FDs.
When there are no FDs, all three earlier NNF definitions
are equivalent. Now, observe that %MVD(T)% = {%Prof% %->>% %Hobby%,
%Prof% %->>% %Course%}. Since %M% implies %MVD(T)%, the first condition
of their NNF definitions is satisfied.
Since %M% does not imply an MVD with a left-hand side that is a proper subset
of %Prof%, %T% has no partial MVDs, and thus their second condition is
satisfied. Since %Hobby% %->>% %Prof%, %T% has no
transitive MVDs, and thus their third condition is satisfied.
Since each node in the scheme tree for Figure 8 is a single attribute, there
can be no decomposition of nodes, and thus their fourth condition is satisfied.
.PP
We now give our definition for Nested Normal Form.
.sp
.LP
\fBDefinition 10.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)~subsetorequal~U%.
%T% is in \fINested Normal Form (NNF)\fR with respect to %M~smallunion~F%
if the following conditions are satisfied.
.IP (1)
If %D% is the set of MVDs and FDs that hold for %T%,
then %D% is equivalent to %MVD(T)% %smallunion% %FD(T)% on %Aset(T)%.
.IP (2)
For each nontrivial FD %X~->~A% that holds for %T%,
%X~->~Ancestor(N sub A )%, where %N sub A% is the node in %T% that contains %A%.
.IP (3)
For every node %N~in~T%, if %Ancestor(N)% %->% %Y% holds for %T%,
%Y% %subsetorequal% %Ancestor(N)%. %square%
.sp
.LP
\fBExample 10.\fR
Suppose we are given %U%, %F%, and %M% as in Figure 4.
Then the scheme tree %T% in Figure 3 is in NNF. We can see this from
our definition as follows. We have observed in Example 8 that
%Hobby% %->>% %Hobby-Equipment% does not hold for %Aset(T)%. The set of
MVDs and FDs that do hold for %T% is equivalent to %F% %smallunion%
{%Student~->>~Interest%, %Prof~->>~Hobby%} considered on %Aset(T)%.
This set is thus equivalent to %D% in the NNF definition. %MVD(T)% is the
set of MVDs in Figure 3, and we can let %F% in Figure 4 be %FD(T)%.
We can convince ourselves that Condition 1 is satisfied by
applying a few standard MVD and FD derivation rules.
For example, we can derive %Prof% %->>% %Hobby% from %MVD(T)% and %FD(T)%
by using the FDs in %FD(T)% to obtain %Prof% %->% %Dept% %Chair% %Prof%,
converting this derived FD into an MVD, and then applying transitivity
with %Dept% %Chair% %Prof% %->>% %Hobby% in %MVD(T)% to obtain
%Prof% %->>% %Hobby%. To convince ourselves that Condition 2 is satisfied,
we consider %Student% %->% %Matriculation%, which holds for %T%, and observe
that %Ancestor(Matriculation)% = %Matriculation% %Prof% %Dept% %Chair% and that
%Student% %->% %Matriculation% %Prof% %Dept% %Chair% is implied.
Hence, %Student% %->% %Ancestor(Matriculation)%. We
similarly check each nontrivial FD in %FD(T)%, which is sufficient
to ensure that Condition 2 is satisfied.
To convince ourselves that Condition 3 is satisfied, we consider the
%Prof% node. Now, %Ancestor(Prof)% = %Prof% %Dept% %Chair% and thus
%Ancestor(Prof)% %->% %Prof% %Dept% %Chair% (trivially). Hence,
%Prof% %Dept% %Chair% %subsetorequal% %Ancestor(Prof)%.
We similarly check all
other nodes in %T% to ensure that Condition 3 is satisfied. %square%
.sp
.PP
Example 10 not only illustrates our NNF definition in a nontrivial case, but
also shows that our definition accepts the nested relation scheme in Figure 1,
which we consider to be good, but which is rejected by all the
earlier definitions of NNF as explained above. We now continue by giving
three examples, one that violates Condition 1 of our NNF definition,
one that violates Condition 2, and one that violates Condition 3. Our example
that violates Condition 1 also shows that our definition of NNF rejects the
nested relation scheme in Figure 8. It therefore rejects the scheme that
allows redundancy, but is not rejected by the earlier NNF definitions.
.sp
.LP
\fBExample 11.\fR
Let %U% = {%Prof%, %Hobby%, %Course%}, %M% = {%Prof% %->>% %Hobby%,
%Hobby% %->>% %Prof%}, and %F% = %emptyset%. As in Figure 8, let %T%
be %Prof% (%Hobby%)* (%Course%)*. %T% does not satisfy Condition 1 because
%MVD(T)% %smallunion% %FD(T)%, which is
{%Prof~->>~Hobby%, %Prof~->>~Course%}, is not equivalent to the set of
FDs and MVDs that hold for %T%. For example, we cannot
derive %Hobby% %->>% %Prof% from {%Prof~->>~Hobby%, %Prof~->>~Course%}.
Thus, Condition 1 is not satisfied. %square%
.sp
.LP
\fBExample 12.\fR
Let %U% = {%Interest%, %Dept%, %Student%}, %M% = %emptyset%, and %F% =
{%Student% %->% %Dept%}. As in Figure 5, let %T% = %Interest%
(%Dept% (%Student%)* )*. %T% does not satisfy Condition 2 because
%Student% %->% %Dept% is a nontrivial FD that holds for %T% and %Ancestor(Dept)%
= %Dept% %Interest%, but %Student% %notFD% %Dept% %Interest%. %square%
.sp
.LP
\fBExample 13.\fR
Our violation of Condition 3 is based on Figure 9.
Let %U% = {%Dept%, %Chair%, %Prof%}, %M% = %emptyset%, %F% =
{%Dept% %->% %Chair%}, and %T% = %Dept% (%Chair%)* (%Prof%)*.
%T% does not satisfy Condition 3 because %Ancestor(Dept)% = %Dept%, but
%Dept% %->% %Chair% holds for %T% and %Chair% %notsubsetorequal%
%Dept%. %square%
.sp
.PP
The problem with the nested relation in
Figure 9 is that all buckets under %Chair%
\fImust\fR be singleton buckets, i.e., must contain at most one tuple.
There is thus no need for the additional structure; scheme
%Dept% %Chair% (%Prof%)* provides all
the advantages without additional structure. Notice that this is different from
a bucket that just happens to have one tuple, such as the second bucket under
%Prof% in Figure 9 %dash% singleton buckets always have at most
a single tuple and should be avoided.
.NH
NNF and Potential Redundancy
.PP
We now prove one of our main results %dash% that a nested relation whose
scheme is in NNF for a given set of FDs and MVDs cannot have redundancy
with respect to the given FDs and MVDs. The full proof is lengthy, consisting
of several substantial lemmas plus the proof of the theorem itself. We have
included all the details. However, we present the proof of the theorem
(Theorem 1 below) so that it can be read alone as a reasonable proof-sketch
of the central idea.
.PP
Many of the lemmas here depend on a set of FD and MVD derivation rules. We
use the following rules, where %X%, %Y%, %Z%, %V%, %W%, and %Z'%
are all subsets of a set of attributes %R%:
.DS
FD derivation rules:
F1: (reflexivity) %Y~subsetorequal~X% implies %X~->~Y%.
F2: (augmentation) %X~->~Y% and %V~subsetorequal~W% imply %XW~->~YV%.
F3: (transitivity) %X~->~Y% and %Y~->~Z% imply %X~->~Z%.
.DE
.DS
MVD derivation rules:
M1: (reflexivity) %Y~subsetorequal~X% implies %X~->>~Y%
M2: (augmentation) %X~->>~Y% and %Z'~subsetorequal~Z% imply %XZ~->>~YZ'%
M3: (transitivity) %X~->>~Y% and %Y~->>~Z% imply %X~->>~Z~-~Y%
M4: (complementation) %X~->>~Y% implies %X~->>~R~-~(XY)%
M5: (trivial complementation) %X~->>~R~-~X%
.DE
.DS
Combined FD and MVD derivation rules:
C1: (replication) %X~->~Y% implies %X~->>~Y%
C2: (coalescence) %X~->>~Y%, %Z~->~W%, %W~subsetorequal~Y%, and %Y~smallinter~Z% = %emptyset% imply %X~->~W%
.DE
.PP
These FD and MVD derivation rules are sound and complete
.[ [
Beeri Fagin Howard 1977
.]],
but not minimal.
Indeed, part of what we show is that M1 (reflexivity) is not needed so that
without it the derivation rules are sound and complete. The more common
choice, of course, is to
retain M1 and omit M5. For our proofs about scheme trees, however,
it is often required that our MVDs stretch from root to
leaf. We therefore use the alternative choice for trivial MVDs. Since this
choice is not common, we prove in Lemma 2 that this is possible.
In addition, we add a corollary that tailors the lemma for the case of
MVDs only.
.sp
.LP
\fBLemma 2.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %Z~->>~W% be an MVD implied by %M~smallunion~F% on %U%.
There exists an %(M~smallunion~F)%-based derivation sequence for %Z~->>~W%
on %U% that uses only F1 - F3, M2 - M5, and C1 - C2.
.LP
\fBProof.\fR
Since %Z~->>~W% is implied by %M~smallunion~F% on %U% and since the derivation
rules F1 - F3, M1 - M5, C1, and C2 are sound and complete, there exists a
derivation sequence %S% for %Z~->>~W% on %U% using these rules. If %S% does not
include M1, we are done. Otherwise, we replace each usage of M1
.DS
%X~->>~Y%, by M1 (reflexivity) where %Y~subsetorequal~X%
.DE
by the following sequence of derivation rules:
.DS
%X~->~Y%, by F1 (reflexivity) since %Y~subsetorequal~X%
%X~->>~Y%, by C1 (replication)
.DE
Since this sequence of derivation rules derives %X~->>~Y% but does not use
M1, and since each usage of M1 is replaced by this sequence, the derivation
sequence obtained derives %Z~->>~W%. Thus, there exists an
%(M~smallunion~F)%-based derivation sequence for %Z~->>~W% on %U%
that does not use M1 (reflexivity). %square%
.sp
.LP
\fBCorollary.\fR
If %F% = %emptyset%,
there exists an %M%-based derivation sequence for %Z~->>~W%
that uses only the MVD rules M2 - M5.
.LP
\fBProof.\fR
Since M1 - M4 are sound and complete when no FDs are given,
there exists a derivation sequence %S% for %Z~->>~W% that uses only M1 - M4.
If %S% does not include M1, we are done. Otherwise, we replace each usage of M1
by the following sequence of derivation rules:
.DS
%X~->>~R~-~X%, by M5 (trivial complementation)
%X~->>~R~-~(X(R~-~X))%, by M4 (complementation)
%X~->>~emptyset%, since %R~-~(X(R~-~X))~=~emptyset%
%XY~->>~Y%, by M2 (augmentation)
%X~->>~Y%, since %Y~subsetorequal~X% %square%
.DE
.sp
.PP
Lemma 3 guarantees us that if an attribute of a node %N% in a scheme tree is in
the right-hand side, but not the left-hand side, of a nontrivial MVD, and if the
MVD is implied by the MVDs of the scheme tree, then all the attributes in %N%
are included in the MVD.
.sp
.LP
\fBLemma 3.\fR
Let %U% be a set of attributes.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
Let %XY~subsetorequal~Aset(T)%.
Let %X~->>~Y% be an MVD in %MVD(T) sup +% on %Aset(T)% such that
%A~in~Y% and %A~notin~X%.
Let %A% be in node %N% of %T%, then %N~subsetorequal~XY%.
.LP
\fBProof.\fR
Since %X~->>~Y% is an MVD in %MVD(T) sup +% on %Aset(T)% and since %MVD(T)%
consists only of MVDs,
there exists an %MVD(T)%-based derivation sequence %S% for %X~->>~Y%
on %Aset(T)%, that by the Corollary to Lemma 2, uses only the
MVD rules M2 - M5. We show by induction on
the number of MVDs %n% in %S% that for every MVD %X'~->>~Y'% in %S%,
if %A% is an attribute in node %N% of %T% such that
%A~in~Y'% and %A~notin~X'%, then %N~subsetorequal~X'Y'%.
Thus, since %X~->>~Y% is in %S%, %N~subsetorequal~XY%.
.LP
Basis:
.PP
Suppose %n% = 1, since only rules M2 - M5 are used and since M2 - M4 require
antecedents, %X~->>~Y% is either given or introduced by
M5 (trivial complementation).
.IP
If %X~->>~Y% is given, then %X~->>~Y% %in% %MVD(T)%. Since every MVD in
%MVD(T)% has the form %Ancestor(P)% %->>% %Descendent(M)% where %P% is the
parent node of node %M%, and since %Descendent(M)% includes all attributes
of every node in %Descendent(M)%, %Y~includesorequal~N%, the node of
attribute %A~in~Y%. Thus, %N~subsetorequal~XY%.
.IP
If %X~->>~Y% is introduced by M5 (trivial complementation), %XY% = %Aset(T)%.
Hence, every attribute and every node is a subset of %XY%, and thus,
%N~subsetorequal~XY%.
.LP
Induction:
.PP
Since %X~->>~Y% can be introduced by any of the MVD derivation rules
M2 - M5 or as a given MVD in %MVD(T)%, we have five cases to consider.
Since, however, the cases for given MVDs and M5 (trivial complementation),
have no antecedents, they are the same as in the
basis. Therefore, we can reduce the cases to three.
.IP (1)
%X'~->>~Y'% is introduced by M2 (augmentation).
Let %V~->>~W% be the antecedent MVD and let %Z'~subsetorequal~Z% such that
%X'% = %VZ% and %Y'% = %WZ'%. If %A~in~Y'% and %A~notin~X'%, then since
%X'% = %VZ%, %Y'% = %WZ'%, and %Z'~subsetorequal~Z%, %A~in~W% and
%A~notin~V%. By the induction hypothesis,
since %A% is in node %N% of %T%, %A~in~W%, and %A~notin~V%,
%N~subsetorequal~VW%. Since %X'% = %VZ% and
%Y'% = %WZ'%, %VW~subsetorequal~X'Y'%. Thus, %N~subsetorequal~X'Y'%.
.IP (2)
%X'~->>~Y'% is introduced by M3 (transitivity).
Let %V~->>~W% and %W~->>~Z% be the antecedent MVDs.
Thus, %X'% = %V% and %Y'% = %Z~-~W%.
Now, assume there exists an attribute %A% in node %N% of %T% such that
%A~in~Y'% and %A~notin~X'%, but %N~notsubsetorequal~X'Y'%.
Then, there exists an attribute %B% such that %B~in~N% and %B~notin~X'Y'%.
Since %B~notin~X'Y'% and %X'% = %V% and %Y'% = %Z~-~W%, %B~notin~V% and either
%B~in~W% or %B~in~Aset(T)~-~(VWZ)%. Suppose %B~in~W%, then since
%B~in~W% and %B~notin~V%, by the induction hypothesis, %N~subsetorequal~VW%.
Since %A~in~N%, %A~in~VW%. But since %A~notin~X'% and %X'% = %V%,
%A~notin~V%; and since %A~in~Y'% and %Y'% = %Z~-~W%, %A~notin~W%. Thus,
%A~notin~VW%. We therefore suppose that
%B~in~Aset(T)~-~(VWZ)%. But now we have %A~in~Y'%, %Y'% = %Z~-~W%, and
therefore %A~in~Z%, %A~notin~W%, and %A~in~N%. Therefore, by the induction
hypothesis, we have %N~subsetorequal~WZ%. Since %B~in~N%,
%B~in~WZ%. However, %B~in~Aset(T)~-~(VWZ)% and thus,
%B~notin~WZ% %dash% a contradiction.
.IP (3)
%X'~->>~Y'% is introduced by M4 (complementation).
Let %V~->>~W% be the antecedent MVD. Thus, %X'% = %V% and
%Y'% = %Aset(T)~-~(VW)%.
Now, assume there exists an attribute %A% in node %N% of %T% such that
%A~in~Y'% and %A~notin~X'%, but %N~notsubsetorequal~X'Y'%.
Then, there exists an attribute %B% such that %B~in~N% and %B~notin~X'Y'%.
Since %X'% = %V% and %Y'% = %Aset(T)~-~(VW)%,
%B~notin~V(Aset(T)~-~(VW))%. Thus, %B~in~W% and
%B~notin~V%. Since %B~notin~V%, %B~in~W%, and %B~in~N%, by the induction
hypothesis, %N~subsetorequal~VW%. Since %A~in~N% and %N~subsetorequal~VW%,
%A~in~VW%. However, since %A~in~Y'% and %Y'% = %Aset(T)~-~(VW)%,
%A~notin~VW% %dash% a contradiction. %square%
.sp
.PP
Lemma 4 extends Lemma 3 to not only guarantee us that a node is included,
but also that all ancestors and descendents of the node are included. That is,
Lemma 4 guarantees us that if an attribute of a node in a scheme tree is in
the right-hand side of a nontrivial MVD
(but not in the left-hand side), and if the
MVD is implied by the MVDs of the scheme tree, then both the ancestors
and the descendents of the node are included in the MVD.
.sp
.LP
\fBLemma 4.\fR
Let %U% be a set of attributes.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
Let %XY~subsetorequal~Aset(T)%.
Let %X~->>~Y% be a nontrivial MVD in %MVD(T) sup +% on %Aset(T)%.
Let %A% be an attribute such that %A~in~Y% and %A~notin~X%, and let
%A% be in node %N% of %T%.
Then both %Ancestor(N)% %subsetorequal% %XY% and
%Descendent(N)% %subsetorequal% %XY%.
.LP
\fBProof.\fR
Since %X~->>~Y% is an MVD in %MVD(T) sup +% on %Aset(T)% and since %MVD(T)%
consists only of MVDs,
there exists an %MVD(T)%-based derivation sequence %S% for %X~->>~Y%
on %Aset(T)%, that by the Corollary to Lemma 2, uses only the MVD rules M2 - M5.
We show by induction on
the number of MVDs %n% in %S% that for every MVD %X'~->>~Y'% in %S%
if %A% is an attribute such that %A~in~Y'% and %A~notin~X'%, and if
%A% is in node %N% of %T%,
then both %Ancestor(N)% %subsetorequal% %X'Y'% and
%Descendent(N)% %subsetorequal% %X'Y'%.
.LP
Basis:
.PP
Suppose %n% = 1. Since %S% has no MVD introduced by M1 (reflexivity)
and since there is only one MVD %X~->>~Y% in %S%, %X~->>~Y% is given
or is introduced by M5 (trivial complementation).
.IP
If %X~->>~Y% is given, then %X~->>~Y% %in% %MVD(T)%. Since %X~->>~Y~in~MVD(T)%,
%X~->>~Y% has the form %Ancestor(P)% %->>% %Descendent(M)% where %P% is the
parent node of node %M%.
Since %Descendent(M)% includes all attributes of every node in %Descendent(M)%,
and since %A~in~Y% and thus %A~in% %Descendent(M)%, and since %A~in~N%,
%N% is a node in %Descendent(M)%. Hence,
%Descendent(N)% %subsetorequal% %Descendent(M)%, and thus, since %Y% =
%Descendent(M)%, %Descendent(N)% %subsetorequal% %Y%, and thus also
%Descendent(N)% %subsetorequal% %XY%. Furthermore, since
%Descendent(N)% %subsetorequal% %Descendent(M)% and %P% is the parent of %M%,
%Ancestor(N)% %subsetorequal% %Ancestor(P)Descendent(M)%. Hence,
since %X% = %Ancestor(P)% and %Y% = %Descendent(M)%,
%Ancestor(N)% %subsetorequal% %XY%. Therefore, both
%Ancestor(N)% %subsetorequal% %XY% and %Descendent(N)% %subsetorequal% %XY%.
.IP
If %X~->>~Y% is introduced by M5 (trivial complementation), %XY% = %Aset(T)%.
Hence, every node is a subset of %XY%, and thus, both
%Ancestor(N)% %subsetorequal% %XY% and %Descendent(N)% %subsetorequal% %XY%.
.LP
Induction:
.PP
Since the MVD in Step %n% may be given or may be introduced by M2 - M5,
there are five cases to consider.
Since neither given MVDs nor MVDs introduced by M5 (trivial complementation)
have antecedents, however, the basis arguments hold for every step in which
a given MVD and an MVD introduced by M5 appear.
Therefore, we have only three cases to consider.
.IP (1)
%X'~->>~Y'% is introduced by M2 (augmentation).
Let %V~->>~W% be the antecedent MVD. Thus, there exist sets of attributes
%Z% and %Z'% such that %Z'~subsetorequal~Z%, %X'% = %VZ%, and %Y'% = %WZ'%.
Now, let %A% be an attribute in node %N% of %T%
such that %A~in~Y'% and %A~notin~X'%. Since %A~in~Y'%, %Y'% = %WZ'%,
%A~notin~X'%, %X'% = %VZ%, and %Z'~subsetorequal~Z%, %A~in~W% and %A~notin~V%.
Since %A% is an attribute such that %A~in~W% and %A~notin~V%, and since
%A% is in node %N% of %T%, by the induction hypothesis
both %Ancestor(N)% %subsetorequal% %VW% and
%Descendent(N)% %subsetorequal% %VW%.
Since %V~subsetorequal~X'% and %W~subsetorequal~Y'%, %VW~subsetorequal~X'Y'%.
Thus both %Ancestor(N)% %subsetorequal% %X'Y'% and
%Descendent(N)% %subsetorequal% %X'Y'%.
.IP (2)
%X'~->>~Y'% is introduced by M3 (transitivity).
Let %V~->>~W% and %W~->>~Z% be the antecedent MVDs.
Thus, %X'% = %V% and %Y'% = %Z~-~W%.
Let %A% be an attribute in node %N% of %T% such that %A~in~Y'% and %A~notin~X'%.
Since %A~in~N%, %A~in~Y'%, and %A~notin~X'%, by Lemma 3, %N~subsetorequal~X'Y'%.
.IP
We claim that %Ancestor(N)% %subsetorequal~X'Y'%. For suppose not, then
there exists an attribute %B~in% %Ancestor(N)% such that %B~notin~X'Y'%.
Since %B~notin~X'Y'% and %X'% = %V% and %Y'% = %Z~-~W%,
%B~notin~V(Z~-~W)%. Thus, %B~notin~V% and either %B~in~W% or
%B~in~Aset(T)~-~(VWZ)%.
.RS
.IP
We first assume that %B~in~W%. Since %B~notin~V% and %B~in~W%,
by Lemma 3, %B% is in a node %N'% such that
%N'% %subsetorequal% %VW%. Since %B~in% %Ancestor(N)% and %B% %in% %N'%,
%N~subsetorequal% %Descendent(N')%.
Since %N% %subsetorequal% %Descendent(N')% and %A~in~N%, %A~in%
%Descendent(N')%. By the induction hypothesis,
%Descendent(N')% %subsetorequal~VW%. Thus, since %A~in% %Descendent(N')%,
%A~in~VW%. But since %A~in~Y'% and %Y'% = %Z~-~W%, %A~notin~W%, and
since %A~notin~X'% and %X'% = %V%, %A~notin~V%. Thus,
%A~notin~VW% %dash% a contradiction.
.IP
Thus, we assume that %B~in~Aset(T)~-~(VWZ)%.
Since %A~in~Y'% and %Y'% = %Z~-~W%, %A~in~Z% and %A~notin~W%.
Since %A~in~N% and since %A~in~Z% and %A~notin~W%, by the induction hypothesis,
%Ancestor(N)% %subsetorequal~WZ%. Since %B~in% %Ancestor(N)% and %Ancestor(N)%
%subsetorequal~WZ%, %B~in~WZ%. Thus, %B~notin~Aset(T)~-~(VWZ)% %dash% a
contradiction.
.RE
.IP
By an identical argument with %Descendent% replacing %Ancestor%
and vice versa, %Descendent(N)% %subsetorequal~X'Y'%.
.IP (3)
%X'~->>~Y'% is introduced by M4 (complementation).
Let %V~->>~W% be the antecedent MVD.
Thus, %X'% = %V% and %Y'% = %Aset(T)~-~(VW)%. Now,
let %A% be an attribute in node %N% of %T% such that
%A~in~Y'% and thus %A~notin~X'%.
.IP
We claim that %Ancestor(N)% %subsetorequal~X'Y'%. For suppose not, then
there exists an attribute %B~in% %Ancestor(N)% such that %B~notin~X'Y'%.
Since %B~notin~X'Y'% and %X'% = %V% and %Y'% = %Aset(T)~-~(VW)%,
%B~notin~V(Aset(T)~-~(VW))%. Since %B~in% %Ancestor(N)% and thus in %Aset(T)%,
and since %B~notin~V(Aset(T)~-~(VW))%, %B~in~W~-~V%.
Hence %B~in~W% and %B~notin~V%. Let %B% be in node %N'% of %T%.
Since %B% is an attribute in %N'% such that %B~in~W% and %B~notin~V%,
by the induction hypothesis, %Descendent(N')% %subsetorequal~VW%.
Since %B~in% %Ancestor(N)% and %B~in~N'%, %N~subsetorequal% %Descendent(N')%.
Since %A~in~N% and %N~subsetorequal% %Descendent(N')%,
%A~in~Descendent(N')%. Since %A% %in% %Descendent(N')% and
%Descendent(N')% %subsetorequal~VW%, %A~in~VW%. But %A~in~Y'% and
%Y'% = %Aset(T)~-~(VW)%, so that %A~notin~VW% %dash% a contradiction.
.IP
By an identical argument with %Descendent% replacing %Ancestor%
and vice versa, %Descendent(N)% %subsetorequal~X'Y'%. %square%
.sp
.PP
Lemma 5 tells us that the set %D% of dependencies that holds for a scheme tree
%T% is the closure of itself on %Aset(T)%.
.sp
.LP
\fBLemma 5.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
Let %D% be the set of dependencies in %(M~smallunion~F) sup +% that
hold for %T%. %D sup +% = %D% on %Aset(T)%.
.LP
\fBProof.\fR
We prove that none of the inference rules F1 - F3, M1 - M5, C1 and C2 can
add a new dependency that is not already in %D%, and thus we establish our claim
that %D sup +% = %D%.
.IP (1)
F1 cannot add a new FD. Let %X~subsetorequal~Aset(T)% and let
%Y~subsetorequal~X%. Since %XY% %subsetorequal% %Aset(T)%, %XY%
%subsetorequal% %U% and hence %X~->~Y% is in %(M~smallunion~F) sup +%.
Since %XY% %subsetorequal% %Aset(T)%, %X~->~Y% is already in %D%.
.IP (2)
F2 cannot add a new FD. Let %W~subsetorequal~Aset(T)%,
%V~subsetorequal~W%, and %X~->~Y% be an FD in %D%. Thus, %X~->~Y% holds for
%T% and hence %XY~subsetorequal~Aset(T)%.
By definition of %D%, %X~->~Y% is also an FD in %(M~smallunion~F) sup +%.
Hence, %XW~->~YV% is in %(M~smallunion~F) sup +%. Since
%XWYV~subsetorequal~Aset(T)%, %XW~->~YV% is already in %D%.
.IP (3)
F3 cannot add a new FD. Let %X~->~Y% and %Y~->~Z% be FDs in %D%. Thus,
%X~->~Y% and %Y~->~Z% hold for %T% and hence %XYZ~subsetorequal~Aset(T)%.
By definition of %D%, %X~->~Y% and %Y~->~Z% are also FDs in
%(M~smallunion~F) sup +%. Therefore, %X~->~Z% is in
%(M~smallunion~F) sup +%. Since %XZ~subsetorequal~Aset(T)%, %X~->~Z% is
already in %D%.
.IP (4)
M1 cannot add a new MVD. The argument for this case is the same as for F1.
.IP (5)
M2 cannot add a new MVD. Let %Z~subsetorequal~Aset(T)%,
%Z'~subsetorequal~Z%, and %X~->>~Y% be an MVD in %D%.
Thus, %X~->>~Y% holds for %T% and hence %XY~subsetorequal~Aset(T)% and
there exists an MVD %X~->>~Y'% in
%(M~smallunion~F) sup +% such that %Y~=~Y'~smallinter~Aset(T)%.
Hence, %XZ~->>~Y'Z'% is in %(M~smallunion~F) sup +%. Since
%Z'~subsetorequal~Aset(T)%, %Y'Z'~smallinter~Aset(T)% = %YZ'%. Hence,
%XZ~->>~YZ'% is already in %D%.
.IP (6)
M3 cannot add a new MVD. Let %X~->>~Y% and %Y~->>~Z% be MVDs in %D%.
Thus, %X~->>~Y% and %Y~->>~Z% hold for %T% and hence
%XYZ~subsetorequal~Aset(T)% and there exist MVDs %X~->>~Y'% and %Y~->>~Z'% in
%(M~smallunion~F) sup +% such that %Y~=~Y'~smallinter~Aset(T)% and
%Z~=~Z'~smallinter~Aset(T)%. By M2 (augmentation), %Y'~->>~Z'% is in
%(M~smallunion~F) sup +% and thus %X~->>~Z'~-~Y'% is in
%(M~smallunion~F) sup +%. Since %(Z'~-~Y')% %smallinter% %Aset(T)% =
%Z~-~Y%, %X~->>~Z~-~Y% is already in %D%.
.IP (7)
M4 cannot add a new MVD. Let %X~->>~Y% be an MVD in %D%.
Thus, %X~->>~Y% holds for %T% and hence %XY~subsetorequal~Aset(T)% and
there exists an MVD %X~->>~Y'% in
%(M~smallunion~F) sup +% such that %Y~=~Y'~smallinter~Aset(T)%.
Hence, %X~->>~U~-~(XY')% is in %(M~smallunion~F) sup +%.
Since %(U~-~(XY'))% %smallinter% %Aset(T)% = %Aset(T)~-~(XY)%,
%X~->>~Aset(T)~-~XY% is already in %D%.
.IP (8)
M5 cannot add a new MVD. Let %X~subsetorequal~Aset(T)%.
Since %(U~-~X)~smallinter~Aset(T)% = %Aset(T)~-~X%, %X~->>~Aset(T)~-~X% is
already in %D%.
.IP (9)
C1 cannot add a new MVD. Let %X~->~Y% be an FD in %D%. Thus, %X~->~Y%
holds for %T% and hence %X~->~Y% is in
%(M~smallunion~F) sup +% and %XY~subsetorequal~Aset(T)%.
Hence, %X~->>~Y% is also in %(M~smallunion~F) sup +%. Since
%XY~subsetorequal~Aset(T)%, %X~->>~Y% is already in %D%.
.IP (10)
C2 cannot add a new FD. Let %X~->>~Y% be an MVD in %D%, and let %Z~->~W%
be an FD in %D% such that %W~subsetorequal~Y% and %Y~smallinter~Z% =
%emptyset%. Thus, %X~->>~Y% and %Z~->~W% hold for %T%. Since %X~->>~Y%
holds for %T%, %XY~subsetorequal~Aset(T)% and there exists an MVD %X~->>~Y'% in
%(M~smallunion~F) sup +% such that %Y~=~Y'~smallinter~Aset(T)%.
Since %Z~->~W% holds for %T%, %Z~->~W% is in
%(M~smallunion~F) sup +% and %ZW~subsetorequal~Aset(T)%. Since
%W~subsetorequal~Y% and %Y~subsetorequal~Y'%, %W~subsetorequal~Y'%.
Since %Y~=~Y'~smallinter~Aset(T)%, %Z~subsetorequal~Aset(T)%, and
%Y~smallinter~Z% = %emptyset%, %Y'~smallinter~Z% = %emptyset%.
Hence, %X~->~W% is in %(M~smallunion~F) sup +%. Since
%XW~subsetorequal~Aset(T)%, %X~->~W% is already in %D%. %square%
.sp
.PP
Lemma 6 provides an interesting result: the given FDs can be disregarded
if we are only interested in certain implied MVDs. In particular, if
%MVD(T)% and %FD(T)% imply an MVD %X~->>~Y%, then if we close the
left-hand side of the MVD under %MVD(T)% and %FD(T)% on %Aset(T)% to
obtain %X sup +%, %MVD(T)% alone is sufficient to imply %X sup +~->>~Y%.
The converse also holds, and although we do not need the converse for Theorem 1,
we provide it here because we need it later for a lemma required for Theorem 3.
.sp
.LP
\fBLemma 6.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
Let %XY% %subsetorequal% %Aset(T)%. If
%MVD(T)% %smallunion% %FD(T)% implies %X~->>~Y% on %Aset(T)%, then
%MVD(T)% implies %X sub {MVD(T)~smallunion~FD(T)} sup + ~->>~Y% on %Aset(T)%
and conversely.
.LP
\fBProof.\fR
Throughout the proof, all implications (and thus also all closures of
attributes) are taken with respect to %Aset(T)%. Thus, for example,
%X sup + ~->>~Y% means %X sub {MVD(T)~smallunion~FD(T)} sup + ~->>~Y%
on %Aset(T)%. Furthermore, since by Lemma 5, %D sup +% = %D% on %Aset(T)%,
all implications on %Aset(T)% remain on %Aset(T)%. We therefore omit
references to %Aset(T)%, leaving this understood without explicit mention.
.PP
Since %MVD(T)% %smallunion% %FD(T)% implies %X~->>~Y% on %Aset(T)% and since the
derivation rules F1 - F3, M1 - M5, C1, and C2 are sound and complete,
there exists an (%MVD(T)% %smallunion% %FD(T)%)-based derivation sequence %S%
for %X~->>~Y% that, by Lemma 2, uses only F1 - F3, M2 - M5, and C1 - C2, and
thus does not use M1 (reflexivity).
We show by induction on the length %n% of %S%
that if %MVD(T)% %smallunion% %FD(T)% implies %X~->>~Y%,
then %MVD(T)% implies %X sup + ~->>~Y%.
.LP
Basis:
.PP
Suppose %n% = 1, then %X~->>~Y% is given or
introduced by M5 (trivial complementation).
If %X~->>~Y% is given, then %X~->>~Y~in~MVD(T)%; otherwise, if %X~->>~Y% is
introduced by M5 (trivial complementation),
then %X~->>~Y% is trivial and thus %MVD(T)% implies %X~->>~Y%.
Since by M2 (augmentation) we can derive
%X sup + ~->>~Y%, %MVD(T)% implies %X sup + ~->>~Y%.
.LP
Induction:
.PP
Since %X~->>~Y% may be introduced by any of the MVD derivation rules
M2 - M5, by C1 (replication), or as a given MVD in %MVD(T)%,
we have six cases to consider.
Since, however, the cases for given MVDs and M5
(trivial complementation) are the same as in the basis, we can reduce the
cases to four.
.IP (1)
%X~->>~Y% is introduced by C1 (replication). Since %X~->>~Y% is introduced by
C1, %X~->~Y% is the antecedent FD. Since %X~->~Y%,
%Y~subsetorequal~X sup +%. Thus, %X sup + ~->>~Y% is trivial
and hence is implied by %MVD(T)%.
.IP (2)
%X~->>~Y% is introduced by M2 (augmentation).
Let %V~->>~W% be the antecedent MVD. Thus, there exist sets of attributes
%Z% and %Z'% with %Z'~subsetorequal~Z% such that %X~=~VZ% and %Y~=~WZ'%.
Therefore, %V~subsetorequal~X%. Since %V~subsetorequal~X%, %V sup +%
%subsetorequal% %X sup +%. By the induction hypothesis, we have
%MVD(T)% implies %V sup + ~->>~W%. Since %V sup +% %subsetorequal%
%X sup +%, by M2 (augmentation), we can derive %X sup + ~->>~W%. Therefore,
%MVD(T)% implies %X sup + ~->>~W%. Since %X~=~VZ% and %Z'~subsetorequal~Z%,
%Z'~subsetorequal~X%. Since %Z'~subsetorequal~X%, %Z'~subsetorequal~X sup +%.
Hence, %X sup + ~->>~Z'% is trivial and thus %MVD(T)% implies %X sup + ~->>~Z'%.
Since %MVD(T)% implies %X sup + ~->>~W% and %X sup + ~->>~Z'%, by the MVD union
rule, which can be derived from M2 (augmentation), M3 (transitivity), and
M4 (complementation), %MVD(T)% implies %X sup + ~->>~WZ'%.
Since %Y~=~WZ'%, %MVD(T)% implies %X sup + ~->>~Y%.
.IP (3)
%X~->>~Y% is introduced by M3 (transitivity).
Let %V~->>~W% be the first antecedent and let %W~->>~Z% be the second
antecedent. Thus, %X~=~V% and %Y~=~Z~-~W%.
.IP
We first prove that %W sup +~-~W~subsetorequal~V sup +%. Let %A% be an
attribute in %W sup +~-~W%. If %A% %in% %(W sup +~-~W)% %smallinter% %V%,
then %A~in~V% and thus %A~in~V sup +%.
If %A~notin~(W sup +~-~W)~smallinter~V%, then
since %A~in~W sup +~-~W%, %A~notin~V% and thus
%A% %in% %(W sup +~-~W)% %-% %V%. Now, since %V~->>~W% is in the
derivation sequence %S%, %MVD(T)% %smallunion% %FD(T)% implies %V~->>~W%.
Thus, %MVD(T)% %smallunion% %FD(T)%
implies %V~->>~Aset(T)~-~(VW)% by M4 (complementation).
Since %W sup +% %subsetorequal%
%Aset(T)% and %A% %in% %(W sup +~-~W)% %-% %V%, %A% %in% %Aset(T)~-~VW%.
Since %A% %in% %W sup +~-~W%, %A~in~W sup +% and thus %W~->~A%. Thus,
by C2 (coalescence) based on %V~->>~Aset(T)~-~(VW)%, %W~->~A%,
%A% %in% %Aset(T)~-~(VW)%, and %(Aset(T)~-~(VW))% %smallinter% %W% =
%emptyset%, %MVD(T)% %smallunion% %FD(T)% implies %V~->~A% and thus
%A~in~V sup +%. Hence, %W sup +~-~W~subsetorequal~V sup +%.
.IP
By the induction hypothesis
we have %MVD(T)% implies %V sup +~->>~W% and %W sup +~->>~Z%. Since
%W sup +~-~W~subsetorequal~V sup +% and since %MVD(T)% implies %V sup +~->>~W%
and %W~subsetorequal~W sup +%,
by M1 (reflexivity) and the MVD union rule, %MVD(T)% implies
%V sup +~->>~W sup +%. Since %MVD(T)% implies %W sup +~->>~Z%,
by M3 (transitivity), %MVD(T)% implies %V sup +~->>~Z~-~W sup +%.
We now augment both sides of %V sup +~->>~Z~-~W sup +% by
%Z~smallinter~(W sup +~-~W)% and obtain
%V sup + (Z~smallinter~(W sup +~-~W))% %->>%
%(Z~-~W sup + )%%(Z~smallinter~(W sup +~-~W))%.
Since %W sup +~-~W~subsetorequal~V sup +%,
%V sup + (Z~smallinter~(W sup +~-~W))% = %V sup +%.
Since %W~subsetorequal~W sup +%,
%(Z~-~W sup + )%%(Z~smallinter~(W sup +~-~W))% = %Z~-~W%.
Hence, we have %MVD(T)% implies %V sup +~->>~Z~-~W%.
Since %X~=~V% and %Y~=~Z~-~W%, %MVD(T)% implies %X sup + ~->>~Y%.
.IP (4)
%X~->>~Y% is introduced by M4 (complementation).
Let %V~->>~W% be the antecedent MVD. Thus, %X~=~V% and
%Y% = %Aset(T)~-~(VW)%.
By the induction hypothesis, %MVD(T)% implies %V sup + ~->>~W%.
Hence, by M4 (complementation), %MVD(T)% implies
%V sup +% %->>% %Aset(T)~-~(V sup + W)%.
Since %V sup + ~-~(VW)% %subsetorequal% %V sup +%,
%V sup +% %->>% %V sup + ~-~(VW)%
is trivial and thus %MVD(T)% implies %V sup +% %->>% %V sup + ~-~(VW)%.
Since %MVD(T)% implies %V sup +% %->>% %Aset(T)~-~(V sup + W)% and also
%V sup +% %->>% %V sup + ~-~(VW)%, by the MVD union rule, %MVD(T)% implies
%V sup +% %->>% %(Aset(T)~-~(V sup + W))%%(V sup + ~-~(VW))%. Since %MVD(T)%
implies %V sup +% %->>% %(Aset(T)% %-% %(V sup + W))%%(V sup +% %-% %(VW))% and
%(Aset(T)~-~(V sup + W))%%(V sup + ~-~(VW))% = %Aset(T)~-~(VW)%,
%MVD(T)% implies %V sup +% %->>% %Aset(T)~-~(VW)%. Since %X~=~V% and
%Y~=~Aset(T)~-~(VW)%, %MVD(T)% implies %X sup + ~->>~Y%.
.PP
We now prove the converse.
Since %MVD(T)% implies %X sup + ~->>~Y%,
%MVD(T)% %smallunion% %FD(T)% implies %X sup + ~->>~Y%. We also have
%MVD(T)% %smallunion% %FD(T)% implies %X~->~X sup +%.
Since, by C1 (replication), %X~->~X sup +% implies %X~->>~X sup +%,
%MVD(T)% %smallunion% %FD(T)% implies %X~->>~X sup +%.
By M3 (transitivity), %X~->>~X sup +% and %X sup + ~->>~Y% imply
%X~->>~Y~-~X sup +%. Hence, %MVD(T)% %smallunion% %FD(T)% implies
%X~->>~Y~-~X sup +%. Since
%(Y~smallinter~X sup + )% %subsetorequal% %X sup +% and since
%MVD(T)% %smallunion% %FD(T)% implies %X~->~X sup +%,
%MVD(T)% %smallunion% %FD(T)% implies %X~->~Y~smallinter~X sup +%.
Hence, by C1 (replication), %MVD(T)% %smallunion% %FD(T)% implies
%X~->>~Y~smallinter~X sup +%. Since %MVD(T)% %smallunion% %FD(T)%
implies %X~->>~Y~-~X sup +% and %X~->>~Y~smallinter~X sup +%, by the
MVD union rule, %MVD(T)% %smallunion% %FD(T)%
implies %X~->>% %(Y~-~X sup + )%%(Y~smallinter~X sup + )%.
Since %Y% = %(Y~-~X sup + )%%(Y~smallinter~X sup + )%,
%MVD(T)% %smallunion% %FD(T)% implies %X~->>~Y%. %square%
.sp
.PP
Lemma 7 begins to directly address the redundancy issue in nested relations.
We use it twice in Theorem 1, and thus we write it separately as a Lemma.
Before stating and proving Lemma 7, we need a definition for a path in a
scheme tree.
.sp
.LP
\fBDefinition 11.\fR
A %path% of a scheme tree %T% is a sequence of nodes
%N sub 1%, ..., %N sub n% where %N sub 1% is the root of %T% and %N sub n%
is a leaf node of %T% and %N sub i% is the parent of %N sub i+1%,
%1~<=~i~<=~n-1%.
.sp
.LP
\fBLemma 7.\fR
Let %T% be a scheme tree. Let %r% be a nested relation on %T%.
Let %A% be an attribute in node %N sub A% of %T%.
If %t sub 1% and %t sub 2% are distinct tuples in the total unnesting of %r%
such that %t sub 1 (Ancestor(N sub A ))% = %t sub 2 (Ancestor(N sub A ))%,
then %t sub 1 (A)% and %t sub 2 (A)% are in the same nested tuple
of a single nested relation under the nested relation scheme whose set of
atomic attributes is %N sub A%.
.LP
\fBProof.\fR
Let %N sub 1%, ..., %N sub n% be a path in %T% such that %N sub k% =
%N sub A%, %1~<=~k~<=~n%. Since %N sub k% = %N sub A% and %N sub k% is in path
%N sub 1%, ..., %N sub n% of %T%, %Ancestor(N sub A )% = %N sub 1%
%smallunion% ... %smallunion% %N sub k%.
We show by induction on the number of nodes %k% in
%Ancestor(N sub A )% that %t sub 1 (A)% and %t sub 2 (A)%
are in the same nested tuple of a single nested relation
under the nested relation scheme whose set of atomic attributes
is %N sub A%.
.LP
Basis:
.PP
Suppose %k% = 1. Hence, %T%'s root node is %N sub 1% = %N sub A% =
%Ancestor(N sub A )%. Since every tuple in the total unnesting of %r%
is nested within some nested tuple of %r%,
let %t sub 1% be nested within %u sub 1% of %r% and
let %t sub 2% be nested within %u sub 2% of %r%. Since
%t sub 1% is nested within %u sub 1% and %N sub A% is the root of %T%,
%u sub 1 (N sub A )% = %t sub 1 (N sub A )% and since
%t sub 2% is nested within %u sub 2% and %N sub A% is the root of %T%,
%u sub 2 (N sub A )% = %t sub 2 (N sub A )%.
Since %t sub 1 (N sub A )% = %t sub 2 (N sub A )%,
%u sub 1 (N sub A )% = %u sub 2 (N sub A )%.
Since %u sub 1~in~r% and %u sub 2~in~r% and
%u sub 1 (N sub A )% = %u sub 2 (N sub A )%,
by Part (2b) of Definition 2, %u sub 1% = %u sub 2%,
and thus are the same nested tuple %t% of %r%.
Since %t sub 1% and %t sub 2% are nested within the same nested tuple of %r%
and since for any nested relation, there is a single nested relation
under the nested relation scheme whose set of atomic attributes
is the root (which is %N sub A% here), %t sub 1 (A)% and %t sub 2 (A)%
are in the same nested tuple of a single nested relation
under the nested relation scheme whose set of atomic attributes
is %N sub A%.
.LP
Induction:
.PP
Suppose %k% = %q%, %q~<=~n%. Because of the one-to-one correspondence
between scheme trees and nested relation schemes, %N sub q% is the set
of atomic attributes of some nested relation scheme %R sub q% of %T%.
We claim that for any attribute %B~in~N sub q%, %t sub 1 (B)% and
%t sub 2 (B)% are in the same nested tuple of a
single nested relation %r sub q% under %R sub q%, and thus,
if %A~in~N sub q%, %t sub 1 (A)% and %t sub 2 (A)%
are in the same nested tuple of a single nested relation
under %R sub q%. We proceed by contradiction and thus
suppose that it is not the case that for any attribute %B~in~N sub q%,
%t sub 1 (B)% and %t sub 2 (B)% are in the same nested tuple of a
single nested relation %r sub q% under %R sub q%. Hence, there exists some
attribute %A'~in~N sub q% such that
either %t sub 1 (A')% or %t sub 2 (A')% or both are not in
%r sub q%, or %t sub 1 (A')% and %t sub 2 (A')% are
in distinct nested tuples of %r sub q%.
.IP
Suppose first that either %t sub 1 (A')% or %t sub 2 (A')% or both are not
in %r sub q%. Since %r sub q% was chosen as
the nested relation that has a nested tuple in which %t sub 1 (A')%
and %t sub 2 (A')% were supposed to have been nested and since %t sub 1 (A')%
and %t sub 2 (A')% are both nested in some nested relation on %R sub q%,
we may assume, without loss of generality, that at least one of these nested
tuples, say %t sub 1 (A')%, is in %r sub q%.
If %N sub q% is the root, then, by the basis case, %t sub 1 (A')% and
%t sub 2 (A')% are both in %r sub q%, which contradicts the
assumption that they are not. Thus, %N sub q% is not the root, and hence
%N sub q%'s parent %N sub q-1% exists. Let %R sub q-1% be the nested relation
scheme of %T% whose set of atomic attributes is %N sub q-1%.
Since %N sub q-1% exists,
we may use the induction hypothesis to guarantee us that for every attribute
%A''% in %Ancestor(N sub q-1 )%, %t sub 1 (A'')% and %t sub 2 (A'')%
are in the same nested tuple of a single nested relation
under %R sub q-1%. In particular, %t sub 1 (N sub q-1 )% and
%t sub 2 (N sub q-1 )% are in the same nested tuple under %R sub q-1%.
Since the %Descendent(N sub q )% components of
any two tuples in the total unnesting of %r%
whose %N sub q-1% components are in the same nested tuple under %R sub q-1%
are in the same nested relation under %R sub q%
and since %A'~in~Descendent(N sub q )% and %t sub 1 (A')% is in
%r sub q%, %t sub 2 (A')% is also in %r sub q%, which contradicts
the assumption that one or the other or both are not in %r sub q%.
.IP
Thus, we must suppose that %t sub 1 (A')% and %t sub 2 (A')% are
in distinct nested tuples %t sub q1% and %t sub q2%, respectively, of
%r sub q% under %R sub q%.
Since %t sub q1% and %t sub q2% are distinct, %t sub q1% %/=% %t sub q2%.
Since %t sub 1 (A')% is in %t sub q1%, %A'~in~N sub q%, and
%N sub q% is the set of atomic attributes of %R sub q%, the nested relation
scheme of %r sub q%, %t sub q1 (N sub q )% = %t sub 1 (N sub q )%.
Similarly, %t sub q2 (N sub q )% = %t sub 2 (N sub q )%.
Since %A'~in~N sub q% and %N sub q% is a node in %Ancestor(N sub A )%, and
since %t sub 1 (Ancestor(N sub A ))% = %t sub 2 (Ancestor(N sub A ))%,
%t sub 1 (N sub q )% = %t sub 2 (N sub q )%. Since
%t sub 1 (N sub q )% = %t sub 2 (N sub q )%,
%t sub q1 (N sub q )% = %t sub 1 (N sub q )%, and
%t sub q2 (N sub q )% = %t sub 2 (N sub q )%, %t sub q1 (N sub q )% =
%t sub q2 (N sub q )%. Since %t sub q1~in~r sub q% and %t sub q2~in~r sub q%,
%N sub q% is the set of atomic attributes of %R sub q% on which %r sub q%
is defined, and %t sub q1 (N sub q )% = %t sub q2 (N sub q )%, by Part (2b)
of Definition 2, %t sub q1% = %t sub q2%, which contradicts %t sub q1% %/=%
%t sub q2%. %square%
.sp
.LP
\fBTheorem 1.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
%T% has no potential redundancy with respect to %M~smallunion~F% if
the following conditions are satisfied:
.IP (1)
If %D% is the set of MVDs and FDs that hold for %T%,
then %D% is equivalent to %MVD(T)% %smallunion% %FD(T)% on %Aset(T)%.
.IP (2)
For each nontrivial FD %X~->~A% that holds for %T%,
%X~->~Ancestor(N sub A )%, where %N sub A% is the node in %T% that contains %A%.
.LP
\fBProof.\fR
Assume not, then %T% has potential redundancy with respect to
%M~smallunion~F%. Thus, by Definition 9, there exists a redundant atomic
value %v% in a valid nested relation %r% on %T% caused by a dependency
%X~md~Y%, which is either an FD
or an MVD, that holds for %T% with respect to %M% and %F%.
In either case, by Definition 8, we have the following:
%XY~subsetorequal~Aset(T)%, there exists
an attribute %A% such that %A~in~Y% and %A~notin~X%, there exists a subtree
%S% of %T% that contains %A% as an atomic attribute, there exist
distinct nested tuples %u sub 1% and %u sub 2% that are respectively in two
(not necessarily distinct) nested relations on %S%, and there exists
an atomic value %v'%,
such that %u sub 1 (A)% = %v%, %u sub 2 (A)% = %v'%, and %v% = %v'%.
Furthermore, there exist distinct tuples %t sub 1% and
%t sub 2% in the total unnesting of %r% such that
%t sub 1 (Aset(S))% and %t sub 2 (Aset(S))% are tuples in the total
unnesting of %u sub 1% and %u sub 2%, respectively.
.PP
First assume that the redundancy is caused by an FD. Without loss of
generality, we may assume that %Y% = %A% and thus that the FD is %X~->~A%.
Since %v% is a redundant atomic value in %r% caused by %X~->~A%,
by the FD part of Definition 8, %t sub 1 (X)% = %t sub 2 (X)%.
Since %A% is an atomic attribute in %S%, %A% is in the root node of the
subtree %S% of %T%. Let the node that contains %A% be denoted by %N sub A%.
Now, either %t sub 1 (Ancestor(N sub A ))% %=% %t sub 2 (Ancestor(N sub A ))% or
%t sub 1 (Ancestor(N sub A ))% %/=% %t sub 2 (Ancestor(N sub A ))%.
.IP
Case 1.
Suppose %t sub 1 (Ancestor(N sub A ))% %/=% %t sub 2 (Ancestor(N sub A ))%.
Since %A~notin~X%, %X~->~A% is nontrivial. Since %t sub 1 (X)% = %t sub 2 (X)%
but %t sub 1 (Ancestor(N sub A ))% %/=% %t sub 2 (Ancestor(N sub A ))%,
%X~notFD~Ancestor(N sub A )%. Thus, there exists a nontrivial FD %X~->~A% that
holds for %T%, but %X~notFD~Ancestor(N sub A )%, where %N sub A% is the node in
%T% that contains %A%, but this contradicts Condition 2.
.IP
Case 2.
Suppose %t sub 1 (Ancestor(N sub A ))% %=% %t sub 2 (Ancestor(N sub A ))%.
Since %r% is a nested relation on %T% and
%A% is an attribute in node %N sub A% of %T% and since
%t sub 1% and %t sub 2% are distinct tuples in the total unnesting of
%r% such that %t sub 1 (Ancestor(N sub A ))% = %t sub 2 (Ancestor(N sub A ))%,
by Lemma 7, %t sub 1 (A)% and %t sub 2 (A)% are in the same nested tuple
of a single nested relation under the nested relation scheme whose set of
atomic attributes is %N sub A%. Since %S% is the nested relation scheme whose
set of atomic attributes is %N sub A%, %t sub 1 (A)% and %t sub 2 (A)% are
in the same nested tuple under %S%.
Since %A~in~N sub A% and %N sub A% is the root node of subtree %S%,
%A~in~Aset(S)%. Since %A~in~Aset(S)% and since
%t sub 1 (Aset(S))% and %t sub 2 (Aset(S))% are tuples in the total
unnesting of %u sub 1% and %u sub 2%, respectively, %t sub 1 (A)% and
%t sub 2 (A)% are respectively in %u sub 1% and %u sub 2%.
Since %t sub 1 (A)% is in %u sub 1% and %t sub 2 (A)% is in %u sub 2%
and %u sub 1% and %u sub 2% are distinct nested tuples under %S%,
%t sub 1 (A)% and %t sub 2 (A)% are in distinct nested tuples under %S%, a
contradiction to our earlier established fact that %t sub 1 (A)% and
%t sub 2 (A)% are in the same nested tuple under %S%.
.PP
Since the redundancy cannot be caused by an FD, we thus assume that the
dependency %X~md~Y% that causes the redundancy is the MVD %X~->>~Y%.
Since %v% is a redundant atomic value in %r% caused by %X~->>~Y%,
by the MVD part of Definition 8,
%t sub 1 (X)% = %t sub 2 (X)%, %t sub 1 (Y)% = %t sub 2 (Y)%,
and %t sub 1 (Z)% %/=% %t sub 2 (Z)% where %Z% = %Aset(T)~-~(XY)%.
Since %Z% = %Aset(T)~-~(XY)% and %t sub 1 (Z)% %/=% %t sub 2 (Z)%,
%Z% %/=% %emptyset% and thus
%XY% %/=% %Aset(T)%. Since %A~in~Y~-~X%, %Y~notsubsetorequal~X%. Since
%XY% %/=% %Aset(T)% and %Y~notsubsetorequal~X%, %X~->>~Y% is a nontrivial MVD
on %Aset(T)%. Since %A~in~Aset(T)%, %A% is in a node of %T%. Let
%N sub A% be the node of %T% that contains %A%.
.PP
Since %X~->>~Y% holds for %T% given %M~smallunion~F%,
%X~->>~Y~in~D% and thus %D% implies %X~->>~Y%. Since %D% implies %X~->>~Y%
and since by Condition 1, %D% is equivalent to %MVD(T)% %smallunion% %FD(T)%
on %Aset(T)%, %MVD(T)% %smallunion% %FD(T)% implies %X~->>~Y% on %Aset(T)%.
Since %MVD(T)% %smallunion% %FD(T)% implies %X~->>~Y% on %Aset(T)%, by Lemma 6,
%MVD(T)% implies %X sub {MVD(T)~smallunion~FD(T)} sup + ~->>~Y% on %Aset(T)%.
Let %X sup +% = %X sub {MVD(T)~smallunion~FD(T)} sup +%, then
%MVD(T)% implies %X sup +~->>~Y% and thus
%X sup +~->>~Y% %in% %MVD(T) sup +% on
%Aset(T)%. Since by Lemma 5 %D% = %D sup +% on %Aset(T)% and since
by Condition 1 %MVD(T)~smallunion~FD(T)% is equivalent to %D% on %Aset(T)%,
each MVD and FD in %(MVD(T)~smallunion~FD(T)) sup +% on %Aset(T)% is in %D%.
Since %X~->~X sup +% is implied by %MVD(T)~smallunion~FD(T)% on %Aset(T)%
and since each MVD and FD in %(MVD(T)~smallunion~FD(T)) sup +% on %Aset(T)%
is in %D%, %X~->~X sup +% is in %D%.
Since every attribute of every MVD and FD in %D% is in %Aset(T)%
and since %X~->~X sup +% is in %D%, %XX sup +~subsetorequal~Aset(T)%. Since
%X~->~X sup +% and %XX sup +~subsetorequal~Aset(T)% and since
%t sub 1 (X)% = %t sub 2 (X)%, %t sub 1 (X sup + )% = %t sub 2 (X sup + )%.
Also, %A~notin~X sup +%; otherwise, %X~->~A% holds for %T%, and by an argument
similar to the FD case above, we can arrive at a contradiction.
Furthermore, %Z~notsubsetorequal~X sup +%; otherwise, %X~->~Z% holds for %T%,
which, since %t sub 1 (X)% = %t sub 2 (X)%, makes
%t sub 1 (Z)% = %t sub 2 (Z)%, which contradicts
%t sub 1 (Z)% %/=% %t sub 2 (Z)%. Now, let %Z'% = %Aset(T)~-~(X sup + Y)%.
Then, %Z'% %/=% %emptyset% because %Z% = %Aset(T)~-~(XY)%,
%Z~notsubsetorequal~X sup +%, %Z% %/=% %emptyset%, and
%Z'% = %Aset(T)~-~(X sup + Y)%. Since %A~in~Y% and
%A~notin~X sup +%, %Y~notsubsetorequal~X sup +%; and since %Z'%
(= %Aset(T)~-~(X sup + Y)%)
%/=% %emptyset%, %X sup +~->>~Y% is a nontrivial MVD on %Aset(T)%.
Since we now have %X sup + Y~subsetorequal~Aset(T)%,
%X sup +~->>~Y% is a nontrivial MVD in %MVD(T) sup +% on %Aset(T)%,
%A~in~Y%, %A~notin~X sup +%, and %A% is in node %N sub A% of %T%, then by
Lemma 4, %Ancestor(N sub A )% %subsetorequal% %X sup + Y%.
.PP
Since %t sub 1 (X sup + )% = %t sub 2 (X sup + )%
and %t sub 1 (Y)% = %t sub 2 (Y)%,
%t sub 1 (X sup + Y)% = %t sub 2 (X sup + Y)%.
Since %t sub 1 (X sup + Y)% = %t sub 2 (X sup + Y)%,
and %Ancestor(N sub A )% %subsetorequal% %X sup + Y%,
%t sub 1 (Ancestor(N sub A ))% = %t sub 2 (Ancestor(N sub A ))%.
Since %t sub 1 (Ancestor(N sub A ))% = %t sub 2 (Ancestor(N sub A ))%, we
may use the argument in Case 2 for FDs above to obtain a contradiction.
%square%
.NH
Potential Redundancy and NNF
.PP
If we could prove the converse of Theorem 1, we would have a
precise characterization of redundancy in nested relations in terms of our
NNF definition. Unfortunately, the converse is not true. With a small
adjustment, however, we can make it true. The
problem is that we might have a scheme tree that is not consistent with
the given FDs and MVDs. We define consistency as follows.
.sp
.LP
\fBDefinition 12.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
Let %D% be the set of MVDs and FDs that hold for %T% with respect to %M% and
%F%. A scheme tree %T% is \fIconsistent\fR with %M% and %F% if %D% implies
%MVD(T)% on %Aset(T)%. %square%
.sp
.LP
\fBExample 14.\fR
As a counterexample to show that the converse of Theorem 1 does not hold and
to motivate our desire for consistency, consider the following. Let %U% =
%ABC%. Let %M% be the empty set of MVDs and %F% be the empty set of FDs.
Let %T% be the scheme tree of the nested relation scheme %A% (%B%)* (%C%)*.
Now, %MVD(T)% = {%A~->>~B%, %A~->>~C%}, and we can let %FD(T)% = %emptyset%.
Since both %M% and %F% are empty, the set of FDs and MVDs %D% that hold for %T%
includes only trivial dependencies.
Thus, %D% is not equivalent to %MVD(T)% %smallunion% %FD(T)% and hence
%T% is not in NNF. However, since the only constraints that apply are trivial,
there is no potential redundancy. Thus, we have a scheme tree that has no
potential redundancy, but is not in NNF. %T%, however, is not consistent since
the trivial dependencies are insufficient to imply %MVD(T)%,
which contains nontrivial dependencies. %square%
.sp
.PP
Intuitively, we should not want any scheme tree that implies nontrivial MVDs
unless the implied nontrivial MVDs are given or implied by given constraints.
That is, we should only be interested in consistent scheme trees.
Therefore, the consistency requirement does not turn
out to be a problem, and we can have what we want. If we
assume that scheme trees are consistent with a given set of MVDs and FDs,
we can obtain a precise characterization of redundancy in nested
relations. For then, as we show in this section, a consistent
scheme tree %T% has no potential redundancy
if and only if %T% satisfies the first two conditions of our NNF definition.
.PP
We now proceed to prove this result. We first prove in Theorem 2 that the
first two conditions of our NNF definition not only imply that a scheme tree
has no potential redundancy, but also that it is consistent.
This result follows almost immediately from Theorem 1.
.sp
.LP
\fBTheorem 2.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
%T% is consistent with %M% and %F% and
has no potential redundancy with respect to %M~smallunion~F%
if the following conditions are satisfied:
.IP (1)
If %D% is the set of MVDs and FDs that hold for %T%,
then %D% is equivalent to %MVD(T)% %smallunion% %FD(T)% on %Aset(T)%.
.IP (2)
For each nontrivial FD %X~->~A% that holds for %T%,
%X~->~Ancestor(N sub A )%, where %N sub A% is the node in %T% that contains %A%.
.LP
\fBProof.\fR
By Theorem 1, %T% has no potential redundancy with respect to %M~smallunion~F%.
By Condition 1, %D% is equivalent to %MVD(T)% %smallunion% %FD(T)% on %Aset(T)%.
Thus %D% implies %MVD(T)% on %Aset(T)% and hence %T% is consistent with %M%
and %F%. %square%
.sp
.PP
We now proceed to prove the converse of Theorem 2. Since
Theorem 2 guarantees that Condition 1 and Condition 2 of our NNF definition
imply that a consistent scheme tree has no potential redundancy with respect to
a given set of FDs and MVDs, the converse will guarantee us that if a consistent
scheme tree has no potential redundancy with respect to a given set of FDs and
MVDs, then it satisfies Condition 1 and Condition 2 of our NNF definition.
.PP
First, however, we need a result about paths in scheme trees, which
we prove in Lemma 8, and a result about potential redundancy, which we prove
in Lemma 9. We then prove that if we only allow consistent scheme trees and
some scheme tree does not satisfy Condition 1 (Lemma 10) or does not satisfy
Condition 2 (Lemma 11), the scheme tree has potential redundancy.
.sp
.LP
\fBLemma 8.\fR
Let %U% be a set of attributes.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U%.
Let %X~->>~Y% be an MVD on %Aset(T)%. If %MVD(T)% does not imply %X~->>~Y%
on %Aset(T)%,
then there is a path %p% of %T% whose set of attributes is %P% such that
%(Y~-~X)% %smallinter% %P% and %(Aset(T)~-~(XY))% %smallinter% %P% are both
nonempty proper subsets of %P%.
.LP
\fBProof.\fR
As we have done before, we omit references to %Aset(T)%, since all implications
are taken with respect to and remain on %Aset(T)%.
.PP
%X~->>~Y% must be nontrivial; otherwise, %MVD(T)% implies
%X~->>~Y%. Assume that the paths of %T% are %p sub 1%, ...,
%p sub n%, %n~>=~1%. Let the set of attributes of %p sub i% be %P sub i%,
%1~<=~i~<=~n%. Let %Z% = %Aset(T)~-~(XY)%. We proceed by contradiction, and
thus assume that for all %i%, %1~<=~i~<=~n%, either
%(Y~-~X)% %smallinter% %P sub i% is not a nonempty proper subset of %P sub i% or
%Z% %smallinter% %P sub i% is not a nonempty proper subset of
%P sub i%. Considering each disjunct separately, we have:
.IP
%(Y~-~X)% %smallinter% %P sub i% is not a nonempty proper subset of
%P sub i% if and only if %(Y~-~X)% %smallinter% %P sub i% is empty or is
%P sub i%, and
%(Y~-~X)% %smallinter% %P sub i% is empty or is %P sub i% if and only if
%P sub i% %subsetorequal% %XZ% or %P sub i% %subsetorequal% %Y~-~X%.
.IP
%Z% %smallinter% %P sub i% is not a nonempty proper subset of
%P sub i% if and only if %Z% %smallinter% %P sub i% is empty or is
%P sub i%, and
%Z% %smallinter% %P sub i% is empty or is %P sub i% if and only if
%P sub i% %subsetorequal% %XY% or %P sub i% %subsetorequal% %Z%.
.LP
Thus, for all %i%, %1~<=~i~<=~n%,
%P sub i% %subsetorequal% %XZ% or %P sub i% %subsetorequal% %Y~-~X% or
%P sub i% %subsetorequal% %XY% or %P sub i% %subsetorequal% %Z%, which,
since %Y~-~X% %subsetorequal% %XY% and %Z% %subsetorequal% %XZ%,
implies for all %i%, %1~<=~i~<=~n%,
either %P sub i% %subsetorequal% %XY% or %P sub i% %subsetorequal% %XZ%.
.PP
Since %P sub i% %subsetorequal% %XY% or %P sub i% %subsetorequal% %XZ%,
for all %i%, %1~<=~i~<=~n%,
if there is no %P sub i% %subsetorequal% %XZ%, %1~<=~i~<=~n%,
then %P sub i% %subsetorequal% %XY%, %1~<=~i~<=~n%.
But if %P sub i% %subsetorequal% %XY%, %1~<=~i~<=~n%,
then since %P sub 1% %smallunion% ... %smallunion% %P sub n% = %Aset(T)%,
%XY% = %Aset(T)%, and thus, %X~->>~Y% is trivial.
Similarly, since %P sub i% %subsetorequal% %XY% or
%P sub i% %subsetorequal% %XZ%, for all %i%, %1~<=~i~<=~n%,
if there is no %P sub i% %subsetorequal% %XY%,
%1~<=~i~<=~n%, then %X~->>~Z% is trivial.
Thus (after reindexing, if necessary), there is an index %q%, %1~<=~q~<~n%,
such that %P sub i% %subsetorequal% %XY%, for each %i%, %1~<=~i~<=~q%, and
%P sub j% %subsetorequal% %XZ%, for each %j%, %q+1~<=~j~<=~n%.
Let %V% = %P sub 1% %smallunion% ... %smallunion% %P sub q% and
%W% = %P sub q+1% %smallunion% ... %smallunion% %P sub n%.
.PP
We next show that %MVD(T)% implies %(V~smallinter~W)~->>~V%.
For any scheme tree, there is a one-to-one correspondence between
leaf nodes and paths. Thus, if we let %L sub i% be the leaf node of path
%p sub i%, %1~<=~i~<=~n% (under the possible reindexing),
%L sub i~subsetorequal~V%
for %1~<=~i~<=~q% and %L sub i~notsubsetorequal~V~smallinter~W% for
%1~<=~i~<=~q%. Since every path in any scheme tree includes the root, the
root of %T% is in %V~smallinter~W%. Thus, for every path %p sub i%,
%1~<=~i~<=~q%, we know that the leaf %L sub i%
of %p sub i% is not in %V~smallinter~W% and that
the root of %p sub i% is in %V~smallinter~W%. Hence, for every path %p sub i%,
%1~<=~i~<=~q%, there exists a lowest level node %N sub i%, %1~<=~i~<=~q%,
that is not the leaf and is in %V~smallinter~W%.
Since none of the nodes %N sub i%, %1~<=~i~<=~q%, is
a leaf, each has one or more children. Let %N' sub i%, %1~<=~i~<=~q%, be the
child of %N sub i% on path %p sub i%.
Now, by the definition of %MVD(T)%,
%Ancestor(N sub i )% %->>% %Descendent(N' sub i )% %in% %MVD(T)%,
%1~<=~i~<=~q%.
Since %N sub i~subsetorequal~V~smallinter~W%,
by the definition of %V% and %W%,
%Ancestor(N sub i )% %subsetorequal% %V~smallinter~W%, %1~<=~i~<=~q%.
Since %Ancestor(N sub i )% %subsetorequal% %V~smallinter~W%, %1~<=~i~<=~q%, by
M2 (augmentation), %MVD(T)% implies
%V~smallinter~W% %->>% %Descendent(N' sub i )%, %1~<=~i~<=~q%.
By M1 (reflexivity), %MVD(T)% implies
%V~smallinter~W% %->>% %V~smallinter~W%. By our construction
process, %(V~smallinter~W)% %smallunion% %Descendent(N' sub 1 )%
%smallunion% ... %smallunion% %Descendent(N' sub q )% = %V%. Thus, by applying
the MVD union rule to the MVDs %V~smallinter~W% %->>% %Descendent(N' sub i )%,
%1~<=~i~<=~q%, and %V~smallinter~W% %->>% %V~smallinter~W% and substituting
%V% for the right-hand side of the result,
%MVD(T)% implies %V~smallinter~W~->>~V%.
.PP
To finish the proof, we prove that on %Aset(T)%, %X~->>~Y% follows from
%V~smallinter~W~->>~V% and thus we will have a contradiction. We first prove
%V~smallinter~W% %subsetorequal% %X% and %V~-~X% = %Y~-~X%.
.IP
Since %V~subsetorequal~XY% and %W~subsetorequal~XZ%, %(V~smallinter~W)%
%subsetorequal% %(XY~smallinter~XZ)%. Since
%Z% = %Aset(T)~-~(XY)%, %Z~smallinter~XY% = %emptyset%. Therefore,
%XY~smallinter~XZ% = %X%. Hence, %(V~smallinter~W)% %subsetorequal% %X%.
.IP
Since %V~subsetorequal~XY%, %V~-~X% %subsetorequal% %XY~-~X%. However,
%XY~-~X% = %Y~-~X% and thus %V~-~X% %subsetorequal% %Y~-~X%.
Assume %Y~-~X% %notsubsetorequal% %V~-~X%, then there is
an attribute %A~in~Y~-~X% such that %A~notin~V~-~X%. Since %A~in~Y~-~X%,
%A~notin~X%. Since %A~notin~V~-~X% and %A~notin~X%, %A~notin~V%.
Let the path containing %A% be %p% and let %P% be the set of attributes
of %p%. Hence, %A~in~P%. %P~notsubsetorequal~V%; otherwise
%A~in~V% which contradicts %A~notin~V%. In addition, %P~notsubsetorequal~XY%;
otherwise %P~subsetorequal~V% since %V% is the union of paths that
are subsets of %XY%. Hence, there is an attribute %B~in~P% that is not
in %XY%. Since %A~in~Y~-~X%,
%B~notin~XY%, and both %A% and %B% are attributes in %P%,
%(Y~-~X)% %smallinter% %P% and %(Aset(T)~-~(XY))% %smallinter% %P% are both
nonempty proper subsets of %P%,
which is a contradiction since %P% is the set of attributes for a path of %T%
and for the attribute sets %P sub i% for all paths of %T%, %1~<=~i~<=~n%, either
%(Y~-~X)% %smallinter% %P sub i% is not a proper subset of %P sub i% or
%(Aset(T)~-~(XY))% %smallinter% %P sub i% is not a proper subset of %P sub i%.
Therefore, %Y~-~X~subsetorequal~V~-~X%. Thus, since
%V~-~X% %subsetorequal% %Y~-~X%, %V~-~X% = %Y~-~X%.
.PP
Since %MVD(T)% implies %V~smallinter~W~->>~V% and
%V~smallinter~W% %subsetorequal% %X%, by M2 (augmentation),
%MVD(T)% implies %X~->>~V%. Since %X~->>~X% is trivial,
%MVD(T)% implies %X~->>~X%. Since %MVD(T)% implies %X~->>~X%
and %X~->>~V%, by M3 (transitivity), %MVD(T)% implies
%X~->>~V~-~X%. Thus, since %V~-~X% = %Y~-~X%, %MVD(T)% implies
%X~->>~Y~-~X%. Now, by augmenting both sides of %X~->>~Y~-~X%
with %X~smallinter~Y%, we have %MVD(T)% implies %X~->>~Y% %dash%
a contradiction. %square%
.sp
.LP
\fBLemma 9.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U% and %T% is
consistent with %M% and %F%. Let %D% be
the set of MVDs and FDs that holds for %T% with respect to %M% and %F%.
Let %A% and %B% be attributes in %Aset(T)%, and let %X% and %Z% be sets of
attributes in %Aset(T)%. If %A% is in node %N sub A% of %T%,
%B~in~Ancestor(N sub A )%, %A~notin~Z%, %A~notin~X%, %B~in~Z%,
%Z~in~DEP sub D%(%X%)\(dg,
.FS
\(dg %DEP sub D%(%X%) is the dependency basis for %X% with respect to %D%
on %Aset(T)%
.[ [
Maier
.]].
.FE
and %Z~smallinter~X% = %emptyset%, then
%T% has potential redundancy with respect to %M~smallunion~F%.
.LP
\fBProof.\fR
As we have done before, we omit references to %Aset(T)%, since all implications
are taken with respect to and remain on %Aset(T)%.
.PP
To establish potential redundancy for %T% with respect to %M~smallunion~F%,
we must exhibit a valid nested relation for %T% with respect to
%M~smallunion~F% that has a redundant atomic value.
Consider a flat relation %r% on %Aset(T)% that has two tuples
%t sub 1% and %t sub 2%. Let %t sub 1% be a row of all 1's,
%t sub 2 (Aset(T)~-~Z)% = %t sub 1 (Aset(T)~-~Z)%, and %t sub 2 (Z)% be all 2's.
We claim that %r% is valid with respect to %M~smallunion~F%. Thus, by
Definition 7, we claim that every FD and every MVD that holds for %T%
with respect to %M% and %F% is satisfied. We prove this claim by contradiction
and thus assume that %r% does not satisfy some FD or MVD that holds
for %T%. Since an FD is also an MVD, without loss of generality, we assume that
%r% does not satisfy MVD %V~->>~W% that holds for %T%. By the
construction of %r% and since %r% does not satisfy %V~->>~W%,
%V~subsetorequal~Aset(T)~-~Z% and both %Z~smallinter~(W~-~V)% and
%Z~smallinter~(Aset(T)~-~(VW))% are nonempty proper subsets of %Z%.
Since %V~subsetorequal~Aset(T)~-~Z%, %V~smallinter~Z% = %emptyset%.
Since %V~smallinter~Z% = %emptyset% and
%Z~smallinter~(W~-~V)% is a nonempty proper subset of %Z%,
%Z~smallinter~W% is a nonempty proper subset of %Z%.
Since %Z~in~DEP sub D%(%X%), %X~->>~Z%.
Since %X~->>~Z%, by M4 (complementation) and M2 (augmentation),
%X% %->>% %X(Aset(T)~-~(XZ))%.
Since %V~smallinter~Z% = %emptyset%,
%V% %subsetorequal% %X(Aset(T)~-~(XZ))%. Since %V~->>~W% and
%V% %subsetorequal% %X(Aset(T)~-~(XZ))%, by M2 (augmentation)
%X% %(Aset(T)~-~(XZ))% %->>% %W%.
Since %X% %->>% %X(Aset(T)~-~(XZ))% and
%X(Aset(T)~-~(XZ))% %->>% %W%, by M3 (transitivity),
%X% %->>% %W% %-% %X(Aset(T)~-~(XZ))%.
Hence, since for any sets of attributes %X%, %W%, %Z%, and %Aset(T)%
such that %X%, %W%, and %Z% are subsets of %Aset(T)%,
%W% %-% %X(Aset(T)~-~(XZ))% = %(Z~smallinter~W)% %-% %X%,
%X% %->>% %(Z~smallinter~W)~-~X%. Since %Z~smallinter~X% =
%emptyset%, %(Z~smallinter~W)% %-% %X% = %Z~smallinter~W%, and hence
%X~->>~Z~smallinter~W%.
However, %Z~smallinter~W% is a nonempty proper subset of %Z% which implies
that %Z~notin~DEP sub D%(%X%) %dash% a contradiction.
Hence, %r% is valid with respect to %M~smallunion~F%.
.PP
We now establish three groups of essential facts, and then conclude.
.IP (1)
Since %T% is consistent with %M% and %F%, %D% implies %MVD(T)%.
Since %r% is a flat relation and is valid with respect to %M~smallunion~F%,
%r% satisfies every FD and MVD that holds for %T% with respect to %M% and %F%,
and thus %r% satisfies %D%. Since %r% satisfies %D% and %D% implies %MVD(T)%,
%r% satisfies %MVD(T)%. Since %r% satisfies %MVD(T)% and %r% is defined on
%Aset(T)%, we can nest %r% according to %T% to obtain a nested relation
%q% on %T%. Since %r% is the total unnesting of %q% and
%r% satisfies every FD and MVD that holds for %T% with respect to %M% and %F%,
%q% is valid with respect to %M~smallunion~F%.
.IP (2)
Since %B~in~Z%, %t sub 1 (B)% = 1 and %t sub 2 (B)% = 2. Since %A~notin~Z% and
%A~notin~X%, %A% %in% %Aset(T)~-~(XZ)%.
Since %A% %in% %Aset(T)~-~(XZ)%, %t sub 1 (A)% = %t sub 2 (A)% and are both
1's. Let %S% be the nested relation scheme
in %T% whose set of atomic attributes is
%N sub A%. Then, since %t sub 1 (B)% = 1 and %t sub 2 (B)% = 2 and %B% %in%
%Ancestor(N sub A )%, %t sub 1 (A)% and %t sub 2 (A)% are in distinct nested
tuples %u sub 1% and %u sub 2% under %S% in %q%.
.IP (3)
Since %Z~in~DEP sub D (X)%, %D% implies %X~->>~Z%.
Let %Y% = %Aset(T)~-~(XZ)%. Hence, by
M4 (complementation), %X% %->>% %Y%. Since %D% implies %X~->>~Y%, by
Lemma 5, %X~->>~Y% is in %D%. Since
%A% %in% %Aset(T)~-~(XZ)% and %Y% = %Aset(T)~-~(XZ)%,
%A~in~Y% and %A~notin~X%. Since %Z~smallinter~X% = %emptyset%
and %t sub 1 (Aset(T)~-~Z)% = %t sub 2 (Aset(T)~-~Z)%,
%t sub 1 (X)% = %t sub 2 (X)%. Since
%Y% = %Aset(T)~-~(XZ)%, %t sub 1 (Y)% = %t sub 2 (Y)%.
.PP
Now we have MVD %X~->>~Y% in %D%,
which thus holds for %T% with respect to
%M% and %F%, an attribute %A~in~Y% and %A~notin~X%,
a non-empty nested relation
%q% on %T% that is valid with respect to %M~smallunion~F%, a subtree %S%
of %T% that contains %A% as an atomic attribute, and
distinct nested tuples %u sub 1% and %u sub 2% of %q% under %S%,
such that %u sub 1 (A)% = %u sub 2 (A)%, and we have distinct tuples
%t sub 1% and %t sub 2% in the total unnesting of %q% (= %r%) such that
%t sub 1 (Aset(S))% and %t sub 2 (Aset(S))%
are tuples in the total unnesting of %u sub 1% and %u sub 2% respectively,
such that %t sub 1 (X)% = %t sub 2 (X)%,
%t sub 1 (Y)% = %t sub 2 (Y)%, %t sub 1 (Z)% %/=% %t sub 2 (Z)%
where %Z% = %Aset(T)~-~(XY)%. Therefore, atomic value %u sub 1 (A)%
is a redundant atomic value in %q% caused by %X~->>~Y%.
Since there exists a redundant atomic value in a valid nested relation
for %T% caused by %X~->>~Y%, which holds for %T% with respect to %M%
and %F%, %T% has potential redundancy with respect to %M~smallunion~F%.
%square%
.sp
.LP
\fBLemma 10.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U% and %T% is
consistent with %M% and %F%.
If %D% is the set of MVDs and FDs that holds for %T% with respect to %M% and %F%
and %D% is not equivalent to %MVD(T)% %smallunion% %FD(T)% on %Aset(T)%, then
%T% has potential redundancy with respect to %M~smallunion~F%.
.LP
\fBProof.\fR
As we have done before, we omit references to %Aset(T)%, since all implications
(and thus also all closures of attributes)
are taken with respect to and remain on %Aset(T)%.
.PP
Since %T% is consistent with %M% and %F%, %D% implies %MVD(T)%.
By definition of %FD(T)% and %D%, %FD(T)% is equivalent to the set of FDs in
%D%. Since %D% implies %MVD(T)% and %FD(T)% is equivalent to the set of FDs in
%D%, %D% implies %MVD(T)% %smallunion% %FD(T)%. However, since
%D% is not equivalent to %MVD(T)% %smallunion% %FD(T)%,
%MVD(T)% %smallunion% %FD(T)% does not imply %D%.
Thus, there is an FD or MVD in %D% that is not implied by %MVD(T)% %smallunion%
%FD(T)%. Since %FD(T)% is equivalent to the set of FDs in %D%, there
is an MVD %X~->>~Y% in %D% that is not implied by %MVD(T)% %smallunion% %FD(T)%.
Let %X sup +% = %X sub {MVD(T)~smallunion~FD(T)} sup +%.
Since %MVD(T)% %smallunion% %FD(T)% does not imply %X~->>~Y%, by Lemma 6,
%MVD(T)% does not imply %X sup +~->>~Y%.
Since %MVD(T)% does not imply %X sup +~->>~Y%,
by Lemma 8, there is a path %p% of %T% whose set of attributes is %P% such that
%(Y~-~X sup + )% %smallinter% %P% and %(Aset(T)~-~(X sup + Y))% %smallinter%
%P% are both nonempty proper subsets of %P%. Since %X~->>~Y% is in %D%
and %X~subsetorequal~X sup +%, by M2 (augmentation), %D% implies
%X sup +~->>~Y%. Since %D% implies %X sup +~->>~Y%
and since %(Y~-~X sup + )% %smallinter% %P% and
%(Aset(T)~-~(X sup + Y))% %smallinter% %P% are both nonempty proper subsets
of %P%, there exist %Z sub 1% and %Z sub 2~in~DEP sub D%(%X sup +%)
such that %Z sub 1% %subsetorequal% %(Y~-~X sup + )% and
%Z sub 1% %smallinter% %P% is a nonempty proper subset of %P% and
%Z sub 2% %subsetorequal% %Aset(T)~-~(X sup + Y)% and
%Z sub 2% %smallinter% %P% is a nonempty proper subset of %P%.
Let %A sub 1% be an attribute in %Z sub 1% %smallinter% %P% and
let %A sub 2% be an attribute in %Z sub 2% %smallinter% %P%. Further, let
%A sub 1% and %A sub 2% be in nodes %N sub 1% and %N sub 2% respectively.
Since %A sub 1% and %A sub 2% are in %P%, %N sub 1% and %N sub 2% are in the
path %p%. Therefore, either %N sub 1% %subsetorequal%
%Ancestor(N sub 2 )% or %N sub 2% %subsetorequal% %Ancestor(N sub 1 )%.
If %N sub 1% %subsetorequal% %Ancestor(N sub 2 )%, let
%Z% be %Z sub 1%, %B% be %A sub 1% and %A% be %A sub 2%; otherwise,
if %N sub 2% %subsetorequal% %Ancestor(N sub 1 )%,
%Z% be %Z sub 2%, %B% be %A sub 2% and %A% be %A sub 1%.
In either case let %N sub A% be the node that contains %A%; then we have
%A~in~N sub A%, %B~in~Ancestor(N sub A )%, %A~notin~Z%, %A~notin~X sup +%,
%B~in~Z%, %Z~in~DEP sub D%(%X sup +%), and %Z~smallinter~X sup +% =
%emptyset%. Hence, by Lemma 9,
%T% has potential redundancy with respect to %M~smallunion~F%. %square%
.sp
.LP
\fBLemma 11.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U% and %T% is
consistent with %M% and %F%.
If there exists a nontrivial FD %X~->~A% that holds for %T% such that
%X~notFD~Ancestor(N sub A )%, where %N sub A% is the node in %T% that
contains %A%, then %T% has potential redundancy with respect to
%M~smallunion~F%.
.LP
\fBProof.\fR
Since %X~->~A% is nontrivial, %A~notin~X%.
Since %X~notFD~Ancestor(N sub A )%, there exists an attribute
%B~in~Ancestor(N sub A )% such that %X~notFD~B%. Since %X~notFD~B%,
%B~notin~X%. Thus, there exists a %Z~in~DEP sub D%(%X%) such that
%Z~smallinter~X% = %emptyset% and %B~in~Z%. Since %X~->~A% and %X~notFD~B% and
thus %A~/=~B% and since %B~in~Z%, %A~notin~Z%; otherwise %Z% is not minimal
since %A% would be a proper subset of %Z% and thus we would not have
%Z~in~DEP sub D%(%X%). Since %X~->~A% holds for %T%, %A~in~Aset(T)%.
Now, since %A~in~N sub A%, %A~notin~X%, %B~in~Ancestor(N sub A )%,
%Z~in~DEP sub D%(%X%), %Z~smallinter~X% = %emptyset%, %B~in~Z%,
%A~notin~Z%, by Lemma 9, %T% has potential redundancy with respect to
%M~smallunion~F%. %square%
.sp
.PP
With Lemmas 10 and 11 in hand, we are now ready to prove Theorem 3,
the converse of Theorem 2.
.sp
.LP
\fBTheorem 3.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)~subsetorequal~U%.
If %T% is consistent with %M% and %F% and
has no potential redundancy with respect to %M~smallunion~F%,
then the following conditions are satisfied:
.IP (1)
If %D% is the set of MVDs and FDs that hold for %T%,
then %D% is equivalent to %MVD(T)% %smallunion% %FD(T)% on %Aset(T)%.
.IP (2)
For each nontrivial FD %X~->~A% that holds for %T%,
%X~->~Ancestor(N sub A )%, where %N sub A% is the node in %T% that contains %A%.
.LP
\fBProof.\fR
By Lemma 10, if %D% is the set of MVDs and FDs that hold for %T%
and %D% is not equivalent to %MVD(T)% %smallunion% %FD(T)% on %Aset(T)%, then
%T% has potential redundancy with respect to %M~smallunion~F%.
By Lemma 11, if there exists a nontrivial FD %X~->~A% that holds for %T%
such that %X~notFD~Ancestor(N sub A )%, where %N sub A% is the node in %T%
that contains %A%, then %T% has potential redundancy with respect to
%M~smallunion~F%.
Since %T% is given as being consistent with %M% and %F%, we thus have
if %T% is consistent with %M% and %F% and Condition 1 is not satisfied, then
%T% has potential redundancy with respect to %M~smallunion~F%, and
if %T% is consistent with %M% and %F% and Condition 2 is not satisfied, then
%T% has potential redundancy with respect to %M~smallunion~F%.
Hence, if %T% is consistent with %M% and %F% and
%T% has no potential redundancy with respect to %M~smallunion~F%, then
both Condition 1 and Condition 2 are satisfied. %square%
.sp
.PP
Theorem 2 and Theorem 3 together give us a precise characterization of
redundancy in nested relations for a given, consistent
scheme tree %T% with respect to a
given set of FDs and MVDs. We can have potential redundancy if %T%
violates Condition 1 or Condition 2 of our NNF definition.
Equivalently, we can be sure that no redundancy arises if %T%
satisfies Condition 1 and Condition 2 of our NNF definition.
Moreover, if we are sure that no redundancy arises (and since %T% is
consistent), then %T% satisfies both
Condition 1 and Condition 2 of our NNF definition.
.NH
Singleton Buckets
.PP
In this section we show that no nested relation whose scheme is consistent
with a given set of FDs and MVDs and satisfies Condition 3 has
a singleton bucket except (possibly) its outermost bucket.
Our exception that the outermost bucket may be a singleton is reasonable
because there are occasions when this may be useful and, more importantly,
there may be no alternative. Singleton buckets other than the outermost
bucket can always be avoided by a simple transformation on the scheme tree
that moves the attribute(s) that are causing the problem up the tree, i.e.,
closer to (or to) the root. However, it is not possible to move attributes
already in the root any farther up the tree.
.sp
.LP
\fBDefinition 13.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)~subsetorequal~U%.
Let %S% = %X% %(S sub 1 )*% ... %(S sub n )*% be a subtree
of %T%. %S% is a \fIsingleton bucket scheme\fR of %T% if,
with respect to %M~smallunion~F%, any valid nested relation %s% on %S%
has at most a single nested tuple. When %S% is a singleton bucket scheme,
we call %s% a \fIsingleton bucket\fR of %T%. %square%
.sp
.LP
\fBTheorem 4.\fR
Let %U% be a set of attributes.
Let %M% be a set of MVDs over %U% and %F% be a set of FDs over %U%.
Let %T% be a scheme tree such that %Aset(T)% %subsetorequal% %U% and %T% is
consistent with %M% and %F%.
Let %S% = %X% %(S sub 1 )*% ... %(S sub n )*% be a subtree of
%T% such that %X% is not the root of %T%. %S% is not a singleton bucket scheme
of %T% if the following condition is satisfied.
.IP
For every node %N~in~T%, if %Ancestor(N)% %->% %Y% holds for %T%,
%Y% %subsetorequal% %Ancestor(N)%.
.LP
\fBProof\fR.
We prove the contrapositive and thus assume that
%S% is a singleton bucket scheme of %T%.
Since %S% = %X% %(S sub 1 )*% ... %(S sub n )*% and
%X% is not the root of %T%, %X% is in some path %V sub 1%, ...,
%V sub p% of %T% and %X~/=~V sub 1%. Since %X% is in path %V sub 1%, ...,
%V sub p% and %X~/=~V sub 1%, %X% = %V sub i%, for some %i%, %2~<=~i~<=~p%.
Since %i~>=~2%, %V sub i-1 % exists and is the parent of %V sub i%.
.PP
We claim that %Ancestor(V sub i-1 )~->~X% holds for %T% with respect to %M% and
%F%. For suppose not then either
%Ancestor(V sub i-1 )X~notsubsetorequal~Aset(T)%
or %M~smallunion~F% does not imply %Ancestor(V sub i-1 )~->~X% on %U%.
Since %X% and %V sub i-1% are nodes in a path of %T%,
%Ancestor(V sub i-1 )X~subsetorequal~Aset(T)%. Hence, it must be that
%M~smallunion~F% does not imply %Ancestor(V sub i-1 )~->~X% on %U%.
Thus, there exists a flat relation %r% on %U% that is valid with respect to
%M~smallunion~F% and has tuples %t sub 1% and %t sub 2% such
that %t sub 1 (Ancestor(V sub i-1 ))% = %t sub 2 (Ancestor(V sub i-1 ))%, but
%t sub 1 (X)% %/=% %t sub 2 (X)%. Let %r'% be the projection of %r% on
%Aset(T)%, and let %t' sub 1% = %t sub 1 (Aset(T))% and
%t' sub 2% = %t sub 2 (Aset(T))%.
Since %Ancestor(V sub i-1 )X~subsetorequal~Aset(T)%
and %t sub 1 (X)% %/=% %t sub 2 (X)%, %t' sub 1% and %t' sub 2% are
distinct tuples in %r'%. Since the projection of a valid flat relation
with respect to %M~smallunion~F% is valid with respect to %M~smallunion~F%,
%r'% is valid with respect to
%M~smallunion~F% on %Aset(T)%. Since %r'% is a valid flat relation with
respect to %M~smallunion~F% on %Aset(T)% and is consistent with %M% and %F%,
%r'% is the total unnesting of a nested relation %q% on %T%.
Since %t' sub 1% and %t' sub 2% are distinct tuples in %r'%, which is
the total unnesting of %q%, and since
%t' sub 1 (Ancestor(V sub i-1 ))% = %t' sub 2 (Ancestor(V sub i-1 ))%,
by Lemma 7, for each attribute %A% in %V sub i-1%, %t' sub 1 (A)% and
%t' sub 2 (A)% are in the same nested tuple of a single nested
relation under the nested relation scheme whose set of atomic attributes is
%V sub i-1%. Moreover,
since %t' sub 1 (V sub i-1 )% and %t' sub 2 (V sub i-1 )%
are in the same nested tuple of a single nested relation under the nested
relation scheme whose set of atomic attributes is %V sub i-1% and since
the descendent components of any two such tuples are in the
nested relation %q'% whose set of atomic attributes is a child of %V sub i-1%
and since %V sub i% is the child of %V sub i-1%, %t' sub 1 (V sub i )% and
%t' sub 2 (V sub i )% are both in %q'%.
However, since %X% = %V sub i% and %X% is the root node of the singleton
bucket scheme %S% and since %t' sub 1 (X)% %/=% %t' sub 2 (X)% and
%t' sub 1 (X)% and %t' sub 2 (X)% are in %q'%,
%t' sub 1 (X)% and %t' sub 2 (X)% are not in the same nested tuple in %q'%
under %S%. Hence, there are at least two distinct nested tuples in %q'%,
which contradicts the assumption that %S% is a singleton bucket scheme of %T%.
Thus, %Ancestor(V sub i-1 )~->~X% holds for %T% with respect to %M% and %F%.
.PP
Since by the definition of a nested relation scheme each attribute
in %Aset(T)% appears in at most one node of %T% and no node of %T% is empty
and since %X% (= %V sub i%) is the child of
%V sub i-1%, %X~notsubsetorequal~Ancestor(V sub i-1 )%.
Since %Ancestor(V sub i-1 )~->~X% holds for %T% but
%X~notsubsetorequal~Ancestor(V sub i-1 )%, it is not the case that
for every node %N~in~T%, if %Ancestor(N)% %->% %Y% holds for %T%,
%Y% %subsetorequal% %Ancestor(N)% %dash% a contradiction. %square%
.NH
Conclusions and Related/Future Work
.NH 2
Concluding Remarks
.PP
In this paper, we proposed a new definition for Nested Normal Form (NNF).
Given a set of MVDs and FDs,
our NNF definition imposes three conditions on nested relation schemes:
.IP (1)
The set of MVDs and FDs that hold must agree with the union of the set of MVDs
that are naturally implied by the scheme tree and the set of FDs that hold.
.IP (2)
Any nontrivial FDs that hold and have a single attribute on the right-hand side
must functionally determine all the ancestor attributes of the right-hand-side
attribute. This forces applicable one-to-many constraints to conform to the
scheme tree.
.IP (3)
The scheme tree must not allow internal singleton buckets, which are
structurally anomalous because they allow at most one tuple.
.LP
Our NNF definition fully uses semantic constraints imposed by
FDs as well as MVDs. This permits greater attribute clustering and leads to
more flexibility in nested relation design than do previously proposed
definitions for NNF.
.PP
We also proposed a definition for redundancy based
on the notion that an atomic value in a nested relation
is redundant if it can be determined from an
FD or MVD that holds and other atomic values in the nested relation.
We then proved that the first two conditions of our NNF definition are necessary
and sufficient to remove potential redundancy for any scheme tree that is
consistent with a given set of MVDs and FDs.
.NH 2
Related and Future Work
.PP
There are several different ways redundancy might be characterized
including the one given in
.[ [
Vincent Srinivasan
.]]
and some suggested ones we have considered: namely, that we could base our
definition on all dependencies rather than on one dependency or that a data
value is redundant if there do not exist two different relations that
agree on all other data values and obey a given set of dependencies.
Further investigation into the possible
equivalence of these and other definitions may
be illuminating. As usual, when there are multiple equivalent definitions,
the best ones can all be used to provide different points of view. We like
ours for teaching where we can simply erase an atomic value in a
nested relation and ask if there is a constraint
and other data values that together determine the erased value.
.PP
Elsewhere, we have shown that normalization theory for flat relations
is a special case of normalization theory for nested relations
.[ [
Embley Ng Mok Unifying
.]].
We have proved that the first two conditions of our definition for NNF are
equivalent to Fagin's 4NF for degenerate nested relations (flat relations).
Equivalence for BCNF follows immediately when we only consider given FDs.
Since in Sections 4 and 5, of this paper, we proved that if a nested relation
scheme is consistent and
satisfies the first two conditions of our NNF definition, it has
no potential redundancy and conversely, and since we proved in
.[ [
Embley Ng Mok Unifying
.]]
that the first two conditions of our NNF definition
are equivalent to both Fagin's 4NF and BCNF under degenerate conditions,
we are able to observe
as a corollary that our redundancy definitions precisely characterize both
Fagin's 4NF and BCNF. As a result of these proofs and corollaries,
our definition for potential redundancy stands as an alternative
characterization for Fagin's 4NF and BCNF as well as for our definition of NNF.
.PP
We are investigating algorithms for obtaining NNF schemes given a set of
attributes and a set of MVDs and FDs.
Since our definition provides a high degree of flexibility, it naturally
permits many different algorithms for generating nested relation
schemes in NNF. Furthermore, different input assumptions lead to vastly
different algorithmic characteristics. For example, if we
assume that in practical cases the MVDs are generated from an acyclic hypergraph
and that the FDs are embedded in hypergraph edges, we can create an algorithm
that more efficiently generates nested relation schemes in NNF than if we have
no assumptions. For practical cases, we should also be concerned about
the possible failure of the universal relation assumption and about
application-dependent
choices for attribute clustering, for example, maximal
clustering or clustering with respect to user-chosen roots.
We have written an initial paper that begins to address the issues for
the practical case
.[ [
Mok Embley Ng Hong Kong
.]],
and have given some thought about how to proceed when the universal
relation assumption does not hold
.[ [
Light in preparation
.]],
but we leave to later papers a full exploration of these ideas.
These papers should also explore issues such as lossless-join decompositions
and dependency preservation. We, of course, want to ensure that no information
is lost, but we should not, in general, expect dependency preservation.
.[
$LIST$
.]
nnf.refs000066600003200000027000000060130547220561300135340ustar00liddlpstudent00000000000000%A S. Abiteboul
%A N. Bidoit
%T Non-first normal form relations: an algebra allowing data restructuring
%J Journal of Computer and System Sciences
%V 33
%D 1986
%P 361-393
%A P. Atzeni
%A V. DeAntonellis
%T Relational Database Theory
%I The Benjamin/Cummings Publishing Company, Inc.
%C Redwood City, California
%D 1993
%A C. Beeri
%A P.A. Bernstein
%A N. Goodman
%T A sophisticate's introduction to database normalization theory
%J Proceedings of the International Conference on Very Large Data Bases
%D 1978
%P 113-124
%A C. Beeri
%A R. Fagin
%A J. Howard
%T A Complete axiomatization for functional and multivalued dependencies in
database relations
%J ACM SIGMOD Conference
%D 1977
%P 47-61
%A E.F. Codd
%T Further normalization of the data base relational model
%B Data Base Systems
%E R. Rustin
%I Prentice-Hall
%C Englewood Cliffs
%D 1972
%P 33-64
%A D.W. Embley
%A W.Y. Mok
%A Y.K. Ng
%T An Improved Normal Form for Nested Relations
%R Technical Report BYU-CS-93-2, Department of Computer Science, Brigham Young University
%C Provo, Utah
%D February 1992
%A D.W. Embley
%A Y.K. Ng
%A W.Y. Mok
%T Unifying Normalization Theory Under a New Definition for Nested Normal Form
%O (submitted for publication)
%A R. Fagin
%T Multivalued dependencies and a new normal form for relational databases
%J ACM Transactions on Database Systems
%V 2
%N 3
%D September 1977
%P 262-278
%A H.F. Korth
%A A. Silberschatz
%T Database System Concepts, Second Edition
%I McGraw-Hill, Inc.
%C New York, New York
%D 1991
%A J. Light
%T A Relational Database Design Tool for Object-Relationship Models
%R Master's Thesis, Department of Computer Science, Brigham Young University
%C Provo, Utah
%O (in preparation)
%A D. Maier
%T The Theory of Relational Databases
%I Computer Science Press
%C Rockville, Maryland
%D 1983
%A W.Y. Mok
%A Y.K. Ng
%A D.W. Embley
%T An improved nested normal form for use in object-oriented software systems
%J Proceedings of the Second International Computer Science Conference: Data and Knowledge Engineering: Theory and Applications
%C Hong Kong
%D 13-16 December 1992
%P 446-452
%A Z.M. Ozsoyoglu
%A L. Yuan
%T A new normal form for nested relations
%J ACM Transactions on Database Systems
%V 12
%N 1
%D March 1987
%P 111-136
%A Z.M. Ozsoyoglu
%A L. Yuan
%T On the normalization in nested relational databases
%J Lecture Notes in Computer Science #361
%I Springer Verlag
%D 1989
%P 243-271
%A M.A. Roth
%A H.F. Korth
%T The design of non-1NF relational databases into nested normal form
%J Proceedings of the 1987 ACM-SIGMOD Conference
%C San Francisco
%D May 1987
%P 143-159
%A M.A. Roth
%A H.F. Korth
%A A. Silberschatz
%T Extended algebra and calculus for nested relational databases
%J ACM Transactions on Database Systems
%V 13
%N 4
%D December 1988
%P 389-417
%A M.W. Vincent
%A B. Srinivasan
%T Redundancy and the justification for fourth normal form in relational databases
%J Proceedings of the Second International Computer Science Conference: Data and Knowledge Engineering: Theory and Applications
%C Hong Kong
%D December 1992
%P 432-438
n Francisco
%D May 1987
%P 143-159
%A M.A. Roth
%A H.F. Korth
%A A. Silberschatz
%T Extended algebra and calculus for nested relational databases
%J ACM Transactions on Database Systems
%V 13
%N 4
%D December 1988
%P 389-417
%A M.W. Vincent
%A B. Srinivasan
%T Redundancy and the justification for fourth normal form in relational databases
%J Proceedings of the Second International Computer Science Conference: Data and Knowledge Engineering: Theory and Applications
%C Hong Kong
%D December 1992
i-1 )~->~X% holds for %T% but
%X~notsubsetorequal~Ancestor(V sub i-1 )%, it is not the case that
for every node %N~in~T%, if %Ancestor(N)% %->% %Y% holds for %T%,
%Y% %subsetorequal% %Ancestor(N)% %dash% a contradiction. %square%
.NH
Conclusions and Related/Future Work
.NH 2
Concluding Remarks
.PP
In this paper, we proposed a new definition for Nested Normal Form (NNF).
Given a set of MVDs and FDs,
our NNF definition imposes three conditions on nested relation schemes:
.IP (1)
The set of MVDs and FDs that hold must agree with the union of the set of MVDs
that are naturally implied by the scheme tree and the set of FDs that hold.
.IP (2)
Any nontrivial FDs that hold and have a single attribute on the right-hand side
must functionally determine all the ancestor attributes of the right-hand-side
attribute. This forces applicable one-to-many constraints to conform to the
scheme tree.
.IP (3)
The scheme tree must not allow internal singleton buckets, which are
structurally anomalous because they allow at most one tuple.
.LP
Our NNF definition fully uses semantic constraints imposed by
FDs as well as MVDs. This permits greater attribute clustering and leads to
more flexibility in nested relation design than do previously proposed
definitions for NNF.
.PP
We also proposed a definition for redundancy based
on the notion that an atomic value in a nested relation
is redundant if it can be determined from an
FD or MVD that holds and other atomic values in the nested relation.
We then proved that the first two conditions of our NNF definition are necessary
and sufficient to remove potential redundancy for any scheme tree that is
consistent with a given set of MVDs and FDs.
.NH 2
Related and Future Work
.PP
There are several different ways redundancy might be characterized
including the one given in
.[ [
Vincent Srinivasan
.]]
and some suggested ones we have considered: namely, that we could base our
definition on all dependencies rather than on one dependency or that a data
value is redundant if there do not exist two different relations that
agree on all other data values and obey a given set of dependencies.
Further investigation into the possible
equivalence of these and other definitions may
be illuminating. As usual, when there are multiple equivalent definitions,
the best ones can all be used to provide different points of view. We like
ours for teaching where we can simply erase an atomic value in a
nested relation and ask if there is a constraint
and other data values that together determine the erased value.
.PP
Elsewhere, we have shown that normalization theory for flat relations
is a special case of normalization theory for nested relations
.[ [
Embley Ng Mok Unifying
.]].
We have proved that the first two conditions of our definition for NNF are
equivalent to Fagin's 4NF for degenerate nested relations (flat relations).
Equivalence for BCNF follows immediately when we only consider given FDs.
Since in Sections 4 and 5, of this paper, we proved that if a nested relation
scheme is consistent and
satisfies the first two conditions of our NNF definition, it has
no potential redundancy and conversely, and since we proved in
.[ [
Embley Ng Mok Unifying
.]]
that the first two conditions of our NNF definition
are equivalent to both Fagin's 4NF and BCNF under degenerate conditions,
we are able to observe
as a corollary that our redundancy definitions precisely characterize both
Fagin's 4NF and BCNF. As a result of these proofs and corollaries,
our definition for potential redundancy stands as an alternative
characterization for Fagin's 4NF and BCNF as well as for our definition of NNF.
.PP
We are investigating algorithms for obtaining NNF schemes given a set of
attributes and a set of MVDs and FDs.
Since our definition provides a high degree of flexibility, it naturally
permits many different algorithms for generating nested relation
schemes in NNF. Furthermore, different input assumptions lead to vastly
different algorithmic characteristics. For example, if we
assume that in practical cases the MVDs are generated from an acyclic hypergraph
and that the FDs are embedded in hypergraph edges, we can create an algorithm
that more efficiently generates nested relation schemes in NNF than if we have
no assumptions. For practical cases, we should also be concerned about
the possible failure of the universal relation assumption and about
application-dependent
choices for attribute clustering, for example, maximal
clustering or clustering with respect to user-chosen roots.
We have written an initial paper that begins to address the issues for
the practical case
.[ [
Mok Embley Ng Hong Kong
.]],
and have given some thought about how to proceed when the universal
relation assumption does not hold
.[ [
Light in preparation
.]],
but we leave to later papers a full exploration of these ideas.
These papers should also explore issues such as lossless-join decompositions
and dependency preservation. We, of course, want to ensure that no information
is lost, but we should not, in general, expect dependency preservation.
.[
$LIST$
.]