Storage Facility Help
Storage Facility Documentation
NOTE : This document is currently under construction.
Storage Facility Overview
The Storage Facility was developed to facilitate storage of OSA models.
It is an in-memory database that offers garbage collection, indices,
tuple and table storage, nested array storage structures, relational table
operators, scheme and constraint definitions, and data-instance and
seamless model structures.
The following is an index into the Storage Facility Documentation:
Reference Counting
Every object in the storage facility has reference counting functionality.
The reference counting class, RefObject, keeps a count of the number of times
that an object is inserted into a container object.
Therefore, the initial reference count is zero for an object until it is inserted
into an SF_ARRAY object or a subtype of the SF_ARRAY class.
There are two methods used to perform reference counting:
- ref() increments this object's count and returns the object.
- unref() decrements the count and returns true if the count is equal to zero.
RefObject's also have an update flag mechanism. This is used to indicate
the current status of the object; namely:
- OBJECT_FLAG_NONE
- OBJECT_FLAG_NEW
- OBJECT_FLAG_CHANGED
- OBJECT_FLAG_ENTRY_ADDED
- OBJECT_FLAG_ENTRY_CHANGED
- OBJECT_FLAG_ENTRY_DELETED
- OBJECT_FLAG_ALL
The methods are setFlag() and getFlags().
In creating a new subclass of RefObject, the ref() method must be overwritten
as follows :
SF_ARRAY* ref(void) { return ((SF_ARRAY *) RefObject::ref()); }
Example:
SF_ARRAY v;
....
RefObject *o = v[2];
if (o->getFlags() && OBJECT_FLAG_NONE && o->unref())
delete o;
Garbage Collection
The reference counting mechanism can be used to garbage collect. This can be done as
follows :
insertion : v.putAt(o->ref()); // reference count = 2 : 1 for ref() and 1 for putAt()
garbage collection : if (o->unref() == TRUE)
delete o;
Name Index
The SF_NAME_INDEX class uses an SF_ARRAY object to store character strings. A string is
stored based on a hash function. Thus, this name index acts like a primary index for
any character string.
Operations in addition to those described in SF_ARRAY.
- Indexing
- indexOf() returns the index (integer) of a given string.
- Operator []
- The [] operator can be used as in an array to access the ith element.
Example:
char *name = "Christine";
SF_NAME_INDEX index;
index.insert(name);
int indexOf = name.indexOf("Christine");
...
Inverted Index
The SF_INVERTED_INDEX class is a sub-class of SF_ARRAY that
has an inverted index entry in each element of the array. The index entry consists
of a SF_OBJECT (or sub-type) object plus a list of indices (integers) that can be
used to reference rows in an SF_TABLE where that object is
stored.
Operations in addition to those described in SF_ARRAY.
- Indexing
- indexOf() returns the index (integer) of a given element.
Example:
SF_TUPLE *t = new SF_TUPLE(2, "Fred, 2");
SF_INVERTED_INDEX index;
SF_TABLE tab;
tab.putAt(3, t);
index.insert(t->at(i), 3);
...
Mapping Names to Integers
An SF_INTEGER_MAPPING object maps one integer into another integer. This is used
to map an index value from a name index to a column value for a table. This was
created to all the original storage facility to have names for columns rather than
using position numbers.
Operations.
- Get domain and range values
- getDomain() returns the ith domain value.
- getRange() returns the ith range value.
- Mapping
- map() maps a domain value to a range value.
- unmap() removes a mapping.
- Retrieve a mapping
- The operator [] retrieves the range for a given domain value.
Example:
SF_INVERTED_INDEX nameIndex;
SF_INTEGER_MAPPING nameToTableMapping;
int index;
...
int nIdx = nameIndex.insert("Fred");
nameToTableMapping.map(nIdx, index);
...
Object
The SF_OBJECT class is a base class for basic object types available
in the storage facility system. Currently, there are two types,
lexical and
non-lexical.
Operations.
- Relational operators
- operator >
- operator >=
- operator <
- operator <=
- operator ==
- operator !=
- Object value operations
- getValue() returns the value of the object as a string.
- setValue() sets the value of the object. If the input is
in double quotes then the object is lexical; otherwise it is
non-lexical.
- Static Compare Methods
- Equal - determines if two objects are equal.
- lessThan - determines if the first object is greater than
the second.
- greaterThan - determines if the first object is less than
the second.
Example:
...
Lexical Objects
An SF_LEXICAL object is an SF_OBJECT that is lexical. Lexical means that the
value of the object also acts as its representation. In other database systems
lexical objects are referred to as values. Thus "Fred", 1, 6948.234, etc. are
considered lexical objects. In the storage facility, lexical values are
distinguishable by enclosing double quotes.
Operations in addition to those described in
object.
Example:
...
Non-lexical Objects
An SF_NONLEXICAL object is an SF_OBJECT that is non-lexical. Lexical means that
object has no external value. The only value it has is a unique identifier that
is internal to the system.
Operations in addition to those described in
object.
Example:
...
Array Container
The SF_ARRAY class provides a generic, array container where each entry in the array
can be any RefObject. An array has dynamic size reallocation.
Operations
- Access
- The [] operator is overload to return a reference to the object in the
ith position.
- The at() method does the same, except that a pointer is returned.
- Insertion and Deletion
- The insert() method inserts the object into the first blank position.
- putAt() puts the element at the position specified (removing any current
object in that place).
- add() places the element at the end of the array.
- The + operator does the same as add().
- remove() removes the element from the array,
deleting it if its reference count is zero.
- Element Count and Indexing
- The occurrence count of an element (occurrencesOf()).
- The index of the element is returned (indexOf()).
- Iterators
- Since array slots may be empty, the firstIndex() and nextIndex() methods
are used to iterate through the array.
- For convenience, the FOR, FOR_p, ITERATE, and ITERATE_p macros can be
used to iterate through the entire array.
- Compare Methods
- The compare() method compares this with another SF_ARRAY.
- Operators are overloaded to compare objects (==, !=).
Example:
SF_ARRAY array;
int found = 0;
SF_LEXICAL l1("Mary"), l2("John"), l3("Christine");
array.putAt(4,l2);
array.insert(l3);
array.add(l1);
array.insert(l2);
array.putAt(23, l3);
array + l1;
ITERATE(int i, array, !found) {
if (array[i] == l1) {
array.remove(i);
found = 1;
}
}
FOR(i, array)
array.at(i)->printOn(cout);
int num = array.occurrencesOf(l3);
int index = array.indexOf(l1);
...
Tuple Container
The SF_TUPLE class provides a container similar to SF_ARRAY except
that the size of the tuple is restricted to be a certain size (number of elements) and
the SF_TUPLE can be loaded by parsing an input string. The grammar for such a string
is
:= | ,
:= : |
:= character string
:= character string
where the tuple size is determined by the number of comma separated parts.
If the tuple is part of a named storage facility mechanism, then tuple entries
can be referenced by name. Otherwise, they are integer referenced.
Operations in addition to those described in SF_ARRAY.
- Constructor and Set Value
- SF_TUPLE() can be passed a string which is parsed and the values
are inserted in the tuple.
- setValue() takes a string, parses it, and loads the tuple with
the values obtained.
- Object Containment
- contains() verifies if an object is in the tuple at a position.
- Static Compare Methods
- Equal - determines if two tuples are equal.
- lessThan - determines if the first tuple is greater than the second.
- greaterThan - determines if the first tuple is less than the second.
Example:
char *input = "\"Fred\", 2, "\18 North Street\"";
SF_TUPLE tuple1(3), tuple2(3, input);
tuple1.setValue(input);
if (SF_TUPLE::equal(tuple1, tuple2))
tuple2.remove(SF_NON_LEXICAL(2));
...
SF_LEXICAL lex("\"Fred\"");
if (tuple2.contains(lex) == TRUE)
tuple1.add(lex);
...
Table Container
A table is similar to a relational database table in it stores an array of tuples, each tuple
having the same size. A SF_TABLE also has an inverted index on each row, making indexing more
efficient.
The SF_Table class is very much similar to SF_ARRAY
and SF_TUPLE. Every element stored is a tuple with the same tuple size
(i.e., number of elements in the tuple). A SF_TABLE can also be loaded from a string or
a stream. The grammar for a table is :
:= RETURN ~
:= SF_TUPLE | SF_TUPLE RETURN
where SF_TUPLE is parsed as described in the tuple description and
RETURN means a carriage return.
Some operations on tables may or may not require that the tables are compatible. Table
compatibility means that the column names are exactly the same. Compatibility can be
retrieved and set with the setDefaultCompatibility() and getDefaultCompatibility()
methods.
If the table is part of a named storage facility mechanism, then table operations on
columns can be referenced by name. Otherwise, they are integer referenced.
Operations in addition to those described in SF_ARRAY.
- Constructor
- All tables must be initialized with a tuple size.
- Index access
- The inverted index for a column can be accessed (getIndex()).
- Load and Save
- load() the table based on the grammar.
- save() the table in the same format as the grammar prescribes.
- Element operations
- occurrencesOfComponent() finds the number of times an element appears
in a column within the table.
- containsComponent() verifies if an element is in the ith column of the table.
- Subset
- isSubsetOf() checks to see if this table is a subset of another table.
- Tuple selection
- selectTupleOn() will find the first occurrence of a tuple with a particular
value in one or more columns, columns being referenced by name or by position
number.
- Change
- change() is the equivalent of remove and insert with changed value.
Example:
SF_TABLE tab1(2);
tab1 + &(*new SF_TUPLE(2)
+ new SF_LEXICAL("Test 1")
+ new SF_NON_LEXICAL(1034)
);
tab1.assignPosName("A",0);
tab1.assignPosName("B",1);
SF_TUPLE *result = tab1.selectTupleOn(1, "A", new SF_NON_LEXICAL(1034));
...
Relational Algebra Table Operators
The relational table operators are subclasses of the SF_TABLE class that create a
table after performing a relation algebra operation on an input table(s). This
is done by using the constructor with parameters that specify the table to be
operated on and additional information to the operator (e.g., row number).
Column names can be used for specifying the operation parameters if the operand tables
have name mapping information. This usually comes from using a model instance as the source of
the tables. If no name mapping information is available, then columns have to be specified with
column numbers, 0 being the first column.
Relational Algebra Table Operators.
- SF_TABLE_PROJECTION
- A table is produced which has been projected on the specified columns.
- SF_TABLE_SELECTION
- A table is produced with tuples that match selection parameters. Selection is
based on equality with SF_OBJECT parameters. If a comparison other than equality is
desired, a function can be passed in that specifies the selection criteria.
- SF_TABLE_JOIN
- A table is produced based on the equi-join of two other tables. If no columns
are specified as joining parameters then the cartesian product is performed. If a theta
join is desired using an operator other than equality, then a function can be passed in
which specifies the join criteria.
- SF_TABLE_UNION
- A table is produced which is the union of two other tables.
- SF_TABLE_DIFFERENCE
- A table is produced which is the difference of two other tables.
- SF_TABLE_INTERSECTION
- A table is produced which is the intersection of two other tables.
- SF_TABLE_CHI
- A table is produced that uses the chi operator to perform some operation (a function)
on the input table. The operation desired is passed in as a parameter to the SF_TABLE_CHI
constructor. Column grouping is possible.
Example:
SF_TABLE &t1 = model.getTable("Person has Name"),
&t2 = model.getTable("Person has SSN");
SF_LEXICAL name("Fred Flinstone");
SF_TABLE result(
SF_TABLE_JOIN(
SF_TABLE_PROJECTION(
SF_TABLE_SELECTION(t1, 1, "Name", name),
1, "Person"),
t2, 1, "Person", "Person"));
...
SF_TABLE_UNION t(anotherTable, fredsTable);
SF_TABLE_JOIN lastone(t, t1, 2, 1, 3, 2, 4);
Defining Tables and Constraints
A scheme can be defined as a set of relational table definitions and a set of
constraints. There are two types of table definitions and eight types of constraint
definitions. The table definitions define the tuple size and name of each relational
table in the SF_DATA_INSTANCE. The constraints are used to define varies constraints
as well as to validate a data instance to see if it violates the constraint.
Model Definitions
- Table Definition Types.
- Object Class
- An SF_OC_DEF defines an object class table, which is a table with tuple size of one.
The definition also defines if the elements of the table are to be lexical or non-lexical.
- Relationship Set
- An SF_RS_DEF defines a relationship set, which is a table of tuplesize 2 or bigger.
- Constraint Definition Types.
- Inclusion Constraint
- An SF_IN_DEF defines an inclusion dependency, which states that the specialization class
must be included in the union of the generalization classes.
- Mutual Exclusion Constraint
- An SF_ME_DEF defines a mutual exclusion constraint, which states that the intersection
of two object classes is empty.
- Union Constraint
- An SF_UN_DEF defines a union constraint, which states that the union of a number of
object classes equals the union class.
- Referential Integrity Constraint
- An SF_RE_DEF defines a referential integrity constraint, which states that the objects
in a relationship must also exist in an object class. The position value is used to
determine what row of the relationship set that the object class appears.
- Subpart Constraint
- An SF_SU_DEF defines a subpart constraint, which ??.
- Participation Constraint
- An SF_PC_DEF defines a participation constraint. A participation has a number of
min-max pairs for an object class in a relationship set. This constraint verifies that
each object in the relationship set for that object class participates inbetween at least
one "min-max pair" times. Participation means that the cardinality of that object in
the relationship is between min and max values.
- Co-occurrence Constraint
- An SF_CO_DEF defines a co-occurrence constraint, which ...
- General Constraint
- An SF_GC_DEF defines a general constraint, which is a constraint writen formally or
informally. If formal, the general constraint can be parsed and executed on the data
instance to be verified.
Example:
Future Work
To Do List.
- More extensive error reporting on constraint checking.
- Lexical extensions (real, integer, etc.)
Jeff Pinkston
(jeffp@cs.byu.edu)