This paper describes a form of set, relation, and field definitions created for describing and implementing set, relation and field operations and shows how it is easily used both as a form for descriptions of data relationships and implementations of algorithms based on those data relationships.
It also includes comments pertaining to some variations that have been included in computer implementations of the approach.
Traditional sets and relations do not lend themselves well to the description of or the application to many types of problems due to their ambiguity and adherence to "historical ideal" definitions. Also the arbitrariness of their set member types, and overall lack of rigor in what a set can or cannot contain contribute to excessive complexity in implementing a useful algorithmic notation and support software packages.
An example of set ambiguity is to consider a set containing {apple, ball, ball, ball}. It is easy to see that the set has three balls, but any reference to one of the balls is ambiguous, as the listed ordering is NOT traditionally and inherent part of the set properties.
The arbitrariness of the set membership is illustrated by sets that contain themselves and their parents as well as such members as the four listed above. The nonhomogeneity of the membership of the traditional set makes the representation and implementation of sets themselves extremely awkward and complex.
Attempting to generate an algorithmic notation for such general sets leads to extreme obscurity of what should or should not be included, and precludes the reduction of complexity that any such notation is intended to provide.
A way was needed to represent set applications for elemental components of higher-order complex data structures, such as mesh models, network models, and graph models, in a way that allows a few features and capabilities not easily available in currently available approaches:
It was obvious that a level of data abstraction lower than that provided by meshes, graphs, and networks was needed that could relate mappings among all the needed set member forms. Such a level of abstraction could also include enough metadata to enable the ability to exchange data among models and model forms.
In defining the sets, relations, and fields approach it also became obvious that it lent itself well to a desirable descriptive notation which enabled the easy design of complex data structures and algorithms. This notation allows easy description of parallelization and repartitioning algorithms, as well as complex models of meshes, graphs, and other network models.
An unambiguous set form is defined along with relations based in that representation. Then fields of attribute values over the sets' members are defined to allow a diversity of set member "types" that may be enjoyed via the traditional set membership versatility. The use of the SRF abstractions is then discussed relative to numerous set, relation, and field operations. Ways to apply the SRF operations and notation to various problem domains is then covered. Finally some variations included in some of the software implementations of the approach are discussed.
The SRF approach is based on an indexed set definition, relations between these indexed sets' members, and fields of values over the sets' members. Each of these is discussed in more detail.
An indexed set is defined to be a set whose members are successive integers, starting at 0 and counting up to n-1 for a set with n members. Minimally the sets include a unique set ID, a set name, and a count of set members. For dynamic set membership members may be deleted, and the index is left out to prevent having to propagate a set member renumbering to every relation and field which reference the set as a domain or range set. In these cases the set must also include an enumeration of the set members present, generally as a hashed set of values.
An indexed set has the following properties:
To associate an indexed set with such properties as colors, descriptive names, or other attribute values a field is defined over the members of the set. A field has one "value" for each member of the set, and these values may be any form, but of a single homogeneous form per field. A set of three members could have fields of values including colors (red, green, blue), 2-dimensional coordinates ((0.00, 0.00), (0.00, 1.00), (0.50, 0.50)) and probabilistic weights (0.23975, 0.71, 0.0525).
Additionally, the values of the fields can be user-defined structures which allow the references to other sets (for sets including sets), sets of fields, and sets of relation tuples, all of which have been included in various versions of the SRF library.
To map any finite traditional set to an indexed set's members is a trivial task, and the traditional set's member designations, such as names or IDs, may be easily mapped into a field over the indexed set's members. To map such as set to an indexed set the procedure is:
Extensions have been made to at least one implementation of the SRF library that allow an indexed set to include time "instances" to represent multiple time instances of time-varying set membership in computer memory at one time, yet easily maintain a distinction between the various sets, their membership at each time instance, and the corresponding time-varying relation tuples and field values for each corresponding set member.
A relation between the members of a domain set and a range set is a collection of tuples which each map a single domain set member to a single range set member. A relation may represent a generalized relation or a special form of relation known as a subset relation. Minimally a relation consists of a unique relation ID, a relation name, a domain set, a range set, and a collection of tuples.
A generalized relation may have any number of mappings from each member of the domain set to range set members. The domain set and range set may be any two sets.
A subset relation relates the members of a subset (a set) each to a unique member of its parent set.
To allow the use of fields over the tuples in a relation the relation set was created. Its members represent the tuples in a relation. Such a representation allows easy representation of a field of attributes over the links of a graph, and the representation of the links as both a relation and a set.
Efficient implementation forms have been devised for representing the relation tuples as a sparse matrix. Also, for time-varying sets an implementation includes representing time-varying relations via time instances of relations' tuples.
A field is a collection of values over the members of its domain set. All the values of a field are of a single type (numeric, string, etc.) and form (scalar, vector, etc.). There is one value for each set member, though some values may be "undefined" in some field implementations. Minimally a field always includes a unique field ID, a field name, the domain set, and the field values.
The values in a field may be of types such as enumerated variables, strings, scalar numbers, or even user-defined structures. These types may then be combined into complex numbers, vectors, matrices, tensors, vectors of strings, or even arrays of user-defined structures. In a given field the values MUST be of a homogeneuous type and form to allow the definition of simple field operations over the set members in a consistent and useful manner.
Fields may include metadata about the data values they contain. In some implementations this metadata includes the stored data type, units for scalar numeric values, vector forms used (such as Cartesion versus spherical versus cylindrical), field derivation audit data, and derived field calculation data to allow "on-demand" generation of the field values.
For numeric fields it is useful include metadata that describes the values of the field in an efficient way, including a field name, field data type descriptions, numeric field value units, and for vectors, the vector form used. By using such metadata it is possible to include automatic units conversion for some field operations and self-describing data output formatting so the values can easily be read into other applications with a reasonably full context for units and vector form conversion at load time.
The time instances have also been implemented over fields, storing field value instances for each time instance. This allows time-based extrapolation and integration functions to be easily implemented over multiple time instances of field values. Retaining multiple field value instances allows multi-time-step calculations to use in-memory historical values.
There are several classes of operations that are possible via the three basic SRF constructs:
Each relation (Ra,b, Rb,c) is a mapping from the members of a domain set (Sa for Ra,b, or Sb for Rb,c) to the members of a range set (Sb for Ra,b or Sc for Rb,c), and is represented by a collection of tuples. Some operations on those relations may be considered, such as:
This may be viewed as just reversing the pairings of domain set, range set member references, and swapping the domain and range sets for the relation. The result is a relation going the opposite direction. For subset relations this implies that some parent set members are not mapped to any members of the subset.
An example is:
Given:
Ra,b: Sa --->
Sb
Ra,b = {(0, 5) (1, 1) (2, 3)}
and wanting:
Rb,a: Sb --->
Sa
to be obtained by inverting:
Rb,a = Ra,b-1
which is
Rb,a = {(5, 0) (1, 1) (3, 2)}
or (reordering the tuples)
Rb,a = {(1, 1) (3, 2) (5, 0)}
This may be viewed as merging two mappings via the members of a shared intermediate set. When relations are composed they are mappings from the domain set of the first to the range set of the second, but modified by the intermediate relationships with the shared set.
An example is:
Given:
Ra,b: Sa --->
Sb
Ra,b = {(0, 0) (1, 2) (2, 1)}
Rb,c: Sb --->
Sc
Rb,c = {(0, 1) (1, 3) (2, 0)}
and wanting:
Ra,(b),c: Sa --->
Sc
(via Sb)
to be obtained by composing:
Ra,(b),c = Ra,b *
Rb,c
or
Ra,(b),c = {(0, 0) (1, 2) (2, 1)} *
{(0, 1) (1, 3) (2, 0)}
and, combined
Ra,(b),c = {(0, 1) (1, 0) (2, 3)}
This may be viewed as creating the union of the set of tuples in one relation with those of another, getting a single relation which includes all the paths defined in either of the original relations.
An example is:
Given:
Ra,(p1),b: Sa --->
Sb
Ra,(p1),b = {(0, 0) (1, 2) (2, 1)}
Ra,(p2),b: Sa --->
Sb
Ra,(p2),b = {(0, 1) (1, 3) (2, 0)}
and wanting:
Ra,(p1 and p2),b:
Sa --->
Sb
(via either path p1 or path p2)
to be obtained by augmenting:
Ra,(p1 and p2),b =
Ra,(p1),b +
Ra,(p2),b
which is
Ra,(p1 and p2),b =
{(0, 0) (1, 2) (2, 1)} + {(0, 1) (1, 3) (2, 0)}
Ra,(p1 and p2),b =
{(0, 0) (0, 1) (1, 2) (1, 3) (2, 0) (2, 1)}
This may be viewed as having a single mapping from a domain set to a range set, but generating an intermediate subset of the range set and generating a mapping from the orginal domain set to the new subset. This results in three products, the new subset, the mapping to the new subset, and the subset relation from the new subset to the range set.
An example is:
Given:
Ra,c: Sa --->
Sc
Ra,c = {(0, 1) (1, 3) (1, 4) (3, 4) (4, 5)}
and wanting:
Ra,b: Sa --->
Sb
and
Rb,c: Sb --->
Sc
where Sb is a minimal subset of Sc
to be obtained by decomposing via:
Ra,c = Ra,b *
Rb,c
The domain set members referenced are {0, 1, 3, 4} and
the range set members referenced are {1, 3, 4, 5}.
This allows the definition of the new minimal subset with 4 members and
the subset relation
Rb,c = {(0, 1) (1, 3) (2, 4) (3, 5)}
giving the useful derivation
Rb,c-1 =
{(1, 0) (3, 1) (4, 2) (5, 3)}
and allows the redirection of the original Ra,c as
Ra,b = Ra,c *
Rb,c-1
Ra,b =
{(0, 1) (1, 3) (1, 4) (3, 4) (4, 5)} *
{(1, 0) (3, 1) (4, 2) (5, 3)}
Ra,b = {(0, 0) (1, 1) (1, 2) (3, 2) (4, 3)}
Other operations may be useful, but most can be resolved down to combinations of these four with judiciously selected sets and subsets.
Sets are normally associated with basic set operations, including:
The operand sets are subsets of the parent set. The subset relations indicate to which members of the parent set each subset member refers. Thus, all the set operations may be done in the classical manner on the collection of range set references in the operands' subset relations. The result becomes the range set references for the result subset defined by the set operation being performed.
An example of subsets with a shared parent set is as follows:
Using these definitions each of the set operations using the subsets Sa and Sb are described below.
The range set members referenced in each subset relation are {0, 2, 4} and
{2, 1, 3}, and the traditional intersection of these is {2}. This
defines the collection of range set references for
RIntersect(a,b). This has only one member, so the new
result subset has only one member and its subset relation to the parent set
is given by
RIntersect(a,b) = {(0, 2)}
mapping the one member of the subset SIntersect(a,b)
to the one referenced parent set member.
The traditional union of {0, 2, 4} and {2, 1, 3}, which is {0, 1, 2, 3, 4},
is the collection of range set references for RUnion(a, b).
So the new result subset has 5 members, and these define the subset
relation tuples
RUnion(a, b) =
{(0, 0) (1, 1) (2, 2) (3, 3) (4, 4)}
which are the associations of the union set SUnion(a, b)
members' indices to the indices of each corresponding parent set member.
The ordering of the subset members is arbitrary at the time the subset
relation is being defined, although referencing the parent set members
in increasing order of index value is often convenient for clarity of
representation.
The traditional difference of Sa, {0, 2, 4} minus
Sb, {2, 1, 3}, is {0, 4}. This is the collection of range
set references for RDiff(a, b), giving
RDiff(a, b) = {(0, 0) (1, 4)}
and the traditional difference of Sb, {2, 1, 3} minus
Sa, {0, 2, 4}, which is {1, 3}, is the collection of
range set references for RDiff(b, a), giving
RDiff(b, a) = {(0, 1) (1, 3)}
Fields are separate values, one for each member of the domain set for the field. Based on the value types, any function that can be performed on a single value of that type may be performed on every entry in the field, thus it becomes a candidate for a field value operation. Thus field values can have several categories of operations defined over their values, such as the following, each with examples:
Unary operations may include a variety of types based on the data forms involved, such as
Binary operations can be defined over fields similarly, by passing value pairs
for a single domain set member into the function and storing the result into
the result field entry for that same domain set member. An expression such
as
Fy: a = Fx1: a +
Fx2: a
causes the repeated performance of the operation
y[i] = x1[i] + x2[i]
as i varies over the indices of the members of the domain set
Sa.
Similarly field operations may be defined and implemented over more complex value types and combinations of fields.
When fields are NOT over the same domain set it is sometimes necessary to extract the values of one field over its domain set to become a field over the set which is the domain set of the other field. Also, if field values are calculated in a parallel environment it is often necessary to merge the resulting field values over subsets (the partition subsets) into a global field of values over a more global domain set (the partitioned domain set).
When a field (Fx: p) over a set (Sp)
must be extracted to a field (Fx: a) over a subset
(Sa) this can be done via the composition of a field with
a relation derived from the subset relation (Ra,p) as
follows:
Fx: a = Ra,p *
Fx: p
When a field (Fx: a) over a subset (Sa)
must be projected into an existing field (Fx: p) over the
parent set (Sp) with the subset specified by the subset
relation (Ra,p) then the field value members must be mapped
to the parent set members so the values may replace the proper member's
values in the field over the parent set. This can be indicated via
Fx: p <-- Fx: a *
Ra,p
When attempting dynamic repartitioning of extremely large models the repartitioning cannot be done easily in a single processor. Ideally one would like to divide up the activities of the repartitioning among the multiple processors already available. To do this requires two collections of subsets (Sp[i] and Sp[j]) representing the current partitioning (p[i]) and the new partitioning (p[j]) of the base set (Sp). To repartition the data in each field over Sp the subsets given by the intersections of the subsets in each partitioning must be obtained (Rp[i][j],p) and then the extraction relations (Rp[i][j],p[i]) and the insertion relations (Rp[i][j],p[j]) can be created. These relations may be composed for each Sp[i][j] giving the extraction/insertion relation Rp[i],p[j].
Since the old partition on each processor (p[i]) consists of the the fields
over the subset (Sp[i]) on that processor then the
repartitioning may be done via
Fx: p[i][j] =
Fx: p[i] *
Rp[i],p[i][j]
for each new subset (Sp[i][j]) to which that subset maps,
and the fields (Fx: p[i][j]) over that subset may be
written out. Processor j in the new partitioning can then read in only those
fields (Fx: p[i][j]) over the subsets
(Sp[i][j]) which are common to the jth processor and merge
them into the new partition subset's fields using
Fx: p[j] <--
Fx: p[i][j] *
Rp[i][j],p[j]
The use of the sets, relations, and fields approach presented above has enabled a simplified notation for expressing the partitioning of complex data structures for dynamic reconfiguring in parallel processors.
An example:
Se: A set of mesh model element polyhedral
atmosphere volumes
Sn: A set of mesh model element vertex
node points
Re,n: The association of the element
volume polyhedra to the nodes which are their vertices
Fa: n: The field of coordinate vectors for
each mesh node
Fb: n: The field of temperatures at each
node
Fc: e: The field of element atmosphere
pressures for each elemental volume
Fd: e: The field of element atmosphere
volumes for each elemental volume
Fe: e: The field of element CO2
mass in each elemental volume
Ff: e: The field of element N2
mass in each elemental volume
Fg: e: The field of element O2
mass in each elemental volume
Fh: e: The field of element atmosphere motion flow vectors
Fi: e: The field of element atmosphere momentum
Fj: e: The field of element atmosphere charge
Fk: e: The field of element gaseous water mass
Fl: e: The field of element liquid water mass
Fm: e: The field of element solid water mass
Spartitions: A set of parallel partitions
Rpartitions,e: The partitioning of the elements for the parallel processes.
Using this description it becomes easy to describe the projection of the
partitioning to the nodes, determine which nodes must be shared among
processes, and determine which field data must be extracted for each
processor. Even more important is that it is now logically tractable to
design the algorithms to perform distributed dynamic repartitioning of
the model data.
With these definitions questions may be answered easily, such as:
Extremely large network and graph models may be easily partitioned and dynamically repartitioned as well as their operations can be defined for implementation on large parallel architectures using the conceptually simple notation.
Using the basic constructs of set, relations, and fields and interoperations among them simple concepts and a simple notation have been defined which allow large, complex software algorithms to be described and implemented.
The versatility of the approach allows it to be easily encapsulated by graph, network, and mesh application-specific wrappers, but still allows the sharing of data between the various data models. This low-level commonality that includes only a few simple constructs provides an infrastructure that has been applied in highly parallel mesh applications, graph applications, and network applications, and in complex protein folding applications.
The approach is simple, easy to understand, and easy to implement. It also makes it easy to implement and manipulate previously intractable models in small comprehensible steps.