Sets, Relations, and Fields (SRF):
Overview

July 23, 2007

by
James R. Holten III
Senior Research Scientist
Institute for Complex Additive Systems Analysis
New Mexico Institute of Mining and Technology
jholten@icasa.nmt.edu



I. Overview

This paper describes a form of set, relation, and field definitions created for describing and implementing set, relation and field operations and shows how it is easily used both as a form for descriptions of data relationships and implementations of algorithms based on those data relationships.

It also includes comments pertaining to some variations that have been included in computer implementations of the approach.

A. Background

Traditional sets and relations do not lend themselves well to the description of or the application to many types of problems due to their ambiguity and adherence to "historical ideal" definitions. Also the arbitrariness of their set member types, and overall lack of rigor in what a set can or cannot contain contribute to excessive complexity in implementing a useful algorithmic notation and support software packages.

An example of set ambiguity is to consider a set containing {apple, ball, ball, ball}. It is easy to see that the set has three balls, but any reference to one of the balls is ambiguous, as the listed ordering is NOT traditionally and inherent part of the set properties.

The arbitrariness of the set membership is illustrated by sets that contain themselves and their parents as well as such members as the four listed above. The nonhomogeneity of the membership of the traditional set makes the representation and implementation of sets themselves extremely awkward and complex.

Attempting to generate an algorithmic notation for such general sets leads to extreme obscurity of what should or should not be included, and precludes the reduction of complexity that any such notation is intended to provide.

A way was needed to represent set applications for elemental components of higher-order complex data structures, such as mesh models, network models, and graph models, in a way that allows a few features and capabilities not easily available in currently available approaches:

  1. Include descriptions of relationships between collections of mesh elements such as mesh nodes, edges, and faces, and components of a network or graph model to enable the easy design of complex model data partitioning as well as the design of inter-model conversion and mapping algorithms;
  2. Include the metadata needed to write out data from one model, update the model, then read the data back into a new environment, or to read the data into and append it to the data of other models;
  3. Allow the description and implementation of dynamic repartitioning of extremely large parallel models, including the intended overlap of "ghost regions" and collections of lower level elements that must be shared among the partitions;
  4. Allow ways to easily define well borders between parallel processor responsibilities so that dynamic repartitioning could be performed for load balancing or load shifting, as well needed for dynamic reconfiguration for fault-tolerant parallel and distributed processing;
  5. Allow easy description of field calculations; and
  6. Include metadata to enable automated units conversion when needed in field calculations.

B. Approach

It was obvious that a level of data abstraction lower than that provided by meshes, graphs, and networks was needed that could relate mappings among all the needed set member forms. Such a level of abstraction could also include enough metadata to enable the ability to exchange data among models and model forms.

In defining the sets, relations, and fields approach it also became obvious that it lent itself well to a desirable descriptive notation which enabled the easy design of complex data structures and algorithms. This notation allows easy description of parallelization and repartitioning algorithms, as well as complex models of meshes, graphs, and other network models.

An unambiguous set form is defined along with relations based in that representation. Then fields of attribute values over the sets' members are defined to allow a diversity of set member "types" that may be enjoyed via the traditional set membership versatility. The use of the SRF abstractions is then discussed relative to numerous set, relation, and field operations. Ways to apply the SRF operations and notation to various problem domains is then covered. Finally some variations included in some of the software implementations of the approach are discussed.

II. The Basics

The SRF approach is based on an indexed set definition, relations between these indexed sets' members, and fields of values over the sets' members. Each of these is discussed in more detail.

A. Indexed Sets

An indexed set is defined to be a set whose members are successive integers, starting at 0 and counting up to n-1 for a set with n members. Minimally the sets include a unique set ID, a set name, and a count of set members. For dynamic set membership members may be deleted, and the index is left out to prevent having to propagate a set member renumbering to every relation and field which reference the set as a domain or range set. In these cases the set must also include an enumeration of the set members present, generally as a hashed set of values.

An indexed set has the following properties:

  1. All the members are the successive integers from 0 to n-1 for an n-member set.
  2. The set members have the defined integer ordering.
  3. The set members are distinct non-negative integers.
  4. The set is finite, though there is some evidence it could be extended in some cases to countably infinite sets.
These properties of the indexed set differ from the traditionally defined set, yet enable the description and implementation of the various set operations. The properties also allow higher level abstract data models to be built on the basic indexed set structure.

To associate an indexed set with such properties as colors, descriptive names, or other attribute values a field is defined over the members of the set. A field has one "value" for each member of the set, and these values may be any form, but of a single homogeneous form per field. A set of three members could have fields of values including colors (red, green, blue), 2-dimensional coordinates ((0.00, 0.00), (0.00, 1.00), (0.50, 0.50)) and probabilistic weights (0.23975, 0.71, 0.0525).

Additionally, the values of the fields can be user-defined structures which allow the references to other sets (for sets including sets), sets of fields, and sets of relation tuples, all of which have been included in various versions of the SRF library.

To map any finite traditional set to an indexed set's members is a trivial task, and the traditional set's member designations, such as names or IDs, may be easily mapped into a field over the indexed set's members. To map such as set to an indexed set the procedure is:

  1. Remove a member from the traditional set,
  2. Map it to the first available index, storing its descriptive attribute as a field value,
  3. Remove another member and map it to the next index, and
  4. Repeat until the traditional set is empty.

Extensions have been made to at least one implementation of the SRF library that allow an indexed set to include time "instances" to represent multiple time instances of time-varying set membership in computer memory at one time, yet easily maintain a distinction between the various sets, their membership at each time instance, and the corresponding time-varying relation tuples and field values for each corresponding set member.

B. Relations

A relation between the members of a domain set and a range set is a collection of tuples which each map a single domain set member to a single range set member. A relation may represent a generalized relation or a special form of relation known as a subset relation. Minimally a relation consists of a unique relation ID, a relation name, a domain set, a range set, and a collection of tuples.

A generalized relation may have any number of mappings from each member of the domain set to range set members. The domain set and range set may be any two sets.

A subset relation relates the members of a subset (a set) each to a unique member of its parent set.

To allow the use of fields over the tuples in a relation the relation set was created. Its members represent the tuples in a relation. Such a representation allows easy representation of a field of attributes over the links of a graph, and the representation of the links as both a relation and a set.

Efficient implementation forms have been devised for representing the relation tuples as a sparse matrix. Also, for time-varying sets an implementation includes representing time-varying relations via time instances of relations' tuples.

C. Fields

A field is a collection of values over the members of its domain set. All the values of a field are of a single type (numeric, string, etc.) and form (scalar, vector, etc.). There is one value for each set member, though some values may be "undefined" in some field implementations. Minimally a field always includes a unique field ID, a field name, the domain set, and the field values.

The values in a field may be of types such as enumerated variables, strings, scalar numbers, or even user-defined structures. These types may then be combined into complex numbers, vectors, matrices, tensors, vectors of strings, or even arrays of user-defined structures. In a given field the values MUST be of a homogeneuous type and form to allow the definition of simple field operations over the set members in a consistent and useful manner.

Fields may include metadata about the data values they contain. In some implementations this metadata includes the stored data type, units for scalar numeric values, vector forms used (such as Cartesion versus spherical versus cylindrical), field derivation audit data, and derived field calculation data to allow "on-demand" generation of the field values.

For numeric fields it is useful include metadata that describes the values of the field in an efficient way, including a field name, field data type descriptions, numeric field value units, and for vectors, the vector form used. By using such metadata it is possible to include automatic units conversion for some field operations and self-describing data output formatting so the values can easily be read into other applications with a reasonably full context for units and vector form conversion at load time.

The time instances have also been implemented over fields, storing field value instances for each time instance. This allows time-based extrapolation and integration functions to be easily implemented over multiple time instances of field values. Retaining multiple field value instances allows multi-time-step calculations to use in-memory historical values.

III. Operations

There are several classes of operations that are possible via the three basic SRF constructs:

Each is discussed further.

A. Relation Operations

Each relation (Ra,b, Rb,c) is a mapping from the members of a domain set (Sa for Ra,b, or Sb for Rb,c) to the members of a range set (Sb for Ra,b or Sc for Rb,c), and is represented by a collection of tuples. Some operations on those relations may be considered, such as:

  1. Inversion, the reversal of all the contained tuples:

       Rb,a = Ra,b-1

  2. Composition, the consecutive mappings of two relations through the members of a shared intermediate set, giving a relation based on a relational routing via the intermediate set Sb:

       Ra,(b),c = Ra,b * Rb,c

  3. Augmentation, the mappings of two relations through different "paths" (p1 versus p2), usually but not always through different intermediate sets, giving the resulting tuples from both relational routings as a single relation:

       Ra,(p1 and p2),b = Ra,(p1),b + Ra,(p2),b

    This is also known as a range set union (see below).
  4. Minimal decomposition, the decomposition of a relation, Ra,b, into a general relation, Ra,c the minimal subset of nodes from its range set, Sc, and the subset relation, Rc,a:

       (Sc , Rc,a , Ra,c) = MinDecom(Ra,b)

Each is discussed further.

1. Inversion

This may be viewed as just reversing the pairings of domain set, range set member references, and swapping the domain and range sets for the relation. The result is a relation going the opposite direction. For subset relations this implies that some parent set members are not mapped to any members of the subset.

An example is:

Given:

   Ra,b: Sa ---> Sb

   Ra,b = {(0, 5) (1, 1) (2, 3)}

and wanting:

   Rb,a: Sb ---> Sa

to be obtained by inverting:

   Rb,a = Ra,b-1

which is

   Rb,a = {(5, 0) (1, 1) (3, 2)}

or (reordering the tuples)

   Rb,a = {(1, 1) (3, 2) (5, 0)}

2. Composition

This may be viewed as merging two mappings via the members of a shared intermediate set. When relations are composed they are mappings from the domain set of the first to the range set of the second, but modified by the intermediate relationships with the shared set.

An example is:

Given:

   Ra,b: Sa ---> Sb

   Ra,b = {(0, 0) (1, 2) (2, 1)}

   Rb,c: Sb ---> Sc

   Rb,c = {(0, 1) (1, 3) (2, 0)}

and wanting:

   Ra,(b),c: Sa ---> Sc (via Sb)

to be obtained by composing:

   Ra,(b),c = Ra,b * Rb,c

or

   Ra,(b),c = {(0, 0) (1, 2) (2, 1)} * {(0, 1) (1, 3) (2, 0)}

and, combined

   Ra,(b),c = {(0, 1) (1, 0) (2, 3)}

3. Augmentation (or Range Set Union)

This may be viewed as creating the union of the set of tuples in one relation with those of another, getting a single relation which includes all the paths defined in either of the original relations.

An example is:

Given:

   Ra,(p1),b: Sa ---> Sb

   Ra,(p1),b = {(0, 0) (1, 2) (2, 1)}

   Ra,(p2),b: Sa ---> Sb

   Ra,(p2),b = {(0, 1) (1, 3) (2, 0)}

and wanting:

   Ra,(p1 and p2),b: Sa ---> Sb (via either path p1 or path p2)

to be obtained by augmenting:

   Ra,(p1 and p2),b = Ra,(p1),b + Ra,(p2),b

which is

   Ra,(p1 and p2),b = {(0, 0) (1, 2) (2, 1)} + {(0, 1) (1, 3) (2, 0)}

   Ra,(p1 and p2),b = {(0, 0) (0, 1) (1, 2) (1, 3) (2, 0) (2, 1)}

4. Minimal Decomposition

This may be viewed as having a single mapping from a domain set to a range set, but generating an intermediate subset of the range set and generating a mapping from the orginal domain set to the new subset. This results in three products, the new subset, the mapping to the new subset, and the subset relation from the new subset to the range set.

An example is:

Given:

   Ra,c: Sa ---> Sc

   Ra,c = {(0, 1) (1, 3) (1, 4) (3, 4) (4, 5)}

and wanting:

   Ra,b: Sa ---> Sb

and

   Rb,c: Sb ---> Sc

where Sb is a minimal subset of Sc to be obtained by decomposing via:

   Ra,c = Ra,b * Rb,c

The domain set members referenced are {0, 1, 3, 4} and the range set members referenced are {1, 3, 4, 5}.

This allows the definition of the new minimal subset with 4 members and the subset relation

   Rb,c = {(0, 1) (1, 3) (2, 4) (3, 5)}

giving the useful derivation

   Rb,c-1 = {(1, 0) (3, 1) (4, 2) (5, 3)}

and allows the redirection of the original Ra,c as

   Ra,b = Ra,c * Rb,c-1

   Ra,b = {(0, 1) (1, 3) (1, 4) (3, 4) (4, 5)} * {(1, 0) (3, 1) (4, 2) (5, 3)}

   Ra,b = {(0, 0) (1, 1) (1, 2) (3, 2) (4, 3)}

Other operations may be useful, but most can be resolved down to combinations of these four with judiciously selected sets and subsets.

B. Set Operations

Sets are normally associated with basic set operations, including:

  1. Intersection, getting the common members of two sets;
  2. Union, getting the aggregate of all members of two sets; and
  3. Difference, or the members of the first set that are NOT in the second set.
To perform set operations one must assume a "universal" set which includes all the members of all the operand sets, making the operand sets and the result set all subsets of this "universal set". For indexed sets this means that the sets being operated upon must be subsets and share a common parent set, implying they must each be described by subset relations with the same range set.

The operand sets are subsets of the parent set. The subset relations indicate to which members of the parent set each subset member refers. Thus, all the set operations may be done in the classical manner on the collection of range set references in the operands' subset relations. The result becomes the range set references for the result subset defined by the set operation being performed.

An example of subsets with a shared parent set is as follows:

with Sa and Sb having the subset relations Ra,p and Rb,p where

   Ra,p: Sa ---> Sp

   Ra,p = {(0, 0) (1, 2) (2, 4)}

projecting the subset Sa to the parent set Sp members {0, 2, 4}, as the collection of range set references in Ra,p, and

   Rb,p: Sb ---> Sp

   Rb,p = {(0, 2) (1, 1) (2, 3)}

projects the subset Sb to the parent set Sp members {2, 1, 3}, as the collection of range set references in Rb,p.

Using these definitions each of the set operations using the subsets Sa and Sb are described below.

1. Intersection

The range set members referenced in each subset relation are {0, 2, 4} and {2, 1, 3}, and the traditional intersection of these is {2}. This defines the collection of range set references for RIntersect(a,b). This has only one member, so the new result subset has only one member and its subset relation to the parent set is given by

   RIntersect(a,b) = {(0, 2)}

mapping the one member of the subset SIntersect(a,b) to the one referenced parent set member.

2. Union

The traditional union of {0, 2, 4} and {2, 1, 3}, which is {0, 1, 2, 3, 4}, is the collection of range set references for RUnion(a, b). So the new result subset has 5 members, and these define the subset relation tuples

   RUnion(a, b) = {(0, 0) (1, 1) (2, 2) (3, 3) (4, 4)}

which are the associations of the union set SUnion(a, b) members' indices to the indices of each corresponding parent set member. The ordering of the subset members is arbitrary at the time the subset relation is being defined, although referencing the parent set members in increasing order of index value is often convenient for clarity of representation.

3. Difference

The traditional difference of Sa, {0, 2, 4} minus Sb, {2, 1, 3}, is {0, 4}. This is the collection of range set references for RDiff(a, b), giving

   RDiff(a, b) = {(0, 0) (1, 4)}

and the traditional difference of Sb, {2, 1, 3} minus Sa, {0, 2, 4}, which is {1, 3}, is the collection of range set references for RDiff(b, a), giving

   RDiff(b, a) = {(0, 1) (1, 3)}

C. Field Value Operations

Fields are separate values, one for each member of the domain set for the field. Based on the value types, any function that can be performed on a single value of that type may be performed on every entry in the field, thus it becomes a candidate for a field value operation. Thus field values can have several categories of operations defined over their values, such as the following, each with examples:

  1. Unary operations --

       Fy: a = -Fx: a

  2. Binary operations --

       Fy: a = Fx1: a + Fx2: a

  3. Conglomerate operations which combine multiple fields into one --

       Fy: a = Σi=0,N-1 Fx[i]: a

    where

       N is the number of fields over Sa to be included in the sum.

  4. Aggregate operations which combine multiple values in a single field's values into a single value --

       <value> = Σj=0,n-1 Fx: a[j]

    where

       <value> is a value associated with the field Fx: a reflecting the sum of its member values, and

       n is the number of members in the domain set Sa.

  5. Transformation operations which transform a field of values over one set into a field of values over a different set

       Fy: b = TRANS(Fx: a)

    whereby the argument field Fx: a values are transformed via a global transform into a collection of values (Fy: b) over the members of another set. An even more complex example is applying a Fourier Transform over the members of a set (Sb) to a field of values (Fvalues: b), getting a field of complex values at frequencies (Ff-values: freq) over the set. For transform over a set of "frequency" members (Sfreq), with the frequencies for each member in another field (Ffrequencies: freq) over the set Sfreq. For transform operations multiple fields and possibly relations may be included in the transformation to allow structure information (such as mesh element "neighbors") to be applied in the calculations and member value selections.

Unary operations may include a variety of types based on the data forms involved, such as

Binary operations can be defined over fields similarly, by passing value pairs for a single domain set member into the function and storing the result into the result field entry for that same domain set member. An expression such as

   Fy: a = Fx1: a + Fx2: a

causes the repeated performance of the operation

   y[i] = x1[i] + x2[i]

as i varies over the indices of the members of the domain set Sa.

Similarly field operations may be defined and implemented over more complex value types and combinations of fields.

D. Field Extraction/Insertion Operations

When fields are NOT over the same domain set it is sometimes necessary to extract the values of one field over its domain set to become a field over the set which is the domain set of the other field. Also, if field values are calculated in a parallel environment it is often necessary to merge the resulting field values over subsets (the partition subsets) into a global field of values over a more global domain set (the partitioned domain set).

When a field (Fx: p) over a set (Sp) must be extracted to a field (Fx: a) over a subset (Sa) this can be done via the composition of a field with a relation derived from the subset relation (Ra,p) as follows:

   Fx: a = Ra,p * Fx: p

When a field (Fx: a) over a subset (Sa) must be projected into an existing field (Fx: p) over the parent set (Sp) with the subset specified by the subset relation (Ra,p) then the field value members must be mapped to the parent set members so the values may replace the proper member's values in the field over the parent set. This can be indicated via

   Fx: p <-- Fx: a * Ra,p

When attempting dynamic repartitioning of extremely large models the repartitioning cannot be done easily in a single processor. Ideally one would like to divide up the activities of the repartitioning among the multiple processors already available. To do this requires two collections of subsets (Sp[i] and Sp[j]) representing the current partitioning (p[i]) and the new partitioning (p[j]) of the base set (Sp). To repartition the data in each field over Sp the subsets given by the intersections of the subsets in each partitioning must be obtained (Rp[i][j],p) and then the extraction relations (Rp[i][j],p[i]) and the insertion relations (Rp[i][j],p[j]) can be created. These relations may be composed for each Sp[i][j] giving the extraction/insertion relation Rp[i],p[j].

Since the old partition on each processor (p[i]) consists of the the fields over the subset (Sp[i]) on that processor then the repartitioning may be done via

   Fx: p[i][j] = Fx: p[i] * Rp[i],p[i][j]

for each new subset (Sp[i][j]) to which that subset maps, and the fields (Fx: p[i][j]) over that subset may be written out. Processor j in the new partitioning can then read in only those fields (Fx: p[i][j]) over the subsets (Sp[i][j]) which are common to the jth processor and merge them into the new partition subset's fields using

   Fx: p[j] <-- Fx: p[i][j] * Rp[i][j],p[j]

IV. Implications

The use of the sets, relations, and fields approach presented above has enabled a simplified notation for expressing the partitioning of complex data structures for dynamic reconfiguring in parallel processors.

An example:

   Se: A set of mesh model element polyhedral atmosphere volumes

   Sn: A set of mesh model element vertex node points

   Re,n: The association of the element volume polyhedra to the nodes which are their vertices

   Fa: n: The field of coordinate vectors for each mesh node

   Fb: n: The field of temperatures at each node

   Fc: e: The field of element atmosphere pressures for each elemental volume

   Fd: e: The field of element atmosphere volumes for each elemental volume

   Fe: e: The field of element CO2 mass in each elemental volume

   Ff: e: The field of element N2 mass in each elemental volume

   Fg: e: The field of element O2 mass in each elemental volume

   Fh: e: The field of element atmosphere motion flow vectors

   Fi: e: The field of element atmosphere momentum

   Fj: e: The field of element atmosphere charge

   Fk: e: The field of element gaseous water mass

   Fl: e: The field of element liquid water mass

   Fm: e: The field of element solid water mass

   Spartitions: A set of parallel partitions

   Rpartitions,e: The partitioning of the elements for the parallel processes.

Using this description it becomes easy to describe the projection of the partitioning to the nodes, determine which nodes must be shared among processes, and determine which field data must be extracted for each processor. Even more important is that it is now logically tractable to design the algorithms to perform distributed dynamic repartitioning of the model data.

With these definitions questions may be answered easily, such as:

  1. Given a subset of the elements that are the atmosphere volumes at the earth's surface (Ses, and Res,e) then how are the O2 masses over just those volumes obtained?

       Fg: es = Fg: e * Res,e

  2. What are the total water masses in each atmosphere volume?

       Fw: e = Fk: e + Fl: e + Fm: e

Also, substantially more complex operations can be defined and described with similar ease.

Extremely large network and graph models may be easily partitioned and dynamically repartitioned as well as their operations can be defined for implementation on large parallel architectures using the conceptually simple notation.

V. Summary

Using the basic constructs of set, relations, and fields and interoperations among them simple concepts and a simple notation have been defined which allow large, complex software algorithms to be described and implemented.

The versatility of the approach allows it to be easily encapsulated by graph, network, and mesh application-specific wrappers, but still allows the sharing of data between the various data models. This low-level commonality that includes only a few simple constructs provides an infrastructure that has been applied in highly parallel mesh applications, graph applications, and network applications, and in complex protein folding applications.

The approach is simple, easy to understand, and easy to implement. It also makes it easy to implement and manipulate previously intractable models in small comprehensible steps.


References:

Sets, Relations, and Fields (SRF): Application to Social Networks , July 23, 2007