SRF Algebra: the Foundations

SRF Algebra: The Foundations

2010-11-16

James R. Holten III, PhD
Institute for Complex Additive Systems Analysis (ICASA)
New Mexico Institute of Mining and Technology (NMT)
Socorro, NM, 87801

Abstract

This paper describes a collection of low-level abstractions that allow informatics data models to be expressed in a common and simple form. The abstractions enabled the creation of notation for expressing the many complex interactions found in the wide variety of informatics data models. Some of the basic capabilities this notation enables include the representation of "object" collections' metadata descriptions as a single complex systems graphical form. The abstractions presented here have been used to define mesh models, social networks, and generic graphs. These abstractions have enabled easy description of the logically complex data forms, and those descriptions lay the foundation for far more complex manipulation via the metadata abstractions they provide. The concepts and definitions integral to the SRF Algebra have been described in earlier documents, but this paper resolves previous ambiguities and delivers a firm foundational set of definitions for the SRF Algebra concepts, objects, and operations.

I. Overview

The SRF (Sets, Relations, and Fields) Algebra was devised as an abstraction for basic data representation and manipulation. The abstraction enables a common and simple description of data object collections. The descriptions, in turn, enable expressing the data object interrelationships, data structure algorithm operations on those data objects, and data structure related analysis steps. The basic representation of data relationships allows concepts and data objects from one application domain to be easily shared by developers and analysts in other application domains. Models developed using the abstractions of the SRF Algebra have been implemented, and those implementations demonstrate how data and analysis tools can be built to be shared among diverse applications with very little effort. The abstractions also allow merging the data structures and data relationships of multiple diverse models into a single application.

A. Background

In the various disciplines employing informatics there are as many different representations of objects as there are application developers. This diversity of implementations is a major obstacle to using data from one model in another, or applying algorithms written for one model to another. This diversity of object representations also makes it difficult to generate a hybrid model that includes object definitions from two or more implementations together in one aggregate implementation.

Often model implementations define collections of "object instances" as arrays of class instances (as in "C++", "Java", or other object-oriented languages) or arrays of structures (as in "C"). The arrays of instances each include all the applicable object property variables for the "object" class or structure. With older languages, such as many versions of FORTRAN, without structured data types these instances were stored as an array for each property, and a common index implicitly related the attributes of an entity across the multiple arrays of properties to associate the data fields corresponding to that "object" instance. The implementations of the collections included a companion set of routines or "object methods" to access and manipulate these object instances and their contained property or attribute values. Even when the abstract manipulations were not dependent on the data types of the objects, classes, or structs, the routines often still had to be customized to access the data fields via the specific data type.

For these model implementations many common operations are still often implemented in a one-off fashion, one version for each of the various object types, and for each data field for each object type. The minor variations allow object-oriented method overloading for the specific object types and internal object field names and/or data types. The replicated methods include both operations on the contents of the various object types and operations on collections of the various object types. The results have always been a heterogeneous mish-mash of custom code for each combination.

The programmer often wants to take advantage of common functionality via overloaded functions, C++ templates, Java generics, or C method pointer redirection tables. However, these require each model implementation to be designed for the commonality of all the objects that are to use the implementation, compensate for the objects' peculiarities within the method, or have an overloaded program interface and method for each object type that may eventually use it. If all the needed variations are not considered in the original designs then the compensating code variations must be updated each time a new derived object type is to be added to the aggregate model. Similarly, when existing models are to share data then matching data translators must often be written to extract and remap the data to make compatible any otherwise incompatible implementation aspects.

Occasionally "standards" emerge in application disciplines. These standards are designed to allow data and analysis tools which follow these standards to be shared among all applications of that discipline. When the standards are followed the data and tools usually can be shared without dramatic translation code development being needed. Often codes were written before the standards became available, models are imported from another discipline where the standards are not known, or the standards do not support needed new types of data objects.

These circumstances usually mean that even though standards exist, the generated models often are not compatible among applications or with pre-built tools without a substantial investment of analyst and code developer effort and time.

A key ingredient that is often missing is how the objects in different object collections may be related to one another. Occasionally this is defined in a standard for a given model type, but the defined conventions vary greatly from language to language and from model to model. In C and C++ a named property which is a pointer to an instance of an object of the same or a different type is frequently used. In many old codes using old FORTRAN the references to other objects are still maintained as indices in separate arrays for each relationship type and object type pair to be related. The instances that are related are indicated by their shared index into the arrays for the object instance's properties to be referenced. In databases a key field is stored in one table, and a field which references this key, by maintaining a copy of its value, allows the two associated objects in separate tables to be related.

This lack of commonality among languages and model implementations presents the greatest challenge to scaling models to extremely large sizes, generalizing the partitioning of data elements, and representing generalized relationships between members of the same or different object sets.

It is difficult for application developers to understand how to handle the commonality in models without shared abstractions. Many application disciplines need to share data, be able to scale problems to large sizes, partition the data to run on parallel processors, and combine multiple models to run as a single program. Without common abstractions it is awkward to even describe what manipulations are needed and how common implementations or tools could possibly be shared.

B. The Approach

The SRF Algebra is based on abstract sets of objects, fields of properties or attributes for those objects, and relations describing the associations between pairs of members object sets [1]. The sets in SRF Algebra are commonly known as "indexed sets" of homogeneous members rather than the typical generic set of arbitrary membership [2,3] and therefore the members of the set are similar, yet distinct and may be refrenced uniquely by their index in the set. All properties or attributes of the set members are represented in fields of values over the set members, much like columns in a table of set member rows. The members of the sets may be associated with one another via explict relations, made up of collections of tuples, with each tuple containing one index of a member from one set, and a second index of a member from the same or another set.

A combination of a set and its associated fields are equivalent to a database table schema, a C structure definition, or the data fields of a class object. Each set of objects is equivalent to a table, an array of structure instances, or an array of class instances. Each set member is equivalent to a row in the table, or an instance of the set's structure or class. Each field over the set represents a column in the table or a single variable field in a collection of structure or object instances. The relations are an explicit representation of the pairings between member pairs in sets. In databases these relations replace the value-based matches between values in a single column, value matches in multiple columns, or value matches between columns in different tables, often facilitated via specified "key" columns, "foreign key" columns, and indices over columns [4].

In the various commonly used solid model representations, such as structured mesh models and finite element models [5], much of the information about the object associations is implicit (array indices, implicit orderings of face and node references, etc.). Since the associations are not explicitly represented the context of what was meant by the programmer or model developer is not always obvious from the data organization or from the code unless it is in comments or external documentation. Mistakes are easily made in attempting to partition or otherwise update the models from simple oversights when using the conventional object representations, regardless of the programming language used. By using the SRF Algebra abstractions only a few basic SRF object associations are implicit, and these are known by any developer or analyst familiar with the basic SRF Algebra concepts. All the model data associations are explicitly described using SRF relations. The SRF Algebra abstraction is a portable metadata structure in which many diverse model forms, including generalized and finite element mesh model [5, 6], graph and network models [7,8], and particle models may be easily represented, and all data associations may easily be expressed in the explicit relations.

II. The SRF Algebra

The SRF Algebra objects were chosen for their low level of abstraction and their versatility in representing and manipulating sets of model objects through a common low-level abstraction. It also relies on the highest level at which those shared abstractions can easily be manipulated in ways needed to extend for the application disciplines.

A. The SRF Algebra Objects

The basic object abstractions of the SRF Algebra are the set (S), the relation (R), and the field (F). These three, combined with the special set forms "subset" and "relation set" allow the easy description and manipulation of complex object collections and their relationships. Other more complex variations such as "set of sets", "set of fields", and "set of relations" are also easily represented and manipulated.

1. The Set

A set has a name and a count (n) of contained members. The set members themselves are an abstraction and have no properties of their own. It is convenient, though, to assume the members are unique, ordered, and uniquely identified by their index within the set, conveniently numbered 0 to n-1 for n set members.

A set is identified by S_a, where a represents a unique reference to the particular set (its name), such as in the following modeling examples:

A collection of generalized mesh objects may include the sets
1. S_zones (3-D volume elements),
2. S_faces (2-D plane elements),
3. S_edges (1-D line elements),
4. S_nodes (0-D points), or
5. the subset S_{surface zones} (volume elements at the surface of the model).
A social network model's objects may include the sets
1. S_authors,
2. S_employers,
3. S_{authored documents}, and
4. S_{subject areas}.
The nodes and links in a graph may include the sets
1. S_nodes and
2. S_links.

The set is the fundamental abstraction on which SRF is based. All aspects of what the members of the set are beyond their existence and ordering in the set are determined by their properties, also known as attributes, and are stored as values in "fields" as defined below.

A function that is useful for sets is:

Size(S_a)

Although a set has N members it is always representable as O(1) space complexity.

2. The Relation

A relation has a domain set, a range set, and a collection of tuples made up of set member reference pairs, such as (from index, to index). Each tuple maps from a member of the relation's domain set to a member of the relation's range set. The relation is the collection of 0 or more tuples. The domain set and the range set may be the same set.

A relation may be identified by R_a,b where a and b represent the names of the domain set and the range set respectively. However, since relations between the same domain and range sets may come from different sources and be very dissimilar, the possible ambiguities may be resolved by referencing the relations as R_a,{p},b, where the {p} represents a "derivation description". If the relation reference is not ambiguous the derivation description may be left out.

Examples of unambiguous relations for the example sets given above are:

A collection of generalized mesh objects may include the basic relations
1. R_faces,zones relating the faces to the zones for which they are the boundaries,
2. R_edges,faces relating the edges to the faces for which they are the boundaries, and
3. R_nodes,edges relating the nodes to the edges for which they are the ends.
A social network model's objects may include the basic relations
1. R_{authors, employers} where each author works,
2. R_{authors, authored documents} what each author wrote, and
3. R_{authored documents, subject areas} what each document is about.
The nodes and links in a graph may include the basic relation
1. R_{nodes, nodes}.

Examples of possibly ambiguously named relations for the example sets given above are:

A collection of generalized mesh objects may include the relations
1. R_{zones,{shared faces},zones} relating the zones to the zones with which they share a face,
2. R_{zones,{shared edges},zones} relating the zones to the zones with which they share an edge,
3. R_{zones,{shared node},zones} relating the zones to the zones with which they share a node (vertex),
4. R_{faces,{shared edges},faces} relating the faces to the faces with which they share an edge,
5. R_{faces,{shared nodes},faces} relating the faces to the faces with which they share a node, and
6. R_{edges,{shared nodes},edges} relating the edges to the edges with which they share a node.
A social network model's objects may include the relations
1. R_{authors,{shared employers}, authors} with whom each author works,
2. R_{authors,{shared authored documents}, authors} with whom each author co-authored a document,
3. R_{authors,{shared authored documents}, subject areas} the subjects each author wrote on,
4. R_{authors,{shared authored documents,subject areas},
  authors} authors that wrote on the same subject areas.
The nodes and links in a graph may include the relation
1. R_{nodes,{nodes},nodes} the nodes two steps away from each node.

The domain of a relation is the subset of domain set members for which the relation has a tuple with a reference from that member. This may be any subset of the relation's domain set. The range of a relation is the subset of range set members for which the relation has a tuple with a reference to that member. This may be any subset of the relation's range set.

A relation covers its domain set if its domain is its entire domain set. A relation covers its range set if its range is its entire range set.

If each domain member is mapped to no more than one range member then the relation is considered unitary. If the relation is unitary and each mapped domain member is mapped to a different unique range member then the relation is considered 1:1.

Some functions are useful for relations:

S_a = DomainSet(R_a,b)
S_b = RangeSet(R_a,b)
{indices of referenced members of S_a} = Domain(R_a,b)
{indices of referenced members of S_b} = Range(R_a,b)
{number of referenced members of S_a} = DomainSize(R_a,b)
{number of referenced members of S_b} = RangeSize(R_a,b)
{counts of references for each member of S_b} = RefCounts(R_a,b)

In the lists of indices for the functions three and four above duplicate indices are suppressed, but the order of the first occurrance of each index is retained. For the fifth function above the list of counts includes a zero for each domain set member with no references.

Since a relation with L tuples can have one tuple for every pair of domain and range set members, giving

L = N²

The more general case is a relation with a fixed maximum small number of tuples per domain set member, giving

L = kN

therefore a relation may be representable in O(N²) space complexity, but more commonly is representable in O(N) space complexity.

3. The Field

A field has a name, a domain set, and a collection of entries, one for each member of its domain set. Except for their associations to other set members via relations, all the properties of the domain set members are represented in fields over the domain set. Each field contains a homogeneous collection of set member property or attibute values, much like a column in a database table, where each set member represents a single row in the table and each column's entries have a single data type and data interpretation for every value it contains. In practical use a field can have associated metadata that applies to all its contained entries, such as the data type, its units (i.e. meters, liters), the quantity type represented (i.e. length, volume) and a description of what the values represent (i.e. x-coordinate, atmospheric model volume). These entries may be of any data type: boolean values, numeric values, string values, enumerated values, vector values, or even more complex user defined constructs.

A field may be identified by F_f,a where f represents the name of the field (i.e. a representative name, such as "specific heat" or "stress tensor") and a represents the name of its domain set.

Examples of fields for the example sets given above are:

A collection of generalized mesh objects may include the fields
1. F_{Pressure,zones} attributing the pressures to the zones,
2. F_Volume,zones attributing the volumes to the zones,
3. F_{Temperature,zones} attributing the temperatures to the zones,
4. F_{Flow vector,zones} attributing the Flow vectors to the zones,
5. F_{Flow vector,faces} attributing the Flow vectors to the faces,
6. F_{Coordinates,nodes} attributing the coordinates to the nodes,
A social network model's objects may include the fields
1. F_Name,authors attributing the names to the authors,
2. F_Age,authors attributing the ages to the authors,
3. F_{Document type,authored documents} attributing the document types to the authored documents,
4. F_{Keyword,subject areas} attributing the keywords to the subject areas,
The nodes and links in a graph may include the fields
1. F_Name,nodes attributing names to the nodes.
2. F_Weight,nodes attributing weights to the nodes.
3. F_Weight,links attributing weights to the links.

Some functions are useful for fields:

member_value_type = ValueType(F_f,a)
member_value_size = ValueSize(F_f,a)
S_a = DomainSet(F_f,a)
{indices of members of S_a with non-null values} = Domain(F_f,a)
{indices of members of S_a with "True" or nonzero values} = True(F_{boolean or numeric field,a})
{indices of members of S_a with "False" or zero values} = False(F_{boolean or numeric field,a})
F_arbitrary,a = FieldOf(S_a, {a value or null for each member of the set})
{the value entry for each member of S_a} = Vector(F_arbitrary,a);

Where the first allows access to the domain set of the field, the next three allow getting the member list as an array of reference indices for conditions on the field entry values, and the last two allow the generation of a field from a vector of numbers, one for each domain set member and the extraction of the vector of numbers encapsulated by the field.

Fields are generally representable in O(N) space complexity, but with user-defined and matrix values that may depend on the sizes of other sets this can be of greater space complexity for specific applications.

4. The Subset

A subset is a set with an associated parent set, and a subset relation which indicates which members of the parent set are included in the subset. The subset relation is a 1:1 relation which covers the subset (its domain set). The subset's n sequentially indexed (from 0 to n-1) members are each mapped to one of the members of its parent set that the subset "includes", but the members of the subset are not necessarily in the order they occur in the parent set, as reordering can be a significant and useful aspect of how the subset may be used.

An example is:

Given:
   S_p is a parent set of 20 members.
   S_a is a subset of 5 members.
   R_a,p is the subset relation from S_a to S_p.
where its tuples are
   R_a,p = {(0,16), (1,3), (2,7), (3,0), (4,15)}

and they map the 5 members of the subset to their corresponding unique members of the parent set, at the same time defining a new member ordering.

The subset is merely a reference to the parent set's members, and therefore each field whose domain set is the parent set contains field values which also apply to the members in the subset. Similarly every relation that references the parent set as a domain set or range set may contain references to the members that are in the subset, and therefore may be transformed to relations that have the subset as their domain set or range set.

Examples of these parent set fields and relations over the subsets are as follows:

A collection of generalized mesh objects may have subsets of the mesh elements
1. Interior zones which still have the pressure, temperature, and other property fields associated with the parent set members,
2. Boundary zones which still have the associated links to their faces,
3. Similar subsets of other elements retaining refrences to the properties and associations of their members in the parent set.
A social network model's objects may have subsets of the primary parent sets
1. A subset of authors retain their names and associations with the documents they authored and their employers.
2. A subset of authored documents retain their document types and names as well as associations with the authors and the subject areas they cover.
The nodes and links in a graph may have subsets indicating subgraphs of interest
1. A subset of nodes that represent a categorization in a semantic graph, whose members still share the parent set member properties and are still related (linked) to other members in the parent set.
2. A subset of links that are only within a subset of nodes, but still retain their weights or other properties from the parent graph.

Some functions that are useful for generating subsets and their subset relation:

S_a1 = Subset(S_a, {member indices})
S_a2 = Subset(S_a1, (boolean-valued field))
S_a = Parent(S_a1)
S_a3 = GrandparentSubset(S_a2)

In the first the member indices of the parent members to be included are listed in the order they are to be referenced. Only values non-null and greater than or equal to zero are considered. In the second an expression that returns true or false for each parent set member index is used to select those member indices to include in the subset. In the third the parent set of the subset S_a1 is obtained. In the fourth a copy of the subset is created, but it is a subset of the next higher parent, giving access to the subset relation to the grandparent of the subset, which is the composition of two subset relations:

R_a2,a = R_a2,a1 * R_a1,a

which makes S_a3 represent the same members of S_a as are in S_a2, but as a direct descendant of S_a rather than a child of S_a1.

For subsets in general it can be easily shown that if

R_a,b = SubsetRel(S_a);

then

S_a = Subset(S_b, Range(R_a,b))

The projections of fields and relations through subset and other relations as discussed in GranparentSubset() above are discussed more fully in the later section on Relation Operations (Compose)..

A subset, being a set, is always of order O(1) space complexity, but its subset relation is always of order O(N) space complexity.

5. The Relation Set

A relation set is a set whose members correspond to the tuples in a given relation. This allows the relation tuples to be treated as members of a set, and thus have associated fields of values and relations mapping the member tuples to and from memebers of other sets.

An example is:

S_{R_a,b} is the relation set over tuples in R_a,b.

Examples of relation sets are as follows:

The link relations in a graph may have a relation set
1. to allow the graph links to have one or more fields of properties associated with each set member (graph link)
2. to be able to specify subsets of links.

Some functions that are useful for relation sets and their relations are:

S_{R_a,b} = RelSet(R_a,b)
R_a,b = SetRel(S_{R_a,b})
(from_index, to_index) = SetRelTuple(S_{R_a,b}, set_member_index)
set_member_index = RelSetMemberIndex(R_a,b, from_index, to_index)

The first allows retrieving the set for the relation, creating the set as needed, and the second allows retrieving the relation for the relation set, or null if the set is not a relation set.

It can be easily shown that a subset is always a relation set for its subset relation.

A relation set, being a set, is always of order O(1) space complexity.

B. The SRF Algebra Object Operations

The operations defined over the basic SRF Algebra objects enable complex relationships to be easily described, derived, and manipulated to extract other less obvious relationships. They can also describe the complex algorithmic manipulations needed to express needed model-based algorithms, such as those required for parallel partitioning of complex logical data structures.

The basic operations that may be performed on the SRF object types are discussed in the following groups:

Relation operations
Field operations
Set and subset operations

For examples in the discussions of the operations that follow a few sets and relations are defined:

S_a -- A set named a.
It contains 5 members (0, 1, 2, 3, 4).
S_b -- A set named b.
It contains 6 members (0, 1, 2, 3, 4, 5).
S_c -- A set named c.
It contains 4 members (0, 1, 2, 3).
S_a1 -- A subset named a1.
It contains the 2 members (2, 3) from its parent set.
S_a2 -- A subset named a2.
It contains the 3 members (0, 1, 2) from its parent set.
R_a1,a -- A subset relation for S_a1 to its parent S_a.

R_a1,a = {(0,2), (1,3)}
R_a2,a -- A subset relation for S_a2 to its parent S_a.

R_a2,a = {(0,0), (1,1), (2,2)}
R_a,a -- A relation over the members of S_a.

R_a,a = {(0,1), (1,2), (1,3), (2,0), (2,4), (3, 4)}
R_a,b -- A relation from the members of S_a to the members of S_b.

R_a,b = {(0,0), (1,1), (1,2), (2,4)}
R_b,c -- A relation from the members of S_b to the members of S_c.

R_b,c = {(1,1), (2,2), (4,3)}
R_{c,{path 1},a} -- A relation from the members of S_c to the members of S_a.

R_{c,{path 1},a} = {(0,0), (0,1), (0,3), (1,0),(1,1), (1,2), (1,4), (3,1), (3,3), (3,4)}
R_{c,{path 2},a} -- Another relation from the members of S_c to the members of S_a.

R_{c,{path 2},a} = {(0,0), (0,3), (0,4) (1,0),(1,1), (1,3), (2,0),(2,2), (3,1), (3,2), (3,3), (3,4)}

1. Relation Operations

A number of basic and useful operations are defined for relations. The basic relation operations are as follows:

Inverse -- reverse the relation by reversing the tuples in the relation.
Composition -- successively compose pairs of relations into a single relation by "following the links".
Relation Intersection -- keep only the tuples that are in both relations.
Relation Union -- keep the tuples that are in either relation.
Relation Difference -- Keep only the tuples that are in the first and NOT in the second relation.
Relation Extraction -- include only the tuples that are "from" the indicated domain set member indices.

a. Invert

Since each relation is a directed mapping from the domain set's members to the range set's members it is possible to define its inverse by reversing each tuple (mapping pair) in the relation, as for R_a1,a given above

R_a1,a^-1 = ({(0,2), (1,3)})^-1 = {(2,0), (3,1)} = R_a,a1

Each tuple is reversed, and in the inverse relation the domain set and range set are swapped. It is easy to show that the inverse of the inverse of a relation is an identity operation resulting in the original relation. Using R_a1,a^-1 from above

(R_a1,a^-1)^-1 = R_a,a1^-1 = ({(2,0), (3,1)})^-1 = {(0,2), (1,3)} = R_a1,a

Using the inverse, relations can be reversed to be applied as transformations in the opposite direction via the "compose" operation.

Implementing a relation inverse can be done in O(L) time and space complexity.

b. Compose

If one relation's range set is the same as a second relation's domain set then the compose of the two relations is defined, and may be created by following the mappings from the first relation's domain set members to its range set members, then follow through the second relation's mappings from that set member (in its domain set) to its corresponding range set members. For relations with a single mapping in each relation this is illustrated by

{(1,2)} * {(2, 3)} --> {(1,3)}

or, for a more complex relation pair, each with multiple tuples, such as using those defined above

R_a1,{a,a},a = R_a1,a * R_a,a

gives

R_a1,{a,a},a = {(0,2), (1,3)} * {(0,1), (1,2), (1,3), (2,0), (2,4), (3, 4)}

R_a1,{a,a},a = {(0,0), (0,4), (1,4)}

where the resulting relation's mappings are from the domain set of the first relation to the range set of the second relation.

Another example is:

Calculate

R_a,{b},c = R_a,b * R_b,c

R_a,{b},c = {(0,0), (1,1), (1,2), (2,4)} * {(1,1), (2,2), (4,3)}

giving

R_a,{b},c = ((1,1), (1,2), (2,3))

It is easy to see that the composition order determines the range set/domain set pairings between consecutive relations, so commutativity is not possible in the general case.

However, it is easily demonstrated that the compose operation does exhibit associativity, so a sequence of multiple relations may be composed via the successive composition of any consecutive pairs as long as the composition logical order is maintained.

(R_a,b * R_b,c) * R_c,d = R_a,b * (R_b,c * R_c,d)

It is also easily shown that compose and inverse maintain the distributive property, except the arguments for the inverted compose must be reversed, as in

R_c,{b},a = (R_a,b * R_b,c)^-1

is the same as

R_c,{b},a = R_b,c^-1 * R_a,b^-1

which is a common feature of non-commutative algebras, such as in matrix manipulations.

Implementing a relation compose can always be done in O(L²) time and space complexity, but for relations with fixed numbers of range set references, which is the more common situation, this can be done in O(L) time and space complexity.

c. Relations Intersect

Two or more relations which share a common domain set and a common range set may be combined using the relations intersection (^) by selecting only those tuples that are in all of the relations into a single result relation. The new relation is from the common domain set to the common range set.

For sample relations from above this is illustrated by

R_{c3,{c1 ^ c2},a} = R_{c,{path 1},a} ^ R_{c,{path 2},a}

giving

R_{c3,{c1 ^ c2},a} = {(0,0), (0,1), (0,3), (1,0),(1,1), (1,2), (1,4), (3,1), (3,3), (3,4)} ^
{(0,0), (0,3), (0,4) (1,0),(1,1), (1,3), (2,0),(2,2), (3,1), (3,2), (3,3), (3,4)}

R_{c3,{c1 ^ c2},a} = {(0,0), (0,3), (1,0),(1,1), (3,1), (3,3), (3,4)}

The resulting relation's mappings are from the common domain set of the relations to the common range set of the relations. This can easily be shown to be associative and commutative, as well as distributive over the inverse and composition operations above.

Implementing a relation intersection can always be done in O(L²) time and space complexity, but for the more common case of a limited number of references per domain set member this becomes O(L) time and space complexity.

d. Relations Union

Two or more relations which share a common domain set and a common range set may be combined using the r-union by combining all the tuples in the relations into a single relation. Duplicates are suppressed, and the new relation is from the common domain set to the common range set.

For sample relations from above this is illustrated by

R_{c3,{c1 + c2},a} = R_{c,{path 1},a} + R_{c,{path 2},a}

giving

R_{c3,{c1 + c2},a} = {(0,0), (0,1), (0,3), (1,0),(1,1), (1,2), (1,4), (3,1), (3,3), (3,4)} +
{(0,0), (0,3), (0,4) (1,0),(1,1), (1,3), (2,0),(2,2), (3,1), (3,2), (3,3), (3,4)}

R_{c3,{c1 + c2},a} = {(0,0), (0,1), (0,3), (0,4), (1,0),(1,1), (1,2), (1,3), (1,4), (2,0),(2,2), (3,1), (3,2), (3,3), (3,4)}

Implementing a relation union can always be done in O(L²) time and space complexity, but for the more common case of a limited number of references per domain set member this becomes O(L) time and space complexity.

e. Relations Difference

Two or more relations which share a common domain set and a common range set may be combined using the r-difference by retaining all the tuples in the first relation that are not in the second relation into a single result relation. The new relation is from the common domain set to the common range set.

For sample relations from above this is illustrated by

R_{c3,{c1 - c2},a} = R_{c,{path 1},a} - R_{c,{path 2},a}

giving

R_{c3,{c1 - c2},a} = {(0,0), (0,1), (0,3), (1,0),(1,1), (1,2), (1,4), (3,1), (3,3), (3,4)} -
{(0,0), (0,3), (0,4) (1,0),(1,1), (1,3), (2,0),(2,2), (3,1), (3,2), (3,3), (3,4)}

R_{c3,{c1 - c2},a} = {(0,1), (1,2), (1,4)}

The resulting relation's mappings are from the common domain set of the relations to the common range set of the relations. This operation is not associative or commutative, but it is distributive over both the inverse and composition operations above.

Implementing a relation difference can always be done in O(L²) time and space complexity, but for the more common case of a limited number of references per domain set member this becomes O(L) time and space complexity.

f. Relation Extraction

A single relation may have many references from each domain member to each of many range members. A simple result relation from one or a subset of the domain members can be generated by extracting the tuples that reference only those specific domain set members. Similarly a simple result relation from the domain set members to only one or a specific subset of the range set members can be generated by extracting the tuples that reference only those specific range set members.

For sample relations from above this is illustrated by

R_{c,{c[1,2],{path 2},a},a} = R_{c[1,2],{path 2},a}

R_{c,{c[1,2],{path 2},a},a} = Extract({1,2}, R_{c,{path 2},a})

giving

R_{c,{c[1,2],{path 2},a},a} = (1,0),(1,1), (1,3), (2,0),(2,2)}

and

R_{c,{c,{path 2},a[0,1,3]},a} = R_{c,{path 2},a[0,1,3]}

R_{c,{c,{path 2},a[0,1,3]},a} = Extract(R_{c,{path 2},a}, {0,1,3})

giving

R_{c,{c,{path 2},a[0,1,3]},a} = (0,0),(0,3), (1,0),(1,1), (1,3), (2,0), (3,1),(3,3)}

The resulting relation's mappings are from the domain set of the original relation to the range set of the original relation.

Implementing a relation extraction can always be done in O(L) time and space complexity.

2. Field Operations

Fields have the following basic types of operations:

collective operations -- those that operate on the values of one or more fields over a single domain set, giving another field over the same domain set as in

F_c,d = F_a,d + F_b,d
aggregate operations -- those that generate a single aggregate value from all the values in one or more fields as in

mean = Mean(F_a,d)
or
corr_coef = Corrolate(F_a,d, F_b,d)

Each of these forms is discussed further.

a. Collective Field Operations

A collective field operation is an operation on one or more fields that results in creating another field. All the arguments are fields over the same domain set as the result, except for constant arguments.

The collective operations include any unary, binary, or n-ary operation that can be applied to the field values, using the same domain set member index for each of the argument fields' values, giving a value for the result field for that same domain set member. Any member-wise operation on set attributes that returns a field of values over that set is a collective operation.

These operations include but are not limited to the basic arithmetic operations of add, subtract, multiply, and divide, as well as the numeric value trigonometric, logarithmic, and exponential numerical operations. Fields with string values can include collective operations such as concatenation of corresponding values or more complex string operations. Any other unary, binary, or n-ary functions that may be applied to the corresponding field values for each set member index is a candidate collective operations, including user-defined functions. The function argument field or constant data types, argument counts, and result data types must be compatible with the function definition.

Some examples of collective operations on fields follow:

F_sum,a = F_arg1,a + F_arg2,a + F_arg3,a

means

sum[i] = arg1[i] + arg2[i] + arg3[i]

for all i, a member index from a.
F_difference,a = F_arg1,a - F_arg2,a
F_product,a = F_arg1,a * F_arg2,a * F_arg3,a
F_quotient,a = F_arg1,a / F_arg2,a
F_modulus,a = F_arg1,a % F_arg2,a
F_{concatenation,a} = F_string1,a + F_string2,a + F_string3,a
F_{sin_of_angle,a} = sin(F_angle,a)

which means

r[i] = sin(angle[i])

for all i, a member index from a.

One useful convention is the use of comparator operations on fields to generate new boolean-valued fields. Some are defined as follows:

True if equal --
F_boolean,a = (F_x,a == value)
True if not equal --
F_boolean,a = (F_x,a != value)
True if less than --
F_boolean,a = (F_ordered,a < value)
True if less or equal than --
F_boolean,a = (F_ordered,a <= value)
True if greater than --
F_boolean,a = (F_ordered,a > value)
True if greater or equal than --
F_boolean,a = (F_ordered,a >= value)

Similarly boolean field operations are defined as follows:

NOT ("~") --
F_boolean,a = (~F_{boolean_1,a})
AND ("&") --
F_boolean,a = (F_{boolean_1,a} & F_{boolean_2,a})
OR ("|") --
F_boolean,a = (F_{boolean_1,a} | F_{boolean_2,a})
XOR ("^") --
F_boolean,a = (F_{boolean_1,a} ^ F_{boolean_2,a})

All collective operations are member-by-member operations on the corresponding member values from the argument fields, and each member operation gives a corresponding member value for the result field. Unary, binary, and substantially more complex operations can be defined over fields in this way. Collective field operations may be defined for any field data value types to which the operations may apply, including numeric, string, vector, matrix, and user defined operations for user defined value types.

Implementing a composite field operation can always be done in O(N) time and space complexity.

b. Aggregate Field Operations

The aggregate functions combine all the values in one or more fields to generate a single value, such as in statistical mean, variance, or sum. This can use values from multiple fields, such as a correlation coefficient between two fields' value sequences.

Examples of some aggregate functions are as follows:

NumTrue(F_boolean,a) is the number of true values stored in F_boolean,a.
NumFalse(F_boolean,a) is the number of false values stored in F_boolean,a.
NumNull(F_x,a) is the number of null values stored in F_x,a.
Count(F_x,a) is the count of non-null values stored in F_x,a.
Mean(F_{temperature,a}) is the mean of the non-null numeric values stored in F_{temperature,a}.
Variance(F_{temperature,a}) is the variance of the non-null numeric values stored in F_{temperature,a}.
Max(F_{temperature,a}) is the maximum of the non-null numeric values stored in F_{temperature,a}.
Min(F_{temperature,a}) is the minimum of the non-null numeric values stored in F_{temperature,a}.

In a similar manner many other aggregate functions may be defined over fields, and may be defined to be pertinent to only specific value types.

Implementing an aggregate field operation can always be done in O(N) time and space complexity.

3. Set Operations

In general set theory the set operations are defined relative to some "universe" of set members [1,2] within which each set exists. For the SRF Algebra all sets are merely indexed sets, and the only relevant "universe" for a set is a parent set's members. This means all SRF set operations are defined only over subsets and between subsets of a common parent set. If one wishes to use a larger "universe" context then a common ancestory parent set may be used via the appropriate application of the GrandparentSubset() operation defined in the Subsets section above.

Since subset membership is defined by a subset relation that enumerate the members of the parent set included in that subset, the set operations include generating the new subset relations, as well as their new subsets.

The defined set operations for subsets include the following:

Set complement (~),
Sets intersection (^),
Sets union (+), and
Sets difference (-).

Each is discussed further below.

a. Set Complement

The complement of a subset is the subset which includes all the members of the parent set that are not in the subset. This is shown by:

S_~a1 = ~S_a1

Which, for the subset relations:

R_a1,a = {(0,2),(1,3)}
with
{2,3} = Range(R_a1,a)

But S_a has 5 members, so the complement of that set of range indices is
      {0,1,4} = ~Range(R_a1,a)
or
      {0,1,4} = Range(R_~a1,a)
giving
       R_~a1,a = {(0,0),(1,1),(2,4)}

Implementing a set complement operation can always be done in O(N) time and space complexity, where N is the size of the parent set.

b. Sets Intersection

Since the two sets are subsets of a common parent set, then the result subset is a subset which includes the parent set members which are in both subsets.

To calculate
       S_a1^a2 = S_a1 ^ S_a2
The values range set refrences must be intersected:        R_a1,a = {(0,2),(1,3)}
or
       {2,3} = Range(R_a1,a)
and        R_a2,a = {(0,0),(1,1),(2,2)}
or
       {0,1,2} = Range(R_a2,a)
giving
       Range(R_a1^a2,a) = {2}
or
       R_a1^a2,a = {(0,2)}

Implementing a set intersection operation can always be done in O(N) time and space complexity, where N is the size of the parent set.

c. Sets Union

Since the two sets are subsets of a common parent set, then the result subset is a subset which includes the parent set members which are in either subset.

To calculate
       S_a1+a2 = S_a1 + S_a2
The values range set refrences must be intersected:        R_a1,a = {(0,2),(1,3)}
or
       {2,3} = Range(R_a1,a)
and        R_a2,a = {(0,0),(1,1),(2,2)}
or
       {0,1,2} = Range(R_a2,a)
giving
       Range(R_a1^a2,a) = {0,1,2,3}
or
       R_a1^a2,a = {(0,0),(1,1),(2,2),(3,3)}

Implementing a set union operation can always be done in O(N) time and space complexity, where N is the size of the parent set.

d. Sets Difference

Since the two sets are subsets of a common parent set, then the result subset is a subset which includes the parent set members which are in the first subset but not in the second.

To calculate
       S_a1-a2 = S_a1 - S_a2
The values range set refrences must be intersected:        R_a1,a = {(0,2),(1,3)}
or
       {2,3} = Range(R_a1,a)
and        R_a2,a = {(0,0),(1,1),(2,2)}
or
       {0,1,2} = Range(R_a2,a)
giving the set difference
       Range(R_a1^a2,a) = {3}
or
       R_a1^a2,a = {(0,3)}

Implementing a set difference operation can always be done in O(N) time and space complexity, where N is the size of the parent set.

III. Applications

The SRF Algebra defines a sets, relations, and fields based notation and algebra for representing the logical structures and relationships of data objects. By expressing data as sets of object abstractions, property fields over those set members, and relationships between the members of the sets the algebraic notation gives the designer and analyst a common and logical symbology for tackling otherwise intractable complex manual tasks. Using the SRF Algebra researchers have developed implementations in C, C++, Perl, and Java that automate the operations of the algebra, enabling the representation of of general polyhedral mesh data, computer vision image data model representations and processing algorithms, social networks models and analysis, and critical infrastructure networks models and analysis.

The SRF abstraction and notation has simplified the descriptions of complex models and provided operations needed for manipulating the relationships. These have enabled defining the needed operation descriptions to perform data-parallel decomposition and generating communications maps for the partitioned models [9], as well as revising the relations and maps for dynamic repartitioning of models during a run [10]. SRF abstractions are also being used for representing models of multiple levels of abstraction for multi-network critical infrastructure models, allowing the description and analysis of very complex and extremely large network models. Structured mesh models can be easily converted to the generalized sets and relations so that developed algorithms and tools can be applied to them also.

These basic concepts and the notation presented here have allowed researchers to easily build generalized mesh models, large-scale network models, particle models, molecular models, and mixed mesh and network models for integrating multiple model forms.

VI. Summary

This paper defines the basic objects and operations essential to applying the SRF Algebra. The concepts included in SRF form a useful foundation for both representing models and for manipulating the metadata to analyze and manipulate the details of large models. Continued work has identified more complex operations which enhance the usefulness and scalability of the basic packages. Methods have been devised so that the parallelization process itself can be parallelized, allowing the future application of these concepts and techniques to exascale parallel computer programs.

Implementing the operations of the SRF Algebra in multiple languages has allowed these concepts to be applied to a variety of applications. Work continues, appling SRF in even more application areas.

References:

James R. Holten III, "Sets, Relations, and Fields (SRF) Overview", SRF.html , 2007-03-07.
Paul R. Halmos, "Naive Set Theory" , Springer, 1974.
Patrick Suppes, "Axiomatic Set Theory" , Dover, 1972.
C. J. Date "An Introduction to Database Systems" , Addison Wesley (8th Edition) 2003.
Butler, "SAF Reference Manual", Lawrence Livermore National Laboratory, Jan. 29, 2001.
Larry A. Schoof, Victor R. Yarberry, "Exodus II: A Finite Element Data Model", Sandia National Laboratories, November 1995.
Nicos Christofides "Graph Theory: and Algorithmic Approach" , Academic Press, 1975.
Ravindra K. Ahuja, Thomas L.Magnanti, James B. Orlin "Network Flows" , Prentice Hall, 1993.
James R. Holten III, "Parallel Partitioning via SRF" , ParallelPartitioning_20101111.html , 2010-11-11.
James R. Holten III, "Parallel MxN Repartitioning" , MXNrepartitioning.html , 2003-03-18.