Sets, Relations, and Fields (SRF):
Application to Social Networks

July 23, 2007

by
James R. Holten III
Senior Research Scientist
Institute for Complex Additive Systems Analysis
New Mexico Institute of Mining and Technology
jholten@icasa.nmt.edu



I. Overview

This paper describes the application of the Sets, Relations, and Fields (SRF) concepts to the analysis of social network graphs. It also shows the ease of converting that analysis into programs for applying the derived analysis techniques to real data.

A. Background

Analyzing graphs involves following links to "discover" associations among the data objects represented by the graph nodes. This rapidly becomes extremely complex and there are no easy notational methods for describing the steps of that analysis in an algebraic or algorithmic language.

SRF has been created to allow an algebraic representation of the relationships between sets of data. It also enables easy application of a variety of techniques loosely based on a variation of relational algebra to algebraically derive complex data relationships and express them in an easily understandable fashion.

Applying SRF to graphs requires thinking of the nodes in a graph as a set and the links as directed links represented as tuples in a single relation. All the attributes that are needed to describe the properties of the nodes and links, such as node or link IDs, node or link weights, naming, specifying that nodes or links fall in designated categories, and specifying any other node or link attributes may done by representing the values as fields over the discrete space defined by the members of the node or link sets. Similarly subsets of the nodes or links may be specified, and relationships between and among those subsets may be easily explored.

This view of graphs allows the use of simple set, relation, and field operations to be applied to the analysis of semantic graphs such as those found in complex social networks.

B. Approach

To apply SRF to complex social networks the SRF representations use the definition of a graph as

  1. A global node set of all the nodes in the graph,
  2. A relation whose tuples represent the directed links between the pairs of nodes,
  3. A set that represents the set of those link tuples, and
  4. The collection of fields of attributes over the node and link set members.
This view allows the representation of any number of attributes over the nodes and the links of the graph, as well as allowing the simple representation of subgraphs which include a subset of the global nodes and the subset of links which are limited to those nodes in the subset, as well as "node subgraphs" which use a subset of the nodes and a completely different collection of links.

Being able to apply the various set, relation, and field operations included in SRF allows clear representation of extremely complex analysis steps.

II. A Graph Representation

The SRF approach is based on the indexed set, relations between these sets' members, and fields of values over the sets' members. These allow a graph to be represented as

  1. The global node set, Sg;
  2. The global graph edge relation relating the pairs of members of Sg , Rg,g;
  3. The set of tuples in the relation to allow fields of values over the graph links, SRgg; and
  4. The fields of attribute values over the sets
This representation allows the metadata describing the relationships among the data forms to be included in the notation as well as the implementation of the SRF graph form, which enables the analyst to easily keep track of the many complexities that are inherent in complex systems analysis.

A. The Node Sets

In a semantic graph the global node set consists of many nodes which may fall into any of several categories. In social networks derived from authorship of documents and associations with parent companies these can include such categories as

  1. Authors,
  2. Authored documents,
  3. Concept keywords,
  4. Subject areas,
  5. Organizations,
  6. Facilities, and
  7. Geographic facility locations.

Since these are all in a single global node set it is difficult to derive meaningful relationships without expressing the data organization in some form other than just attributes of the set members. Clarity can be achieved in using several approaches
  1. The global node set can be broken down into node subsets for each of the above categories,
  2. A group of categories; such as authors, authored documents, and concepts can form a single subset of the nodes, and therfore a subgraph, as some operations over just that subgraph may be useful.
Such subsets and subgraph representations will be discussed in greater detail later.

Being able to represent and maintain metadata about the subsets and their metadata, such as child-parent relationships, relationships between subsets, and which subset represents data in what categories is essential for analysis of these complex systems.

The set operations union, intersect, and difference must be within a global set member space, so they may only apply to subsets of a "BASE_SET", which for graphs would be the subsets of the global node set, or the subsets of the global graph edges set.

B. The Relations

A relation maps the members of a domain set to members of a range set as a collection of tuples, each of which map a single domain set member to a single range set member. A relation may represent many different relationships. Some relation forms include

  1. A generalized relation which maps from each of the members of any set (the domain set) each to any number of members of another set (the range set), such as Rx,y , maps from members of Sx to members of Sy;
  2. A subset relation which maps each member in a (domain set) subset to a single distinct member in the parent (range) set, such as Rsubset,parent , mapping from members of Ssubset to members of Sparent; and
  3. A directed graph links relation which maps the members of a single set to one another, giving a relation with the same domain set and range set, such as Rx,x , mapping from members of Sx back to members of Sx.

By representing the links, subsets, and other set member associations as relations it becomes easy to define operations that give a variation of relational algebra over the relations. The defined operations include the following:

1. Basic Operations

  1. The inverse is just the reversal of the domain and range sets and each of the tuples associating members of the two sets, such as

       Ry,x = Rx,y-1

    which reverses all the tuple association pairs.
  2. The composition is the combination of two relations, and thus the associations from the members of a set to the members of an intermediate set, and from the members of the intermediate set to the members of a third set into a single collection of associations from the members of the first set to the members of the third set. The operation may be represented as

       Rx(y)z = Rx,y * Ry,z

    where the range set of the first relation must be the domain set of the second relation and the resulting relation is the mapping from the members of the first relation's domain set to the members of the second relation's range set.
  3. The minimal decomposition of a relation is defined for a relation which maps to only some of the members of the range set. The decomposition generates a subset and its subset relation, including only those range set members the original relation references, and then generates a redirected relation to the members of the new subset.

       (Sy , Ry,z , Rx,y) = MinDecom(Rx,z)

    which includes the subset Sy and its subset relation Ry,z , as well as the redirected relation to the members of the subset, Rx,y.

2. Subset Relation Operations

These operations generate the subset that is a result of the operation as well as the subset relation for that subset. The subsets must share a parent set.

Given:

   Sp -- a parent set,
   Sx -- a subset of the parent set,
   Rx,p -- the subset relation to the parent set,
   Sy -- another subset of the parent set, and
   Ry,p -- the other subset relation to the parent set.

The set operations may be defined over the subset relations as follows:


  1. The domain set union performs the normal set union by generating the new subset relation and the subset as the result of the set union. It is used by the set union operation above. This is demonstrated by

       (Sa , Ra,p) = DomainSetUnion(Rx,p , Ry,p)

  2. The domain set intersect performs the normal set intersection by generating the new subset relation and the subset as the result of the set intersection. It is used by the set intersection operation above. This is demonstrated by

       (Sb , Rb,p) = DomainSetIntersect(Rx,p , Ry,p)

  3. The domain set diff performs the normal set difference by generating the new subset relation and the subset as the result of the set difference. It is used by the set difference operation above. This is demonstrated by

       (Sc , Rc,p) = DomainSetDiff(Rx,p , Ry,p)

3. Relation References Operations

These operations work on the collections of multiple references from each domain set member, generating the collection of new references that is a result of the operation on the collections of references. The relations to be combined must share the same domain set and the same range set.

Given:

   Sz -- a range set.
   Sx -- a domain set.
   R1: x,z -- the first relation,
   R2: x,z -- the other relation

The references collections operations may be defined over the relations as follows:


  1. The range set union generates a new relation whose tuples include, for each domain set member, the union of the referenced range set member collections.

       R3: x,z = RangeSetUnion(R1: x,z , R2: x,z)

  2. The range set intersect generates a new relation whose tuples include, for each domain set member, the intersection of the referenced range set member collections.

       R4: x,z = RangeSetIntersect(R1: x,z , R2: x,z)

  3. The range set diff generates a new relation whose tuples include, for each domain set member, the difference of the referenced range set member collections.

       R5: x,z = RangeSetDiff(R1: x,z , R2: x,z)

By combining these operations algebraically and algorithmically the analyst can express extremely complex analysis tasks in an unambiguous and easily implemented form.

C. The Fields

A field is a named collection of values of a homogeneous data type (such as numeric, string, or user defined data reference) and associates each single value object with a single member of the field's domain set as F<name>: x over the members of the set Sx.

Basic operations that are defined over the value type for a given field may be applied to each value corresponding to the members of the domain set. Such application such as applying trigonometric functions to each member of a field:

   Fsin(angle): x = sin(Fangle: x)

Given the following:

   Sp -- the parent set

   Sx -- the subset

   Rx,p -- the subset relation

   Fangle: p -- the field over the parent set

   Fdirection: p -- the field over the subset

Operations between fields and relations may be used to get field values over the members of the related sets.

Fields may be extracted from the field over the members of the parent set to get just the values over the members of the subset as
   Fangle: x = Rx,p * Fangle: p

or from the subset to members of parent sets as

   Fdirection(x): p = Rx,p-1 * Fdirection: x

In the latter case the field becomes sparse over the parent set's members, as it is not defined over any members not in the original subset. However, fields over multiple subsets may be merged into a single field over the parent field as long as the following apply:

This is illustrated by:

   Fdirection: p = FieldParentJoin(Fdirection: x , Fdirection: y , Fdirection: z)

which joins the fields over Sx , Sy, and Sz to form a single field over their parent set Sp.

Fields may be used to define new subsets by extracting all members of the domain set for which the field values meet some specified criteria such as

   (Sx , Rx,p) = FieldSubset(Fdirection: p , <selection criteria>)

Fields values may be extracted based on a mapping via a relation, allowing the values of a field over a given member subset may be applied to calculations using fields over another set. In this case multiple field values may be mapped to each relation domain set member.

   Faggregated_dir: x) = FieldMapping(Fdirection: p , Rx,p , <aggregating function>)

where the aggregating function can be such operations as sum, sum of squares, or average.

D. Graphs from Sets and Relations

A graph is represented by a set of nodes and the relations of tuples formed by the tuples represented directed edges of the graph, such as

   Gg = {Sg , Rg,g}

A subgraph is represented by a subset of the parent graph's nodes and the relations that are included between the nodes in the subset, as in

   Ga = {Sa , Ra,g * Rg,g * Ra,g-1}

A node subgraph is represented by a subset of the parent graph's nodes but the relation that represents the edges of the subgraph are NOT the edges of the nodes in the parent graph. This may be a relation representing any relationship among those subgraph nodes, such as all authors who authored papers with common concept keywords

   Ga = {Sa , Ra,{dcd},a}

where

   Ra,{dcd},a = Ra,d * Rd,c * Rd,c-1 * Ra,d-1

Being able to derive and represent these two types of subgraphs allows the use of all the set, relation, and field operations to be used to extract patterns of meaning from semantic graphs in a clear and unambiguous manner.

E. Metadata Graphs via a Set of Sets

To easily identify and specify the sequence of relation operations needed to get a relationship between nodes in a node subgraph based on their associations through nodes in some other subgraph or subgraphs it is useful to be able to visualize the relationships between the subgraph nodes at a metadata level.

If the nodes in each node category are identified as a subset of nodes and treated as a single entity then the collection of these node subsets form a set of graph hypernodes, and the relations connecting the members of one subset to the members of another can be collapsed into a single edge connecting nodes in the graph of hypernodes, giving a hypergraph. This hypergraph is also known as a metadata graph of the node categories.

By specifying the metadata graph of the node categories then relationships among the node categories can be easily expressed as paths tracing through the intermediate nodes for each of the categories that can play a part in defining a data relationship of interest. An example is to consider the hypergraph path

   authors-->documents-->concepts-->documents-->authors

which describes "Find authors which write documents which have concepts in common".

The metadata graph is generated from the global graph as follows:

  1. The global graph is defined as

       Gg = {Sg , Rg,g}

  2. The category subgraphs are defined as

       (Sauthors , Rauthors,g) = FieldSubset(Fcategories: g , equals("author"))

       (Sdocuments , Rdocuments,g) = FieldSubset(Fcategories: g , equals("document"))

       (Sconcepts , Rconcepts,g) = FieldSubset(Fcategories: g , equals("concept"))

  3. The "inter-category node subset" relations are defined as

       Rauthors,documents = Rauthors,g * Rg,g * Rdocuments,g-1

       Rdocuments,concepts = Rdocuments,g * Rg,g * Rconcepts,g-1

  4. The metadata graph is defined as

       Gm = {Sm , Rm,m}

    where

       Sm = {Sauthors , Sdocuments , Sconcepts}

    and, since there are only the two relations of interest among the subsets of nodes, the relation Rm,m is defined by the simple tuples given by

       Rm,m = {(authors, documents) , (documents, concepts)}

The above simply defines the relationships and the algorithms that must be applied to generate the needed graphs. This can be easily extended to far more complex social networks, and the metagraph allows easy specification of any relationships desired and available from the given data.

III. Summary

By applying the SRF notation and implementation the logical complexity of the needed analysis of social networks becomes tractable and intuitive. The explicit representation of the metadata allows the problems to be expressed clearly at a high level, and then it can be easily converted to an algorithmic implementation for generating the low level realationships described by the metadata derivation.

The application above of these techniques to a social network illustrates that the complex relationships can be described easily, and the steps needed to extract the data for those relationships can similarly be easily derived and described.

By applying this approach and these techniques even more intricate analysis of far more complex systems becomes not only viable, but simple to understand, derive, and implement.


References:

Sets, Relations, and Fields: Overview , July 23, 2007