This paper describes the application of the Sets, Relations, and Fields (SRF) concepts to the analysis of social network graphs. It also shows the ease of converting that analysis into programs for applying the derived analysis techniques to real data.
Analyzing graphs involves following links to "discover" associations among the data objects represented by the graph nodes. This rapidly becomes extremely complex and there are no easy notational methods for describing the steps of that analysis in an algebraic or algorithmic language.
SRF has been created to allow an algebraic representation of the relationships between sets of data. It also enables easy application of a variety of techniques loosely based on a variation of relational algebra to algebraically derive complex data relationships and express them in an easily understandable fashion.
Applying SRF to graphs requires thinking of the nodes in a graph as a set and the links as directed links represented as tuples in a single relation. All the attributes that are needed to describe the properties of the nodes and links, such as node or link IDs, node or link weights, naming, specifying that nodes or links fall in designated categories, and specifying any other node or link attributes may done by representing the values as fields over the discrete space defined by the members of the node or link sets. Similarly subsets of the nodes or links may be specified, and relationships between and among those subsets may be easily explored.
This view of graphs allows the use of simple set, relation, and field operations to be applied to the analysis of semantic graphs such as those found in complex social networks.
To apply SRF to complex social networks the SRF representations use the definition of a graph as
Being able to apply the various set, relation, and field operations included in SRF allows clear representation of extremely complex analysis steps.
The SRF approach is based on the indexed set, relations between these sets' members, and fields of values over the sets' members. These allow a graph to be represented as
In a semantic graph the global node set consists of many nodes which may fall into any of several categories. In social networks derived from authorship of documents and associations with parent companies these can include such categories as
Being able to represent and maintain metadata about the subsets and their metadata, such as child-parent relationships, relationships between subsets, and which subset represents data in what categories is essential for analysis of these complex systems.
The set operations union, intersect, and difference must be within a global set member space, so they may only apply to subsets of a "BASE_SET", which for graphs would be the subsets of the global node set, or the subsets of the global graph edges set.
A relation maps the members of a domain set to members of a range set as a collection of tuples, each of which map a single domain set member to a single range set member. A relation may represent many different relationships. Some relation forms include
By representing the links, subsets, and other set member associations as relations it becomes easy to define operations that give a variation of relational algebra over the relations. The defined operations include the following:
These operations generate the subset that is a result of the operation as well as the subset relation for that subset. The subsets must share a parent set.
Given:
Sp -- a parent set,
Sx -- a subset of the parent set,
Rx,p -- the subset relation to the
parent set,
Sy -- another subset of the
parent set, and
Ry,p -- the other subset relation
to the parent set.
The set operations may be defined over the subset relations as
follows:
These operations work on the collections of multiple references from each domain set member, generating the collection of new references that is a result of the operation on the collections of references. The relations to be combined must share the same domain set and the same range set.
Given:
Sz -- a range set.
Sx -- a domain set.
R1: x,z -- the first relation,
R2: x,z -- the other relation
The references collections operations may be defined over the relations
as follows:
A field is a named collection of values of a homogeneous data type (such as numeric, string, or user defined data reference) and associates each single value object with a single member of the field's domain set as F<name>: x over the members of the set Sx.
Basic operations that are defined over the value type for a given field may
be applied to each value corresponding to the members of the domain set.
Such application such as applying trigonometric functions to each member of
a field:
Fsin(angle): x = sin(Fangle: x)
Given the following:
Sp -- the parent set
Sx -- the subset
Rx,p -- the subset relation
Fangle: p -- the field over the parent set
Fdirection: p -- the field over the subset
Operations between fields and relations may be used to get field values over
the members of the related sets.
Fields may be extracted from the field over the members of the parent set
to get just the values over the members of the subset as
Fangle: x = Rx,p *
Fangle: p
or from the subset to members of parent sets as
Fdirection(x): p =
Rx,p-1 *
Fdirection: x
In the latter case the field becomes sparse over the parent set's members, as
it is not defined over any members not in the original subset. However,
fields over multiple subsets may be merged into a single field over the
parent field as long as the following apply:
Fields may be used to define new subsets by extracting all members of the
domain set for which the field values meet some specified criteria such
as
(Sx ,
Rx,p) =
FieldSubset(Fdirection: p ,
<selection criteria>)
Fields values may be extracted based on a mapping via a relation, allowing
the values of a field over a given member subset may be applied to
calculations using fields over another set. In this case multiple field values
may be mapped to each relation domain set member.
Faggregated_dir: x) =
FieldMapping(Fdirection: p ,
Rx,p ,
<aggregating function>)
where the aggregating function can be such operations as sum, sum of squares,
or average.
A graph is represented by a set of nodes and the relations of
tuples formed by the tuples represented directed edges of the graph, such as
Gg = {Sg ,
Rg,g}
A subgraph is represented by a subset of the parent graph's nodes and
the relations that are included between the nodes in the subset, as in
Ga = {Sa ,
Ra,g *
Rg,g *
Ra,g-1}
A node subgraph is represented by a subset of the parent graph's nodes
but the relation that represents the edges of the subgraph are NOT the edges
of the nodes in the parent graph. This may be a relation representing any
relationship among those subgraph nodes, such as all authors who authored
papers with common concept keywords
Ga = {Sa ,
Ra,{dcd},a}
where
Ra,{dcd},a =
Ra,d *
Rd,c *
Rd,c-1 *
Ra,d-1
Being able to derive and represent these two types of subgraphs allows the use of all the set, relation, and field operations to be used to extract patterns of meaning from semantic graphs in a clear and unambiguous manner.
To easily identify and specify the sequence of relation operations needed to get a relationship between nodes in a node subgraph based on their associations through nodes in some other subgraph or subgraphs it is useful to be able to visualize the relationships between the subgraph nodes at a metadata level.
If the nodes in each node category are identified as a subset of nodes and treated as a single entity then the collection of these node subsets form a set of graph hypernodes, and the relations connecting the members of one subset to the members of another can be collapsed into a single edge connecting nodes in the graph of hypernodes, giving a hypergraph. This hypergraph is also known as a metadata graph of the node categories.
By specifying the metadata graph of the node categories then relationships
among the node categories can be easily expressed as paths tracing through
the intermediate nodes for each of the categories that can play a part in
defining a data relationship of interest. An example is to consider the
hypergraph path
authors-->documents-->concepts-->documents-->authors
which describes "Find authors which write documents which have concepts
in common".
The metadata graph is generated from the global graph as follows:
The above simply defines the relationships and the algorithms that must be applied to generate the needed graphs. This can be easily extended to far more complex social networks, and the metagraph allows easy specification of any relationships desired and available from the given data.
By applying the SRF notation and implementation the logical complexity of the needed analysis of social networks becomes tractable and intuitive. The explicit representation of the metadata allows the problems to be expressed clearly at a high level, and then it can be easily converted to an algorithmic implementation for generating the low level realationships described by the metadata derivation.
The application above of these techniques to a social network illustrates that the complex relationships can be described easily, and the steps needed to extract the data for those relationships can similarly be easily derived and described.
By applying this approach and these techniques even more intricate analysis of far more complex systems becomes not only viable, but simple to understand, derive, and implement.