Charu C. Aggarwal and K. Subbian. Evolutionary Network Analysis: A Survey, ACM Computing Surveys, 47(1), April, 2014.

Aggarwal and Subbian give a survey of Evolutionary Network Analysis in this work, noting that data mining algorithms need to be updated and modified to account for network evolution. They divide evolving network analysis into two distinct categories:

  1. Maintenance methods, where a static analysis at any point is updated or maintained as the network evolves, and
  2. Analytic evolution analysis, in which the changes–or differences across time–are focused upon and quantified.

The authors point out that the latter is “closely related to the problem of outlier detection in temporal networks because temporal outliers are often defined as (abrupt) change points” (10:2). To address the definition of evolution, the authors note and describe two different types of evolving networks,

  1. Slowly evolving networks, that evolve slowly over time and may be considered as a series of snapshots at distinct times, and
  2. Streaming networks, such as email or twitter graphs, that change rapidly and therefore may be represented as graph streams.

Slowly evolving networks are described as bibliographic or web-based networks, in which changes occur slowly, on the order of days or months. However, later the authors specifically define evolution of graphs, “Graphs evolve over time as new edges are added and old ones are deleted” (10:10). This definition ignores node evolution, outside of a separate content evolution they describe later in the paper that adds context to the evolving graph.

Analysis in this paper focuses heavliy on clustering and community detection, focusing on updating and maintaining communities and addressing how the communities change across time.

Maintenance Methods

For slowly evolving networks, the authors note that since changes in these kinds of networks are relatively slow, maintaining results of a data mining algrithm is simply adjusting results for the network at time for the network at . An overview is given of a few different analyses and techniques:

  1. Clustering and community detection: New clusters should accurately depict the change in the data and old clusters should be easily followed from one snapshot to the next.
  2. Low rank appoximation: Itdentifies the communities and anomalies by utilizing the adjacency matrix at subsequent snapshots. One method discussed factors the adjacency matrix , where and are low-rank matrices, for that timestep. Functions may then be defined to update and to their counterparts at .
  3. Classification: Labels for a subset of the nodes are used to predict labels for the rest.
  4. Link Prediction: Predict future edges based on the past trends. Recent work uses multiple snapshots to better predict links for future time points.
  5. Tensor Factorization: The authors argue that typically, evolving graphs are third-order tensors consisting of a time-series of 2D adjacency matrices.

The authors then note, when discussing streaming graphs, that no methods exist for many of the techniques that were applied to slowly evolving graphs, stating that “this is a fertile area for future research” (10:9). Specifically, they only adress clustering and classification. Of greater interest in their discussion of streaming maintenance methods, they draw importance to page rank analysis. Since the web is constantly changing, search engines must use snapshots of the entire web. However, estimating page rank over an evolving graph stream would be very useful, noting a method in Bahmani et al, 2010. Also if interest is the Maximum Cardinalyt Matching problem, to find “the largest set of edges such that no two adjacent edges are selected” (10:10).

Analytical Evolutionary Analysis

Evolutionary analysis, the authors state, is about “understanding the overall dynamics of the entire evolution of the graph” (10:10). They note the typical trend, which is densification with shrinking diameters (from Leskovec et al 2005). This evolution, however, is driven by the definition that graph evolution is driven by new edges added and old ones deleted. The authors do list a few good metrics for change:
* centrality, * community behavior, * minimum description length, * shortest paths, and * rules.

The authors spend most of the paper discussing evolutionary analyses of slowly evolving networks, including a dicussion of evolution analysis laws (10:11 - 10:12) and evolutionary network generation.

  1. Communities and their evolution: How community structure changes across snapshots, as well as expand and contract. Aggarwal and Yu (2005) construct a differential graph to measure the changes between snapshots. A key issue the authors note is appropriately matching communities across time points.
  2. Spectral Methods, used to cluster networks.
  3. Shortest path distance evolution: How the top- shortest path distances change across time. Important and interesting becasue most real-world graphs have significant changes in distances between nodes (shrinking diameters for most social network and web graphs).
  4. Network evolution with minimum discription length principle: This may be important, as snapshots are collapsed into ranges if clusters do not change.
  5. Role dynamics: How roles, that may change with network structure, of nodes change over time. These roles, however, are the dynamic ones learned or assigned by analysis.
  6. Visualization: The authors briefly point to a few works on visualization, but do not discuss any with great detail.

In their discussion on streaming graphs, the authors only consider finding outliers and influences.

Conclusions

The authors conclude their survey with a long list of applications for evolutionary network analysis, including citations of related work. This may prove highly relevant. The domains discussed include:

Further Questions

  1. Can we talk about how nodes change in relation to the web graph? We normally consider a graph of pages and links, but those pages change in both content and functionality over their lifespans. For example, the yahoo homepage has had different functions over the years, and the msn.com homepage has also been migrated to live.com.
  2. Consider the Citeseer citation graph, discussed on page 10:15. Can we combine research groups (or universities), each into their own node? These nodes would then have changing participants across time. We could then search for communities among these new nodes and how those communities form and morph across time. Did the current people at the university affect group ties while resident at the university or for all time? Or perhaps the status of the university influenced the collaboration between like-minded or similar peer institutions?
  3. Can the differential graph (p 10:15) as explored in Aggarwal and Yu 2005 be used to construct a temporal graph structure without necessitating snapshots? (or snapshots of the entire graph?)
  4. Can we avoid the problem of matching communities across snapshots if we store the graph in such a manner that does not use snapshots?