Todo

  • Ask Joseph about sending Adoption Info
    • Who’s in the scope that he’s collecting
    • Ask for all of his spreadsheets
  • Add people nodes to sankey diagrams (DONE)
  • Look up visualization papers
  • Decide on Committee
  • Make progress on proposal research

Notes

  • Kathleen wants to follow wives (first wives) through the system, not just the men
  • Sankey
    • Structure by blood (currently what it does)
    • Structure by spirit (what constitutes spirit? Adoption? Membership in group?)
    • There are multiple Sankey networks to compare:
      • Blood male-lineage
      • Blood female-lineage
      • Spirit
    • How do they compare?
      • Also compare with the church organization graph. Which influences which?
      • Ex: RS ———- AQ ———– Q12
    • Look at Kindred Britain
    • Do we want people as nodes? (DONE)
      • Could add the people as nodes of a different color or shape on the sankey flow daigram
    • Ways to filter the sankey based on various facets:
      • in a particular group (AQ, RS, …)
      • in a particular pioneer company
  • Chord Diagrams
    • Use an interface that the “Next” and “Previous” buttons go to the next or previous event, rather than by year
  • Divorces
    • Better way to show divorces on Chord?
      • Chop them short so that they don’t go to the outer rim of the chord
    • Multiple types of divorces need to be captured
      • Divorce Date
      • Cancellation Date
      • Abandonment Date (perhaps the husband or wife just abandoned their family to start another one)
      • New table
        • Marriage events
          • Date (of the event)
          • Event type (controlled vocab: marriage, cancellation, divorce, abandonment)
          • Marriage ID
          • Notes
  • Capturing Geographical Information
    • If the root marriage wants to expand, the next logical question is who to include?
      • Kathleen is interested to know if this includes neighbors or people close to the couple
  • Visualization Research Locations

Charu C. Aggarwal and K. Subbian. Evolutionary Network Analysis: A Survey, ACM Computing Surveys, 47(1), April, 2014.

Aggarwal and Subbian give a survey of Evolutionary Network Analysis in this work, noting that data mining algorithms need to be updated and modified to account for network evolution. They divide evolving network analysis into two distinct categories:

  1. Maintenance methods, where a static analysis at any point is updated or maintained as the network evolves, and
  2. Analytic evolution analysis, in which the changes–or differences across time–are focused upon and quantified.

The authors point out that the latter is “closely related to the problem of outlier detection in temporal networks because temporal outliers are often defined as (abrupt) change points” (10:2). To address the definition of evolution, the authors note and describe two different types of evolving networks,

  1. Slowly evolving networks, that evolve slowly over time and may be considered as a series of snapshots at distinct times, and
  2. Streaming networks, such as email or twitter graphs, that change rapidly and therefore may be represented as graph streams.

Slowly evolving networks are described as bibliographic or web-based networks, in which changes occur slowly, on the order of days or months. However, later the authors specifically define evolution of graphs, “Graphs evolve over time as new edges are added and old ones are deleted” (10:10). This definition ignores node evolution, outside of a separate content evolution they describe later in the paper that adds context to the evolving graph.

Analysis in this paper focuses heavliy on clustering and community detection, focusing on updating and maintaining communities and addressing how the communities change across time.

Maintenance Methods

For slowly evolving networks, the authors note that since changes in these kinds of networks are relatively slow, maintaining results of a data mining algrithm is simply adjusting results for the network at time for the network at . An overview is given of a few different analyses and techniques:

  1. Clustering and community detection: New clusters should accurately depict the change in the data and old clusters should be easily followed from one snapshot to the next.
  2. Low rank appoximation: Itdentifies the communities and anomalies by utilizing the adjacency matrix at subsequent snapshots. One method discussed factors the adjacency matrix , where and are low-rank matrices, for that timestep. Functions may then be defined to update and to their counterparts at .
  3. Classification: Labels for a subset of the nodes are used to predict labels for the rest.
  4. Link Prediction: Predict future edges based on the past trends. Recent work uses multiple snapshots to better predict links for future time points.
  5. Tensor Factorization: The authors argue that typically, evolving graphs are third-order tensors consisting of a time-series of 2D adjacency matrices.

The authors then note, when discussing streaming graphs, that no methods exist for many of the techniques that were applied to slowly evolving graphs, stating that “this is a fertile area for future research” (10:9). Specifically, they only adress clustering and classification. Of greater interest in their discussion of streaming maintenance methods, they draw importance to page rank analysis. Since the web is constantly changing, search engines must use snapshots of the entire web. However, estimating page rank over an evolving graph stream would be very useful, noting a method in Bahmani et al, 2010. Also if interest is the Maximum Cardinalyt Matching problem, to find “the largest set of edges such that no two adjacent edges are selected” (10:10).

Analytical Evolutionary Analysis

Evolutionary analysis, the authors state, is about “understanding the overall dynamics of the entire evolution of the graph” (10:10). They note the typical trend, which is densification with shrinking diameters (from Leskovec et al 2005). This evolution, however, is driven by the definition that graph evolution is driven by new edges added and old ones deleted. The authors do list a few good metrics for change:
* centrality, * community behavior, * minimum description length, * shortest paths, and * rules.

The authors spend most of the paper discussing evolutionary analyses of slowly evolving networks, including a dicussion of evolution analysis laws (10:11 - 10:12) and evolutionary network generation.

  1. Communities and their evolution: How community structure changes across snapshots, as well as expand and contract. Aggarwal and Yu (2005) construct a differential graph to measure the changes between snapshots. A key issue the authors note is appropriately matching communities across time points.
  2. Spectral Methods, used to cluster networks.
  3. Shortest path distance evolution: How the top- shortest path distances change across time. Important and interesting becasue most real-world graphs have significant changes in distances between nodes (shrinking diameters for most social network and web graphs).
  4. Network evolution with minimum discription length principle: This may be important, as snapshots are collapsed into ranges if clusters do not change.
  5. Role dynamics: How roles, that may change with network structure, of nodes change over time. These roles, however, are the dynamic ones learned or assigned by analysis.
  6. Visualization: The authors briefly point to a few works on visualization, but do not discuss any with great detail.

In their discussion on streaming graphs, the authors only consider finding outliers and influences.

Conclusions

The authors conclude their survey with a long list of applications for evolutionary network analysis, including citations of related work. This may prove highly relevant. The domains discussed include:

  • World wide web
  • Communications (cell, phone, twitter)
  • Road networks
  • Social networking and blogs
  • Recommendations
  • News
  • Bibliographic networks

Further Questions

  1. Can we talk about how nodes change in relation to the web graph? We normally consider a graph of pages and links, but those pages change in both content and functionality over their lifespans. For example, the yahoo homepage has had different functions over the years, and the msn.com homepage has also been migrated to live.com.
  2. Consider the Citeseer citation graph, discussed on page 10:15. Can we combine research groups (or universities), each into their own node? These nodes would then have changing participants across time. We could then search for communities among these new nodes and how those communities form and morph across time. Did the current people at the university affect group ties while resident at the university or for all time? Or perhaps the status of the university influenced the collaboration between like-minded or similar peer institutions?
  3. Can the differential graph (p 10:15) as explored in Aggarwal and Yu 2005 be used to construct a temporal graph structure without necessitating snapshots? (or snapshots of the entire graph?)
  4. Can we avoid the problem of matching communities across snapshots if we store the graph in such a manner that does not use snapshots?

Friday Evening Talk

  1. Book: The emergence of digital humanities by Steven Jones (this speaker)
  2. Book: Graphs Maps Trees, Frank Moretti (not sure on name)
  3. Book: To save everything, click here

stephenejones.org

Saturday Morning Talk

  • Chen and Floridi, An analysis of in formation in visualization, Synthese, Springer 2012
  • Chen, an information-theoretic framework for visualization, 2010
  • Larkin and Simon, 1987
  • Kahneman, Thinking fast and thinking slow
  • false visializaions: when journalists get data viz wrong (Krum)
  • max fürbringer, 1888, diagrams representing the phylogenetic tree of birds
  • takeaways
    • from wow! to why?
    • visualizations entail compromise
    • we need more graphical standards
  • need renderization methods that consistently produce the same visualization
  • design for information from meirelles

Salons Project Talk

  • Melanie Conroy
  • Palladio… used heavily to display viz
    • talk with her about this work

Isabel Meirelles, Associate Professor of Graphic Design at Northeastern University

  • Hierarchical Structures are ordered sets where elements and/or subsets are organized in a given relationship to one another, both among themselves and within the whole
    • Nested or stacked structures
      • Cartesian
      • Polar
      • Other
        • Voronoi tree maps ??
    • We are dealing with an abstract concept
      • Map can be hierarchical: country, state, province, …
      • Best representation to depict the hierarchy is to find the best conceptual metaphor, not the visual metaphor
      • Abstract data: showing a tree structure
      • Physical data: non-overlapping boxes
    • Metaphor (of the tree)
      • Schema: the tree is a part-whole configuration for structuring a target concept
      • Structural elements: a whole, parts and a configuration. Because the parts can exist without constituting a whole…
    • Trees
      • Trees and “arbores had a solid authoritative base in the Bible. The Book of Genesis speaks of the tree of life planted by God in the Garden…”
      • In early work, people used the roots of the tree as part of the visualization
    • We have also seen uses of the vine
    • Using the human body diagram, which looks like a tree,
    • Also the house as a metaphor as well as the book
    • IBM Many Eyes
    • Isabel has written a book on these topics
    • Visualizations
      • Ernst Haeckel “Paleontological Tree of Vertebrates”
      • Henrich … polar system
      • Dual graphs, where one is the tree, and another is a nested circular treemap, which shows the clusters that are invisible in the tree
    • Recent techniques
      • 3D visualizations: George Robertsom, Snapshot of the “Cone Tree” visualization technique, 1991
      • Johnson, Snapshot of the “TreeViz” interface that uses treemap to represent files in a computer
      • Munzner, snapshot of the “3-D Hyperbolic Trees”, 1998
  • Relational Structures organize data for which relationships are key to the system being visualized
    • Food networks, metabolic networks, power structure (World Finance Corporations), “They Rule” 2004 by Josh On, complexity of American military strategy, small partial map of the internet using data from the Opte Project by Matt Britt, Subway map
    • Origins of Network Science
      • Euler puzzle: traverse the bridges only once, travel to all places
        • Beginning of graph mathematics
          • lands were nodes, bridges were edges
        • Figured out it was not possible
    • “A network is a simplified representation that reduces a system to an abstract structure capturing only the basics of connection patterns and little else. Vertices and edges in a network can be labeled with additional information…” – Mark Newman
    • Says math calls node and edge as “site and bond” or “actor and tie”
    • Definitions:
      • Ranking: ordered sequence of friends
      • Type: friend, relative, co-author
    • Layout types
      • linear
      • force-directed
      • circular
        • Bundling chords going to same direction
          • Recent technique
        • http://well-formed-eigenvector.com (I think)
      • community structure
      • geography based
      • sankey type
        • Fineo by Density Design Research Lab, Italy 2010
        • Over the decades, how states have shifted, by Mike Bostock
      • force-directed centered on node
        • One node in the center
      • polar
      • radial community structure
        • Google+ ripples: how messages have been disseminated on Google+
      • matrix
  • Takeaways
    1. Questions influence visualizations (descriptive vs argumentative)
      • It might be about the process rather than the output
    2. Solving a design problem requires finding the right representation
      • Visualization over time
    3. Visualizations largely depend on the task and audience
      • Perceptual and cognitive constraints (going against the metaphor)
    4. Visualizations are human-centered
  • Book: Design for Information
  • Leonardo
  • Workshop


categories: Meetings tags: Meetings —

Agenda

  1. Presentation so far
    • Still in progress
  2. Chord diagram changes
  3. Sankey changes needed for presentation?
  4. Teaching opportunity for CS 1010 next semester
    • Should I apply for that, since I’m wanting to teach
    • Would that help with SNAC funding?
  5. SNAC might have some interesting temporal questions
    • See notes on computer from meeting
  6. Thesis
    • Committee formation should happen soon?