Notes
- Were there semantics to Tang et al.'s use of h in their definition? (I.e., that up to h edges could be traversed in each snapshot of the graph.)
- Why did they do it? Why did they choose h?
- What does it mean semantically (beyond computationally)? Is it something where they assumed that a snapshot was “long enough” to traverse h edges during that time? What is h defined to be in their examples?
- They chose h as the horizon to model the speed of message passing relative to the length of the time window. So, it allows multiple events to happen within one snapshot, covering the case where the snapshots are too coarse-grained. For their example graph, the Enron email network, they use a time interval of 1 day and horizon h = 1. That is likely unrealistic: it caps propagation at one email hop per day.
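The horizon idea can be sketched as a depth-limited BFS inside a single snapshot's edge set; this is a minimal illustration under my own representation (undirected edge list, hypothetical function name), not Tang et al.'s code:

```python
from collections import deque

def reachable_within_horizon(edges, source, h):
    """Nodes reachable from `source` using at most h edge
    traversals inside one snapshot's (undirected) edge set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = {source: 0}          # node -> hop distance
    q = deque([source])
    while q:
        u = q.popleft()
        if seen[u] == h:        # horizon reached: stop expanding
            continue
        for v in adj.get(u, ()):
            if v not in seen:
                seen[v] = seen[u] + 1
                q.append(v)
    return set(seen)

snapshot = [("a", "b"), ("b", "c"), ("c", "d")]
print(reachable_within_horizon(snapshot, "a", 1))  # one hop: a and b only
print(reachable_within_horizon(snapshot, "a", 2))  # two hops fit in one window, adding c
```

With h = 1 a message moves at most one hop per snapshot; larger h lets multi-hop chains "happen" inside one coarse time window, which is exactly the trade-off questioned above.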
- Why is the clustering coefficient important over various values of h? If we take multiple different-sized time-neighborhoods around a given time and flatten the graph over each interval, what does that mean? What does that say about the graph around that time? Is there anything we can get out of it for dynamics?
- If we look at the different distributions of the clustering coefficient, what are they trying to tell us?
- Also, which collapsing / flattening scheme are you using at that point? Does it matter?
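To make the collapsing/flattening question concrete, here is a hypothetical sketch of two schemes over a window of snapshots (edge union vs. edge intersection); the names and representation are my own, and the choice does change what the flattened graph means:

```python
def flatten_union(snapshots, start, width):
    """Collapse snapshots[start:start+width] into one static graph:
    keep an edge if it appears in ANY snapshot of the window."""
    out = set()
    for snap in snapshots[start:start + width]:
        out |= set(snap)
    return out

def flatten_intersection(snapshots, start, width):
    """Stricter scheme: keep only edges present in EVERY snapshot
    of the window (persistent edges)."""
    window = [set(s) for s in snapshots[start:start + width]]
    if not window:
        return set()
    out = window[0]
    for s in window[1:]:
        out &= s
    return out

snaps = [
    [("a", "b"), ("b", "c")],   # day 1
    [("b", "c")],               # day 2
    [("b", "c"), ("c", "d")],   # day 3
]
print(flatten_union(snaps, 0, 3))         # every edge seen in the window
print(flatten_intersection(snaps, 0, 3))  # only ("b", "c") persists all 3 days
```

Union flattening measures what *could* have interacted over the interval; intersection flattening measures the stable backbone, so a metric like clustering coefficient can differ sharply between the two.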
- Really, we’re harping on the dynamics of the graph and on understanding/characterizing it
- If we take a distribution of a metric, we should classify it. Ex: clustering coefficient over a sliding window in time. That would give us a CC value at each point in time, i.e., a distribution. What does that look like?
- Could be described by family: Gaussian, Laplacian, viral, other…
- So, it could be something like: this TIVG falls into distribution class ___ with peak value ___ with regard to clustering coefficient
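A minimal sketch of the sliding-window idea, assuming union-flattening per window and the average local clustering coefficient (all function names and representations here are illustrative, not from the paper):

```python
from statistics import mean

def avg_clustering(edges):
    """Average local clustering coefficient of an undirected graph
    given as an edge list (degree-<2 nodes contribute 0)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # count connected neighbor pairs (closed triangles at u)
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        coeffs.append(2 * links / (k * (k - 1)))
    return mean(coeffs) if coeffs else 0.0

def cc_over_windows(snapshots, width):
    """Slide a window of `width` snapshots, union-flatten each window,
    and record its CC: one sample of the distribution per position."""
    samples = []
    for start in range(len(snapshots) - width + 1):
        flat = set()
        for snap in snapshots[start:start + width]:
            flat |= set(snap)
        samples.append(avg_clustering(flat))
    return samples

# Three 1-day snapshots that together close a triangle:
snaps = [[("a", "b")], [("b", "c")], [("a", "c")]]
print(cc_over_windows(snaps, 1))  # no single day has a triangle
print(cc_over_windows(snaps, 3))  # flattened over 3 days, the triangle closes
```

The list returned by `cc_over_windows` is exactly the per-time-point sample one would then try to classify by distribution family (Gaussian, Laplacian, …).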
- We have this immediate similarity to other applications (like the imaging/vision Gaussian pyramid)
- We really need to look at the semantic connections! What does this measure mean, semantically?
- See hand-drawn notes for more