To Do
- Look for notes from the Twitter/Gnip presentation. I don’t think I have any, though.
- Look up the gamma distribution’s probability density function (used in the Gnip presentation)
- What happens with closeness/harmonic centrality when there are multiple disjoint cycles? What if the cycles are of the same size?
- Look up the proposal format the department requires (grad handbook). Is the required length 15 pages?
- We want to characterize the shapes of these graphs/distributions, too: not just explain what happens at one point in time, but describe what the change looks like over time.
- Aside: for betweenness centrality, if at time it’s 0, we can tell whether the graph is disjoint, a cycle, or a clique by counting the edges up to , since if there are none it’s disjoint, if there are it’s a cycle, and if there are more than it’s a clique.
- Possible measure that could be interesting to someone: how many nodes have paths of length in time?
- A possible extension moving forward: use the full and compute the measure over the entire evolving network to get a ground-truth value. Then, can we use our measures to predict what would happen? That is, take a shorter subset of and compute the measure. Can we predict what will happen after that subset? Does it match what we measured for all of ?
- This could be applicable to streaming graphs
- Develop and write the thesis statement
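For the gamma distribution item, a minimal Python sketch of the density, assuming the shape/scale parameterization f(x; k, θ) = x^(k−1) e^(−x/θ) / (Γ(k) θ^k) for x > 0. The Gnip presentation may have used the shape/rate form instead (then scale = 1/rate); the function name `gamma_pdf` is hypothetical.

```python
import math

def gamma_pdf(x: float, shape: float, scale: float = 1.0) -> float:
    """Gamma density, shape (k) / scale (theta) form:
    f(x; k, theta) = x**(k - 1) * exp(-x / theta) / (Gamma(k) * theta**k) for x > 0.
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    if x < 0:
        return 0.0
    if x == 0:
        # At x = 0 the density is infinite for k < 1, 1/theta for k = 1, and 0 for k > 1.
        if shape < 1:
            return math.inf
        return 1.0 / scale if shape == 1 else 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

# With shape = 1 the gamma distribution reduces to the exponential distribution.
print(gamma_pdf(2.0, shape=1.0))  # exp(-2), about 0.1353
```

The mean of this parameterization is kθ, which is a quick sanity check against whatever numbers appear in the presentation.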
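For the question about closeness/harmonic centrality on disjoint cycles, a small Python sketch for static, undirected graphs only (no time dimension). It assumes classic closeness, (n − 1) / Σ d(s, t), which is undefined when some node is unreachable, and unnormalized harmonic centrality, Σ 1/d(s, t), where unreachable nodes contribute 0. The helper names `cycle_edges`, `bfs_distances`, and `centralities` are hypothetical.

```python
from collections import deque

def cycle_edges(nodes):
    """Edges of an undirected cycle through the given node labels (needs >= 3 nodes)."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centralities(edges):
    """Return (closeness, harmonic) dicts for an undirected, unweighted static graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    closeness, harmonic = {}, {}
    for s in adj:
        reached = [d for t, d in bfs_distances(adj, s).items() if t != s]
        # Classic closeness, (n - 1) / sum of distances, is only defined when every other
        # node is reachable; None marks the disconnected case.
        closeness[s] = (n - 1) / sum(reached) if reached and len(reached) == n - 1 else None
        # Harmonic centrality: unreachable nodes contribute 1/inf = 0, so no special case.
        harmonic[s] = sum(1.0 / d for d in reached)
    return closeness, harmonic

# Two disjoint cycles of different sizes: a 3-cycle and a 5-cycle.
closeness, harmonic = centralities(cycle_edges([0, 1, 2]) + cycle_edges([3, 4, 5, 6, 7]))
print(harmonic[0], harmonic[3])  # 2.0 3.0
print(closeness[0])              # None
```

In this example, closeness is undefined for every node once there are two components, while harmonic centrality still gives each node a score that depends only on the size of its own cycle. With two disjoint 4-cycles, every node ties at 2.5, so same-size cycles are indistinguishable under this measure.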
Notes
- Thesis statement notes
- Be careful with the wording, to make sure you’re not promising too much or too little, and to make sure it is research
- It could be “There are 4 different measures that can be made over evolving networks”; however, there is no research in this statement
- There could be a notion that they mean something, but that’s tricky to state
- Could have clauses dealing with the efficiency of calculation (usually a relative measure, such as more efficient than brute force), but these should be left for the body of the text and kept out of the thesis statement, since success on them is hard to guarantee.
- Open questions that could come up in the presentation and need to be addressed:
- Are you going to improve it? (the current efficiency / calculation)
- Are you going to prove a lower bound?
- Is your improvement significant in terms of research?
- We should not say anywhere that the dissertation fails if we can’t prove a lower bound. That is bad.
- We should specifically state what we’re going to do. Ex: “We’re going to do this and that.” If we also end up with a proof of a lower bound, that’s great, but it’s extra
- It needs to be more than definition, coding, and implementation
- What’s the research question there?
- It needs to be something that can fail. If it can’t fail, then it’s nothing more than development. It’s NOT research
- The thesis statement needs to be worded in a way that’s meaningful to me
- When I later need to say, “should I be looking at this, or that?” the thesis statement should be my guide:
- do what’s most relevant to the thesis statement
- If not, it might be that I need to update the thesis statement (which changes the direction of the project). In this case, I should take it to the committee and see what they say and whether there are any red flags that are obvious to them.
- Include: Extensions to evolving networks
- They allow us to capture things in our motivating applications that can’t be captured by existing evolving-network models
- Node-identity definitions and repercussions
- The extensions we defined are important to allow access to ___ that are not accessible from current work
- Include?: Measures on the graphs that relate to the semantics of our motivating examples. This is tricky
- How to know when we succeed?
- Collaborator says it’s properly captured. This is NOT a good research plan, so we need a variation of this
- Compare to already agreed upon ways of capturing these changes and semantics. That is, “we capture at least as well as this other method” and we’re faster, quicker, more efficient, or better in some way
- We might not have these comparisons to current state of the art / agreed upon ways since we’re doing new things, like the Mormon marriages
- Might need to build up a straw man argument in the text to show that we can’t do this.
- We don’t have the ability to do this, so say we’re not going to do that (“Usually people would compare this to some other agreed-upon way, but there aren’t agreed-upon ways to do that, so we’re going to approximate it by…?”)
- Thesis statement (possibility):
- Evolving networks
- extensions
- metrics over those extensions
- motivating examples
- Note, from Worthy: This is still my dissertation. Don’t try to make it what Worthy’s saying (or try to figure out what that is). Make it your own! What’s of interest to you?
- Comments to look for from committee:
- This is only implementation, and it’s not research
- You think that’s interesting, but we don’t. It’s not interesting in CS.
- You don’t have a way to establish it, and they don’t think that counts (as a dissertation in CS). Specifically, I wrote: “You don’t have a way to establish it that they don’t think counts”
- One way to get around (or temper) a professor who wants a rigorous mathematical proof (of a lower bound or something similar) is to have empirical evidence:
- Create an empirical framework (multiple families of graphs), carry out the measurements over these families of graphs, and compare them
- Need to worry about the synthetic data having the results we’re looking for baked in. So, we’ll need to craft synthetic graphs carefully.
- We’d want to make sure that the families of graphs are not skewed to our metrics/measures, but are a good representative sample (like the k-p graphs of the axiom paper?)
- This might be hard with real-world data, and simpler with synthetic data.
- Possible justifications for a choice of synthetic data design:
- It’s current best practice (or state of the art)
- or, theoretically, it’s good to do this because ___.
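For the empirical framework (multiple families of graphs, the same measurement applied to each, then a comparison), a minimal Python sketch. The two families (Erdős–Rényi G(n, p) and a ring lattice with random rewiring, loosely modeled on Watts–Strogatz) and the measure (fraction of nodes in the largest connected component) are placeholders for illustration, not the families or measures the dissertation will use. All function names are hypothetical, and the graphs are static, not evolving.

```python
import random
import statistics
from collections import deque

def erdos_renyi(n, p, rng):
    """G(n, p): each of the n*(n-1)/2 possible undirected edges is present with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def rewired_ring(n, k, beta, rng):
    """Ring lattice (each node linked to its k nearest neighbours on each side); each lattice
    edge is redirected to a random non-neighbour with probability beta."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for offset in range(1, k + 1):
            v = (u + offset) % n
            if rng.random() < beta:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    v = rng.choice(candidates)
            adj[u].add(v)
            adj[v].add(u)
    return adj

def largest_component_fraction(adj):
    """Placeholder measure: fraction of nodes in the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)

def run_framework(families, measure, samples=20, seed=0):
    """Apply one measure to several sampled graphs per family; report (mean, stdev)."""
    rng = random.Random(seed)
    results = {}
    for name, generate in families.items():
        values = [measure(generate(rng)) for _ in range(samples)]
        results[name] = (statistics.mean(values), statistics.stdev(values))
    return results

families = {
    "G(n=100, p=0.02)": lambda rng: erdos_renyi(100, 0.02, rng),
    "ring(n=100, k=2, beta=0.1)": lambda rng: rewired_ring(100, 2, 0.1, rng),
}
for name, (mean, sd) in run_framework(families, largest_component_fraction).items():
    print(f"{name}: mean={mean:.3f} sd={sd:.3f}")
```

Fixing the seed keeps runs reproducible, and reporting the spread within each family (not just the mean) helps show whether a difference between families is meaningful. Whether a family has the result baked in is still a design question that the code itself cannot answer.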