As Director of Data Science for sqrrl , I’m always looking to push the limits of data theory and application. Recently I attended the 30th meeting of New York Area Theory Day, a semi-annual seminar put on by Columbia University’s Department of Computer Science. This years schedule included four interesting talks, which I’ve summarized below, and which we’ll be thinking about @sqrrl_inc on how to apply to help our customers make the most of their Big Data efforts.
Professor Daniel Spielman of Yale gave a talk about graph sparsifiers. For those of you unfamiliar, a sparsifier utilizes techniques to take a large graph G and create a graph H with the same nodes but with many fewer edges, O(n log n) or O(n) for example. The graph H has some very nice properties, like having the same communities, eigenvalues, etc. of the original graph G. Sparsifiers are quite relevant to web-scale data. Huge graphs can be analyzed by reducing the graph in scale while retaining the essential properties of the graph. Dr. Spielman didn’t talk much about the scalability of these algorithms to web-scale graphs, but we’re thinking about this actively for practical applications.
Professor Vijay Vazirani of Georgia Tech gave a talk about solving complicated Nash games in the context of both economic consumption (the usual case) and production (a less studied case) and the associated complexity of these solutions. I believe that game theory has wide reaching applications for our customers at sqrrl. I think there could be some very interesting applications of high-dimensional game theory to the realm of practical data science.
Professor Maria Chudnovsky from Columbia followed with a talk about some interesting results in translating local graph properties (“this graph has none of this small graph in it”) to global results (“this graph has no clique or stable set of size less than log |V|). This area of theory encompasses the Erdos-Hajnal Conjecture and its derivatives, and most general cases of these statements are not yet proven. However, the theory has a very rich and interesting set of proved lemmas and open problems. I’m unsure still how to apply these results to practical data problems, but will be actively following the space to see how others begin to make such applications.
Professor Sanjeev Khanna of the University of Pennsylvania finished with a talk about the state of the art in the edge-disjoint paths problem: given sets of pairs of sources and sinks, we wish to find the most paths through the graph between the sources and sinks that don’t share an edge. Many practical problems can potentially be cast as variants of this problem. The problem is NP-Hard even in very restricted cases but there are some practical approximations for the problem, especially when a limited number of shared edges are allowed. In the construction of these approximation algorithms there are some interesting constructs that could by themselves be useful for practical graph theory and data science. An example would be grid embeddings present in planar graphs that are used to construct approximate answers.
A lot of great theory was packed into just a few hours at Theory Day. If anyone would like to share their take on the event, please send me tweets @_SecretStache_ or send me an email at firstname.lastname@example.org