[Defense] Extensible Graph Analytics for Large-scale Data Science

Tuesday, April 5, 2022

11:45 am - 12:45 pm

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Xiantian Zhou
will defend her proposal
Extensible Graph Analytics for Large-scale Data Science


Abstract

Graph analytics require specialized storage and algorithms, setting them apart from machine learning. Currently, the best approaches to analyzing "big graphs" either work entirely in main memory or require building so-called graph engines. SQL-based solutions fall somewhere in between: they are not as comprehensive as memory-based solutions (in terms of the breadth of graph problems covered), but they are competitive with graph engines (slightly slower on some problems). We first propose optimized SQL algorithms to compute complex graph metrics such as triangle counts, betweenness centrality, and diameter on distributed DBMSs. Then, we develop a general function based on a semiring algorithm. The function can help solve many graph problems, and it also works for graphs that do not fit in main memory. It is written in C++ but can be easily called from Python. Finally, we explore a fourth, but natural, alternative: studying how to program graph algorithms within the Python ecosystem while following database system principles. We thereby present a solution inspired by previous research on analyzing graphs with SQL queries. Our solution is based on a general semiring operator, which allows easy programming of several graph algorithms by swapping functions. We study how to optimize our operator as a primitive query, treating Python functions as basic database operators. Even though our solution cannot compete with graph engines when analyzing massive graphs, it can be an acceptable solution for analyzing graphs on an average computer today, without main memory limitations. Moreover, we expect our solution to become more viable as hardware gets faster and cheaper and Python becomes more popular.
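To illustrate what "swapping functions" in a semiring operator can mean, here is a minimal Python sketch. It is not the implementation from the proposal: the names `semiring_step` and `fixpoint`, the edge-list layout, and the in-memory dictionaries are all assumptions made for this example. The same loop computes shortest-path distances under the (min, +) semiring and reachability under the (or, and) semiring.

```python
import math
import operator


def semiring_step(edges, vec, add, mul, zero):
    """One edge-list 'matrix-vector product' under the semiring (add, mul).

    edges: iterable of (src, dst, weight) triples (hypothetical layout).
    vec:   dict mapping vertex -> current value; missing vertices count as zero.
    """
    out = {}
    for src, dst, weight in edges:
        x = vec.get(src, zero)
        if x == zero:
            continue  # zero annihilates under mul, so this edge contributes nothing
        out[dst] = add(out.get(dst, zero), mul(x, weight))
    return out


def fixpoint(edges, start, add, mul, zero, one):
    """Repeat the semiring step from `start` until the values stop changing."""
    vec = {start: one}
    while True:
        step = semiring_step(edges, vec, add, mul, zero)
        new = dict(vec)
        for vertex, value in step.items():
            new[vertex] = add(new.get(vertex, zero), value)
        if new == vec:
            return vec
        vec = new


# A small example graph with non-negative edge weights (made-up data).
edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1), ("e", "f", 1)]

# Shortest paths from "a": add = min, mul = +, zero = infinity, one = 0.
dist = fixpoint(edges, "a", min, operator.add, math.inf, 0)

# Reachability from "a": add = or, mul = and, zero = False, one = True.
bool_edges = [(s, d, True) for s, d, _ in edges]
reach = fixpoint(bool_edges, "a", operator.or_, operator.and_, False, True)

print(dist)   # distances for vertices reachable from "a"
print(reach)  # vertices reachable from "a"
```

Only the four arguments `add`, `mul`, `zero`, and `one` change between the two calls; the traversal logic is shared. This sketch keeps everything in memory and assumes non-negative weights so the loop terminates, whereas the abstract describes an operator that also handles graphs larger than main memory.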


Tuesday, April 5, 2022
11:45AM - 1:45PM CT
PGH 392

Dr. Carlos Ordonez, dissertation advisor

Faculty, students and the general public are invited.

Doctoral Proposal Defense