[Defense] Extensible Graph Analytics for Large-scale Data Science
Tuesday, April 5, 2022
11:45 am - 12:45 pm
In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Xiantian Zhou
will defend her proposal
Extensible Graph Analytics for Large-scale Data Science
Abstract
Graph analytics requires specialized storage and algorithms, which sets it apart from machine learning. Currently, the best approaches to analyzing “big graphs” either work entirely in main memory or require building so-called graph engines. SQL-based solutions fall somewhere in between: they are not as comprehensive as memory-based solutions (in terms of the breadth of graph problems they cover), but they are competitive with graph engines (slightly slower on some problems). We first propose optimized SQL algorithms to compute complex graph metrics such as triangle counts, betweenness centrality, and diameter on distributed DBMSs. Then, we develop a general C++ function based on a semiring algorithm. The function can help solve many graph problems, and it also works for graphs that do not fit in main memory. Although the function is written in C++, it can easily be called from Python. Finally, we explore a fourth, but natural, alternative: studying how to program graph algorithms within the Python ecosystem while following database system principles. We present a solution inspired by previous research on analyzing graphs with SQL queries. Our solution is based on a general semiring operator, which allows several graph algorithms to be programmed easily by swapping functions. We study how to optimize our operator as a primitive query, treating Python functions as basic database operators. Even though our solution cannot compete with graph engines when analyzing massive graphs, it can be an acceptable way to analyze graphs on an average computer today, without main memory limitations. Moreover, we expect our solution to become more viable as hardware gets faster and cheaper and as Python becomes more popular.
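To illustrate what "swapping functions" in a semiring operator can look like, here is a minimal in-memory sketch in Python. It is not the function developed in the dissertation: the names semiring_step and iterate, the edge-list layout, and the fixed iteration cap are all assumptions made for illustration, and the sketch does not address graphs larger than main memory.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical edge-list layout: (source, destination, weight).
Edge = Tuple[int, int, float]


def semiring_step(edges: List[Edge], vec: Dict[int, Hashable],
                  combine: Callable, reduce_: Callable) -> Dict[int, Hashable]:
    """One generalized matrix-vector product over an edge list.

    combine plays the role of multiplication and reduce_ the role of
    addition in the chosen semiring.
    """
    out: Dict[int, Hashable] = {}
    for src, dst, weight in edges:
        if src not in vec:
            continue  # vertex has no value yet, so it contributes nothing
        candidate = combine(vec[src], weight)
        out[dst] = candidate if dst not in out else reduce_(out[dst], candidate)
    return out


def iterate(edges: List[Edge], start: Dict[int, Hashable],
            combine: Callable, reduce_: Callable, max_iters: int = 10):
    """Repeat semiring_step until the state stops changing (or the cap is hit)."""
    state = dict(start)
    for _ in range(max_iters):
        step = semiring_step(edges, state, combine, reduce_)
        new_state = dict(state)
        for vertex, value in step.items():
            new_state[vertex] = (value if vertex not in new_state
                                 else reduce_(new_state[vertex], value))
        if new_state == state:
            break
        state = new_state
    return state


edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]

# (min, +) semiring: single-source shortest path lengths from vertex 0.
dist = iterate(edges, {0: 0.0}, combine=lambda d, w: d + w, reduce_=min)

# Boolean (or, and) semiring: vertices reachable from vertex 0.
reach = iterate(edges, {0: True}, combine=lambda r, w: r,
                reduce_=lambda a, b: a or b)
```

The two calls differ only in the pair of functions passed in, while the traversal code is shared. That is the design idea the abstract attributes to a semiring operator.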
Tuesday, April 5, 2022
11:45AM - 1:45PM CT
PGH 392
Dr. Carlos Ordonez, dissertation advisor
Faculty, students and the general public are invited.
