Sometimes it is useful to extract table names from an SQL statement, for example if you are trying to figure out dependencies for your Hive or BigQuery (or whatever) tables.
It is actually a lot simpler than it seems and you don’t need to write your own SQL parser or find one out there. In SQL table names always follow the FROM and JOIN keywords. So all you have to do is split the statemement into tokens, and scan the list for any mention of FROM or JOIN and grab the next token.
Here is a very simplistic Python function that does this using regular expressions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
|
This is obviously not perfect, for example in BigQuery there is a
possibility that what follows SELECT
is a UDF name, but I’ll leave
working around that as an exercise for the reader.