
Not able to query glue/Athena views ['java.lang.IllegalArgumentException: Can not create a Path from an empty string;'] #29

Open
mvaniterson opened this issue May 22, 2020 · 19 comments


@mvaniterson

I'm running an EMR cluster with the 'AWS Glue Data Catalog as the Metastore for Hive' option enabled.
Connecting through a Spark notebook works fine, e.g.

spark.sql("show databases")
spark.catalog.setCurrentDatabase(<databasename>)
spark.sql("""select * from <table> limit 10""").show()

Everything works as expected, but querying a view produces the following error:

spark.sql("""select * from <view> limit 10""").show()

An error was encountered:
'java.lang.IllegalArgumentException: Can not create a Path from an empty string;'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'java.lang.IllegalArgumentException: Can not create a Path from an empty string;'

I guess that since views are not stored anywhere, I have to specify a temp path somewhere, but I cannot find out how.

@sbottelli

I have the same issue and get the same error as @mvaniterson.
How can we solve it?

@jsmithnoble

I am encountering the same issue using only Glue and the spark.sql API.

@bbenzikry

bbenzikry commented Sep 1, 2020

I may be late to the party, but I hope this helps someone who runs into one of these cryptic errors (we encountered it during table creation, when the location was not defined properly in the catalog).

https://docs.databricks.com/data/metastores/aws-glue-metastore.html#troubleshooting

> Accessing tables and views created by other systems, such as AWS Athena or Presto, may or may not work in Databricks Runtime or Spark. This is not supported.
> While they may sometimes work, such as when the table is a Hive-compatible one, others may fail with cryptic error messages. For example, accessing a view created by Athena, Databricks Runtime, or Spark may throw an exception like:
> IllegalArgumentException: Can not create a Path from an empty string
> That is because Athena and Presto store view metadata in a different format than what Databricks Runtime and Spark expect.

Personally, we create a Delta table over the same path for Spark/Spark SQL and use Athena for generic querying to circumvent this.
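To make the quoted explanation concrete: a quick way to check whether a given view's metadata is in the Presto format (and therefore unreadable by Spark) is to fetch its ViewOriginalText from Glue and decode it. A minimal sketch; the helper name is mine, and the commented-out boto3 call assumes configured AWS credentials:

```python
import base64
import json

# An Athena/Presto view is stored in Glue with a ViewOriginalText of the form:
#   /* Presto View: <base64-encoded JSON> */
# Spark cannot parse this, hence the "Path from an empty string" error.

def decode_presto_view(view_original_text: str) -> dict:
    """Extract and decode the Presto view definition from Glue metadata."""
    prefix, suffix = "/* Presto View: ", " */"
    if not (view_original_text.startswith(prefix) and view_original_text.endswith(suffix)):
        raise ValueError("Not a Presto-format view")
    payload = view_original_text[len(prefix):-len(suffix)]
    return json.loads(base64.b64decode(payload))

# Against a real catalog (requires AWS credentials):
# import boto3
# vot = boto3.client("glue").get_table(
#     DatabaseName="YOUR_DB", Name="YOUR_VIEW")["Table"]["ViewOriginalText"]
# print(decode_presto_view(vot)["originalSql"])

# Self-contained demo with a hand-built example:
definition = {"originalSql": "SELECT 1", "catalog": "awsdatacatalog", "schema": "my_db"}
encoded = "/* Presto View: " + base64.b64encode(json.dumps(definition).encode()).decode() + " */"
print(decode_presto_view(encoded)["originalSql"])  # SELECT 1
```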

@kironp

kironp commented Sep 25, 2020

I have been investigating this exact issue too. @bbenzikry, would you please explain a bit more about the "delta table" workaround? For now I have to create two separate views, one from Spark and another from Athena, since these are not mutually compatible.

@bbenzikry

bbenzikry commented Sep 26, 2020

Hi @kironp, sorry for not replying sooner.
Our delta table use is not a workaround, it's our main approach for working with tables.

Our method is similar to what you said you already tried. We don't consume Athena views from Spark at all. We use the same Glue catalog and create two table definitions and views - one for Delta ( https://github.com/delta-io/delta ) and one for Athena.

Both definitions are configured to use the same path by generating an Athena table from the delta manifest ( https://docs.delta.io/0.7.0/presto-integration.html )
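For readers unfamiliar with that manifest step, the shape of it is roughly the following. This is a sketch, not the commenter's exact code: the table/column names and paths are placeholders, the Spark part is shown commented out because it needs a live Spark session with Delta, and the DDL classes follow the Delta/Presto integration docs linked above:

```python
# 1) From Spark, generate a symlink manifest for the Delta table
#    (requires a SparkSession with the delta package installed):
# from delta.tables import DeltaTable
# DeltaTable.forPath(spark, "s3://bucket/path/table").generate("symlink_format_manifest")

# 2) Register an Athena table over that manifest. A helper that builds the
#    CREATE EXTERNAL TABLE statement described in the Delta docs:
def athena_ddl_for_delta(table: str, columns: dict, delta_root: str) -> str:
    """Build Athena DDL reading a Delta table through its symlink manifest."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' "
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat' "
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' "
        f"LOCATION '{delta_root}/_symlink_format_manifest/'"
    )

ddl = athena_ddl_for_delta("my_db.events", {"id": "bigint", "name": "string"},
                           "s3://bucket/path/table")
print(ddl)
```

Both engines then read the same Parquet files: Spark through the Delta log, Athena through the manifest.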

@abdulbasitds

I have created a Spark cluster with 'Use AWS Glue Data Catalog for table metadata' enabled.

Now I can select the database and list its tables as follows:

spark.sql("use my_db_name")
spark.sql("show tables").show(truncate=False)
+------------+---------------------------+-----------+
|database    |tableName                  |isTemporary|
+------------+---------------------------+-----------+
|  my_db_name|tabel1                     |false      |
|  my_db_name|desired_table              |false      |
|  my_db_name|tabel3                     |false      |
+------------+---------------------------+-----------+

But accessing an individual table gives the error:

spark.sql("describe desired_table")

'java.lang.IllegalArgumentException: Can not create a Path from an empty string;'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'java.lang.IllegalArgumentException: Can not create a Path from an empty string;'

I have tried to enable the Glue Data Catalog as the Hive metastore as follows:

from pyspark.sql import SparkSession

spark = SparkSession.builder\
            .config("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")\
            .enableHiveSupport()\
            .getOrCreate()

But it didn't work.

@IstvanM

IstvanM commented Nov 3, 2020

For me the same issue happened when I created a view in Athena and then tried to query it in a Glue job via Spark.
I think the issue is that the Glue Catalog handles views in a special way: they are tables, just without a path to a real location (on S3 or alternatives). I couldn't find a workaround so far.

@hojatbay

Is there any update on this?

@mattiamatrix

Same annoying issue here!
Someone mentioned that the view has to be Hive-compatible.
How do I make sure that my view is indeed Hive-compatible?

@sbottelli

I found a solution for this problem!
It is possible to query Athena views over a JDBC connection, e.g. via spark_read_jdbc with the appropriate options and configuration.
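For reference, a sketch of what that JDBC route can look like from PySpark. The driver class, URL format, and option names are assumptions based on the Simba Athena JDBC driver and should be checked against your driver version; the region, staging directory, and view name are placeholders:

```python
def athena_jdbc_options(region: str, s3_staging_dir: str, query: str) -> dict:
    """Build the option map for reading an Athena view over JDBC.

    Sketch only: the driver class and URL follow the Simba Athena JDBC
    driver's conventions and may differ for other driver versions.
    """
    return {
        "url": f"jdbc:awsathena://athena.{region}.amazonaws.com:443",
        "driver": "com.simba.athena.jdbc.Driver",
        "S3OutputLocation": s3_staging_dir,       # Athena needs a staging dir for results
        "AwsCredentialsProviderClass":
            "com.simba.athena.amazonaws.auth.DefaultAWSCredentialsProviderChain",
        "query": query,                           # Spark >= 2.4 supports "query" instead of "dbtable"
    }

opts = athena_jdbc_options("eu-west-1", "s3://my-bucket/athena-results/",
                           "select * from my_db.my_view")
# Requires the Athena JDBC jar on the Spark classpath:
# df = spark.read.format("jdbc").options(**opts).load()
print(opts["url"])
```

Note that this routes the query through Athena itself, so the view is evaluated by Presto, sidestepping the metadata-format mismatch entirely (at the cost of per-query Athena billing).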

@IstvanM

IstvanM commented Dec 21, 2020

I also found a workaround that enables both Glue (Spark) and Athena to read the same view from the Glue Catalog. My solution is based on this:
https://stackoverflow.com/questions/56289272/create-aws-athena-view-programmatically/56347331#56347331

TLDR:
It is necessary to create the view in Spark (Glue):
spark.sql("create view YOUR_DB.YOUR_VIEW as select * from SOME_TABLE")

then you can overwrite two properties of that Glue Catalog "view" with Boto3, still in the same Glue job run:

...
import boto3

spark.sql("create view YOUR_DB.YOUR_VIEW as select * from SOME_TABLE")

glue = boto3.client("glue")
view_from_spark = glue.get_table(DatabaseName="YOUR_DB", Name="YOUR_VIEW")
view_from_spark['Table']['Parameters']['presto_view'] = 'true'
# base64_json: Base64-encoded JSON that describes the view schema in Facebook Presto format
view_from_spark['Table']['ViewOriginalText'] = base64_json

glue.update_table(DatabaseName="YOUR_DB", TableInput=view_from_spark['Table'])

Note: you need to do some cleanup on view_from_spark before updating the table.

base64_json should look something like this in the end:

base64_json = '/* Presto View: eyJvcmlnaW5hbFNxbCI6IihcbiAgIFNFTEVDVCAqXG4gICBGUk9NXG4gICAgIHJhdy51bXNfdXNlcnNcbikgIiwiY2F0YWxvZyI6ImF3c2RhdGFjYXRhbG9nIiwic2NoZW1hIjoiY2xlY ... == */'
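For illustration, such a wrapper string can be produced from a view-definition JSON like this. This is a sketch: the field names shown (originalSql, catalog, schema, columns) are the ones visible when decoding an Athena-created view, and the safest source for the payload remains a view actually created by Athena rather than hand-built JSON:

```python
import base64
import json

def presto_view_text(original_sql: str, schema: str, columns: list) -> str:
    """Wrap a Presto view definition in the '/* Presto View: ... */' marker
    that Athena stores in Glue's ViewOriginalText field (illustrative sketch)."""
    definition = {
        "originalSql": original_sql,
        "catalog": "awsdatacatalog",
        "schema": schema,
        "columns": columns,  # e.g. [{"name": "id", "type": "bigint"}]
    }
    payload = base64.b64encode(json.dumps(definition).encode()).decode()
    return f"/* Presto View: {payload} */"

base64_json = presto_view_text("SELECT * FROM raw.ums_users", "my_db",
                               [{"name": "id", "type": "bigint"}])
print(base64_json[:20])  # /* Presto View: eyJv
```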

After this you can do both:

Athena: select * from YOUR_DB.YOUR_VIEW
Glue Spark: spark.sql("select * from YOUR_DB.YOUR_VIEW").show(10)

It is a hacky workaround, and I am not sure it would work for all use cases... Luckily it works for us, as we rely mostly on Spark and use Athena just for ad hoc querying.

I will post my generic solution to this once it is ready.

@IstvanM

IstvanM commented Dec 21, 2020

So here is my generic workaround. Keep in mind that the query has to be both Spark- and Presto-compatible; I suggest keeping the SQL of the views as simple as possible.
How it works:

  • it first creates an Athena view from the query
  • gets the Presto schema in Base64 from that view via Boto3
  • deletes the Athena view
  • creates a Spark-based view for the same query
  • updates the Spark view with the Presto schema so Athena can read it as well

import boto3
import time


def execute_blocking_athena_query(query: str):
    athena = boto3.client("athena")
    res = athena.start_query_execution(QueryString=query)
    execution_id = res["QueryExecutionId"]
    while True:
        res = athena.get_query_execution(QueryExecutionId=execution_id)
        state = res["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return
        if state in ["FAILED", "CANCELLED"]:
            raise Exception(res["QueryExecution"]["Status"]["StateChangeReason"])
        time.sleep(1)


def create_cross_platform_view(db: str, table: str, query: str, spark_session):
    glue = boto3.client("glue")
    glue.delete_table(DatabaseName=db, Name=table)
    create_view_sql = f"create view {db}.{table} as {query}"
    execute_blocking_athena_query(create_view_sql)
    presto_schema = glue.get_table(DatabaseName=db, Name=table)["Table"][
        "ViewOriginalText"
    ]
    glue.delete_table(DatabaseName=db, Name=table)

    spark_session.sql(create_view_sql).show()
    spark_view = glue.get_table(DatabaseName=db, Name=table)["Table"]
    for key in [
        "DatabaseName",
        "CreateTime",
        "UpdateTime",
        "CreatedBy",
        "IsRegisteredWithLakeFormation",
        "CatalogId",
    ]:
        if key in spark_view:
            del spark_view[key]
    spark_view["ViewOriginalText"] = presto_schema
    spark_view["Parameters"]["presto_view"] = "true"
    spark_view = glue.update_table(DatabaseName=db, TableInput=spark_view)

spark_session = ... # insert code to create the session
create_cross_platform_view("YOUR_DB", "TEST_VIEW", "select * from YOUR_DB.YOUR_TABLE", spark_session)

@talalryz

talalryz commented Apr 1, 2021

Thank you @IstvanM! Your solution works well.

I had a couple of questions:

  1. What is the reason for deleting the keys "DatabaseName", "CreateTime", "UpdateTime", "CreatedBy", "IsRegisteredWithLakeFormation", and "CatalogId"?

  2. I couldn't find any documentation for the keys ViewOriginalText (except for this) and setting
    spark_view["Parameters"]["presto_view"] = "true". Do you think this is something that AWS might easily change in a future update, or are there reasons to believe the opposite?

@IstvanM

IstvanM commented Jun 7, 2021

@talalryz I believe they could change it, but it is very unlikely, because it would break a lot of views for a lot of users.
As for your other question, I don't exactly remember why we removed those properties. You can try keeping them; they might not cause any issue. The main problem with our solution is that there are many SQL language differences between Athena and Spark, like:

date_diff(...) vs. datediff (...)
varchar vs. string 
etc.

So we use this solution in a limited way.

@MarsSu0618

MarsSu0618 commented Jul 1, 2021

@sbottelli How did you set your options and configurations?

@vuchetichbalint

I'm not sure why, but for me this error message seems to be notebook-related, and it only means that "something is wrong".

For example:

  • With Spark notebook: Can not create a Path from an empty string ...
  • With spark-submit CLI: org.apache.hadoop.hive.ql.metadata.HiveException: Duplicate column name foobar in the table definition.

And after I resolved this issue, it also works from the notebook as well.

@appunni-m

Hey, did you try setting the Location option on the database when creating it? I got the same error when using a database without the Location option.

@grzegorzplech

I was struggling with a similar issue: in AWS EMR (Hive and Spark/PySpark), I was trying to create a table in the Glue Metastore (using .saveAsTable() and .bucketBy()).

I was using the "default" database for my tests.

The problem was an empty default location for the database...
[screenshot: the Location field of the database was empty]

I set it to s3://my-data-bucket/path/to/tables
... and it works!

I've followed this answer https://repost.aws/questions/QU5Vg4fVMMT02Qo3NM21CrCg

The default database location must be set, regardless of the metastore. Good luck!
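For anyone hitting the same thing from code rather than the console: the database's default location can also be set through the Glue API. A sketch; the bucket/path is a placeholder, and the commented-out update_database call requires AWS credentials:

```python
def database_input_with_location(name: str, location_uri: str) -> dict:
    """Build the DatabaseInput payload for glue.update_database, setting the
    default location that .saveAsTable() falls back to (illustrative sketch)."""
    return {"Name": name, "LocationUri": location_uri}

db_input = database_input_with_location("default", "s3://my-data-bucket/path/to/tables")
# import boto3
# boto3.client("glue").update_database(Name="default", DatabaseInput=db_input)
print(db_input["LocationUri"])
```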

@hiltercoty

> I was struggling with a similar issue: in AWS EMR (Hive and Spark/PySpark), I was trying to create a table in the Glue Metastore (using .saveAsTable() and .bucketBy()).
>
> I was using the "default" database for my tests.
>
> The problem was an empty default location for the database...
>
> I set it to s3://my-data-bucket/path/to/tables ... and it works!
>
> I've followed this answer https://repost.aws/questions/QU5Vg4fVMMT02Qo3NM21CrCg
>
> The default database location regardless of the metastore must be set. Good luck!

I am using Glue interactive sessions, and this absolutely doesn't work in my case. I set the Location value for all databases but still get this error.
