python - SparkSQL sql syntax for nth item in array -
i have json object has unfortunate combination of nesting , arrays. not totally obvious how query spark sql.
here sample object:
{ stuff: [ {a:1,b:2,c:3} ] }
so, in javascript, value c
, i'd write mydata.stuff[0].c
and in spark sql query, if array wasn't there, i'd able use dot notation:
select stuff.c blah
but can't, because innermost object wrapped in array.
i've tried:
select stuff.0.c blah // fail select stuff.[0].c blah // fail
so, magical way select data? or supported yet?
it not clear mean json object lets consider 2 different cases:
an array of structs
import tempfile path = tempfile.mktemp() open(path, "w") fw: fw.write('''{"stuff": [{"a": 1, "b": 2, "c": 3}]}''') df = sqlcontext.read.json(path) df.registertemptable("df") df.printschema() ## root ## |-- stuff: array (nullable = true) ## | |-- element: struct (containsnull = true) ## | | |-- a: long (nullable = true) ## | | |-- b: long (nullable = true) ## | | |-- c: long (nullable = true) sqlcontext.sql("select stuff[0].a df").show() ## +---+ ## |_c0| ## +---+ ## | 1| ## +---+
an array of maps
# note: schema inference dictionaries has been deprecated # don't use in practice df = sc.parallelize([{"stuff": [{"a": 1, "b": 2, "c": 3}]}]).todf() df.registertemptable("df") df.printschema() ## root ## |-- stuff: array (nullable = true) ## | |-- element: map (containsnull = true) ## | | |-- key: string ## | | |-- value: long (valuecontainsnull = true) sqlcontext.sql("select stuff[0]['a'] df").show() ## +---+ ## |_c0| ## +---+ ## | 1| ## +---+
Comments
Post a Comment