python - SparkSQL sql syntax for nth item in array -

- January 15, 2013

i have json object has unfortunate combination of nesting , arrays. not totally obvious how query spark sql.

here sample object:

{   stuff: [     {a:1,b:2,c:3}   ] }

so, in javascript, value c, i'd write mydata.stuff[0].c

and in spark sql query, if array wasn't there, i'd able use dot notation:

select stuff.c blah

but can't, because innermost object wrapped in array.

i've tried:

select stuff.0.c blah // fail select stuff.[0].c blah // fail

so, magical way select data? or supported yet?

it not clear mean json object lets consider 2 different cases:

an array of structs

import tempfile      path = tempfile.mktemp() open(path, "w") fw:      fw.write('''{"stuff": [{"a": 1, "b": 2, "c": 3}]}''') df = sqlcontext.read.json(path) df.registertemptable("df")  df.printschema() ## root ##  |-- stuff: array (nullable = true) ##  |    |-- element: struct (containsnull = true) ##  |    |    |-- a: long (nullable = true) ##  |    |    |-- b: long (nullable = true) ##  |    |    |-- c: long (nullable = true)  sqlcontext.sql("select stuff[0].a df").show()  ## +---+ ## |_c0| ## +---+ ## |  1| ## +---+

an array of maps

# note: schema inference dictionaries has been deprecated # don't use in practice df = sc.parallelize([{"stuff": [{"a": 1, "b": 2, "c": 3}]}]).todf() df.registertemptable("df")  df.printschema() ## root ##  |-- stuff: array (nullable = true) ##  |    |-- element: map (containsnull = true) ##  |    |    |-- key: string ##  |    |    |-- value: long (valuecontainsnull = true)  sqlcontext.sql("select stuff[0]['a'] df").show() ## +---+ ## |_c0| ## +---+ ## |  1| ## +---+

see querying spark sql dataframe complex types

Search This Blog

Camp

python - SparkSQL sql syntax for nth item in array -

Comments

Post a Comment

Popular posts from this blog

sql - VB.NET Operand type clash: date is incompatible with int error -

SVG stroke-linecap doesn't work for circles in Firefox? -

python - TypeError: Scalar value for argument 'color' is not numeric in openCV -