python - SparkSQL sql syntax for nth item in array -


i have json object has unfortunate combination of nesting , arrays. not totally obvious how query spark sql.

here sample object:

{   stuff: [     {a:1,b:2,c:3}   ] } 

so, in javascript, value c, i'd write mydata.stuff[0].c

and in spark sql query, if array wasn't there, i'd able use dot notation:

select stuff.c blah 

but can't, because innermost object wrapped in array.

i've tried:

select stuff.0.c blah // fail select stuff.[0].c blah // fail 

so, magical way select data? or supported yet?

it not clear mean json object lets consider 2 different cases:

  1. an array of structs

    import tempfile      path = tempfile.mktemp() open(path, "w") fw:      fw.write('''{"stuff": [{"a": 1, "b": 2, "c": 3}]}''') df = sqlcontext.read.json(path) df.registertemptable("df")  df.printschema() ## root ##  |-- stuff: array (nullable = true) ##  |    |-- element: struct (containsnull = true) ##  |    |    |-- a: long (nullable = true) ##  |    |    |-- b: long (nullable = true) ##  |    |    |-- c: long (nullable = true)  sqlcontext.sql("select stuff[0].a df").show()  ## +---+ ## |_c0| ## +---+ ## |  1| ## +---+ 
  2. an array of maps

    # note: schema inference dictionaries has been deprecated # don't use in practice df = sc.parallelize([{"stuff": [{"a": 1, "b": 2, "c": 3}]}]).todf() df.registertemptable("df")  df.printschema() ## root ##  |-- stuff: array (nullable = true) ##  |    |-- element: map (containsnull = true) ##  |    |    |-- key: string ##  |    |    |-- value: long (valuecontainsnull = true)  sqlcontext.sql("select stuff[0]['a'] df").show() ## +---+ ## |_c0| ## +---+ ## |  1| ## +---+ 

see querying spark sql dataframe complex types


Comments

Popular posts from this blog

sql - VB.NET Operand type clash: date is incompatible with int error -

SVG stroke-linecap doesn't work for circles in Firefox? -

python - TypeError: Scalar value for argument 'color' is not numeric in openCV -