python - Reduce by multiple combinations of keys in Spark Streaming
I have a Spark job that listens to a Kinesis stream and performs aggregations on it. The RDDs in the DStream contain records of the following form: {"name": "abc", "sex": "m", "age": 25, "points": 2}, .....
I want to calculate the sum of points across different combinations of keys, i.e. name, sex, age, sex_age, name_sex, etc.
The way I am doing it right now is as follows:
count_by_name = instream.map(lambda x: (x.get('name'), x.get('points'))).reduceByKey(lambda x, y: x + y)
count_by_age = instream.map(lambda x: (x.get('age'), x.get('points'))).reduceByKey(lambda x, y: x + y)
count_by_age_and_sex = instream.map(lambda x: (str(x.get('age')) + '_' + x.get('sex'), x.get('points'))).reduceByKey(lambda x, y: x + y)
and so on for each key. This is not an elegant way to do it, since there can be more keys and more combinations, and I would have to write a separate reducer for each combination.
Is there a better way that allows for parallelization or modularity of the code?
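For illustration, something along these lines is the kind of generic approach I have in mind (the key_combinations list and the fan_out helper are hypothetical names, not existing code): each record is fanned out into one (composite_key, points) pair per combination with flatMap, and a single reduceByKey sums the points for all combinations at once.

# Hypothetical list of the key combinations I want to aggregate over.
key_combinations = [('name',), ('sex',), ('age',), ('sex', 'age'), ('name', 'sex')]

def fan_out(record):
    # Emit one ((combination, composite_key), points) pair per combination,
    # so that a single reduceByKey can sum points for every combination.
    for combo in key_combinations:
        composite_key = '_'.join(str(record.get(k)) for k in combo)
        yield ((combo, composite_key), record.get('points'))

# instream is the DStream of parsed records coming from Kinesis.
counts = instream.flatMap(fan_out).reduceByKey(lambda a, b: a + b)

Is a fan-out like this the recommended pattern, or is there a more idiomatic Spark Streaming way to do it?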