python - Reduce by multiple combinations of keys in Spark Streaming


I have a Spark job that listens to a Kinesis stream and runs aggregations on it. The RDDs in the DStream contain records with the following structure: {"name": "abc", "sex": "m", "age": 25, "points": 2}, .....

I want to calculate the sum of points across different combinations of keys, i.e. name, sex, age, sex_age, name_sex, etc.

The way I'm doing it right now is as follows:

count_by_name = instream.map(lambda x: (x.get('name'), x.get('points'))) \
                        .reduceByKey(lambda x, y: x + y)
count_by_age = instream.map(lambda x: (x.get('age'), x.get('points'))) \
                       .reduceByKey(lambda x, y: x + y)
count_by_age_and_sex = instream.map(lambda x: (str(x.get('age')) + '_' + x.get('sex'), x.get('points'))) \
                               .reduceByKey(lambda x, y: x + y)

and so on for each key. This isn't an elegant way to do it, since there can be more keys and many more combinations, and I'd have to write a separate reducer for each one.

Is there a better way to achieve parallelization or modularity in this code?
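For context, one direction I've been considering (an untested sketch; KEY_FIELDS and explode are just placeholder names) is to emit one (composite_key, points) pair per key combination in a single flatMap, then run one reduceByKey over the whole stream:

from itertools import combinations

# key combinations to aggregate over; extending this list would replace
# writing a new reducer per combination
KEY_FIELDS = ['name', 'sex', 'age']
COMBOS = [c for n in range(1, len(KEY_FIELDS) + 1)
          for c in combinations(KEY_FIELDS, n)]

def explode(record):
    # one (composite_key, points) pair per combination, e.g.
    # (('age_sex', '25_m'), 2) for the sample record above
    for combo in COMBOS:
        values = '_'.join(str(record.get(k)) for k in combo)
        yield (('_'.join(combo), values), record.get('points'))

counts = instream.flatMap(explode).reduceByKey(lambda x, y: x + y)

Each result would be keyed by the combination name plus the joined values, so a single DStream carries every aggregate and Spark can still partition the reduce by key. Is something along these lines reasonable, or is there a more idiomatic approach?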

