python - Reduce by multiple combinations of keys in Spark Streaming
I have a Spark job that listens to a Kinesis stream and performs aggregations on it. The RDDs in the DStream contain records of the following form: {"name": "abc", "sex": "m", "age": 25, "points": 2}, .....
I want to calculate the sum of points across different combinations of keys, i.e. name, sex, age, sex_age, name_sex, etc.
The way I am doing it right now is as follows:
count_by_name = instream.map(lambda x: (x.get('name'), x.get('points'))).reduceByKey(lambda x, y: x + y)
count_by_age = instream.map(lambda x: (x.get('age'), x.get('points'))).reduceByKey(lambda x, y: x + y)
count_by_age_and_sex = instream.map(lambda x: (str(x.get('age')) + '_' + x.get('sex'), x.get('points'))).reduceByKey(lambda x, y: x + y)
and so on for each key. This is not an elegant way to do it, since there can be more keys and more combinations, and I would have to write a separate reducer for each combination.
Is there a better way that allows for parallelization or modularity of the code?
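For illustration, something along these lines is the kind of generic approach I have in mind (the key_combinations list and the fan_out helper are hypothetical names, not existing code): each record is fanned out into one (composite_key, points) pair per combination with flatMap, and a single reduceByKey sums the points for all combinations at once.

# Hypothetical list of the key combinations I want to aggregate over.
key_combinations = [('name',), ('sex',), ('age',), ('sex', 'age'), ('name', 'sex')]

def fan_out(record):
    # Emit one ((combination, composite_key), points) pair per combination,
    # so that a single reduceByKey can sum points for every combination.
    for combo in key_combinations:
        composite_key = '_'.join(str(record.get(k)) for k in combo)
        yield ((combo, composite_key), record.get('points'))

# instream is the DStream of parsed records coming from Kinesis.
counts = instream.flatMap(fan_out).reduceByKey(lambda a, b: a + b)

Is a fan-out like this the recommended pattern, or is there a more idiomatic Spark Streaming way to do it?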