python - Reduce by multiple combinations of keys in Spark Streaming


I have a Spark job that listens to a Kinesis stream and runs aggregations on it. The RDDs in the DStream contain records of the following form: {"name": "abc", "sex": "m", "age": 25, "points": 2}, .....
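For context, here is a minimal sketch of how such a DStream might be set up. The app name, stream name, endpoint, region, and intervals below are placeholders, not details from the original post, and the spark-streaming-kinesis-asl package must be on the classpath:

    import json
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

    sc = SparkContext(appName='points-aggregator')  # placeholder app name
    ssc = StreamingContext(sc, 10)                  # 10-second batches (assumption)

    # Placeholder stream name, endpoint URL, and region.
    raw = KinesisUtils.createStream(
        ssc, 'points-aggregator', 'my-kinesis-stream',
        'https://kinesis.us-east-1.amazonaws.com', 'us-east-1',
        InitialPositionInStream.LATEST, 2)

    # Each Kinesis record arrives as a JSON string; parse it into a dict.
    instream = raw.map(json.loads)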

I want to calculate the sum of points across different combinations of keys, e.g. name, sex, age, sex_age, name_sex, etc.

The way I am doing it right now is as follows:

    # After map(), each value is just the points count,
    # so the reducer adds the two values directly.
    count_by_name = instream.map(lambda x: (x.get('name'), x.get('points'))) \
                            .reduceByKey(lambda x, y: x + y)
    count_by_age = instream.map(lambda x: (x.get('age'), x.get('points'))) \
                           .reduceByKey(lambda x, y: x + y)
    count_by_age_and_sex = instream.map(
            lambda x: (str(x.get('age')) + '_' + x.get('sex'), x.get('points'))) \
        .reduceByKey(lambda x, y: x + y)

and so on for each key. This is not an elegant way to do it, since there can be more keys and more combinations, and I would have to write a separate reducer for each combination.

Is there a better way to parallelize this, or to make the code more modular?
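For illustration, one possible generalization (a sketch, not from the original post; KEY_FIELDS and explode_keys are hypothetical names, and it assumes instream holds parsed dicts): enumerate every combination of grouping fields once, emit all of them from a single flatMap, and let one reduceByKey cover every combination at the same time.

    from itertools import combinations

    # Hypothetical list of grouping fields; extend as needed.
    KEY_FIELDS = ['name', 'sex', 'age']

    # Every non-empty combination of grouping fields, built once on the driver.
    KEY_COMBOS = [c for r in range(1, len(KEY_FIELDS) + 1)
                  for c in combinations(KEY_FIELDS, r)]

    def explode_keys(record):
        # Emit one (composite_key, points) pair per key combination.
        for combo in KEY_COMBOS:
            key = tuple((f, record.get(f)) for f in combo)
            yield (key, record.get('points'))

    # One shuffle aggregates every combination at once.
    counts = instream.flatMap(explode_keys).reduceByKey(lambda a, b: a + b)

Using tuple keys such as (('age', 25), ('sex', 'm')) also sidesteps the string-concatenation problem with non-string fields like age.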

