python - How to average columns in a data frame based on the grouping in another data frame
I have two CSV files. The first looks like this:
gene,stem1,stem2,stem3,b1,b2,b3,t1
foo,20,10,11,23,22,79,3
bar,17,13,505,12,13,88,1
qui,17,13,5,12,13,88,3
The second looks like this:
celltype,phenotype
sc,stem1
bc,b2
sc,stem2
sc,stem3
bc,b1
tc,t1
bc,b3
Loaded as data frames, they look like this:
In [5]: import pandas as pd

In [7]: main_df = pd.read_table("http://dpaste.com/2mrrrm3.txt", sep=",")

In [8]: main_df
Out[8]:
  gene  stem1  stem2  stem3  b1  b2  b3  t1
0  foo     20     10     11  23  22  79   3
1  bar     17     13    505  12  13  88   1
2  qui     17     13      5  12  13  88   3

In [11]: source_df = pd.read_table("http://dpaste.com/091pne5.txt", sep=",")

In [12]: source_df
Out[12]:
  celltype phenotype
0       sc     stem1
1       bc        b2
2       sc     stem2
3       sc     stem3
4       bc        b1
5       tc        t1
6       bc        b3
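The dpaste links above may no longer resolve, so here is a minimal sketch that builds the same two data frames from inline copies of the CSV text. The names MAIN_CSV and SOURCE_CSV are hypothetical, and it uses pd.read_csv on an io.StringIO buffer instead of pd.read_table with sep=",", which should parse the same way.

```python
import io

import pandas as pd

# Inline copies of the two CSV files from the question (hypothetical names),
# so the example does not depend on the dpaste URLs.
MAIN_CSV = """gene,stem1,stem2,stem3,b1,b2,b3,t1
foo,20,10,11,23,22,79,3
bar,17,13,505,12,13,88,1
qui,17,13,5,12,13,88,3
"""

SOURCE_CSV = """celltype,phenotype
sc,stem1
bc,b2
sc,stem2
sc,stem3
bc,b1
tc,t1
bc,b3
"""

# read_csv accepts any file-like object, so a StringIO buffer works here.
main_df = pd.read_csv(io.StringIO(MAIN_CSV))
source_df = pd.read_csv(io.StringIO(SOURCE_CSV))

print(main_df)
print(source_df)
```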
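What I want is to average the columns of main_df according to the grouping given in source_df, so that each cell type gets the mean of its phenotype columns. In the end it should look like this: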
     sc              bc              tc
foo  (20+10+11)/3    (23+22+79)/3    3/1
bar  (17+13+505)/3   (12+13+88)/3    1/1
qui  (17+13+5)/3     (12+13+88)/3    3/1
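To pin down the numbers this table asks for, a small sketch can spell out the arithmetic with the column groups written by hand. The groups dict is a hypothetical helper copied manually from source_df; it only reproduces the expected result and does not derive the grouping automatically.

```python
import io

import pandas as pd

MAIN_CSV = """gene,stem1,stem2,stem3,b1,b2,b3,t1
foo,20,10,11,23,22,79,3
bar,17,13,505,12,13,88,1
qui,17,13,5,12,13,88,3
"""

main_df = pd.read_csv(io.StringIO(MAIN_CSV)).set_index("gene")

# Hypothetical hand-written grouping, read off source_df above.
groups = {
    "sc": ["stem1", "stem2", "stem3"],
    "bc": ["b1", "b2", "b3"],
    "tc": ["t1"],
}

# For each cell type, average its columns row by row (axis=1),
# producing one column per cell type, indexed by gene.
expected = pd.DataFrame(
    {celltype: main_df[cols].mean(axis=1) for celltype, cols in groups.items()}
)
print(expected)
```

For example, the sc entry for foo is (20 + 10 + 11) / 3 = 41 / 3, matching the first cell of the table.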
How can I achieve that?
You can convert source_df to a dict that maps each phenotype (a column name in main_df) to its cell type, and then apply that dict to main_df using .groupby() on axis=1:
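main_df.set_index('gene', inplace=True)

# Map each phenotype column name to its cell type, e.g. {'stem1': 'sc', 'b2': 'bc', ...}
col_dict = source_df.set_index('phenotype').squeeze().to_dict()

# Group the columns (axis=1) by that mapping and average within each group.
# Note: groupby(..., axis=1) is deprecated since pandas 2.1.
main_df.groupby(col_dict, axis=1).mean()

             bc          sc  tc
gene
foo   41.333333   13.666667   3
bar   37.666667  178.333333   1
qui   37.666667   11.666667   3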
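Because pandas 2.1 deprecated groupby with axis=1, a sketch of the same idea for newer versions is to transpose, group the rows, and transpose back. This assumes main_df and source_df hold the data from the question; it builds col_dict with dict(zip(...)) rather than squeeze().to_dict(), which should give the same phenotype-to-celltype mapping.

```python
import io

import pandas as pd

MAIN_CSV = """gene,stem1,stem2,stem3,b1,b2,b3,t1
foo,20,10,11,23,22,79,3
bar,17,13,505,12,13,88,1
qui,17,13,5,12,13,88,3
"""

SOURCE_CSV = """celltype,phenotype
sc,stem1
bc,b2
sc,stem2
sc,stem3
bc,b1
tc,t1
bc,b3
"""

main_df = pd.read_csv(io.StringIO(MAIN_CSV)).set_index("gene")
source_df = pd.read_csv(io.StringIO(SOURCE_CSV))

# phenotype (column name in main_df) -> cell type
col_dict = dict(zip(source_df["phenotype"], source_df["celltype"]))

# Transpose so the phenotype columns become row labels, group those rows
# through the dict, average each group, then transpose back so the result
# is genes x cell types.
result = main_df.T.groupby(col_dict).mean().T
print(result)
```

Grouping keys come out sorted, so the columns are bc, sc, tc, as in the answer's output above; the values are floats because mean() always returns floating-point results.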