python - How to average columns in a data frame based on grouping of another dataframe

- July 15, 2012

I have two CSV files. The first looks like this:

    gene,stem1,stem2,stem3,b1,b2,b3,t1
    foo,20,10,11,23,22,79,3
    bar,17,13,505,12,13,88,1
    qui,17,13,5,12,13,88,3

And the second like this:

    celltype,phenotype
    sc,stem1
    bc,b2
    sc,stem2
    sc,stem3
    bc,b1
    tc,t1
    bc,b3

Loaded into pandas, the data frames look like this:

    In [5]: import pandas as pd

    In [7]: main_df = pd.read_table("http://dpaste.com/2mrrrm3.txt", sep=",")

    In [8]: main_df
    Out[8]:
      gene  stem1  stem2  stem3  b1  b2  b3  t1
    0  foo     20     10     11  23  22  79   3
    1  bar     17     13    505  12  13  88   1
    2  qui     17     13      5  12  13  88   3

    In [11]: source_df = pd.read_table("http://dpaste.com/091pne5.txt", sep=",")

    In [12]: source_df
    Out[12]:
      celltype phenotype
    0       sc     stem1
    1       bc        b2
    2       sc     stem2
    3       sc     stem3
    4       bc        b1
    5       tc        t1
    6       bc        b3
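If the dpaste links are not reachable, the same two frames can be rebuilt from the CSV text shown above. This is only a minimal sketch: the names main_csv and source_csv are placeholders introduced here, and pd.read_csv with io.StringIO stands in for the original pd.read_table calls.

```python
import io

import pandas as pd

# CSV text copied from the question; main_csv and source_csv are
# hypothetical names used only for this sketch.
main_csv = """gene,stem1,stem2,stem3,b1,b2,b3,t1
foo,20,10,11,23,22,79,3
bar,17,13,505,12,13,88,1
qui,17,13,5,12,13,88,3
"""

source_csv = """celltype,phenotype
sc,stem1
bc,b2
sc,stem2
sc,stem3
bc,b1
tc,t1
bc,b3
"""

# Parse the in-memory text instead of fetching it from a URL.
main_df = pd.read_csv(io.StringIO(main_csv))
source_df = pd.read_csv(io.StringIO(source_csv))
```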

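What I want to do is average the columns in main_df according to the grouping in source_df, so that each celltype becomes one column holding the mean of its phenotype columns. In the end it should look like this: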

           sc               bc               tc
    foo    (20+10+11)/3     (23+22+79)/3     3/1
    bar    (17+13+505)/3    (12+13+88)/3     1/1
    qui    (17+13+5)/3      (12+13+88)/3     3/1

How can I achieve that?

You can convert source_df into a dict that maps each phenotype to its celltype, then apply it to main_df using .groupby() with axis=1:

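    # Use the gene names as the row index so only numeric columns remain.
    main_df.set_index('gene', inplace=True)
    # Map each phenotype (column name) to its celltype (group label).
    col_dict = source_df.set_index('phenotype').squeeze().to_dict()
    # Group the columns by celltype and average within each group.
    main_df.groupby(col_dict, axis=1).mean()

                 bc          sc  tc
    gene
    foo   41.333333   13.666667   3
    bar   37.666667  178.333333   1
    qui   37.666667   11.666667   3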
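Recent pandas releases (2.1 and later, to my knowledge) deprecate the axis=1 argument to DataFrame.groupby(). A possible workaround, sketched here and not part of the original answer, is to transpose the frame, group the rows, and transpose back. The frames are rebuilt inline from the question's data so the snippet runs on its own; result is a name introduced for this sketch.

```python
import pandas as pd

# Data copied from the question, built inline so no download is needed.
main_df = pd.DataFrame(
    {
        "gene": ["foo", "bar", "qui"],
        "stem1": [20, 17, 17],
        "stem2": [10, 13, 13],
        "stem3": [11, 505, 5],
        "b1": [23, 12, 12],
        "b2": [22, 13, 13],
        "b3": [79, 88, 88],
        "t1": [3, 1, 3],
    }
).set_index("gene")

source_df = pd.DataFrame(
    {
        "celltype": ["sc", "bc", "sc", "sc", "bc", "tc", "bc"],
        "phenotype": ["stem1", "b2", "stem2", "stem3", "b1", "t1", "b3"],
    }
)

# Map each phenotype (a column of main_df) to its celltype.
col_dict = source_df.set_index("phenotype")["celltype"].to_dict()

# Transpose so phenotypes become rows, group those rows by celltype,
# take the mean, then transpose back so genes are rows again.
result = main_df.T.groupby(col_dict).mean().T
print(result)
```

The result has one row per gene and one column per celltype (bc, sc, tc), with the same averages as the axis=1 version.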
