Python - Remove low counts from a pandas DataFrame column on condition
I have the following pandas DataFrame:
    import numpy as np
    import pandas as pd

    new = pd.Series(np.array([0, 1, 0, 0, 2, 2]))
    df = pd.DataFrame({'a': new})
I output the occurrences of each value with:
    print(df['a'].value_counts())
Then I have the following:
    0    3
    2    2
    1    1
    dtype: int64
Now I want to remove the rows whose value in column 'a' occurs fewer than 2 times. I can iterate through each value in df['a'] and remove it if its count is less than 2, but that takes a long time for a large DataFrame with multiple columns. I can't figure out an efficient way to do that.
One approach is to join the counts data to the original df.
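    # build a lookup table of value -> number of occurrences
    df2 = pd.DataFrame(df['a'].value_counts())
    df2.reset_index(inplace=True)
    df2.columns = ['a', 'counts']
    # df2 =
    #    a  counts
    # 0  0       3
    # 1  2       2
    # 2  1       1

    # attach each row's count to the original data
    df3 = df.merge(df2, on='a')
    # df3 = (row order may vary by pandas version)
    #    a  counts
    # 0  0       3
    # 1  0       3
    # 2  0       3
    # 3  1       1
    # 4  2       2
    # 5  2       2

    # filter
    df3[df3.counts >= 2]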
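Another option, if you would rather skip the intermediate merge, is to map each row's value to its count and filter with a boolean mask. This is only a sketch of that idea: drop_rare and its min_count parameter are names made up here for illustration, not part of pandas.

```python
import pandas as pd


def drop_rare(df, column, min_count=2):
    """Keep rows whose value in `column` occurs at least `min_count` times.

    Hypothetical helper for illustration; not a pandas function.
    """
    # value_counts() gives value -> count; map() looks up the count per row
    counts = df[column].map(df[column].value_counts())
    # boolean mask keeps the original index and row order
    return df[counts >= min_count]


df = pd.DataFrame({'a': [0, 1, 0, 0, 2, 2]})
filtered = drop_rare(df, 'a')
print(filtered)
```

Because the mask is aligned to the original index, the surviving rows keep their original labels and order, and the same helper can be called once per column if several columns need the same treatment.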