python - Convert text columns into numbers in sklearn
I'm new to data analytics. I'm trying out models in Python with sklearn. I have a dataset in which some of the columns are text columns, as below.
Is there a way to convert these column values into numbers in pandas or sklearn? Is assigning numbers to these values the right approach? And what if a new string pops up in the test data?
Please advise.
You can convert them to integer codes using the categorical datatype.
column = column.astype('category')
column_encoded = column.cat.codes
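The question also asks what happens when a string shows up only in the test data. One possible way to handle that, sketched below, is to learn the category set from the training data and reuse it for the test data; pandas then assigns the code -1 to any value outside that set. The DataFrames, the column name "color", and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column name "color" and its values are assumptions.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
test = pd.DataFrame({"color": ["blue", "purple"]})  # "purple" never appears in train

# Learn the category set from the training data only.
train["color"] = train["color"].astype("category")
categories = train["color"].cat.categories

# Reuse the same categories for the test data so the codes stay consistent.
test["color"] = pd.Categorical(test["color"], categories=categories)

train_codes = train["color"].cat.codes
test_codes = test["color"].cat.codes  # values not in `categories` get code -1
```

Whether -1 is a sensible stand-in for "unknown" depends on the model, so treat it as a placeholder rather than a fix.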
As long as you use a tree-based model with deep enough trees, e.g. GradientBoostingClassifier(max_depth=10), the model should be able to split out the categories again.
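A rough sketch of that idea, assuming a made-up "city" column and a label that depends only on the category with the middle code, so a single threshold cannot isolate it and the trees need more than one split:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical toy data; column names and values are assumptions.
df = pd.DataFrame({"city": ["paris", "rome", "oslo", "rome", "paris", "oslo"] * 5})
df["label"] = (df["city"] == "paris").astype(int)

# Integer codes follow sorted category order: oslo=0, paris=1, rome=2.
df["city_code"] = df["city"].astype("category").cat.codes

X = df[["city_code"]]
y = df["label"]

# max_depth=10 mirrors the value suggested above; it is far more than this toy data needs.
model = GradientBoostingClassifier(max_depth=10, random_state=0)
model.fit(X, y)

predictions = model.predict(X)
```

The integer codes carry an arbitrary ordering, which is why this approach suits tree-based models better than linear ones.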