Code snippet for running Stanford CoreNLP Chinese NER


I'm running CoreNLP inside a Java program, using Maven dependencies. I need to run NER on raw Chinese text. Could someone please provide a code snippet for this?

I found this instruction: "... first you need to run the Stanford Word Segmenter or some other Chinese word segmenter, and then run NER on the output of that!" I can't figure out how to do that. Do I somehow splice ChineseSegmenterAnnotator into the English NER pipeline? Do I need ChineseDocumentToSentenceProcessor before that? Can this be done using StanfordCoreNLP and the right set of properties, or is something else required? I already have the Chinese models.

Thanks.

You can run the entire pipeline on Chinese text. The key difference is that you use the segment annotator instead of the tokenize annotator.

Here are the properties we use for the whole Chinese pipeline. You can remove any annotator you don't need. In your case you can stop at ner and remove the properties for parse, mention, and coref.

# Pipeline options - lemma is a no-op for Chinese but is currently needed because coref demands it (bad old requirements system)
annotators = segment, ssplit, pos, lemma, ner, parse, mention, coref

# segment
customAnnotatorClass.segment = edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator

segment.model = edu/stanford/nlp/models/segmenter/chinese/ctb.gz
segment.sighanCorporaDict = edu/stanford/nlp/models/segmenter/chinese
segment.serDictionary = edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
segment.sighanPostProcessing = true

# sentence split
ssplit.boundaryTokenRegex = [.]|[!?]+|[。]|[！？]+

# pos
pos.model = edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger

# ner
ner.model = edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz
ner.applyNumericClassifiers = false
ner.useSUTime = false

# parse
parse.model = edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz

# coref and mention
coref.sieves = ChineseHeadMatch, ExactStringMatch, PreciseConstructs, StrictHeadMatch1, StrictHeadMatch2, StrictHeadMatch3, StrictHeadMatch4, PronounMatch
coref.input.type = raw
coref.postprocessing = true
coref.calculateFeatureImportance = false
coref.useConstituencyTree = true
coref.useSemantics = false
coref.md.type = RULE
coref.mode = hybrid
coref.path.word2vec =
coref.language = zh
coref.print.md.log = false
coref.defaultPronounAgreement = true
coref.zh.dict = edu/stanford/nlp/models/dcoref/zh-attributes.txt.gz
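If you would rather trim the configuration in code than edit the properties file, something like the following might work. It is only a sketch using `java.util.Properties`: the class name `NerOnlyProperties` and the method `trimToNer` are made up for illustration, and it assumes that every setting needed only by the later annotators has a key starting with `parse.`, `mention.`, or `coref.`, as in the list above.

```java
import java.util.Properties;

/** Sketch: derive an NER-only configuration from the full Chinese pipeline properties. */
class NerOnlyProperties {

    // Key prefixes assumed to belong only to annotators that run after ner.
    private static final String[] DROPPED_PREFIXES = {"parse.", "mention.", "coref."};

    /** Returns a copy of full that stops at ner and omits parse/mention/coref settings. */
    static Properties trimToNer(Properties full) {
        Properties trimmed = new Properties();
        for (String key : full.stringPropertyNames()) {
            boolean dropped = false;
            for (String prefix : DROPPED_PREFIXES) {
                if (key.startsWith(prefix)) {
                    dropped = true;
                    break;
                }
            }
            if (!dropped) {
                trimmed.setProperty(key, full.getProperty(key));
            }
        }
        // Overwrite the annotator list so the pipeline ends at ner.
        trimmed.setProperty("annotators", "segment, ssplit, pos, lemma, ner");
        return trimmed;
    }

    public static void main(String[] args) {
        // A small hand-built subset of the full property list, for demonstration only.
        Properties full = new Properties();
        full.setProperty("annotators", "segment, ssplit, pos, lemma, ner, parse, mention, coref");
        full.setProperty("ner.model", "edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz");
        full.setProperty("parse.model", "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz");
        full.setProperty("coref.language", "zh");

        Properties nerOnly = trimToNer(full);
        System.out.println(nerOnly.getProperty("annotators"));
        System.out.println(nerOnly.stringPropertyNames());
    }
}
```

The resulting `Properties` object can be passed to the `StanfordCoreNLP` constructor in place of the full configuration.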

When I get a chance I'll try to write up a full demo class including the proper imports. In the meantime, this snippet of code will run the pipeline on Chinese text. Make sure you have the Chinese language models jar in your classpath. The CoreNLP documentation describes how to add the Chinese language models jar in Maven.

import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.StringUtils;

// This properties file will run the entire pipeline.
Properties props = StringUtils.propFileToProperties("StanfordCoreNLP-chinese.properties");
// If you uncomment the following line, the pipeline will only go up to ner.
//props.setProperty("annotators", "segment,ssplit,pos,lemma,ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation annotation = new Annotation("whatever your Chinese text is");
pipeline.annotate(annotation);
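Once the annotation has run, each token carries its own NER tag. Words and tags can be read from the `CoreLabel` objects in each sentence's `CoreAnnotations.TokensAnnotation` via `token.word()` and `token.ner()`. Because the segmenter may split one name into several tokens, a common follow-up step is to merge consecutive tokens that share a tag. The sketch below is not part of the answer above. It works on plain lists so that it runs without CoreNLP, the class `EntityGrouper` and its method are hypothetical names, and it assumes that every token has a non-null tag and that `O` marks tokens outside any entity. The tags in `main` are hand-written for illustration, not actual model output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: merge consecutive tokens that share an NER tag into entity strings. */
class EntityGrouper {

    /** Tag assumed to mark tokens that are not part of any entity. */
    static final String OUTSIDE = "O";

    /** Returns entries of the form "TAG:text", joining token text without spaces. */
    static List<String> groupEntities(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = OUTSIDE;
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // The tag changed: close the entity collected so far, if any.
                if (!currentTag.equals(OUTSIDE)) {
                    entities.add(currentTag + ":" + current);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals(OUTSIDE)) {
                current.append(words.get(i));
            }
        }
        // Close an entity that runs to the end of the input.
        if (!currentTag.equals(OUTSIDE)) {
            entities.add(currentTag + ":" + current);
        }
        return entities;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file ASCII-only; the tokens spell a person name,
        // a verb, a particle, and "Beijing" + "University" as two separate tokens.
        List<String> words = Arrays.asList(
                "\u5965\u5df4\u9a6c", "\u8bbf\u95ee", "\u4e86", "\u5317\u4eac", "\u5927\u5b66");
        List<String> tags = Arrays.asList("PERSON", "O", "O", "ORG", "ORG");
        for (String entity : groupEntities(words, tags)) {
            System.out.println(entity);
        }
    }
}
```

Joining without spaces suits Chinese text. For whitespace-delimited languages you would insert a separator between tokens instead.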
