Running Scala code in the Spark shell
Hi, I'm new to Spark and Scala. I'm running the Scala code below at the Spark Scala prompt. The program itself seems fine, but the shell only shows "defined module MLlib" and nothing is printed on screen. What have I done wrong? Is there another way to run the program in the Spark Scala shell and see the output?
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    object MLlib {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName(s"Book example: Scala")
        val sc = new SparkContext(conf)

        // Load 2 types of emails from text files: spam and ham (non-spam).
        // Each line has text from one email.
        val spam = sc.textFile("/home/training/spam.txt")
        val ham = sc.textFile("/home/training/ham.txt")

        // Create a HashingTF instance to map email text to vectors of 100 features.
        val tf = new HashingTF(numFeatures = 100)
        // Each email is split into words, and each word is mapped to one feature.
        val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
        val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

        // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
        val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
        val negativeExamples = hamFeatures.map(features => LabeledPoint(0, features))
        val trainingData = positiveExamples ++ negativeExamples
        trainingData.cache() // Cache the data since Logistic Regression is an iterative algorithm.

        // Create a Logistic Regression learner which uses the SGD optimizer.
        val lrLearner = new LogisticRegressionWithSGD()
        // Run the actual learning algorithm on the training data.
        val model = lrLearner.run(trainingData)

        // Test on a positive example (spam) and a negative one (ham).
        // First apply the same HashingTF feature transformation used on the training data.
        val posTestExample = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
        val negTestExample = tf.transform("Hi Dad, I started studying Spark the other ...".split(" "))

        // Now use the learned model to predict spam/ham for new emails.
        println(s"Prediction for positive test example: ${model.predict(posTestExample)}")
        println(s"Prediction for negative test example: ${model.predict(negTestExample)}")

        sc.stop()
      }
    }
A couple of things:

You defined an object. In the Spark shell, its main method won't be called automatically; you'll have to call it explicitly after you define the object:

    MLlib.main(Array())
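For a multi-line definition like this object, the REPL's paste mode (:paste, finished with Ctrl-D) is the easiest way to enter it. A sketch of what the session looks like:

    scala> :paste
    // Entering paste mode (ctrl-D to finish)
    ...the object definition...
    // Exiting paste mode, now interpreting.
    defined module MLlib

    scala> MLlib.main(Array())

That "defined module MLlib" line is the same message you saw: it only means the object was defined, not that anything ran.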
In fact, if you keep working in the shell/REPL you can do away with the object altogether and define the function directly. For example:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    def mllib() {
      // the rest of the code
    }
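Once defined, you call it from the prompt like any other function:

    mllib()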
However, you shouldn't initialize a SparkContext inside the shell. From the documentation:

In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work.

So you have to either remove that bit of code (a shell-friendly sketch follows), or compile the program into a jar and run it using spark-submit.
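Here is a minimal sketch of that shell-friendly version. It uses the pre-created sc instead of building a SparkConf/SparkContext; the file paths and feature settings come from the question, and the rest is just the question's code reshaped for the REPL:

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    def mllib() {
      // The shell's pre-created SparkContext is already bound to sc.
      val spam = sc.textFile("/home/training/spam.txt")
      val ham = sc.textFile("/home/training/ham.txt")

      // Same featurization as in the question: 100-feature hashed term frequencies.
      val tf = new HashingTF(numFeatures = 100)
      val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
      val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

      val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
      val negativeExamples = hamFeatures.map(features => LabeledPoint(0, features))
      val trainingData = positiveExamples ++ negativeExamples
      trainingData.cache()

      val model = new LogisticRegressionWithSGD().run(trainingData)

      val posTestExample = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
      val negTestExample = tf.transform("Hi Dad, I started studying Spark the other ...".split(" "))

      println(s"Prediction for positive test example: ${model.predict(posTestExample)}")
      println(s"Prediction for negative test example: ${model.predict(negTestExample)}")
      // Note: no sc.stop() here; stopping the shell's context would break the session.
    }
    mllib()

If you go the jar route instead, the invocation looks roughly like this (the class name comes from the code above; the master URL and jar path are placeholders you'd adjust to your build):

    spark-submit --class MLlib --master local[4] /path/to/your-app.jar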