Running Scala code in the Spark shell


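Hi, I'm new to Spark and Scala. I'm running Scala code in the Spark Scala prompt (spark-shell). The program is accepted without errors, but the shell only shows "defined module MLlib" and nothing is printed on screen. What have I done wrong? Is there another way to run this program in the Spark Scala shell and get the output?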

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    object MLlib {

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Book example: Scala")
        val sc = new SparkContext(conf)

        // Load 2 types of emails from text files: spam and ham (non-spam).
        // Each line has the text of 1 email.
        val spam = sc.textFile("/home/training/spam.txt")
        val ham = sc.textFile("/home/training/ham.txt")

        // Create a HashingTF instance to map email text to vectors of 100 features.
        val tf = new HashingTF(numFeatures = 100)
        // Each email is split into words, and each word is mapped to 1 feature.
        val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
        val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

        // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
        val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
        val negativeExamples = hamFeatures.map(features => LabeledPoint(0, features))
        val trainingData = positiveExamples ++ negativeExamples
        trainingData.cache() // Cache the data since logistic regression is an iterative algorithm.

        // Create a logistic regression learner (LogisticRegressionWithSGD uses stochastic gradient descent).
        val lrLearner = new LogisticRegressionWithSGD()
        // Run the actual learning algorithm on the training data.
        val model = lrLearner.run(trainingData)

        // Test on a positive example (spam) and a negative one (ham).
        // First apply the same HashingTF feature transformation used on the training data.
        val posTestExample = tf.transform("o m g cheap stuff sending money ...".split(" "))
        val negTestExample = tf.transform("hi dad, started studying spark other ...".split(" "))
        // Use the learned model to predict spam/ham for the new emails.
        println(s"Prediction for positive test example: ${model.predict(posTestExample)}")
        println(s"Prediction for negative test example: ${model.predict(negTestExample)}")

        sc.stop()
      }
    }

A couple of things:

You defined an object in the Spark shell, so its main method won't be called immediately. You'll have to call it explicitly after you define the object:

    MLlib.main(Array())
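To see that behaviour in isolation, here is a minimal sketch in plain Scala that needs no Spark at all. The object `Greeter` and its `calls` counter are hypothetical names used only for illustration. Depending on the Scala version, the REPL acknowledges the definition with "defined module Greeter" or "defined object Greeter", and nothing inside main runs until it is invoked.

```scala
// Hypothetical object, used only to illustrate the REPL behaviour.
object Greeter {
  var calls = 0 // counts how many times main has actually run

  def main(args: Array[String]): Unit = {
    calls += 1
    println("main was called")
  }
}

// Defining the object above executes nothing; the REPL only reports the definition.
// main runs once it is invoked explicitly:
Greeter.main(Array())
```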

In fact, if you continue to work in the shell/REPL you can do away with the object altogether; you can define a function directly and then call it. For example:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    def mllib(): Unit = {
      // the rest of the code
    }

However, you shouldn't initialize a SparkContext within the shell. From the documentation:

"In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work."

So, you have to either remove that bit of the code, or compile it into a JAR and run it using spark-submit.
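As a rough sketch of the first option, the body of main could be pasted straight into spark-shell with the SparkConf/SparkContext lines and sc.stop() removed, so that it reuses the shell's own `sc`. This assumes an older Spark release like the one in the question, where `LogisticRegressionWithSGD` can still be constructed directly, and that the two text files from the question exist at those paths.

```scala
// Intended for spark-shell: `sc` is the SparkContext the shell has already created.
// Assumption: an older Spark release where LogisticRegressionWithSGD has a public constructor.
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

// Paths are taken from the question; no new SparkConf or SparkContext is created here.
val spam = sc.textFile("/home/training/spam.txt")
val ham = sc.textFile("/home/training/ham.txt")

// Map each email to a 100-feature vector and label it (1 = spam, 0 = ham).
val tf = new HashingTF(numFeatures = 100)
val positiveExamples = spam.map(email => LabeledPoint(1, tf.transform(email.split(" "))))
val negativeExamples = ham.map(email => LabeledPoint(0, tf.transform(email.split(" "))))
val trainingData = positiveExamples ++ negativeExamples
trainingData.cache() // logistic regression makes several passes over the data

// Train the model on the combined training set.
val model = new LogisticRegressionWithSGD().run(trainingData)

// Apply the same feature transformation to the test emails before predicting.
val posTestExample = tf.transform("o m g cheap stuff sending money ...".split(" "))
val negTestExample = tf.transform("hi dad, started studying spark other ...".split(" "))
println(s"Prediction for positive test example: ${model.predict(posTestExample)}")
println(s"Prediction for negative test example: ${model.predict(negTestExample)}")
```

Because these are top-level statements rather than an object definition, the shell evaluates them as they are entered and the two println lines appear directly. There is no sc.stop() here, since stopping the shell's context would leave the session unusable.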

