Hello,

I found the following description in a Redbook.

Apache Spark Implementation on IBM z/OS
http://www.redbooks.ibm.com/redbooks/pdfs/sg248325.pdf

2.7 Spark R
Spark R is an R package that provides a light-weight front-end to use Apache Spark from R.
Spark R exposes the Spark API through the RDD class, and enables users to interactively run jobs from the R shell on a cluster. Rocket Software is in progress with a port of R to z/OS, and interested clients should contact either Rocket directly or IBM.

I’d like to use R with Apache Spark on z/OS.
Can we get the SparkR library for z/OS?
Do you have any documentation on how to use SparkR on z/OS?

Regards,
Tomohiro Taguchi

Hello,
You will need Rocket’s latest R distribution, as announced here on the Forum. I think you already have it. You will need to install both the R tar file and the Devel tar file.

You will need IBM’s Spark distribution. There are two of these, Spark 1.5.2 and Spark 2.0.2. I myself have only tested with 1.5.2.

You will also need the Spark R code. You can get it from Apache’s 1.5.2 distribution (the source or any binary distribution) or from Apache’s 2.0.2 distribution (I could not find R in the 2.0.2 source distribution; it has moved somewhere else, but it is in any binary distribution). Copy the R code, which is the R directory at the top level of the Apache distribution, into the IBM distribution.

I prefer to give file tags to all the files, so that programs such as vi or emacs handle both ASCII and EBCDIC files well. Set _BPXK_AUTOCVT to ON in your init file. Then run the "autotag" program that is in the bin directory, like this: "autotag -R -s -L 12 ibm_spark_directory". This recursively tags every file that is not already tagged, based on its contents. It tolerates a small number of unusual characters; a file with more than that number is tagged as binary.

One of our Spark R demos began with these lines:
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master = "local", # try local[*] to use all cores
                  appName = "Analyzer",
                  sparkEnvir = list(spark.driver.memory = "2g"))
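
As a possible continuation of the demo lines above (this is not part of the original demo), here is a minimal sketch that assumes the SparkR 1.5.x API, where a SQL context is created with sparkRSQL.init and passed to createDataFrame. The fallback path /usr/lpp/IBM/Spark and the use of R's built-in faithful dataset are illustrative assumptions. The guard skips the Spark calls when no SparkR package is found.

```r
# Sketch only: assumes SparkR 1.5.x has been copied into $SPARK_HOME/R/lib.
# The fallback path below is an assumption; adjust it to your installation.
spark_home <- Sys.getenv("SPARK_HOME", unset = "/usr/lpp/IBM/Spark")
sparkr_lib <- file.path(spark_home, "R", "lib")

if (dir.exists(file.path(sparkr_lib, "SparkR"))) {
  .libPaths(c(sparkr_lib, .libPaths()))
  library(SparkR)

  # Same initialization as the demo lines
  sc <- sparkR.init(master = "local", appName = "Analyzer",
                    sparkEnvir = list(spark.driver.memory = "2g"))

  # In SparkR 1.5.x, DataFrames are created through a SQL context
  sqlContext <- sparkRSQL.init(sc)
  df <- createDataFrame(sqlContext, faithful)  # built-in R dataset
  print(head(df))

  sparkR.stop()
} else {
  message("SparkR not found under ", sparkr_lib, "; skipping the Spark calls.")
}
```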

regards,
Rick Harris
Rocket Software


Thank you for your support!

I have already installed Spark 2.0.2 on z/OS.
I got Apache’s 2.0.2 distribution and copied the "R" directory into the IBM distribution (/usr/lpp/IBM/Spark/R).
Then I ran the command "autotag -R -s -L 12 /usr/lpp/IBM/Spark/R".
After that, I was able to create a Spark session from the R console, as follows.

> Sys.getenv("SPARK_HOME")
[1] "/usr/lpp/IBM/Spark"

> library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

Attaching package: 'SparkR'

The following objects are masked from 'package:stats':

    cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from 'package:base':

    as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
    rank, rbind, sample, startsWith, subset, summary, transform, union


> sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g"))
Spark package found in SPARK_HOME: /usr/lpp/IBM/Spark
Launching java with spark-submit command /usr/lpp/IBM/Spark/bin/spark-submit   --driver-memory "2g" sparkr-shell /tmp/RtmpE6u1I0/backend_port50100327c7b2670
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/02/13 15:55:41 WARN NetUtil: Failed to find the loopback interface
17/02/13 15:55:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Java ref type org.apache.spark.sql.SparkSession id 1

Now I can move on to further tests with SparkR.
Thank you so much!
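
A next test along these lines might look like the following minimal sketch. It assumes the SparkR 2.0.x API (sparkR.session, as.DataFrame, filter, count, sparkR.session.stop) and reuses the session settings shown above. The fallback path, the built-in faithful dataset and the threshold of 70 are illustrative assumptions, and the guard skips the Spark calls when no SparkR package is found.

```r
# Sketch only: assumes SparkR 2.0.x has been copied into $SPARK_HOME/R/lib.
# The fallback path below is an assumption; adjust it to your installation.
spark_home <- Sys.getenv("SPARK_HOME", unset = "/usr/lpp/IBM/Spark")
sparkr_lib <- file.path(spark_home, "R", "lib")

if (dir.exists(file.path(sparkr_lib, "SparkR"))) {
  library(SparkR, lib.loc = sparkr_lib)
  sparkR.session(master = "local[*]",
                 sparkConfig = list(spark.driver.memory = "2g"))

  # Convert a built-in R data.frame into a Spark DataFrame
  df <- as.DataFrame(faithful)
  print(head(df))
  print(count(df))

  # Simple filter executed by Spark (threshold chosen arbitrarily)
  long_waits <- filter(df, df$waiting > 70)
  print(count(long_waits))

  sparkR.session.stop()
} else {
  message("SparkR not found under ", sparkr_lib, "; skipping the Spark calls.")
}
```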

Could you please answer a few more questions?

(1) I’d like to know the details of the "autotag" command.
What do the "-R -s -L 12" options mean?
Where can I find reference documentation for the "autotag" command?

(2) Do you have an official guide for using SparkR on z/OS?

Regards
Tomohiro Taguchi


Hi, here is a link to the official guide for using Spark R on z/OS:

http://www.redbooks.ibm.com/abstracts/sg248325.html?Open