Pyspark MCQ
Given the SFPD RDD, to create a pair RDD consisting of tuples of the form (Category, 1), in Scala use?
--> val pairs = sfpd.map(x => (x(Category), 1))
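A minimal spark-shell sketch of this pattern; the file name, column index, and parsing are assumptions, not given in the source:

    val sfpd = sc.textFile("sfpd.csv").map(_.split(","))   // hypothetical incidents file parsed into fields
    val Category = 1                                        // illustrative column index for the category field
    val pairs = sfpd.map(x => (x(Category), 1))             // each incident becomes a (category, 1) tuple
    val countsByCategory = pairs.reduceByKey(_ + _)         // counting per category then follows naturally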
6. The keys transformation returns an RDD with ordered keys from a key-value pair RDD?
T or F
--> FALSE (keys returns the keys without sorting them)
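A quick spark-shell check of this behaviour (a sketch; sc is the usual shell SparkContext and the data is toy data):

    val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
    pairs.keys.collect()              // Array(b, a, c) -- keys come back in the original order, unsorted
    pairs.sortByKey().keys.collect()  // Array(a, b, c) -- sort explicitly if ordered keys are needed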
8) Which partitioner class is used to order keys according to the sort order of the given type?
--> RangePartitioner
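A hedged sketch of range partitioning in spark-shell (the data and partition count are illustrative):

    import org.apache.spark.RangePartitioner
    val pairs = sc.parallelize(Seq(("d", 4), ("a", 1), ("c", 3), ("b", 2)))
    // RangePartitioner samples the keys and assigns them to partitions that follow their sort order
    val ranged = pairs.partitionBy(new RangePartitioner(2, pairs))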
9. The primary Machine Learning API for Spark is now the ____-based API?
--> DataFrame
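As a sketch of what "DataFrame-based" means in practice, the spark.ml package fits estimators on DataFrames rather than on RDDs (toy data, spark-shell session assumed):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.linalg.Vectors
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1)),
      (0.0, Vectors.dense(2.0, 1.0))
    )).toDF("label", "features")
    val model = new LogisticRegression().setMaxIter(5).fit(training)   // trained on a DataFrame, not an RDD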
11. The number of stages in a job usually equals the number of RDDs in the DAG; the scheduler can truncate the lineage when?
--> An RDD is cached or persisted
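A small sketch of why caching lets the scheduler truncate the lineage (the file path is hypothetical):

    val base = sc.textFile("data.txt")             // hypothetical input path
    val cleaned = base.filter(_.nonEmpty).cache()  // mark this RDD for caching
    cleaned.count()                                // first action computes and caches the partitions
    cleaned.map(_.length).sum()                    // later jobs start from the cached data instead of replaying the full lineage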
12. Combining a set of filtered edges and filtered vertices from a graph creates what structure?
--> A subgraph
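A GraphX sketch of building a subgraph from vertex and edge predicates (toy graph, spark-shell assumed):

    import org.apache.spark.graphx._
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 10), Edge(2L, 3L, 20)))
    val graph = Graph(vertices, edges)
    // keep only edges with attr > 10 and vertices other than 1L; the result is itself a Graph
    val sub = graph.subgraph(epred = e => e.attr > 10, vpred = (id, attr) => id != 1L)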
16. Given the pair RDD country that contains tuples of the form (country, count), which one gets the country with the lowest refugee count in Scala?
Ans: val low = country.map(x => (x._2, x._1)).sortByKey().first
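The same swap-and-sort idea on a toy pair RDD (the values are made up for illustration):

    val country = sc.parallelize(Seq(("US", 100), ("IN", 40), ("FR", 70)))
    // swap to (count, country), sort ascending by the count key, take the first element
    val lowest = country.map { case (c, n) => (n, c) }.sortByKey().first()   // (40, IN)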
18. What are some of the things you can monitor in the Spark Web UI?
--> All of the above
21. Which of the below commands is used to remove a broadcast variable bvar from memory?
--> bvar.unpersist()
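A short sketch of a broadcast variable's lifecycle (toy lookup table, spark-shell assumed):

    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
    val data = sc.parallelize(Seq("a", "b", "a"))
    data.map(k => lookup.value.getOrElse(k, 0)).sum()
    lookup.unpersist()   // drops the broadcast copies from executor memory; it is re-sent if used again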
22. A DataFrame can be created from an existing RDD. In which case would you create a DataFrame from an existing RDD by inferring the schema using case classes?
--> If all your users are going to need the dataset parsed in the same way
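A sketch of schema inference through a case class (the class name and fields are illustrative):

    import spark.implicits._                 // spark is the shell's SparkSession
    case class User(name: String, age: Int)
    val rdd = sc.parallelize(Seq(User("ana", 30), User("raj", 25)))
    val df = rdd.toDF()                      // columns name/age are inferred from the case class
    df.printSchema()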
26) Which DataFrame method is used to remove a column from the resultant DataFrame?
Ans: drop()
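For example (illustrative columns, spark-shell assumed):

    import spark.implicits._
    val df = Seq(("ana", 30, "SF"), ("raj", 25, "NY")).toDF("name", "age", "city")
    val trimmed = df.drop("city")   // returns a new DataFrame without the city column
    trimmed.show()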
30) Spark SQL translates commands into code; this code is processed by?
Ans: Executor nodes
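A small sketch of that flow: the SQL text is planned on the driver and the resulting physical plan runs as tasks on the executors (the table and query are illustrative):

    import spark.implicits._
    val people = Seq(("ana", 30), ("raj", 25)).toDF("name", "age")
    people.createOrReplaceTempView("people")
    val result = spark.sql("SELECT name FROM people WHERE age > 26")
    result.explain()   // shows the physical plan whose tasks the executors run
    result.show()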
31)
33) PySpark is a cluster computing framework that runs on a cluster of commodity hardware and performs data unification. T or F.
Ans: True
35) ___ leverages Spark Core's fast scheduling capability for performing streaming analytics?
Ans: Spark Streaming
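A minimal Spark Streaming sketch, assuming a spark-shell style sc and a socket text source on localhost:9999 (both are assumptions):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    val ssc = new StreamingContext(sc, Seconds(1))          // one-second micro-batches scheduled by Spark Core
    val lines = ssc.socketTextStream("localhost", 9999)     // hypothetical text source
    lines.map(_.length).print()                             // each batch runs as a small Spark job
    ssc.start()
    ssc.awaitTermination()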