How much Scala should a Spark developer know?

Time and again, I come across the view that we are Spark developers, not Scala developers: we need not go deep into Scala, and should know only as much as Spark requires. I smile and am tempted to ask whether they have a formula that determines how much Scala Spark requires!

I see it differently. If we want to express our processing needs to Spark efficiently and effectively, we had better be good at Scala. With only rudimentary language skills, our code will be verbose and convoluted, because we will devise tricks and workarounds to reach the end result with the limited Scala vocabulary and grammar we possess. That approach is error-prone, less productive, and needlessly complicated. Moreover, the capabilities we need may already be built into the language, where they are likely to be more stable and efficient than the custom logic we would otherwise write.

For example, we can emulate much of what match does with if/else constructs, but match is more elegant and effective. Case classes are another example: we can achieve similar results with regular classes, but the ease and brevity that case classes bring to the table (structural equality, a readable toString, copy, and pattern-matching support for free) are unbeatable.
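A minimal sketch of both points, assuming a hypothetical `Event` hierarchy (`Click`, `Purchase`, `Logout` and the `MatchVsIfElse` object are invented for illustration). The if/else version needs explicit type tests and casts; the match version destructures the case classes directly, supports guards, and, because the trait is sealed, lets the compiler warn about unhandled subtypes.

```scala
// Hypothetical event types, for illustration only.
sealed trait Event
case class Click(userId: String, page: String) extends Event
case class Purchase(userId: String, amount: Double) extends Event
case class Logout(userId: String) extends Event

object MatchVsIfElse {
  // if/else style: every branch needs an isInstanceOf test and a cast.
  def describeIfElse(e: Event): String =
    if (e.isInstanceOf[Click]) {
      val c = e.asInstanceOf[Click]
      s"${c.userId} clicked ${c.page}"
    } else if (e.isInstanceOf[Purchase]) {
      val p = e.asInstanceOf[Purchase]
      if (p.amount > 100) s"${p.userId} made a large purchase"
      else s"${p.userId} made a purchase"
    } else {
      val l = e.asInstanceOf[Logout]
      s"${l.userId} logged out"
    }

  // match style: destructuring plus a guard; since Event is sealed,
  // the compiler can warn if a subtype is left unhandled.
  def describeMatch(e: Event): String = e match {
    case Click(user, page)                => s"$user clicked $page"
    case Purchase(user, amt) if amt > 100 => s"$user made a large purchase"
    case Purchase(user, _)                => s"$user made a purchase"
    case Logout(user)                     => s"$user logged out"
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(Click("u1", "/home"), Purchase("u2", 250.0), Logout("u1"))
    events.foreach(e => println(describeMatch(e)))

    // Case classes get structural equality, toString and copy for free.
    val a = Click("u1", "/home")
    println(a == Click("u1", "/home")) // true
    println(a.copy(page = "/cart"))    // Click(u1,/cart)
  }
}
```

Both functions return the same strings; the difference is how much ceremony it takes to get there, and the gap only widens as more event types and conditions are added.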

How testable, extensible, and reusable one's data processing logic is depends a good deal on how well one uses the language's features. It's like having a genie but not knowing enough of his language to communicate your long-cherished, very specific wish.

At the end of the day, Spark's key offering is a collection of items that is resilient and distributed (the RDD, or Resilient Distributed Dataset). The items might be text documents, key-value pairs, or data rows. The rest of the system exists to keep this collection up and running efficiently and effectively.
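To make the collection idea concrete, here is a word count written against plain Scala collections. It is a sketch, not Spark code: Spark's RDD offers methods with the same names (`flatMap`, `map`, `filter`) plus pair operations such as `reduceByKey`, so fluency with Scala's collection API carries over almost directly. `WordCountSketch` and `wordCounts` are hypothetical names chosen for illustration.

```scala
object WordCountSketch {
  // Plain Scala collections stand in for an RDD here. On an RDD the
  // flatMap/map/filter steps would read the same; the aggregation would
  // typically be map(w => (w, 1)).reduceByKey(_ + _) rather than groupBy.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))  // split each line into tokens
      .map(_.toLowerCase)        // normalise case
      .filter(_.nonEmpty)        // drop empty tokens left by leading spaces
      .groupBy(identity)         // word -> all of its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val counts = wordCounts(Seq("Spark is fast", "Scala is concise"))
    println(counts("is")) // 2
  }
}
```

Each step is a one-liner built from standard collection methods; someone without that vocabulary would likely reach for mutable maps and nested loops to get the same result.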

I agree that Scala has a steep learning curve, and a number of adopters have said as much. Nonetheless, it pays to learn it!