Refactoring / layout of functional Scala

问题

This one liner...

 Console.println(io.Source.fromFile("names.txt").getLines.mkString.split(",").map{x:String => x.slice(1, x.length -1)}.sortBy { x => x}.zipWithIndex.map{t =>{ (t._2 +1)*(t._1.map{_.toChar - "A"(0).toChar + 1}.sum)}}.sum);

... is my solution to Project Euler problem 22. It seems to work, and it's written in (my attempt at) functional style.

This example is a bit extreme, but my question is a bit more general - how do you prefer to write/format/comment functional style code? The functional approach seems to encourage a sequence of method calls, which I find can get unreadable, and also leaves nowhere obvious to put comments.

Also, when I write procedural code, I find I write small methods each with one purpose and with meaningful names. When I write functional code, I seem to be developing a habit that produces lines a little like the one above, where (to me) the meaning is difficult to decipher - and also the individual computations are difficult to re-use elsewhere. Quite a lot of the functional code examples I see on the web are similarly terse (to me) obscure.

What should I be doing? Writing little functions for each part of the computatation with names meaningful in the current context? (even if they're little more than a wrapper for map, say?)

For the example I've given, what are better ways of writing it, and presenting it?

(Like all style questions, this one is subjective. There's no reason it should get argumentative, though!)

回答1:

A trivial first-attempt at tidying it up is to just remove the leading Console. the trailing ; and the explicit :String type - all of which are unnecessary - add some indentation and import io.Source:

import io.Source
println(
  Source.fromFile("names.txt").getLines.mkString.split(",").map{
    x => x.slice(1, x.length -1)
  }.sortBy {x => x}.zipWithIndex.map{
    t =>{ (t._2 +1)*(t._1.map{_.toChar - "A"(0).toChar + 1}.sum)}
  }.sum
)

The next step is to clean it up a little, use pattern matching when mapping over list of tuples and identity instead of x=>x. toChar is also unnecessary for characters, and single quotes can be used to represent character literals.

import io.Source
println(
  Source.fromFile("names.txt").getLines.mkString.split(",").map {
    x => x.slice(1, x.length -1)
  }.sortBy(identity).zipWithIndex.map {
    case (v, idx) =>{ (idx+1)*(v.map{_ - 'A' + 1}.sum)}
  }.sum
)

A few more changes also help make the intent of the code far clearer:

import io.Source
println(
  Source.fromFile("names.txt").getLines.mkString.split(",")
  .map { _.stripPrefix("\"").stripSuffix("\"") }
  .sortBy(identity)
  .map { _.map{_ - 'A' + 1}.sum }
  .zipWithIndex
  .map { case (v, idx) => (idx+1) * v }
  .sum
)

The next step, to make it more "functional", is to break it into "functions" (sneaky, huh?). Ideally each function will have a name clearly expressing its purpose, and it will be short (aim for it to be a single expression, so braces aren't required):

import io.Source

def unquote(s:String) = s.stripPrefix("\"").stripSuffix("\"")

def wordsFrom(fname:String) =
  Source.fromFile(fname).getLines.mkString.split(",").map(unquote)

def letterPos(c:Char) = c - 'A' + 1

println(
  wordsFrom("names.txt")
  .sortBy(identity)
  .map { _.map(letterPos).sum }
  .zipWithIndex
  .map { case (v, idx) => (idx+1) * v }
  .sum
)

wordsFrom is an obvious 1-liner, but I split it for easier formatting on stackOverflow

回答2:

Here is what I think might be a better way to lay it out:

Console.println(
    io.Source.fromFile("names.txt")
    .getLines.mkString.split(",")
    .map{x:String => x.slice(1, x.length -1)}
    .sortBy { x => x}
    .zipWithIndex
    .map{t =>{ (t._2 +1)*(t._1.map{_.toChar - "A"(0).toChar + 1}.sum)}}
    .sum);

I feel like deep in the recesses of my brain there's some algorithm that makes decisions about code-layout trade-offs between horizontal and vertical space, but I don't seem to have direct access to enable me to articulate it. :)

Regarding introducing names rather than using lambdas, I think what I typically do is, if I am tempted to put a short comment describing the intent of the code, but a good identifier name may do the same, then I may factor a one-time lambda into a named function in order to get the readability benefit of the identifier name. The line with the toChar calls is the only one above that looks like a candidate to me. (To be clear, I'd factor (part of) the lambda inside the map, but the map call itself.) Alternatively, the introduction of vertical whitespace gives you a place to hang a //comment which is an alternative to introducing an identifier name.

(Disclaimer: I don't write Scala, so if anything I say conflicts with style conventions, then ignore me :), but I imagine a lot of this advice is mostly-language-agnostic.)

回答3:

Speaking strictly to how to format that code, without making any structural changes, I'd do it like this:

Console println (
  (
    io.Source
    fromFile "names.txt"
    getLines ()
    mkString ""
    split ","
    map (x => x.slice(1, x.length - 1))
    sortBy (x => x)
    zipWithIndex
  )
  map (t => (t._2 + 1) * (t._1 map (_.toChar - "A"(0).toChar + 1) sum) )
  sum
)

Or, perhaps, to get around the parameterless methods, I'd do this:

Console println (
  io.Source
  .fromFile("names.txt")
  .getLines
  .mkString
  .split(",")
  .map(x => x.slice(1, x.length - 1))
  .sortBy(x => x)
  .zipWithIndex
  .map(t => (t._2 + 1) * (t._1 map (_.toChar - "A"(0).toChar + 1) sum) )
  .sum
)

Note that there is plenty of space for comments, but, generally speaking, what is being done is usually clear. People not used to it might sometimes get lost halfway through, without variables to keep track of the meaning/type of the transformed value.

Now, some things I'd do differently are:

println ( // instead of Console println
  Source // put import elsewhere
  .fromFile("names.txt")
  .mkString // Unless you want to get rid of /n, which is unnecessary here
  .split(",")
  .map(x => x.slice(1, x.length - 1))
  .sorted // instead of sortBy
  .zipWithIndex
  .map { // use { only for multiple statements and, as in this case, pattern matching
    case (c, index) => (index + 1) * (c map (_ - 'A' + 1) sum) // chars are chars
  }
  .sum
)

I'd also not do sum and multiplying in the same step, so:

  .sorted
  .map(_ map (_ - 'A' + 1) sum)
  .zipWithIndex
  .map { case (av, index) => av * (index + 1) }
  .sum

Finally, I don't much like that string resizing, so I might resort to regex instead. Add a little refactoring, and this is something I'd probably write:

  import scala.io.Source
  def names = Source fromFile "names.txt" mkString

  def wordExtractor = """"(.*?)"""".r
  def words = for {
    m <- wordExtractor findAllIn names matchData
  } yield m group 1

  def alphabeticValue(s: String) = s map (_ - 'A' + 1) sum
  def wordsAV = words.toList.sorted map alphabeticValue

  def multByIndex(t: (Int, Int)) = t match {
    case (av, index) => av * (index + 1)
  }
  def wordsAVByIndex = wordsAV.zipWithIndex map multByIndex

  println(wordsAVByIndex.sum)

EDIT

The next step would be a renaming refactoring -- choosing names that better convey what each part of the code is doing. Ken suggested better names in the comments, and I'd appropriate them for one more variant (it also showcases nicely how much better naming improves readability).

import scala.io.Source
def rawData = Source fromFile "names.txt" mkString

// I'd rather write "match" than "m" in the next snippet, but
// the former is a keyword in Scala, so "m" has become more
// common in my code than "i". Also, make the return type of
// getWordsOf clear, because iterators can be tricky.
// Returning a list, however, makes a much less cleaner
// definition.

def wordExtractor = """"(.*?)"""".r
def getWordsOf(input: String): Iterator[String] = for {
  m <- wordExtractor findAllIn input matchData
} yield m group 1
def wordList = getWordsOf(rawData).toList

// I stole letterPosition from Kevin's solution. There, I said it. :-)

def letterPosition(c: Char) = c.toUpper - 'A' + 1 // .toUpper isn't necessary
def alphabeticValueOfWord(word: String) = word map letterPosition sum
def alphabeticValues = wordList.sorted map alphabeticValueOfWord

// I don't like multAVByIndex, but I haven't decided on a better
// option yet. I'm not very fond of declaring methods that return
// functions either, but I find that better than explicitly handling
// tuples (oh, for the day tuples/arguments are unified!).

def multAVByIndex = (alphabeticValue: Int, index: Int) => 
  alphabeticValue * (index + 1)
def scores = alphabeticValues.zipWithIndex map multAVByIndex.tupled

println(scores sum)

回答4:

In addition to Kevin's solution,

the key is to split the functionality clearly and neatly, taking in consideration of reusability and readability.

To make the code even shorter and cleaner, it seems that the for expression can be used.


def inputString(inputFile: String) = io.Source.fromFile(inputFile).getLines.mkString

def inputWords(input: String) = input.split("[,\"]").filter("" != )

Console.println {
    (for { (s, idx) <- inputWords(inputString("names.txt")).sorted.zipWithIndex }
        yield s.map(_ - 'A' + 1).sum * (idx + 1)).sum
}

The s.map(_-'A'+1) part can be further put into a function, say wordSum, if you want it to be more distinct.

来源：https://stackoverflow.com/questions/4238902/refactoring-layout-of-functional-scala

标签

scala

refactoring

functional-programming

coding-style