Static/strong typing and refactoring

It seems to me that the most invaluable thing about a static/strongly-typed programming language is that it helps refactoring: if/when you change any API, then the compiler will tell you what that change has broken.

I can imagine writing code in a runtime/weakly-typed language ... but I can't imagine refactoring without the compiler's help, and I can't imagine writing tens of thousands of lines of code without refactoring.

Is this true?

I think you're conflating when types are checked with how they're checked. Runtime typing isn't necessarily weak.

The main advantage of static types is exactly what you say: they're exhaustive. You can be confident all call sites conform to the type just by letting the compiler do it's thing.

The main limitation of static types is that they're limited in the constraints they can express. This varies by language, with most languages having relatively simple type systems (c, java), and others with extremely powerful type systems (haskell, cayenne).

Because of this limitation types on their own are not sufficient. For example, in java types are more or less restricted to checking type names match. This means the meaning of any constraint you want checked has to be encoded into a naming scheme of some sort, hence the plethora of indirections and boiler plate common to java code. C++ is a little better in that templates allow a bit more expressiveness, but don't come close to what you can do with dependent types. I'm not sure what the downsides to the more powerful type systems are, though clearly there must be some or more people would be using them in industry.

Even if you're using static typing, chances are it's not expressive enough to check everything you care about, so you'll need to write tests too. Whether static typing saves you more effort than it requires in boilerplate is a debate that's raged for ages and that I don't think has a simple answer for all situations.

As to your second question:

How can we re-factor safely in a runtime typed language?

The answer is tests. Your tests have to cover all the cases that matter. Tools can help you in gauging how exhaustive your tests are. Coverage checking tools let you know wether lines of code are covered by the tests or not. Test mutation tools (jester, heckle) can let you know if your tests are logically incomplete. Acceptance tests let you know what you've written matches requirements, and lastly regression and performance tests ensure that each new version of the product maintains the quality of the last.

One of the great things about having proper testing in place vs relying on elaborate type indirections is that debugging becomes much simpler. When running the tests you get specific failed assertions within tests that clearly express what they're doing, rather than obtuse compiler error statements (think c++ template errors).

No matter what tools you use: writing code you're confident in will require effort. It most likely will require writing a lot of tests. If the penalty for bugs is very high, such as aerospace or medical control software, you may need to use formal mathematical methods to prove the behavior of your software, which makes such development extremely expensive.

I totally agree with your sentiment. The very flexibility that dynamically typed languages are supposed to be good at is actually what makes the code very hard to maintain. Really, is there such a thing as a program that continues to work if the data types are changed in a non trivial way without actually changing the code?

In the mean time, you could check the type of variable being passed, and somehow fail if its not the expected type. You'd still have to run your code to root out those cases, but at least something would tell you.

I think Google's internal tools actually do a compilation and probably type checking to their Javascript. I wish I had those tools.

To start, I'm a native Perl programmer so on the one hand I've never programmed with the net of static types. OTOH I've never programmed with them so I can't speak to their benefits. What I can speak to is what its like to refactor.

I don't find the lack of static types to be a problem wrt refactoring. What I find a problem is the lack of a refactoring browser. Dynamic languages have the problem that you don't really know what the code is really going to do until you actually run it. Perl has this more than most. Perl has the additional problem of having a very complicated, almost unparsable, syntax. Result: no refactoring tools (though they're working very rapidly on that). The end result is I have to refactor by hand. And that is what introduces bugs.

I have tests to catch them... usually. I do find myself often in front of a steaming pile of untested and nigh untestable code with the chicken/egg problem of having to refactor the code in order to test it, but having to test it in order to refactor it. Ick. At this point I have to write some very dumb, high level "does the program output the same thing it did before" sort of tests just to make sure I didn't break something.

Static types, as envisioned in Java or C++ or C#, really only solve a small class of programming problems. They guarantee your interfaces are passed bits of data with the right label. But just because you get a Collection doesn't mean that Collection contains the data you think it does. Because you get an integer doesn't mean you got the right integer. Your method takes a User object, but is that User logged in?

Classic example: public static double sqrt(double a) is the signature for the Java square root function. Square root doesn't work on negative numbers. Where does it say that in the signature? It doesn't. Even worse, where does it say what that function even does? The signature only says what types it takes and what it returns. It says nothing about what happens in between and that's where the interesting code lives. Some people have tried to capture the full API by using design by contract, which can broadly be described as embedding run-time tests of your function's inputs, outputs and side effects (or lack thereof)... but that's another show.

An API is far more than just function signatures (if it wasn't, you wouldn't need all that descriptive prose in the Javadocs) and refactoring is far more even than just changing the API.

The biggest refactoring advantage a statically typed, statically compiled, non-dynamic language gives you is the ability to write refactoring tools to do quite complex refactorings for you because it knows where all the calls to your methods are. I'm pretty envious of IntelliJ IDEA.

I would say refactoring goes beyond what the compiler can check, even in statically-typed languages. Refactoring is just changing a programs internal structure without affecting the external behavior. Even in dynamic languages, there are still things that you can expect to happen and test for, you just lose a little bit of assistance from the compiler.

One of the benefits of using var in C# 3.0 is that you can often change the type without breaking any code. The type needs to still look the same - properties with the same names must exist, methods with the same or similar signature must still exist. But you can really change to a very different type, even without using something like ReSharper.

来源：https://stackoverflow.com/questions/880410/static-strong-typing-and-refactoring

标签

refactoring

strong-typing

static-typing

weak-typing