Managing highly repetitive code and documentation in Java

Posted by 橙三吉 on 2019-11-27 06:28:48
SyntaxT3rr0r

For people that absolutely need performance, boxing and unboxing and generified collections and whatnot are big no-no's.

The same problem happens in high-performance computing, where you need the same complex code to work for both float and double (say, some of the methods shown in Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" paper).

There's a reason why Trove's TIntIntHashMap runs circles around Java's HashMap<Integer,Integer> when working with a similar amount of data.
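To make the difference concrete, here is a minimal sketch of a map specialized for int keys and int values. The class name IntIntMap and its hashing scheme are assumptions made for illustration; this is not Trove's actual implementation. The point is only that keys and values live in plain int arrays, so no Integer objects are allocated or unboxed, whereas HashMap<Integer,Integer> has to box every key and value.

```java
/** Hypothetical primitive-specialized map; a sketch, not Trove's TIntIntHashMap. */
final class IntIntMap {
    private int[] keys = new int[16];
    private int[] values = new int[16];
    private boolean[] used = new boolean[16];   // marks occupied slots
    private int size;

    /** Stores value under key without allocating any Integer objects. */
    void put(int key, int value) {
        if (2 * (size + 1) > keys.length) {
            resize();                           // keep the table at most half full
        }
        int i = indexOf(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the stored value, or defaultValue if the key is absent. */
    int get(int key, int defaultValue) {
        int i = indexOf(key);
        return used[i] ? values[i] : defaultValue;
    }

    int size() {
        return size;
    }

    /** Linear probing: returns the slot holding key, or the first free slot. */
    private int indexOf(int key) {
        int mask = keys.length - 1;             // capacity is always a power of two
        int h = key * 0x9E3779B9;               // cheap bit mixing
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    /** Doubles the capacity and reinserts every existing entry. */
    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

The catch is the one this whole thread is about: a long-to-double or int-to-float variant needs the same code again with only the types changed.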

Now how is the source code of Trove's collections written?

By using source code instrumentation of course :)

There are several Java libraries for higher performance (much higher than the default Java ones) that use code generators to create the repeated source code.

We all know that "source code instrumentation" is evil and that code generation is crap, but still that's how people who really know what they're doing (i.e. the kind of people that write stuff like Trove) do it :)

For what it is worth we generate source code that contains big warnings like:

/*
 * This .java source file has been auto-generated from the template xxxxx
 * 
 * DO NOT MODIFY THIS FILE FOR IT SHALL GET OVERWRITTEN
 * 
 */
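As a rough sketch of what such a generator can look like, the following fills a string template once per primitive type and prepends the warning header. The class name SumTemplateGenerator, the #TYPE# and #NAME# placeholders, and the template name Sum.template are all made up for this example; real generators such as the one behind Trove are more elaborate.

```java
/** Hypothetical template-driven generator; names and placeholder syntax are illustrative. */
final class SumTemplateGenerator {
    private static final String HEADER =
        "/*\n"
      + " * This .java source file has been auto-generated from the template %s\n"
      + " *\n"
      + " * DO NOT MODIFY THIS FILE FOR IT SHALL GET OVERWRITTEN\n"
      + " */\n";

    // The "real" code: edit this template, never the generated output.
    private static final String TEMPLATE =
        "final class #NAME#Sum {\n"
      + "    static #TYPE# sum(#TYPE#[] values) {\n"
      + "        #TYPE# total = 0;\n"
      + "        for (#TYPE# v : values) {\n"
      + "            total += v;\n"
      + "        }\n"
      + "        return total;\n"
      + "    }\n"
      + "}\n";

    /** Returns Java source text for one primitive type, e.g. "float" or "double". */
    static String generate(String templateName, String type) {
        String name = Character.toUpperCase(type.charAt(0)) + type.substring(1);
        return String.format(HEADER, templateName)
            + TEMPLATE.replace("#NAME#", name).replace("#TYPE#", type);
    }

    public static void main(String[] args) {
        // Emit one variant per type; a build step would write these to .java files.
        for (String type : new String[] {"float", "double"}) {
            System.out.println(generate("Sum.template", type));
        }
    }
}
```

Running the generator as part of the build, rather than checking in its output, keeps the template as the single place where a fix has to be made.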

If you absolutely must duplicate code, follow the great examples you've given and group all of that code in one place where it's easy to find and fix when you have to make a change. Document the duplication and, more importantly, the reason for the duplication so that everyone who comes after you is aware of both.

From Wikipedia, Don't Repeat Yourself (DRY) or Duplication is Evil (DIE):

In some contexts, the effort required to enforce the DRY philosophy may be greater than the effort to maintain separate copies of the data. In some other contexts, duplicated information is immutable or kept under a control tight enough to make DRY not required.

There is probably no answer or technique to prevent problems like that.

Adam Gent

Even fancy-pants languages like Haskell have repetitive code (see my post on Haskell and serialization).

It seems there are three choices to this problem:

  1. Use reflection and lose performance
  2. Use preprocessing like Template Haskell or the Camlp4 equivalent for your language and live with the nastiness
  3. Or, my personal favorite, use macros if your language supports them (Scheme and Lisp)

I consider macros different from preprocessing because macros are usually written in the same language as the target, whereas preprocessing uses a different language.

I think Lisp/Scheme macros would solve many of these problems.

I get that Sun has to document like this for the Java SE library code and maybe other 3rd party library writers do as well.

However, I think it is an utter waste to copy and paste documentation throughout a file like this in code that is only used in-house. I know many people will disagree because it will make their in-house JavaDocs look less clean. However, the trade-off is that it makes their code cleaner, which, in my opinion, is more important.

Java primitive types screw you, especially when it comes to arrays. If you're specifically asking about code involving primitive types, then I would say just try to avoid them. The Object[] method is sufficient if you use the boxed types.

In general, you need lots of unit tests and there really isn't anything else to be done, other than resorting to reflection. Like you said, it's another subject entirely, but don't be too afraid of reflection. Write the DRYest code you can first, then profile it and determine if the reflection performance hit is really bad enough to warrant writing out and maintaining the extra code.
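For the primitive-array case specifically, the standard java.lang.reflect.Array class lets one method handle every numeric primitive array type. The sketch below is only an illustration (the class name ReflectiveArrays is made up); it trades speed for a single copy of the code, which is exactly the trade-off worth profiling before rejecting it.

```java
import java.lang.reflect.Array;

/** Hypothetical helper: one reflective method instead of one overload per primitive type. */
final class ReflectiveArrays {
    /** Sums any numeric primitive array (int[], long[], float[], double[], ...). */
    static double sum(Object array) {
        int n = Array.getLength(array);            // IllegalArgumentException if not an array
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += Array.getDouble(array, i);    // element is widened to double
        }
        return total;
    }
}
```

A call such as ReflectiveArrays.sum(new int[] {1, 2, 3}) goes through the same code path as a call with a double[] argument, so there is only one method to test and document.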

You could use a code generator to construct variations of the code using a template. In that case, the java source is a product of the generator and the real code is the template.

Given two code fragments that are claimed to be similar, most languages have limited facilities for constructing abstractions that unify the code fragments into a monolith. To abstract when your language can't do it, you have to step outside the language :-{

The most general "abstraction" mechanism is a full macro processor which can apply arbitrary computations to the "macro body" while instantiating it (think Post or string-rewriting system, which is Turing capable). M4 and GPM are quintessential examples. The C preprocessor isn't one of these.

If you have such a macro processor, you can construct an "abstraction" as a macro, and run the macro processor on your "abstracted" source text to produce the actual source code you compile and run.

You can also use more limited versions of these ideas, often called "code generators". These are usually not Turing capable, but in many cases they work well enough. It depends on how sophisticated your "macro instantiation" needs to be. (The reason people are enamored with the C++ template mechanism is that, despite its ugliness, it is Turing capable, and so people can do truly ugly but astonishing code generation tasks with it.) Another answer here mentions Trove, which is apparently in the more limited but still very useful category.

Really general macro processors (like M4) manipulate just text; that makes them powerful, but they don't handle the structure of programming languages well, and it is really awkward to write a generator in such a macro processor that can not only produce code but also optimize the generated result. Most code generators that I encounter are "plug this string into this string template" and so cannot do any optimization of a generated result. If you want generation of arbitrary code and high performance to boot, you need something that is Turing capable but understands the structure of the generated code so it can easily manipulate (e.g., optimize) it.

Such a tool is called a Program Transformation System. It parses the source text just like a compiler does, and then carries out analyses and transformations on it to achieve a desired effect. If you can put markers in the source text of your program (e.g., structured comments, or annotations in languages that have them) directing the program transformation tool what to do, then you can use it to carry out such abstraction instantiation, code generation, and/or code optimization. (One poster's suggestion of hooking into the Java compiler is a variation on this idea.) Using a general-purpose transformation system (such as the DMS Software Reengineering Toolkit) means you can do this for essentially any language.

A lot of this kind of repetition can now be avoided thanks to generics. They're a godsend when writing the same code where only the types change.

Sadly though, I think generic arrays are still not very well supported. For now at least, use containers that allow you to take advantage of generics. Polymorphism is also a useful tool to reduce this kind of code duplication.
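As a small illustration of the container-based approach, here is one generic method that replaces a family of per-type copies. The class name GenericMax is made up for the example. Note that it works on List rather than on arrays, and that with primitive values the elements are boxed, which is the cost the performance-minded answers above are trying to avoid.

```java
import java.util.List;

/** Hypothetical example: one generic method instead of a copy per element type. */
final class GenericMax {
    /** Returns the largest element of a non-empty list according to compareTo. */
    static <T extends Comparable<? super T>> T max(List<? extends T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("empty list");
        }
        T best = items.get(0);
        for (T item : items) {
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }
}
```

The same method serves List<Integer>, List<String>, and any other list of mutually comparable elements, so its documentation also needs to be written only once.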

To answer your question about how to handle code that absolutely must be duplicated: tag each instance with easily searchable comments. There are some Java preprocessors out there that add C-style macros. I think I remember NetBeans having one.
