What algorithms can analyse call dependencies for library fission?


(This is the same kind of problem that people have with header files in C and C++ programs, too.)

It isn't just "calls" that create dependencies; it is any kind of reference: to a member variable, a static variable, or even a constant definition.

Basically, what you need to do is discover all the fine-grain dependencies between declared language elements (declarations, fields, methods, classes, packages if you are Java-centric, etc.) and other language elements, using the semantics of the language in which the libraries are written. This usually requires a compiler-like analysis tool that reads the code and discovers such dependencies, and such an analysis is necessarily somewhat conservative. This in essence gives you a giant graph, with nodes being language elements and arcs being "uses".
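As a minimal sketch of what such a "uses" graph looks like once the analysis front end has emitted its facts (the element names and edges below are purely hypothetical):

```python
from collections import defaultdict

uses = defaultdict(set)  # language element -> set of elements it references

def add_use(user, used):
    """Record that language element `user` references `used`."""
    uses[user].add(used)

# Facts a compiler-like analysis might produce (hypothetical names):
add_use("BufferReader.read", "Buffer.data")          # method reads a field
add_use("BufferReader.read", "DEFAULT_BUFFER_SIZE")  # method uses a constant
add_use("BufferWriter.write", "Buffer.data")
```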

The library packaging problem in the abstract is breaking this graph apart into chunks, minimizing cross-chunk dependency arcs. This may give you huge numbers of small libraries.
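A crude first cut at chunking, just to make the idea concrete, is to treat the graph as undirected and take its connected components: components have zero cross-chunk arcs by construction. A real tool would go further and split large components with a graph or community partitioner to keep chunks small. A sketch, assuming the `uses` adjacency dict from above:

```python
def connected_components(uses):
    """uses: dict mapping node -> iterable of nodes it depends on."""
    neighbours = {}
    for a, targets in uses.items():
        for b in targets:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen, chunks = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, chunk = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            chunk.add(node)
            stack.extend(neighbours[node] - seen)
        chunks.append(chunk)
    return chunks

print(connected_components({"f": {"g"}, "g": {"h"}, "x": {"y"}}))
# e.g. [{'f', 'g', 'h'}, {'x', 'y'}]  (set ordering may vary)
```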

The practical problem is grouping together chunks which have no actual dependency on one another but are commonly used together. For instance, a set of buffer access procedures may have no explicit dependency on the definition of a default buffer size, but you probably want one library containing both, rather than two libraries, one of which contains nothing but the default buffer-size declaration. This notion of "used together" is really a problem-domain artifact, and isn't visible anywhere in the code except perhaps as statistical co-occurrence of uses.
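One way to approximate that "used together" signal, sketched here with hypothetical client data, is to count how often pairs of library elements are referenced by the same client compilation unit; high counts suggest packaging them together even without a direct dependency:

```python
from collections import Counter
from itertools import combinations

# Hypothetical map: client compilation unit -> library elements it references
client_uses = {
    "client_a.c": {"BufferReader.read", "DEFAULT_BUFFER_SIZE"},
    "client_b.c": {"BufferReader.read", "BufferWriter.write", "DEFAULT_BUFFER_SIZE"},
}

co_use = Counter()
for elements in client_uses.values():
    for pair in combinations(sorted(elements), 2):
        co_use[pair] += 1

for pair, count in co_use.most_common():
    print(pair, count)   # pairs with high counts are candidates for one library
```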

The hard part of this problem is discovering the fine-grain semantic dependencies. You can approximate this by hand, but if there is any scale to the problem, you won't have the appetite to do it. (People don't reorganize header files for the same reason.) Pretty much, you need language tools to do the analysis, big-graph management to propose chunks, statistical analysis to get a heuristic grouping, and probably a UI to let a domain expert edit the grouping and produce the revised libraries.

Then you need a tool to go back to the code that uses the legacy libraries and modify it to use the revised libraries. Both the library refactoring and the code-base revision require massive code analysis and change, which needs automation.
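To fix the idea only, here is a toy illustration of the client-side rewrite, assuming the reorganization step produced a map from an old umbrella header to the new, finer-grained ones (all names hypothetical). A real refactoring needs syntax-aware rewriting, which is exactly what tool support automates; plain text substitution like this would not survive contact with real code:

```python
import re

HEADER_MAP = {  # hypothetical output of the reorganization step
    "biglib.h": ["buffer_io.h", "buffer_defaults.h"],
}

def rewrite_includes(source: str) -> str:
    """Replace includes of old umbrella headers with the new split headers."""
    def repl(match):
        old = match.group(1)
        new = HEADER_MAP.get(old)
        if not new:
            return match.group(0)  # header not affected by the split
        return "\n".join(f'#include "{h}"' for h in new)
    return re.sub(r'#include\s+"([^"]+)"', repl, source)

print(rewrite_includes('#include "biglib.h"\nint main(void) { return 0; }'))
```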

Our DMS Software Reengineering Toolkit, with its many language front ends, is probably a good foundation for implementing such a library reorganization. We've considered doing this for C and C++ [which is why I have this response], but it's a big task even for us. We'd love some serious additional motivation!
