How do I run a code formatter over my source without modifying git history?

日久生厌 2020-11-30 10:35

I am trying to format an entire repo using a code formatter tool. In doing so, I want to keep information about who committed which line, so that commands like git blame still point at the original authors rather than at the reformatting commit.

    南方客 (OP)
    2020-11-30 10:55

    There must be a way to format the codebase while preserving the author information for each line.

    One thing you could do is to branch from some earlier commit, reformat the code, and then rebase master onto your branch. That would preserve authorship for all the changes that came after whatever commit you start from.
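    The steps above can be sketched as a dry-run plan in Python. This is a minimal illustration, not a tested migration script: the branch name reformat-base, the base commit "abc1234", and the use of black as the formatter are all hypothetical placeholders. By default the plan is only printed; nothing runs unless dry_run=False.

```python
import shlex
import subprocess
from typing import List, Sequence


def plan_reformat_then_rebase(
    base_commit: str,
    formatter_cmd: Sequence[str],
    new_branch: str = "reformat-base",  # hypothetical branch name
    main_branch: str = "master",
) -> List[List[str]]:
    """Return the git/formatter commands in order, without running anything."""
    return [
        # Start a new branch at the chosen earlier commit.
        ["git", "checkout", "-b", new_branch, base_commit],
        # Reformat the working tree (placeholder formatter command).
        list(formatter_cmd),
        # Record the whole reformat as one commit (-a stages tracked files only).
        ["git", "commit", "-a", "-m", "Reformat codebase"],
        # Replay master's commits since base_commit on top of the reformat commit.
        # Note: this rewrites master's history.
        ["git", "rebase", new_branch, main_branch],
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failure, e.g. a rebase halted by conflicts.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "abc1234" and the black command are placeholders for your own values.
    run(plan_reformat_then_rebase("abc1234", ["black", "."]))
```

    Keeping the plan as plain data makes it easy to review before anything touches the repository, which matters here because the final rebase moves master.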

    So that's the idea, but there are some big reasons that you shouldn't do it:

    1. Rebasing a shared branch is a bad idea. The fact that you even care about preserving the authorship of changes probably means that there are a number of people actively working on the code. If you go and rebase the master branch, then every fork or clone of your repo is going to have a master branch with the old history, and that's bound to cause confusion and pain unless you're very careful about managing the process and making certain that everybody is aware of what you're doing and updates their copies appropriately. A better approach would probably be to not rebase master, but instead merge the commits from master into your branch. Then, have everybody start using the new branch instead of master.

    2. Merge conflicts. In reformatting the entire codebase, you're probably going to make changes to a large number of lines in almost every file. When you merge the subsequent commits, whether that's via rebase or merge, you'll likely have a large number of conflicts to resolve. If you take the approach I suggested above and merge commits from master into your new branch instead of rebasing, then it'll be easier to resolve those conflicts in an orderly way because you can merge a few commits at a time until you're caught up.

    3. Incomplete solution. You're going to have to figure out where in the history you want to insert your reformatting operation. The farther back you go, the more you'll preserve the authorship of changes, but the more work it'll be to merge in the subsequent changes. So you'll probably still end up with lots of code where your reformatting commit is the latest change.

    4. Limited benefit. You never actually lose authorship information in git -- it's just that tools typically only show who made the most recent change. But you can still go back and look at prior commits and dig through the entire history of any piece of code, including who made it. So the only thing that inserting your reformatting operation into the history really buys you is the convenience of seeing who changed some piece of code without the extra step of going back to an earlier commit.

    5. It's dishonest. When you rewrite the history of a branch, you're changing a factual recording of how the code changed over time, and that can create real problems. Let's imagine that your reformatting isn't quite as inconsequential as you mean it to be, and in doing the reformatting you actually create a bug. Let's say, for example, that you introduce some extra white space into a multi-line string constant. Weeks later, somebody finally notices the problem and goes looking for the cause, and it looks like the change was made a year and a half ago (because that's where you inserted your reformatting into the history). But the problem seems new -- it doesn't show up in the build that shipped two months ago, so what the heck is going on?

    6. Benefit diminishes over time. As development continues, the changes that you're trying so hard not to cover up will be covered up by some other changes anyway, and your reformatting changes would likewise be superseded by those new changes. As time and development march on, the work you do to bury your reformatting changes won't mean much.
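    The merge-based alternative from points 1 and 2 (merge master into the reformatting branch a few commits at a time instead of rebasing) could look roughly like this. It is a sketch under assumptions: the function names and batch_size are hypothetical, and it assumes you want to follow master's first-parent (mainline) history. Only mainline_commits_since actually calls git; the other two functions just compute which commits to merge and the matching commands.

```python
import subprocess
from typing import List, Sequence


def mainline_commits_since(base: str, branch: str = "master") -> List[str]:
    """Commits on branch's first-parent line after base, oldest first (runs git)."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", "--first-parent", f"{base}..{branch}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def merge_targets(commits: Sequence[str], batch_size: int) -> List[str]:
    """Pick every batch_size-th commit, plus the final one, as merge points."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    targets = list(commits[batch_size - 1::batch_size])
    # Always finish at master's tip so the branch ends up fully caught up.
    if commits and (not targets or targets[-1] != commits[-1]):
        targets.append(commits[-1])
    return targets


def merge_commands(targets: Sequence[str]) -> List[List[str]]:
    """One `git merge` per target; resolve conflicts and commit between steps."""
    return [["git", "merge", "--no-edit", sha] for sha in targets]
```

    With seven commits c1 through c7 and batch_size=3, merge_targets returns ["c3", "c6", "c7"], so each merge brings in at most three commits' worth of conflicts. A smaller batch means more merges but smaller conflict sets to resolve at each step.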

    If you don't want your name showing up as the author of every line in your project, but you also don't want to live with the problems described above, then you might want to rethink your approach. A better solution might be to tackle the reformatting as a team: get everyone on the team to agree to run the formatter on any file that they change, and make proper formatting a requirement in all code reviews going forward. Over time, your team will cover most of the code, and the authorship information will be mostly appropriate since every file that gets reformatted was going to be changed anyway. You may eventually end up with a small number of files that never get reformatted because they're very stable and don't need updates, and you can choose to reformat them (because having some badly formatted files makes you nuts) or not (because nobody is really working in those files anyway).
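    The "format whatever you touch" policy can be supported by a small pre-commit-style helper that formats only the files staged for the next commit. This is a hypothetical sketch: it assumes black as the formatter and Python files as the only formatted type (FORMATTED_SUFFIXES), both of which you would adjust for your own project. The check=True variant only reports badly formatted files, which suits a code-review or CI gate.

```python
import subprocess
from typing import List, Sequence

# Hypothetical: the file suffixes your formatter is responsible for.
FORMATTED_SUFFIXES = (".py",)


def staged_paths() -> List[str]:
    """Added, copied, or modified paths in the index (runs git)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    # -z separates paths with NUL, so names with spaces or quotes survive intact.
    return [p for p in out.split("\0") if p]


def files_to_format(paths: Sequence[str],
                    suffixes: Sequence[str] = FORMATTED_SUFFIXES) -> List[str]:
    """Keep only the changed files the formatter should touch."""
    return [p for p in paths if p.endswith(tuple(suffixes))]


def formatter_command(paths: Sequence[str], check: bool = False) -> List[str]:
    """Build a black invocation; check=True only reports, for a review/CI gate."""
    if not paths:
        return []
    return ["black", *(["--check"] if check else []), *paths]


if __name__ == "__main__":
    cmd = formatter_command(files_to_format(staged_paths()))
    if cmd:
        subprocess.run(cmd, check=True)
```

    Because only files already in the change are reformatted, the formatting churn lands in commits whose authors were editing those files anyway. Files rewritten by the formatter still need to be re-staged with git add before committing.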
