I have a 10 GB repo on a Linux machine, stored on NFS. The first git status takes 36 minutes and subsequent runs take 8 minutes, so Git apparently depends on the OS for file caching. Git commands such as commit and status that walk or pack/repack the whole repo take a very long time on a repo this size. I'm not sure if you have used git status on such a large repo, but has anyone come across this issue?
I have tried git gc, git clean, and git repack, but the time taken is still almost the same.
Will submodules or other approaches like breaking the repo into smaller ones help? If so, which approach is best for splitting a larger repo? Is there any other way to improve the time taken by git commands on a large repo?
To be more precise, Git depends on the efficiency of the lstat(2) system call, so tweaking your client’s “attribute cache timeout” might do the trick.
The manual for git-update-index — essentially a manual mode for git-status — describes what you can do to alleviate this: use the --assume-unchanged flag to suppress the normal behavior and manually update the paths that you have changed. You might even program your editor to unset this flag every time you save a file.
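As a sketch of that workflow (the path big/generated.txt is just an example name, not from the question):

```shell
# Tell Git to stop lstat()-ing this path on every status/commit
git update-index --assume-unchanged big/generated.txt

# Flagged paths show a lowercase letter in the -v listing
git ls-files -v big/generated.txt

# Clear the flag again once you actually change the file
git update-index --no-assume-unchanged big/generated.txt
```

The flag only suppresses the stat check; if you edit a flagged file, Git will not notice until you clear the flag.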
The alternative, as you suggest, is to reduce the size of your checkout (the size of the packfiles doesn’t really come into play here). The options are a sparse checkout, submodules, or Google’s repo tool.
(There’s a mailing list thread about using Git with NFS, but it doesn’t answer many questions.)
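For the sparse-checkout route, the classic setup (before the dedicated git sparse-checkout command arrived in Git 2.25) looks roughly like this; the src/ path is only an example:

```shell
# Enable sparse checkout for this working tree
git config core.sparseCheckout true

# List the paths you actually want checked out
echo "src/" >> .git/info/sparse-checkout

# Re-apply the checkout rules to the working tree
git read-tree -mu HEAD
```

This shrinks the working tree git status has to scan without changing the repository history.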
I'm also seeing this problem on a large project shared over NFS.
It took me some time to discover the -uno flag, which can be given to both git commit and git status.
This flag disables the search for untracked files, which reduces the number of NFS operations significantly. To discover untracked files, Git has to look in every subdirectory, so if you have many subdirectories this will hurt you. Disabling the search eliminates all of those NFS operations.
Combine this with the core.preloadindex setting and you can get reasonable performance even on NFS.
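Concretely, the two tweaks together are:

```shell
# Skip the recursive scan for untracked files (fewer NFS round-trips);
# note that genuinely untracked files will simply not be reported
git status -uno

# Let Git stat index entries in parallel threads
git config core.preloadindex true
```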
Try git gc. Also, git clean may help.
UPDATE - Not sure where the downvote came from, but the git manual specifically states:
Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add.
Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.
I always notice a difference after running git gc when git status is slow!
UPDATE II - Not sure how I missed this, but the OP already tried git gc and git clean. I swear that wasn't originally there, but I don't see any changes in the edits. Sorry for that!
If your git repo makes heavy use of submodules, you can greatly speed up the performance of git status by editing the config file in the .git directory and setting ignore = dirty
on any particularly large/heavy submodules. For example:
[submodule "mysubmodule"]
    url = ssh://mysubmoduleURL
    ignore = dirty
You'll lose the convenience of a reminder that there are unstaged changes in any of the submodules that you may have forgotten about, but you'll still retain the main convenience of knowing when the submodules are out of sync with the main repo. Plus, you can still change your working directory to the submodule itself and use git status within it as per usual to see more information. See this question for more details about what "dirty" means.
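The same setting can be applied without editing the file by hand (using the mysubmodule name from the example above):

```shell
# Equivalent to adding "ignore = dirty" under [submodule "mysubmodule"]
git config submodule.mysubmodule.ignore dirty

# Or skip dirty-submodule checks for a single invocation only
git status --ignore-submodules=dirty
```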
The performance of git status should improve with Git 2.13 (Q2 2017).
See commit 950a234 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 8b6bba6, 24 Apr 2017)

string-list: use ALLOC_GROW macro when reallocing string_list

Use the ALLOC_GROW() macro when reallocing a string_list array rather than simply increasing it by 32. This is a performance optimization.

During status on a very large repo with many changes, a significant percentage of the total run time is spent reallocing the wt_status.changes array.

This change decreases the time in wt_status_collect_changes_worktree() from 125 seconds to 45 seconds on my very large repository.
Plus, Git 2.17 (Q2 2018) will introduce a new trace, for measuring where the time is spent in the index-heavy operations.
See commit ca54d9b (27 Jan 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 090dbea, 15 Feb 2018)
trace: measure where the time is spent in the index-heavy operations

All the known heavy code blocks are measured (except object database access). This should help identify whether an optimization is effective or not.
An unoptimized git status would give something like the following:
0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
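A trace like the one above can be produced with the GIT_TRACE_PERFORMANCE environment variable (the performance trace itself predates 2.17; the 2.17 work added the index-heavy measurements to it):

```shell
# Write timing output to stderr alongside the normal status output
GIT_TRACE_PERFORMANCE=1 git status

# Or send the timings to a file for later inspection
GIT_TRACE_PERFORMANCE=/tmp/git-perf.log git status
```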
The same Git 2.17 (Q2 2018) improves git status with:
commit f39a757, commit 3ca1897, commit fd9b544, commit d7d1b49 (09 Jan 2018) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 4094e47, 08 Mar 2018)
"git status" can spend a lot of cycles to compute the relation between the current branch and its upstream, which can now be disabled with the "--no-ahead-behind" option.

commit ebbed3b (25 Feb 2018) by Derrick Stolee (derrickstolee).

revision.c: reduce object database queries

In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit.

Modify the condition to only check has_object_file() if the result would change the parsed bit.

When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts. This uses paint_down_to_common() and hits mark_parents_uninteresting().

On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%.
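With Git 2.17 or later, the ahead/behind computation can be skipped per invocation (the status.aheadBehind config knob, if your Git has it, makes this the default):

```shell
# Skip the ahead/behind walk for this run only (Git 2.17+)
git status --no-ahead-behind
```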
git config --global core.preloadIndex true
Did the job for me. Check the official git-config documentation for core.preloadIndex.
In our codebase, where we have somewhere in the range of 20-30 submodules, git status --ignore-submodules sped things up for me drastically. Do note that this will not report on the status of submodules.
Something that hasn't been mentioned yet is to activate the filesystem cache on Windows machines (Linux filesystems are completely different, and Git was optimized for them, so this probably only helps on Windows).
git config core.fscache true
As a last resort, if Git is still slow, you can turn off the modification time inspection that Git needs to find out which files have changed.
git config core.ignoreStat true
BUT: changed files then have to be staged explicitly by the developer with git add; Git won't detect the changes itself.
Source: https://stackoverflow.com/questions/4994772/ways-to-improve-git-status-performance