BK: Data Mining, Chapter 2 - Getting to Know Your Data
Why: real-world data are typically noisy, enormous in volume, and may originate from a hodgepodge of heterogeneous sources.
Basic statistics for each attribute: mean; median; mode (the most common value); distribution. Knowing these makes it easier to fill in missing values, smooth noisy values, and spot outliers during data preprocessing.
Source: https://www.cnblogs.com/dulun/p/12293674.html
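To make the note above concrete, here is a minimal sketch of how the mean, median, and mode of one attribute might be used to fill a missing value and flag an outlier. Everything in it is an assumption for illustration: the helper names (`summarize`, `fill_missing`, `iqr_outliers`), the sample `ages` list, the choice of the median as the fill value, and the 1.5 x IQR outlier rule are not taken from the original notes.

```python
import statistics

def summarize(values):
    """Basic statistics over the non-missing values of one attribute."""
    present = [v for v in values if v is not None]
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3)
    q1, _, q3 = statistics.quantiles(present, n=4)
    return {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),  # most common value
        "q1": q1,
        "q3": q3,
    }

def fill_missing(values, fill):
    """Replace missing entries (None) with the given fill value."""
    return [fill if v is None else v for v in values]

def iqr_outliers(values, q1, q3, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule of thumb)."""
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v is not None and (v < low or v > high)]

# Hypothetical attribute with one missing value and one suspicious value
ages = [23, 25, 25, None, 27, 29, 31, 95]

stats = summarize(ages)
filled = fill_missing(ages, stats["median"])
outliers = iqr_outliers(ages, stats["q1"], stats["q3"])

print(stats["median"], stats["mode"])  # 27 25
print(filled)                          # [23, 25, 25, 27, 27, 29, 31, 95]
print(outliers)                        # [95]
```

The median is used as the fill value here because, unlike the mean, it is not pulled upward by the extreme value 95; for a roughly symmetric attribute the mean would be an equally reasonable choice.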