kmeans

No Crying, As Promised! An In-Depth Guide to Data Visualization: The Next Pay-Raise Opportunity for Front-End Developers Is Here

天大地大妈咪最大 submitted on 2019-12-03 13:18:10
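As the internet's influence across industries deepens, data volumes keep growing and companies place increasing weight on the value of data. Getui (个推), a data intelligence company that started out in push notification services, has accumulated massive amounts of data over years of sustained work and has explored data visualization in depth.
Getui's data visualization work starts from requirements. Moving from open-source platforms to custom development tailored to specific needs, it has produced visualizations such as the Getui real-time push delivery map and the crowd distribution heat map. Along the way, Getui built up a large set of data visualization components and refined its own visualization capabilities. The Getui heat map is currently used in smart cities, population spatial planning, public services, and other fields, where it provides data support.
[Figure: Getui message delivery map]
[Figure: population heat map of the Hubin business district, built by Getui]
This article shares Getui's data visualization practice, the problems encountered, and the approaches taken to solve them, in the hope that readers will find it useful.
I. Components of data visualization
Data visualization is made up of four kinds of visual elements: background information, scales, coordinate systems, and visual cues.
1.1 Background information
Background information is supplementary information such as titles, units of measurement, and annotations. Its main purpose is to help the audience of a large-screen display understand the context, i.e. the 5W information: who, what, when, where, and why.
1.2 Scales
Scales measure the magnitude of data along different directions and dimensions. Common ones include numeric scales, categorical scales, and time scales, similar to the tick marks we are all familiar with.
1.3 Coordinate systems
A coordinate system has a structured space, plus rules that specify where shapes and colors are drawn; it is used when encoding data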

Getui Data Visualization: Front-End Development Practice for the Crowd Heat Map and the Message Delivery Map

和自甴很熟 submitted on 2019-12-03 13:17:54

Clustering: kmeans

自古美人都是妖i submitted on 2019-12-03 09:38:56
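I. Purpose
Given a set of discrete points, partition them into groups; this is called clustering. For example, the following points are split into two groups, and the center points (the green point and the orange point) are called cluster centers.
II. Steps
Choose k initial centroids (as the initial clusters);
repeat: for each sample point, find the nearest centroid and label the point with that centroid's cluster;
recompute the centroids of the k clusters;
until the centroids no longer change
III. Code (sklearn)
from numpy import *
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# create data
n_data = array([[random.randint(100, 1000), random.randint(100, 1000)] for i in range(0, 1000)])
label_n = array([0 for i in range(0, 1000)])
p_data = array([[-1 * random.randint(100, 1000), -1 * random.randint(100, 1000)] for i in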
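The k-means steps listed in this post (choose k initial centroids, assign each point to its nearest centroid, recompute the centroids, repeat until they stop changing) can be sketched in plain Python as follows. This is a minimal illustration using only the standard library, not the post's sklearn code; the function name `kmeans`, its parameters, and the four sample points are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (illustrative)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct samples as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster lost all its points.
                new_centroids.append(centroids[c])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two tight groups of two points each.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, k=2)
```

Keeping the old centroid when a cluster becomes empty is just one possible choice made here for simplicity; libraries handle that case in different ways.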

Show rows on clustered kmeans data

Anonymous (unverified) submitted on 2019-12-03 08:59:04
Question: Hi, I was wondering: when you cluster data and plot it in a figure window, is there a way to show which row each data point belongs to when you hover over it? From the picture above, I was hoping there would be a way to select or hover over a point and see which row it belongs to. Here is the code:
%% dimensionality reduction
columns = 6
[U,S,V]=svds(fulldata,columns);
%% randomly select dataset
rows = 1000;
columns = 6;
%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows);
%# pick random columns
indY =

Python scikit-learn KMeans is being killed (9) while computing silhouette score

Anonymous (unverified) submitted on 2019-12-03 08:57:35
Question: I'm currently working on an image dataset (250,000 images, and therefore as many feature vectors, each made up of 132 features) and trying to use the KMeans function provided by sklearn. I run it on Mac OS X 10.10, Python 2.7 and sklearn 0.15.2, and after a while I only get a
Killed: 9
error when running these lines:
nb_cls = int(raw_input("Number of clusters chosen :"))
clusterer = sklearn.cluster.KMeans(n_clusters=nb_cls)
clusters_labels = clusterer.fit_predict(X)
silhouette = sklearn.metrics.silhouette_score(X,
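The silhouette score is built from pairwise distances between samples, so its cost grows quadratically with the number of rows. One option that may help, offered here as an assumption and not as the accepted answer to this question, is the `sample_size` parameter of `silhouette_score`, which scores a random subset of the rows. The sketch below uses a small synthetic `X` as a stand-in for the real feature matrix; the array sizes and the value 1000 are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the real feature matrix (assumption: the real X
# has 250,000 rows of 132 features; a smaller X keeps this sketch quick).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(2000, 132)),
    rng.normal(loc=10.0, scale=1.0, size=(2000, 132)),
])

clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = clusterer.fit_predict(X)

# sample_size scores a random subset, so pairwise distances are computed
# for 1000 points instead of for every row of X.
silhouette = silhouette_score(
    X, cluster_labels, sample_size=1000, random_state=0
)
```

The subsampled score is an estimate, so it can vary with `random_state`; fixing the seed makes runs comparable.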

Sklearn kmeans equivalent of elbow method

Anonymous (unverified) submitted on 2019-12-03 08:54:24
Question: Let's say I'm examining up to 10 clusters. With scipy I usually generate the 'elbow' plot as follows:
from scipy import cluster
cluster_array = [cluster.vq.kmeans(my_matrix, i) for i in range(1,10)]
pyplot.plot([var for (cent,var) in cluster_array])
pyplot.show()
I have since become motivated to use sklearn for clustering, but I'm not sure how to create the array needed for the plot as in the scipy case. My best guess was:
from sklearn.cluster import KMeans
km = [KMeans(n_clusters=i) for i in range(1,10)]
cluster_array = [km[i].fit(my_matrix)]
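A possible way to build the array for an elbow plot with sklearn, sketched under assumptions: a fitted `KMeans` exposes `inertia_`, the sum of squared distances from each sample to its closest cluster center. This is not the same quantity as the distortion returned by `scipy.cluster.vq.kmeans` (a mean, non-squared distance), but it plays the same role in the plot. The synthetic `my_matrix` below is a hypothetical stand-in for the asker's data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical stand-in for my_matrix: three blobs in 2-D.
my_matrix = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(100, 2))
    for center in (0.0, 5.0, 10.0)
])

ks = range(1, 10)
# Fit one model per k and collect its inertia (within-cluster sum of squares).
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(my_matrix).inertia_
    for k in ks
]
```

Plotting `list(ks)` against `inertias` with `pyplot.plot` then gives the curve; the "elbow" is where the drop in inertia levels off.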

R kmeans NAs in foreign function call (arg 13) error

Anonymous (unverified) submitted on 2019-12-03 08:54:24
Question: I have numeric data in a vector and I'm trying to run kmeans on it. The following gives an error:
> kmeans( mydata, centers = 2 ) # trying centers 2 to 20 but failing at 2
Error in do_one(nmeth) : NAs in foreign function call (arg 13)
In addition: Warning message:
In do_one(nmeth) : NAs introduced by coercion
> str(mydata)
 num [1:44990687] 3.44e-06 3.44e-06 3.44e-06 3.44e-06 4.35e-05 ...
> is.numeric(mydata)
[1] TRUE
My code works for datasets that are smaller than this one, so I suspect it may have something to do with the size of the

KMeans clustering in PySpark

Anonymous (unverified) submitted on 2019-12-03 08:28:06
Question: I have a Spark dataframe 'mydataframe' with many columns. I am trying to run kmeans on only two columns: lat and long (latitude and longitude), using them as simple values. I want to extract 7 clusters based on just those 2 columns, and then attach the cluster assignment to my original dataframe. I've tried:
from numpy import array
from math import sqrt
from pyspark.mllib.clustering import KMeans, KMeansModel
# Prepare a data frame with just 2 columns:
data = mydataframe.select('lat', 'long')
data_rdd = data.rdd # needs to be an RDD

Kmeans matlab “Empty cluster created at iteration 1” error

Anonymous (unverified) submitted on 2019-12-03 02:51:02
Question: I'm using this script to cluster a set of 3D points with the MATLAB kmeans function, but I always get the error "Empty cluster created at iteration 1". The script I'm using:
[G,C] = kmeans(XX, K, 'distance','sqEuclidean', 'start','sample');
XX can be found at this link (XX value), and K is set to 3. Could anyone please advise why this is happening?
Answer 1: It is simply telling you that during the assign-recompute iterations, a cluster became empty (lost all assigned points). This is usually caused by an inadequate cluster
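To illustrate what "a cluster became empty" means in the assignment step, here is a small Python sketch. It does not reproduce the MATLAB call or the asker's data: the four 3D points and the three starting centroids are hypothetical, chosen so that one centroid sits far from every point.

```python
import math

# Hypothetical 3D points and starting centroids (not the asker's XX).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (9.0, 9.0, 9.0)]
centroids = [(0.0, 0.0, 0.0), (9.0, 9.0, 9.0), (100.0, 100.0, 100.0)]

# Assignment step: each point goes to its nearest centroid.
labels = [
    min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
    for p in points
]

# Count members per cluster; a count of zero means the cluster is empty,
# so its mean cannot be recomputed in the update step.
counts = [labels.count(c) for c in range(len(centroids))]
empty = [c for c, n in enumerate(counts) if n == 0]
```

Here the third centroid attracts no points, so `empty` contains its index. An implementation then has to raise an error, drop the cluster, or re-seed it.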

R draw kmeans clustering with heatmap

Anonymous (unverified) submitted on 2019-12-03 01:58:03
Question: I would like to cluster a matrix with kmeans and plot it as a heatmap. It sounds quite trivial, and I have seen many plots like this. I have tried googling around but can't find a way to do it. I'd like to be able to plot something like panel A or B in this figure. Let's say I have a matrix with 250 rows and 5 columns. I don't want to cluster the columns, just the rows.
m = matrix(rnorm(25), 250, 5)
km = kmeans(m, 10)
Then how do I plot those 10 clusters as a heatmap? Your comments and help are more than welcome. Thanks.
Answer 1: