normalize

I want to flatten JSON column in a Pandas DataFrame

谁说胖子不能爱 submitted on 2019-11-28 09:20:44
Question: I have an input dataframe df which is as follows:

    id  e
    1   {"k1":"v1","k2":"v2"}
    2   {"k1":"v3","k2":"v4"}
    3   {"k1":"v5","k2":"v6"}

I want to "flatten" the column 'e' so that my resulting dataframe is:

    id  e.k1  e.k2
    1   v1    v2
    2   v3    v4
    3   v5    v6

How can I do this? I tried using json_normalize but did not have much success.

Answer 1: Here is a way to use pandas.io.json.json_normalize():

    from pandas.io.json import json_normalize
    df = df.join(json_normalize(df["e"].tolist()).add_prefix("e.")).drop(["e"], axis=1)
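Since pandas 1.0 the same function is exposed as the top-level pandas.json_normalize, and the pandas.io.json import path is deprecated. Below is a minimal sketch of the same idea with the newer name. It assumes the values in column e are already Python dicts (if they are JSON strings, parse them with json.loads first) and that df has a default RangeIndex, since join aligns on the index. The sample frame is reconstructed from the question.

```python
import pandas as pd

# Sample frame mirroring the question; values in "e" are assumed to be dicts.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "e": [
        {"k1": "v1", "k2": "v2"},
        {"k1": "v3", "k2": "v4"},
        {"k1": "v5", "k2": "v6"},
    ],
})

# Expand each dict into its own columns and prefix them with "e.".
expanded = pd.json_normalize(df["e"].tolist()).add_prefix("e.")

# join() aligns on the index, so df needs a default RangeIndex here
# (otherwise assign df.index to expanded.index before joining).
flat = df.drop(columns=["e"]).join(expanded)

print(flat)
```

If df was filtered or re-indexed beforehand, setting expanded.index = df.index before the join keeps the rows matched.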

How to Normalize Names

随声附和 submitted on 2019-11-27 15:31:32
Question: I am using pandas dataframes and I have data listing customers per company. However, the company names vary slightly, and the variations affect the totals. Example:

    Company         Customers
    AAAB            1,000
    AAAB Inc.       900
    The AAAB Inc.   20
    AAAB the INC    10

I want to get the total customers out of a database of several different companies whose names are not standardized. Any idea where I should start?

Answer 1: I remember reading this blog about the fuzzywuzzy library (looking into another question),
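Before reaching for fuzzy matching, a rule-based pass is often enough for variations like the ones in the example. The sketch below is not the fuzzywuzzy approach the answer refers to. It is a simpler standard-library technique that lowercases each name, drops punctuation, removes a small set of noise words, and sums customers per cleaned name. The NOISE_WORDS set and the normalize_company helper are hypothetical and would need tuning on real data.

```python
import re
from collections import defaultdict

# Hypothetical noise words; extend this set for real data.
NOISE_WORDS = {"the", "inc", "llc", "ltd", "co", "corp"}


def normalize_company(name: str) -> str:
    """Lowercase, drop punctuation, and remove noise words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens if t not in NOISE_WORDS]
    return " ".join(kept)


# Rows from the question, with the customer counts as integers.
rows = [
    ("AAAB", 1000),
    ("AAAB Inc.", 900),
    ("The AAAB Inc.", 20),
    ("AAAB the INC", 10),
]

# Sum customers under the cleaned company name.
totals = defaultdict(int)
for company, customers in rows:
    totals[normalize_company(company)] += customers

print(dict(totals))
```

In a DataFrame the same helper could be applied with df["Company"].map(normalize_company) followed by a groupby on the cleaned column. Names that differ by typos rather than by noise words would still need a fuzzy matcher.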

pandas.io.json.json_normalize with very nested json

懵懂的女人 submitted on 2019-11-27 14:00:02
I have been trying to normalize a deeply nested JSON file that I will analyze later. What I am struggling with is how to normalize more than one level deep. I went through the pandas.io.json.json_normalize documentation, since it does exactly what I want. I have been able to normalize part of the file and now understand how dictionaries work, but I am still not there. With the code below I am able to get only the first level.

    import json
    import pandas as pd
    from pandas.io.json import json_normalize

    with open('authors_sample.json') as f:
        d = json.load(f)

    raw = json_normalize(d['hits']['hits'])
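To reach below the first level, json_normalize accepts record_path (the path to a nested list that should become rows) and meta (parent fields to copy onto each row). The structure of authors_sample.json is not shown in the question, so the field names below (_id, _source, name, affiliations, org, year) are hypothetical stand-ins, and the sketch uses the top-level pandas.json_normalize available since pandas 1.0.

```python
import pandas as pd

# Hypothetical stand-in for the parsed file; the real field names are not
# shown in the question.
d = {
    "hits": {
        "hits": [
            {
                "_id": "a1",
                "_source": {
                    "name": "Ada",
                    "affiliations": [
                        {"org": "X", "year": 2001},
                        {"org": "Y", "year": 2005},
                    ],
                },
            },
            {
                "_id": "b2",
                "_source": {
                    "name": "Bob",
                    "affiliations": [{"org": "Z", "year": 2010}],
                },
            },
        ]
    }
}

# record_path selects the nested list that becomes one row per element;
# meta copies parent fields onto each of those rows.
nested = pd.json_normalize(
    d["hits"]["hits"],
    record_path=["_source", "affiliations"],
    meta=["_id", ["_source", "name"]],
)

print(nested)
```

Nested dicts that contain no lists are flattened automatically into dotted column names, so record_path is only needed where a list should fan out into multiple rows.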

Removing diacritics in Silverlight (String.Normalize issue)

不羁岁月 submitted on 2019-11-27 06:59:59
Question: I created a function that transforms diacritic characters into non-diacritic characters (based on this post). Here's the code:

    Public Function RemoveDiacritics(ByVal searchInString As String) As String
        Dim returnValue As String = ""
        Dim formD As String = searchInString.Normalize(System.Text.NormalizationForm.FormD)
        Dim unicodeCategory As System.Globalization.UnicodeCategory = Nothing
        Dim stringBuilder As New System.Text.StringBuilder()
        For formScan As Integer = 0 To formD.Length - 1
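The listing above is cut off, but its opening lines follow the usual decomposition technique: normalize to form D so that each accented letter splits into a base letter plus combining marks, then drop the marks. For comparison, here is a minimal sketch of that technique in Python rather than VB.NET, using the standard unicodedata module. It illustrates the general approach and is not a completion of the author's function.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Decompose to NFD, drop nonspacing marks (category Mn), recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


print(remove_diacritics("Crème brûlée à la carte"))
```

Letters that have no canonical decomposition, such as "ø" or "ß", pass through unchanged with this approach and need an explicit mapping if they should be replaced.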

Normalize columns of pandas data frame

流过昼夜 submitted on 2019-11-26 19:20:41
I have a dataframe in pandas where each column has a different value range. For example:

df:

    A     B   C
    1000  10  0.5
    765   5   0.35
    800   7   0.09

Any idea how I can normalize the columns of this dataframe so that each value is between 0 and 1? My desired output is:

    A      B    C
    1      1    1
    0.765  0.5  0.7
    0.8    0.7  0.18 (which is 0.09/0.5)

Sandman
You can use the package sklearn and its associated preprocessing utilities to normalize the data.

    from sklearn import preprocessing

    x = df.values  # returns a numpy array
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    df = pandas.DataFrame(x
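Note that the desired output in the question divides each column by its maximum (0.09/0.5 = 0.18), whereas MinMaxScaler by default computes (x - min) / (max - min), which maps the smallest value in each column to 0. The sketch below shows both variants in plain pandas so the difference is visible. It assumes all values are non-negative, which divide-by-max needs in order to stay within 0 and 1. The frame is reconstructed from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1000, 765, 800],
    "B": [10, 5, 7],
    "C": [0.5, 0.35, 0.09],
})

# Divide-by-max scaling: reproduces the desired output in the question.
max_scaled = df / df.max()

# Min-max scaling: the formula MinMaxScaler applies with its default range.
min_max_scaled = (df - df.min()) / (df.max() - df.min())

print(max_scaled)
print(min_max_scaled)
```

Both expressions rely on pandas aligning the per-column Series from df.max() and df.min() with the columns of df, so no explicit loop over columns is needed.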

How can I normalize a URL in python

大城市里の小女人 submitted on 2019-11-26 17:19:11
I'd like to know how to normalize a URL in Python. For example, if I have a URL string like " http://www.example.com/foo goo/bar.html", I need a library in Python that will transform the extra space (or any other non-normalized character) into a proper URL.

Armin Ronacher
Have a look at this module: werkzeug.utils (now in werkzeug.urls). The function you are looking for is called "url_fix" and works like this:

    >>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')
    'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'

It's implemented in Werkzeug as follows:

    import urllib
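The Werkzeug listing is cut off above. For readers who only need the behaviour shown in the example, here is a standard-library-only sketch for Python 3 based on urllib.parse. It is not Werkzeug's implementation: the fix_url name and the choice of safe characters are assumptions made for illustration, and the host part is left untouched (non-ASCII hostnames would need IDNA encoding, which is out of scope here).

```python
from urllib.parse import quote, quote_plus, urlsplit, urlunsplit


def fix_url(url: str) -> str:
    """Percent-encode unsafe characters in the path and query of a URL."""
    scheme, netloc, path, query, fragment = urlsplit(url.strip())
    # Keep "/" as the path separator and "%" so existing escapes survive.
    path = quote(path, safe="/%")
    # Keep the characters that structure a query string.
    query = quote_plus(query, safe=":&%=")
    return urlunsplit((scheme, netloc, path, query, fragment))


print(fix_url(" http://www.example.com/foo goo/bar.html"))
print(fix_url("http://de.wikipedia.org/wiki/Elf (Begriffsklärung)"))
```

Treating "%" as safe avoids double-encoding a URL that is already partly escaped, at the cost of leaving a stray literal "%" in the input unescaped.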

Normalize columns of pandas data frame

▼魔方 西西 submitted on 2019-11-26 06:12:25
Question: I have a dataframe in pandas where each column has a different value range. For example:

df:

    A     B   C
    1000  10  0.5
    765   5   0.35
    800   7   0.09

Any idea how I can normalize the columns of this dataframe so that each value is between 0 and 1? My desired output is:

    A      B    C
    1      1    1
    0.765  0.5  0.7
    0.8    0.7  0.18 (which is 0.09/0.5)

Answer 1: You can use the package sklearn and its associated preprocessing utilities to normalize the data.

    import pandas as pd
    from sklearn import preprocessing

    x = df.values  # returns a numpy