dataframe

Python function returns only the first value instead of a dataframe

浪子不回头ぞ submitted on 2021-02-05 08:35:59
Question: I have built a function in which I append the returns of 5 portfolios to a dataframe that I want to return to a variable. When I run the commands inside the function line by line (a kind of debugging), the variable 'folioReturn' (the one I want my script to return) ends up with the right number of values (e.g. 5). But if I call the function, only the first value of the dataframe is returned. Does anyone know how I can get the whole dataframe? def portfolioReturns (securities,
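The excerpt is cut off before the function body, so the actual cause cannot be confirmed from it. One common way to get this symptom, shown here purely as an assumption, is a `return` statement indented inside the loop, so the function exits after the first iteration. The function names, the input dict, and the column names below are hypothetical and do not come from the original post.

```python
import pandas as pd

# Hypothetical input: portfolio name -> return (not from the original post).
returns_by_portfolio = {"p1": 0.01, "p2": 0.02, "p3": 0.03, "p4": 0.04, "p5": 0.05}


def portfolio_returns_buggy(returns):
    rows = []
    for name, value in returns.items():
        rows.append({"portfolio": name, "return": value})
        # Assumed bug: return is inside the loop, so only one row is ever collected.
        return pd.DataFrame(rows)


def portfolio_returns_fixed(returns):
    rows = []
    for name, value in returns.items():
        rows.append({"portfolio": name, "return": value})
    # Return once, after the loop has processed every portfolio.
    return pd.DataFrame(rows)


buggy = portfolio_returns_buggy(returns_by_portfolio)
fixed = portfolio_returns_fixed(returns_by_portfolio)
```

Running the body line by line in an interactive session hides this kind of bug, because the `return` is never executed there, which would match the behaviour described in the question.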

Unable to write PySpark Dataframe created from two zipped dataframes

有些话、适合烂在心里 submitted on 2021-02-05 08:32:40
Question: I am trying to follow the example given here for combining two dataframes without a shared join key (combining by "index", as in a database table or pandas dataframe, except that PySpark does not have that concept). My code:

    left_df = left_df.repartition(right_df.rdd.getNumPartitions())  # FWIW, num of partitions = 303
    joined_schema = StructType(left_df.schema.fields + right_df.schema.fields)
    interim_rdd = left_df.rdd.zip(right_df.rdd).map(lambda x: x[0] + x[1])
    full_data = spark.createDataFrame

How to get all matches using str.contains in Python regex?

自作多情 submitted on 2021-02-05 08:10:55
Question: I have a data frame in which I need to find all rows that match any of a list of terms. My code is:

    texts = ['foo abc', 'foobar xyz', 'xyz baz32', 'baz 45','fooz','bazzar','foo baz']
    terms = ['foo','baz','foo baz']
    # create df
    df = pd.DataFrame({'Match_text': texts})
    # create pattern
    pat = r'\b(?:{})\b'.format('|'.join(terms))
    # use str.contains to find matches
    df = df[df['Match_text'].str.contains(pat)]
    # compile pattern
    p = re.compile(pat)
    # search for pattern in the column
    results =
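As a minimal sketch of one way to list every match per row, `Series.str.findall` can be used in place of `str.contains`. This is not taken from the truncated post. The `matches` column, the `ordered_terms` name, and the longest-first ordering are assumptions added here so that 'foo baz' is tried before the shorter 'foo'.

```python
import re

import pandas as pd

texts = ['foo abc', 'foobar xyz', 'xyz baz32', 'baz 45', 'fooz', 'bazzar', 'foo baz']
terms = ['foo', 'baz', 'foo baz']

df = pd.DataFrame({'Match_text': texts})

# Try longer terms first; regex alternation stops at the first branch that matches.
ordered_terms = sorted(terms, key=len, reverse=True)
pat = r'\b(?:{})\b'.format('|'.join(re.escape(t) for t in ordered_terms))

# findall returns a list of all non-overlapping matches for each row.
df['matches'] = df['Match_text'].str.findall(pat)

# Keep only rows where at least one term matched.
matched = df[df['matches'].str.len() > 0]
```

With the original term order the last row would yield 'foo' and 'baz' as two separate matches rather than the single phrase 'foo baz'; which behaviour is wanted depends on the use case.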

How to get the mode of a column in pandas when several values tie for the mode

試著忘記壹切 submitted on 2021-02-05 08:09:31
Question: I have a data frame and I'd like to get the mode of a specific column. I'm using: freq_mode = df.mode()['my_col'][0] However, I get the error: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index my_col') I'm guessing it's because several values tie for the mode. I need any one of the modes; it doesn't matter which. How can I use any() to get any one of the modes? Answer 1: Your code works fine for me with sample data. If
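A small sketch of picking one mode when several values tie, assuming a numeric column named `my_col` as in the question (the sample values are made up). `Series.mode()` returns all tied modes in sorted order, so taking the first element by position gives one of them. This does not reproduce or diagnose the ValueError quoted above.

```python
import pandas as pd

# Made-up sample data in which 1 and 2 both occur twice.
df = pd.DataFrame({'my_col': [3, 1, 1, 2, 2]})

# mode() on a single column returns a Series holding every tied mode, sorted.
modes = df['my_col'].mode()

# Take the first one by position; any of the tied values would be equally valid.
freq_mode = modes.iloc[0]
```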

How to flip a column of ratios, convert into a fraction and convert to a float

…衆ロ難τιáo~ submitted on 2021-02-05 08:08:06
Question: I have the following data frame:

             Date Ratio
    0  2000-06-21   4:1
    1  2000-06-22   3:2
    2  2000-06-23   5:7
    3  2000-06-24   7:1

For each item in the Ratio column, I want to reverse the ratio, convert it into a fraction, and convert that to a float. That is, 4:1 would become 1:4, then the : would be replaced with a /, and finally it would become 0.25. 3:2 would become 2/3, which converts to 0.66666666666. So far I only have the following code: df['Ratio'] = df['Ratio'].str.split(":") Answer 1: Create new DataFrame with
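The answer is cut off after "Create new DataFrame with", so the sketch below is only one plausible reading of it: split the column with `expand=True` to get a two-column DataFrame, then divide the second part by the first. The `parts` name is an assumption introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2000-06-21', '2000-06-22', '2000-06-23', '2000-06-24'],
    'Ratio': ['4:1', '3:2', '5:7', '7:1'],
})

# expand=True yields a new DataFrame with integer column labels 0 and 1.
parts = df['Ratio'].str.split(':', expand=True).astype(float)

# Reversing a:b to b:a and evaluating it as a fraction is just b / a.
df['Ratio'] = parts[1] / parts[0]
```

Dividing directly avoids building an intermediate string such as "1/4" and evaluating it.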

Pandas: How to create a column based on values of another column?

陌路散爱 submitted on 2021-02-05 08:06:21
Question: I need to create a new column at the end of a data frame, where the values in that new column are the result of applying some function whose parameters are based on other columns. Specifically, they come from another column, but a different row. So for example, if my data frame had two columns, containing values x_i, y_i respectively, my third column would be f(x_(i-1), y_(i-1)). I know that the easiest way to create a new column would be to do something like df['new_row'] = ... But I'm not
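One way to express f(x_(i-1), y_(i-1)) is to shift both input columns down by one row before applying the function. This is a sketch under stated assumptions: `f` here is a hypothetical stand-in (a simple sum) that works element-wise on Series, and the column names `x`, `y`, and `new_col` are illustrative.

```python
import pandas as pd


def f(x, y):
    # Hypothetical function; any vectorized operation on two Series would do.
    return x + y


df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]})

# shift(1) moves each value one row down, so row i sees the values from row i-1.
# The first row has no predecessor and therefore becomes NaN.
df['new_col'] = f(df['x'].shift(1), df['y'].shift(1))
```

If `f` cannot operate on whole Series, the shifted columns could instead be combined row by row, at some cost in speed.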

How to flip a column of ratios, convert into a fraction and convert to a float

*爱你&永不变心* submitted on 2021-02-05 08:05:52
Question: I have the following data frame:

             Date Ratio
    0  2000-06-21   4:1
    1  2000-06-22   3:2
    2  2000-06-23   5:7
    3  2000-06-24   7:1

For each item in the Ratio column, I want to reverse the ratio, convert it into a fraction, and convert that to a float. That is, 4:1 would become 1:4, then the : would be replaced with a /, and finally it would become 0.25. 3:2 would become 2/3, which converts to 0.66666666666. So far I only have the following code: df['Ratio'] = df['Ratio'].str.split(":") Answer 1: Create new DataFrame with

How to flip a column of ratios, convert into a fraction and convert to a float

痴心易碎 submitted on 2021-02-05 08:04:46
Question: I have the following data frame:

             Date Ratio
    0  2000-06-21   4:1
    1  2000-06-22   3:2
    2  2000-06-23   5:7
    3  2000-06-24   7:1

For each item in the Ratio column, I want to reverse the ratio, convert it into a fraction, and convert that to a float. That is, 4:1 would become 1:4, then the : would be replaced with a /, and finally it would become 0.25. 3:2 would become 2/3, which converts to 0.66666666666. So far I only have the following code: df['Ratio'] = df['Ratio'].str.split(":") Answer 1: Create new DataFrame with

Pandas: How to create a column based on values of another column?

ぐ巨炮叔叔 submitted on 2021-02-05 08:04:18
Question: I need to create a new column at the end of a data frame, where the values in that new column are the result of applying some function whose parameters are based on other columns. Specifically, they come from another column, but a different row. So for example, if my data frame had two columns, containing values x_i, y_i respectively, my third column would be f(x_(i-1), y_(i-1)). I know that the easiest way to create a new column would be to do something like df['new_row'] = ... But I'm not

pandas: multiply column depending on other column

别来无恙 submitted on 2021-02-05 07:59:05
Question: I have a dataframe with columns a and b. I want to multiply column a by value x if b is true and by value y if b is false. What is the best way to achieve this? Answer 1: You could do it in 2 steps:

    df.loc[df.b, 'a'] *= x
    df.loc[df.b == False, 'a'] *= y

Or in 1 step using where:

    In [366]: df = pd.DataFrame({'a':randn(5), 'b':[True, True, False, True, False]})
              df
    Out[366]:
              a      b
    0  0.619641   True
    1 -2.080053   True
    2  0.379665  False
    3  0.134897   True
    4  1.580838  False

    In [367]: df.a *= np.where(df.b, 5
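The one-step approach from the answer can be sketched as a self-contained script. The answer's own call is truncated after `np.where(df.b, 5`, so the second multiplier is unknown; the values `x = 5` and `y = 10` and the fixed sample data below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Assumed multipliers; only the first value (5) appears in the truncated answer.
x, y = 5, 10

df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [True, True, False, True, False],
})

# np.where builds an array holding x where b is True and y elsewhere,
# so a single multiplication covers both cases.
df['a'] *= np.where(df['b'], x, y)
```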