dataframe | 易学教程

Python function returns only the first value instead of a dataframe

Question: I have built a function that appends the returns of 5 portfolios to a dataframe, which I want to return to a variable. When I run the commands inside the function line by line (a kind of debugging), the variable 'folioReturn' (the one I want my script to return) ends up with the right number of values (e.g. 5). But if I call the function, only the first value of the dataframe is returned. Does anyone know how I can get the whole dataframe? def portfolioReturns (securities,
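The excerpt is cut off before the function body, so the actual cause cannot be read from it. One common way to get this symptom is a `return` statement indented inside the loop, which exits on the first iteration. The sketch below illustrates that possibility only; the function names, the `sample` data and the column names are hypothetical and not taken from the question.

```python
import pandas as pd

def portfolio_returns_buggy(returns_by_portfolio):
    """Hypothetical reproduction: `return` sits inside the loop."""
    rows = []
    for name, value in returns_by_portfolio.items():
        rows.append({"portfolio": name, "return": value})
        return pd.DataFrame(rows)  # exits after the first portfolio

def portfolio_returns_fixed(returns_by_portfolio):
    """Same loop, but the dataframe is built once the loop has finished."""
    rows = []
    for name, value in returns_by_portfolio.items():
        rows.append({"portfolio": name, "return": value})
    return pd.DataFrame(rows)

# Made-up returns for five portfolios.
sample = {"p1": 0.01, "p2": 0.02, "p3": -0.01, "p4": 0.03, "p5": 0.00}
buggy = portfolio_returns_buggy(sample)
fixed = portfolio_returns_fixed(sample)
print(len(buggy), len(fixed))
```

If the real function already returns after the loop, the cause lies elsewhere and this sketch does not apply.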

Unable to write PySpark Dataframe created from two zipped dataframes

Question: I am trying to follow the example given here for combining two dataframes without a shared join key (combining by "index" in a database table or pandas dataframe, except that PySpark does not have that concept). My code:

    left_df = left_df.repartition(right_df.rdd.getNumPartitions())  # FWIW, num of partitions = 303
    joined_schema = StructType(left_df.schema.fields + right_df.schema.fields)
    interim_rdd = left_df.rdd.zip(right_df.rdd).map(lambda x: x[0] + x[1])
    full_data = spark.createDataFrame

How get all matches using str.contains in python regex?

Question: I have a data frame in which I need to find all rows that match any of the given terms. My code is:

    texts = ['foo abc', 'foobar xyz', 'xyz baz32', 'baz 45','fooz','bazzar','foo baz']
    terms = ['foo','baz','foo baz']
    # create df
    df = pd.DataFrame({'Match_text': texts})
    # create pattern
    pat = r'\b(?:{})\b'.format('|'.join(terms))
    # use str.contains to find matches
    df = df[df['Match_text'].str.contains(pat)]
    # create pattern
    p = re.compile(pat)
    # search for pattern in the column
    results =
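The excerpt stops before the search step, so the intended result is not known. As a sketch of one way to see both whether a row matches and what it matched, `Series.str.contains` can be paired with `Series.str.findall`. Sorting the terms longest-first is an assumption made here so that 'foo baz' is tried before 'foo' in the alternation; the column names `has_match` and `matches` are hypothetical.

```python
import re
import pandas as pd

texts = ['foo abc', 'foobar xyz', 'xyz baz32', 'baz 45', 'fooz', 'bazzar', 'foo baz']
terms = ['foo', 'baz', 'foo baz']

# Longest terms first, so the multi-word term wins over its own prefix.
ordered = sorted(terms, key=len, reverse=True)
pat = r'\b(?:{})\b'.format('|'.join(re.escape(t) for t in ordered))

df = pd.DataFrame({'Match_text': texts})
df['has_match'] = df['Match_text'].str.contains(pat)   # True/False per row
df['matches'] = df['Match_text'].str.findall(pat)      # list of matched terms per row
print(df)
```

With the original term order, the pattern would match 'foo' and 'baz' separately in 'foo baz', because a regex alternation takes the first alternative that succeeds.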

How to get the mode of a column in pandas where there are few of the same mode values pandas

Question: I have a data frame and I'd like to get the mode of a specific column. I'm using: freq_mode = df.mode()['my_col'][0] However, I get the error: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index my_col') I'm guessing it's because several values are tied for the mode. I need any one of the modes; it doesn't matter which. How can I use any() to get any of the existing modes? Answer 1: Your code works fine for me with sample data. If
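As a minimal sketch of picking one mode when several values tie, `Series.mode()` can be called on the column itself and the first entry taken with `.iloc[0]`. The sample data is made up, and this does not reproduce or explain the ValueError quoted in the question.

```python
import pandas as pd

# Made-up column in which 1 and 3 are tied for the mode.
df = pd.DataFrame({'my_col': [3, 1, 1, 3, 2]})

modes = df['my_col'].mode()   # Series holding every tied mode, sorted
freq_mode = modes.iloc[0]     # take the first one; any of them would do
print(modes.tolist(), freq_mode)
```

Calling `mode()` on the single column avoids computing modes for every column of the data frame.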

How to flip a column of ratios, convert into a fraction and convert to a float

Question: I have the following data frame:

             Date Ratio
    0  2000-06-21   4:1
    1  2000-06-22   3:2
    2  2000-06-23   5:7
    3  2000-06-24   7:1

For each item in the Ratio column, I want to reverse the ratio, convert it into a fraction, and convert that to a float. For example, 4:1 would become 1:4, then the : would be replaced with a /, and finally it would become 0.25. Likewise, 3:2 would become 2/3, which is converted to 0.66666666666. So far I only have the following code: df['Ratio'] = df['Ratio'].str.split(":") Answer 1: Create new DataFrame with
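The answer is cut off, so the following is only a sketch of one way to do the conversion, not necessarily the answerer's method: split the ratio into two numeric columns with `expand=True`, then divide the second part by the first. The output column name `Flipped` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2000-06-21', '2000-06-22', '2000-06-23', '2000-06-24'],
    'Ratio': ['4:1', '3:2', '5:7', '7:1'],
})

# Split "a:b" into two columns (labelled 0 and 1) and make them numeric.
parts = df['Ratio'].str.split(':', expand=True).astype(float)

# Flipping a:b to b/a is just the second part divided by the first.
df['Flipped'] = parts[1] / parts[0]
print(df)
```

Dividing directly skips the intermediate "1/4" string, since the float value is the same either way.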

Pandas: How to create a column based on values of another column?

Question: I need to create a new column at the end of a data frame, where the values in that new column are the result of applying some function whose parameters are based on other columns. Specifically, they come from another column, but a different row. So for example, if my data frame had two columns, containing values x_i, y_i respectively, my third column would be f(x_(i-1), y_(i-1)). I know that to create a new column, the easiest way would be to do something like df['new_row'] = ... But I'm not
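One way to reach the previous row's values, sketched here under assumptions, is `DataFrame.shift(1)`, which moves every value down one row so that row i sees x_(i-1) and y_(i-1). The function `f` below is a hypothetical stand-in for the question's unspecified function, and the name `new_col` is also made up.

```python
import pandas as pd

def f(x_prev, y_prev):
    # Hypothetical stand-in for the question's function f.
    return x_prev * y_prev

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]})

# Shift both inputs down by one row; the first row has no predecessor.
prev = df[['x', 'y']].shift(1)
df['new_col'] = f(prev['x'], prev['y'])
print(df)
```

The first row of `new_col` is NaN because there is no row before it. This works as written only when `f` accepts whole columns; a function that handles one pair of scalars at a time would need a row-wise call instead.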

pandas: multiply column depending on other column

Question: I have a dataframe with columns a and b. I want to multiply column a by value x if b is true and by value y if b is false. What is the best way to achieve this?

Answer 1: You could do it in 2 steps:

    df.loc[df.b, 'a'] *= x
    df.loc[df.b == False, 'a'] *= y

Or in 1 step using where:

    In [366]: df = pd.DataFrame({'a':randn(5), 'b':[True, True, False, True, False]})
              df
    Out[366]:
              a      b
    0  0.619641   True
    1 -2.080053   True
    2  0.379665  False
    3  0.134897   True
    4  1.580838  False

    In [367]: df.a *= np.where(df.b, 5