data-science

Python: Unstacked DataFrame is too big, causing int32 overflow

坚强是说给别人听的谎言 submitted on 2021-02-19 09:44:51
Question: I have a big dataset, and when I try to run this code I get an error:

    user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()

Here is the error:

    ValueError: Unstacked DataFrame is too big, causing int32 overflow

I have run it on another machine and it worked fine! How can I fix this error?

Answer 1: As it turns out, this was not an issue on pandas 0.21. I am using a Jupyter notebook and I need the latest version of pandas for the rest of the code. So I did this:
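The error message points at the cause: a dense `unstack()` allocates one cell for every (user, movie) combination, and when that count exceeds the int32 maximum (2,147,483,647) pandas refuses to build it. The sketch below, under stated assumptions, first counts the cells a dense unstack would need and, as an alternative, builds a SciPy sparse matrix that stores only the rated pairs. It assumes SciPy is installed; the helper names `unstack_cell_count`, `would_overflow_int32` and `user_movie_sparse` are hypothetical, and unrated pairs come back as implicit zeros rather than the NaN a dense unstack would give.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def unstack_cell_count(user_items):
    """Number of cells a dense user-by-movie unstack would need (hypothetical helper)."""
    return user_items['user_id'].nunique() * user_items['movie_id'].nunique()


def would_overflow_int32(user_items):
    # Compare the dense size against the largest int32 value (2**31 - 1).
    return unstack_cell_count(user_items) > np.iinfo(np.int32).max


def user_movie_sparse(user_items):
    """Sparse alternative to groupby(...).max().unstack() (hypothetical helper)."""
    # Same aggregation as the question: max rating per (user, movie) pair.
    agg = (user_items.groupby(['user_id', 'movie_id'])['rating']
           .max()
           .reset_index())
    # Map the ids to contiguous row / column positions.
    users = pd.Categorical(agg['user_id'])
    movies = pd.Categorical(agg['movie_id'])
    matrix = csr_matrix(
        (agg['rating'].to_numpy(), (users.codes, movies.codes)),
        shape=(len(users.categories), len(movies.categories)),
    )
    # Unrated pairs are implicit zeros here, not NaN as in the dense unstack.
    return matrix, users.categories, movies.categories
```

Row `i` of the matrix corresponds to `users[i]` and column `j` to `movies[j]`, so the returned categories play the role of the dense DataFrame's index and columns.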

How to get a continuous rolling mean in pandas?

一曲冷凌霜 submitted on 2021-02-19 02:46:39
Question: I'm looking to get a continuous rolling mean of a DataFrame. df looks like this:

    index  price
    0      4
    1      6
    2      10
    3      12

The goal is a continuous rolling mean of price, i.e. a moving mean of all the prices so far, so that it looks like this:

    index  price  mean
    0      4      4
    1      6      5
    2      10     6.67
    3      12     8

Thank you in advance!

Answer 1: You can use expanding:

    df['mean'] = df.price.expanding().mean()

which gives:

    index  price  mean
    0      4      4.000000
    1      6      5.000000
    2      10     6.666667
    3      12     8.000000

Answer 2: Welcome to SO: Hopefully people will soon remember you from
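To make Answer 1 concrete: an expanding mean at row n is the sum of the first n prices divided by n. The sketch below rebuilds the question's frame and computes the column both with `expanding().mean()` and with an explicit running sum over a running count; the second form is only an illustration and assumes there are no missing prices (with NaN values the two would count rows differently).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [4, 6, 10, 12]})

# Expanding window: each row averages every price up to and including itself.
df['mean'] = df['price'].expanding().mean()

# The same numbers written out: running sum divided by running row count.
manual = df['price'].cumsum() / np.arange(1, len(df) + 1)

print(df)
```

`expanding()` also accepts `min_periods` if the first few rows should stay NaN until enough prices have accumulated.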

How to create a historical timeline with Python

房东的猫 submitted on 2021-02-18 18:20:26
Question: So I've seen a few answers on here that helped a bit, but my dataset is larger than the ones in previously answered questions. To give a sense of what I'm working with, here's a link to the full dataset. I've included a picture of one attempted solution, which was found at this link: . The issues are that (1) this is difficult to read and (2) I don't know how to flatten it out so that it looks like a traditional timeline. The issue becomes more apparent when I try to work with larger segments,
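One common way to "flatten" a timeline is to give every event its own row and draw a horizontal bar from its start to its end, sorted chronologically. The sketch below does that with matplotlib's `Axes.hlines`; since the question's dataset link did not survive, the `(label, start_year, end_year)` tuples are a hypothetical input format and the example events are made up, not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt


def plot_timeline(events):
    """Draw one horizontal bar per (label, start_year, end_year) tuple (hypothetical format)."""
    # Sort chronologically so the chart reads from top to bottom.
    events = sorted(events, key=lambda e: e[1])
    fig, ax = plt.subplots(figsize=(10, 0.4 * len(events) + 1))
    for row, (label, start, end) in enumerate(events):
        ax.hlines(row, start, end, linewidth=8)
    ax.set_yticks(range(len(events)))
    ax.set_yticklabels([label for label, _, _ in events])
    ax.invert_yaxis()  # earliest event at the top
    ax.set_xlabel("Year")
    ax.grid(axis="x", linestyle=":")
    fig.tight_layout()
    return fig, ax


# Hypothetical example data, not the dataset from the question.
events = [
    ("Event B", 1850, 1900),
    ("Event A", 1800, 1830),
    ("Event C", 1890, 1950),
]
fig, ax = plot_timeline(events)
```

Because the figure height grows with the number of events, larger segments stay readable; `fig.savefig("timeline.png")` writes the result to disk.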

Random_state's contribution to accuracy

江枫思渺然 submitted on 2021-02-17 06:30:51
Question: Okay, this is interesting. I executed the same code a couple of times and each time I got a different accuracy_score. I figured that I was not passing any random_state value when splitting with train_test_split, so I used random_state=0 and got a consistent accuracy_score of 82%. But then I tried a different random_state: with random_state=128, accuracy_score becomes 84%. Now I need to understand why that is and how random_state affects the accuracy of the model.
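`random_state` in `train_test_split` only decides which rows land in the training and test sets. A different seed means a different test set, so the score moves even though the model is unchanged; the 82% vs 84% gap most likely reflects split-to-split variance rather than one seed producing a better model. The sketch below shows this on synthetic data from `make_classification` with a `LogisticRegression` (both chosen for illustration, since the question does not say which model or data it uses), then uses 5-fold cross-validation to get a mean and spread that does not hinge on one split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the question's dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)


def split_accuracy(seed):
    """Accuracy on the held-out 25% for one particular train/test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model.score(X_te, y_te)


# Same seed -> same split -> same score; different seeds -> different scores.
scores_by_seed = {seed: split_accuracy(seed) for seed in (0, 7, 128)}

# Cross-validation averages over 5 different splits instead of trusting one.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores_by_seed)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Reporting the cross-validated mean and standard deviation is usually more informative than picking whichever seed gave the highest single score.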

Plotly / How to change the default color palette in Plotly?

孤人 submitted on 2021-02-16 14:46:15
Question: I was able to force the default theme using

    import plotly.io as pio
    pio.templates.default = 'plotly_white'

But I am struggling to set a default color palette. Any ideas how to change this? Thanks

Answer 1: You can add new items to pio.templates:

    import plotly.io as pio
    import plotly.graph_objects as go

    pio.templates["myname"] = go.layout.Template(
        layout=go.Layout(
            colorway=['#ff0000', '#00ff00', '#0000ff']
        )
    )
    pio.templates.default = 'myname'

See more here: https://plotly.com/python/templates/

How could I solve this error to scrape Twitter with Python?

a 夏天 submitted on 2021-02-11 14:18:33
Question: I'm trying to do a personal project for my portfolio. I would like to scrape tweets about President Macron, but I get this error with twitterscraper:

    from twitterscraper import query_tweets
    import datetime as dt
    import pandas as pd

    begin_date=dt.date(2020,11,18)
    end_date=dt.date(2020,11,19)
    limit=1000
    lang='English'
    tweets=query_tweets("#macron",begindate=begin_date,enddate=end_date,limit=limit,lang=lang)

Error:

    TypeError: query_tweets() got an unexpected keyword argument 'begindate'
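"Unexpected keyword argument" means the `query_tweets` that actually got imported does not declare a `begindate` parameter, which suggests (an assumption, not confirmed by the question) that the installed twitterscraper version differs from the one the example was written for. A quick way to check is to inspect the function's real signature. The sketch below is a small stdlib helper built on `inspect.signature`; `unsupported_kwargs` and the stand-in `old_query_tweets` are hypothetical names used only for the demonstration.

```python
import inspect


def unsupported_kwargs(func, **kwargs):
    """Return the keyword names that func's signature would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    accepted = {
        name for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return [name for name in kwargs if name not in accepted]


# Hypothetical stand-in for a function that only takes query and limit.
def old_query_tweets(query, limit=None):
    return []


print(unsupported_kwargs(old_query_tweets, begindate=1, enddate=2, limit=10, lang="en"))
```

Against the real library, `print(inspect.signature(query_tweets))` after the import shows which keywords the installed version accepts; if `begindate` is missing there, upgrading or pinning the package version is the thing to investigate.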

Polynomial Regression values generated too far from the coordinates

主宰稳场 submitted on 2021-02-10 06:38:29
Question: Using the polynomial regression coefficients from the code below, when I calculate the regression value at any x point, the value obtained is far away from the corresponding y coordinate (especially for the coordinates below). Can anyone explain why the difference is so large, whether it can be minimized, or whether there is a flaw in my understanding? The current requirement is a difference of no more than 150 at every point.

    import numpy as np
    x=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
    y=[0,885
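The question's data is cut off, so the exact cause cannot be confirmed, but two pitfalls commonly produce this symptom: evaluating `np.polyfit` coefficients in the wrong order (it returns the highest power first, the order `np.polyval` expects), and plugging rounded or printed coefficients back in by hand, where a tiny error in the x^n term is multiplied by up to 100^n. The sketch below avoids both by evaluating the fitted polynomial directly and reporting the largest absolute residual, so it can be checked against the 150 limit. The helper names and the quadratic test data are hypothetical, not the question's values.

```python
import numpy as np
from numpy.polynomial import Polynomial


def max_residual(x, y, deg):
    """Fit a degree-`deg` polynomial and return the largest |y - fit(x)|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Polynomial.fit maps x onto [-1, 1] internally, which keeps high-degree
    # fits better conditioned than raw powers of x up to 100.
    p = Polynomial.fit(x, y, deg)
    return float(np.max(np.abs(y - p(x))))


def max_residual_polyfit(x, y, deg):
    """Same check with the classic API; coefficients come highest power first."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, deg)
    # np.polyval expects exactly the order np.polyfit returns.
    return float(np.max(np.abs(y - np.polyval(coeffs, x))))


# Hypothetical data: an exact quadratic on the question's x grid.
x = list(range(0, 101, 5))
y = [2 * xi**2 - 3 * xi + 1 for xi in x]

for deg in (1, 2):
    err = max_residual(x, y, deg)
    print(f"degree {deg}: max |residual| = {err:.3g}, within 150: {err <= 150}")
```

If the residuals from the fitted object are within tolerance but hand-computed values are not, the evaluation step rather than the fit is the likely culprit; if the fitted residuals themselves exceed 150, a higher degree or a different model is needed.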