featuretools

Are `normalize_entity()` and `add_relationships()` logically the same in featuretools?

拜拜、爱过 submitted on 2020-12-15 05:32:41
Question: Example:

    import pandas as pd
    import featuretools as ft

    buy_log_df = pd.DataFrame(
        [
            ["2020-01-01", 0, 1, 2, 2, 200],
            ["2020-01-02", 1, 1, 1, 3, 100],
            ["2020-01-02", 2, 2, 1, 1, 100],
            ["2020-01-03", 3, 3, 3, 1, 300],
        ],
        columns=['date', 'sale_id', 'customer_id', "item_id", "quantity", "price"]
    )
    es = ft.EntitySet(id="sale_set")
    es = es.entity_from_dataframe(
        "sales", dataframe=buy_log_df, index="sale_id", time_index='date'
    )
    es = es.normalize_entity(
        new_entity_id="items",
        base_entity_id="sales",
        index="item_id",
        additional_variables=[
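
The excerpt stops mid-call, so the rest of the question is not available here. As a rough illustration of what normalizing on `item_id` means conceptually, the following pandas-only sketch derives a parent table with one row per distinct `item_id` from the `sales` data above. It does not call featuretools, and the name `items_df` is an assumption made for this example.

```python
import pandas as pd

buy_log_df = pd.DataFrame(
    [
        ["2020-01-01", 0, 1, 2, 2, 200],
        ["2020-01-02", 1, 1, 1, 3, 100],
        ["2020-01-02", 2, 2, 1, 1, 100],
        ["2020-01-03", 3, 3, 3, 1, 300],
    ],
    columns=["date", "sale_id", "customer_id", "item_id", "quantity", "price"],
)

# Normalizing on item_id: one row per distinct item_id becomes the parent table.
# items_df is a hypothetical name used only in this sketch.
items_df = (
    buy_log_df[["item_id"]]
    .drop_duplicates()
    .sort_values("item_id")
    .reset_index(drop=True)
)

# The sales table keeps item_id as a foreign key, so every sale row
# points at exactly one row of the derived parent table.
assert buy_log_df["item_id"].isin(items_df["item_id"]).all()
print(items_df)
```

Seen this way, normalizing both creates the new table and links it to the original one, whereas adding a relationship only links two tables that already exist.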

I got stuck trying to fetch the previous value based on a criterion

北战南征 submitted on 2020-01-15 10:33:19
Question: I'm new to the Featuretools library, and I got stuck trying to create two types of features, both related to fetching previous values. One is the previous value itself for 'QUANTIDADE', 'VALOR_TOTAL' and 'DATA_NOTA'; the other is the time since the previous observation (in days), using 'DATA_NOTA' as the date field. I don't know whether this is possible with Featuretools. If someone can help me, I would appreciate it. I have a dataframe (df) as follows: When I normalize the above df
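
The dataframe itself is not shown in the excerpt, so the following is only a pandas sketch of the two requested features, not a Featuretools solution. The grouping column `CLIENTE` and all data values are assumptions made for illustration; only 'QUANTIDADE', 'VALOR_TOTAL' and 'DATA_NOTA' come from the question.

```python
import pandas as pd

# Hypothetical data: CLIENTE is an assumed grouping column (the "criterion");
# the other column names come from the question.
df = pd.DataFrame(
    {
        "CLIENTE": ["A", "A", "B", "A", "B"],
        "DATA_NOTA": pd.to_datetime(
            ["2019-01-01", "2019-01-05", "2019-01-02", "2019-01-10", "2019-01-04"]
        ),
        "QUANTIDADE": [10, 20, 5, 30, 15],
        "VALOR_TOTAL": [100.0, 250.0, 40.0, 330.0, 120.0],
    }
)

# Sort so that "previous" means the earlier row in time within each group.
df = df.sort_values(["CLIENTE", "DATA_NOTA"]).reset_index(drop=True)

# Previous value of each column within the same group (missing for the first row).
for col in ["QUANTIDADE", "VALOR_TOTAL", "DATA_NOTA"]:
    df[f"PREV_{col}"] = df.groupby("CLIENTE")[col].shift(1)

# Days elapsed since the previous observation in the same group.
df["DAYS_SINCE_PREV"] = (df["DATA_NOTA"] - df["PREV_DATA_NOTA"]).dt.days

print(df)
```

The first row of each group has no predecessor, so its previous-value and days-since columns are left missing rather than filled with a guess.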

Featuretools create index from multiple columns

杀马特。学长 韩版系。学妹 submitted on 2019-12-11 12:23:48
Question: I am trying to create an entity from a dataframe using the entity_from_dataframe function in featuretools. Is there a way to define the index if it comprises more than one column? I'm unsure whether I need a list, a tuple, or some other data structure. This is the code:

    es = es.entity_from_dataframe(entity_id="credit", dataframe=credit_df, index=["ID1", "ID2"])

It generates the following error regarding hashability:

    TypeError: unhashable type: 'list'

Answer 1: You can only have a single variable be your
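
The answer is cut off, but it points toward a single-variable index. One common workaround, sketched here in plain pandas, is to combine the two key columns into one column before creating the entity. The column names `credit_id` and `amount` and the sample rows are assumptions made for this example; only `ID1`, `ID2` and `credit_df` appear in the question.

```python
import pandas as pd

# Hypothetical stand-in for credit_df; only ID1 and ID2 are named in the question.
credit_df = pd.DataFrame(
    {"ID1": [1, 1, 2], "ID2": ["a", "b", "a"], "amount": [10.0, 20.0, 30.0]}
)

# Combine the two key columns into one string column that can serve as a single index.
# "credit_id" is an assumed name for the combined key.
credit_df["credit_id"] = (
    credit_df["ID1"].astype(str) + "_" + credit_df["ID2"].astype(str)
)

# A usable index must identify each row uniquely.
assert credit_df["credit_id"].is_unique
print(credit_df)
```

The combined column name is a plain string, so it could then be passed as `index="credit_id"` in place of the list that triggered the error.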

Should we exclude target variable from DFS in featuretools?

 ̄綄美尐妖づ submitted on 2019-12-10 19:07:40
Question: When passing dataframes as entities in an entityset and running DFS on it, are we supposed to exclude the target variable from DFS? I have a model that scored 0.76 ROC AUC after traditional, manual feature selection, and I used Featuretools to see whether it would improve the score. I ran DFS on an entityset that included the target variable as well. Surprisingly, the ROC AUC went up to 0.996 and accuracy to 0.9997, so I am doubtful of the scores, as I passed the target variable as
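
Scores that jump this sharply are a typical symptom of label leakage: if the target column is available during feature generation, derived features can encode it. A minimal pandas sketch of separating the target before any feature generation follows. The column names `id`, `f1`, `f2` and `target` and the sample values are assumptions made for this example.

```python
import pandas as pd

# Hypothetical data; "target" is an assumed name for the label column.
df = pd.DataFrame(
    {
        "id": [0, 1, 2, 3],
        "f1": [0.5, 1.5, 2.5, 3.5],
        "f2": [10, 20, 30, 40],
        "target": [0, 1, 0, 1],
    }
)

# Set the label aside, keyed by the row id, before any features are generated.
y = df.set_index("id")["target"]

# Only this frame, without the label, should be handed to feature generation.
X_raw = df.drop(columns=["target"])

assert "target" not in X_raw.columns
print(X_raw.columns.tolist())
```

After features are generated from `X_raw`, the labels in `y` can be aligned back to the feature matrix by `id` for training and evaluation.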

How to use Featuretools to create features from multiple columns in single dataframe by column values?

十年热恋 submitted on 2019-12-03 20:52:30
I'm trying to predict the results of football matches based on earlier results. I'm running Python 3.6 on Windows and using Featuretools 0.4.1. Let's say I have the following dataframe representing the history of results. Original DataFrame Using the dataframe above, I want to create the following dataframe, which will be fed to a machine learning algorithm as X. Note that goal averages for home and away teams need to be calculated per team, regardless of the venue of their past matches. Is there a way to create such a dataframe using Featuretools? Resulting Dataframe Excel file used to simulate the transformation can
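
The original and resulting dataframes are not included in the excerpt, so the following pandas sketch only illustrates the stated requirement: a goal average per team over its earlier matches, regardless of venue. It is not a Featuretools solution, and all column names and match data are assumptions made for this example.

```python
import pandas as pd

# Hypothetical match history; every column name here is an assumption.
matches = pd.DataFrame(
    {
        "match_id": [1, 2, 3, 4],
        "date": pd.to_datetime(
            ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22"]
        ),
        "home_team": ["A", "B", "A", "C"],
        "away_team": ["B", "C", "C", "A"],
        "home_goals": [2, 1, 0, 3],
        "away_goals": [1, 1, 2, 0],
    }
)

# Reshape to one row per (match, team) so the venue no longer matters.
cols = ["match_id", "date", "team", "goals"]
home = matches.rename(columns={"home_team": "team", "home_goals": "goals"})[cols]
away = matches.rename(columns={"away_team": "team", "away_goals": "goals"})[cols]
team_rows = (
    pd.concat([home, away], ignore_index=True)
    .sort_values(["team", "date"])
    .reset_index(drop=True)
)

# Average goals over the team's earlier matches only:
# shift(1) excludes the current match from its own average.
team_rows["avg_goals_before"] = team_rows.groupby("team")["goals"].transform(
    lambda s: s.shift(1).expanding().mean()
)

# Attach each side's history back to the match rows.
hist = team_rows[["match_id", "team", "avg_goals_before"]]
X = (
    matches[["match_id", "home_team", "away_team"]]
    .merge(
        hist.rename(columns={"team": "home_team", "avg_goals_before": "home_avg_goals"}),
        on=["match_id", "home_team"],
        how="left",
    )
    .merge(
        hist.rename(columns={"team": "away_team", "avg_goals_before": "away_avg_goals"}),
        on=["match_id", "away_team"],
        how="left",
    )
)
print(X)
```

Shifting before averaging keeps the result of the match being predicted out of its own features; a team's first match therefore has a missing average.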

Featuretools: Can it be applied on a single table to generate features even when there is no datetime related column?

◇◆丶佛笑我妖孽 submitted on 2019-12-03 07:08:21
The featuretools documentation states in its very first sentence: "Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning." This seems to imply that the dataset must have a datetime column. I just want to have it confirmed that this is actually so. That is, for example, can I not use it on the 'iris' dataset to generate new features? If the dataset need not have a time variable, how would I use it to generate features on the 'iris' dataset? I will be grateful for a reply. Thanks.
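
A datetime column is not needed for row-wise (transform-style) features, which combine columns within a single row. The pandas sketch below illustrates that idea on a few iris-like rows typed in by hand. It does not call featuretools, and the column names, sample values and generated feature names are assumptions made for this example.

```python
from itertools import combinations

import pandas as pd

# A few iris-like rows entered by hand for illustration; no datetime column is involved.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 6.3],
        "sepal_width": [3.5, 3.0, 3.3],
        "petal_length": [1.4, 1.4, 6.0],
    }
)

# Transform-style features: combine each pair of numeric columns row by row.
features = df.copy()
for a, b in combinations(df.columns, 2):
    features[f"{a} + {b}"] = df[a] + df[b]
    features[f"{a} * {b}"] = df[a] * df[b]

print(features.columns.tolist())
```

Each new column depends only on values in the same row, so neither a time index nor a second table is required for this kind of feature.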