How to handle frequent changes to a dataset in Azure Machine Learning Studio?

Submitted by 自闭症网瘾萝莉.ら on 2020-03-24 14:14:36

Question


How can I handle frequent changes to a dataset in Azure Machine Learning Studio? My dataset may change over time as I add more rows to it. How do I refresh the dataset that I currently use to train the model so that it picks up the newly updated data? I need to do this programmatically (in C# or Python) rather than manually in the Studio.


Answer 1:


When you register an AzureML Dataset, no data is moved; only metadata is stored, such as where the data lives and how it should be loaded. The purpose is to make accessing the data as simple as calling dataset = Dataset.get_by_name(workspace, name="my dataset")

In the snippet below, once the dataset is registered, I could technically overwrite weather/2018/11.csv with a new version. The Dataset definition would stay the same, but the new data would be picked up if you use the dataset in training after the overwrite.

from azureml.core import Dataset

# `datastore` is assumed to be an azureml.core.Datastore obtained earlier (not shown here)
# create a TabularDataset from 3 paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
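To make the "register once, re-read later" idea concrete, here is a minimal sketch using the azureml-core Python SDK. The function names, the dataset name "weather_ds", and the workspace and datastore arguments are placeholders of mine, not part of the original answer. It also assumes, as described above, that files are read when the dataset is loaded rather than when it is registered.

```python
def register_weather_dataset(workspace, datastore, name="weather_ds"):
    """Register a TabularDataset that points at the weather CSV files."""
    # Deferred import so this file can be loaded even where the SDK is absent.
    from azureml.core import Dataset

    paths = [(datastore, "weather/2018/11.csv"),
             (datastore, "weather/2018/12.csv"),
             (datastore, "weather/2019/*.csv")]
    weather_ds = Dataset.Tabular.from_delimited_files(path=paths)
    # Only the definition (paths and parsing options) is stored; no data is copied.
    return weather_ds.register(workspace=workspace, name=name)


def load_training_frame(workspace, name="weather_ds"):
    """Fetch the registered definition and read the files as they exist now."""
    from azureml.core import Dataset

    dataset = Dataset.get_by_name(workspace, name=name)
    # Files are read at this point, so a CSV overwritten since registration
    # should be reflected in the returned DataFrame.
    return dataset.to_pandas_dataframe()
```

A training script would call load_training_frame(ws) at the start of each run, where ws is an azureml.core.Workspace, for example one obtained with Workspace.from_config().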

However, there are two more recommended approaches (my team does both):

  1. Isolate your data and register a new version of the Dataset, so that you can always roll back to a previous version (see "Dataset Versioning Best Practice").
  2. Use a wildcard/glob data path to refer to a folder that has new data loaded into it on a regular basis. This way you can have a Dataset that grows over time without having to re-register it.
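The two approaches above could look roughly like this with the azureml-core SDK. Treat it as a sketch under assumptions: the helper names, the dataset names, and the folder layout (one subfolder per year under weather/) are hypothetical, and the "weather/*/*.csv" pattern assumes a wildcard can stand in for a folder level in the same way the answer's 'weather/2019/*.csv' uses one for file names.

```python
def register_snapshot(workspace, datastore, folder, name="weather_ds"):
    """Approach 1: register an isolated folder as a new Dataset version."""
    # Deferred import so this file can be loaded even where the SDK is absent.
    from azureml.core import Dataset

    ds = Dataset.Tabular.from_delimited_files(path=[(datastore, f"{folder}/*.csv")])
    # create_new_version=True keeps earlier versions retrievable for roll-back.
    return ds.register(workspace=workspace, name=name, create_new_version=True)


def get_version(workspace, name="weather_ds", version="latest"):
    """Retrieve the latest version, or pin an integer version to roll back."""
    from azureml.core import Dataset

    return Dataset.get_by_name(workspace, name=name, version=version)


def register_growing_dataset(workspace, datastore, name="weather_all"):
    """Approach 2: one registration whose glob also covers files added later."""
    from azureml.core import Dataset

    ds = Dataset.Tabular.from_delimited_files(path=[(datastore, "weather/*/*.csv")])
    return ds.register(workspace=workspace, name=name)
```

With approach 1, a scheduled job would write each data drop to its own folder and call register_snapshot; training code can then pin version=3, for example, to reproduce an earlier run. With approach 2, nothing is re-registered; new files that match the pattern are simply read the next time the dataset is loaded.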



Answer 2:


Does this work for you? https://stackoverflow.com/a/60639631/12925558

You can manipulate the dataset object.



Source: https://stackoverflow.com/questions/60652742/how-to-handle-the-frequent-changes-in-dataset-in-azure-machine-learning-studio
