What is the best way to handle large data with Tensorflow.js and tf.Tensor?

南方客 2021-01-06 05:13

Question

I am using tf.Tensor and tf.concat() to handle large training data, and I found that continuously calling tf.concat() gets slower and slower as the data grows. What is the best way to load large data into a tf.Tensor?

2 Answers
  •  清歌不尽
    2021-01-06 05:39

    Though there is not a single way of creating a tensor, the answer to the question lies in what is done with the tensors once they are created.

    Performance

    Tensors are immutable, so each time tf.concat is called, a new tensor is created.

    let x = tf.tensor1d([2]);
    console.log(tf.memory()); // "numTensors": 1
    const y = tf.tensor1d([3]);
    x = tf.concat([x, y]);
    console.log(tf.memory()); // "numTensors": 3

    As we can see from the snippet above, three tensors exist after tf.concat is called, not two: both inputs are still around alongside the new concatenated tensor. It is true that tf.tidy will dispose of unused tensors, but this cycle of creating and disposing tensors becomes more and more costly as the concatenated tensor gets bigger and bigger. This is both an issue of memory consumption and of computation, since creating a new tensor always delegates to a backend.
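
    As a minimal sketch, tf.tidy can clean up the intermediate tensors; note that tensors created outside the tidy callback still have to be disposed of manually:

    const a = tf.tensor1d([2]);
    const merged = tf.tidy(() => {
      const b = tf.tensor1d([3]); // created inside the tidy
      return tf.concat([a, b]);   // the returned tensor is kept
    });                           // b is disposed here
    a.dispose();                  // created outside the tidy, disposed manually
    console.log(tf.memory().numTensors); // 1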


    Creating a tensor from large data

    Now that the issue of performance is understood, what is the best way to proceed?

    • Create the whole array in JavaScript and, once the array is complete, create the tensor from it:
    const x = [];
    for (let i = 0; i < data.length; i++) {
      // fill the plain JavaScript array first
      x.push(data[i]);
    }
    // create the tensor only once, from the finished array
    const t = tf.tensor(x);
    

    Though it is the trivial solution, it is not always possible: building the array keeps all the data in memory, and with big data entries we can easily run out of memory. Therefore it is sometimes better, instead of creating the whole JavaScript array, to create chunks of arrays, create a tensor from each chunk, and start processing those tensors as soon as they are created, as sketched below. The chunk tensors can be merged using tf.concat again if necessary, but that is not always required.
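
    A minimal sketch of that idea, assuming hypothetical numChunks and readChunk(i) helpers, where readChunk(i) returns the i-th chunk of the data as a plain JavaScript array:

    const chunkTensors = [];
    for (let i = 0; i < numChunks; i++) {
      // each chunk becomes a tensor as soon as its array is ready
      chunkTensors.push(tf.tensor(readChunk(i)));
    }
    // merge only if a single big tensor is really needed
    const merged = tf.concat(chunkTensors);
    chunkTensors.forEach(t => t.dispose());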

    For instance, we can call model.fit() repeatedly with chunks of tensors instead of calling it once with a big tensor that might take long to create. In this case there is no need to concatenate the chunk tensors; a sketch follows.
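
    A sketch of that approach, assuming a compiled model and a hypothetical makeChunkTensors(i) helper that returns {xs, ys} tensors for the i-th chunk:

    for (let i = 0; i < numChunks; i++) {
      const {xs, ys} = makeChunkTensors(i);
      // train on this chunk, then free it before building the next one
      await model.fit(xs, ys, {epochs: 1});
      xs.dispose();
      ys.dispose();
    }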

    • If possible, create a dataset using tf.data. This is the ideal solution if we are next going to fit a model with the data:
    function makeIterator() {
      let index = 0;
      const iterator = {
        next: () => {
          if (index < data.length) {
            // yield one element of the data at a time
            const result = {value: data[index], done: false};
            index++;
            return result;
          }
          return {value: null, done: true};
        }
      };
      return iterator;
    }
    const ds = tf.data.generator(makeIterator);
    

    The advantage of using tf.data is that the tensors are created lazily, in batches, only when they are needed during the model.fitDataset() call.
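
    For example, a sketch assuming a compiled model and an iterator that yields {xs, ys} pairs:

    const batched = ds.batch(32);
    await model.fitDataset(batched, {epochs: 5});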
