Tensorflow vs Tensorflow JS different results for floating point arithmetic computations

问题

I have converted a Tensorflow model to Tensorflow JS and tried using in the browser. There are some preprocessing steps which are to be executed on the inout before feeding it to the model for inference. I have implemented these steps same as the Tensorflow. The problem is the inference results are not same on TF JS in comparison with Tensorflow. So I have started debugging the code and found that the results from the floating point arithmetic operations in the preprocessing on TF JS are different from the Tensorflow which is running on Docker container with GPU. Code used in the TF JS is below.

       var tensor3d = tf.tensor3d(image,[height,width,1],'float32')

        var pi= PI.toString();
        if(bs == 14 && pi.indexOf('1') != -1 ) {

          tensor3d =  tensor3d.sub(-9798.6993999999995).div(7104.607118190255)

        }
        else if(bs == 12 && pi.indexOf('1') != -1) {

          tensor3d = tensor3d.sub(-3384.9893000000002).div(1190.0708513300835)
        }
        else if(bs == 12 && pi.indexOf('2') != -1) {

          tensor3d =  tensor3d.sub(978.31200000000001).div(1092.2426342420442)

        }
        var resizedTensor = tensor3d.resizeNearestNeighbor([224,224]).toFloat()
        var copiedTens = tf.tile(resizedTensor,[1,1,3])
        return copiedTens.expandDims();

Python code blocks used

ds = pydicom.dcmread(input_filename, stop_before_pixels=True)
if (ds.BitsStored == 12) and '1' in ds.PhotometricInterpretation:
    normalize_mean = -3384.9893000000002
    normalize_std = 1190.0708513300835
elif (ds.BitsStored == 12) and '2' in ds.PhotometricInterpretation:
    normalize_mean = 978.31200000000001
    normalize_std = 1092.2426342420442
elif (ds.BitsStored == 14) and '1' in ds.PhotometricInterpretation:
    normalize_mean = -9798.6993999999995
    normalize_std = 7104.607118190255
else:
    error_response = "Unable to read required metadata, or metadata invalid. 
    BitsStored: {}. PhotometricInterpretation: {}".format(ds.BitsStored, 
    ds.PhotometricInterpretation)
    error_json = {'code': 500, 'message': error_response}
    self._set_headers(500)
    self.wfile.write(json.dumps(error_json).encode())
    return

    normalization = Normalization(mean=normalize_mean, std=normalize_std)
    resize = ResizeImage()
    copy_channels = CopyChannels()
    inference_data_collection.append_preprocessor([normalization, resize, 
    copy_channels])

Normalization code

    def normalize(self, normalize_numpy, mask_numpy=None):

        normalize_numpy = normalize_numpy.astype(float)

        if mask_numpy is not None:
            mask = mask_numpy > 0
        elif self.mask_zeros:
            mask = np.nonzero(normalize_numpy)
        else:
            mask = None

        if mask is None:
            normalize_numpy = (normalize_numpy - self.mean) / self.std
        else:
            raise NotImplementedError

        return normalize_numpy

ResizeImage code

   from skimage.transform import resize

   def Resize(self, data_group):

        input_data = data_group.preprocessed_case

        output_data = resize(input_data, self.output_dim)

        data_group.preprocessed_case = output_data
        self.output_data = output_data

CopyChannels code

    def CopyChannels(self, data_group):

        input_data = data_group.preprocessed_case

        if self.new_channel_dim:
            output_data = np.stack([input_data] * self.channel_multiplier, -1)
        else:
            output_data = np.tile(input_data, (1, 1, self.channel_multiplier))

        data_group.preprocessed_case = output_data
        self.output_data = output_data

Sample outoputs Left is Tensorflow on Docker with GPU and right is TF JS:

The results are actually different after every step.

回答1:

There might be a number of possibilities that can lead to the issue.

1- The ops used in python are not used in the same manner in both js and python. If that is the case, using exactly the same ops will get rid of the issue.

2- The tensors image might be read differently by the python library and the browser canvas. Actually, accross browsers the canvas pixel don't always have the same value due to some operations like anti-aliasing, etc ... as explained in this answer. So there might be some slight differences in the result of the operations. To make sure that this is the root cause of the issue, first try to print the python and the js array image and see if they are alike. It is likely that the 3d tensor is different in js and python.

tensor3d = tf.tensor3d(image,[height,width,1],'float32')

In this case, instead of reading directly the image in the browser, one can use the python library to convert image to array of tensor. And use tfjs to read directly this array instead of the image. That way, the input tensors will be the same both for in js and in python.

3 - it is a float32 precision issue. tensor3d is created with the dtype float32 and depending on the operations used, there might be a precision issue. Consider this operation:

tf.scalar(12045, 'int32').mul(tf.scalar(12045, 'int32')).print(); // 145082032 instead of 145082025

The same precision issue will be encountered in python with the following:

a = tf.constant([12045], dtype='float32') * tf.constant([12045], dtype='float32')
tf.print(a) // 145082032

In python this can be solved by using int32 dtype. However because of the webgl float32 limitation the same thing can't be done using the webgl backend on tfjs. In neural networks, this precision issue is not a great deal. To get rid of it, one can change the backend using setBackend('cpu') for instance which is much slower.

来源：https://stackoverflow.com/questions/56649680/tensorflow-vs-tensorflow-js-different-results-for-floating-point-arithmetic-comp

标签

javascript

tensorflow

floating-point

tensorflow.js

tensorflowjs