TensorFlow: every iteration of a for loop gets slower and slower

Submitted by 别来无恙 on 2021-01-27 17:37:26

Question


I'm evaluating my algorithm by comparing generated images against a ground-truth image, computing three different types of loss between the two. The logic of the code is:

  1. I loop over all ground truth images
  2. For each ground truth image, I loop over the relevant generated images and check each against the ground truth image by computing 3 losses

The running time of the code increases with every iteration, as shown below, which means it can't finish in a reasonable amount of time. What could be causing this?

Code is included below. I'm also using the Edward library on top of TensorFlow, in case that's relevant. I create my session using the following command:

sess = ed.get_session()

Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [01:36<00:00, 2.53s/it]
---------- Summary Image 001 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [01:44<00:00, 2.61s/it]
---------- Summary Image 002 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [01:57<00:00, 3.59s/it]
---------- Summary Image 003 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [02:16<00:00, 3.34s/it]
---------- Summary Image 004 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [02:25<00:00, 3.56s/it]
---------- Summary Image 005 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [02:45<00:00, 4.00s/it]
---------- Summary Image 006 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [02:54<00:00, 4.19s/it]
---------- Summary Image 007 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [03:11<00:00, 4.58s/it]
---------- Summary Image 008 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [03:26<00:00, 5.02s/it]
---------- Summary Image 009 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [03:38<00:00, 5.58s/it]
---------- Summary Image 010 ------------
Starting evaluation... 100%|█████████████████████████████████████████████| 40/40 [03:51<00:00, 5.77s/it]

for i in range(inference_batch_size):
    compare_vae_hmc_loss(model.decode_op, model.encode_op, model.discriminator_l_op,
                         x_ad[i:i+1], samples_to_check[:, i, :], config)

def compare_vae_hmc_loss(P, Q, DiscL, x_gt, samples_to_check, config):
    print ("Starting evaluation...")

    x_samples_to_check = ...

    for i, sample in enumerate(tqdm(x_samples_to_check)):

        for j in range(sample_to_vis):
            plot_save(x_samples_to_check[j], './out/{}_mcmc_sample_{}.png'.format(img_num, j + 1))

        avg_img = np.mean(x_samples_to_check, axis=0)
        plot_save(avg_img, './out/{}_mcmcMean.png'.format(img_num))

        r_loss = recon_loss(x_gt, sample)
        l_loss = l2_loss(x_gt, sample)
        lat_loss = l_latent_loss(l_th_x_gt, l_th_layer_samples[i:i+1])
        total_recon_loss += r_loss
        total_l2_loss += l_loss
        total_latent_loss += lat_loss

        if r_loss < best_recon_loss:
            best_recon_sample = sample
            best_recon_loss = r_loss

        if l_loss < best_l2_loss:
            best_l2_sample = sample
            best_l2_loss = l_loss

        if lat_loss < best_latent_loss:
            best_latent_sample = sample
            best_latent_loss = lat_loss

def l2_loss(x_gt, x_hmc):
    if jernej_Q_P:
        return tf.norm(x_gt - x_hmc).eval()
    else:
        return tf.norm(x_gt-x_hmc).eval()


def recon_loss(x_gt, x_hmc):
    if jernej_Q_P:
        return tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(logits=x_hmc, labels=x_gt), 1).eval()
    else:
        return tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(logits=x_hmc[1], labels=x_gt), 1).eval()


def l_latent_loss(l_th_x_gt, l_th_x_hmc):
    return tf.norm(l_th_x_gt - l_th_x_hmc).eval()

Answer 1:


The problem is that you're adding new ops to the graph for every sample you process: the compare_vae_hmc_loss function creates new nodes (every tf.* call adds one) each time it is executed. Your graph therefore keeps growing, taking more and more memory, and each session run has a bigger graph to handle, which is why every iteration is slower than the last.
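You can confirm this with a rough check; get_operations() and finalize() are standard TensorFlow 1.x Graph methods (nothing here comes from the original code):

import tensorflow as tf

graph = tf.get_default_graph()
print(len(graph.get_operations()))  # op count before one evaluation pass

# ... run one iteration of the evaluation loop here ...

print(len(graph.get_operations()))  # if this number grew, ops are leaking

# Once the graph is fully built, you can also freeze it so that any
# accidental op creation raises a RuntimeError instead of silently growing:
graph.finalize()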

What you need to do is define the computational graph once and then invoke it multiple times. Every call like tf.norm(x_gt - x_hmc).eval() creates a brand-new node that persists in the graph forever. Instead, build each loss node once (with placeholders for the inputs), keep the Python variable that refers to it, and evaluate that same node every time you need the value.
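A minimal sketch of that pattern, assuming flattened images of shape [1, 784]; the placeholder names, the shape, and the variables x_gt and x_samples_to_check are illustrative, loosely following the question's code, not a drop-in replacement:

import tensorflow as tf
import edward as ed

# Build the loss nodes once, outside any loop.
x_gt_ph = tf.placeholder(tf.float32, shape=[1, 784])
x_hmc_ph = tf.placeholder(tf.float32, shape=[1, 784])

l2_loss_op = tf.norm(x_gt_ph - x_hmc_ph)
recon_loss_op = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=x_hmc_ph, labels=x_gt_ph), 1)

sess = ed.get_session()

# Inside the evaluation loop, only feed data through the existing nodes;
# no new ops are created, so every iteration does the same amount of work.
for sample in x_samples_to_check:  # each sample assumed to have shape [1, 784]
    l_loss, r_loss = sess.run(
        [l2_loss_op, recon_loss_op],
        feed_dict={x_gt_ph: x_gt, x_hmc_ph: sample})

The placeholders and loss ops are created exactly once; sess.run then only pushes data through the fixed graph, so the per-iteration cost stays constant no matter how many images you evaluate.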



Source: https://stackoverflow.com/questions/47372815/tensorflow-every-iteration-of-for-loop-gets-slower-and-slower
