python - TensorFlow network diverges if reading/preprocessing is done on CPU
I have code that reads images from a tensor record, preprocesses them (cropping, applying random hue/saturation etc., through TensorFlow's own methods), and then uses shuffle_batch_join to generate batches.
    # Body of the input function; assumes `import glob` and `import tensorflow as tf`.
    with tf.variable_scope('dump_reader'):
        all_files = glob.glob(dump_file + '*')
        filename_queue = tf.train.string_input_producer(all_files, num_epochs=epochs)
        # One reader per thread; shuffle_batch_join merges them into one queue.
        example_list = [read_tensor_record(filename_queue, image_size)
                        for _ in range(read_threads)]
        return tf.train.shuffle_batch_join(example_list, batch_size=batch_size,
                                           capacity=min_queue_size + batch_size * 16,
                                           min_after_dequeue=min_queue_size)
This works well and leads to a converging network when placing all operations on the GPU. However, right now this is the bottleneck of my code, and I'd like to speed it up by placing it on the CPU, wrapping this block in with tf.device('/cpu:0'):. With this, I have much faster iterations (about 1/5th), but the network diverges after 10 iterations, leading to a loss of NaN. When visually inspecting the samples created in TensorBoard, there is no apparent difference.
Why is the convergence behavior different on CPU vs. GPU? How can I further investigate this weird behavior?
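Since the TensorBoard images look the same, one way to dig further is to compare plain numbers for a batch from each placement. The following is only a minimal sketch under assumptions: summarize_batch is a hypothetical helper (not part of TensorFlow), and the two lists are made-up stand-ins for a batch fetched from each variant, for example by flattening the array returned from a session run.

```python
import math


def summarize_batch(values):
    """Return basic statistics for a flat iterable of pixel values."""
    values = [float(v) for v in values]
    # Keep only finite entries so min/max/mean stay meaningful.
    finite = [v for v in values if math.isfinite(v)]
    return {
        "count": len(values),
        "non_finite": len(values) - len(finite),  # number of NaN/Inf entries
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
        "mean": sum(finite) / len(finite) if finite else None,
    }


# Made-up stand-ins (assumption): in the real graph these would come from one
# fetched batch per placement, flattened to a list of floats.
gpu_batch = [0.0, 0.25, 0.5, 1.0]
cpu_batch = [0.0, 0.25, float("nan"), 255.0]

gpu_stats = summarize_batch(gpu_batch)
cpu_stats = summarize_batch(cpu_batch)
print("gpu:", gpu_stats)
print("cpu:", cpu_stats)
```

If the two summaries differ in value range or in the count of non-finite entries, that narrows the problem to the input pipeline rather than the model itself.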
I was having a similar issue. In my case I had pinned the image transformation ops to the GPU, while the data was being fed from queues on the CPU. For me it was a mistake to pin those ops to the GPU, so I pinned them to the CPU with tf.device('/cpu:0'), and the numerical instability issues went away.
Notably, I had previously run these image preprocessing steps on the GPU when I was not using queues to load the data. I would feed the data directly to the GPU via a placeholder, and it ran fine.
It was only when I started using queues, and loaded the queues from independent threads, that I started seeing this issue (I was able to use queues in a sequential manner when I didn't load them from separate threads).
I don't have an exact answer for what happened yet, but it does seem that making sure the data and the image pre-processing ops are consistently placed on the same device is quite critical.
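To check that placement is actually consistent, it may help to list which ops ended up on which device. In TensorFlow 1.x, each operation in the graph exposes a name and a device attribute, and a session created with tf.ConfigProto(log_device_placement=True) logs the final assignment. The sketch below is a hypothetical helper that works on plain (name, device) pairs; the op names and device strings are illustrative assumptions, and the exact device string format can vary between versions.

```python
def ops_off_device(op_devices, prefix, expected_device):
    """Return names under `prefix` whose device differs from `expected_device`."""
    return [
        name
        for name, device in op_devices
        if name.startswith(prefix) and device != expected_device
    ]


# Illustrative pairs (assumption). In a TF 1.x graph they could be built with:
#   [(op.name, op.device) for op in tf.get_default_graph().get_operations()]
op_devices = [
    ("dump_reader/reader_read", "/device:CPU:0"),
    ("dump_reader/random_hue", "/device:GPU:0"),
    ("dump_reader/shuffle_batch_join", "/device:CPU:0"),
    ("model/conv1", "/device:GPU:0"),
]

# Everything in the input pipeline scope is expected on the CPU here.
misplaced = ops_off_device(op_devices, "dump_reader/", "/device:CPU:0")
print(misplaced)
```

Any name printed here is a preprocessing op that sits on a different device than the rest of the input pipeline, which is the situation described above.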