No response in training process #69

Open
dexter2406 opened this issue Nov 24, 2020 · 4 comments

Comments

@dexter2406

Hi, I found that the program stops responding when I start training. The output below is all that gets displayed, and no error is reported.

 np_resource = np.dtype([("resource", np.ubyte, 1)])
{'add_dispnet': True,
 'add_flownet': False,
 'add_posenet': True,
 'alpha_recon_image': 0.85,
 'batch_size': 4,
 'checkpoint_dir': 'models\\geonet_posenet\\results',
 'dataset_dir': 'data\\kitti\\formatted_data',
 'depth_test_split': 'eigen',
 'disp_smooth_weight': 0.5,
 'dispnet_encoder': 'resnet50',
...
 'output_dir': None,
 'pose_test_seq': 9,
 'rigid_warp_weight': 1.0,
 'save_ckpt_freq': 5000,
 'scale_normalize': False,
 'seq_length': 5}
2020-11-24 15:04:21.853792: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
...
2020-11-24 15:04:21.933181: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Trainable variables:
depth_net/Conv/weights:0
depth_net/Conv/BatchNorm/beta:0
depth_net/Conv_1/weights:0
depth_net/Conv_1/BatchNorm/beta:0
depth_net/Conv_2/weights:0
...
pose_net/Conv_3/BatchNorm/beta:0
pose_net/Conv_4/weights:0
pose_net/Conv_4/BatchNorm/beta:0
pose_net/Conv_5/weights:0
pose_net/Conv_5/BatchNorm/beta:0
pose_net/Conv_6/weights:0
pose_net/Conv_6/BatchNorm/beta:0
pose_net/Conv_7/weights:0
pose_net/Conv_7/biases:0
parameter_count = 60047292
@dexter2406
Author

I waited for about 20 minutes and noticed that the following files had been generated:

graph.pbtxt
events.out.tfevents.1606226671.DESKTOP-AVNMGK4

There is still no progress shown, though. Maybe that's because your code doesn't print anything during training? And what are these two files for?

Thanks for your time!

@yzcjtr
Owner

yzcjtr commented Nov 24, 2020

Hi, can you confirm the library versions you are using? Judging from the output above, training hasn't started at all; otherwise, the loss value would be printed for each iteration.

@dexter2406
Author

Thanks for the reply. I'm using (mainly):

python=3.6.12
tensorflow==1.2.0
scipy==1.1.0
numpy==1.19.4
matplotlib==3.3.3
opencv-python==4.4.0
pillow==8.0.1

I know it's stated that this code is only tested with python==2.7 and tf==1.1, but those are no longer supported, so I tried newer versions. I slightly modified the code according to the error reports, but then I ran into this hang and couldn't tell what went wrong.
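
For anyone reporting versions in a thread like this, a small sketch along the following lines can collect them in one go. The helper name `report_versions` is hypothetical and not part of this repo; it assumes each package exposes a `__version__` attribute, which is common but not guaranteed, and falls back to "unknown" otherwise.

```python
import importlib
import sys


def report_versions(module_names):
    """Return a {name: version} dict for the given importable module names.

    Modules that cannot be imported are reported as "not installed";
    modules without a __version__ attribute are reported as "unknown".
    """
    versions = {"python": sys.version.split()[0]}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Example call (import names, not pip names: opencv-python is "cv2", pillow is "PIL"):
#   for name, version in report_versions(
#           ["tensorflow", "scipy", "numpy", "matplotlib", "cv2", "PIL"]).items():
#       print("{}=={}".format(name, version))
```

The import names differ from the pip package names for some entries, so the list passed in has to use the names that appear in `import` statements.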

@yzcjtr
Owner

yzcjtr commented Nov 26, 2020

TF 1.2 should be alright, but I'm not sure whether Python 3 works with this repo. I would suggest adding some checkpoints in the code to locate where it gets stuck.
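
One possible way to follow this suggestion under Python 3, as a minimal sketch: flushed, timestamped progress markers show the last stage that was reached, and the standard-library `faulthandler` module can dump every thread's stack if the process is still running after a timeout. The helpers `checkpoint`, `watch_for_hang`, and `stop_watching`, and the stage labels in the comments, are hypothetical and not part of this repo.

```python
import faulthandler
import sys
import time

_START = time.time()


def checkpoint(label, stream=sys.stderr):
    """Write a flushed, timestamped marker and return the message written."""
    elapsed = time.time() - _START
    message = "[checkpoint +{:.1f}s] {}".format(elapsed, label)
    print(message, file=stream, flush=True)
    return message


def watch_for_hang(timeout_s=300, stream=sys.stderr):
    """Dump all thread stacks every timeout_s seconds while the process runs.

    The stream must be a real file (it needs a file descriptor).
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)


def stop_watching():
    """Cancel the pending stack dump once the suspect section has finished."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical placement inside a training script:
#   watch_for_hang(timeout_s=300)
#   checkpoint("building graph")
#   checkpoint("creating session")
#   checkpoint("running first training step")
#   stop_watching()
```

If the last marker printed is the one before the first training step, the stack dump that follows after the timeout should show which call the main thread is blocked in.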
