Clean up logging #45

Merged: frostedoyster merged 2 commits into main on Feb 5, 2024

Conversation

@frostedoyster (Collaborator) commented Feb 2, 2024

This aims to clean up the logging, according to #39.
The resulting logger can be used by several models, so it has been extracted into utils.

Before:

[2024-02-02 08:54:04,880][metatensor.models.cli.train_model][INFO] - Setting up training set
[2024-02-02 08:54:04,956][metatensor.models.utils.data.readers.readers][INFO] - Forces found in section 'energy'. Forces are taken for training!
[2024-02-02 08:54:04,976][metatensor.models.utils.data.readers.readers][WARNING] - Stress not found in section 'energy'. Continue without stress!
[2024-02-02 08:54:04,977][metatensor.models.cli.train_model][INFO] - Setting up test set
[2024-02-02 08:54:04,977][metatensor.models.cli.train_model][INFO] - Setting up validation set
[2024-02-02 08:54:04,978][metatensor.models.cli.train_model][INFO] - Setting up model
[2024-02-02 08:54:05,069][metatensor.models.cli.train_model][INFO] - Calling architecture trainer
[2024-02-02 08:54:05,069][metatensor.models.soap_bpnn.train][INFO] - Checking datasets for consistency
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
[2024-02-02 08:54:05,158][metatensor.models.soap_bpnn.train][INFO] - Calculating composition weights
[2024-02-02 08:54:05,273][metatensor.models.soap_bpnn.train][INFO] - Setting up data loaders
[2024-02-02 08:54:05,424][metatensor.models.soap_bpnn.train][INFO] - Starting training
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
/home/filippo/.local/lib/python3.8/site-packages/torch/autograd/__init__.py:251: UserWarning: second derivatives with respect to positions are not implemented and will not be accumulated during backward() calls. If you need second derivatives, please open an issue on rascaline repository. (Triggered internally at /home/filippo/rascaline/rascaline-torch/src/autograd.cpp:313.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
rascaline::math::splines -- spline reached requested accuracy (1.000e-8) with 321 reference points (max absolute error is 5.699e-8)
[2024-02-02 08:54:09,217][metatensor.models.soap_bpnn.train][INFO] - Epoch    0, train loss:  4104.6695, validation loss:   614.6490, train energy RMSE:     3.8793, train force RMSE:    28.3881, valid energy RMSE:     3.3691, valid force RMSE:    24.5621
[2024-02-02 08:54:15,572][metatensor.models.soap_bpnn.train][INFO] - Epoch    5, train loss:  2332.1070, validation loss:   328.8712, train energy RMSE:     2.9285, train force RMSE:    21.3973, valid energy RMSE:     2.0911, valid force RMSE:    18.0138
[2024-02-02 08:54:21,735][metatensor.models.soap_bpnn.train][INFO] - Epoch   10, train loss:   389.5255, validation loss:    61.7523, train energy RMSE:     2.2921, train force RMSE:     8.5236, valid energy RMSE:     1.7062, valid force RMSE:     7.6708
[2024-02-02 08:54:28,088][metatensor.models.soap_bpnn.train][INFO] - Epoch   15, train loss:   148.4480, validation loss:    28.8797, train energy RMSE:     1.7587, train force RMSE:     5.1572, valid energy RMSE:     1.6739, valid force RMSE:     5.1066
[2024-02-02 08:54:34,342][metatensor.models.soap_bpnn.train][INFO] - Epoch   20, train loss:   116.4343, validation loss:    25.2689, train energy RMSE:     2.1110, train force RMSE:     4.3394, valid energy RMSE:     2.4750, valid force RMSE:     4.3753
[2024-02-02 08:54:40,748][metatensor.models.soap_bpnn.train][INFO] - Epoch   25, train loss:   103.5423, validation loss:    16.3115, train energy RMSE:     2.2714, train force RMSE:     3.9433, valid energy RMSE:     0.9048, valid force RMSE:     3.9361

After:

[2024-02-02 08:50:28,485][metatensor.models.cli.train_model][INFO] - Setting up training set
[2024-02-02 08:50:28,560][metatensor.models.utils.data.readers.readers][INFO] - Forces found in section 'energy'. Forces are taken for training!
[2024-02-02 08:50:28,582][metatensor.models.utils.data.readers.readers][WARNING] - Stress not found in section 'energy'. Continue without stress!
[2024-02-02 08:50:28,583][metatensor.models.cli.train_model][INFO] - Setting up test set
[2024-02-02 08:50:28,583][metatensor.models.cli.train_model][INFO] - Setting up validation set
[2024-02-02 08:50:28,583][metatensor.models.cli.train_model][INFO] - Setting up model
[2024-02-02 08:50:28,673][metatensor.models.cli.train_model][INFO] - Calling architecture trainer
[2024-02-02 08:50:28,674][metatensor.models.soap_bpnn.train][INFO] - Checking datasets for consistency
[2024-02-02 08:50:28,761][metatensor.models.soap_bpnn.train][INFO] - Calculating composition weights
[2024-02-02 08:50:28,875][metatensor.models.soap_bpnn.train][INFO] - Setting up data loaders
[2024-02-02 08:50:29,044][metatensor.models.soap_bpnn.train][INFO] - Starting training
[2024-02-02 08:50:32,343][metatensor.models.utils.logging][INFO] - Epoch    0, train loss: 3930.5, validation loss: 614.71, train energy RMSE: 4.0425, train force RMSE: 27.744, valid energy RMSE: 3.1709, valid force RMSE: 24.590
[2024-02-02 08:50:39,017][metatensor.models.utils.logging][INFO] - Epoch    5, train loss: 2309.8, validation loss: 330.64, train energy RMSE: 2.7989, train force RMSE: 21.310, valid energy RMSE: 1.8786, valid force RMSE: 18.086
[2024-02-02 08:50:45,312][metatensor.models.utils.logging][INFO] - Epoch   10, train loss:  319.1, validation loss:  66.38, train energy RMSE: 2.1813, train force RMSE:  7.685, valid energy RMSE: 4.0625, valid force RMSE:  7.062
[2024-02-02 08:50:51,406][metatensor.models.utils.logging][INFO] - Epoch   15, train loss:  155.8, validation loss:  34.09, train energy RMSE: 2.8850, train force RMSE:  4.779, valid energy RMSE: 2.9216, valid force RMSE:  5.056
[2024-02-02 08:50:57,619][metatensor.models.utils.logging][INFO] - Epoch   20, train loss:  106.5, validation loss:  22.33, train energy RMSE: 2.1761, train force RMSE:  4.070, valid energy RMSE: 1.4479, valid force RMSE:  4.498
[2024-02-02 08:51:03,782][metatensor.models.utils.logging][INFO] - Epoch   25, train loss:  106.5, validation loss:  17.12, train energy RMSE: 2.3252, train force RMSE:  3.988, valid energy RMSE: 1.0178, valid force RMSE:  4.011

📚 Documentation preview 📚: https://metatensor-models--45.org.readthedocs.build/en/45/

@Luthaf (Member) commented Feb 3, 2024

[2024-02-02 08:50:28,485][metatensor.models.cli.train_model][INFO] - Setting up training set
[2024-02-02 08:50:28,560][metatensor.models.utils.data.readers.readers][INFO] - Forces found in section 'energy'. Forces are taken for training!
[2024-02-02 08:50:28,582][metatensor.models.utils.data.readers.readers][WARNING] - Stress not found in section 'energy'. Continue without stress!
[2024-02-02 08:50:28,583][metatensor.models.cli.train_model][INFO] - Setting up test set

This is still a bit overwhelming IMO. Could we do something like

[2024-02-02 08:50:28,485][metatensor.models.cli.train_model]
    [INFO] - Setting up training set
[2024-02-02 08:50:28,560][metatensor.models.utils.data.readers.readers]
    [INFO] - Forces found in section 'energy'. Forces are taken for training!
[2024-02-02 08:50:28,582][metatensor.models.utils.data.readers.readers]
    [WARNING] - Stress not found in section 'energy'. Continue without stress!
[2024-02-02 08:50:28,583][metatensor.models.cli.train_model]
    [INFO] - Setting up test set

Or remove the module path since it is not very relevant to users.

[2024-02-02 08:50:28,485][INFO] - Setting up training set
[2024-02-02 08:50:28,560][INFO] - Forces found in section 'energy'. Forces are taken for training!
[2024-02-02 08:50:28,582][WARNING] - Stress not found in section 'energy'. Continue without stress!
[2024-02-02 08:50:28,583][INFO] - Setting up test set

@frostedoyster frostedoyster requested review from PicoCentauri and removed request for PicoCentauri February 3, 2024 12:41
@PicoCentauri (Contributor):

Or remove the module path since it is not very relevant to users.

Yes, we can try. This is related to how Hydra prints the logging; we have to override its config.
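
For reference, Hydra's job logging follows Python's standard logging dictConfig schema, so the kind of override being discussed could look roughly like the sketch below. The format string and the formatter/handler names are illustrative assumptions, not the project's actual Hydra configuration.

    import logging.config

    # Hedged sketch: the same dictionary could be supplied to Hydra as a
    # job-logging override; the formatter and handler names are assumptions.
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": False,
            "formatters": {
                "simple": {"format": "[%(asctime)s][%(levelname)s] - %(message)s"}
            },
            "handlers": {
                "console": {
                    "class": "logging.StreamHandler",
                    "formatter": "simple",
                    "stream": "ext://sys.stdout",
                }
            },
            "root": {"level": "INFO", "handlers": ["console"]},
        }
    )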

@PicoCentauri (Contributor) left a comment

Good improvement to the logging.

In general, please follow our conventions for docstrings. When ChatGPT generates the code for you, you can ask it for reST docstrings in Sphinx format. 🤫

Comment on lines 28 to 33
Args:
    model_capabilities: The capabilities of the model.
    train_loss_0: The initial training loss.
    validation_loss_0: The initial validation loss.
    train_info_0: The initial training metrics.
    validation_info_0: The initial validation metrics.
Contributor:

This is not how we write docstrings. Can you adjust this please?
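
For reference, a hedged sketch of the same parameter block rewritten as a reST/Sphinx docstring; the surrounding signature is an assumption based on the argument names shown in the diff, not the PR's actual code.

    class MetricLogger:
        def __init__(self, model_capabilities, train_loss_0, validation_loss_0,
                     train_info_0, validation_info_0):
            """Initialize the metric logger.

            :param model_capabilities: The capabilities of the model.
            :param train_loss_0: The initial training loss.
            :param validation_loss_0: The initial validation loss.
            :param train_info_0: The initial training metrics.
            :param validation_info_0: The initial validation metrics.
            """
            ...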

logger.info(logging_string)


def _get_digits(value: float) -> Tuple[int, int]:
Contributor:

Is this ChatGPT code?

It looks very cumbersome.

Python's decimal module has a Decimal class that does this for you, e.g.

t = Decimal(1.532).normalize().as_tuple()

gives:

DecimalTuple(sign=0, digits=(1, 5, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 8, 4, 2, 1, 7, 0, 9, 4, 3), exponent=-26)

which I think contains all the information you need in a single line. You can even get the mantissa and the exponent if you like...

Collaborator Author:

Hmmm, I don't think I can use it to do what I want... Yes, what I do is quite cumbersome, but I think it's the best solution for now. The function essentially takes a number and decides which parts of it should be printed so that there are always at least 5 significant digits.
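
For illustration only, a minimal sketch of that idea; this is not the PR's actual _get_digits implementation, just one way to split a value into integer and decimal digits so that at least five significant digits are shown.

    import math
    from typing import Tuple


    def _get_digits_sketch(value: float, n_sig: int = 5) -> Tuple[int, int]:
        """Hypothetical helper (not the PR's implementation): return the number
        of integer and decimal digits needed for at least n_sig significant digits."""
        if value == 0.0:
            return 1, n_sig - 1
        exponent = int(math.floor(math.log10(abs(value))))
        integer_digits = max(exponent + 1, 1)          # digits before the decimal point
        decimal_digits = max(n_sig - 1 - exponent, 0)  # digits after the decimal point
        return integer_digits, decimal_digits


    print(_get_digits_sketch(3930.5))  # (4, 1) -> printed as "3930.5"
    print(_get_digits_sketch(4.0425))  # (1, 4) -> printed as "4.0425"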

Comment:

What's wrong with %12.5e or equivalent?

Collaborator Author:

I have the strong feeling that many of the people we're dealing with won't be comfortable with exponential notation.

Member:

Yes, I would go for format specifiers: f"something {some_float:12.5e}". See documentation here: https://docs.python.org/3.11/library/string.html#formatspec

@Luthaf (Member), Feb 5, 2024:

I have the strong feeling that many of the people we're dealing with won't be comfortable with exponential notation

I'm not sure about this, but we can also use g specifiers instead.

Contributor:

I agree. Maybe g is the better option than f.
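
For comparison, here is how the three specifiers render one of the loss values from the logs above; this is standard Python format-spec behaviour, shown only to illustrate the trade-off being discussed.

    value = 4104.6695
    print(f"{value:12.5e}")  # ' 4.10467e+03' -> exponential notation
    print(f"{value:12.5f}")  # '  4104.66950' -> fixed point with 5 decimals
    print(f"{value:12.5g}")  # '      4104.7' -> 5 significant digits, switching to
                             #    exponential only for very large or very small values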

logger = logging.getLogger(__name__)


class MetricLogger:
Contributor:

You also promised tests for this in your issue. At least one test checking that a log is produced should be added.

Collaborator Author:

Sure, I will add tests after we find a way to log the way we want.
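
For context, a minimal pytest sketch of the kind of check being asked for, using the built-in caplog fixture; the logger name is taken from the logs above, and the direct logging call is only a stand-in for driving MetricLogger itself.

    import logging


    def test_training_metrics_are_logged(caplog):
        caplog.set_level(logging.INFO)
        # Stand-in for calling MetricLogger: a real test would exercise the
        # actual logging utility instead of emitting a message directly.
        logging.getLogger("metatensor.models.utils.logging").info(
            "Epoch    0, train loss: 3930.5"
        )
        assert "train loss" in caplog.text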

Comment on lines +162 to +169
if epoch == 0:
    metric_logger = MetricLogger(
        model_capabilities,
        train_loss,
        validation_loss,
        finalized_train_info,
        finalized_validation_info,
    )
Contributor:

Why don't you initialize the logger before entering the training loop?

Collaborator Author:

Because it needs the values from the first epoch to format things properly.

@frostedoyster (Collaborator Author):

Personally, I would go for the following:

[2024-02-02 08:50:28][INFO] - Setting up training set
[2024-02-02 08:50:28][INFO] - Forces found in section 'energy'. Forces are taken for training!
[2024-02-02 08:50:28][WARNING] - Stress not found in section 'energy'. Continue without stress!
[2024-02-02 08:50:28][INFO] - Setting up test set

i.e., removing both the module path and the milliseconds (they shouldn't be relevant during our training exercises; they clog up the log and we can definitely do without them).
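
A hedged sketch of what that format would look like with Python's standard logging module; the handler setup and names are illustrative, not the project's final configuration.

    import logging
    import sys

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter(
            fmt="[%(asctime)s][%(levelname)s] - %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",  # no milliseconds, no module path
        )
    )
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)

    logging.getLogger(__name__).info("Setting up training set")
    # prints something like: [2024-02-02 08:50:28][INFO] - Setting up training set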

@frostedoyster frostedoyster merged commit cd9a8ab into main Feb 5, 2024
8 checks passed
@frostedoyster frostedoyster deleted the clean-up-logging branch February 5, 2024 17:04