
Add debug logging to odo. #502

Open · groutr wants to merge 3 commits into master
Conversation

@groutr (Contributor) commented Nov 16, 2016

This is something I added to an old fork of odo. I found it useful for understanding the route of transformations odo takes, and it also helped me track down an issue I was having at the time.

Is logging like this something that is desirable for odo? The log records the route that odo took when transforming from source to destination. For example:

In [1]: import odo

In [2]: import logging

In [3]: import pandas as pd

In [4]: logging.basicConfig(level=logging.DEBUG)

In [5]: a = list(range(10))

In [6]: odo.odo(a, pd.DataFrame)
DEBUG:odo.core:[list -> ndarray] list_to_numpy ({'dshape': dshape("10 * int64")})
DEBUG:odo.core:[ndarray -> DataFrame] numpy_to_dataframe ({'dshape': dshape("10 * int64")})
Out[6]: 
   0
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
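
For reference, the same route can be surfaced without enabling DEBUG output for every library; a minimal sketch that scopes the configuration to odo's own loggers (relying only on the odo.core logger name shown above):

import logging

logging.basicConfig(level=logging.WARNING)        # other libraries stay quiet
logging.getLogger('odo').setLevel(logging.DEBUG)  # but odo's route is still logged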

@llllllllll (Member) left a comment

Thanks for posting this PR!

I think this could help with debugging in some cases, but I am worried about the overhead when logging is idle. Can you post some profiling data to see how much time is spent in dropped log lines?

import logging
from contextlib import contextmanager
from warnings import warn

import networkx as nx
from datashape import discover
from .utils import expand_tuples, ignoring

log = logging.getLogger(__name__)
log.addHandler(logging.NullHandler())
llllllllll (Member): Why are you explicitly registering a handler for this?

groutr (Contributor, Author): I initially misunderstood what NullHandler was for. It's not needed here and I will remove it.
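
For context, the stdlib convention is to attach a NullHandler once, at the library's top-level package, so that library log calls do not trigger the "no handlers could be found" complaint when the application has not configured logging; a minimal sketch (the odo/__init__.py location is an assumption, not part of this PR):

# odo/__init__.py (assumed location): done once for the whole library
import logging
logging.getLogger(__name__).addHandler(logging.NullHandler())

# odo/core.py: modules just create a logger and log; handlers and levels
# are left to the application to configure.
import logging
log = logging.getLogger(__name__)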

            x = f(x, excluded_edges=excluded_edges, **kwargs)
        return x
    except NotImplementedError as e:
        if kwargs.get('raise_on_errors'):
            raise
        warn(FailedConversionWarning(A, B, e))
        log.warning('Failed Conversion {} -> {} ({})'.format(A, B, str(e)))
llllllllll (Member): Do we really need both a log.warning and a warnings.warn?

groutr (Contributor, Author) · Nov 17, 2016: I don't know if both are desired. The difference between warnings.warn and log.warning is explained at https://docs.python.org/2/howto/logging.html. In short, warnings.warn is intended for cases where the client application's code should be modified to accommodate something in the library, whereas log.warning is for runtime events the application cannot do anything about.
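
A minimal sketch of that distinction (the function and messages below are purely illustrative, not odo code):

import logging
import warnings

log = logging.getLogger(__name__)

def convert(a, b):
    # warnings.warn addresses the developer calling the library: it can be
    # filtered or escalated to an error through the warnings machinery, and
    # it signals "change your code to avoid this".
    warnings.warn('conversion {} -> {} fell back to a slower path'.format(a, b))
    # log.warning addresses whoever runs the application: it goes to the
    # configured logging handlers and is meant for diagnostics, not code changes.
    log.warning('conversion %s -> %s fell back to a slower path', a, b)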

@@ -57,12 +60,15 @@ def _transform(graph, target, source, excluded_edges=None, ooc_types=ooc_types,
               ooc_types=ooc_types)
    try:
        for (A, B, f) in pth:
            log.debug('[{src} -> {dest}] {f} ({kwargs})'.format(f=f.__name__,
llllllllll (Member): In general you should let the logging library handle string formatting; this ensures that no formatting happens unless the log message is actually handled. logging uses %-style formatting, so you can pass the values to interpolate as positional arguments after the message format.
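
Applied to the line above, the comparison looks roughly like this (A, B, f, and kwargs stand in for one edge of the path, as in the diff context; the values are made up):

import logging
log = logging.getLogger(__name__)

# Stand-ins for one (A, B, f) edge and its kwargs.
A, B, kwargs = list, 'ndarray', {'dshape': '10 * int64'}
def f(x, **kw): return x

# Eager: the message string is built even when DEBUG is disabled.
log.debug('[{src} -> {dest}] {f} ({kwargs})'.format(
    src=A, dest=B, f=f.__name__, kwargs=kwargs))

# Lazy: logging applies the %-formatting only if the record is actually handled.
log.debug('[%s -> %s] %s (%s)', A, B, f.__name__, kwargs)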

@groutr (Contributor, Author) commented Nov 17, 2016

Since the path is known before the loop, an alternative approach would be to log the path once before traversing it. As far as performance goes, though, my initial impression is that the calls to the various conversion functions would far outweigh the overhead of the log calls.
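
A rough sketch of that alternative (assuming pth is the list of (A, B, f) edges inside _transform and that A and B are types; this is illustrative, not the code in this PR):

if pth:
    # Log the whole route once, e.g. "tuple -> list -> ndarray -> DataFrame",
    # instead of issuing one debug call per edge inside the loop.
    route = ' -> '.join([A.__name__ for A, B, f in pth] + [pth[-1][1].__name__])
    log.debug('conversion path: %s', route)
for A, B, f in pth:
    x = f(x, excluded_edges=excluded_edges, **kwargs)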

@groutr (Contributor, Author) commented Nov 17, 2016

I've done some simple (very simple) benchmarking/profiling. Yes, logging does add some overhead. If we are logging below the configured level, each dropped log call still results in three function calls; getEffectiveLevel is probably the most expensive (it has a while loop that walks the current logger and all of its parents), while the other two seem relatively cheap. I don't think any of odo's paths even approach 100 hops in length.

I think my alternative solution will sidestep much of this overhead. Logging should be fine if we avoid calling it in the loop (see also the isEnabledFor sketch after the profiles below). Thoughts?

import logging
log = logging.getLogger(__name__)
loops = 100
def withlogging():
    y = 0
    for x in range(loops):
        log.debug("This is a debug message %s", x)
        y += 1

def outlogging():
    y = 0
    log.debug("This is a debug message %s", loops)
    for x in range(loops):
        y += 1

def withoutlogging():
    y = 0
    for x in range(loops):
        y += 1
(with no logging level configured)
>>> %timeit withlogging()
10000 loops, best of 3: 109 µs per loop
>>> %timeit withoutlogging()
100000 loops, best of 3: 4.52 µs per loop
>>> %timeit outlogging()
100000 loops, best of 3: 5.84 µs per loop
>>> %prun withlogging()
        304 function calls in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 withlog.py:5(wlogging)
      100    0.000    0.000    0.000    0.000 __init__.py:1515(isEnabledFor)
      100    0.000    0.000    0.000    0.000 __init__.py:1501(getEffectiveLevel)
      100    0.000    0.000    0.000    0.000 __init__.py:1257(debug)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>> %prun withoutlogging()
         4 function calls in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 withlog.py:12(wologging)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>> %prun outlogging()
        7 function calls in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 withlog.py:17(outlogging)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 __init__.py:1257(debug)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 __init__.py:1515(isEnabledFor)
        1    0.000    0.000    0.000    0.000 __init__.py:1501(getEffectiveLevel)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
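
For comparison, the stdlib also allows hoisting the level check out of the loop with an explicit isEnabledFor guard, which skips most of the per-call machinery for dropped messages; a minimal sketch (not something this PR does):

import logging
log = logging.getLogger(__name__)

def withguard(loops=100):
    enabled = log.isEnabledFor(logging.DEBUG)  # level looked up once, outside the loop
    y = 0
    for x in range(loops):
        if enabled:
            log.debug('This is a debug message %s', x)
        y += 1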

@groutr (Contributor, Author) commented Nov 18, 2016

I left the log.debug call in the for loop for now. Run-to-run variation on my system has a more visible effect than the log.debug calls themselves.

With logging, but no configuration (best of 7 runs), list -> DataFrame with 1 element:
0.0003306 s
Without logging (commented out) (best of 7 runs), tuple -> DataFrame with 1 element:
0.0003268 s

I tried to use a path that was easy to benchmark and reasonably fast (one that doesn't spend a whole lot of time in data structures). The path was 4 hops:
tuple -> list -> ndarray -> pd.DataFrame

EDIT: Updated timings
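
Roughly how a timing like this can be reproduced (illustrative only; the harness and repetition counts here are assumptions, not what produced the numbers above):

import timeit

import pandas as pd
import odo

data = (0,)  # a 1-element tuple, as in the comment above
# Best of 7 runs of a single conversion, in seconds.
best = min(timeit.repeat(lambda: odo.odo(data, pd.DataFrame), number=1, repeat=7))
print(best)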

@llllllllll (Member) commented

These changes look good. Could you rebase? After CI runs on the rebase I will merge.

@groutr (Contributor, Author) commented Dec 9, 2016

I'll rebase, hopefully this weekend.
