
Add debug logging to odo. #502

Open · groutr wants to merge 3 commits into master
Conversation

@groutr (Contributor) commented Nov 16, 2016

This is something I added to an old fork of odo. I found it useful for understanding the route of transformations odo takes, and it also helped me track down an issue I was having at the time.

Is logging like this something that is desirable for odo? The log records the route that odo took when transforming from source to destination. For example:

In [1]: import odo

In [2]: import logging

In [3]: import pandas as pd

In [4]: logging.basicConfig(level=logging.DEBUG)

In [5]: a = list(range(10))

In [6]: odo.odo(a, pd.DataFrame)
DEBUG:odo.core:[list -> ndarray] list_to_numpy ({'dshape': dshape("10 * int64")})
DEBUG:odo.core:[ndarray -> DataFrame] numpy_to_dataframe ({'dshape': dshape("10 * int64")})
Out[6]: 
   0
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
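
For reference, the same route can be surfaced without enabling DEBUG output for every library; a minimal sketch that scopes the configuration to odo's own loggers (relying only on the odo.core logger name shown above):

import logging

logging.basicConfig(level=logging.WARNING)        # other libraries stay quiet
logging.getLogger('odo').setLevel(logging.DEBUG)  # but odo's route is still logged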

@llllllllll (Member) left a comment

Thanks for posting this PR!

I think this could help with debugging in some cases, but I am worried about the overhead when logging is idle. Can you post some profiling data to see how much time is spent in dropped log lines?

import logging
from contextlib import contextmanager
from warnings import warn

import networkx as nx
from datashape import discover
from .utils import expand_tuples, ignoring

log = logging.getLogger(__name__)
log.addHandler(logging.NullHandler())
llllllllll (Member): Why are you explicitly registering a handler for this?

groutr (Contributor, Author): I initially misunderstood what NullHandler was for. It's not needed here and I will remove it.
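
For context, the stdlib convention is to attach a NullHandler once, at the library's top-level package, so that library log calls do not trigger the "no handlers could be found" complaint when the application has not configured logging; a minimal sketch (the odo/__init__.py location is an assumption, not part of this PR):

# odo/__init__.py (assumed location): done once for the whole library
import logging
logging.getLogger(__name__).addHandler(logging.NullHandler())

# odo/core.py: modules just create a logger and log; handlers and levels
# are left to the application to configure.
import logging
log = logging.getLogger(__name__)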

            x = f(x, excluded_edges=excluded_edges, **kwargs)
        return x
    except NotImplementedError as e:
        if kwargs.get('raise_on_errors'):
            raise
        warn(FailedConversionWarning(A, B, e))
        log.warning('Failed Conversion {} -> {} ({})'.format(A, B, str(e)))
llllllllll (Member): Do we really need both a log.warning and a warnings.warn?

groutr (Contributor, Author) · Nov 17, 2016: I don't know if both are desired. The difference between warnings.warn and log.warning is explained at https://docs.python.org/2/howto/logging.html. In short, warnings.warn is intended for cases where the client application's code should be modified to accommodate something in the library, whereas log.warning is for runtime events the application cannot do anything about.
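
A minimal sketch of that distinction (the function and messages below are purely illustrative, not odo code):

import logging
import warnings

log = logging.getLogger(__name__)

def convert(a, b):
    # warnings.warn addresses the developer calling the library: it can be
    # filtered or escalated to an error through the warnings machinery, and
    # it signals "change your code to avoid this".
    warnings.warn('conversion {} -> {} fell back to a slower path'.format(a, b))
    # log.warning addresses whoever runs the application: it goes to the
    # configured logging handlers and is meant for diagnostics, not code changes.
    log.warning('conversion %s -> %s fell back to a slower path', a, b)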

@@ -57,12 +60,15 @@ def _transform(graph, target, source, excluded_edges=None, ooc_types=ooc_types,
               ooc_types=ooc_types)
    try:
        for (A, B, f) in pth:
            log.debug('[{src} -> {dest}] {f} ({kwargs})'.format(f=f.__name__,
llllllllll (Member): In general you should let the logging library handle string formatting; this ensures that no formatting happens unless the log message is actually handled. logging uses %-style formatting, so you can pass the values to interpolate as positional arguments after the message format.
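
Applied to the line above, the comparison looks roughly like this (A, B, f, and kwargs stand in for one edge of the path, as in the diff context; the values are made up):

import logging
log = logging.getLogger(__name__)

# Stand-ins for one (A, B, f) edge and its kwargs.
A, B, kwargs = list, 'ndarray', {'dshape': '10 * int64'}
def f(x, **kw): return x

# Eager: the message string is built even when DEBUG is disabled.
log.debug('[{src} -> {dest}] {f} ({kwargs})'.format(
    src=A, dest=B, f=f.__name__, kwargs=kwargs))

# Lazy: logging applies the %-formatting only if the record is actually handled.
log.debug('[%s -> %s] %s (%s)', A, B, f.__name__, kwargs)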

@groutr (Contributor, Author) commented Nov 17, 2016

Since the path is known before the loop, an alternative approach would be to log the path once before traversing it. As far as performance goes, though, my initial impression is that the calls to the various conversion functions would far outweigh the overhead of the log calls.
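
A rough sketch of that alternative (assuming pth is the list of (A, B, f) edges inside _transform and that A and B are types; this is illustrative, not the code in this PR):

if pth:
    # Log the whole route once, e.g. "tuple -> list -> ndarray -> DataFrame",
    # instead of issuing one debug call per edge inside the loop.
    route = ' -> '.join([A.__name__ for A, B, f in pth] + [pth[-1][1].__name__])
    log.debug('conversion path: %s', route)
for A, B, f in pth:
    x = f(x, excluded_edges=excluded_edges, **kwargs)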

@groutr (Contributor, Author) commented Nov 17, 2016

I've done some simple (very simple) benchmarking/profiling. Yes, logging does add some overhead. If we are logging below the configured level, each dropped log call still results in three function calls; getEffectiveLevel is probably the most expensive (it has a while loop that walks the current logger and all of its parents), while the other two seem relatively cheap. I don't think any of odo's paths even approach 100 hops in length.

I think my alternative solution will sidestep much of this overhead. Logging should be fine if we avoid calling it in the loop (see also the isEnabledFor sketch after the profiles below). Thoughts?

import logging
log = logging.getLogger(__name__)
loops = 100
def withlogging():
    y = 0
    for x in range(loops):
        log.debug("This is a debug message %s", x)
        y += 1

def outlogging():
    y = 0
    log.debug("This is a debug message %s", loops)
    for x in range(loops):
        y += 1

def withoutlogging():
    y = 0
    for x in range(loops):
        y += 1
(with no logging level configured)
>>> %timeit withlogging()
10000 loops, best of 3: 109 µs per loop
>>> %timeit withoutlogging()
100000 loops, best of 3: 4.52 µs per loop
>>> %timeit outlogging()
100000 loops, best of 3: 5.84 µs per loop
>>> %prun withlogging()
        304 function calls in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 withlog.py:5(wlogging)
      100    0.000    0.000    0.000    0.000 __init__.py:1515(isEnabledFor)
      100    0.000    0.000    0.000    0.000 __init__.py:1501(getEffectiveLevel)
      100    0.000    0.000    0.000    0.000 __init__.py:1257(debug)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>> %prun withoutlogging()
         4 function calls in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 withlog.py:12(wologging)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
>>> %prun outlogging()
        7 function calls in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 withlog.py:17(outlogging)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 __init__.py:1257(debug)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 __init__.py:1515(isEnabledFor)
        1    0.000    0.000    0.000    0.000 __init__.py:1501(getEffectiveLevel)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
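
For comparison, the stdlib also allows hoisting the level check out of the loop with an explicit isEnabledFor guard, which skips most of the per-call machinery for dropped messages; a minimal sketch (not something this PR does):

import logging
log = logging.getLogger(__name__)

def withguard(loops=100):
    enabled = log.isEnabledFor(logging.DEBUG)  # level looked up once, outside the loop
    y = 0
    for x in range(loops):
        if enabled:
            log.debug('This is a debug message %s', x)
        y += 1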

@groutr (Contributor, Author) commented Nov 18, 2016

I left the log.debug call in the for loop for now. Run-to-run variation on my system has a more visible effect than the log.debug calls themselves.

With logging, but no configuration (best of 7 runs), list -> DataFrame with 1 element:
0.0003306 s
Without logging (commented out) (best of 7 runs), tuple -> DataFrame with 1 element:
0.0003268 s

I tried to use a path that was easy to benchmark and reasonably fast (one that doesn't spend a whole lot of time in data structures). The path was 4 hops:
tuple -> list -> ndarray -> pd.DataFrame

EDIT: Updated timings
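
Roughly how a timing like this can be reproduced (illustrative only; the harness and repetition counts here are assumptions, not what produced the numbers above):

import timeit

import pandas as pd
import odo

data = (0,)  # a 1-element tuple, as in the comment above
# Best of 7 runs of a single conversion, in seconds.
best = min(timeit.repeat(lambda: odo.odo(data, pd.DataFrame), number=1, repeat=7))
print(best)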

@llllllllll (Member) commented

These changes look good. Could you rebase? After CI runs on the rebase I will merge.

@groutr (Contributor, Author) commented Dec 9, 2016

I'll rebase, hopefully this weekend.
