Different .show behaviors in notebooks for version 0.1.5 and 0.1.6 #1075

Closed
shyamsn97 opened this issue Jun 21, 2023 · 3 comments

@shyamsn97

Describe the bug
Calling .show on a Daft dataframe behaves differently in 0.1.5 and 0.1.6.

To Reproduce
In a notebook, run this code w/ getdaft==0.1.5 and 0.1.6:

import daft
from daft import DataType
from PIL import Image
import requests
import numpy as np

images = [
    "https://raw.githubusercontent.com/Purukitto/pokemon-data.json/master/images/pokedex/hires/001.png",
]

df = daft.from_pydict({"images":images, "initial_array":np.zeros((1,512))})

@daft.udf(return_dtype=DataType.python())
def download_images(images):
    return [Image.open(requests.get(im, stream=True).raw).convert("RGB") for im in images.to_pylist()]

@daft.udf(return_dtype=DataType.python())
def arrays(column):
    return np.zeros((1, 512))

df = df.with_column("downloaded_images", download_images(daft.col('images'))).collect()
df = df.with_column("embeddings", arrays(daft.col("images"))).collect()

df.show(1)

In the code above, I initialize a dataframe with an image URL and a dummy array. I then add a column containing the downloaded PIL Image and another column with a dummy array for the "embeddings".

Version 0.1.5:

[Screenshot from 2023-06-21 12-11-01]

Version 0.1.6:

[Screenshot from 2023-06-21 12-10-38]

Not sure if this is on my end w/ respect to jupyter versions / other python pkg versions. Let me know if you need any additional context, thanks!

Additional context
jupyter-lab==4.0.2
python==3.9
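
To rule out an environment difference between the two runs, it may help to record the installed package versions next to each screenshot. The following is a minimal sketch using only the standard library; `report_versions` is a hypothetical helper name, and the distribution names other than `getdaft` are assumptions about what is installed in the notebook environment.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def report_versions(distributions):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # "getdaft" comes from the report above; the other names are assumptions.
    print("python", sys.version.split()[0])
    for name, ver in report_versions(["getdaft", "jupyterlab", "Pillow", "numpy"]).items():
        print(name, ver)
```

Running this once per environment (0.1.5 and 0.1.6) would show whether anything other than the Daft version changed.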

@jaychia
Contributor

jaychia commented Jun 21, 2023

Indeed! You actually should be able to get the displays working by using our Daft native image types instead.

I modified your code to take advantage of the new image types - using .url.download().image.decode(). Notice that PIL is no longer involved here and everything happens in Rust-land with expressions!

import daft
from daft import DataType
import numpy as np

images = [
    "https://raw.githubusercontent.com/Purukitto/pokemon-data.json/master/images/pokedex/hires/001.png",
]

df = daft.from_pydict({"images":images, "initial_array":np.zeros((1,512))})

@daft.udf(return_dtype=DataType.python())
def arrays(column):
    return np.zeros((1, 512))

df = df.with_column("downloaded_images", daft.col('images').url.download().image.decode()).collect()
df = df.with_column("embeddings", arrays(daft.col("images"))).collect()

df.show(1)

Regardless though, we should re-enable this display feature for PIL Python types! I'll work on re-enabling this feature so that if you have a python() type column with PIL images it will detect the object and display it correctly.
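
As a rough illustration of what "detect the object and display it correctly" could mean, here is a minimal sketch of a type-based HTML hook registry. It is not Daft's actual implementation: `register_viz_hook`, `html_value`, and `_pil_image_to_html` are hypothetical names, and the thumbnail size is an arbitrary assumption. The PIL hook is registered only if Pillow is importable.

```python
import base64
import html
import io

# Hypothetical registry mapping a Python type to a function that returns HTML.
_VIZ_HOOKS = {}


def register_viz_hook(typ, hook):
    """Register `hook(obj) -> str` as the HTML renderer for instances of `typ`."""
    _VIZ_HOOKS[typ] = hook


def html_value(obj):
    """Render one cell: use the first matching hook, else the escaped repr."""
    for typ, hook in _VIZ_HOOKS.items():
        if isinstance(obj, typ):
            return hook(obj)
    return html.escape(repr(obj))


def _pil_image_to_html(img, max_size=(128, 128)):
    """Encode a thumbnail of a PIL image as an inline base64 PNG <img> tag."""
    thumb = img.copy()          # avoid mutating the caller's image
    thumb.thumbnail(max_size)   # resizes in place, preserving aspect ratio
    buf = io.BytesIO()
    thumb.save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode("ascii")
    return f'<img src="data:image/png;base64,{b64}" alt="PIL image" />'


try:
    from PIL import Image

    register_viz_hook(Image.Image, _pil_image_to_html)
except ImportError:
    # Pillow is optional; without it, images fall back to the escaped repr.
    pass
```

With something like this in place, a python() type column would render PIL images inline and fall back to the plain repr for any other object.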

@shyamsn97
Author

awesome! Thanks for the explanation :)

jaychia added a commit that referenced this issue Jun 22, 2023
Closes: #1077 and #1075 

* Adds a fix for `Series.from_pylist(..., pyobj="force")` when the list
is a list of PIL Images (cc @clarkzinzow)
* Re-enables our HTML viz hooks in Python, for Python objects when
calling into the HTML repr for Python arrays

---------

Co-authored-by: Jay Chia
@jaychia
Contributor

jaychia commented Jun 22, 2023

This should be fixed by #1078!

jaychia closed this as completed Jun 22, 2023