Basic example in readme fails (Word2Vec download 404s) #38

Open
SebastianCallh opened this issue Oct 27, 2022 · 4 comments

Comments

@SebastianCallh

Running the example in the README

using Embeddings
const embtable = load_embeddings(Word2Vec) # or load_embeddings(FastText_Text) or ...

fails with:

ERROR: HTTP.ExceptionRequest.StatusError(404, "GET", "/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz", HTTP.Messages.Response:
"""
HTTP/1.1 404 Not Found
x-amz-request-id: 7CJ4RS3EZ3VHMSR4
x-amz-id-2: JQ2JTqHhFeLJ7JtP5pJM+AzcR3Kq8kKB4Hy5Tars31NaRlk3Xo++mRiLVYHArclGUSZQm5Ztv/o=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Thu, 27 Oct 2022 15:01:28 GMT
Server: AmazonS3

""")

Are the Word2Vec embeddings available elsewhere? If not, this should probably be addressed in the README.

@oxinabox
Member

Good question. I suspect they must be available somewhere else; they are very widely used, even though they are old now.

@logankilpatrick
Contributor

I just added the weights to Hugging Face: https://huggingface.co/LoganKilpatrick/GoogleNews-vectors-negative300/blob/main/GoogleNews-vectors-negative300.bin.gz
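One thing worth checking when using this link from code: on Hugging Face, a `/blob/` URL normally points at the web page that displays the file, while the raw file itself is served from the matching `/resolve/` path. A minimal sketch, assuming that URL layout; `hf_raw_url` is a hypothetical helper, not part of Embeddings.jl:

```julia
# Hypothetical helper (not part of Embeddings.jl): rewrite a Hugging Face
# "/blob/" page URL into the matching "/resolve/" URL, which is assumed
# to serve the raw file rather than an HTML viewer page.
function hf_raw_url(url::AbstractString)
    occursin("huggingface.co/", url) ||
        throw(ArgumentError("not a Hugging Face URL: $url"))
    # Only the first "/blob/" segment (right after the repo name) is replaced.
    return replace(url, "/blob/" => "/resolve/"; count = 1)
end

url = "https://huggingface.co/LoganKilpatrick/GoogleNews-vectors-negative300/blob/main/GoogleNews-vectors-negative300.bin.gz"
println(hf_raw_url(url))
```

Replacing only the first occurrence avoids touching a file path that happens to contain `/blob/` further along.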

@ngiann

ngiann commented Feb 6, 2023

Thanks for opening this issue, and for the replies so far. I copied the new URL and inserted it at this line:

"https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz",

Unfortunately, when I try load_embeddings(Word2Vec), I get the following error message.

7-Zip (a) [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz (A0652),ASM,AES-NI)

Scanning the drive for archives:
1 file, 36239 bytes (36 KiB)                        

Extracting archive: /home/nikos/.julia/datadeps/word2vec 300d/GoogleNews-vectors-negative300.bin.gz
ERROR: /home/nikos/.julia/datadeps/word2vec 300d/GoogleNews-vectors-negative300.bin.gz
/home/nikos/.julia/datadeps/word2vec 300d/GoogleNews-vectors-negative300.bin.gz
Open ERROR: Can not open the file as [gzip] archive


ERRORS:
Is not archive
    
Can't open as archive: 1
Files: 0
Size:       0
Compressed: 0

Downloading the file manually from the new URL works, and once downloaded, the file also opens fine with Archive Manager in Ubuntu.
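The file 7-Zip rejected is only 36239 bytes, which is far too small for the full vector archive. A quick way to tell whether a downloaded file is really gzip is to check its first two bytes, which for gzip are always `0x1f 0x8b` (RFC 1952). A minimal sketch; `looks_like_gzip` is a hypothetical helper, and the path assumes the default DataDeps location shown in the log above:

```julia
# Hypothetical helper: true if the file starts with the gzip magic bytes
# 0x1f 0x8b (RFC 1952). An HTML page saved under a .gz name fails this check.
function looks_like_gzip(path::AbstractString)
    open(path, "r") do io
        header = read(io, 2)  # reads at most 2 bytes
        return length(header) == 2 && header[1] == 0x1f && header[2] == 0x8b
    end
end

# Assumed location, based on the path in the 7-Zip output above.
path = joinpath(homedir(), ".julia", "datadeps", "word2vec 300d",
                "GoogleNews-vectors-negative300.bin.gz")
if isfile(path)
    println(filesize(path), " bytes; gzip header present: ", looks_like_gzip(path))
end
```

If the header check fails, opening the file in a text editor usually shows what was actually downloaded.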

@oxinabox
Member

Hmm, that's weird; 7-Zip is normally very reliable.
