GitHub - dylanlanigansmith/journalist: Multimodal LLM Web Agents that actually work

Journalist - Multimodal Browser Agents

VIDEOS GO HERE

make an edit with screenshots of each step + total terminal output

Overview

Foundationally, this works because the new OpenAI API 'define json for output/functions' stuff is really effective, and because gpt4o-mini is so cost-effective. This is my first time using the actual openAPI library and also Selenium, only had done traditional scraping in the past.

This is a PoC agent that hooks up gpt-4o-mini to a browser. The multimodal large-language model can play the Wikipedia Game, winning frequently.

Building/Utilization

functionally v2 is better but v1 is simple and has nice logging, easier to see how it works

ocrlib is only macOS compatible right now.

TODO

port OCR (tesseract) or do no ocr mode
better utilise config
buy more api credits :(

#Python 3.10
#for dependencies see requirements.txt, sorry in advance. create a venv if you're into that sorta thing. 
#run once to generate default config.json in current directory, stick your openAI API key in it, etc.

#usage: python /path/to/journalv?.py <url> "Prompt"
#examples:
$ python v1/src/journalist.py http://pagewithinfo.com "Find what color the info should be?"

$ python v2/src/jv2.py http://pagewithinfo.com "Find what color the info should be?"

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
v1/src		v1/src
v2/src		v2/src
.gitignore		.gitignore
.python-version		.python-version
LICENSE.txt		LICENSE.txt
README.md		README.md
prompts.txt		prompts.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Journalist - Multimodal Browser Agents

Overview

Building/Utilization

About

Releases

Packages

Languages

License

dylanlanigansmith/journalist

Folders and files

Latest commit

History

Repository files navigation

Journalist - Multimodal Browser Agents

Overview

Building/Utilization

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages