Releases: reworkd/bananalyzer
v.0.7.3
What's Changed
- Added contact schema field descriptions by @KhoomeiK in #27
- Forum test set by @Srijan-Subedi in #24
- Additional tests by @Srijan-Subedi in #28
- made eval matcher more pythonic by @KhoomeiK in #29
- Bump mypy from 1.7.0 to 1.7.1 by @dependabot in #31
- Rename (sub)domain -> (sub)category and add url domain flag by @KhoomeiK in #32
- 🍌 Hackathon start by @asim-shrestha in #33
- hackathon by @awtkns in #34
- 👀 you know whats up by @Srijan-Subedi in #36
- Autogenerate annotations for evals with tarsier+LLM by @KhoomeiK in #37
- attorney detail auto annotations by @KhoomeiK in #39
- 💯 Axis new 10 evals by @Srijan-Subedi in #38
- ✨ Attorney data, schema descriptions, auto annotate bugfixes by @asim-shrestha in #41
- Just MHTML files by @Srijan-Subedi in #43
New Contributors
- @Srijan-Subedi made their first contribution in #24
Full Changelog: v.0.6.0...v.0.7.3
v.0.6.0
What's Changed
- ✨ Separate static data from package by @asim-shrestha in #14
- ✨ Re do data by @asim-shrestha in #16
- ✨ Convert to crlf on download by @asim-shrestha in #17
- 🫡 Alpha numeric scoring by @awtkns in #19
- update example labels by @KhoomeiK in #18
- Revert "update example labels" by @asim-shrestha in #20
- ✨ Update formatting by @asim-shrestha in #21
- merge main by @KhoomeiK in #23
- fix import order for black by @KhoomeiK in #22
- 🍌 Add more schema information to fetch models by @asim-shrestha in #8
- ✨ Ignore none by @asim-shrestha in #25
- Fix none by @asim-shrestha in #26
New Contributors
Full Changelog: v.0.5.0...v.0.6.0
v.0.5.0
What's Changed
- Bump mypy from 1.6.1 to 1.7.0 by @dependabot in #2
- Bump black from 23.10.1 to 23.11.0 by @dependabot in #1
- 🍌 Update eval to print failure reason by @asim-shrestha in #3
- ✨ Update detail page examples by @asim-shrestha in #4
- ✨ Retries to listing scraper by @asim-shrestha in #5
- 🍌 Updated more examples by @asim-shrestha in #7
- 📄 Add 10 additional JSON evals by @asim-shrestha in #11
- 📄 URL evals by @asim-shrestha in #10
- 🙉 Monke Good by @awtkns in #13
- 🚀 Field parameterization by @awtkns in #15
New Contributors
- @dependabot made their first contribution in #2
- @asim-shrestha made their first contribution in #3
- @awtkns made their first contribution in #13
Full Changelog: v.0.1.0...v.0.5.0
v.0.1.0
Super excited for the first version of Banana-lyzer, an open-source AI agent evaluation framework and dataset for web tasks with Playwright (and it has a banana theme, because why not) 🍌
We aim to solve the following issues with testing web agents:
- Websites change over time, are affected by latency, and may have anti-bot protections. We need a system that can reliably save and deploy historic/static snapshots of websites.
- Standard web practices are loose, and there are many different underlying ways to represent a single website. For an agent to generalize well, we need to build a diverse dataset of websites across industries and use cases.
- We have specific evaluation criteria and agent use cases focusing on structured and direct information retrieval across websites.
- There exist valuable web task datasets and evaluations that we'd like to unify in a single repo (Mind2Web, WebArena, etc).
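To make the goal of evaluating structured information retrieval a bit more concrete, here is a minimal sketch of how an agent's extracted record might be compared against an expected one. This is not Banana-lyzer's API: the function names `normalize` and `matches` are hypothetical, and the choices to compare strings on their alphanumeric characters only and to skip fields whose value is `None` are assumptions made for illustration.

```python
from typing import Any


def normalize(value: Any) -> Any:
    """Hypothetical helper: reduce a value to a comparison-friendly form.

    Strings are lowercased and stripped of non-alphanumeric characters,
    dict entries whose value is None are dropped, and lists and dicts
    are normalized recursively. Other values are returned unchanged.
    """
    if isinstance(value, str):
        return "".join(ch for ch in value.lower() if ch.isalnum())
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


def matches(expected: dict[str, Any], actual: dict[str, Any]) -> bool:
    """Hypothetical check: True if both records agree after normalization."""
    return normalize(expected) == normalize(actual)


# Example: formatting differences and a missing optional field are tolerated.
expected = {"name": "Jane Doe", "phone": "(555) 123-4567", "fax": None}
actual = {"name": "jane doe", "phone": "555 123 4567"}
print(matches(expected, actual))  # True
```

A comparison this loose trades precision for robustness: it tolerates formatting noise in scraped values, but it would also treat strings that differ only in punctuation as equal, which may or may not suit a given evaluation.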