This Rust program extracts product information from a web page using a web scraper and the ChatGPT API. It fetches the HTML content from a given URL and extracts the relevant text and images. It then builds a prompt from that content and sends it to the ChatGPT API, which extracts the required product information. The result is returned as a JSON object with a fixed schema.
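The prompt-building step can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the `PageContent` struct, the `build_prompt` function, and the prompt wording are all hypothetical, and only the four field names are taken from the example response in this README.

```rust
/// Hypothetical container for what the scraper pulls out of a page.
struct PageContent {
    texts: Vec<String>,
    image_urls: Vec<String>,
}

/// Builds a prompt asking the model to return product data as JSON.
/// The wording is an illustrative assumption, not the real prompt.
fn build_prompt(page: &PageContent) -> String {
    let mut prompt = String::from(
        "Extract the product information from the page content below. \
         Respond with a JSON object containing the fields \
         name, price, description and image_url.\n\nTexts:\n",
    );
    // List every non-empty text fragment, trimmed of surrounding whitespace.
    for text in &page.texts {
        let trimmed = text.trim();
        if !trimmed.is_empty() {
            prompt.push_str("- ");
            prompt.push_str(trimmed);
            prompt.push('\n');
        }
    }
    // List the image URLs so the model can pick the product image.
    prompt.push_str("\nImages:\n");
    for url in &page.image_urls {
        prompt.push_str("- ");
        prompt.push_str(url);
        prompt.push('\n');
    }
    prompt
}

fn main() {
    let page = PageContent {
        texts: vec!["Example Product".to_string(), "$29.99".to_string()],
        image_urls: vec!["https://www.example.com/images/product.jpg".to_string()],
    };
    println!("{}", build_prompt(&page));
}
```

Dropping empty fragments before building the prompt keeps it short, which matters because the API limits how much text a single request can carry.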
- Install the Rust programming language: https://www.rust-lang.org/tools/install
- Ensure your system has the build tools installed
sudo apt install build-essential
- Install libssl-dev
sudo apt-get -y install libssl-dev
- Install a recent version of protobuf (the command below installs protoc 22.2)
ARCH="linux-x86_64" && \
VERSION="22.2" && \
curl -OL "https://github.com/protocolbuffers/protobuf/releases/download/v$VERSION/protoc-$VERSION-$ARCH.zip" && \
sudo unzip -o "protoc-$VERSION-$ARCH.zip" bin/protoc "include/*" -d /usr/local && \
rm -f "protoc-$VERSION-$ARCH.zip"
- Clone the repository
git clone https://github.com/hansaskov/extract-product
cd extract-product
- Set the environment variable for the OpenAI API key in Secrets.toml
Make a GET request to the /extract endpoint with the following query parameter:
url (required): The URL of the website from which to extract the product information.
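Because the value of url is itself a URL, characters such as ? and & inside it can be mistaken for part of the outer query string. A minimal sketch of percent-encoding the value before building the request is shown here. The helper names `percent_encode` and `request_url` are hypothetical, and whether the service needs the value encoded is an assumption:

```rust
/// Percent-encodes a string for use as a query-parameter value.
/// Unreserved characters (RFC 3986) are kept; every other byte becomes %XX.
fn percent_encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Builds the request URL from a service base address and a target page.
/// Both arguments are supplied by the caller; nothing is hard-coded here.
fn request_url(base: &str, target: &str) -> String {
    format!("{}?url={}", base, percent_encode(target))
}

fn main() {
    let url = request_url(
        "https://extract-product.shuttleapp.rs",
        "https://www.example.com/product?id=1&ref=home",
    );
    println!("{}", url);
}
```

Without encoding, the `&ref=home` part of the target in this example would be read as a second parameter of the outer request rather than as part of the product URL.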
curl "https://extract-product.shuttleapp.rs?url=https://www.example.com/product"
{
  "name": "Example Product",
  "price": "$29.99",
  "description": "This is an example product description.",
  "image_url": "https://www.example.com/images/product.jpg"
}
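The response schema above maps naturally onto a Rust struct with four string fields. The following is a sketch only: the `ProductInfo` name and the hand-written `to_json` method are hypothetical, and the actual program most likely relies on a serialization library rather than manual string building. The standard library is used here just to keep the example dependency-free.

```rust
/// Mirrors the fields of the example response; the struct name is hypothetical.
struct ProductInfo {
    name: String,
    price: String,
    description: String,
    image_url: String,
}

/// Escapes the characters that must not appear raw inside a JSON string.
fn escape_json(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ProductInfo {
    /// Serializes the struct to a compact JSON object.
    fn to_json(&self) -> String {
        format!(
            "{{\"name\":\"{}\",\"price\":\"{}\",\"description\":\"{}\",\"image_url\":\"{}\"}}",
            escape_json(&self.name),
            escape_json(&self.price),
            escape_json(&self.description),
            escape_json(&self.image_url),
        )
    }
}

fn main() {
    let product = ProductInfo {
        name: "Example Product".to_string(),
        price: "$29.99".to_string(),
        description: "This is an example product description.".to_string(),
        image_url: "https://www.example.com/images/product.jpg".to_string(),
    };
    println!("{}", product.to_json());
}
```

Keeping price as a string, as the example response does, avoids having to guess the currency or number format used by each website.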