System for summarizing the content of webpages by analyzing their HTML. Easily extract key information from any webpage and view it in a condensed format.
sequenceDiagram
participant User
participant GoogleSpreadsheet as Google Spreadsheet
participant GAS as Google Apps Script
participant GitHubActions as GitHub Actions
participant ChatGPTAPI as ChatGPT API
User->>GoogleSpreadsheet: Enter article URL
GoogleSpreadsheet->>GAS: Trigger on change
GAS->>GitHubActions: Start job
GitHubActions->>GitHubActions: Retrieve HTML from URL entered by User
GitHubActions->>GitHubActions: Execute html2markdown
GitHubActions->>GAS: POST API with converted markdown to save in Spreadsheet
GAS->>ChatGPTAPI: Regular batch to send markdown content for summarization
ChatGPTAPI-->>GAS: Return summary
GAS->>GoogleSpreadsheet: Transcribe summary in Spreadsheet
tree . -I node_modules
.
├── LICENSE
├── README.md
├── html2markdown
│   ├── README.md
│   ├── package.json
│   ├── src
│   │   └── main.ts
│   ├── tsconfig.json
│   └── yarn.lock
└── webpage-summarizer
    ├── README.md
    ├── package-lock.json
    ├── package.json
    ├── src
    │   ├── apis
    │   │   ├── do-get.ts
    │   │   └── do-post.ts
    │   ├── appsscript.json
    │   ├── clients
    │   │   ├── chatgpt-client.ts
    │   │   └── github-client.ts
    │   ├── jobs
    │   │   ├── invoke-html-to-markdown.ts
    │   │   └── summarize.ts
    │   └── modules
    │       ├── auth.ts
    │       ├── fetch-from-spreadsheet.ts
    │       ├── fetch-target-urls.ts
    │       ├── filter-latest-summaries.ts
    │       ├── log-error.ts
    │       ├── schema.ts
    │       └── setup.ts
    ├── tools
    │   ├── deploy.sh
    │   └── open.sh
    └── tsconfig.json

9 directories, 27 files
Fork this repository and clone it.
# example
git clone git@github.com:${YOUR_GITHUB_USER_NAME}/webpage-summarizer.git
- Install clasp
npm install -g @google/clasp
- Enable the Google Apps Script API: https://script.google.com/home/usersettings
- Run the commands below
cd webpage-summarizer
npm install
clasp login
clasp create --title "webpage-summarizer" --type sheets --rootDir ./src
clasp push --force
clasp open
- Set environment variables
  - required
    - GITHUB_OWNER: Your GitHub username
    - GITHUB_REPO: Your GitHub repository name
    - GITHUB_TOKEN: Your GitHub Personal Access Token
    - GITHUB_WORKFLOW_ID: Your GitHub Actions workflow ID
    - OPENAI_API_KEY: Your OpenAI API key
    - WEBPAGE_SUMMARIZER_API_KEY: Any string you generate for simple authentication in Google Apps Script
      - This is used for authentication when you access WEBPAGE_SUMMARIZER_API_URL (see the Publish API step below)
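To illustrate how the four GITHUB_* values fit together, here is a minimal sketch that builds a request for GitHub's "create a workflow dispatch event" REST endpoint. It only constructs the request and does not send it. The `GitHubSettings` shape, the `buildDispatchRequest` name, the default `ref` of `main`, and the `url` workflow input are assumptions made for illustration; the project's own `github-client.ts` may look different.

```typescript
// Hypothetical container for the settings listed above.
interface GitHubSettings {
  owner: string; // GITHUB_OWNER
  repo: string; // GITHUB_REPO
  token: string; // GITHUB_TOKEN
  workflowId: string; // GITHUB_WORKFLOW_ID (numeric ID or workflow file name)
}

interface DispatchRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a workflow dispatch request.
// Assumption: the workflow accepts an input named `url`.
function buildDispatchRequest(
  settings: GitHubSettings,
  targetUrl: string,
  ref: string = "main"
): DispatchRequest {
  const { owner, repo, token, workflowId } = settings;
  const workflow = encodeURIComponent(workflowId);
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/actions/workflows/${workflow}/dispatches`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ ref, inputs: { url: targetUrl } }),
  };
}
```

In Google Apps Script, a request like this would typically be sent with `UrlFetchApp.fetch`; in Node.js, with `fetch`.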
- Prepare spreadsheet as follows
  - summaries sheet
    - content (string)
    - summary (string)
    - url (string)
    - date (string)
  - prompts sheet
    - instruction (string)
      - Write your own prompts for ChatGPT API.
    - constraints (string)
      - Write your own prompts for ChatGPT API.
  - latest_summaries sheet
    - summary (string)
    - url (string)
    - date (string)
  - error_logs sheet
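The sheet layout above can be mirrored as TypeScript types. The sketch below is illustrative only: the column order, the `toSummaryRow` helper, and the `isPendingSummary` rule (a row has content but no summary yet) are assumptions, not the contents of the project's `schema.ts`.

```typescript
// Row shapes mirroring the columns listed above.
interface SummaryRow {
  content: string;
  summary: string;
  url: string;
  date: string;
}

interface PromptRow {
  instruction: string;
  constraints: string;
}

// Converts a raw row (array of cell values) into a SummaryRow.
// Assumption: columns appear in the order content, summary, url, date.
function toSummaryRow(cells: unknown[]): SummaryRow {
  // Missing or empty cells become empty strings.
  const text = (i: number): string => String(cells[i] ?? "");
  return { content: text(0), summary: text(1), url: text(2), date: text(3) };
}

// Hypothetical rule: a row awaits summarization when it has content but no summary.
function isPendingSummary(row: SummaryRow): boolean {
  return row.content.trim() !== "" && row.summary.trim() === "";
}
```

A batch job could use a predicate like `isPendingSummary` to select the rows to send to the ChatGPT API.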
- Publish API
  - Copy your Web App URL (WEBPAGE_SUMMARIZER_API_URL) and WEBPAGE_SUMMARIZER_API_KEY
- Set triggers by executing the setUp function on Google Apps Script
Add the secrets below to your repository.
- required
  - WEBPAGE_SUMMARIZER_API_KEY: Your Web App API Key (see above)
- optional
  - If you want to use webpage-summarizer only for yourself, add the secret below. This secret is used as the default endpoint.
    - WEBPAGE_SUMMARIZER_API_URL: Your Web App URL (see above)
  - If you want to use webpage-summarizer with multiple users, add the secrets below.
    - USERS_INFO_URL: API endpoint for getting the GAS Web App URL associated with a target user
      - This API endpoint returns JSON with a url field, which is either the GAS Web App URL associated with the target user or null.
    - USERS_INFO_API_KEY: Any string you generate for simple authentication in Google Apps Script
      - This is used for authentication when you access USERS_INFO_URL
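One way to read the single-user and multi-user options together is as a fallback: use the per-user URL when the USERS_INFO_URL endpoint returns one, otherwise use the default WEBPAGE_SUMMARIZER_API_URL. The sketch below shows that reading. The `UserInfoResponse` type, the `resolveEndpoint` name, and the precedence order are assumptions; the actual workflow may resolve the endpoint differently.

```typescript
// Assumed response shape: a JSON object with a `url` field that may be null.
interface UserInfoResponse {
  url: string | null;
}

// Picks the Web App endpoint to POST the converted markdown to.
// Prefers the per-user URL, falls back to the default, and returns null if neither is set.
function resolveEndpoint(
  userInfo: UserInfoResponse | null,
  defaultUrl?: string
): string | null {
  if (userInfo && userInfo.url) {
    return userInfo.url;
  }
  return defaultUrl ? defaultUrl : null;
}
```

Returning null rather than throwing leaves it to the caller to decide whether a missing endpoint should fail the job or simply skip the URL.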