Export more, add documentation
russellsteadman authored Oct 12, 2023
1 parent 42e1bdf commit f1ff646
Showing 4 changed files with 84 additions and 6 deletions.
79 changes: 78 additions & 1 deletion packages/bot/README.md
@@ -1,6 +1,83 @@
# NetScrape

-Web scraping made efficient, simple, and compliant.
+Web scraping for Node.js made efficient, simple, and compliant. NetScrape complies with the [Robots Exclusion Protocol](https://www.rfc-editor.org/rfc/rfc9309.html).

## Installation

```bash
npm install netscrape
```

## Usage

NetScrape is designed to be simple, yet extensible enough for advanced use cases. The following example makes a basic request to a website.

```js
import Bot from 'netscrape';

const exampleBot = new Bot({ name: 'ExampleBot', version: '1.0' });

try {
  const response = await exampleBot.makeRequest('https://www.example.com/path');
  console.log(response.body);
} catch (error) {
  console.error(error);
}
```

### Bot#constructor

```ts
import Bot from 'netscrape';

type BotOptions = {
  name: string;
  version: string;
  minimumRequestDelay?: number;
  maximumRequestDelay?: number;
  disableCaching?: boolean;
  policyURL?: string;
  hideLibraryAgent?: boolean;
  userAgent?: string;
};

const exampleBot = new Bot({
  name: 'ExampleBot' /* required, Name of your bot */,
  version: '1.0' /* required, Version of your bot */,
  minimumRequestDelay: 1000 /* optional, Minimum delay between requests in milliseconds */,
  maximumRequestDelay: 5000 /* optional, Maximum delay between requests in milliseconds (default 10000) */,
  disableCaching:
    true /* optional, Disable caching of responses (default false) */,
  policyURL:
    'https://www.example.com/robots.txt' /* optional, URL to robots.txt file (default https://npm.im/netscrape) */,
  hideLibraryAgent:
    true /* optional, Hide the library agent from the user agent (default false) */,
  userAgent:
    'ExampleBot/1.0' /* optional, Custom user agent, overrides all other user agent fields */,
});
```
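
The `minimumRequestDelay` and `maximumRequestDelay` options bound the wait between requests. The sketch below shows one way such bounds could be applied to a delay requested by a site. It is an illustration only: `clampDelay` and `DelayBounds` are hypothetical names, the README does not describe how NetScrape combines these values internally, and the minimum of `0` is an assumed default (only the 10000 ms maximum is documented).

```typescript
// Hypothetical helper, not part of NetScrape's API.
type DelayBounds = {
  minimumRequestDelay?: number;
  maximumRequestDelay?: number;
};

function clampDelay(requestedMs: number, bounds: DelayBounds = {}): number {
  // Assumed default: the README does not state a default minimum.
  const min = bounds.minimumRequestDelay ?? 0;
  // Default maximum of 10000 ms is taken from the README.
  const max = bounds.maximumRequestDelay ?? 10000;
  // Raise the delay to the minimum, then cap it at the maximum.
  return Math.min(Math.max(requestedMs, min), max);
}

const bounds = { minimumRequestDelay: 1000, maximumRequestDelay: 5000 };
console.log(clampDelay(250, bounds)); // 1000
console.log(clampDelay(30000, bounds)); // 5000
console.log(clampDelay(2000)); // 2000
```

Clamping keeps a bot polite when a site asks for no delay at all, while preventing a very large requested delay from stalling the crawl indefinitely.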

### Bot#makeRequest

```ts
import Bot from 'netscrape';

const exampleBot = new Bot({ name: 'ExampleBot', version: '1.0' });

try {
  /* Note: Bot#makeRequest automatically requests /robots.txt in the background */
  const response = await exampleBot.makeRequest(
    'https://www.example.com/path' /* required, well-formatted URL to make request to */,
    false /* optional, whether to return a byte stream instead of UTF-8 text */,
  );

  /* Bot#makeRequest returns the raw response from the got package (npm.im/got) */
  console.log(response.body);
} catch (error) {
  /* Robots.txt rejection, robots.txt 500 error, etc. */
  console.error(error);
}
```
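
When the second argument is `true`, the example above says a byte stream is returned instead of UTF-8 text. The following sketch shows one way to collect such a stream into a string. It assumes the returned object behaves like a Node.js readable stream, which this README does not confirm; `streamToText` is a hypothetical helper, and `fakeBody` stands in for a real response so the snippet runs without network access.

```typescript
import { Readable } from 'node:stream';

// Hypothetical helper, not part of NetScrape: collects a byte stream into UTF-8 text.
async function streamToText(
  stream: AsyncIterable<Buffer | string>,
): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk, 'utf8') : chunk);
  }
  // Decode once at the end so multi-byte characters split across chunks stay intact.
  return Buffer.concat(chunks).toString('utf8');
}

// Stand-in for a streamed response body (assumption: a real call would pass
// whatever makeRequest(url, true) returns, if it is a readable stream).
const fakeBody = Readable.from([Buffer.from('hel'), Buffer.from('lo')]);
streamToText(fakeBody).then((text) => console.log(text)); // hello
```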

## License

2 changes: 1 addition & 1 deletion packages/bot/package.json
@@ -1,6 +1,6 @@
 {
   "name": "netscrape",
-  "version": "0.1.0-alpha1",
+  "version": "0.1.0",
"description": "A structural framework for creating good bots",
"author": "Russell Steadman",
"license": "MIT",
2 changes: 1 addition & 1 deletion packages/bot/src/errors.ts
@@ -1,4 +1,4 @@
-enum BotErrorType {
+export enum BotErrorType {
Request,
Delay,
RobotsTxt,
7 changes: 4 additions & 3 deletions packages/bot/src/index.ts
@@ -4,7 +4,7 @@ import * as Errors from './errors.js';
import { RobotsTxt } from 'exclusion';
import got, { Options, type Request, type Response } from 'got';

-type BotOptions = {
+export type BotOptions = {
name: string;
version: string;
minimumRequestDelay?: number;
@@ -27,8 +27,8 @@ class Bot {
readonly botName!: string;
private requestDelay = {} as Record<string, number>;
private requestTime = {} as Record<string, Date>;
-cache!: QuickLRU<unknown, unknown>;
-dnsCachable!: CacheableLookup;
+private cache!: QuickLRU<unknown, unknown>;
+private dnsCachable!: CacheableLookup;
private options!: BotOptions;

constructor(options: BotOptions) {
@@ -291,4 +291,5 @@ class Bot {
}
}

+export * from './errors.js';
export default Bot;
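
With `BotErrorType` now exported and re-exported through `export * from './errors.js'`, callers can in principle branch on the kind of failure. The sketch below illustrates the idea with local stand-ins so it runs on its own. The enum mirrors only the three members visible in this diff (the real enum may contain more), and `SketchBotError`, its `type` field, and the message strings are hypothetical: the commit does not show how NetScrape's errors carry their type.

```typescript
// Local stand-in mirroring the members visible in the errors.ts diff.
enum BotErrorType {
  Request,
  Delay,
  RobotsTxt,
}

// Hypothetical error shape, used only for this illustration.
class SketchBotError extends Error {
  readonly type: BotErrorType;

  constructor(type: BotErrorType, message: string) {
    super(message);
    this.type = type;
  }
}

// Maps a caught value to a short description based on its error type.
function describeFailure(error: unknown): string {
  if (!(error instanceof SketchBotError)) return 'unexpected error';
  switch (error.type) {
    case BotErrorType.RobotsTxt:
      return 'robots.txt problem';
    case BotErrorType.Delay:
      return 'delay-related failure';
    case BotErrorType.Request:
      return 'request failed';
    default:
      return 'unknown bot error';
  }
}

console.log(
  describeFailure(new SketchBotError(BotErrorType.RobotsTxt, 'disallowed')),
); // robots.txt problem
console.log(describeFailure(new Error('boom'))); // unexpected error
```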
