feat(words): Add words function (#805)
* docs(words): Add docs for words

* feat(words): Add words function

* test(words): Add words tests

* fix: type error fixed

* fix: lint

* fix: lint

* fix: lint

* fix: rename file

* fix: fix: rename file

* fix: change expln

* fix: update regular expression for Unicode word matching & change expln

* fix: Excluded control characters from the pattern.

* fix: Excluded control characters from the pattern.

* fix: lint

* fix:lint

* fix: lint

* fix: slice lint

* Add docs & fix words to be stricter

* chore: Use words instead of getWords

* lint

---------

Co-authored-by: Sojin Park <[email protected]>
scato3 and raon0211 authored Nov 10, 2024
1 parent 087a982 commit 3123121
Showing 20 changed files with 326 additions and 30 deletions.
15 changes: 15 additions & 0 deletions benchmarks/performance/words.bench.ts
@@ -0,0 +1,15 @@
import { bench, describe } from 'vitest';
import { words as wordsToolkit } from 'es-toolkit';
import { words as wordLodash_ } from 'lodash';

describe('Performance Comparison: es-toolkit words vs lodash words', () => {
  const testString = 'This is a test string with different_cases and UPPERCASE words 🚀 and more symbols';

  bench('es-toolkit words', () => {
    wordsToolkit(testString);
  });

  bench('lodash words', () => {
    wordLodash_(testString);
  });
});
44 changes: 44 additions & 0 deletions docs/ja/reference/string/words.md
@@ -0,0 +1,44 @@
# words

Splits a string into words and returns them as an array. Both ASCII and Unicode characters can be recognized as words.

## Interface

```ts
function words(str: string): string[];
```

### Parameters

- `str` (`string`): The string to split into words.

### Returns

(`string[]`): An array containing the string split into words.

## Examples

```typescript
words('fred, barney, & pebbles');
// => ['fred', 'barney', 'pebbles']

words('camelCaseHTTPRequest🚀');
// => ['camel', 'Case', 'HTTP', 'Request', '🚀']

words('Lunedì 18 Set');
// => ['Lunedì', '18', 'Set']
```

## Lodash Compatibility

Importing `words` from `es-toolkit/compat` makes it compatible with Lodash.

- `words` accepts a second argument, `pattern`, to change the regular expression used to split the string.
- `words` automatically converts the first argument to a string if it is not one.

```typescript
import { words } from 'es-toolkit/compat';

words('fred, barney, & pebbles', /[^, ]+/g);
// Returns: ['fred', 'barney', '&', 'pebbles']
```
44 changes: 44 additions & 0 deletions docs/ko/reference/string/words.md
@@ -0,0 +1,44 @@
# words

Splits a string into words and returns them as an array. Both ASCII and Unicode characters can be recognized as words.

## Interface

```ts
function words(str: string): string[];
```

### Parameters

- `str` (`string`): The string to split into words.

### Return value

(`string[]`): An array containing the string split into words.

## Examples

```typescript
words('fred, barney, & pebbles');
// => ['fred', 'barney', 'pebbles']

words('camelCaseHTTPRequest🚀');
// => ['camel', 'Case', 'HTTP', 'Request', '🚀']

words('Lunedì 18 Set');
// => ['Lunedì', '18', 'Set']
```

## Lodash Compatibility

Importing `words` from `es-toolkit/compat` makes it compatible with lodash.

- You can pass a second argument, `pattern`, to `words` to change the regular expression used to split the string.
- `words` automatically converts the first argument to a string if it is not one.

```typescript
import { words } from 'es-toolkit/compat';

words('fred, barney, & pebbles', /[^, ]+/g);
// Returns: ['fred', 'barney', '&', 'pebbles']
```
44 changes: 44 additions & 0 deletions docs/reference/string/words.md
@@ -0,0 +1,44 @@
# words

Splits a string into an array of words. It can recognize both ASCII and Unicode characters as words.

## Signature

```ts
function words(str: string): string[];
```

### Parameters

- `str` (`string`): The string to split into words.

### Returns

(`string[]`): An array of words extracted from the string.

## Examples

```typescript
words('fred, barney, & pebbles');
// => ['fred', 'barney', 'pebbles']

words('camelCaseHTTPRequest🚀');
// => ['camel', 'Case', 'HTTP', 'Request', '🚀']

words('Lunedì 18 Set');
// => ['Lunedì', '18', 'Set']
```
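
As a rough illustration of how pattern-based matching can produce the results above, here is a minimal, self-contained sketch. `SKETCH_PATTERN` and `wordsSketch` are hypothetical names, and the pattern itself is an assumption made for this example; it is not necessarily the pattern `words` uses internally.

```typescript
// Hypothetical stand-in pattern, for illustration only.
// Alternatives, tried in order at each position:
//   1. an optional uppercase letter followed by lowercase letters ("camel", "Case")
//   2. an uppercase run that stops before a capitalized word ("HTTP" in "HTTPRequest")
//   3. any other run of letters (for scripts without case)
//   4. a run of digits
//   5. a single pictographic symbol such as an emoji
const SKETCH_PATTERN = new RegExp(
  '\\p{Lu}?\\p{Ll}+|\\p{Lu}+(?!\\p{Ll})|\\p{L}+|[0-9]+|\\p{Extended_Pictographic}',
  'gu'
);

function wordsSketch(str: string): string[] {
  // With the `g` flag, match() returns every match, or null when there is none.
  return str.match(SKETCH_PATTERN) || [];
}

console.log(wordsSketch('camelCaseHTTPRequest🚀'));
// => ['camel', 'Case', 'HTTP', 'Request', '🚀']

console.log(wordsSketch('fred, barney, & pebbles'));
// => ['fred', 'barney', 'pebbles']
```

Because the sketch matches words rather than splitting on separators, punctuation and whitespace never appear in the result.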

## Lodash Compatibility

To ensure full compatibility with lodash, you can import `words` from `es-toolkit/compat`.

- `words` also takes an optional second parameter, `pattern`, a regular expression or string used to match words in place of the default pattern.
- `words` will automatically convert the first argument to a string if it isn't one already.

```typescript
import { words } from 'es-toolkit/compat';

words('fred, barney, & pebbles', /[^, ]+/g);
// Returns ['fred', 'barney', '&', 'pebbles']
```
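
The two compatibility behaviors described above can be sketched without the library. This is a simplified approximation: `compatWordsSketch` and `DEFAULT_PATTERN` are hypothetical names, the default pattern is an assumption, and `String()` stands in for the library's own string conversion, which may treat some values differently.

```typescript
// Hypothetical default pattern, for illustration only.
const DEFAULT_PATTERN = new RegExp('\\p{L}+|[0-9]+|\\p{Extended_Pictographic}', 'gu');

function compatWordsSketch(value?: unknown, pattern: RegExp | string = DEFAULT_PATTERN): string[] {
  // null and undefined become '', everything else goes through String().
  const input = value == null ? '' : String(value);

  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = input.match(pattern) || [];

  // Drop empty matches, which some custom patterns can produce.
  return matches.filter(word => word !== '');
}

console.log(compatWordsSketch([1, 2, 3]));
// => ['1', '2', '3'] because the array is first converted to '1,2,3'

console.log(compatWordsSketch(undefined));
// => []

console.log(compatWordsSketch('fred, barney, & pebbles', /[^, ]+/g));
// => ['fred', 'barney', '&', 'pebbles']
```

In this sketch, a custom `pattern` needs the `g` flag to return every word; without it, `String.prototype.match` reports only the first match.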
44 changes: 44 additions & 0 deletions docs/zh_hans/reference/string/words.md
@@ -0,0 +1,44 @@
# words

Splits a string into an array of words. It can recognize both ASCII and Unicode characters as words.

## Signature

```ts
function words(str: string): string[];
```

### Parameters

- `str` (`string`): The string to split into words.

### Return value

(`string[]`): An array of words extracted from the string.

## Examples

```typescript
words('fred, barney, & pebbles');
// => ['fred', 'barney', 'pebbles']

words('camelCaseHTTPRequest🚀');
// => ['camel', 'Case', 'HTTP', 'Request', '🚀']

words('Lunedì 18 Set');
// => ['Lunedì', '18', 'Set']
```

## Lodash Compatibility

Import `words` from `es-toolkit/compat` for full compatibility with lodash.

- `words` also accepts an optional second parameter, `pattern`, which lets you define a custom pattern for splitting the string.
- If the first argument is not a string, `words` automatically converts it to one.

```typescript
import { words } from 'es-toolkit/compat';

words('fred, barney, & pebbles', /[^, ]+/g);
// ['fred', 'barney', '&', 'pebbles']
```
9 changes: 5 additions & 4 deletions src/compat/array/differenceBy.spec.ts
@@ -14,12 +14,13 @@ describe('differenceBy', () => {
     expect(actual).toEqual([{ x: 2 }]);
   });
 
-  it('should provide correct `iteratee` arguments', () => {
+  it('should provide correct iteratee arguments', () => {
     let args: any;
 
-    differenceBy([2.1, 1.2], [2.3, 3.4], function () {
-      // eslint-disable-next-line
-      args || (args = slice.call(arguments));
+    differenceBy([2.1, 1.2], [2.3, 3.4], function (...rest: any[]) {
+      if (!args) {
+        args = slice.call(rest);
+      }
     });
 
     expect(args).toEqual([2.3]);
2 changes: 1 addition & 1 deletion src/compat/string/startCase.ts
@@ -1,4 +1,4 @@
-import { getWords } from '../../string/_internal/getWords.ts';
+import { words as getWords } from '../../string/words.ts';
 import { normalizeForCase } from '../_internal/normalizeForCase.ts';
 
 /**
44 changes: 44 additions & 0 deletions src/compat/string/words.spec.ts
@@ -0,0 +1,44 @@
import { describe, expect, it } from 'vitest';
import { words } from './words';

describe('words', () => {
  it('splits a simple ASCII comma-separated string into words', () => {
    const result = words('fred, barney, & pebbles');
    expect(result).toEqual(['fred', 'barney', 'pebbles']);
  });

  it('splits a string with custom pattern', () => {
    const result = words('fred, barney, & pebbles', /[^, ]+/g);
    expect(result).toEqual(['fred', 'barney', '&', 'pebbles']);
  });

  it('returns an empty array when input is an empty string', () => {
    const result = words('');
    expect(result).toEqual([]);
  });

  it('correctly handles a string with multiple number inputs', () => {
    const result = words('+0 -3 +3 -4 +4');
    expect(result).toEqual(['0', '3', '3', '4', '4']);
  });

  it('splits a space-separated string into individual words', () => {
    const result = words('split these words');
    expect(result).toEqual(['split', 'these', 'words']);
  });

  it('splits a string representation of an array', () => {
    const result = words([1, 2, 3]);
    expect(result).toEqual(['1', '2', '3']);
  });

  it('returns an empty array when input is undefined', () => {
    const result = words(undefined);
    expect(result).toEqual([]);
  });

  it('correctly handles a string with Unicode emojis and special characters', () => {
    const result = words('example🚀with✨emojis💡and🔍special🌟characters');
    expect(result).toEqual(['example', '🚀', 'with', '✨', 'emojis', '💡', 'and', '🔍', 'special', '🌟', 'characters']);
  });
});
22 changes: 22 additions & 0 deletions src/compat/string/words.ts
@@ -0,0 +1,22 @@
import { CASE_SPLIT_PATTERN } from '../../string/words.ts';
import { toString } from '../util/toString.ts';

/**
 * Splits `string` into an array of its words.
 *
 * @param {string | object} str - The string or object that is to be split into words.
 * @param {RegExp | string} [pattern] - The pattern to match words.
 * @returns {string[]} - Returns the words of `string`.
 *
 * @example
 * const wordsArray1 = words('fred, barney, & pebbles');
 * // => ['fred', 'barney', 'pebbles']
 *
 */
export function words(str?: string | object, pattern: RegExp | string = CASE_SPLIT_PATTERN): string[] {
  const input = toString(str);

  const words = Array.from(input.match(pattern) ?? []);

  return words.filter(x => x !== '');
}
2 changes: 1 addition & 1 deletion src/string/camelCase.ts
@@ -1,5 +1,5 @@
-import { getWords } from './_internal/getWords.ts';
 import { capitalize } from './capitalize.ts';
+import { words as getWords } from './words.ts';
 
 /**
  * Converts a string to camel case.
2 changes: 1 addition & 1 deletion src/string/constantCase.ts
@@ -1,4 +1,4 @@
-import { getWords } from './_internal/getWords.ts';
+import { words as getWords } from './words.ts';
 
 /**
  * Converts a string to constant case.
1 change: 1 addition & 0 deletions src/string/index.ts
@@ -17,3 +17,4 @@ export { trimStart } from './trimStart.ts';
 export { unescape } from './unescape.ts';
 export { upperCase } from './upperCase.ts';
 export { upperFirst } from './upperFirst.ts';
+export { words } from './words.ts';
2 changes: 1 addition & 1 deletion src/string/kebabCase.ts
@@ -1,4 +1,4 @@
-import { getWords } from './_internal/getWords.ts';
+import { words as getWords } from './words.ts';
 
 /**
  * Converts a string to kebab case.
2 changes: 1 addition & 1 deletion src/string/lowerCase.ts
@@ -1,4 +1,4 @@
-import { getWords } from './_internal/getWords.ts';
+import { words as getWords } from './words.ts';
 
 /**
  * Converts a string to lower case.
2 changes: 1 addition & 1 deletion src/string/pascalCase.ts
@@ -1,5 +1,5 @@
-import { getWords } from './_internal/getWords.ts';
 import { capitalize } from './capitalize.ts';
+import { words as getWords } from './words.ts';
 
 /**
  * Converts a string to Pascal case.
2 changes: 1 addition & 1 deletion src/string/snakeCase.ts
@@ -1,4 +1,4 @@
-import { getWords } from './_internal/getWords.ts';
+import { words as getWords } from './words.ts';
 
 /**
  * Converts a string to snake case.
2 changes: 1 addition & 1 deletion src/string/startCase.ts
@@ -1,4 +1,4 @@
-import { getWords } from './_internal/getWords.ts';
+import { words as getWords } from './words.ts';
 
 /**
  * Converts the first character of each word in a string to uppercase and the remaining characters to lowercase.
2 changes: 1 addition & 1 deletion src/string/upperCase.ts
@@ -1,4 +1,4 @@
-import { getWords } from './_internal/getWords.ts';
+import { words as getWords } from './words.ts';
 
 /**
  * Converts a string to upper case.