Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(words): Add words function (#805)
* docs(words): Add docs for words
* feat(words): Add words function
* test(words): Add words tests
* fix: type error fixed
* fix: lint
* fix: lint
* fix: lint
* fix: rename file
* fix: fix: rename file
* fix: change expln
* fix: update regular expression for Unicode word matching & change expln
* fix: Excluded control characters from the pattern.
* fix: Excluded control characters from the pattern.
* fix: lint
* fix:lint
* fix: lint
* fix: slice lint
* Add docs & fix words to be stricter
* chore: Use words instead of getWords
* lint

---------

Co-authored-by: Sojin Park <[email protected]>
Showing 20 changed files with 326 additions and 30 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
import { bench, describe } from 'vitest'; | ||
import { words as wordsToolkit } from 'es-toolkit'; | ||
import { words as wordLodash_ } from 'lodash'; | ||
|
||
describe('Performance Comparison: es-toolkit words vs lodash words', () => { | ||
const testString = 'This is a test string with different_cases and UPPERCASE words 🚀 and more symbols'; | ||
|
||
bench('es-toolkit words', () => { | ||
wordsToolkit(testString); | ||
}); | ||
|
||
bench('lodash words', () => { | ||
wordLodash_(testString); | ||
}); | ||
}); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# words | ||
|
||
Splits a string into words and returns them as an array. It can recognize both ASCII and Unicode characters as words. | ||
|
||
## Interface | ||
|
||
```ts | ||
function words(str: string): string[]; | ||
``` | ||
|
||
### Parameters | ||
|
||
- `str` (`string`): The string to split into words. | ||
|
||
### Returns | ||
|
||
(`string[]`): An array of the words split from the string. | ||
|
||
## Examples | ||
|
||
```typescript | ||
words('fred, barney, & pebbles'); | ||
// => ['fred', 'barney', 'pebbles'] | ||
|
||
words('camelCaseHTTPRequest🚀'); | ||
// => ['camel', 'Case', 'HTTP', 'Request', '🚀'] | ||
|
||
words('Lunedì 18 Set'); | ||
// => ['Lunedì', '18', 'Set'] | ||
``` | ||
|
||
## Lodash Compatibility | ||
|
||
Importing `words` from `es-toolkit/compat` makes it compatible with Lodash. | ||
|
||
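- `words` accepts a second argument, `pattern`, to change the regular expression used to split the string. | ||
- `words` automatically converts the first argument to a string if it is not one already. | ||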
|
||
```typescript | ||
import { words } from 'es-toolkit/compat'; | ||
|
||
words('fred, barney, & pebbles', /[^, ]+/g); | ||
// Returns: ['fred', 'barney', '&', 'pebbles'] | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# words | ||
|
||
Splits a string into words and returns them as an array. It can recognize both ASCII and Unicode characters as words. | ||
|
||
## Interface | ||
|
||
```ts | ||
function words(str: string): string[]; | ||
``` | ||
|
||
### Parameters | ||
|
||
- `str` (`string`): The string to split into words. | ||
|
||
### Returns | ||
|
||
(`string[]`): An array of the words split from the string. | ||
|
||
## Examples | ||
|
||
```typescript | ||
words('fred, barney, & pebbles'); | ||
// => ['fred', 'barney', 'pebbles'] | ||
|
||
words('camelCaseHTTPRequest🚀'); | ||
// => ['camel', 'Case', 'HTTP', 'Request', '🚀'] | ||
|
||
words('Lunedì 18 Set'); | ||
// => ['Lunedì', '18', 'Set'] | ||
``` | ||
|
||
## Lodash Compatibility | ||
|
||
Importing `words` from `es-toolkit/compat` makes it compatible with lodash. | ||
|
||
- `words` accepts a second argument, `pattern`, to change the regular expression used to split the string. | ||
- `words` automatically converts the first argument to a string if it is not one already. | ||
|
||
```typescript | ||
import { words } from 'es-toolkit/compat'; | ||
|
||
words('fred, barney, & pebbles', /[^, ]+/g); | ||
// Returns: ['fred', 'barney', '&', 'pebbles'] | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# words | ||
|
||
Splits a string into an array of words. It can recognize both ASCII and Unicode characters as words. | ||
|
||
## Signature | ||
|
||
```ts | ||
function words(str: string): string[]; | ||
``` | ||
|
||
### Parameters | ||
|
||
- `str` (`string`): The string to split into words. | ||
|
||
### Returns | ||
|
||
(`string[]`): An array of words extracted from the string. | ||
|
||
## Examples | ||
|
||
```typescript | ||
words('fred, barney, & pebbles'); | ||
// => ['fred', 'barney', 'pebbles'] | ||
|
||
words('camelCaseHTTPRequest🚀'); | ||
// => ['camel', 'Case', 'HTTP', 'Request', '🚀'] | ||
|
||
words('Lunedì 18 Set'); | ||
// => ['Lunedì', '18', 'Set'] | ||
``` | ||
|
||
## Lodash Compatibility | ||
|
||
To ensure full compatibility with lodash, you can import `words` from `es-toolkit/compat`. | ||
|
||
- `words` also takes an optional second parameter, `pattern`, which allows you to define custom patterns for splitting the string. | ||
- `words` will automatically convert the first argument to a string if it isn't one already. | ||
|
||
```typescript | ||
import { words } from 'es-toolkit/compat'; | ||
|
||
words('fred, barney, & pebbles', /[^, ]+/g); | ||
// Returns ['fred', 'barney', '&', 'pebbles'] | ||
``` |
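The examples above split on case boundaries, digit runs, and emoji, but the docs do not show the pattern that does this. The following is a minimal sketch of one way such a pattern could be built with Unicode property escapes. `SKETCH_PATTERN` and `sketchWords` are hypothetical names introduced for illustration, and the pattern is an assumption; the library's own `CASE_SPLIT_PATTERN` may differ.

```typescript
// Hypothetical pattern for illustration only; not the library's CASE_SPLIT_PATTERN.
// Each alternative is tried in order at every position in the string.
const SKETCH_PATTERN = new RegExp(
  [
    '\\p{Lu}?\\p{Ll}+', // optional capital followed by lowercase letters: "camel", "Case"
    '\\p{Lu}+(?!\\p{Ll})', // run of capitals not followed by a lowercase letter: "HTTP"
    '[0-9]+', // run of ASCII digits: "18"
    '\\p{Extended_Pictographic}', // a single pictographic symbol such as an emoji
  ].join('|'),
  'gu' // g: collect every match, u: enable \p{...} and code-point matching
);

function sketchWords(str: string): string[] {
  // With a global pattern, match() returns all matches, or null when there are none.
  return str.match(SKETCH_PATTERN) ?? [];
}

console.log(sketchWords('camelCaseHTTPRequest🚀'));
// => ['camel', 'Case', 'HTTP', 'Request', '🚀']
```

The negative lookahead in the second alternative is what keeps `HTTP` from swallowing the `R` of `Request`: the run of capitals backtracks until the next character is not a lowercase letter.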
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# words | ||
|
||
Splits a string into an array of words. It can recognize both ASCII and Unicode characters as words. | ||
|
||
## Signature | ||
|
||
```ts | ||
function words(str: string): string[]; | ||
``` | ||
|
||
### Parameters | ||
|
||
- `str` (`string`): The string to split into words. | ||
|
||
### Returns | ||
|
||
(`string[]`): An array of words extracted from the string. | ||
|
||
## Examples | ||
|
||
```typescript | ||
words('fred, barney, & pebbles'); | ||
// => ['fred', 'barney', 'pebbles'] | ||
|
||
words('camelCaseHTTPRequest🚀'); | ||
// => ['camel', 'Case', 'HTTP', 'Request', '🚀'] | ||
|
||
words('Lunedì 18 Set'); | ||
// => ['Lunedì', '18', 'Set'] | ||
``` | ||
|
||
## Lodash Compatibility | ||
|
||
Import `words` from `es-toolkit/compat` for full compatibility with lodash. | ||
|
||
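- `words` also accepts an optional second parameter, `pattern`, which lets you define a custom pattern for splitting the string. | ||
- If the first argument is not a string, `words` automatically converts it to one. | ||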
|
||
```typescript | ||
import { words } from 'es-toolkit/compat'; | ||
|
||
words('fred, barney, & pebbles', /[^, ]+/g); | ||
// ['fred', 'barney', '&', 'pebbles'] | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
import { describe, expect, it } from 'vitest'; | ||
import { words } from './words'; | ||
|
||
describe('words', () => { | ||
it('splits a simple ASCII comma-separated string into words', () => { | ||
const result = words('fred, barney, & pebbles'); | ||
expect(result).toEqual(['fred', 'barney', 'pebbles']); | ||
}); | ||
|
||
it('splits a string with custom pattern', () => { | ||
const result = words('fred, barney, & pebbles', /[^, ]+/g); | ||
expect(result).toEqual(['fred', 'barney', '&', 'pebbles']); | ||
}); | ||
|
||
it('returns an empty array when input is an empty string', () => { | ||
const result = words(''); | ||
expect(result).toEqual([]); | ||
}); | ||
|
||
it('correctly handles a string with multiple number inputs', () => { | ||
const result = words('+0 -3 +3 -4 +4'); | ||
expect(result).toEqual(['0', '3', '3', '4', '4']); | ||
}); | ||
|
||
it('splits a space-separated string into individual words', () => { | ||
const result = words('split these words'); | ||
expect(result).toEqual(['split', 'these', 'words']); | ||
}); | ||
|
||
it('splits a string representation of an array', () => { | ||
const result = words([1, 2, 3]); | ||
expect(result).toEqual(['1', '2', '3']); | ||
}); | ||
|
||
it('returns an empty array when input is undefined', () => { | ||
const result = words(undefined); | ||
expect(result).toEqual([]); | ||
}); | ||
|
||
it('correctly handles a string with Unicode emojis and special characters', () => { | ||
const result = words('example🚀with✨emojis💡and🔍special🌟characters'); | ||
expect(result).toEqual(['example', '🚀', 'with', '✨', 'emojis', '💡', 'and', '🔍', 'special', '🌟', 'characters']); | ||
}); | ||
}); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
import { CASE_SPLIT_PATTERN } from '../../string/words.ts'; | ||
import { toString } from '../util/toString.ts'; | ||
|
||
/** | ||
* Splits `str` into an array of its words. | ||
* | ||
* @param {string | object} str - The string or object that is to be split into words. | ||
* @param {RegExp | string} [pattern] - The pattern to match words. | ||
* @returns {string[]} - Returns the words of `str`. | ||
* | ||
* @example | ||
* const wordsArray1 = words('fred, barney, & pebbles'); | ||
* // => ['fred', 'barney', 'pebbles'] | ||
* | ||
*/ | ||
export function words(str?: string | object, pattern: RegExp | string = CASE_SPLIT_PATTERN): string[] { | ||
const input = toString(str); | ||
|
||
const words = Array.from(input.match(pattern) ?? []); | ||
|
||
return words.filter(x => x !== ''); | ||
} |
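The compat version above depends on two things the diff does not show: the `toString` helper and the default `CASE_SPLIT_PATTERN`. The sketch below mirrors its control flow with stand-ins so the coercion and fallback behavior can be run in isolation. `toStringSketch`, `DEFAULT_PATTERN`, and `wordsSketch` are hypothetical names, and both stand-ins are simplifying assumptions rather than the library's actual helper or pattern.

```typescript
// Stand-in for the library's default pattern; an assumption for this sketch only.
const DEFAULT_PATTERN = /[A-Za-z0-9]+/g;

// Stand-in for the `toString` helper: null and undefined become '',
// everything else goes through String() (arrays become "1,2,3").
function toStringSketch(value: unknown): string {
  return value == null ? '' : String(value);
}

function wordsSketch(str?: unknown, pattern: RegExp | string = DEFAULT_PATTERN): string[] {
  const input = toStringSketch(str);
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches: string[] = input.match(pattern) ?? [];
  // Drop empty matches, which some custom patterns can produce.
  return matches.filter(x => x !== '');
}

console.log(wordsSketch([1, 2, 3]));
// => ['1', '2', '3']
console.log(wordsSketch(undefined));
// => []
console.log(wordsSketch('fred, barney, & pebbles', /[^, ]+/g));
// => ['fred', 'barney', '&', 'pebbles']
```

Coercing first and matching second is what lets non-string inputs such as arrays and `undefined` share one code path with ordinary strings.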