Merge pull request #44 from data-integrations/username-colon
PLUGIN-1524 redesign the plugin properties to split up path field
albertshau authored Jul 26, 2023
2 parents ced248c + e33e6aa commit 1f8c89d
Showing 13 changed files with 1,746 additions and 665 deletions.
34 changes: 25 additions & 9 deletions docs/FTP-batchsource.md → docs/FTPSource-batchsource.md
@@ -1,6 +1,5 @@
 # FTP Batch Source

-
 Description
 -----------
 Batch source for an FTP or SFTP source. Prefix of the path ('ftp://...' or 'sftp://...') determines the source server
@@ -16,8 +15,17 @@ Properties
 ----------
 **Reference Name:** Name used to uniquely identify this source for lineage, annotating metadata, etc.

-**Path:** Path to file(s) to be read. The path uses filename expansion (globbing) to read files.
-Path is expected to be of the form prefix://username:password@hostname:port/path
+**Server Type:** Whether to read from an FTP or SFTP server
+
+**Host:** Host to read from.
+
+**Port:** Optional port to read from. If no port is given, it will default to 21 for FTP and 22 for SFTP.
+
+**Path:** Path to the file or directory to read from. For example: /path/to/directory.
+
+**User:** User name to use for authentication.
+
+**Password:** Password to use for authentication.

 **Format:** Format of the data to read.
 The format must be one of 'blob', 'csv', 'delimited', 'json', 'text', 'tsv', or the
@@ -38,20 +46,28 @@ JSON - is not supported. You must manually provide the output schema.

 **Delimiter:** Delimiter to use when the format is 'delimited'. This will be ignored for other formats.

-**Use First Row as Header:** Whether to use the first line of each file as the column headers. Supported formats are 'text', 'csv', 'tsv', and 'delimited'.
+**Use First Row as Header:** Whether to use the first line of each file as the column headers. Supported formats are '
+text', 'csv', 'tsv', and 'delimited'.

 **Enable Quoted Values** Whether to treat content between quotes as a value. This value will only be used if the format
-is 'csv', 'tsv' or 'delimited'. For example, if this is set to true, a line that looks like `1, "a, b, c"` will output two fields.
-The first field will have `1` as its value and the second will have `a, b, c` as its value. The quote characters will be trimmed.
+is 'csv', 'tsv' or 'delimited'. For example, if this is set to true, a line that looks like `1, "a, b, c"` will output
+two fields.
+The first field will have `1` as its value and the second will have `a, b, c` as its value. The quote characters will be
+trimmed.
 The newline delimiter cannot be within quotes.

-It also assumes the quotes are well enclosed. The left quote will match the first following quote right before the delimiter. If there is an
+It also assumes the quotes are well enclosed. The left quote will match the first following quote right before the
+delimiter. If there is an
 unenclosed quote, an error will occur.

-**Regex Path Filter:** Regex to filter out files in the path. It accepts regular expression which is applied to the complete
+**Enable Multiline Support** Enable the support for a single field, enclosed in quotes, to span over multiple lines.
+This value will only be used if the format is 'csv', 'tsv' or 'delimited'. The default value is false.
+
+**Regex Path Filter:** Regex to filter out files in the path. It accepts regular expression which is applied to the
+complete
 path and returns the list of files that match the specified pattern.

 **Allow Empty Input:** Identify if path needs to be ignored or not, for case when directory or file does not
 exists. If set to true it will treat the not present folder as 0 input and log a warning. Default is false.

-**File System Properties:** Additional properties to use with the InputFormat when reading the data.
+**File System Properties:** Additional properties to use with the InputFormat when reading the data.
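
Note for readers migrating existing pipelines: the old single-field form `prefix://username:password@hostname:port/path` maps onto the new Server Type, Host, Port, Path, User, and Password properties roughly as sketched below. This is a minimal Python illustration under stated assumptions, not the plugin's code. The function name `split_legacy_path` and the dictionary keys are hypothetical; only the default ports (21 for FTP, 22 for SFTP) come from the documentation in this diff.

```python
from urllib.parse import urlsplit

# Default ports as stated in the new "Port" property description.
DEFAULT_PORTS = {"ftp": 21, "sftp": 22}


def split_legacy_path(url):
    """Split an old-style path into the new, separate properties.

    Hypothetical helper for illustration only; the key names below are
    assumptions and are not taken from the plugin source.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"expected an ftp:// or sftp:// prefix, got {scheme!r}")
    # Fall back to the documented default when the URL carries no port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return {
        "server_type": scheme.upper(),
        "host": parts.hostname,  # urlsplit lowercases the host name
        "port": port,
        "path": parts.path,
        "user": parts.username,
        "password": parts.password,
    }


if __name__ == "__main__":
    for key, value in split_legacy_path("sftp://alice:secret@example.com/data/in").items():
        print(f"{key}: {value}")
```

Standard URL parsing treats the first colon in the credentials as the user/password separator, so a user name that itself contains a colon is not split correctly by this sketch. Separate User and Password fields avoid that ambiguity.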
File renamed without changes
15 changes: 13 additions & 2 deletions pom.xml
@@ -21,7 +21,7 @@

 <groupId>io.cdap.plugin</groupId>
 <artifactId>ftp-plugins</artifactId>
-<version>3.3.0-SNAPSHOT</version>
+<version>4.0.0-SNAPSHOT</version>

 <licenses>
 <license>
@@ -216,8 +216,20 @@
 <artifactId>slf4j-api</artifactId>
 <groupId>org.slf4j</groupId>
 </exclusion>
+<exclusion>
+<groupId>commons-net</groupId>
+<artifactId>commons-net</artifactId>
+</exclusion>
 </exclusions>
 </dependency>
+<!--This dependency is added to fix design time validations as NoClassDefFoundError was coming for FTPClient class.
+It is already part of hadoop-common library. Make sure to upgrade its version whenever hadoop-common library
+version got bumped up. -->
+<dependency>
+<groupId>commons-net</groupId>
+<artifactId>commons-net</artifactId>
+<version>3.1</version>
+</dependency>
 <dependency>
 <groupId>junit</groupId>
 <artifactId>junit</artifactId>
@@ -396,7 +408,6 @@
 <includes>
 <include>**/*TestSuite.java</include>
 <include>**/*Test.java</include>
-<exclude>**/*TestRun.java</exclude>
 </includes>
 <excludes>
 <exclude>**/*TestBase.java</exclude>