Documentation + example improvements (#77)
alexr-bq authored May 7, 2021
1 parent 4d1cd1b commit 9eb82b7
Showing 11 changed files with 18 additions and 3,080 deletions.
14 changes: 3 additions & 11 deletions README.md
@@ -8,29 +8,21 @@ Why is this connector desired instead of using a more generic JDBC connector? A
* Segmentation. We can use the way that Vertica segments data to inform our operations.
* Ability to use other technology as an intermediary for data transfer. This is necessary for maximizing performance of parallel loads.


-This connector is built as a JAR file to be sourced in your spark application.
+This connector is built as a JAR file to be sourced in your spark application. This is accessible through [maven central](https://repo1.maven.org/maven2/com/vertica/spark/vertica-spark/) or you can build the JAR yourself with sbt assembly.

The connector relies on a distributed filesystem, such as HDFS, to act as a bridge between Spark and Vertica. This is done to allow for maximum performance at high scale.

![Overview](img/Overview.png?raw=true "")

-## Alpha Disclaimer
-
-This is an alpha version of this connector. Not ready for production use. The purpose of this alpha release is to gather feedback and iterate.
-
-There is no current artifact release. You can build a jar with instructions in the [Contribution Guide](CONTRIBUTING.md).
-
-Once you have a JAR, you can use it as desired in your spark applications. If you are basing such an application on one of our SBT-based examples programs, make a lib/ directory in the project root, and put the JAR there.


## Getting Started

To get started with using the connector, we'll need to make sure all the prerequisites are in place. These are:
- A Vertica installation
- An HDFS cluster, for use as an intermediary between Spark and Vertica
- A spark application, either running locally for quick testing, or running on a spark cluster.

+For an easier quick test of the connector using a dockerized environment, see [this guide for running our examples.](examples/README.md)

### Vertica

Follow the [Vertica Documentation](https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/InstallationGuide/Other/InstallationGuide.htm) for steps on installing Vertica.
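Since the README now points at Maven Central rather than a hand-built JAR, a minimal sbt sketch of sourcing the artifact follows. The group and artifact IDs mirror the maven central path above; the version string is a placeholder, since no release version appears in this diff:

```scala
// build.sbt -- hedged sketch of pulling the connector from Maven Central.
// Coordinates mirror https://repo1.maven.org/maven2/com/vertica/spark/vertica-spark/;
// replace the placeholder with a published release version.
libraryDependencies += "com.vertica.spark" % "vertica-spark" % "<release-version>"
```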
9 changes: 5 additions & 4 deletions examples/README.md
@@ -1,6 +1,7 @@
-#How to run the examples
+# How to run the examples

-Make sure you have docker, docker-compose, and sbt installed.
+Make sure you have docker and sbt installed.
+Tested using docker 20.10.0, sbt 1.4.1

Clone this repository:
```
@@ -10,7 +11,7 @@ git clone https://github.com/vertica/spark-connector.git
From the project's root directory:
```
cd docker
-docker-compose up -d
+docker compose up -d
```
This will create a docker image for a client container and docker containers for a sandbox client environment and single-node clusters for both Vertica and HDFS.

@@ -34,4 +35,4 @@ docker exec docker_hdfs_1 /opt/hadoop/sbin/start-dfs.sh

Run `docker exec -it docker_client_1 /bin/bash` to enter the sandbox client environment.

-Now just run `sbt run` from the `/spark-connector/examples/demo` directory.
\ No newline at end of file
+Now just run `sbt run` from the `/spark-connector/examples/demo` directory.
8 changes: 4 additions & 4 deletions examples/basic-read/src/main/resources/application.conf
@@ -1,9 +1,9 @@
functional-tests {
-host="localhost"
+host="vertica"
port=5433
-db="testdb"
-user="release"
-password="password"
+db="docker"
+user="dbadmin"
+password=""
log=true
filepath="hdfs://hdfs:8020/data/"
dirpath="hdfs://hdfs:8020/data/dirtest/"
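For context on how these settings are consumed: the example programs load this `functional-tests` block with Typesafe Config. Below is a minimal, hedged sketch of a read in the style of basic-read, assuming the connector's `VerticaSource` format and its `host`/`port`/`db`/`user`/`password`/`staging_fs_url`/`table` option names; the table name `readtest` is hypothetical.

```scala
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object BasicReadSketch extends App {
  // Load the functional-tests block from the application.conf shown above.
  val conf: Config = ConfigFactory.load().getConfig("functional-tests")

  val spark = SparkSession.builder()
    .appName("basic-read-sketch")
    .getOrCreate()

  // Option names are assumptions based on the connector's documentation;
  // "readtest" is a hypothetical table used only for illustration.
  val df = spark.read
    .format("com.vertica.spark.datasource.VerticaSource")
    .option("host", conf.getString("host"))
    .option("port", conf.getInt("port").toString)
    .option("db", conf.getString("db"))
    .option("user", conf.getString("user"))
    .option("password", conf.getString("password"))
    .option("staging_fs_url", conf.getString("filepath"))
    .option("table", "readtest")
    .load()

  df.show()
  spark.stop()
}
```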
2,844 changes: 0 additions & 2,844 deletions examples/basic-write/output.txt

This file was deleted.

2 changes: 1 addition & 1 deletion examples/basic-write/src/main/resources/application.conf
@@ -1,5 +1,5 @@
functional-tests {
-host="localhost"
+host="vertica"
port=5433
db="docker"
user="dbadmin"
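A corresponding hedged write sketch, again assuming the `VerticaSource` option names drawn from the connector's documentation; `writetest` is a hypothetical target table:

```scala
import com.typesafe.config.ConfigFactory
import org.apache.spark.sql.{SaveMode, SparkSession}

object BasicWriteSketch extends App {
  val conf = ConfigFactory.load().getConfig("functional-tests")
  val spark = SparkSession.builder().appName("basic-write-sketch").getOrCreate()
  import spark.implicits._

  // A trivial two-column DataFrame to write.
  val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")

  // Option names are assumptions; "writetest" is a hypothetical table.
  df.write
    .format("com.vertica.spark.datasource.VerticaSource")
    .option("host", conf.getString("host"))
    .option("port", conf.getInt("port").toString)
    .option("db", conf.getString("db"))
    .option("user", conf.getString("user"))
    .option("password", conf.getString("password"))
    .option("staging_fs_url", conf.getString("filepath"))
    .option("table", "writetest")
    .mode(SaveMode.Overwrite)
    .save()

  spark.stop()
}
```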
10 changes: 5 additions & 5 deletions examples/demo/src/main/resources/application.conf
@@ -1,11 +1,11 @@
functional-tests {
-host="eng-g9-051"
+host="vertica"
port=5433
-db="testdb"
-user="release"
+db="docker"
+user="dbadmin"
password=""
log=true
-filepath="hdfs://localhost:8020/data/"
-dirpath="hdfs://localhost:8020/data/dirtest/"
+filepath="hdfs://hdfs:8020/data/"
+dirpath="hdfs://hdfs:8020/data/dirtest/"
}

1 change: 0 additions & 1 deletion examples/demo/src/main/scala/example/Main.scala
@@ -256,7 +256,6 @@ object Main extends App {
"basicRead" -> demoCases.basicRead,
"columnPushdown" -> demoCases.columnPushdown,
"filterPushdown" -> demoCases.filterPushdown,
-"filterPushdownDate" -> demoCases.filterPushdownDate,
"writeAppendMode" -> demoCases.writeAppendMode,
"writeOverwriteMode" -> demoCases.writeOverwriteMode,
"writeErrorIfExistsMode" -> demoCases.writeErrorIfExistsMode,
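For readers skimming this map of demo cases: each entry is just a named function. Below is a hedged sketch of what a case like `filterPushdown` boils down to, a read whose `filter` the connector can push into Vertica as a WHERE clause. The table `dftest` and column `a` are hypothetical, and the option map mirrors the application.conf files above.

```scala
import org.apache.spark.sql.SparkSession

object FilterPushdownSketch {
  // Hedged sketch of a filter-pushdown demo case. "dftest" and column "a"
  // are hypothetical; opts would carry host/port/db/user/password/
  // staging_fs_url, as in the configs above.
  def filterPushdown(spark: SparkSession, opts: Map[String, String]): Unit = {
    val df = spark.read
      .format("com.vertica.spark.datasource.VerticaSource")
      .options(opts + ("table" -> "dftest"))
      .load()
      .filter("a > 5") // candidate for pushdown to Vertica as a WHERE clause

    df.show() // with pushdown, only rows satisfying a > 5 leave Vertica
  }
}
```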
24 changes: 0 additions & 24 deletions examples/looped/build.sbt

This file was deleted.

11 changes: 0 additions & 11 deletions examples/looped/src/main/resources/application.conf

This file was deleted.

65 changes: 0 additions & 65 deletions examples/looped/src/main/scala/example/Main.scala

This file was deleted.

110 changes: 0 additions & 110 deletions examples/looped/src/main/scala/example/TestUtils.scala

This file was deleted.
