docs(python): Update Excel and database pages in user guide #14721

Merged: 4 commits merged on Mar 21, 2024
`docs/user-guide/io/database.md`: 8 changes (4 additions, 4 deletions)
@@ -41,13 +41,13 @@ $ pip install connectorx

ADBC (Arrow Database Connectivity) is an engine supported by the Apache Arrow project. ADBC aims to be both an API standard for connecting to databases and libraries implementing this standard in a range of languages.

-It is still early days for ADBC so support for different databases is still limited. At present drivers for ADBC are only available for [Postgres and SQLite](https://arrow.apache.org/adbc/0.1.0/driver/cpp/index.html). To install ADBC you need to install the driver for your database. For example to install the driver for SQLite you run
+It is still early days for ADBC, so support for different databases is limited. At present, ADBC drivers are only available for [Postgres](https://pypi.org/project/adbc-driver-postgresql/), [SQLite](https://pypi.org/project/adbc-driver-sqlite/) and [Snowflake](https://pypi.org/project/adbc-driver-snowflake/). To use ADBC, install the driver for your database. For example, to install the driver for SQLite, run:

```shell
$ pip install adbc-driver-sqlite
```

-As ADBC is not the default engine you must specify the engine as an argument to `pl.read_database_uri`
+As ADBC is not the default engine, you must select it explicitly by passing `engine="adbc"` to `pl.read_database_uri`.

{{code_block('user-guide/io/database','adbc',['read_database_uri'])}}
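Because the engine must be chosen explicitly, a script that should also run where no ADBC driver is installed may want to pick the engine at call time. The following is a minimal sketch, assuming that the default `read_database_uri` engine is ConnectorX (installed earlier in this guide) and that URI schemes map to the driver packages linked above as shown. `choose_read_engine` and `ADBC_DRIVER_MODULES` are hypothetical names, not part of Polars.

```python
import importlib.util
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the import name of the ADBC
# driver packages linked above (adbc-driver-postgresql, -sqlite, -snowflake).
ADBC_DRIVER_MODULES = {
    "postgres": "adbc_driver_postgresql",
    "postgresql": "adbc_driver_postgresql",
    "sqlite": "adbc_driver_sqlite",
    "snowflake": "adbc_driver_snowflake",
}


def choose_read_engine(uri: str) -> str:
    """Return "adbc" if a matching ADBC driver is importable, else "connectorx".

    Hypothetical helper: it only inspects the URI scheme and checks whether
    the driver module can be found; it never opens a connection.
    """
    scheme = urlparse(uri).scheme.lower()
    module = ADBC_DRIVER_MODULES.get(scheme)
    # find_spec returns None for a top-level module that is not installed.
    if module is not None and importlib.util.find_spec(module) is not None:
        return "adbc"
    # Otherwise fall back to the default engine (assumed to be ConnectorX).
    return "connectorx"
```

With Polars available, the result can go straight into the call, e.g. `pl.read_database_uri(query, uri, engine=choose_read_engine(uri))`. Using `importlib.util.find_spec` checks for the driver without importing it.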

@@ -57,7 +57,7 @@ We can write to a database with Polars using the `pl.write_database` function.

### Engines

-As with reading from a database above Polars uses an _engine_ to write to a database. The currently supported engines are:
+As with reading, Polars uses an _engine_ to write to a database. The currently supported engines are:

- [SQLAlchemy](https://www.sqlalchemy.org/) and
- Arrow Database Connectivity (ADBC)
@@ -78,6 +78,6 @@ In the SQLAlchemy approach, Polars converts the `DataFrame` to a Pandas `DataFra

#### ADBC

-As with reading from a database, you can also use ADBC to write to a SQLite or Posgres database. As shown above, you need to install the appropriate ADBC driver for your database.
+ADBC can also be used to write to a database. Writing is supported for the same databases that support reading with ADBC. As shown above, you need to install the appropriate ADBC driver for your database.

{{code_block('user-guide/io/database','write_adbc',['write_database'])}}
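Since ADBC writing only works for the backends that ADBC can read from, and only once the matching driver is installed, a pre-flight check can report an unsupported backend or a missing driver before `DataFrame.write_database` is called. This is a hypothetical sketch: `check_adbc_writable` is not a Polars function, and the scheme handling is an assumption based on the driver package names linked above.

```python
import importlib.util
from urllib.parse import urlparse

# Hypothetical: URI schemes for the backends this guide lists for ADBC.
ADBC_SCHEMES = {"postgres", "postgresql", "sqlite", "snowflake"}


def check_adbc_writable(uri: str) -> str:
    """Return the ADBC driver module for `uri`, or raise if writing cannot work.

    Hypothetical pre-flight check; it never opens a connection.
    """
    scheme = urlparse(uri).scheme.lower()
    if scheme not in ADBC_SCHEMES:
        raise ValueError(f"no ADBC driver is listed for {scheme!r} URIs")
    # Driver packages are named adbc-driver-<backend>; "postgres" maps to "postgresql".
    backend = "postgresql" if scheme == "postgres" else scheme
    module = f"adbc_driver_{backend}"
    # find_spec returns None when the driver package is not installed.
    if importlib.util.find_spec(module) is None:
        raise ImportError(f"install adbc-driver-{backend} to write with ADBC")
    return module
```

For example, call `check_adbc_writable(uri)` and then `df.write_database("records", uri, engine="adbc")`; the table name `records` is a placeholder.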
`docs/user-guide/io/excel.md`: 15 changes (11 additions, 4 deletions)
@@ -5,14 +5,21 @@ From a performance perspective, we recommend using other formats if possible, su

## Read

-Polars does not have a native Excel reader.
-Instead, it uses external libraries to parse Excel files into objects that Polars can parse.
-To read Excel files, we must install either the (default) xlsx2csv library or one of the alternatives as an additional dependency.
+Polars does not have a native Excel reader. Instead, it uses external libraries to parse Excel files into objects that Polars can load. The available engines are:
+
+- xlsx2csv: This is the current default.
+- openpyxl: Typically slower than xlsx2csv, but can provide more flexibility for files that are difficult to parse.
+- pyxlsb: For reading binary Excel files (xlsb).
+- fastexcel: This reader is based on [calamine](https://github.com/tafia/calamine) and is typically the fastest reader, but has fewer features than xlsx2csv.
+
+Although fastexcel is not currently the default, we recommend trying it first and falling back to xlsx2csv or openpyxl if you encounter issues.
+
+To use one of these engines, the appropriate Python package must be installed as an additional dependency.

=== ":fontawesome-brands-python: Python"

```shell
-$ pip install xlsx2csv openpyxl pyxlsb
+$ pip install xlsx2csv openpyxl pyxlsb fastexcel
```

The default Excel reader is xlsx2csv.
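The recommendation to try fastexcel first and fall back to xlsx2csv or openpyxl can be wrapped in a small loop. This is a hypothetical helper, not part of Polars. It takes the reader function as an argument (for example `pl.read_excel`) so it can be exercised without real Excel files. It also assumes that the fastexcel-backed engine is selected with `engine="calamine"`; check the `read_excel` reference for your Polars version.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence

# Assumed engine names for pl.read_excel: "calamine" is taken to be the
# fastexcel-backed engine; the order follows the recommendation above.
DEFAULT_ENGINE_ORDER = ("calamine", "xlsx2csv", "openpyxl")


def read_excel_with_fallback(
    read_excel: Callable[..., Any],
    source: str,
    engines: Sequence[str] = DEFAULT_ENGINE_ORDER,
    **kwargs: Any,
) -> tuple[Any, str]:
    """Try each engine in order and return (result, engine_used).

    Hypothetical helper: any exception (missing package or parse failure)
    moves on to the next engine; if all fail, the errors are reported together.
    """
    errors: dict[str, Exception] = {}
    for engine in engines:
        try:
            return read_excel(source, engine=engine, **kwargs), engine
        except Exception as exc:
            errors[engine] = exc
    raise RuntimeError(f"all engines failed for {source!r}: {errors}")
```

With Polars installed, usage might look like `df, engine = read_excel_with_fallback(pl.read_excel, "data.xlsx", sheet_name="Sheet1")`, where the file and sheet names are placeholders. Returning the engine that succeeded makes it easy to log which reader was actually used.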