Docker

The suggested way to run this project is with Docker:

docker run -it -v ./config.yml:/app/config.yml -v target_location:/app/output ghcr.io/valentin-metz/tum_video_scraper:master

You'll need to mount the configuration file config.yml into the container. You can find an example in the root of this repository under example_config.yml. The output folder you specify in the config file is a path inside the Docker container, so make sure to mount your target location to /app/output.

How to find subject identifiers?

Subject identifiers are used to specify the subjects you want to download.

TUM-Live:

For TUM-Live, you can find them in the URL of your lecture series.

Example: https://live.rbg.tum.de/?year=2024&term=W&slug=ws24PiSE

In this case, the subject identifier is 2024/W/ws24PiSE.

First the year (2024 in this case)
Then the term (W for winter term)
Then the unique course slug (ws24PiSE for Patterns in Software Engineering (IN2081))

You need to read the slug from the URL, as slugs are not consistent between courses or years.
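The mapping from URL to subject identifier described above can be sketched in a few lines of Python. This is only an illustration and not part of the scraper; the function name tum_live_subject_identifier is hypothetical, and it assumes the URL carries the year, term and slug query parameters as in the example.

```python
from urllib.parse import parse_qs, urlparse


def tum_live_subject_identifier(url: str) -> str:
    """Build 'year/term/slug' from a TUM-Live lecture series URL.

    Hypothetical helper for illustration; not part of the scraper.
    """
    query = parse_qs(urlparse(url).query)
    try:
        year = query["year"][0]
        term = query["term"][0]
        slug = query["slug"][0]
    except KeyError as missing:
        raise ValueError(f"URL has no {missing} query parameter") from None
    return f"{year}/{term}/{slug}"


if __name__ == "__main__":
    url = "https://live.rbg.tum.de/?year=2024&term=W&slug=ws24PiSE"
    print(tum_live_subject_identifier(url))  # prints 2024/W/ws24PiSE
```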

Finally, you can specify the video stream you want to download. Usually TUM-Live offers three:

The combined view (specified with :COMB after the subject identifier)
The presentation view (specified with :PRES after the subject identifier)
The presenter camera view (specified with :CAM after the subject identifier)
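As a small sketch, the stream suffix can be appended to a subject identifier like this. The helper name with_stream is hypothetical, and the sketch assumes that only the three suffixes listed above are valid; check example_config.yml for where the resulting string goes in your configuration.

```python
# Stream suffixes listed in this README; assumed to be the complete set.
STREAMS = ("COMB", "PRES", "CAM")


def with_stream(subject_identifier: str, stream: str) -> str:
    """Append a ':STREAM' suffix to a TUM-Live subject identifier.

    Hypothetical helper for illustration; not part of the scraper.
    """
    stream = stream.upper()
    if stream not in STREAMS:
        raise ValueError(f"unknown stream {stream!r}, expected one of {STREAMS}")
    return f"{subject_identifier}:{stream}"


if __name__ == "__main__":
    print(with_stream("2024/W/ws24PiSE", "PRES"))  # prints 2024/W/ws24PiSE:PRES
```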

Panopto:

For Panopto, you need to supply a folderID. You can find it in the URL of the folder you want to download.

Example: https://tum.cloud.panopto.eu/Panopto/Pages/Sessions/List.aspx#folderID=%22a150c6d5-6cbe-40b0-8dc1-ad0a00967dfb%22

In this case, a150c6d5-6cbe-40b0-8dc1-ad0a00967dfb would be the folderID.
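If you prefer not to copy the folderID by hand, it can be pulled out of the URL with Python's standard library. This is a minimal sketch, assuming the folderID sits in the URL fragment wrapped in percent-encoded quotes (%22) as in the example; the function name panopto_folder_id is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def panopto_folder_id(url: str) -> str:
    """Extract the folderID from the fragment of a Panopto folder URL.

    Hypothetical helper for illustration; not part of the scraper.
    """
    fragment = urlparse(url).fragment  # text after '#', still percent-encoded
    values = parse_qs(fragment).get("folderID")  # parse_qs decodes %22 to '"'
    if not values:
        raise ValueError("URL has no folderID in its fragment")
    return values[0].strip('"')


if __name__ == "__main__":
    url = (
        "https://tum.cloud.panopto.eu/Panopto/Pages/Sessions/List.aspx"
        "#folderID=%22a150c6d5-6cbe-40b0-8dc1-ad0a00967dfb%22"
    )
    print(panopto_folder_id(url))  # prints a150c6d5-6cbe-40b0-8dc1-ad0a00967dfb
```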

There are `.lock` files in my output folder!

The .lock files are used to prevent the same video from being downloaded twice. They are generated at the start of a run, and if the run gets interrupted, they will not be deleted. If you then want to run the scraper again, you'll need to delete the leftover .lock files manually.

You can use this feature to do partial downloads of a lecture series. Simply start the scraper, interrupt it after the .lock files have been created, and delete only the .lock files of the videos you want to download.
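Cleaning up leftover lock files can be scripted. The sketch below is not part of the scraper; it assumes the lock files end in .lock and live somewhere below your output folder, and the function names are hypothetical. By default it only lists the files, so you can review them before deleting anything.

```python
from pathlib import Path


def find_lock_files(output_dir):
    """Return all files ending in .lock below output_dir, sorted by path."""
    return sorted(p for p in Path(output_dir).rglob("*.lock") if p.is_file())


def remove_lock_files(output_dir, dry_run=True):
    """List leftover lock files and delete them unless dry_run is True.

    Hypothetical helper for illustration; not part of the scraper.
    """
    locks = find_lock_files(output_dir)
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```

Call remove_lock_files("path/to/output") first to see what would be removed, then pass dry_run=False to actually delete. For a partial download, delete individual files from the returned list instead.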

You won't need anything below this line if you are running from Docker.

Installation

If you want to run this project directly from the Python source, you'll need to install the following system dependencies:

python  >= 3.11
ffmpeg  >= 6.1
firefox >= 120.0
geckodriver >= 0.33
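A quick way to see whether these tools are available is to look them up on your PATH. The following is a rough sketch only: it checks for presence, not for the minimum versions listed above, and it assumes the executables are named ffmpeg, firefox and geckodriver on your system, which may differ by platform.

```python
import shutil
import sys

# Executable names are an assumption; they may differ on your platform.
REQUIRED_TOOLS = ["ffmpeg", "firefox", "geckodriver"]


def missing_tools(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def python_is_recent_enough(minimum=(3, 11)):
    """Check the running interpreter against a minimum (major, minor) version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.11 or newer is required")
    for name in missing_tools(REQUIRED_TOOLS):
        print(f"not found on PATH: {name}")
```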

In addition to that, you'll need the Python dependencies specified in requirements.txt. Create a virtual environment (in the project folder) and install the project dependencies into it:

python3 -m venv venv
source ./venv/bin/activate
python3 -m pip install -U pip
python3 -m pip install -U -r requirements.txt

Run the project with:

python3 src/main.py -c config.yml
