The suggested way to run this project is with Docker:
docker run -it -v ./config.yml:/app/config.yml -v target_location:/app/output ghcr.io/valentin-metz/tum_video_scraper:master
You'll need to link in the configuration file config.yml.
You can find an example in the root of this repository under example_config.yml.
The output folder you specify in the config file will be the target location inside the docker container,
so make sure to mount your target location to /app/output.
Subject identifiers are used to specify the subjects you want to download.
For TUM-Live, you can find them in the URL of your lecture series.
Example:
https://live.rbg.tum.de/?year=2024&term=W&slug=ws24PiSE
In this case, the subject identifier is 2024/W/ws24PiSE.
- First the year (2024 in this case)
- Then the term (W for winter term)
- Then the unique course slug (ws24PiSE for Patterns in Software Engineering (IN2081))
You need to look up the slug in the URL, as slugs are not consistent between courses or years.
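If you have many lecture series, the identifier can also be derived from the URL programmatically. The following is a minimal sketch, not part of the scraper itself; the function name subject_identifier is hypothetical, and it assumes the URL carries the year, term and slug query parameters as in the example above.

```python
from urllib.parse import urlparse, parse_qs


def subject_identifier(url: str) -> str:
    """Build 'year/term/slug' from a TUM-Live lecture series URL."""
    query = parse_qs(urlparse(url).query)
    try:
        # Each parameter is expected exactly once; take its first value.
        year, term, slug = (query[key][0] for key in ("year", "term", "slug"))
    except KeyError as missing:
        raise ValueError(f"URL is missing the query parameter {missing}") from None
    return f"{year}/{term}/{slug}"


print(subject_identifier("https://live.rbg.tum.de/?year=2024&term=W&slug=ws24PiSE"))
# prints: 2024/W/ws24PiSE
```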
Finally, you can specify the video stream you want to download. Usually TUM-Live offers three:
- The combined view (specified with :COMB after the subject identifier)
- The presentation view (specified with :PRES after the subject identifier)
- The presenter camera view (specified with :CAM after the subject identifier)
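As a small illustration of how the suffix attaches to the identifier, here is a sketch with a hypothetical helper with_stream. It only builds and validates the string; where the result goes in the configuration is shown in example_config.yml.

```python
# The three stream suffixes named above.
STREAM_SUFFIXES = ("COMB", "PRES", "CAM")


def with_stream(subject: str, stream: str) -> str:
    """Append a stream suffix to a subject identifier, e.g. '2024/W/ws24PiSE:PRES'."""
    stream = stream.upper()
    if stream not in STREAM_SUFFIXES:
        raise ValueError(f"unknown stream {stream!r}, expected one of {STREAM_SUFFIXES}")
    return f"{subject}:{stream}"


print(with_stream("2024/W/ws24PiSE", "PRES"))
# prints: 2024/W/ws24PiSE:PRES
```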
For Panopto, you need to supply a folderID.
You can find these in the URL of the folder you want to download.
Example: https://tum.cloud.panopto.eu/Panopto/Pages/Sessions/List.aspx#folderID=%22a150c6d5-6cbe-40b0-8dc1-ad0a00967dfb%22
In this case, a150c6d5-6cbe-40b0-8dc1-ad0a00967dfb would be the folderID.
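The folderID sits in the URL fragment, wrapped in percent-encoded quotes (%22). A minimal sketch for extracting it, assuming the URL has the same shape as the example above (the function name panopto_folder_id is hypothetical):

```python
from urllib.parse import urlparse, parse_qs


def panopto_folder_id(url: str) -> str:
    """Extract the folderID from the fragment of a Panopto folder URL."""
    fragment = urlparse(url).fragment  # the part after '#'
    values = parse_qs(fragment).get("folderID")  # parse_qs decodes %22 to '"'
    if not values:
        raise ValueError("URL fragment contains no folderID")
    return values[0].strip('"')


url = (
    "https://tum.cloud.panopto.eu/Panopto/Pages/Sessions/List.aspx"
    "#folderID=%22a150c6d5-6cbe-40b0-8dc1-ad0a00967dfb%22"
)
print(panopto_folder_id(url))
# prints: a150c6d5-6cbe-40b0-8dc1-ad0a00967dfb
```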
The .lock files are used to prevent the same video from being downloaded twice.
They are generated at the start of a run, and if the run gets interrupted, they will not be deleted.
If you want to run the scraper again, you'll need to delete the .lock files manually.
You can use this feature to do partial downloads of a lecture series.
Simply start the scraper, interrupt it after the .lock files have been created,
and delete only the .lock files of the videos you want to download.
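Deleting lock files by hand gets tedious for long lecture series. The sketch below shows one way to automate it. It assumes that the lock files live somewhere under your output folder and that their names end in .lock and contain enough of the video name to filter on; check this against your own output folder first. Both function names are hypothetical.

```python
from pathlib import Path


def find_lock_files(output_dir):
    """Return all files ending in .lock below output_dir, sorted by path."""
    return sorted(p for p in Path(output_dir).rglob("*.lock") if p.is_file())


def delete_lock_files(output_dir, keyword=""):
    """Delete lock files whose file name contains keyword; return the deleted paths.

    With the default empty keyword, every lock file is deleted.
    """
    removed = []
    for lock in find_lock_files(output_dir):
        if keyword in lock.name:
            lock.unlink()
            removed.append(lock)
    return removed
```

Calling find_lock_files on your target location first lets you review the list before anything is removed; delete_lock_files with a keyword then frees only the matching videos for the next run.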
You won't need anything below this line if you are running from Docker.
If you want to run this project directly from the python source, you'll need to install the following system dependencies:
python >= 3.11
ffmpeg >= 6.1
firefox >= 120.0
geckodriver >= 0.33
In addition to that, you'll need the python dependencies specified in requirements.txt.
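Before setting up the environment, you may want to confirm that the system dependencies listed above are present. The following sketch only checks the Python version and whether ffmpeg, firefox and geckodriver can be found on PATH; it does not verify their versions, and the function name missing_requirements is hypothetical.

```python
import shutil
import sys

# Executables expected on PATH (names as listed above).
REQUIRED_TOOLS = ("ffmpeg", "firefox", "geckodriver")


def missing_requirements():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 11):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"python >= 3.11 required, found {found}")
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```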
Create a virtual environment (in the project folder) and install the project dependencies into it:
python3 -m venv venv
source ./venv/bin/activate
python3 -m pip install -U pip
python3 -m pip install -U -r requirements.txt
Run the project with:
python3 src/main.py -c config.yml