A scraper script for Blackboard, built with python, selenium, and the requests library. It downloads files from your courses, and sorts them neatly into folders.
NOTE: I've graduated and don't have access to Blackboard anymore. This repo will not be updated (unless I return to school). If something's broken, please fork the repo and make fixes there. I'm more than happy to answer questions & help where I can.
-
Python 3
-
Selenium for python
pip install selenium
-
The requests library
pip install requests
-
The WebDriver for your browser - make sure its version matches your browser version!
-
macOS users with homebrew can use
brew install geckodriver
-
macOS users with homebrew can use
brew cask install chromedriver
-
If you are using Firefox, the easiest way to get started is with
python blackboard-duster.py "www.example.edu/blackboard"
where www.example.edu/blackboard
is the URL for your school's Blackboard instance. Firefox will launch and load the page. The script will wait for you to reach the homepage.
To use Google Chrome, use the -w chrome
option:
python blackboard-duster.py "www.example.edu/blackboard" -w chrome
To use a Chromium-based browser, use both -w chrome
and -b
with the path to your browser's executable (this feature is experimental - I haven't had much success):
python blackboard-duster.py "www.example.edu/blackboard" -w chrome -b "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"
No other browsers are currently supported.
When it first runs, the script waits for the Blackboard home page to appear, so you can sign in or even navagate to Blackboard if needed. After you reach the home page, it visits each course page, downloading files. Each link is highlighted with a box to indicate how the download went:
- small dotted magenta border: pending download
- solid green border: successful download
- solid blue border: a newer version was successfuly downloaded
- dashed cyan border: file was downloaded previously, and there is no newer version
- dotted red border: file collision - there is a file in the way that is not recorded in the download history. If you know this is the right file (for instance, if you downloaded it manually earlier), you can ignore this. If it bothers you, delete or move the file.
For a list of all options, use the -h
flag.
python blackboard-duster.py -h
If you don't want to approve every page, use the -a
flag.
python blackboard-duster.py "www.example.edu/blackboard" -a
By default downloads are saved in your working directory, but the -s <DIRECTORY PATH>
option lets you change that.
python blackboard-duster.py "www.example.edu/blackboard" -s "/Users/me/school"
The path is evaluated using os.path.abspath
, so it can be absolute or relative to your working directory.
A history of downloads will be created at <DOWNLOAD PATH>/BlackboardDuster.json
. Future runs will use the history to check for updates and already-downloaded files. Moving, renaming, or modifying files will not affect the download history. This helps a lot if you disagree with how your professor has things organized. If you need to change where the history is saved/loaded from, use the --historypath
option.
python blackboard-duster.py "www.example.edu/blackboard" --historypath "/Users/me/far/far/away/onion.json"
If a page doesn't have a content list, the script waits a few seconds before moving on (in case the content list is just taking its sweet time loading). This can get annoying if there are several content-free pages for each class. Some are ignored by default (such as "Blackboard Collaborate" and "My Grades"); add a page to the ignore list with the -i <NAME>
option.
python blackboard-duster.py "www.example.edu/blackboard" -i "School Email" -i "Exams"
Use the --delay <#>
option, which sets a delay multiplier. The example below will give pages twice as long as normal for pages to load.
python blackboard-duster.py "www.example.edu/blackboard" --delay 2
This actually isn't a problem, just an irritation. The entire page is accessible to Selenium even if it can't click on anything. Because the script uses URLs to navagate, it never needs to click on anything (except the cookie notice).
Your course list might be using a different css tag, and you will need to change the css selector in the code. The get_courses_info()
function looks for the course list; replace every instance of div#div_25_1
(there are 2) with your list's selector. Both Firefox and Chrome have built in page inspectors. Highlight this element:
This is similar to the course list problem. The get_navpane_info()
function handles the navpane; replace every instance of ul#courseMenuPalette_contents
(there are 2) with your navpane's selector. Highlight this element; you may need to make the page wider to see it:
That is a problem, isn't it! Open an issue or send me a message, and I'll get back to you as soon as I can.
Copyright (C) 2020 Taylor Smith
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/