Minimal, Streamlined, Powerful (Improvement for an older PR) #1419

Open
wants to merge 2 commits into
base: master

emrakyz
Contributor

@emrakyz emrakyz commented Jun 2, 2024

The inspiration: (Luke Smith's Video): "Unix Chad JUST WON'T STOP Piping!"

Execution Time: Almost instant with more than a million files.

  • [dmenu] Select a category (Video, Music, Image, Doc, Office).
  • [dmenu] See all of the files in the related category (without path and extensions, just names).
  • [mpv | zathura | nsxiv | libreoffice] Open the selected file directly.

REQUIREMENTS

  • Nothing but plocate because it is extremely fast and light: pacman -S plocate
  • Create a database first: sudo updatedb -o "${XDG_CONFIG_HOME}/.p.db" -U "/"
  • Run the script.

Why Is It Good for Luke's Audience?

  • Many people message me about these topics: they want to improve how they use their computers and learn the related skills, often inspired by Luke's style. In my opinion, the key is to inspire rather than to teach. Luke expresses a similar idea in one of his videos, "Why All Teaching is Ineffective", and it is what he aims for most of the time instead of making sequential educational videos.
  • This script, while extremely simple and short, contains lots of best practices; portable and POSIX shell usage; lots of different concepts on shell scripting such as functions, case statements, checks, command chaining with logical control operators, pipes, variables, proper variable handling, data handling, command substitution, printing, searching, splitting prevention and splitting when needed, regular expressions and text manipulation. More importantly, I aim for it to be thought-provoking in contrast with these technical aspects.
  • Since there is less to look at but more to learn and imitate, it can be a great tool for learning or inspiration.
  • For extra curious people or learners, I explain everything in detail below.

Explanations for Everything for Learners

Create a function named f. A function, f() { ... }, is a named block of commands that can be run later. Since we need the same commands for every category, it is better to write them once in a function than to repeat them again and again. This keeps the script minimal and streamlined while making it more portable and modular.
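
As a minimal sketch of the idea, a POSIX shell function receives its arguments as ${1}, ${2} and so on. The function name greet and its arguments are made up for illustration and are not part of the script:

```shell
#!/bin/sh
# Hypothetical example: 'greet' is not part of the script.
greet() {
    # ${1} and ${2} are the first and second arguments passed to the function.
    printf '%s, %s!\n' "${1}" "${2}"
}

# The same block of commands is reused with different arguments.
greet "Hello" "world"
greet "Goodbye" "dmenu"
```

The script's f works the same way: it is written once and called with a different extension list and opener for each category.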

r="$(locate -d "${XDG_CONFIG_HOME}/.p.db" -b -r ".*\.\(${1}\)$")"

  • Create a variable named r.
  • Create a command substitution with $() and put it inside double quotes to prevent splitting. When an unquoted expansion is used as a command argument, characters such as spaces split it into several words, so a single value looks like two different ones. Sometimes this is wanted, but most of the time it is not. (A plain assignment like this one is not split even without quotes, but quoting everywhere is a safe and consistent habit.)
  • Command substitution means: Run the command and generate an output. Since it's inside a variable; the generated output becomes our variable.
  • Use the locate command with the pre-generated database named .p.db, which is located inside the user configuration directory ${XDG_CONFIG_HOME} (usually ~/.config).
  • -d selects the database file to search; the path to the database follows it.
  • -b matches the pattern against the base name only (the last component of the path) instead of the whole path.
  • -r makes locate treat the pattern as a basic regular expression instead of a plain substring. The pattern that follows aims to match:
  1. .* means any character . in any amount *. In regex, a dot matches any single character and * repeats the preceding item 0 or more times.
  2. \. is a literal dot. The backslash is needed because an unescaped dot means any character. Together, these two parts match everything up to and including the last dot of the file name.
  3. The next part, \( \), groups the extension list. In the basic regex syntax used here, the backslashes are what give the parentheses their grouping meaning; unescaped, they would be matched as ordinary characters. Inside the group we put the extension list: ${1} is the first argument passed to the function, so in f argument the word argument becomes ${1}. The curly brackets separate the variable name from the surrounding text. The variable is already inside the double quotes of the pattern, so it needs no quotes of its own; adding an extra pair would in fact close the outer quotes and leave the variable unquoted. It is best practice to always use quotes and curly brackets with variables in the shell to prevent splitting and keep names unambiguous. In some rare cases (one of which appears in this script) we do want the output to be split.
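
The pattern can be tried without a locate database. The sketch below uses grep as a stand-in for locate -r, since grep also uses basic regular expressions by default. The sample paths and the helper name match_ext are made up, and \| as OR in basic regex is a GNU extension, so this assumes GNU grep or a compatible one:

```shell
#!/bin/sh
# Sample paths standing in for locate's output (made up for illustration).
paths='/home/user/clip.mp4
/home/user/notes.txt
/home/user/mp4
/home/user/show.final.mkv'

# Same shape as the script's pattern: anything, a literal dot, then one of
# the listed extensions at the end of the line.
match_ext() {
    printf '%s\n' "${paths}" | grep ".*\.\(${1}\)$"
}

# Prints the .mp4 and .mkv paths; /home/user/mp4 has no dot before "mp4".
match_ext 'mp4\|mkv'
```

Only clip.mp4 and show.final.mkv are printed, because the pattern requires a literal dot directly before the extension at the end of the line.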

o="$(printf "%s\n" "${r}" | sed 's|.*/||;s/\.[^.]*$//' | dmenu -i -p "${c}" -l "20")"

  • Create another variable named o.
  • Create another command substitution for it: $(), again to be quoted.
  • We use three different commands chained to each other with pipes: |. Pipes take the standard output of the prior command and send it to the next command as an input.
  • printf prints to standard output (i.e. your terminal, unless the output is piped or redirected). Its first argument is a format and the rest are the values to print: "%s\n" prints each value as a string %s followed by a new line \n. Here the only value is $r, again with curly brackets and quotes: "${r}". "${r}" holds the output of our file search above, so this prints the full path of every file that matched our search pattern.
  • The output piped from printf is edited with the stream editor sed. The sed script usually goes inside single quotes (double quotes are needed only when a shell variable must expand inside it). Inside the quotes we used a semicolon: sed ' ; '. A semicolon, a new line or a separate -e flag is used when you want to apply more than one command. On the left and the right side of the semicolon we have our actual substitutions:
  1. s means substitute. | is used as the delimiter.
  2. Match any character . in any amount *, up to and including a forward slash /. The complete pattern is .*/ and, because * is greedy, it reaches the last slash.
  3. We used | as the delimiter instead of the usual forward slash because the pattern itself contains a forward slash, which would otherwise need escaping. sed accepts other delimiters for exactly this reason.
  4. Then we replace the matched pattern .*/ with nothing by writing the closing delimiter | immediately.
  • For the second substitution in sed, we match a literal dot by escaping it with a backslash \. and then anything but a dot [^.]. Normally ^ means the start of the line, but as the first character inside brackets it means not. So [^.] means: match any character except a dot.

  • The * after it tells sed that the previous item, anything but a dot, can repeat 0 or more times: [^.]*. So it is optional, but when such characters are there, it matches all of them (we mainly want to target the extension here, such as mp3).

  • We also tell sed that the match must end at the end of the line with $. The dollar sign means the end of the line in regular expressions. So, to sum up, we matched a dot followed by any number of non-dot characters at the end of the line: \.[^.]*$. As an example, this could be .mp3 or .xlsx.

  • Then we replaced this second match with nothing, effectively deleting the extension.

  • At this point we have bare file names stripped of their paths and extensions. For instance, instead of /path/to/the/video_file.mp4 we have video_file, so we can see, navigate and filter in a better way.

  • We pipe the manipulated lines into dmenu to see and filter them interactively.

  • -i makes dmenu match case-insensitively, so the filter finds items regardless of upper or lower case.

  • -p sets the prompt shown in the menu. We use "${c}" as the prompt, which is based on our category selection and will be explained later.

  • -l is for using a vertical menu with the indicated number of lines 20.
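
The printf and sed stages described above can be tried on their own, without dmenu. This is a minimal sketch: the sample paths and the helper name strip_names are made up for illustration, while the sed script is the one from the line being explained:

```shell
#!/bin/sh
# Made-up paths standing in for the search result stored in r.
r='/path/to/the/video_file.mp4
/home/user/My Music/song.live.mp3
/tmp/archive.tar.gz'

# First substitution: delete everything up to and including the last slash.
# Second substitution: delete the last dot and whatever follows it.
strip_names() {
    printf '%s\n' "${1}" | sed 's|.*/||;s/\.[^.]*$//'
}

# Prints:
#   video_file
#   song.live
#   archive.tar
strip_names "${r}"
```

Only the last extension is removed, so a name such as archive.tar.gz is shown as archive.tar.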

[ "${o}" ] && ${2} "$(printf "%s\n" "${r}" | grep -Fm "1" "/${o}.")"

  • We check whether the variable o is empty. This is usually written with the -n flag, as in [ -n "$VAR" ] (the quotes matter: unquoted, an empty variable would make the test succeed), but writing just the quoted variable inside the brackets does the same thing.

  • After the check, we have the control operator &&. It runs the next command on the line only if the previous command succeeds. So if [ "${o}" ] fails, meaning the variable o is empty because nothing was selected, the rest of the line is skipped. If o is not empty, the shell continues with ${2}. Since we are still inside the f function, ${2} is the second argument passed to it, as in f first second. This time we still use curly brackets but no quotes, because ${2} holds a shell command and we want it to be split into separate words. For example, mpv --no-audio consists of two words; inside double quotes it would be treated as a single command name and the command would fail.

  • So let's say "${o}" is not empty and ${2} is mpv. We create another command substitution $() whose output is passed to mpv as the file to open, such as mpv /path/to/video_file.mkv.

  • We print the search output again with printf (the raw output, with paths and extensions still in place). Then we pipe it into grep, which searches its input for a pattern. "${o}" holds the selected file name, so we search the raw output "${r}" for a forward slash, the selected name and a dot: /filename. which is written as "/${o}." in the script. You may wonder why we do not escape the last dot. The -F flag makes grep treat the pattern as a fixed string instead of a regular expression, which is also faster. We already have the complete file name, so we don't need a regex match. The forward slash / is the last separator of the file path, ${o} is the file name and . is the character right before the extension. -m 1 stops after the first matching line, so only one path is passed to the opener even if several files share the exact same name.
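
The behaviour of this line can be checked with a harmless stand-in for the opener. In the sketch below, the sample paths and the helper name open_selected are made up, and "echo opening" replaces a real opener such as mpv; note that -m is supported by GNU and BSD grep but is not required by POSIX:

```shell
#!/bin/sh
# Made-up search result; 'echo opening' stands in for a real opener.
r='/media/a/intro.mkv
/media/b/intro.mkv
/media/b/outro.mkv'

open_selected() {
    # ${1}: selected name without path or extension, ${2}: opener command.
    # ${2} is deliberately unquoted so "echo opening" splits into two words.
    [ "${1}" ] && ${2} "$(printf '%s\n' "${r}" | grep -Fm "1" "/${1}.")"
}

# Two files are named intro; -m 1 keeps only the first one.
open_selected "intro" "echo opening"   # prints: opening /media/a/intro.mkv

# An empty selection fails the test, so the opener never runs.
open_selected "" "echo opening" || printf '%s\n' "nothing selected"
```

Because the match is a fixed string, a selection such as intro would also match a path like /media/c/intro.old.mkv if that path came first in the list.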

The above part was the actual logic of the script. The second part is higher level.

c="$(printf "Video\nDoc\nImage\nOffice\nMusic\n" | dmenu -i -p "Categories" -l "5")"

  • Create a variable named c, an abbreviation for category.

  • Print the categories we have (similar categories can be added as \nNewCategory).

  • Send the available categories to the dmenu with the Categories title.

  • After the selection, "${c}" will become one of these categories.

  • Then we have a case statement that is based on the variable c.

case "${c}" in "Video") command ;;

  • The above snippet means: when the selected category "${c}" is "Video", run the commands on this line up to ;;

"Video") f "mp4\|mkv\|webm\|mov\|m4v\|wmv\|flv\|avi\|gif\|m2ts" "mpv" ;;

  • If the Video category is selected, call the f function, the only function in this script, with these extensions as its first argument "${1}" and "mpv" as its second argument "${2}", effectively setting mpv as the opener and the video extensions as the search pattern. We put \| between the extensions because that is how OR is written in the basic regex syntax that locate uses: an unescaped | would be matched as a literal character, and the backslash is what turns it into the OR operator.

All other cases use the same logic, and then we close the case statement with esac.
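
A minimal sketch of how the case statement dispatches to f. Here f is replaced by a stand-in that only reports its arguments, and the category comes from a function argument instead of dmenu. The Video list is shortened from the one above; the Music and Image extension lists, their openers and the helper name dispatch are assumptions made for illustration:

```shell
#!/bin/sh
# Stand-in for the script's f: it only reports what it would search and open.
f() {
    printf 'pattern=%s opener=%s\n' "${1}" "${2}"
}

dispatch() {
    # ${1}: category name, as dmenu would return it.
    case "${1}" in
        "Video") f "mp4\|mkv\|webm" "mpv" ;;
        "Music") f "mp3\|flac\|ogg" "mpv" ;;     # assumed extensions and opener
        "Image") f "png\|jpg\|webp" "nsxiv" ;;   # assumed extensions and opener
        *) return 1 ;;                           # nothing selected or unknown
    esac
}

dispatch "Video"   # prints: pattern=mp4\|mkv\|webm opener=mpv
dispatch "Image"   # prints: pattern=png\|jpg\|webp opener=nsxiv
```

Adding a category means adding one more line to the case statement and one more name to the list printed for dmenu.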

@TheYellowArchitect
Contributor

TheYellowArchitect commented Jul 11, 2024

Used it and can confirm it works. It is a simple, short, and useful script, so I don't see any reason not to merge.

Request/Addition:

  1. Office, to include the .csv extension
  2. Videos, to exclude videos under 1 second (ffprobe?)

As for the end-user, if it is to be used as-is without any editing, it is great to search for something in the entire pc (includes external drives!) by media type, for those moments you know a file is in your computer but have no idea where.
But I cannot see it being used as-is daily without personal configuration. But with your comments, and how simple the script is, that's the point ;)

For example, on Music, it included voice clips I have extracted from videogames, so with a simple tweak (filter by directory or by filename regex), I can split to Voice and Music.
Same goes for Images. I have a billion random pictures from html files (it takes 3 seconds for this script to fill the search list with image filenames lol), so I would probably configure this script to remove from html folders (so many thumbnails in these) by filtering away those whose dimensions are less than 144x144

Anyway, great script. Thanks for sharing it! (and instructions to write bash)

A comment on how to cron updatedb on artix systems would be a luxurious addition xaxaxa

@emrakyz
Contributor Author

emrakyz commented Jul 12, 2024

A comment on how to cron updatedb on artix systems would be a luxurious addition

I prefer dcron because it's the most minimal one and I don't need advanced functionality. It should be the same for the others:

Automated way (note that this replaces the user's existing crontab):
echo '@daily updatedb -o "/home/username/.config/.p.db" -U "/"' | sudo crontab -u username -

Manual way:

crontab -e
@daily updatedb -o "/home/username/.config/.p.db" -U "/"
Save and Quit

Obviously, you also need to enable the cron service so that it starts at boot.
systemd: systemctl enable "dcron"
openrc: rc-update add "dcron" default
runit: ln -s /etc/runit/sv/dcron /run/runit/service/ && sv start dcron
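
Piping a single line into crontab - installs that line as the whole crontab. If the user already has cron jobs, a sketch like the one below keeps them. It is an assumption-laden illustration, not part of the PR: the helper name add_cron_line is hypothetical, and it edits the crontab of the user who runs it rather than using sudo crontab -u:

```shell
#!/bin/sh
# Hypothetical helper: add one line to the current user's crontab
# without discarding the entries that are already there.
add_cron_line() {
    {
        # Existing entries, minus any exact copy of the new line.
        crontab -l 2>/dev/null | grep -vFx -e "${1}"
        # The new entry goes last.
        printf '%s\n' "${1}"
    } | crontab -
}

# Usage (not run here, because it would modify the real crontab):
# add_cron_line '@daily updatedb -o "/home/username/.config/.p.db" -U "/"'
```

Running the helper twice with the same line leaves a single copy of it in the crontab.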

On the other hand, you are right that this script is extremely minimal, simple, extendable and documented. So the actual idea is for users to enhance it. As you stated, there are ways to exclude directories. Or you can even use different databases.

Adding .csv is extremely easy: add it to the extension list like the others.

If you want to exclude short videos, you can add some logic: when Videos is selected, run the command below on every file found and drop the short ones, but this would make the script slower. You need to play with this. I have never needed something like this and I think it is a very niche use case. At this point it may even be easier to use different databases instead.

ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1
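
One way the duration check could be wired in is sketched below. The helper names duration_of and keep_long_videos and the 1-second default are assumptions, and the sketch has not been tuned for speed: it runs ffprobe once per file, which is the slowdown mentioned above:

```shell
#!/bin/sh
# Hypothetical helpers; the names and the 1-second default are assumptions.
duration_of() {
    # Prints the duration of file ${1} in seconds, as reported by ffprobe.
    # stdin is redirected so the probe cannot consume the list of paths.
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "${1}" </dev/null
}

keep_long_videos() {
    # Reads paths on stdin and prints those lasting at least ${1} seconds.
    while IFS= read -r path; do
        d="$(duration_of "${path}")"
        # awk does the comparison because sh has no floating-point arithmetic;
        # adding 0 forces a numeric comparison even for empty or odd values.
        if awk -v d="${d}" -v m="${1:-1}" 'BEGIN { exit !(d + 0 >= m + 0) }'; then
            printf '%s\n' "${path}"
        fi
    done
}

# Usage (needs ffprobe and real files, so it is not run here):
# printf '%s\n' "${r}" | keep_long_videos 1
```

In the script, such a filter would sit between the locate output and the sed stage, and only for the Video category.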
