WHAT functions and WHY functions

I want to tease out a small realization I had on two of programming's subtle arts and hard problems: naming things, and not repeating yourself.

This first post uses a simple example to focus on function names. The next post will build on this foundation.

To start, consider the following Bash script to install a thing. To keep this focused, I'm intentionally leaving a lot to the imagination. Imagine this is actually a much longer script, and that it's ours.

original.bash

fail(){
  echo "failure: $@" 1>&2 && exit 2
}

check_dependencies(){
  for dep in cp jq curl; do
    type "$dep" || fail "install $dep and retry"
  done
}

check_already_installed(){
  [[ ! -e "$target" ]] || fail "$target exists!"
}

install(){
  if ! curl "$source" > "$target"; then
    fail "can't download $source to $target"
  fi
}

check_already_installed
check_dependencies
install

The script ensures the thing isn't already installed, checks for some dependencies, and then installs the thing. It exits as soon as any check fails.

Now imagine a user opens a report like:

I found installing this on Mars a little frustrating :(

Could you make it list all of the unmet preconditions in a single run, instead of listing one at a time?

One way to satisfy this reasonable request is to print warnings as we go and only exit after all preconditions are checked. The two most obvious approaches to implementing it are probably:

- Add an argument to the fail function.
- Speciate the existing fail function into 2+ functions.
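To make the first option concrete, here is a minimal sketch. The `--warn` flag is a hypothetical name invented for this illustration, not something the original script has; with it, fail reports the problem and returns instead of exiting.

```bash
# Hypothetical variant of fail(): a leading --warn flag (an assumption made
# for this sketch) turns the abort into a warning.
fail(){
  if [ "${1-}" = "--warn" ]; then
    shift
    echo "failure: $*" 1>&2
    return 1          # report the problem, but let the caller keep going
  fi
  echo "failure: $*" 1>&2
  exit 2              # default behavior: abort the script
}

# Every precondition call site has to opt in to the new behavior, e.g.:
#   type "$dep" || fail --warn "install $dep and retry"
```

The flag works, but each call site now has to know about it, and the name fail still says nothing about why a given call is failing.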

This choice exposes a missed opportunity in how we name and DRY: we re-used the fail function for two different reasons (reporting an unmet precondition and reporting a failed install) without distinguishing the cases. We used a function name that said what without saying why. Missing the opportunity also puts us at risk of over-DRYing.

If we had recognized this opportunity, writing the script just a little differently would have made it easier to refactor later:

preconditions.diff

--- original.bash	2022-06-01 22:29:37.408232768 -0500
+++ preconditions.bash	2022-06-01 22:29:46.317448990 -0500
@@ -2,19 +2,22 @@
   echo "failure: $@" 1>&2 && exit 2
 }
 
+shopt -s expand_aliases  # non-interactive Bash doesn't expand aliases by default
+alias precond_error=fail install_error=fail
+
 check_dependencies(){
   for dep in cp jq curl; do
-    type "$dep" || fail "install $dep and retry"
+    type "$dep" || precond_error "install $dep and retry"
   done
 }
 
 check_already_installed(){
-  [[ ! -e "$target" ]] || fail "$target exists!"
+  [[ ! -e "$target" ]] || precond_error "$target exists!"
 }
 
 install(){
   if ! curl "$source" > "$target"; then
-    fail "can't download $source to $target"
+    install_error "can't download $source to $target"
   fi
 }
 
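With the two purposes named separately, the user's request can plausibly be met by changing only what precond_error means, leaving every call site alone. The following is a minimal sketch, not a definitive implementation: the `precond_failed` flag and the `check_preconditions` wrapper are hypothetical names that don't appear in the original script, and `$target` is assumed to be set elsewhere.

```bash
# As in the original script: report to stderr and abort.
fail(){
  echo "failure: $*" 1>&2
  exit 2
}

# Hypothetical flag (not in the original script): set once any
# precondition turns out to be unmet.
precond_failed=0

# Precondition errors now warn and keep going...
precond_error(){
  echo "unmet precondition: $*" 1>&2
  precond_failed=1
}

# ...while install errors still abort immediately.
install_error(){
  fail "$@"
}

check_dependencies(){
  for dep in cp jq curl; do
    type "$dep" > /dev/null 2>&1 || precond_error "install $dep and retry"
  done
}

check_already_installed(){
  [ ! -e "$target" ] || precond_error "$target exists!"
}

# Hypothetical wrapper: run every check, then abort once if any failed.
check_preconditions(){
  check_already_installed
  check_dependencies
  [ "$precond_failed" -eq 0 ] || fail "fix the problems above and retry"
}
```

Calling check_preconditions before install lists every unmet precondition in one run. The check functions are the same as in the diff apart from using the portable `[ ]` test and silencing the output of type.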

I don't want to overstate it, but my intuition says there are a few good recommendations in here for writing lower-churn code:

1. When you re-use a function, weigh whether you're re-using it for the same reason. If the reasons differ and there isn't a strong performance argument against it, use a new name for the new reason.
2. It may be a good idea to habitually name functions with a dash of why. As the fraction of function names that encode why increases, it'll take less mental overhead to spot diverging purposes per #1.
3. It's probably easier to write lower-churn code in languages where you can cheaply alias multiple names to a single implementation, and you should use this mechanism to accomplish #1.

Note: I realize these are a bit of a "draw the rest of the owl" thing. Recognizing diverging purposes and speciating their names are both important, and both are a challenge. Deciding how much why belongs in the name is probably an art. You can over-apply this idea just like you can over-DRY.

The next post will peel another layer off the onion.