6.4 Branching and Flow Control
==============================
The branching commands ‘b’, ‘t’, and ‘T’ enable changing the flow of
‘sed’ programs.
By default, ‘sed’ reads an input line into the pattern space, then
processes all commands in order. Commands without addresses affect
all lines. Commands with addresses affect only matching lines. See
⇒Execution Cycle and ⇒Addresses overview.
‘sed’ does not support a typical ‘if/then’ construct. Instead, some
commands can be used as conditionals or to change the default flow
control:
‘d’
delete (clear) the current pattern space, and restart the program
cycle without processing the rest of the commands and without
printing the pattern space.
‘D’
delete the contents of the pattern space _up to the first newline_,
and restart the program cycle without processing the rest of the
commands and without printing the pattern space.
‘[addr]X’
‘[addr]{ X ; X ; X }’
‘/regexp/X’
‘/regexp/{ X ; X ; X }’
Addresses and regular expressions can be used as an ‘if/then’
conditional: if [ADDR] matches the current pattern space, execute
the command(s). For example, the command ‘/^#/d’ means: _if_ the
current pattern space matches the regular expression ‘^#’ (a line
starting with a hash), _then_ execute the ‘d’ command: delete the
line without printing it, and restart the program cycle
immediately.
‘b’
branch unconditionally (that is: always jump to a label, skipping
or repeating other commands, without restarting a new cycle).
Combined with an address, the branch can be conditionally executed
on matched lines.
‘t’
branch conditionally (that is: jump to a label) _only if_ a ‘s///’
command has succeeded since the last input line was read or another
conditional branch was taken.
‘T’
similar but opposite to the ‘t’ command: branch only if there have
been _no_ successful substitutions since the last input line was
read or another conditional branch was taken.
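The difference between ‘t’ and ‘T’ can be seen side by side in the
following minimal sketch. The label name ‘end’ and the sample input are
illustrative only, and ‘T’ is assumed to be available (it is a GNU
extension):

```shell
# 't' jumps to the label when the preceding s/// succeeded;
# 'T' (GNU extension) jumps when it did not.
# The label "end" and the input words are made up for this sketch.
t_out=$(printf '%s\n' apple banana |
    sed -e 's/^a/A/' -e 't end' -e 's/$/ (unchanged)/' -e ':end')
T_out=$(printf '%s\n' apple banana |
    sed -e 's/^a/A/' -e 'T end' -e 's/$/ (changed)/' -e ':end')

printf '%s\n' "$t_out"
# Apple
# banana (unchanged)
printf '%s\n' "$T_out"
# Apple (changed)
# banana
```

In both programs the second ‘s///’ is the part being skipped; only the
condition under which it is skipped is reversed.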
The following two ‘sed’ programs are equivalent. The first
(contrived) example uses the ‘b’ command to skip the ‘s///’ command on
lines containing ‘1’. The second example uses an address with negation
(‘!’) to perform substitution only on desired lines. The ‘y///’ command
is still executed on all lines:
$ printf '%s\n' a1 a2 a3 | sed -E '/1/bx ; s/a/z/ ; :x ; y/123/456/'
a4
z5
z6
$ printf '%s\n' a1 a2 a3 | sed -E '/1/!s/a/z/ ; y/123/456/'
a4
z5
z6
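The same skipping technique can be extended to an ‘if/else’ shape with
two labels. This is only a sketch: the label names ‘else’ and ‘done’
are arbitrary, and each ‘-e’ fragment is kept separate so that the
labels end at a newline:

```shell
# if the line contains "1": s/a/A/
# else:                     s/a/z/
# in both cases:            y/123/456/
# Label names "else" and "done" are arbitrary choices for this sketch.
ifelse_out=$(printf '%s\n' a1 a2 a3 |
    sed -e '/1/!b else' \
        -e 's/a/A/' \
        -e 'b done' \
        -e ':else' \
        -e 's/a/z/' \
        -e ':done' \
        -e 'y/123/456/')

printf '%s\n' "$ifelse_out"
# A4
# z5
# z6
```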
6.4.1 Branching and Cycles
--------------------------
The ‘b’, ‘t’ and ‘T’ commands can be followed by a label (typically a
single letter). Labels are defined with a colon followed by one or more
letters (e.g. ‘:x’). If the label is omitted, the branch commands
restart the cycle. Note the difference between branching to a label and
restarting the cycle: when a cycle is restarted, ‘sed’ first prints the
current content of the pattern space, then reads the next input line
into the pattern space. Jumping to a label (even if it is at the
beginning of the program) does not print the pattern space and does not
read the next input line.
The following program is a no-op. The ‘b’ command (the only command
in the program) does not have a label, and thus simply restarts the
cycle. On each cycle, the pattern space is printed and the next input
line is read:
$ seq 3 | sed b
1
2
3
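A label-less ‘b’ becomes useful once it carries an address: the
matching lines skip the rest of the program but are still printed at
the end of the cycle, whereas ‘d’ discards them. The sketch below
contrasts the two on made-up input:

```shell
# '/^#/b' ends the cycle early for comment lines, which are still printed.
# '/^#/d' ends the cycle early as well, but prints nothing for them.
kept=$(printf '%s\n' '# note' 'data' | sed -e '/^#/b' -e 's/^/> /')
dropped=$(printf '%s\n' '# note' 'data' | sed -e '/^#/d' -e 's/^/> /')

printf '%s\n' "$kept"
# # note
# > data
printf '%s\n' "$dropped"
# > data
```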
The following example is an infinite loop: it doesn’t terminate and
doesn’t print anything. The ‘b’ command jumps to the ‘x’ label, and a
new cycle is never started:
$ seq 3 | sed ':x ; bx'
# The above command requires GNU sed (which supports additional
# commands following a label, without a newline). A portable equivalent:
# sed -e ':x' -e bx
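A loop built from a label and a branch terminates once the branch
stops being taken. One common way to get that is to loop with ‘t’
instead of ‘b’, so that the jump happens only while a substitution
keeps succeeding. The following is a sketch of that pattern; the
function name ‘group_thousands’ is hypothetical and the input is
assumed to be plain digits:

```shell
# Hypothetical helper: insert thousands separators into a run of digits.
group_thousands() {
    # :a    label marking the top of the loop
    # s///  put a comma before the rightmost group of three digits
    #       that has another digit directly in front of it
    # ta    jump back to "a" only if that substitution succeeded
    sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'
}

printf '%s\n' 1234567 12 | group_thousands
# 1,234,567
# 12
```

When no four consecutive digits remain, the substitution fails, ‘t’
does not branch, and the cycle ends normally.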
Branching is often complemented with the ‘n’ or ‘N’ commands: both
commands read the next input line into the pattern space without waiting
for the cycle to restart. Before reading the next input line, ‘n’
prints the current pattern space then empties it, while ‘N’ appends a
newline and the next input line to the pattern space.
Consider the following two examples:
$ seq 3 | sed ':x ; n ; bx'
1
2
3
$ seq 3 | sed ':x ; N ; bx'
1
2
3
• Neither example loops forever, despite never starting a new cycle.
• In the first example, the ‘n’ command first prints the content of
the pattern space, empties the pattern space, then reads the next
input line.
• In the second example, the ‘N’ command appends the next input line
to the pattern space (with a newline). Lines are accumulated in
the pattern space until there are no more input lines to read, then
the ‘N’ command terminates the ‘sed’ program. When the program
terminates, the end-of-cycle actions are performed, and the entire
pattern space is printed.
• The second example requires GNU ‘sed’, because it uses the
non-POSIX-standard behavior of ‘N’. See the “‘N’ command on the
last line” paragraph in ⇒Reporting Bugs.
• To further examine the difference between the two examples, try the
following commands:
printf '%s\n' aa bb cc dd | sed ':x ; n ; = ; bx'
printf '%s\n' aa bb cc dd | sed ':x ; N ; = ; bx'
printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx'
printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx'
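As a worked instance of the last pair of commands, the sketch below
captures both outputs. It assumes GNU ‘sed’ running without
‘POSIXLY_CORRECT’ set, as the examples above do:

```shell
# With 'n' the pattern space only ever holds one line, so it never
# contains a newline and s/\n/***/ has nothing to replace.
n_out=$(printf '%s\n' aa bb cc dd | sed ':x ; n ; s/\n/***/ ; bx')

# With 'N' the lines pile up in the pattern space, so each newly added
# newline is replaced before the next line is appended.
N_out=$(printf '%s\n' aa bb cc dd | sed ':x ; N ; s/\n/***/ ; bx')

printf '%s\n' "$n_out"
# aa
# bb
# cc
# dd
printf '%s\n' "$N_out"
# aa***bb***cc***dd
```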
6.4.2 Branching example: joining lines
--------------------------------------
As a real-world example of using branching, consider the case of
quoted-printable (https://en.wikipedia.org/wiki/Quoted-printable) files,
typically used to encode email messages. In these files long lines are
split and marked with a “soft line break” consisting of a single ‘=’
character at the end of the line:
$ cat jaques.txt
All the wor=
ld's a stag=
e,
And all the=
 men and wo=
men merely =
players:
They have t=
heir exits =
and their e=
ntrances;
And one man=
 in his tim=
e plays man=
y parts.
The following program uses an address match ‘/=$/’ as a conditional:
if the current pattern space ends with a ‘=’, it reads the next input
line using ‘N’, removes every ‘=’ character that is followed by a
newline (together with that newline), and unconditionally branches
(‘b’) to the beginning of the program without starting a new cycle.
If the pattern space does not end with ‘=’, the default action is
performed: the pattern space is printed and a new cycle is started:
$ sed ':x ; /=$/ { N ; s/=\n//g ; bx }' jaques.txt
All the world's a stage,
And all the men and women merely players:
They have their exits and their entrances;
And one man in his time plays many parts.
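The one-liner above can also be laid out as a multi-line script with
comments, which makes the ‘if/then’ structure easier to follow. This is
a sketch on a short stand-in input rather than on ‘jaques.txt’:

```shell
# Short stand-in for a quoted-printable file (sample text is made up).
sample=$(printf '%s\n' 'ab=' 'cd=' 'ef' 'gh')

joined=$(printf '%s\n' "$sample" | sed '
# Label marking the top of the loop.
:x
# "if" the pattern space ends with a soft line break ...
/=$/ {
    # ... "then" append the next input line,
    N
    # remove the "=" together with the newline that follows it,
    s/=\n//g
    # and re-test the joined text without starting a new cycle.
    bx
}
# Otherwise fall through: the pattern space is printed automatically.
')

printf '%s\n' "$joined"
# abcdef
# gh
```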
Here’s an alternative program with a slightly different approach: on
all lines except the last, ‘N’ appends the next line to the pattern
space. A substitution command then removes soft line breaks (‘=’ at the
end of a line, i.e. followed by a newline) by replacing them with an
empty string. _If_ the substitution was successful (meaning the pattern
space contained a line which should be joined), the conditional branch
command ‘t’ jumps to the beginning of the program without completing or
restarting the cycle. If the substitution failed (meaning there were no
soft line breaks), the ‘t’ command will _not_ branch. Then, ‘P’ will
print the pattern space content up to the first newline, and ‘D’ will
delete the pattern space content up to the first newline. (To learn
more about the ‘N’, ‘P’ and ‘D’ commands, see ⇒Multiline techniques.)
$ sed ':x ; $!N ; s/=\n// ; tx ; P ; D' jaques.txt
All the world's a stage,
And all the men and women merely players:
They have their exits and their entrances;
And one man in his time plays many parts.
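A quick way to gain confidence that the ‘b’ and ‘t’ approaches agree is
to run both on the same input and compare the results. The sketch below
does this on a short made-up sample; the variable names are
illustrative:

```shell
# Short stand-in input with two soft line breaks (sample text is made up).
sample=$(printf '%s\n' 'ab=' 'cd=' 'ef' 'gh')

# Address used as the condition, unconditional branch back to the label.
with_b=$(printf '%s\n' "$sample" |
    sed -e ':x' -e '/=$/{' -e 'N' -e 's/=\n//g' -e 'bx' -e '}')

# Substitution result used as the condition, then P and D emit one line.
with_t=$(printf '%s\n' "$sample" |
    sed -e ':x' -e '$!N' -e 's/=\n//' -e 'tx' -e 'P' -e 'D')

printf '%s\n' "$with_t"
# abcdef
# gh
[ "$with_b" = "$with_t" ] && echo "both programs agree"
```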
For more line-joining examples, see ⇒Joining lines.