Solved: Confused about saving $@ into a variable

Howdy!

I'm making a script that has to deal with filenames with whitespace in them, and I encountered an issue. I made a small, simple script to demonstrate my question:

The script, foo:
sh:
#!/bin/sh

for arg in "$@"; do
    echo "[1] Arg: $arg"
done
########
args="$@"
for arg in "$args"; do
    echo "[2] Arg: $arg"
done
########
args="$@"
for arg in $args; do
    echo "[3] Arg: $arg"
done
########
args=$@
for arg in "$args"; do
    echo "[4] Arg: $arg"
done
########
args=$@
for arg in $args; do
    echo "[5] Arg: $arg"
done

The output of ./foo arg1\ space arg2\ space:
Code:
[1] Arg: arg1 space
[1] Arg: arg2 space
[2] Arg: arg1 space arg2 space
[3] Arg: arg1
[3] Arg: space
[3] Arg: arg2
[3] Arg: space
[4] Arg: arg1 space arg2 space
[5] Arg: arg1
[5] Arg: space
[5] Arg: arg2
[5] Arg: space

Now, the behaviour I actually want is case [1], where I can iterate over the argument list and have each argument recognized correctly (with whitespace preserved). But I noticed that there's no way to achieve that if I also want to store $@ in a variable. I tried all four possible quoting combinations (cases [2]-[5]) and none of them worked. I tried to grab some info from the sh man page (did a search for 'quotes' stuff), but couldn't find anything related to my question there.

Could you please explain why none of [2]-[5] act like [1], and how I can have both: 1) output like in [1] and 2) $@ stored in a variable.

Thanks.
 
Very strange behavior: as far as I know, you run a program with two command-line arguments, each containing a space. Sorry, I'm bailing out; I would go for a higher-level language, not a shell script and not Perl.
 
sh:
#!/bin/sh
IFS=$'\x16'
########
args="$@"
unset IFS
# do normal shell stuff
for l in a b c "1 2 3" d; do echo $l;done
set -- "what the heck" "is this"
echo $1 "==" $2
# when you want to use the original $@
IFS=$'\x16'
#
# for arg in $args works here too after restoring IFS to \x16
set -- $args
unset IFS

for arg in "$@"; do
    echo "[2] Arg: $arg"
done
########
 
XY problem. Saving the positional parameters into a variable is already done for you, the variable is called $@. Why do you need another variable? I guess you define and use shell functions. Inside the function’s definition $@ refers to the function’s actual parameters, not the script’s parameters.

It is pretty straightforward in shells that support array variables. Yet, apart from $@ itself, sh(1) doesn't support array variables. If your function takes a fixed number of parameters, a workaround is to always pass the positional parameters after the parameters meant for the function:
Bash:
my_function 42 "${@}"
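A minimal sketch of that workaround, assuming a hypothetical my_function whose first parameter is a fixed value (the 42 from the call above) and whose remaining parameters are the script's arguments. Once shift removes the fixed parameter, "$@" inside the function holds exactly the script's arguments and can be iterated directly; number is an illustrative name for the fixed value.

```shell
#!/bin/sh
# Hypothetical function: the first parameter is fixed, the rest are the
# script's arguments (file names, for example).
my_function() {
    number=$1    # POSIX does not specify "local", so this is a global variable
    shift        # drop it: "$@" now holds only the script's arguments
    for file in "$@"; do
        printf '%s: [%s]\n' "$number" "$file"
    done
}

# Inside my_function, "$@" means the function's own parameters, so the
# script's arguments are passed along explicitly after 42.
my_function 42 "$@"
```

Run as ./foo "arg1 space" "arg2 space", the call at the bottom should print 42: [arg1 space] and 42: [arg2 space] on separate lines.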
[…] I tried to grab some info from sh man page (did a search for 'quotes' stuff), but couldn't find anything related to my question there. […]
The only relevant passage is:
$@ Expands to the positional parameters, starting from one. When the expansion occurs within double‐quotes, each positional parameter expands as a separate argument. If there are no positional parameters, the expansion of @ generates zero arguments, even when @ is double‐quoted. What this basically means, for example, is if $1 is "abc" and $2 is "def ghi", then "$@" expands to the two arguments:
"abc" "def ghi"
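To see that passage in action, a small sketch (show_args is a hypothetical helper) that sets $1 to abc and $2 to "def ghi" and prints every argument it receives in brackets, once with "$@" and once with an unquoted $@:

```shell
#!/bin/sh
# Print each argument on its own line, wrapped in brackets so that
# leading, trailing and embedded spaces stay visible.
show_args() {
    printf '[%s]\n' "$@"
}

# Mimic the man page example: $1 is "abc", $2 is "def ghi".
set -- abc "def ghi"
show_args "$@"    # two arguments: [abc] and [def ghi]
show_args $@      # unquoted, field splitting gives three: [abc] [def] [ghi]
```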
 
sh:
#!/bin/sh
IFS=$'\x16'
########
args="$@"
unset IFS
# do normal shell stuff
for l in a b c "1 2 3" d; do echo $l;done
set -- "what the heck" "is this"
echo $1 "==" $2
# when you want to use the original $@
IFS=$'\x16'
#
# for arg in $args works here too after restoring IFS to \x16
set -- $args
unset IFS

for arg in "$@"; do
    echo "[2] Arg: $arg"
done
########
Oh, yeah, that's exactly what I was looking for, thank you! I also tried to fiddle with different IFS values to make it work, but couldn't find a proper value for it.
But how does this value of \x16 work? I found that it is the SYN (synchronous idle) control character. I didn't fully understand what it actually does, though: it seems to be sent when there's no data to send, but why can it be used as a separator for arguments? Does it mean that this character is sent after every argument? Is it documented somewhere?
 
Saving the positional parameters into a variable is already done for you, the variable is called $@. Why do you need another variable? I guess you define and use shell functions. Inside the function’s definition $@ refers to the function’s actual parameters, not the script’s parameters.
If your function takes a fixed number of parameters, a workaround is to always pass the positional parameters next to the parameters meant for the function:
my_function 42 "${@}"
Yes, I know all that. And yep, I'm using shell functions and passing _script_ arguments into them as parameters. I just want to give $@ (inside a function) a sensible name by assigning it to a variable. I just can't fully understand why this step breaks iteration with a for loop.
 
I chose ^V (\x16) because there is only a small chance it is part of one of the args.
The idea is that a="$@" joins the args using the first character of IFS, so putting something rare there lets you split them properly again later.
It works pretty well with a tab too.
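A hedged, somewhat more portable variant of the same join-and-split idea. As far as I can tell, POSIX leaves the result of args="$@" in an assignment unspecified, whereas "$*" is specified to join the parameters with the first character of IFS; and printf '\026' produces the ^V (0x16) byte without relying on $'...' quoting. The names save_args, print_saved, sep and saved are hypothetical. set -f is added so that file names containing *, ? or [ are not glob-expanded when the string is split again.

```shell
#!/bin/sh
# Separator: the ^V byte (0x16, SYN). printf's octal escape \026 produces
# it without $'...' quoting, which older POSIX shells may not support.
sep=$(printf '\026')

# Save the arguments into the variable "saved". With IFS set, "$*"
# joins the arguments using the first character of IFS.
save_args() {
    IFS=$sep
    saved="$*"
    unset IFS    # back to default splitting (space, tab, newline)
}

# Split "saved" again and print each argument. set -f disables pathname
# expansion so names containing *, ? or [ are not matched against files
# (this sketch assumes globbing was enabled to begin with).
print_saved() {
    set -f
    IFS=$sep
    set -- $saved    # unquoted on purpose: split on the separator only
    unset IFS
    set +f
    for arg in "$@"; do
        printf 'Arg: [%s]\n' "$arg"
    done
}

save_args "arg1 space" "arg2  space" "*"
print_saved
```

A tab also works as the separator for most file names, but a tab is IFS white space: runs of tabs collapse into a single delimiter, so an empty argument would disappear on the way back.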
 
I just can't fully understand why this step breaks iteration with a for loop.
Because $@ doesn't preserve quotes when you assign it to the args variable: in your case, with the default IFS, you get args='arg1 space arg2 space', which later expands to 1 or 4 different words depending on whether or not args is quoted when used.
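A small sketch of that explanation. It uses args="$*", which with the default IFS produces the same space-joined string that args="$@" gave in the original script; count_args is a hypothetical helper that just reports how many arguments it received.

```shell
#!/bin/sh
# Print how many arguments the function received.
count_args() {
    echo "$#"
}

set -- "arg1 space" "arg2 space"
args="$*"              # one string: the arguments joined by spaces
printf '%s\n' "$args"  # arg1 space arg2 space
count_args "$args"     # quoted: 1 word, like cases [2] and [4]
count_args $args       # unquoted, split on spaces: 4 words, like [3] and [5]
```

The boundaries between the original arguments are simply gone once they are joined into one string, so no amount of quoting at use time can bring them back.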
 
I'm making a script that has to deal with filenames with whitespace in them, and I encountered an issue.
If you actually want to process filenames containing white space characters in the broadest sense, then you'll have more extensive problems. From POSIX, 3.413 White Space:
A sequence of one or more characters that belong to the space character class as defined via the LC_CTYPE category in the current locale or a specified locale.

In the POSIX locale, white space consists of one or more <blank> (<space> and <tab> characters), <newline>, <carriage-return>, <form-feed>, and <vertical-tab> characters.

OTOH, if you do not have to deal with <newline>, <carriage-return>, <form-feed>, and <vertical-tab> characters, it would seem that putting the file names in a separate file, one per line, and reading/processing the filenames from this file in a shell* script is the easier way to proceed.

___
* or awk(1) script
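A minimal sketch of that approach in sh, assuming a list file with one file name per line (the list file and process_list are hypothetical and created only for illustration). It copes with spaces, tabs and repeated blanks, but, as noted above, not with names containing a newline:

```shell
#!/bin/sh
# Print every file name listed in the file given as $1, one name per line.
# IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
process_list() {
    while IFS= read -r name; do
        printf '[%s]\n' "$name"
    done < "$1"
}

# Hypothetical list file, created here only for the demonstration.
list=${TMPDIR:-/tmp}/filenames.$$
printf '%s\n' "arg1 space" "two  spaces " " leading space" > "$list"
process_list "$list"    # [arg1 space] [two  spaces ] [ leading space]
rm -f "$list"
```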
 
I chose ^V (\x16) because there is only a small chance it is part of one of the args.
The idea is that a="$@" joins the args using the first character of IFS, so putting something rare there lets you split them properly again later.
It works pretty well with a tab too.
Thank you for the explanation! I like the solution :)

Thank you all for providing such valuable info on the topic. I appreciate all your help; my question is answered and solved now.
 
Because $@ doesn't preserve quotes when you assign it to the args variable: in your case, with the default IFS, you get args='arg1 space arg2 space', which later expands to 1 or 4 different words depending on whether or not args is quoted when used.
I have to correct myself: the issue is not caused by the removed quotes; I was (obviously) wrong. Even if the variable contained them (args='"arg1 space" "arg2 space"'), an unquoted $args would still expand to 4 words instead of 2, i.e. '"arg1', 'space"', '"arg2', 'space"', because quote characters that come out of an expansion are not treated as quoting again. So yeah, changing IFS back and forth is the only way I can think of to get the array back from the string with POSIX.
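A sketch of that correction: split_words is a hypothetical helper that splits its argument exactly as the unquoted for arg in $args loop would, and prints each resulting word in brackets.

```shell
#!/bin/sh
# Split a string the way the unquoted "for arg in $args" loop does and
# print each resulting word in brackets.
split_words() {
    set -- $1    # unquoted on purpose: field splitting happens here
    printf '[%s]\n' "$@"
}

# The double quotes are ordinary characters inside this value.
split_words '"arg1 space" "arg2 space"'
# Four lines: ["arg1]  [space"]  ["arg2]  [space"]
```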
 
FWIW, when doing tests like this you should also test a filename with 2 spaces in a row, including at the end of a filename.

It is easy to script in a way that a single space is preserved but multiple spaces are collapsed into one.
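A quick sketch of that pitfall, using a made-up file name with two spaces in a row and a trailing space. Unquoted, the value goes through field splitting and echo rejoins the words with single spaces; quoted, it comes through unchanged:

```shell
#!/bin/sh
# Made-up name: two spaces in a row and a trailing space.
name="two  spaces "

# Unquoted: field splitting yields the words "two" and "spaces";
# echo joins them with one space and the trailing space is gone.
unquoted=$(echo $name)

# Quoted: the value is passed through unchanged.
quoted=$(printf '%s' "$name")

printf '[%s]\n' "$unquoted"   # [two spaces]
printf '[%s]\n' "$quoted"     # [two  spaces ]
```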
 
FWIW, when doing tests like this you should also test a filename with 2 spaces in a row, including at the end of a filename.

It is easy to script in a way that a single space is preserved but multiple spaces are collapsed into one.
Thanks, I didn't think about this case. But I've tested my script now, and both multiple spaces in a row and a trailing space work fine.
 
I chose ^V (\x16) because there is only a small chance it is part of one of the args.
The idea is that a="$@" joins the args using the first character of IFS, so putting something rare there lets you split them properly again later.
It works pretty well with a tab too.
The only character (or byte) that is guaranteed not to be in a file path (the combination of directory and file name) is NUL. The only two characters that are guaranteed not to be in a file name (which excludes the directory) are slash and NUL. So the best choice for a separator is NUL. This also remains true with UTF-8 encoding, as neither slash nor NUL can occur as a continuation byte of a multibyte sequence.
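One practical consequence: an sh variable cannot hold a NUL byte, so NUL-separated lists are normally passed between programs through a pipe rather than stored in the shell. A sketch, assuming find(1) with -print0 and xargs(1) with -0, both available on FreeBSD and in GNU findutils; list_files is a hypothetical name.

```shell
#!/bin/sh
# List regular files under directory $1, NUL-separated, and hand them to a
# small inline sh script as separate arguments. Names may contain any byte
# except NUL and slash, including spaces and newlines.
list_files() {
    find "$1" -type f -print0 |
        xargs -0 sh -c 'for f in "$@"; do printf "[%s]\n" "$f"; done' sh
    # The trailing "sh" becomes $0 of the inline script, so the file
    # names start at $1.
}
```

For example, list_files "$HOME/Music" would print each file path under that (hypothetical) directory in brackets, one per argument.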

Anecdote: If you implement file systems, it's great fun to put slashes into file names, and watch what breaks. Or have two files in the same directory that have identical names (which can validly happen if you do character set translation on file names). Like experimenting with explosives.
 