Coalescing refers to the process of checking the value of each input parameter in the order in which they are listed and returns the first non-missing value. In SAS, the behavior of each of the coalesce functions depends on the processing context, either in a data step or in a
PROC SQL statement. These differences are important to acknowledge, especially when the code is migrated from
PROC SQL to a data step.
Suppose, for example, that the following piece of code needs to be modified into a data step:
The intuitive approach would be to simply rewrite the code as follows:
Although this approach seems to be correct at first glance, it may cause a potential issue: the
coalesce function in the context of a data step treats parameters as numeric values. More specifically, if
column2 are characters, the automatic SAS conversion kicks in by trying to convert these values to numerics. If such a conversion fails, a null value will be passed to the function. Simply switching to the character-compatible function,
coalescec, may still have some side-effects as we will demonstrate in the following sections.
This article is therefore intended to review the behavior of the various
coealesce functions in SAS, and to outline the hurdles when using these functions. All the examples will be driven from the following base table:
The base table will allow us to study the behaviors of the various
coalesce functions with variables of different lengths and types as summarized as follows:
Coalesce in a data step
In order to coalesce data in a data step, SAS offers two options:
coalesce(argument-1<..., argument-n>). All the arguments need to be numeric.
coalescec( argument-1<..., argument-n>). It performs a similar functionality for character arguments.
coalescec in a data-step
It is important to note that if the
coalescec function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes. As a consequence, if one of the arguments has a length above 200 characters, the result may be truncated. To avoid any data truncation, the length of the output variable needs to be set ahead of the function call.
The resulting dataset has the following structure:
coalescec function with a parameter of size greater than 200 (here,
c_var_long) will return by default a result with a length of 200. Setting explicitly the size of the output variable through the
length statement is the only way to prevent a possible truncation.
coalesce in a data-step
coalesce function accepts only numerics and returns by default a numeric of length 8, even if the underlying parameters have a shorter length. Unlike its sibling character-based function, there is no risk of truncation as the maximum length for a numeric in SAS is **8</em>. Specifying the length of the return value in advance allows however the user to have full control of the length of the result.
Note that the third variable of the output table (
coalesce_empty_with_n_long2) is not equal anymore to
n_var_long as the numeric has been truncated. Truncating non-integers is not recommended due to the lost of precision.
Coalescing in a SQL-Step
coalescec in SQL
coalescec function is still available in the context of a
PROC SQL statement. Similarly to the data step, setting explicitly the length through the column attribute may avoid an undesired truncation.
The output table is structured as follows:
coalesce in SQL
It is recommended to use the
coalesce SQL function for both numeric and character variables as SAS determines then the type of the output column based on the types of the arguments. While you need to use two functions, one for each data type, to coalesce variables in a data step, only one function is required in SQL.
coalesce function over the
coalescec function shifts the responsibility of defining the result type to the SQL processor. This behavior is in line with the SQL standard.
Note that the length of the ouput column is equal to the maximum length of the input parameters.
We have covered so far the use of the coalesce functions when all the parameters are of the same type. Let’s review the behaviours when there is a type mismatch.
Mixing Types in a data-step
In a data-step, the arguments are converted if required. More specifically, if an numeric argument is used with the
coalescec function, SAS will convert it to a character variable. Such a conversion is straightforward and is unlikely to fail. SAS will issue however a note in the log. With the
coalesce function, SAS will try to convert a character variable into a numeric variable. If such a conversion fails, the argument will be set to NULL.
Running the previous piece of code will produce the following entry in the log:
Mixing types in SQL
While SAS will attempt to convert the parameters to the appropriate types, the
coalesce function generates an error if there is a mismatch.
The previous piece of code will issue three consecutive errors of the same type:
The following table summarizes the differences between
coalescec depending on the context of the processing step. These differences are critical to keep in mind especially when code is being re-factored to use a different processing context.
|Context||DATA Step||SQL Step|
|Type of the Parameters||Numeric||Character||Character or Numeric. All the parameters need to be of the same type.||Character|
|Result Length||8 by default unless specified otherwise||200 by default unless specified otherwise||The length is adjusted based on the length of the parameters||200 by default unless specified otherwise|
|Risk if length is not specified||No loss of precision||Possible truncation||No loss of precision||Possible truncation|
|Type Conversion||Conversion from character to numeric may fail. Generates a note in the log.||Automatic conversion. Generates a note in the log.||Generates an error if there is an issue with conversion.||Automatic conversion. Generates a note in the log.|