A tool for checking whether a command is POSIX
git clone git://bebou.netlib.re/isposix
awk.html (144120B)
1 <!-- Copyright 2001-2024 IEEE and The Open Group, All Rights Reserved --> 2 <!DOCTYPE HTML> 3 <html lang="en"> 4 <head> 5 <meta name="generator" content="HTML Tidy for HTML5 for Linux version 5.8.0"> 6 <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 7 <link type="text/css" rel="stylesheet" href="style.css"><!-- Generated by The Open Group rhtm tool v1.2.4 --> 8 <!-- Copyright (c) 2001-2024 The Open Group, All Rights Reserved --> 9 <title>awk</title> 10 </head> 11 <body bgcolor="white"> 12 <div class="NAVHEADER"> 13 <table summary="Header navigation table" class="nav" width="100%" border="0" cellpadding="0" cellspacing="0"> 14 <tr class="nav"> 15 <td class="nav" width="15%" align="left" valign="bottom"><a href="../utilities/at.html" accesskey="P"><<< 16 Previous</a></td> 17 <td class="nav" width="70%" align="center" valign="bottom"><a href="contents.html">Home</a></td> 18 <td class="nav" width="15%" align="right" valign="bottom"><a href="../utilities/basename.html" accesskey="N">Next 19 >>></a></td> 20 </tr> 21 </table> 22 <hr align="left" width="100%"></div> 23 <script language="JavaScript" src="../jscript/codes.js"></script><basefont size="3"> 24 <center><font size="2">The Open Group Base Specifications Issue 8<br> 25 IEEE Std 1003.1-2024<br> 26 Copyright © 2001-2024 The IEEE and The Open Group</font></center> 27 <hr size="2" noshade> 28 <a name="top" id="top"></a> <a name="awk" id="awk"></a> <a name="tag_20_06" id="tag_20_06"></a><!-- awk --> 29 <h4 class="mansect"><a name="tag_20_06_01" id="tag_20_06_01"></a>NAME</h4> 30 <blockquote>awk — pattern scanning and processing language</blockquote> 31 <h4 class="mansect"><a name="tag_20_06_02" id="tag_20_06_02"></a>SYNOPSIS</h4> 32 <blockquote class="synopsis"> 33 <p><code><tt>awk</tt> <b>[</b><tt>-F</tt> <i>sepstring</i><b>] [</b><tt>-v</tt> <i>assignment</i><b>]</b><tt>...</tt> <i>program</i> 34 <b>[</b><i>argument</i><tt>...</tt><b>]</b> <tt><br> 35 <br> 36 awk</tt> <b>[</b><tt>-F</tt> 
<i>sepstring</i><b>]</b> <tt>-f</tt> <i>progfile</i> <b>[</b><tt>-f</tt> 37 <i>progfile</i><b>]</b><tt>...</tt> <b>[</b><tt>-v</tt> <i>assignment</i><b>]</b><tt>...<br> 38 </tt> <b>[</b><i>argument</i><tt>...</tt><b>]</b> <tt><br></tt></code></p> 39 </blockquote> 40 <h4 class="mansect"><a name="tag_20_06_03" id="tag_20_06_03"></a>DESCRIPTION</h4> 41 <blockquote> 42 <p>The <i>awk</i> utility shall execute programs written in the <i>awk</i> programming language, which is specialized for textual 43 data manipulation. An <i>awk</i> program is a sequence of patterns and corresponding actions. When input is read that matches a 44 pattern, the action associated with that pattern is carried out.</p> 45 <p>Input shall be interpreted as a sequence of records. By default, a record is a line, less its terminating <newline>, but 46 this can be changed by using the <b>RS</b> built-in variable. Each record of input shall be matched in turn against each pattern in 47 the program. For each pattern matched, the associated action shall be executed.</p> 48 <p>The <i>awk</i> utility shall interpret each input record as a sequence of fields where, by default, a field is a string of 49 non-<blank> non-<newline> characters. This default <blank> and <newline> field delimiter can be changed by 50 using the <b>FS</b> built-in variable or the <b>-F</b> <i>sepstring</i> option. The <i>awk</i> utility shall denote the first field 51 in a record $1, the second $2, and so on. The symbol $0 shall refer to the entire record; setting any other field causes the 52 re-evaluation of $0. 
Assigning to $0 shall reset the values of all other fields and the <b>NF</b> built-in variable.</p> 53 </blockquote> 54 <h4 class="mansect"><a name="tag_20_06_04" id="tag_20_06_04"></a>OPTIONS</h4> 55 <blockquote> 56 <p>The <i>awk</i> utility shall conform to XBD <a href="../basedefs/V1_chap12.html#tag_12_02"><i>12.2 Utility Syntax 57 Guidelines</i></a> .</p> 58 <p>The following options shall be supported:</p> 59 <dl compact> 60 <dd></dd> 61 <dt><b>-F </b><i>sepstring</i></dt> 62 <dd>Define the input field separator. This option shall be equivalent to: 63 <pre> 64 <tt>-v FS=</tt><i>sepstring 65 </i></pre> 66 <p>except that if <b>-F</b> <i>sepstring</i> and <b>-v</b> <i><tt>FS=</tt>sepstring</i> are both used, it is unspecified whether 67 the <b>FS</b> assignment resulting from <b>-F</b> <i>sepstring</i> is processed in command line order or is processed after the 68 last <b>-v</b> <i><tt>FS=</tt>sepstring</i>. See the description of the <b>FS</b> built-in variable, and how it is used, in the 69 EXTENDED DESCRIPTION section.</p> 70 </dd> 71 <dt><b>-f </b><i>progfile</i></dt> 72 <dd>Specify the pathname of the file <i>progfile</i> containing an <i>awk</i> program. A pathname of <tt>'-'</tt> shall denote the 73 standard input. If multiple instances of this option are specified, the concatenation of the files specified as <i>progfile</i> in 74 the order specified shall be the <i>awk</i> program. The <i>awk</i> program can alternatively be specified in the command line as a 75 single argument.</dd> 76 <dt><b>-v </b><i>assignment</i></dt> 77 <dd> 78 The application shall ensure that the <i>assignment</i> argument is in the same form as an <i>assignment</i> operand. The specified 79 variable assignment shall occur prior to executing the <i>awk</i> program, including the actions associated with <b>BEGIN</b> 80 patterns (if any). 
Multiple occurrences of this option can be specified.</dd> 81 </dl> 82 </blockquote> 83 <h4 class="mansect"><a name="tag_20_06_05" id="tag_20_06_05"></a>OPERANDS</h4> 84 <blockquote> 85 <p>The following operands shall be supported:</p> 86 <dl compact> 87 <dd></dd> 88 <dt><i>program</i></dt> 89 <dd>If no <b>-f</b> option is specified, the first operand to <i>awk</i> shall be the text of the <i>awk</i> program. The 90 application shall supply the <i>program</i> operand as a single argument to <i>awk</i>. If the text does not end in a 91 <newline>, <i>awk</i> shall interpret the text as if it did.</dd> 92 <dt><i>argument</i></dt> 93 <dd>Either of the following two types of <i>argument</i> can be intermixed: 94 <dl compact> 95 <dd></dd> 96 <dt><i>file</i></dt> 97 <dd>A pathname of a file that contains the input to be read, which is matched against the set of patterns in the program. If no 98 <i>file</i> operands or their equivalents, achieved by modifying the <i>awk</i> variables <b>ARGV</b> and <b>ARGC</b>, are 99 specified, or if a <i>file</i> operand is <tt>'-'</tt>, the standard input shall be used.</dd> 100 <dt><i>assignment</i></dt> 101 <dd>An operand that begins with an <underscore> or alphabetic character from the portable character set (see the table in XBD 102 <a href="../basedefs/V1_chap06.html#tag_06_01"><i>6.1 Portable Character Set</i></a> ), followed by a sequence of underscores, 103 digits, and alphabetics from the portable character set, followed by the <tt>'='</tt> character, shall specify a variable 104 assignment rather than a pathname. The characters before the <tt>'='</tt> represent the name of an <i>awk</i> variable; if that 105 name is an <i>awk</i> reserved word (see <a href="#tag_20_06_13_16">Grammar</a> ) the behavior is undefined. 
The characters 106 following the <equals-sign> shall be interpreted as if they appeared in the <i>awk</i> program preceded and followed by a 107 double-quote (<tt>'"'</tt> ) character, as a <b>STRING</b> token (see <a href="#tag_20_06_13_16">Grammar</a> ), except that if the 108 last character is an unescaped <backslash>, it shall be interpreted as a literal <backslash> rather than as the first 109 character of the sequence <tt>"\""</tt>. The variable shall be assigned the value of that <b>STRING</b> token and, if appropriate, 110 shall be considered a <i>numeric string</i> (see <a href="#tag_20_06_13_02">Expressions in awk</a> ), the variable shall also be 111 assigned its numeric value. Each such variable assignment shall occur just prior to the processing of the following <i>file</i>, if 112 any. Thus, an assignment before the first <i>file</i> argument shall be executed after the <b>BEGIN</b> actions (if any), while an 113 assignment after the last <i>file</i> argument shall occur before the <b>END</b> actions (if any). If there are no <i>file</i> 114 arguments or their equivalents, achieved by modifying the <i>awk</i> variables <b>ARGV</b> and <b>ARGC</b>, assignments shall be 115 executed before processing the standard input.</dd> 116 </dl> 117 </dd> 118 </dl> 119 </blockquote> 120 <h4 class="mansect"><a name="tag_20_06_06" id="tag_20_06_06"></a>STDIN</h4> 121 <blockquote> 122 <p>The standard input shall be used only if no <i>file</i> operands or their equivalents, achieved by modifying the <i>awk</i> 123 variables <b>ARGV</b> and <b>ARGC</b>, are specified; or if a <i>file</i> operand, or its equivalent, is <tt>'-'</tt>; or if a 124 <i>progfile</i> option-argument is <tt>'-'</tt>; see the INPUT FILES section. 
If the <i>awk</i> program contains no actions and no 125 patterns, but is otherwise a valid <i>awk</i> program, standard input and any <i>file</i> operands shall not be read and <i>awk</i> 126 shall exit with a return status of zero.</p> 127 </blockquote> 128 <h4 class="mansect"><a name="tag_20_06_07" id="tag_20_06_07"></a>INPUT FILES</h4> 129 <blockquote> 130 <p>Input files to the <i>awk</i> program from any of the following sources shall be text files:</p> 131 <ul> 132 <li> 133 <p>Any <i>file</i> operands or their equivalents, achieved by modifying the <i>awk</i> variables <b>ARGV</b> and <b>ARGC</b></p> 134 </li> 135 <li> 136 <p>Standard input in the absence of any <i>file</i> operands, or their equivalents</p> 137 </li> 138 <li> 139 <p>Arguments to the <b>getline</b> function</p> 140 </li> 141 </ul> 142 <p>Whether the variable <b>RS</b> is set to a value other than a <newline> or not, for these files, implementations shall 143 support records terminated with the specified separator up to {LINE_MAX} bytes and may support longer records.</p> 144 <p>If <b>-f</b> <i>progfile</i> is specified, the application shall ensure that the files named by each of the <i>progfile</i> 145 option-arguments are text files and their concatenation, in the same order as they appear in the arguments, is an <i>awk</i> 146 program.</p> 147 </blockquote> 148 <h4 class="mansect"><a name="tag_20_06_08" id="tag_20_06_08"></a>ENVIRONMENT VARIABLES</h4> 149 <blockquote> 150 <p>The following environment variables shall affect the execution of <i>awk</i>:</p> 151 <dl compact> 152 <dd></dd> 153 <dt><i>LANG</i></dt> 154 <dd>Provide a default value for the internationalization variables that are unset or null. 
(See XBD <a href= 155 "../basedefs/V1_chap08.html#tag_08_02"><i>8.2 Internationalization Variables</i></a> for the precedence of internationalization 156 variables used to determine the values of locale categories.)</dd> 157 <dt><i>LC_ALL</i></dt> 158 <dd>If set to a non-empty string value, override the values of all the other internationalization variables.</dd> 159 <dt><i>LC_COLLATE</i></dt> 160 <dd> 161 Determine the locale for the behavior of ranges, equivalence classes, and multi-character collating elements within regular 162 expressions and in comparisons of string values.</dd> 163 <dt><i>LC_CTYPE</i></dt> 164 <dd>Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as 165 opposed to multi-byte characters in arguments and input files), the behavior of character classes within regular expressions, the 166 identification of characters as letters, and the mapping of uppercase and lowercase characters for the <b>toupper</b> and 167 <b>tolower</b> functions.</dd> 168 <dt><i>LC_MESSAGES</i></dt> 169 <dd> 170 Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.</dd> 171 <dt><i>LC_NUMERIC</i></dt> 172 <dd> 173 Determine the radix character used when interpreting numeric input, performing conversions between numeric and string values, and 174 formatting numeric output. Regardless of locale, the <period> character (the decimal-point character of the POSIX locale) is 175 the decimal-point character recognized in processing <i>awk</i> programs (including assignments in command line arguments).</dd> 176 <dt><i>NLSPATH</i></dt> 177 <dd><sup>[<a href="javascript:open_code('XSI')">XSI</a>]</sup> <img src="../images/opt-start.gif" alt="[Option Start]" border="0"> 178 Determine the location of messages objects and message catalogs. 
<img src="../images/opt-end.gif" alt="[Option End]" border= 179 "0"></dd> 180 <dt><i>PATH</i></dt> 181 <dd>Determine the search path when looking for commands executed by <i>system</i>(<i>expr</i>), or input and output pipes; see XBD 182 <a href="../basedefs/V1_chap08.html#tag_08"><i>8. Environment Variables</i></a> .</dd> 183 </dl> 184 <p>In addition, all environment variables shall be visible via the <i>awk</i> variable <b>ENVIRON</b>.</p> 185 </blockquote> 186 <h4 class="mansect"><a name="tag_20_06_09" id="tag_20_06_09"></a>ASYNCHRONOUS EVENTS</h4> 187 <blockquote> 188 <p>Default.</p> 189 </blockquote> 190 <h4 class="mansect"><a name="tag_20_06_10" id="tag_20_06_10"></a>STDOUT</h4> 191 <blockquote> 192 <p>The nature of the output files depends on the <i>awk</i> program.</p> 193 </blockquote> 194 <h4 class="mansect"><a name="tag_20_06_11" id="tag_20_06_11"></a>STDERR</h4> 195 <blockquote> 196 <p>The standard error shall be used only for diagnostic messages.</p> 197 </blockquote> 198 <h4 class="mansect"><a name="tag_20_06_12" id="tag_20_06_12"></a>OUTPUT FILES</h4> 199 <blockquote> 200 <p>The nature of the output files depends on the <i>awk</i> program.<br></p> 201 </blockquote> 202 <h4 class="mansect"><a name="tag_20_06_13" id="tag_20_06_13"></a>EXTENDED DESCRIPTION</h4> 203 <blockquote> 204 <h5><a name="tag_20_06_13_01" id="tag_20_06_13_01"></a>Overall Program Structure</h5> 205 <p>An <i>awk</i> program is composed of pairs of the form:</p> 206 <pre> 207 <i>pattern</i><tt> { </tt><i>action</i><tt> } 208 </tt></pre> 209 <p>Either the pattern or the action (including the enclosing brace characters) can be omitted.</p> 210 <p>A missing pattern shall match any record of input, and a missing action shall be equivalent to:</p> 211 <pre> 212 <tt>{ print } 213 </tt></pre> 214 <p>Execution of the <i>awk</i> program shall start by first executing the actions associated with all <b>BEGIN</b> patterns in the 215 order they occur in the program. 
Then each <i>file</i> operand (or standard input if no files were specified) shall be processed in 216 turn by reading data from the file until a record separator is seen (<newline> by default). Before the first reference to a 217 field in the record is evaluated, the record shall be split into fields, according to the rules in <a href= 218 "#tag_20_06_13_04">Regular Expressions</a> , using the value of <b>FS</b> that was current at the time the record was read. Each 219 pattern in the program then shall be evaluated in the order of occurrence, and the action associated with each pattern that matches 220 the current record executed. The action for a matching pattern shall be executed before evaluating subsequent patterns. Finally, 221 the actions associated with all <b>END</b> patterns shall be executed in the order they occur in the program.</p> 222 <h5><a name="tag_20_06_13_02" id="tag_20_06_13_02"></a>Expressions in awk</h5> 223 <p>Expressions describe computations used in <i>patterns</i> and <i>actions</i>. In the following table, valid expression 224 operations are given in groups from highest precedence first to lowest precedence last, with equal-precedence operators grouped 225 between horizontal lines. In expression evaluation, where the grammar is formally ambiguous, higher precedence operators shall be 226 evaluated before lower precedence operators. In this table <i>expr</i>, <i>expr1</i>, <i>expr2</i>, and <i>expr3</i> represent any 227 expression, while lvalue represents any entity that can be assigned to (that is, on the left side of an assignment operator). 
The 228 precise syntax of expressions is given in <a href="#tag_20_06_13_16">Grammar</a> .</p> 229 <p class="caption"><a name="tagtcjh_14" id="tagtcjh_14"></a> Table: Expressions in Decreasing Precedence in awk</p> 230 <center> 231 <table border="1" cellpadding="3" align="center"> 232 <tr valign="top"> 233 <th align="center"> 234 <p class="tent"><b>Syntax</b></p> 235 </th> 236 <th align="center"> 237 <p class="tent"><b>Name</b></p> 238 </th> 239 <th align="center"> 240 <p class="tent"><b>Type of Result</b></p> 241 </th> 242 <th align="center"> 243 <p class="tent"><b>Associativity</b></p> 244 </th> 245 </tr> 246 247 <tr valign="top"> 248 <td align="left"> 249 <p class="tent">(<i>expr</i>)</p> 250 </td> 251 <td align="left"> 252 <p class="tent">Grouping</p> 253 </td> 254 <td align="left"> 255 <p class="tent">Type of <i>expr</i></p> 256 </td> 257 <td align="left"> 258 <p class="tent">N/A</p> 259 </td> 260 </tr> 261 262 <tr valign="top"> 263 <td align="left"> 264 <p class="tent">$<i>expr</i></p> 265 </td> 266 <td align="left"> 267 <p class="tent">Field reference</p> 268 </td> 269 <td align="left"> 270 <p class="tent">Uninitialized or String</p> 271 </td> 272 <td align="left"> 273 <p class="tent">N/A</p> 274 </td> 275 </tr> 276 277 <tr valign="top"> 278 <td align="left"> 279 <p class="tent">lvalue ++</p> 280 <p class="tent">lvalue --</p> 281 </td> 282 <td align="left"> 283 <p class="tent">Post-increment</p> 284 <p class="tent">Post-decrement</p> 285 </td> 286 <td align="left"> 287 <p class="tent">Numeric</p> 288 <p class="tent">Numeric</p> 289 </td> 290 <td align="left"> 291 <p class="tent">N/A</p> 292 <p class="tent">N/A</p> 293 </td> 294 </tr> 295 296 <tr valign="top"> 297 <td align="left"> 298 <p class="tent">++ lvalue</p> 299 <p class="tent">-- lvalue</p> 300 </td> 301 <td align="left"> 302 <p class="tent">Pre-increment</p> 303 <p class="tent">Pre-decrement</p> 304 </td> 305 <td align="left"> 306 <p class="tent">Numeric</p> 307 <p class="tent">Numeric</p> 308 </td> 
309 <td align="left"> 310 <p class="tent">N/A</p> 311 <p class="tent">N/A</p> 312 </td> 313 </tr> 314 315 316 <tr valign="top"> 317 <td align="left"> 318 <p class="tent"><i>expr</i> ^ <i>expr</i></p> 319 </td> 320 <td align="left"> 321 <p class="tent">Exponentiation</p> 322 </td> 323 <td align="left"> 324 <p class="tent">Numeric</p> 325 </td> 326 <td align="left"> 327 <p class="tent">Right</p> 328 </td> 329 </tr> 330 331 <tr valign="top"> 332 <td align="left"> 333 <p class="tent">! <i>expr</i></p> 334 <p class="tent">+ <i>expr</i></p> 335 <p class="tent">- <i>expr</i></p> 336 </td> 337 <td align="left"> 338 <p class="tent">Logical not</p> 339 <p class="tent">Unary plus</p> 340 <p class="tent">Unary minus</p> 341 </td> 342 <td align="left"> 343 <p class="tent">Numeric</p> 344 <p class="tent">Numeric</p> 345 <p class="tent">Numeric</p> 346 </td> 347 <td align="left"> 348 <p class="tent">N/A</p> 349 <p class="tent">N/A</p> 350 <p class="tent">N/A</p> 351 </td> 352 </tr> 353 354 <tr valign="top"> 355 <td align="left"> 356 <p class="tent"><i>expr</i> * <i>expr</i></p> 357 <p class="tent"><i>expr</i> / <i>expr</i></p> 358 <p class="tent"><i>expr</i> % <i>expr</i></p> 359 </td> 360 <td align="left"> 361 <p class="tent">Multiplication</p> 362 <p class="tent">Division</p> 363 <p class="tent">Modulus</p> 364 </td> 365 <td align="left"> 366 <p class="tent">Numeric</p> 367 <p class="tent">Numeric</p> 368 <p class="tent">Numeric</p> 369 </td> 370 <td align="left"> 371 <p class="tent">Left</p> 372 <p class="tent">Left</p> 373 <p class="tent">Left</p> 374 </td> 375 </tr> 376 377 <tr valign="top"> 378 <td align="left"> 379 <p class="tent"><i>expr</i> + <i>expr</i></p> 380 <p class="tent"><i>expr</i> - <i>expr</i></p> 381 </td> 382 <td align="left"> 383 <p class="tent">Addition</p> 384 <p class="tent">Subtraction</p> 385 </td> 386 <td align="left"> 387 <p class="tent">Numeric</p> 388 <p class="tent">Numeric</p> 389 </td> 390 <td align="left"> 391 <p class="tent">Left</p> 392 <p 
class="tent">Left</p> 393 </td> 394 </tr> 395 396 397 <tr valign="top"> 398 <td align="left"> 399 <p class="tent"><i>expr</i> <i>expr</i></p> 400 </td> 401 <td align="left"> 402 <p class="tent">String concatenation</p> 403 </td> 404 <td align="left"> 405 <p class="tent">String</p> 406 </td> 407 <td align="left"> 408 <p class="tent">Left</p> 409 </td> 410 </tr> 411 412 <tr valign="top"> 413 <td align="left"> 414 <p class="tent"><i>expr</i> < <i>expr</i></p> 415 <p class="tent"><i>expr</i> <= <i>expr</i></p> 416 <p class="tent"><i>expr</i> != <i>expr</i></p> 417 <p class="tent"><i>expr</i> == <i>expr</i></p> 418 <p class="tent"><i>expr</i> > <i>expr</i></p> 419 <p class="tent"><i>expr</i> >= <i>expr</i></p> 420 </td> 421 <td align="left"> 422 <p class="tent">Less than</p> 423 <p class="tent">Less than or equal to</p> 424 <p class="tent">Not equal to</p> 425 <p class="tent">Equal to</p> 426 <p class="tent">Greater than</p> 427 <p class="tent">Greater than or equal to</p> 428 </td> 429 <td align="left"> 430 <p class="tent">Numeric</p> 431 <p class="tent">Numeric</p> 432 <p class="tent">Numeric</p> 433 <p class="tent">Numeric</p> 434 <p class="tent">Numeric</p> 435 <p class="tent">Numeric</p> 436 </td> 437 <td align="left"> 438 <p class="tent">None</p> 439 <p class="tent">None</p> 440 <p class="tent">None</p> 441 <p class="tent">None</p> 442 <p class="tent">None</p> 443 <p class="tent">None</p> 444 </td> 445 </tr> 446 447 448 <tr valign="top"> 449 <td align="left"> 450 <p class="tent"><i>expr</i> ˜ <i>expr</i></p> 451 <p class="tent"><i>expr</i> !˜ <i>expr</i></p> 452 </td> 453 <td align="left"> 454 <p class="tent">ERE match</p> 455 <p class="tent">ERE non-match</p> 456 </td> 457 <td align="left"> 458 <p class="tent">Numeric</p> 459 <p class="tent">Numeric</p> 460 </td> 461 <td align="left"> 462 <p class="tent">None</p> 463 <p class="tent">None</p> 464 </td> 465 </tr> 466 467 <tr valign="top"> 468 <td align="left"> 469 <p class="tent"><i>expr</i> in array</p> 470 <p 
class="tent">(<i>index</i>) in <i>array</i></p> 471 </td> 472 <td align="left"> 473 <p class="tent">Array membership</p> 474 <p class="tent">Multi-dimension array membership</p> 475 </td> 476 <td align="left"> 477 <p class="tent">Numeric</p> 478 <p class="tent">Numeric</p> 479 </td> 480 <td align="left"> 481 <p class="tent">Left</p> 482 <p class="tent">Left</p> 483 </td> 484 </tr> 485 486 <tr valign="top"> 487 <td align="left"> 488 <p class="tent"><i>expr</i> && <i>expr</i></p> 489 </td> 490 <td align="left"> 491 <p class="tent">Logical AND</p> 492 </td> 493 <td align="left"> 494 <p class="tent">Numeric</p> 495 </td> 496 <td align="left"> 497 <p class="tent">Left</p> 498 </td> 499 </tr> 500 501 <tr valign="top"> 502 <td align="left"> 503 <p class="tent"><i>expr</i> || <i>expr</i></p> 504 </td> 505 <td align="left"> 506 <p class="tent">Logical OR</p> 507 </td> 508 <td align="left"> 509 <p class="tent">Numeric</p> 510 </td> 511 <td align="left"> 512 <p class="tent">Left</p> 513 </td> 514 </tr> 515 516 <tr valign="top"> 517 <td align="left"> 518 <p class="tent"><i>expr1</i> ? 
<i>expr2</i> : <i>expr3</i></p> 519 </td> 520 <td align="left"> 521 <p class="tent">Conditional expression</p> 522 </td> 523 <td align="left"> 524 <p class="tent">Type of selected<br><i>expr2</i> or <i>expr3</i></p> 525 </td> 526 <td align="left"> 527 <p class="tent">Right</p> 528 </td> 529 </tr> 530 531 <tr valign="top"> 532 <td align="left"> 533 <p class="tent">lvalue ^= <i>expr</i></p> 534 <p class="tent">lvalue %= <i>expr</i></p> 535 <p class="tent">lvalue *= <i>expr</i></p> 536 <p class="tent">lvalue /= <i>expr</i></p> 537 <p class="tent">lvalue += <i>expr</i></p> 538 <p class="tent">lvalue -= <i>expr</i></p> 539 <p class="tent">lvalue = <i>expr</i></p> 540 </td> 541 <td align="left"> 542 <p class="tent">Exponentiation assignment</p> 543 <p class="tent">Modulus assignment</p> 544 <p class="tent">Multiplication assignment</p> 545 <p class="tent">Division assignment</p> 546 <p class="tent">Addition assignment</p> 547 <p class="tent">Subtraction assignment</p> 548 <p class="tent">Assignment</p> 549 </td> 550 <td align="left"> 551 <p class="tent">Numeric</p> 552 <p class="tent">Numeric</p> 553 <p class="tent">Numeric</p> 554 <p class="tent">Numeric</p> 555 <p class="tent">Numeric</p> 556 <p class="tent">Numeric</p> 557 <p class="tent">Type of <i>expr</i></p> 558 </td> 559 <td align="left"> 560 <p class="tent">Right</p> 561 <p class="tent">Right</p> 562 <p class="tent">Right</p> 563 <p class="tent">Right</p> 564 <p class="tent">Right</p> 565 <p class="tent">Right</p> 566 <p class="tent">Right</p> 567 </td> 568 </tr> 569 570 </table> 571 </center> 572 <p class="tent">Each expression shall have either a string value, a numeric value, or both. Except as stated for specific contexts, 573 the value of an expression shall be implicitly converted to the type needed for the context in which it is used. 
A string value 574 shall be converted to a numeric value either by the equivalent of the following calls to functions defined by the ISO C 575 standard:</p> 576 <pre> 577 <tt>setlocale(LC_NUMERIC, ""); 578 </tt><i>numeric_value</i><tt> = atof(</tt><i>string_value</i><tt>); 579 </tt></pre> 580 <p class="tent">or by converting the initial portion of the string to type <b>double</b> representation as follows:</p> 581 <blockquote>The input string is decomposed into two parts: an initial, possibly empty, sequence of white-space characters (as 582 specified by <a href="../functions/isspace.html"><i>isspace</i>()</a>) and a subject sequence interpreted as a floating-point 583 constant. 584 <p class="tent">The expected form of the subject sequence is an optional <tt>'+'</tt> or <tt>'-'</tt> sign, then a non-empty 585 sequence of digits optionally containing a radix character, then an optional exponent part. An exponent part consists of 586 <tt>'e'</tt> or <tt>'E'</tt>, followed by an optional sign, followed by one or more decimal digits.</p> 587 <p class="tent">The sequence starting with the first digit or the radix character (whichever occurs first) is interpreted as a 588 floating constant of the C language, except that the radix character shall be used in place of a <period>, and if neither an 589 exponent part nor a radix character appears, a radix character is assumed to follow the last digit in the string. 
If the subject 590 sequence begins with a <hyphen-minus>, the value resulting from the conversion is negated.</p> 591 </blockquote> 592 <p class="tent">A numeric value that is exactly equal to the value of an integer (see <a href= 593 "../utilities/V3_chap01.html#tag_18_01_02"><i>1.1.2 Concepts Derived from the ISO C Standard</i></a> ) shall be converted to a 594 string by the equivalent of a call to the <b>sprintf</b> function (see <a href="#tag_20_06_13_13">String Functions</a> ) with the 595 string <tt>"%d"</tt> as the <i>fmt</i> argument and the numeric value being converted as the first and only <i>expr</i> argument. 596 Any other numeric value shall be converted to a string by the equivalent of a call to the <b>sprintf</b> function with the value of 597 the variable <b>CONVFMT</b> as the <i>fmt</i> argument and the numeric value being converted as the first and only <i>expr</i> 598 argument. The result of the conversion is unspecified if the value of <b>CONVFMT</b> is not a floating-point format specification. 599 This volume of POSIX.1-2024 specifies no explicit conversions between numbers and strings. 
An application can force an expression 600 to be treated as a number by adding zero to it, or can force it to be treated as a string by concatenating the null string 601 (<tt>""</tt>) to it.</p> 602 <p class="tent">A string value shall be considered a <i>numeric string</i> if it comes from one of the following:</p> 603 <ol> 604 <li class="tent">Field variables</li> 605 <li class="tent">Input from the <i>getline</i>() function</li> 606 <li class="tent"><b>FILENAME</b></li> 607 <li class="tent"><b>ARGV</b> array elements</li> 608 <li class="tent"><b>ENVIRON</b> array elements</li> 609 <li class="tent">Array elements created by the <i>split</i>() function</li> 610 <li class="tent">A command line variable assignment</li> 611 <li class="tent">Variable assignment from another numeric string variable</li> 612 </ol> 613 <p class="tent">and an implementation-dependent condition corresponding to either case (a) or (b) below is met.</p> 614 <ol type="a"> 615 <li class="tent">After the equivalent of the following calls to functions defined by the ISO C standard, 616 <i>string_value_end</i> would differ from <i>string_value</i>, and any characters before the terminating null character in 617 <i>string_value_end</i> would be <blank> characters: 618 <pre> 619 <tt>char *string_value_end; 620 setlocale(LC_NUMERIC, ""); 621 numeric_value = strtod (string_value, &string_value_end); 622 </tt></pre></li> 623 <li class="tent">After all the following conversions have been applied, the resulting string would lexically be recognized as a 624 <b>NUMBER</b> token as described by the lexical conventions in <a href="#tag_20_06_13_16">Grammar</a> : 625 <ul> 626 <li class="tent">All leading and trailing <blank> characters are discarded.</li> 627 <li class="tent">If the first non-<blank> is <tt>'+'</tt> or <tt>'-'</tt>, it is discarded.</li> 628 <li class="tent">Each occurrence of the radix character from the current locale is changed to a <period>.</li> 629 </ul> 630 </li> 631 </ol> 632 In case 
(a) the numeric value of the <i>numeric string</i> shall be the value that would be returned by the <a href= 633 "../functions/strtod.html"><i>strtod</i>()</a> call. In case (b) if the first non-<blank> is <tt>'-'</tt>, the numeric value 634 of the <i>numeric string</i> shall be the negation of the numeric value of the recognized <b>NUMBER</b> token; otherwise, the 635 numeric value of the <i>numeric string</i> shall be the numeric value of the recognized <b>NUMBER</b> token. Whether or not a 636 string is a <i>numeric string</i> shall be relevant only in contexts where that term is used in this section. 637 <p class="tent">When an expression is used in a Boolean context, if it has a numeric value, a value of zero shall be treated as 638 false and any other value shall be treated as true. Otherwise, a string value of the null string shall be treated as false and any 639 other value shall be treated as true. A Boolean context shall be one of the following:</p> 640 <ul> 641 <li class="tent">The first subexpression of a conditional expression</li> 642 <li class="tent">An expression operated on by logical NOT, logical AND, or logical OR</li> 643 <li class="tent">The second expression of a <b>for</b> statement</li> 644 <li class="tent">The expression of an <b>if</b> statement</li> 645 <li class="tent">The expression of the <b>while</b> clause in either a <b>while</b> or <b>do</b>...<b>while</b> statement</li> 646 <li class="tent">An expression used as a pattern (as in Overall Program Structure)</li> 647 </ul> 648 <p class="tent">All arithmetic shall follow the semantics of floating-point arithmetic as specified by the ISO C standard (see 649 <a href="../utilities/V3_chap01.html#tag_18_01_02"><i>1.1.2 Concepts Derived from the ISO C Standard</i></a> ).</p> 650 <p class="tent">The value of the expression:</p> 651 <pre> 652 <i>expr1</i><tt> ^ </tt><i>expr2</i><tt> 653 </tt></pre> 654 <p class="tent">shall be equivalent to the value returned by the ISO C standard function 
call:</p> 655 <pre> 656 <tt>pow(</tt><i>expr1</i><tt>, </tt><i>expr2</i><tt>) 657 </tt></pre> 658 <p class="tent">The expression:</p> 659 <pre> 660 <tt>lvalue ^= </tt><i>expr</i><tt> 661 </tt></pre> 662 <p class="tent">shall be equivalent to the ISO C standard expression:</p> 663 <pre> 664 <tt>lvalue = pow(lvalue, </tt><i>expr</i><tt>) 665 </tt></pre> 666 <p class="tent">except that lvalue shall be evaluated only once. The value of the expression:</p> 667 <pre> 668 <i>expr1</i><tt> % </tt><i>expr2</i><tt> 669 </tt></pre> 670 <p class="tent">shall be equivalent to the value returned by the ISO C standard function call:</p> 671 <pre> 672 <tt>fmod(</tt><i>expr1</i><tt>, </tt><i>expr2</i><tt>) 673 </tt></pre> 674 <p class="tent">The expression:</p> 675 <pre> 676 <tt>lvalue %= </tt><i>expr</i><tt> 677 </tt></pre> 678 <p class="tent">shall be equivalent to the ISO C standard expression:</p> 679 <pre> 680 <tt>lvalue = fmod(lvalue, </tt><i>expr</i><tt>) 681 </tt></pre> 682 <p class="tent">except that lvalue shall be evaluated only once.</p> 683 <p class="tent">Variables and fields shall be set by the assignment statement:</p> 684 <pre> 685 <tt>lvalue = </tt><i>expression</i><tt> 686 </tt></pre> 687 <p class="tent">and the type of <i>expression</i> shall determine the resulting variable type. The assignment includes the 688 arithmetic assignments (<tt>"+="</tt>, <tt>"-="</tt>, <tt>"*="</tt>, <tt>"/="</tt>, <tt>"%="</tt>, <tt>"^="</tt>, <tt>"++"</tt>, 689 <tt>"--"</tt>) all of which shall produce a numeric result. The left-hand side of an assignment and the target of increment and 690 decrement operators can be one of a variable, an array with index, or a field selector.</p> 691 <p class="tent">The <i>awk</i> language supplies arrays that are used for storing numbers or strings. Arrays need not be declared. 692 They shall initially be empty, and their sizes shall change dynamically. 
The subscripts, or element identifiers, are strings, 693 providing a type of associative array capability. An array name followed by a subscript within square brackets can be used as an 694 lvalue and thus as an expression, as described in the grammar; see <a href="#tag_20_06_13_16">Grammar</a> . Unsubscripted array 695 names can be used in only the following contexts:</p> 696 <ul> 697 <li class="tent">A parameter in a function definition or function call</li> 698 <li class="tent">The <b>NAME</b> token following any use of the keyword <b>in</b> as specified in the grammar (see <a href= 699 "#tag_20_06_13_16">Grammar</a> ); if the name used in this context is not an array name, the behavior is undefined</li> 700 <li class="tent">The <b>NAME</b> token following the keyword <b>delete</b> without a subscript as specified in the grammar (see 701 <a href="#tag_20_06_13_16">Grammar</a> ); if the name used in this context is not an array name, the behavior is undefined.</li> 702 </ul> 703 <p class="tent">A valid array <i>index</i> shall consist of one or more <comma>-separated expressions, similar to the way in 704 which multi-dimensional arrays are indexed in some programming languages. Because <i>awk</i> arrays are really one-dimensional, 705 such a <comma>-separated list shall be converted to a single string by concatenating the string values of the separate 706 expressions, each separated from the other by the value of the <b>SUBSEP</b> variable. Thus, the following two index operations 707 shall be equivalent:</p> 708 <pre> 709 <i>var</i><b>[</b><i>expr1</i><tt>, </tt><i>expr2</i><tt>, ... </tt><i>exprn</i><b>] 710 <br class="tent"> 711 </b><i>var</i><b>[</b><i>expr1</i><tt> SUBSEP </tt><i>expr2</i><tt> SUBSEP ... SUBSEP </tt><i>exprn</i><b>]</b><tt> 712 </tt></pre> 713 <p class="tent">The application shall ensure that a multi-dimensioned <i>index</i> used with the <b>in</b> operator is 714 parenthesized. 
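</p>
<p class="tent">As an informal, non-normative illustration, the following sketch exercises the index equivalence and the parenthesized form of <b>in</b> described above. The array name <tt>grid</tt>, the variables <tt>k</tt> and <tt>n</tt>, and the printed messages are hypothetical, chosen only for this example.</p>

```sh
awk 'BEGIN {
    grid[1, 2] = "x"                            # stored under the single key: 1 SUBSEP 2
    if ((1 SUBSEP 2) in grid) print "same key"  # the explicit SUBSEP form names the same element
    if ((1, 2) in grid) print "found"           # a multi-part index used with in is parenthesized
    if (!((3, 4) in grid)) print "absent"       # testing with in does not create the element
    n = 0
    for (k in grid) n++
    print n                                     # 1: only grid[1, 2] exists
}'
```

<p class="tent">Under these assumptions the sketch prints <tt>same key</tt>, <tt>found</tt>, <tt>absent</tt>, and <tt>1</tt>, one per line.</p>
<p class="tent">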
The <b>in</b> operator, which tests for the existence of a particular array element, shall not cause that element to 715 exist. Any other reference to a nonexistent array element shall automatically create it.</p> 716 <p class="tent">Comparisons (with the <tt>'<'</tt>, <tt>"<="</tt>, <tt>"!="</tt>, <tt>"=="</tt>, <tt>'>'</tt>, and 717 <tt>">="</tt> operators) shall be made numerically:</p> 718 <ul> 719 <li class="tent">if both operands are numeric,</li> 720 <li class="tent">if one is numeric and the other has a string value that is a numeric string,</li> 721 <li class="tent">if both have string values that are numeric strings, or</li> 722 <li class="tent">if one is numeric and the other has the uninitialized value.</li> 723 </ul> 724 <p class="tent">Otherwise, operands shall be converted to strings as required and a string comparison shall be made as follows:</p> 725 <ul> 726 <li class="tent">For the <tt>"!="</tt> and <tt>"=="</tt> operators, the strings shall be compared to check if they are identical 727 (not to check if they collate equally).</li> 728 <li class="tent">For the other operators, the strings shall be compared using the locale-specific collation sequence.</li> 729 </ul> 730 <p class="tent">The value of the comparison expression shall be 1 if the relation is true, or 0 if the relation is false.</p> 731 <h5><a name="tag_20_06_13_03" id="tag_20_06_13_03"></a>Variables and Special Variables</h5> 732 <p class="tent">Variables can be used in an <i>awk</i> program by referencing them. With the exception of function parameters (see 733 <a href="#tag_20_06_13_15">User-Defined Functions</a> ), they are not explicitly declared. Function parameter names shall be local 734 to the function; all other variable names shall be global. The same name shall not be used as both a function parameter name and as 735 the name of a function or a special <i>awk</i> variable. 
The same name shall not be used both as a variable name with global scope 736 and as the name of a function. The same name shall not be used within the same scope both as a scalar variable and as an array. 737 Uninitialized variables, including scalar variables, array elements, and field variables, shall have an uninitialized value. An 738 uninitialized value shall have both a numeric value of zero and a string value of the empty string. Evaluation of variables with an 739 uninitialized value, to either string or numeric, shall be determined by the context in which they are used.</p> 740 <p class="tent">Field variables shall be designated by a <tt>'$'</tt> followed by a number or numerical expression. The effect of 741 the field number <i>expression</i> evaluating to anything other than a non-negative integer is unspecified; uninitialized variables 742 or string values need not be converted to numeric values in this context. New field variables can be created by assigning a value 743 to them. References to nonexistent fields (that is, fields after $<b>NF</b>), shall evaluate to the uninitialized value. Such 744 references shall not create new fields. However, assigning to a nonexistent field (for example, $(<b>NF</b>+2)=5) shall increase 745 the value of <b>NF</b>; create any intervening fields with the uninitialized value; and cause the value of $0 to be recomputed, 746 with the fields being separated by the value of <b>OFS</b>. Each field variable shall have a string value or an uninitialized value 747 when created. Field variables shall have the uninitialized value when created from $0 using <b>FS</b> and the variable does not 748 contain any characters. 
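</p>
<p class="tent">As an informal, non-normative illustration, the following sketch shows the difference between referencing and assigning a nonexistent field. The input record <tt>a b c</tt> and the choice of <tt>-</tt> as <b>OFS</b> are assumptions made only so that the recomputed record is easy to read.</p>

```sh
printf 'a b c\n' | awk -v OFS=- '{
    print ($(NF+1) == "")   # 1: a field past $NF reads as the uninitialized value
    print NF                # 3: the reference above did not create a field
    $(NF+2) = 5             # assignment creates an empty $4 and sets $5
    print NF                # 5
    print                   # a-b-c--5: $0 is recomputed with OFS between fields
}'
```

<p class="tent">The doubled <tt>-</tt> in the last line of output marks the intervening field that was created with the uninitialized value.</p>
<p class="tent">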
If appropriate, the field variable shall be considered a numeric string (see <a href= 749 "#tag_20_06_13_02">Expressions in awk</a> ).</p> 750 <p class="tent">Implementations shall support the following other special variables that are set by <i>awk</i>:</p> 751 <dl compact> 752 <dd></dd> 753 <dt><b>ARGC</b></dt> 754 <dd>A number determining when the iteration described for <b>ARGV</b> stops. When an <i>awk</i> program starts, <b>ARGC</b> shall 755 be initialized to the number of elements in the <b>ARGV</b> array. <b>ARGC</b> can be updated by the <i>awk</i> program and by 756 assignment operands. If <b>ARGC</b> is set to a value less than 1, the behavior is unspecified. It is unspecified whether 757 alterations to <b>ARGC</b> can be made using the <b>-v</b> option.</dd> 758 <dt><b>ARGV</b></dt> 759 <dd>An array containing, initially, the command name (see <a href="../utilities/V3_chap02.html#tag_19_09_01"><i>2.9.1 Simple 760 Commands</i></a> ) used to invoke <i>awk</i> in <tt>ARGV[0]</tt> and the command line arguments, if any, excluding options and the 761 <i>program</i> operand, in <tt>ARGV[1]</tt> through <tt>ARGV[ARGC-1]</tt>. The elements in <b>ARGV</b> can be assigned new values 762 or deleted, and new elements can be added. Note that alterations to <b>ARGV</b> cannot be made using either the <i>assignment</i> 763 operand or the <b>-v</b> option, because an operand with a <tt>'['</tt> before <tt>'='</tt> is treated as a <i>file</i> operand, 764 not an <i>assignment</i> operand, and applications are required to ensure that the <b>-v</b> option-argument has the same form as 765 an <i>assignment</i> operand. (See the OPTIONS and OPERANDS sections.) 766 <p class="tent">After processing the <b>BEGIN</b> actions, if any, <i>awk</i> begins iterating over the elements of <b>ARGV</b>, 767 processing them as if they were <i>argument</i> operands. 
It shall behave as if the implementation maintains an internal counter 768 that is initialized to 1 and increments by 1 at the end of each iteration. For each iteration, the following shall occur:</p> 769 <ul> 770 <li class="tent">If the internal counter is greater than or equal to the current value of <b>ARGC</b> and no <i>file</i> operands 771 have been processed, <i>awk</i> shall set <b>FILENAME</b> to <tt>'-'</tt> and process standard input as if it was given as a file 772 operand. The internal counter shall not be incremented at the end of this iteration.</li> 773 <li class="tent">Otherwise, if the internal counter is greater than or equal to the current value of <b>ARGC</b>, the iterations 774 shall stop and processing of the <b>END</b> actions, if any, shall begin. Any <b>ARGV</b> elements with index values greater than 775 or equal to <b>ARGC</b> shall not be processed as <i>argument</i> operands.</li> 776 <li class="tent">Otherwise, if the element <tt>ARGV[</tt> <i>internal counter value</i><tt>]</tt> does not exist, it is unspecified 777 whether that element is created. 
No other action shall be taken.</li> 778 <li class="tent">Otherwise, if <tt>ARGV[</tt> <i>internal counter value</i><tt>]</tt> is a null string, no action shall be 779 taken.</li> 780 <li class="tent">Otherwise, if <tt>ARGV[</tt> <i>internal counter value</i><tt>]</tt> matches the format of an <i>assignment</i> 781 operand (see OPERANDS), <i>awk</i> shall process the assignment.</li> 782 <li class="tent">Otherwise, <tt>ARGV[</tt> <i>internal counter value</i><tt>]</tt> shall be treated as a <i>file</i> operand, 783 <b>FILENAME</b> shall be set to that value, and the named file, or standard input if the value is <tt>'-'</tt>, shall be processed 784 as an input file.</li> 785 </ul> 786 <p class="tent">Since only non-null elements are processed, setting an element of <b>ARGV</b> to the null string or deleting it 787 means that it shall not be treated as an <i>argument</i> operand.</p> 788 </dd> 789 <dt><b>CONVFMT</b></dt> 790 <dd>The <b>printf</b> format for converting numbers to strings (except for output statements, where <b>OFMT</b> is used); 791 <tt>"%.6g"</tt> by default.</dd> 792 <dt><b>ENVIRON</b></dt> 793 <dd>An array representing the value of the environment, as described in the <i>exec</i> functions defined in the System Interfaces 794 volume of POSIX.1-2024. The indices of the array shall be strings consisting of the names of the environment variables, and the 795 value of each array element shall be a string consisting of the value of that variable. If appropriate, the environment variable 796 shall be considered a <i>numeric string</i> (see <a href="#tag_20_06_13_02">Expressions in awk</a> ); the array element shall also 797 have its numeric value. 
798 <p class="tent">In all cases where the behavior of <i>awk</i> is affected by environment variables (including the environment of 799 any commands that <i>awk</i> executes via the <b>system</b> function or via pipeline redirections with the <b>print</b> statement, 800 the <b>printf</b> statement, or the <b>getline</b> function), the environment used shall be the environment at the time <i>awk</i> 801 began executing; it is implementation-defined whether any modification of <b>ENVIRON</b> affects this environment.</p> 802 </dd> 803 <dt><b>FILENAME</b></dt> 804 <dd>The pathname used to open the current input file, or <tt>'-'</tt> if the file is standard input. Inside a <b>BEGIN</b> action 805 <b>FILENAME</b> shall be unset. Inside an <b>END</b> action the value shall be the name of the last input file processed. If an 806 application changes the value of <b>FILENAME</b>, the results are unspecified.</dd> 807 <dt><b>FNR</b></dt> 808 <dd>The ordinal number of the current record in the current file. Inside a <b>BEGIN</b> action the value shall be zero. Inside an 809 <b>END</b> action the value shall be the number of the last record processed in the last file processed.</dd> 810 <dt><b>FS</b></dt> 811 <dd>Input field separator regular expression; a <space> by default.</dd> 812 <dt><b>NF</b></dt> 813 <dd>The number of fields in the current record. Inside a <b>BEGIN</b> action, the use of <b>NF</b> is undefined unless a 814 <b>getline</b> function without a <i>var</i> argument is executed previously. Inside an <b>END</b> action, <b>NF</b> shall retain 815 the value it had for the last record read, unless a subsequent, redirected, <b>getline</b> function without a <i>var</i> argument 816 is performed prior to entering the <b>END</b> action.</dd> 817 <dt><b>NR</b></dt> 818 <dd>The ordinal number of the current record from the start of input. Inside a <b>BEGIN</b> action the value shall be zero. 
Inside 819 an <b>END</b> action the value shall be the number of the last record processed. Records skipped by the <b>nextfile</b> statement 820 shall not be included.</dd> 821 <dt><b>OFMT</b></dt> 822 <dd>The <b>printf</b> format for converting numbers to strings in output statements (see <a href="#tag_20_06_13_10">Output 823 Statements</a> ); <tt>"%.6g"</tt> by default. The result of the conversion is unspecified if the value of <b>OFMT</b> is not a 824 floating-point format specification.</dd> 825 <dt><b>OFS</b></dt> 826 <dd>The <b>print</b> statement output field separator; <space> by default.</dd> 827 <dt><b>ORS</b></dt> 828 <dd>The <b>print</b> statement output record separator; a <newline> by default.</dd> 829 <dt><b>RLENGTH</b></dt> 830 <dd>The length of the string matched by the <b>match</b> function.</dd> 831 <dt><b>RS</b></dt> 832 <dd>The first character of the string value of <b>RS</b> shall be the input record separator; a <newline> by default. If 833 <b>RS</b> contains more than one character, the results are unspecified. If <b>RS</b> is null, then records are separated by 834 sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty 835 records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of 836 <b>FS</b> is.</dd> 837 <dt><b>RSTART</b></dt> 838 <dd>The starting position of the string matched by the <b>match</b> function, numbering from 1. 
This shall always be equivalent to 839 the return value of the <b>match</b> function.</dd> 840 <dt><b>SUBSEP</b></dt> 841 <dd>The subscript separator string for multi-dimensional arrays; the default value is implementation-defined.</dd> 842 </dl> 843 <h5><a name="tag_20_06_13_04" id="tag_20_06_13_04"></a>Regular Expressions</h5> 844 <p class="tent">The <i>awk</i> utility shall make use of the extended regular expression notation (see XBD <a href= 845 "../basedefs/V1_chap09.html#tag_09_04"><i>9.4 Extended Regular Expressions</i></a> ) except that it shall allow the use of 846 C-language conventions for escaping special characters within the EREs, as specified in the table in XBD <a href= 847 "../basedefs/V1_chap05.html#tag_05"><i>5. File Format Notation</i></a> for <tt>'\\'</tt>, <tt>'\a'</tt>, <tt>'\b'</tt>, 848 <tt>'\f'</tt>, <tt>'\n'</tt>, <tt>'\r'</tt>, <tt>'\t'</tt>, <tt>'\v'</tt> and in the following table for other sequences; these 849 escape sequences shall be recognized both inside and outside bracket expressions. Note that records need not be separated by 850 <newline> characters and string constants can contain <newline> characters, so even the <tt>"\n"</tt> sequence is valid 851 in <i>awk</i> EREs. 
Using a <slash> character within the lexical token <b>ERE</b> (except as one of the two delimiters) 852 requires the escaping shown in the following table.<br></p> 853 <p class="caption"><a name="tagtcjh_15" id="tagtcjh_15"></a> Table: Escape Sequences in awk</p> 854 <center> 855 <table border="1" cellpadding="3" align="center"> 856 <tr valign="top"> 857 <th align="center"> 858 <p class="tent"><b>Escape Sequence</b></p> 859 </th> 860 <th align="center"> 861 <p class="tent"><b>Description</b></p> 862 </th> 863 <th align="center"> 864 <p class="tent"><b>Meaning</b></p> 865 </th> 866 </tr> 867 <tr valign="top"> 868 <td align="left"> 869 <p class="tent">\"</p> 870 </td> 871 <td align="left"> 872 <p class="tent"><backslash> <quotation-mark></p> 873 </td> 874 <td align="left"> 875 <p class="tent">In the lexical token <b>STRING</b>, <quotation-mark> character. Otherwise undefined.</p> 876 </td> 877 </tr> 878 <tr valign="top"> 879 <td align="left"> 880 <p class="tent">\/</p> 881 </td> 882 <td align="left"> 883 <p class="tent"><backslash> <slash></p> 884 </td> 885 <td align="left"> 886 <p class="tent">In the lexical token <b>ERE</b>, <slash> character. Otherwise undefined.</p> 887 </td> 888 </tr> 889 <tr valign="top"> 890 <td align="left"> 891 <p class="tent">\ddd</p> 892 </td> 893 <td align="left"> 894 <p class="tent">A <backslash> character followed by the longest sequence of one, two, or three octal-digit characters 895 (01234567). If all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined. If the digits 896 produce a value greater than octal 377, the behavior is undefined.</p> 897 </td> 898 <td align="left"> 899 <p class="tent">The character whose encoding is represented by the one, two, or three-digit octal integer. 
Multi-byte characters 900 require multiple, concatenated escape sequences of this type, including the leading <backslash> for each byte.</p> 901 </td> 902 </tr> 903 <tr valign="top"> 904 <td align="left"> 905 <p class="tent">\., \[, \(,\*, \+, \?, \{, \|, \^, \$</p> 906 </td> 907 <td align="left"> 908 <p class="tent">A <backslash> character followed by a character that has a special meaning in EREs (see XBD <a href= 909 "../basedefs/V1_chap09.html#tag_09_04"><i>9.4 Extended Regular Expressions</i></a> ), other than <backslash>.</p> 910 </td> 911 <td align="left"> 912 <p class="tent">In the lexical token <b>ERE</b> when not inside a bracket expression, the sequence shall represent itself. 913 Otherwise undefined.</p> 914 </td> 915 </tr> 916 <tr valign="top"> 917 <td align="left"> 918 <p class="tent">\\</p> 919 </td> 920 <td align="left"> 921 <p class="tent">Two <backslash> characters.</p> 922 </td> 923 <td align="left"> 924 <p class="tent">In the lexical token <b>ERE</b>, the sequence shall represent itself. In the lexical token <b>STRING</b>, it shall 925 represent a single <backslash>.</p> 926 </td> 927 </tr> 928 <tr valign="top"> 929 <td align="left"> 930 <p class="tent">\c</p> 931 </td> 932 <td align="left"> 933 <p class="tent">A <backslash> character followed by any character not described in this table or in the table in XBD <a href= 934 "../basedefs/V1_chap05.html#tag_05"><i>5. File Format Notation</i></a> (<tt>'\\'</tt>, <tt>'\a'</tt>, <tt>'\b'</tt>, <tt>'\f'</tt>, 935 <tt>'\n'</tt>, <tt>'\r'</tt>, <tt>'\t'</tt>, <tt>'\v'</tt>).</p> 936 </td> 937 <td align="left"> 938 <p class="tent">Undefined</p> 939 </td> 940 </tr> 941 </table> 942 </center> 943 <p class="tent">A regular expression can be matched against a specific field or string by using one of the two regular expression 944 matching operators, <tt>'~'</tt> and <tt>"!~"</tt>. These operators shall interpret their right-hand operand as a regular 945 expression and their left-hand operand as a string. 
If the regular expression matches the string, the <tt>'~'</tt> expression shall 946 evaluate to a value of 1, and the <tt>"!~"</tt> expression shall evaluate to a value of 0. (The regular expression matching 947 operation is as defined by the term matched in XBD <a href="../basedefs/V1_chap09.html#tag_09_01"><i>9.1 Regular Expression 948 Definitions</i></a> , where a match occurs on any part of the string unless the regular expression is limited with the 949 <circumflex> or <dollar-sign> special characters.) If the regular expression does not match the string, the 950 <tt>'~'</tt> expression shall evaluate to a value of 0, and the <tt>"!~"</tt> expression shall evaluate to a value of 1. If the 951 right-hand operand is any expression other than the lexical token <b>ERE</b>, the string value of the expression shall be 952 interpreted as an extended regular expression, including the escape conventions described above. Note that these escape conventions 953 shall also be applied in determining the value of a string literal (the lexical token <b>STRING</b>), and thus shall be applied a 954 second time when a string literal is used in this context.</p> 955 <p class="tent">When an <b>ERE</b> token appears as an expression in any context other than as the right-hand of the <tt>'~'</tt> 956 or <tt>"!~"</tt> operator or as one of the built-in function arguments described below, the value of the resulting expression shall 957 be the equivalent of:</p> 958 <pre> 959 <tt>$0 ~ /</tt><i>ere</i><tt>/ 960 </tt></pre> 961 <p class="tent">The <i>ere</i> argument to the <b>gsub</b>, <b>match</b>, <b>sub</b> functions, and the <i>fs</i> argument to the 962 <b>split</b> function (see <a href="#tag_20_06_13_13">String Functions</a> ) shall be interpreted as extended regular expressions. 
963 These can be either <b>ERE</b> tokens or arbitrary expressions, and shall be interpreted in the same manner as the right-hand side 964 of the <tt>'~'</tt> or <tt>"!~"</tt> operator.</p> 965 <p class="tent">An extended regular expression can be used to separate fields by assigning a string containing the expression to 966 the built-in variable <b>FS</b>, either directly or as a consequence of using the <b>-F</b> <i>sepstring</i> option. The default 967 value of the <b>FS</b> variable shall be a single <space>. The following describes <b>FS</b> behavior:</p> 968 <ol> 969 <li class="tent">If <b>FS</b> is a null string, the behavior is unspecified.</li> 970 <li class="tent">If <b>FS</b> is a single character: 971 <ol type="a"> 972 <li class="tent">If <b>FS</b> is <space>, skip leading and trailing <blank> and <newline> characters; fields 973 shall be delimited by sets of one or more <blank> or <newline> characters.</li> 974 <li class="tent">Otherwise, if <b>FS</b> is any other character <i>c</i>, fields shall be delimited by each single occurrence of 975 <i>c</i>.</li> 976 </ol> 977 </li> 978 <li class="tent">Otherwise, the string value of <b>FS</b> shall be considered to be an extended regular expression. Each occurrence 979 of a sequence of one or more characters matching the extended regular expression shall delimit fields.</li> 980 </ol> 981 <p class="tent">When ERE matching is performed against input records; that is, the match is against $0 and the current value of $0 982 resulted from processing an input record, record separator characters (the first character of the value of the variable <b>RS</b>, 983 <newline> by default) cannot be embedded in the expression, and no expression shall match the record separator character. If 984 the record separator is not <newline>, <newline> characters embedded in the expression can be matched. 
When ERE 985 matching is not performed against input records, it shall be based on text strings; any character (including <newline> and 986 the record separator) can be embedded in the pattern, and an appropriate pattern shall match any character. However, in all 987 <i>awk</i> ERE matching, the use of one or more NUL characters in the pattern, input record, or text string produces undefined 988 results.</p> 989 <h5><a name="tag_20_06_13_05" id="tag_20_06_13_05"></a>Patterns</h5> 990 <p class="tent">A <i>pattern</i> is any valid <i>expression</i>, a range specified by two expressions separated by a comma, or one 991 of the two special patterns <b>BEGIN</b> or <b>END</b>.</p> 992 <h5><a name="tag_20_06_13_06" id="tag_20_06_13_06"></a>Special Patterns</h5> 993 <p class="tent">The <i>awk</i> utility shall recognize two special patterns, <b>BEGIN</b> and <b>END</b>. Each <b>BEGIN</b> pattern 994 shall be matched once and its associated action executed before the first record of input is read—except possibly by use of the 995 <b>getline</b> function (see <a href="#tag_20_06_13_14">Input/Output and General Functions</a> ) in a prior <b>BEGIN</b> action—and 996 before command line assignment is done. Each <b>END</b> pattern shall be matched once and its associated action executed after the 997 last record of input has been read, or if there is no further input file to process following a <b>nextfile</b> statement. These 998 two patterns shall have associated actions.</p> 999 <p class="tent"><b>BEGIN</b> and <b>END</b> shall not combine with other patterns. Multiple <b>BEGIN</b> and <b>END</b> patterns 1000 shall be allowed. The actions associated with the <b>BEGIN</b> patterns shall be executed in the order specified in the program, as 1001 are the <b>END</b> actions. 
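</p>
<p class="tent">As an informal, non-normative illustration, the following sketch interleaves several <b>BEGIN</b> and <b>END</b> patterns with an ordinary action. The one-record input and the message strings are assumptions made only for this example.</p>

```sh
printf 'x\n' | awk '
END   { print "end 1" }        # END actions run after the last record, in program order
BEGIN { print "begin 1" }      # BEGIN actions run before any input is read
      { print "record " $0 }   # ordinary action, run once per input record
BEGIN { print "begin 2" }
END   { print "end 2" }'
```

<p class="tent">The output is <tt>begin 1</tt>, <tt>begin 2</tt>, <tt>record x</tt>, <tt>end 1</tt>, <tt>end 2</tt>, one per line, regardless of how the patterns are interleaved in the program text.</p>
<p class="tent">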
An <b>END</b> pattern can precede a <b>BEGIN</b> pattern in a program.</p> 1002 <p class="tent">If an <i>awk</i> program consists of only actions with the pattern <b>BEGIN</b>, and the <b>BEGIN</b> action 1003 contains no <b>getline</b> function, <i>awk</i> shall exit without reading its input when the last statement in the last 1004 <b>BEGIN</b> action is executed. If an <i>awk</i> program consists of only actions with the pattern <b>END</b> or only actions with 1005 the patterns <b>BEGIN</b> and <b>END</b>, the input shall be read before the statements in the <b>END</b> actions are executed.</p> 1006 <h5><a name="tag_20_06_13_07" id="tag_20_06_13_07"></a>Expression Patterns</h5> 1007 <p class="tent">An expression pattern shall be evaluated as if it were an expression in a Boolean context. If the result is true, 1008 the pattern shall be considered to match, and the associated action (if any) shall be executed. If the result is false, the action 1009 shall not be executed.</p> 1010 <h5><a name="tag_20_06_13_08" id="tag_20_06_13_08"></a>Pattern Ranges</h5> 1011 <p class="tent">A pattern range consists of two expressions separated by a comma; in this case, the action shall be performed for 1012 all records between a match of the first expression and the following match of the second expression, inclusive. At this point, the 1013 pattern range can be repeated starting at input records subsequent to the end of the matched range.</p> 1014 <h5><a name="tag_20_06_13_09" id="tag_20_06_13_09"></a>Actions</h5> 1015 <p class="tent">An action is a sequence of statements as shown in the grammar in <a href="#tag_20_06_13_16">Grammar</a> . Any 1016 single statement can be replaced by a statement list enclosed in curly braces. The application shall ensure that statements in a 1017 statement list are separated by <newline> or <semicolon> characters. 
Statements in a statement list shall be executed 1018 sequentially in the order that they appear.</p> 1019 <p class="tent">The <i>expression</i> acting as the conditional in an <b>if</b> statement shall be evaluated and if it is non-zero 1020 or non-null, the following statement shall be executed; otherwise, if <b>else</b> is present, the statement following the 1021 <b>else</b> shall be executed.</p> 1022 <p class="tent">The <b>if</b>, <b>while</b>, <b>do</b>...<b>while</b>, <b>for</b>, <b>break</b>, and <b>continue</b> statements are 1023 based on the ISO C standard (see <a href="../utilities/V3_chap01.html#tag_18_01_02"><i>1.1.2 Concepts Derived from the ISO C 1024 Standard</i></a> ), except that the Boolean expressions shall be treated as described in <a href="#tag_20_06_13_02">Expressions in 1025 awk</a> , and except in the case of:</p> 1026 <pre> 1027 <tt>for (</tt><i>variable</i><tt> in </tt><i>array</i><tt>) 1028 </tt></pre> 1029 <p class="tent">which shall iterate, assigning each <i>index</i> of <i>array</i> to <i>variable</i> in an unspecified order. The 1030 results of adding new elements to <i>array</i> within such a <b>for</b> loop are undefined. If a <b>break</b> or <b>continue</b> 1031 statement occurs outside of a loop, the behavior is undefined.</p> 1032 <p class="tent">The <b>delete</b> statement shall remove either a specified individual array element or, if no element is 1033 specified, all array elements. Thus, the following code:</p> 1034 <pre> 1035 <tt>for (i in array) 1036 delete array[i] 1037 </tt></pre> 1038 <p class="tent">is equivalent to:</p> 1039 <pre> 1040 <tt>delete array 1041 </tt></pre> 1042 <p class="tent">Both delete all elements of the array.</p> 1043 <p class="tent">The <b>next</b> statement shall cause all further processing of the current input record to be abandoned. 
The 1044 behavior is undefined if a <b>next</b> statement appears or is invoked in a <b>BEGIN</b> or <b>END</b> action.</p> 1045 <p class="tent">The <b>nextfile</b> statement shall cause all further processing of the current input file to be abandoned. The 1046 behavior is undefined if a <b>nextfile</b> statement appears or is invoked in a <b>BEGIN</b> or <b>END</b> action, or in a 1047 user-defined function.</p> 1048 <p class="tent">The <b>exit</b> statement shall invoke all <b>END</b> actions in the order in which they occur in the program 1049 source and then terminate the program without reading further input. An <b>exit</b> statement inside an <b>END</b> action shall 1050 terminate the program without further execution of <b>END</b> actions. If an expression is specified in an <b>exit</b> statement, 1051 its numeric value shall be the exit status of <i>awk</i>, unless subsequent errors are encountered or a subsequent <b>exit</b> 1052 statement with an expression is executed.</p> 1053 <h5><a name="tag_20_06_13_10" id="tag_20_06_13_10"></a>Output Statements</h5> 1054 <p class="tent">Both <b>print</b> and <b>printf</b> statements shall write to standard output by default. The output shall be 1055 written to the location specified by <i>output_redirection</i> if one is supplied, as follows:</p> 1056 <pre> 1057 <tt>> </tt><i>expression</i><tt> 1058 >> </tt><i>expression</i><tt> 1059 | </tt><i>expression</i><tt> 1060 </tt></pre> 1061 <p class="tent">In all cases, the <i>expression</i> shall be evaluated to produce a string that is used as a pathname into which to 1062 write (for <tt>'>'</tt> or <tt>">>"</tt>) or as a command to be executed (for <tt>'|'</tt>). Using the first two forms, if 1063 the file of that name is not currently open, it shall be opened, creating it if necessary and using the first form, truncating the 1064 file. The output then shall be appended to the file. 
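</p>
<p class="tent">As an informal, non-normative illustration, the following sketch shows that the <tt>'>'</tt> form truncates the file once, when it is first opened, and that later output in the same run is added after it. The shell variable <tt>tmp</tt>, the <i>awk</i> variable <tt>out</tt>, and the use of <i>mktemp</i> to obtain a scratch pathname are assumptions made only for this example.</p>

```sh
tmp=$(mktemp)                    # scratch file; assumes mktemp is available
printf 'old contents\n' > "$tmp"
# The first print opens the file and truncates it; the second and third
# prints find it already open and write after the earlier output.
printf '1\n2\n3\n' | awk -v out="$tmp" '{ print "line " $0 > out }'
cat "$tmp"                       # line 1, line 2, line 3 (old contents are gone)
rm -f "$tmp"
```

<p class="tent">Had each <b>print</b> reopened the file, only <tt>line 3</tt> would remain; all three lines are present because the file stays open between calls.</p>
<p class="tent">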
As long as the file remains open, subsequent calls in which <i>expression</i> 1065 evaluates to the same string value shall simply append output to the file. The file remains open until the <b>close</b> function 1066 (see <a href="#tag_20_06_13_14">Input/Output and General Functions</a> ) is called with an expression that evaluates to the same 1067 string value.</p> 1068 <p class="tent">The third form shall write output onto a stream piped to the input of a command. The stream shall be created if no 1069 stream is currently open with the value of <i>expression</i> as its command name. The stream created shall be equivalent to one 1070 created by a call to the <a href="../functions/popen.html"><i>popen</i>()</a> function defined in the System Interfaces volume of 1071 POSIX.1-2024 with the value of <i>expression</i> as the <i>command</i> argument and a value of <i>w</i> as the <i>mode</i> 1072 argument. As long as the stream remains open, subsequent calls in which <i>expression</i> evaluates to the same string value shall 1073 write output to the existing stream. The stream shall remain open until the <b>close</b> function (see <a href= 1074 "#tag_20_06_13_14">Input/Output and General Functions</a> ) is called with an expression that evaluates to the same string value. 1075 At that time, the stream shall be closed as if by a call to the <a href="../functions/pclose.html"><i>pclose</i>()</a> function 1076 defined in the System Interfaces volume of POSIX.1-2024.</p> 1077 <p class="tent">As described in detail by the grammar in <a href="#tag_20_06_13_16">Grammar</a> , these output statements shall 1078 take a <comma>-separated list of <i>expression</i>s referred to in the grammar by the non-terminal symbols <b>expr_list</b>, 1079 <b>print_expr_list</b>, or <b>print_expr_list_opt</b>. 
This list is referred to here as the <i>expression list</i>, and each member 1080 is referred to as an <i>expression argument</i>.</p> 1081 <p class="tent">The <b>print</b> statement shall write the value of each expression argument onto the indicated output stream 1082 separated by the current output field separator (see variable <b>OFS</b> above), and terminated by the output record separator (see 1083 variable <b>ORS</b> above). All expression arguments shall be taken as strings, being converted if necessary; this conversion shall 1084 be as described in <a href="#tag_20_06_13_02">Expressions in awk</a> , with the exception that the <b>printf</b> format in 1085 <b>OFMT</b> shall be used instead of the value in <b>CONVFMT</b>. An empty expression list shall stand for the whole input record 1086 ($0).</p> 1087 <p class="tent">The <b>printf</b> statement shall produce output based on a notation similar to the File Format Notation used to 1088 describe file formats in this volume of POSIX.1-2024 (see XBD <a href="../basedefs/V1_chap05.html#tag_05"><i>5. File Format 1089 Notation</i></a> ). Output shall be produced as specified with the first <i>expression</i> argument as the string <i>format</i> and 1090 subsequent <i>expression</i> arguments as the strings <i>arg1</i> to <i>argn</i>, inclusive, with the following exceptions:</p> 1091 <ol> 1092 <li class="tent">The <i>format</i> shall be an actual character string rather than a graphical representation. Therefore, it cannot 1093 contain empty character positions. 
The <space> in the <i>format</i> string, in any context other than a <i>flag</i> of a
conversion specification, shall be treated as an ordinary character that is copied to the output.</li>
<li class="tent">If the character set contains a <tt>'Δ'</tt> character and that character appears in the <i>format</i> string, it
shall be treated as an ordinary character that is copied to the output.</li>
<li class="tent">The <i>escape sequences</i> beginning with a <backslash> character shall be treated as sequences of ordinary
characters that are copied to the output. Note that these same sequences shall be interpreted lexically by <i>awk</i> when they
appear in literal strings, but they shall not be treated specially by the <b>printf</b> statement.</li>
<li class="tent">A <i>field width</i> or <i>precision</i> can be specified as the <tt>'*'</tt> character instead of a digit string.
In this case the next argument from the expression list shall be fetched and its numeric value taken as the field width or
precision.</li>
<li class="tent">The implementation shall not precede or follow output from the <tt>d</tt> or <tt>u</tt> conversion specifier
characters with <blank> characters not specified by the <i>format</i> string.</li>
<li class="tent">The implementation shall not precede output from the <tt>o</tt> conversion specifier character with leading zeros
not specified by the <i>format</i> string.</li>
<li class="tent">For the <tt>c</tt> conversion specifier character: if the argument has a numeric value, the character whose
encoding is that value shall be output. If the value is zero or is not the encoding of any character in the character set, the
behavior is undefined.
If the argument does not have a numeric value, the first character of the string value shall be output; if
the string does not contain any characters, the behavior is undefined.</li>
<li class="tent">For each conversion specification that consumes an argument, the next expression argument shall be evaluated. With
the exception of the <tt>c</tt> conversion specifier character, the value shall be converted (according to the rules specified in
<a href="#tag_20_06_13_02">Expressions in awk</a> ) to the appropriate type for the conversion specification.</li>
<li class="tent">If there are insufficient expression arguments to satisfy all the conversion specifications in the <i>format</i>
string, the behavior is undefined.</li>
<li class="tent">If any character sequence in the <i>format</i> string begins with a <tt>'%'</tt> character, but does not form a
valid conversion specification, the behavior is unspecified.</li>
</ol>
<p class="tent">Both <b>print</b> and <b>printf</b> can output at least {LINE_MAX} bytes.</p>
<h5><a name="tag_20_06_13_11" id="tag_20_06_13_11"></a>Functions</h5>
<p class="tent">The <i>awk</i> language has a variety of built-in functions: arithmetic, string, input/output, and general.</p>
<p class="tent">Function parameters, if present, can be either scalars or arrays; the behavior is undefined if an array name is
passed as a parameter that the function uses as a scalar, or if a scalar expression is passed as a parameter that the function uses
as an array. Function parameters shall be passed by value if scalar and by reference if array name.</p>
<h5><a name="tag_20_06_13_12" id="tag_20_06_13_12"></a>Arithmetic Functions</h5>
<p class="tent">The arithmetic functions, except for <b>int</b>, shall be based on the ISO C standard (see <a href=
"../utilities/V3_chap01.html#tag_18_01_02"><i>1.1.2 Concepts Derived from the ISO C Standard</i></a> ).
The behavior is undefined
in cases where the ISO C standard specifies that an error be returned or that the behavior is undefined. Although the grammar
(see <a href="#tag_20_06_13_16">Grammar</a> ) permits built-in functions to appear with no arguments or parentheses, unless the
argument or parentheses are indicated as optional in the following list (by displaying them within the <tt>"[]"</tt> brackets),
such use is undefined.</p>
<dl compact>
<dd></dd>
<dt><b>atan2</b>(<i>y</i>,<i>x</i>)</dt>
<dd>Return arctangent of <i>y</i>/<i>x</i> in radians in the range [-ℼ,ℼ].</dd>
<dt><b>cos</b>(<i>x</i>)</dt>
<dd>Return cosine of <i>x</i>, where <i>x</i> is in radians.</dd>
<dt><b>sin</b>(<i>x</i>)</dt>
<dd>Return sine of <i>x</i>, where <i>x</i> is in radians.</dd>
<dt><b>exp</b>(<i>x</i>)</dt>
<dd>Return the exponential function of <i>x</i>.</dd>
<dt><b>log</b>(<i>x</i>)</dt>
<dd>Return the natural logarithm of <i>x</i>.</dd>
<dt><b>sqrt</b>(<i>x</i>)</dt>
<dd>Return the square root of <i>x</i>.</dd>
<dt><b>int</b>(<i>x</i>)</dt>
<dd>Return the argument truncated to an integer. Truncation shall be toward 0 when <i>x</i>>0.</dd>
<dt><b>rand</b>()</dt>
<dd>Return a floating point pseudo-random number <i>n</i>, such that 0<=<i>n</i><1.</dd>
<dt><b>srand</b>(<b>[</b><i>expr</i><b>]</b>)</dt>
<dd>Set the seed value for <b>rand</b> to <i>expr</i> or use the seconds since the Epoch if <i>expr</i> is omitted. The previous
seed value shall be returned. The behavior is unspecified if <i>expr</i> is not an integer expression or if the value of
<i>expr</i> is not within the range 0 through 2<sup><small>31</small></sup>-1 (2147483647), inclusive. The initial seed value is
unspecified if <b>rand</b> is called without calling <b>srand</b> first.
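<p class="tent">As an informal illustration only (not normative text), the following sketch reseeds the generator with the same
in-range integer and compares the first value drawn each time. The seed 42 and the variable names <tt>first</tt> and
<tt>second</tt> are arbitrary choices made for this example.</p>
<pre>
<tt>BEGIN {
    # Seed, draw one value, then reseed with the same value and draw again.
    srand(42); first = rand()
    srand(42); second = rand()
    # With an identical seed the two draws are expected to match.
    if (first == second) print "same sequence"
}
</tt></pre>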
The <b>srand</b> function uses the argument as a seed for
a new sequence of pseudo-random numbers to be returned by subsequent calls to <b>rand</b>. If <b>srand</b> is then called with the
same seed value, the sequence of pseudo-random numbers shall be repeated.</dd>
</dl>
<h5><a name="tag_20_06_13_13" id="tag_20_06_13_13"></a>String Functions</h5>
<p class="tent">The string functions in the following list shall be supported. Although the grammar (see <a href=
"#tag_20_06_13_16">Grammar</a> ) permits built-in functions to appear with no arguments or parentheses, unless the argument or
parentheses are indicated as optional in the following list (by displaying them within the <tt>"[]"</tt> brackets), such use is
undefined.</p>
<dl compact>
<dd></dd>
<dt><b>gsub</b>(<i>ere</i>, <i>repl</i><b>[</b>, <i>in</i><b>]</b>)</dt>
<dd>
Behave like <b>sub</b> (see below), except that it shall replace all occurrences of the regular expression (like the <a href=
"../utilities/ed.html"><i>ed</i></a> utility global substitute) in $0 or in the <i>in</i> argument, when specified.</dd>
<dt><b>index</b>(<i>s</i>, <i>t</i>)</dt>
<dd>Return the position, in characters, numbering from 1, in string <i>s</i> where string <i>t</i> first occurs, or zero if it does
not occur at all.</dd>
<dt><b>length[</b>(<b>[</b><i>arg</i><b>]</b>)<b>]</b></dt>
<dd>
If <i>arg</i> is an array, return the number of elements in the array; otherwise, return the length, in characters, of <i>arg</i>
taken as a string, or of the whole record, $0, if there is no argument.</dd>
<dt><b>match</b>(<i>s</i>, <i>ere</i>)</dt>
<dd>Return the position, in characters, numbering from 1, in string <i>s</i> where the extended regular expression <i>ere</i>
occurs, or zero if it does not occur at all.
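<p class="tent">As an informal illustration only (not normative text), the sketch below uses just the return value of
<b>match</b>. The sample string <tt>"foobar"</tt> and the variable name <tt>pos</tt> are assumptions made for this example.</p>
<pre>
<tt>BEGIN {
    # The ERE o+ first matches at the second character of "foobar".
    pos = match("foobar", /o+/)
    print pos        # position counted from 1; zero would mean no match
}
</tt></pre>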
RSTART shall be set to the starting position (which is the same as the returned
value), zero if no match is found; RLENGTH shall be set to the length of the matched string, -1 if no match is found.</dd>
<dt><b>split</b>(<i>s</i>, <i>a</i><b>[</b>, <i>fs </i><b>]</b>)</dt>
<dd>
Split the string <i>s</i> into array elements <i>a</i>[1], <i>a</i>[2], ..., <i>a</i>[<i>n</i>], and return <i>n</i>. All elements
of the array shall be deleted before the split is performed. The separation shall be done with the ERE <i>fs</i> or with the field
separator <b>FS</b> if <i>fs</i> is not given. Each array element shall have a string value when created and, if appropriate, the
array element shall be considered a numeric string (see <a href="#tag_20_06_13_02">Expressions in awk</a> ). The effect of a null
string as the value of <i>fs</i> is unspecified.</dd>
<dt><b>sprintf</b>(<i>fmt</i>, <i>expr</i>, <i>expr</i>, ...)</dt>
<dd>
Format the expressions according to the <b>printf</b> format given by <i>fmt</i> and return the resulting string.</dd>
<dt><b>sub(</b><i>ere</i>, <i>repl</i><b>[</b>, <i>in </i><b>]</b>)</dt>
<dd>
Substitute the string <i>repl</i> in place of the first instance of the extended regular expression <i>ERE</i> in string <i>in</i>
and return the number of substitutions. An <ampersand> (<tt>'&'</tt>) appearing in the string <i>repl</i> shall be
replaced by the string from <i>in</i> that matches the ERE. An <ampersand> preceded with a <backslash> shall be
interpreted as the literal <ampersand> character. An occurrence of two consecutive <backslash> characters shall be
interpreted as just a single literal <backslash> character. Any other occurrence of a <backslash> (for example,
preceding any other character) shall be treated as a literal <backslash> character.
Note that if <i>repl</i> is a string
literal (the lexical token <b>STRING</b>; see <a href="#tag_20_06_13_16">Grammar</a> ), the handling of the <ampersand>
character occurs after any lexical processing, including any lexical <backslash>-escape sequence processing. If <i>in</i> is
specified and it is not an lvalue (see <a href="#tag_20_06_13_02">Expressions in awk</a> ), the behavior is undefined. If <i>in</i>
is omitted, <i>awk</i> shall use the current record ($0) in its place.</dd>
<dt><b>substr</b>(<i>s</i>, <i>m</i><b>[</b>, <i>n </i><b>]</b>)</dt>
<dd>
Return the at most <i>n</i>-character substring of <i>s</i> that begins at position <i>m</i>, numbering from 1. If <i>n</i> is
omitted, or if <i>n</i> specifies more characters than are left in the string, the length of the substring shall be limited by the
length of the string <i>s</i>.</dd>
<dt><b>tolower</b>(<i>s</i>)</dt>
<dd>Return a string based on the string <i>s</i>. Each character in <i>s</i> that is an uppercase letter specified to have a
<b>tolower</b> mapping by the <i>LC_CTYPE</i> category of the current locale shall be replaced in the returned string by the
lowercase letter specified by the mapping. Other characters in <i>s</i> shall be unchanged in the returned string.</dd>
<dt><b>toupper</b>(<i>s</i>)</dt>
<dd>Return a string based on the string <i>s</i>. Each character in <i>s</i> that is a lowercase letter specified to have a
<b>toupper</b> mapping by the <i>LC_CTYPE</i> category of the current locale is replaced in the returned string by the uppercase
letter specified by the mapping.
Other characters in <i>s</i> are unchanged in the returned string.</dd>
</dl>
<p class="tent">All of the preceding functions that take <i>ERE</i> as a parameter expect a pattern or a string valued expression
that is a regular expression as defined in <a href="#tag_20_06_13_04">Regular Expressions</a> .</p>
<h5><a name="tag_20_06_13_14" id="tag_20_06_13_14"></a>Input/Output and General Functions</h5>
<p class="tent">The input/output and general functions are:</p>
<dl compact>
<dd></dd>
<dt><b>close</b>(<i>expression</i>)</dt>
<dd>
Close the file or pipe opened by a <b>print</b> or <b>printf</b> statement or a call to <b>getline</b> with the same string-valued
<i>expression</i>. The limit on the number of open <i>expression</i> arguments is implementation-defined. If the close was
successful, the function shall return zero; otherwise, it shall return non-zero.</dd>
<dt><b>fflush</b>(<b>[</b><i>expression</i><b>]</b>)</dt>
<dd>
Write any unwritten data to the file or piped stream opened by a <b>print</b> or <b>printf</b> statement with the same
string-valued <i>expression</i>. If no argument, or if <i>expression</i> evaluates to the null string, then write all such data for
all such open files and piped streams, and standard output.
<p class="tent">If <b>fflush</b> is successful, it shall return 0; otherwise, it shall return non-zero.</p>
</dd>
<dt><i>expression | </i><b>getline [</b><i>var</i><b>]</b></dt>
<dd>
Read a record of input from a stream piped from the output of a command. The stream shall be created if no stream is currently open
with the value of <i>expression</i> as its command name. The stream created shall be equivalent to one created by a call to the
<a href="../functions/popen.html"><i>popen</i>()</a> function with the value of <i>expression</i> as the <i>command</i> argument
and a value of <i>r</i> as the <i>mode</i> argument.
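<p class="tent">As an informal illustration only (not normative text), the sketch below reads records from a command through a
pipe and then closes the stream using the same string value. The command text and the names <tt>cmd</tt>, <tt>line</tt>, and
<tt>n</tt> are assumptions made for this example; the pipe expression is parenthesized before it is compared.</p>
<pre>
<tt>BEGIN {
    cmd = "echo one; echo two"
    # getline returns 1 for each record read, so the loop ends at end-of-file or on error.
    while ((cmd | getline line) == 1)
        n++
    # Close the stream with the identical string that opened it.
    close(cmd)
    print n
}
</tt></pre>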
As long as the stream remains open, subsequent calls in which
<i>expression</i> evaluates to the same string value shall read subsequent records from the stream. The stream shall remain open
until the <b>close</b> function is called with an expression that evaluates to the same string value. At that time, the stream
shall be closed as if by a call to the <a href="../functions/pclose.html"><i>pclose</i>()</a> function. If <i>var</i> is omitted,
$0 and <b>NF</b> shall be set; otherwise, <i>var</i> shall be set and, if appropriate, it shall be considered a numeric string (see
<a href="#tag_20_06_13_02">Expressions in awk</a> ).
<p class="tent">The <b>getline</b> operator can form ambiguous constructs when there are unparenthesized operators (including
concatenate) to the left of the <tt>'|'</tt> (to the beginning of the expression containing <b>getline</b>). In the context of the
<tt>'$'</tt> operator, <tt>'|'</tt> shall behave as if it had a lower precedence than <tt>'$'</tt>. The result of evaluating other
operators is unspecified, and conforming applications shall parenthesize properly all such usages.</p>
</dd>
<dt><b>getline</b></dt>
<dd>Set $0 to the next input record from the current input file. This form of <b>getline</b> shall set the <b>NF</b>, <b>NR</b>,
and <b>FNR</b> variables.</dd>
<dt><b>getline </b><i>var</i></dt>
<dd>Set variable <i>var</i> to the next input record from the current input file and, if appropriate, <i>var</i> shall be
considered a numeric string (see <a href="#tag_20_06_13_02">Expressions in awk</a> ). This form of <b>getline</b> shall set the
<b>FNR</b> and <b>NR</b> variables.</dd>
<dt><b>getline [</b><i>var</i><b>] </b>< <i>expression</i></dt>
<dd>
Read the next record of input from a named file. The <i>expression</i> shall be evaluated to produce a string that is used as a
pathname.
If the file of that name is not currently open, it shall be opened. As long as the stream remains open, subsequent calls
in which <i>expression</i> evaluates to the same string value shall read subsequent records from the file. The file shall remain
open until the <b>close</b> function is called with an expression that evaluates to the same string value. If <i>var</i> is
omitted, $0 and <b>NF</b> shall be set; otherwise, <i>var</i> shall be set and, if appropriate, it shall be considered a numeric
string (see <a href="#tag_20_06_13_02">Expressions in awk</a> ).
<p class="tent">The <b>getline</b> operator can form ambiguous constructs when there are unparenthesized binary operators
(including concatenate) to the right of the <tt>'<'</tt> (up to the end of the expression containing the <b>getline</b>). The
result of evaluating such a construct is unspecified, and conforming applications shall parenthesize properly all such usages.</p>
</dd>
<dt><b>system</b>(<i>expression</i>)</dt>
<dd>
Execute the command given by <i>expression</i> in a manner equivalent to the <a href="../functions/system.html"><i>system</i>()</a>
function defined in the System Interfaces volume of POSIX.1-2024 and return the exit status of the command.</dd>
</dl>
<p class="tent">All forms of <b>getline</b> shall return 1 for successful input, zero for end-of-file, and -1 for an error.</p>
<p class="tent">Where strings are used as the name of a file or pipeline, the application shall ensure that the strings are
textually identical. The terminology "same string value" implies that "equivalent strings", even those that differ only by
<space> characters, represent different files.</p>
<h5><a name="tag_20_06_13_15" id="tag_20_06_13_15"></a>User-Defined Functions</h5>
<p class="tent">The <i>awk</i> language also provides user-defined functions.
Such functions can be defined as:</p>
<pre>
<tt>function </tt><i>name</i><tt>(</tt><b>[</b><i>parameter</i><tt>, ...</tt><b>]</b><tt>) { </tt><i>statements</i><tt> }
</tt></pre>
<p class="tent">A function can be referred to anywhere in an <i>awk</i> program; in particular, its use can precede its definition.
The scope of a function is global.</p>
<p class="tent">The number of parameters in the function definition need not match the number of parameters in the function call.
Excess formal parameters can be used as local variables. If fewer arguments are supplied in a function call than are in the
function definition, the extra parameters that are used in the function body as scalars shall evaluate to the uninitialized value
until they are otherwise initialized, and the extra parameters that are used in the function body as arrays shall be treated as
uninitialized arrays where each element evaluates to the uninitialized value until otherwise initialized.</p>
<p class="tent">When invoking a function, no white space can be placed between the function name and the opening parenthesis.
Function calls can be nested and recursive calls can be made upon functions. Upon return from any nested or recursive function
call, the values of all of the calling function's parameters shall be unchanged, except for array parameters passed by reference.
The <b>return</b> statement can be used to return a value. If a <b>return</b> statement appears outside of a function definition,
the behavior is undefined.</p>
<p class="tent">In the function definition, <newline> characters shall be optional before the opening brace and after the
closing brace.
Function definitions can appear anywhere in the program where a <i>pattern-action</i> pair is allowed.</p>
<h5><a name="tag_20_06_13_16" id="tag_20_06_13_16"></a>Grammar</h5>
<p class="tent">The grammar in this section and the lexical conventions in the following section shall together describe the syntax
for <i>awk</i> programs. The general conventions for this style of grammar are described in <a href=
"../utilities/V3_chap01.html#tag_18_03"><i>1.3 Grammar Conventions</i></a> . A valid program can be represented as the non-terminal
symbol <i>program</i> in the grammar. This formal syntax shall take precedence over the preceding text syntax description.</p>
<pre>
<tt>%token NAME NUMBER STRING ERE
%token FUNC_NAME   /* Name followed by '(' without white space. */
<br class="tent">
/* Keywords */
%token       Begin   End
/*          'BEGIN' 'END'                            */
<br class="tent">
%token       Break   Continue   Delete   Do   Else
/*          'break' 'continue' 'delete' 'do' 'else'  */
<br class="tent">
%token       Exit   For   Function   If   In   Next
/*          'exit' 'for' 'function' 'if' 'in' 'next' */
<br class="tent">
%token       Nextfile   Print   Printf   Return   While
/*          'nextfile' 'print' 'printf' 'return' 'while' */
<br class="tent">
/* Reserved function names */
%token BUILTIN_FUNC_NAME
            /* One token for the following:
             * atan2 cos sin exp log sqrt int rand srand
             * gsub index length match split sprintf sub
             * substr tolower toupper close fflush system
             */
%token GETLINE
            /* Syntactically different from other built-ins. */
<br class="tent">
/* Two-character tokens. */
%token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
/*     '+='       '-='       '*='       '/='       '%='       '^=' */
<br class="tent">
%token OR   AND  NO_MATCH   EQ   LE   GE   NE   INCR  DECR  APPEND
/*     '||' '&&' '!~'       '==' '<=' '>=' '!=' '++'  '--'  '>>'   */
<br class="tent">
/* One-character tokens. */
%token '{' '}' '(' ')' '[' ']' ',' ';' NEWLINE
%token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='
<br class="tent">
%start program
%%
<br class="tent">
program          : item_list
                 | item_list item
                 ;
<br class="tent">
item_list        : /* empty */
                 | item_list item terminator
                 ;
<br class="tent">
item             : action
                 | pattern action
                 | normal_pattern
                 | Function NAME      '(' param_list_opt ')'
                       newline_opt action
                 | Function FUNC_NAME '(' param_list_opt ')'
                       newline_opt action
                 ;
<br class="tent">
param_list_opt   : /* empty */
                 | param_list
                 ;
<br class="tent">
param_list       : NAME
                 | param_list ',' NAME
                 ;
<br class="tent">
pattern          : normal_pattern
                 | special_pattern
                 ;
<br class="tent">
normal_pattern   : expr
                 | expr ',' newline_opt expr
                 ;
<br class="tent">
special_pattern  : Begin
                 | End
                 ;
<br class="tent">
action           : '{' newline_opt                             '}'
                 | '{' newline_opt terminated_statement_list   '}'
                 | '{' newline_opt unterminated_statement_list '}'
                 ;
<br class="tent">
terminator       : terminator NEWLINE
                 |            ';'
                 |            NEWLINE
                 ;
<br class="tent">
terminated_statement_list : terminated_statement
                 | terminated_statement_list terminated_statement
                 ;
<br class="tent">
unterminated_statement_list : unterminated_statement
                 | terminated_statement_list unterminated_statement
                 ;
<br class="tent">
terminated_statement : action newline_opt
                 | If '(' expr ')' newline_opt terminated_statement
                 | If '(' expr ')' newline_opt terminated_statement
                       Else newline_opt terminated_statement
                 | While '(' expr ')' newline_opt terminated_statement
                 | For '(' simple_statement_opt ';'
                      expr_opt ';' simple_statement_opt ')' newline_opt
                      terminated_statement
                 | For '(' NAME In NAME ')' newline_opt
                      terminated_statement
                 | ';' newline_opt
                 | terminatable_statement NEWLINE newline_opt
                 | terminatable_statement ';'     newline_opt
                 ;
<br class="tent">
unterminated_statement : terminatable_statement
                 | If '(' expr ')' newline_opt unterminated_statement
                 | If '(' expr ')' newline_opt terminated_statement
                      Else newline_opt unterminated_statement
                 | While '(' expr ')' newline_opt unterminated_statement
                 | For '(' simple_statement_opt ';'
                      expr_opt ';' simple_statement_opt ')' newline_opt
                      unterminated_statement
                 | For '(' NAME In NAME ')' newline_opt
                      unterminated_statement
                 ;
<br class="tent">
terminatable_statement : simple_statement
                 | Break
                 | Continue
                 | Next
                 | Nextfile
                 | Exit expr_opt
                 | Return expr_opt
                 | Do newline_opt terminated_statement While '(' expr ')'
                 ;
<br class="tent">
simple_statement_opt : /* empty */
                 | simple_statement
                 ;
<br class="tent">
simple_statement : Delete NAME '[' expr_list ']'
                 | Delete NAME
                 | expr
                 | print_statement
                 ;
<br class="tent">
print_statement  : simple_print_statement
                 | simple_print_statement output_redirection
                 ;
<br class="tent">
simple_print_statement : Print  print_expr_list_opt
                 | Print  '(' multiple_expr_list ')'
                 | Printf print_expr_list
                 | Printf '(' multiple_expr_list ')'
                 ;
<br class="tent">
output_redirection : '>'    expr
                 | APPEND expr
                 | '|'    expr
                 ;
<br class="tent">
expr_list_opt    : /* empty */
                 | expr_list
                 ;
<br class="tent">
expr_list        : expr
                 | multiple_expr_list
                 ;
<br class="tent">
multiple_expr_list : expr ',' newline_opt expr
                 | multiple_expr_list ',' newline_opt expr
                 ;
<br class="tent">
expr_opt         : /* empty */
                 | expr
                 ;
<br class="tent">
expr             : unary_expr
                 | non_unary_expr
                 ;
<br class="tent">
unary_expr       : '+' expr
                 | '-' expr
                 | unary_expr '^'      expr
                 | unary_expr '*'      expr
                 | unary_expr '/'      expr
                 | unary_expr '%'      expr
                 | unary_expr '+'      expr
                 | unary_expr '-'      expr
                 | unary_expr          non_unary_expr
                 | unary_expr '<'      expr
                 | unary_expr LE       expr
                 | unary_expr NE       expr
                 | unary_expr EQ       expr
                 | unary_expr '>'      expr
                 | unary_expr GE       expr
                 | unary_expr '~'      expr
                 | unary_expr NO_MATCH expr
                 | unary_expr In NAME
                 | unary_expr AND newline_opt expr
                 | unary_expr OR  newline_opt expr
                 | unary_expr '?' expr ':' expr
                 | unary_input_function
                 ;
<br class="tent">
non_unary_expr   : '(' expr ')'
                 | '!' expr
                 | non_unary_expr '^'      expr
                 | non_unary_expr '*'      expr
                 | non_unary_expr '/'      expr
                 | non_unary_expr '%'      expr
                 | non_unary_expr '+'      expr
                 | non_unary_expr '-'      expr
                 | non_unary_expr          non_unary_expr
                 | non_unary_expr '<'      expr
                 | non_unary_expr LE       expr
                 | non_unary_expr NE       expr
                 | non_unary_expr EQ       expr
                 | non_unary_expr '>'      expr
                 | non_unary_expr GE       expr
                 | non_unary_expr '~'      expr
                 | non_unary_expr NO_MATCH expr
                 | non_unary_expr In NAME
                 | '(' multiple_expr_list ')' In NAME
                 | non_unary_expr AND newline_opt expr
                 | non_unary_expr OR  newline_opt expr
                 | non_unary_expr '?' expr ':' expr
                 | NUMBER
                 | STRING
                 | lvalue
                 | ERE
                 | lvalue INCR
                 | lvalue DECR
                 | INCR lvalue
                 | DECR lvalue
                 | lvalue POW_ASSIGN expr
                 | lvalue MOD_ASSIGN expr
                 | lvalue MUL_ASSIGN expr
                 | lvalue DIV_ASSIGN expr
                 | lvalue ADD_ASSIGN expr
                 | lvalue SUB_ASSIGN expr
                 | lvalue '=' expr
                 | FUNC_NAME '(' expr_list_opt ')'
                      /* no white space allowed before '(' */
                 | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                 | BUILTIN_FUNC_NAME
                 | non_unary_input_function
                 ;
<br class="tent">
print_expr_list_opt : /* empty */
                 | print_expr_list
                 ;
<br class="tent">
print_expr_list  : print_expr
                 | print_expr_list ',' newline_opt print_expr
                 ;
<br class="tent">
print_expr       : unary_print_expr
                 | non_unary_print_expr
                 ;
<br class="tent">
unary_print_expr : '+' print_expr
                 | '-' print_expr
                 | unary_print_expr '^'      print_expr
                 | unary_print_expr '*'      print_expr
                 | unary_print_expr '/'      print_expr
                 | unary_print_expr '%'      print_expr
                 | unary_print_expr '+'      print_expr
                 | unary_print_expr '-'      print_expr
                 | unary_print_expr          non_unary_print_expr
                 | unary_print_expr '~'      print_expr
                 | unary_print_expr NO_MATCH print_expr
                 | unary_print_expr In NAME
                 | unary_print_expr AND newline_opt print_expr
                 | unary_print_expr OR  newline_opt print_expr
                 | unary_print_expr '?' print_expr ':' print_expr
                 ;
<br class="tent">
non_unary_print_expr : '(' expr ')'
                 | '!' print_expr
                 | non_unary_print_expr '^'      print_expr
                 | non_unary_print_expr '*'      print_expr
                 | non_unary_print_expr '/'      print_expr
                 | non_unary_print_expr '%'      print_expr
                 | non_unary_print_expr '+'      print_expr
                 | non_unary_print_expr '-'      print_expr
                 | non_unary_print_expr          non_unary_print_expr
                 | non_unary_print_expr '~'      print_expr
                 | non_unary_print_expr NO_MATCH print_expr
                 | non_unary_print_expr In NAME
                 | '(' multiple_expr_list ')' In NAME
                 | non_unary_print_expr AND newline_opt print_expr
                 | non_unary_print_expr OR  newline_opt print_expr
                 | non_unary_print_expr '?' print_expr ':' print_expr
                 | NUMBER
                 | STRING
                 | lvalue
                 | ERE
                 | lvalue INCR
                 | lvalue DECR
                 | INCR lvalue
                 | DECR lvalue
                 | lvalue POW_ASSIGN print_expr
                 | lvalue MOD_ASSIGN print_expr
                 | lvalue MUL_ASSIGN print_expr
                 | lvalue DIV_ASSIGN print_expr
                 | lvalue ADD_ASSIGN print_expr
                 | lvalue SUB_ASSIGN print_expr
                 | lvalue '=' print_expr
                 | FUNC_NAME '(' expr_list_opt ')'
                     /* no white space allowed before '(' */
                 | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                 | BUILTIN_FUNC_NAME
                 ;
<br class="tent">
lvalue           : NAME
                 | NAME '[' expr_list ']'
                 | '$' expr
                 ;
<br class="tent">
non_unary_input_function : simple_get
                 | simple_get '<' expr
                 | non_unary_expr '|' simple_get
                 ;
<br class="tent">
unary_input_function : unary_expr '|' simple_get
                 ;
<br class="tent">
simple_get       : GETLINE
                 | GETLINE lvalue
                 ;
<br class="tent">
newline_opt      : /* empty */
                 | newline_opt NEWLINE
                 ;
</tt></pre>
<p class="tent">This grammar has several ambiguities that shall be resolved as follows:</p>
<ul>
<li class="tent">Operator precedence and associativity shall be as described in <a href="#tagtcjh_14">Expressions in Decreasing
Precedence in awk</a> .</li>
<li class="tent">In case of
ambiguity, an <b>else</b> shall be associated with the most immediately preceding <b>if</b> that would
satisfy the grammar.</li>
<li class="tent">In some contexts, a <slash> (<tt>'/'</tt>) that is used to surround an ERE could also be the division
operator. This shall be resolved in such a way that wherever the division operator could appear, a <slash> is assumed to be
the division operator. (There is no unary division operator.)</li>
</ul>
<p class="tent">Each expression in an <i>awk</i> program shall conform to the precedence and associativity rules, even when this is
not needed to resolve an ambiguity. For example, because <tt>'$'</tt> has higher precedence than <tt>'++'</tt>, the string
<tt>"$x++--"</tt> is not a valid <i>awk</i> expression, even though it is unambiguously parsed by the grammar as
<tt>"$(x++)--"</tt>.</p>
<p class="tent">One convention that might not be obvious from the formal grammar is where <newline> characters are
acceptable. There are several obvious placements such as terminating a statement, and a <backslash> can be used to escape
<newline> characters between any lexical tokens. In addition, <newline> characters without <backslash> characters
can follow a comma, an open brace, logical AND operator (<tt>"&&"</tt>), logical OR operator (<tt>"||"</tt>), the <b>do</b>
keyword, the <b>else</b> keyword, and the closing parenthesis of an <b>if</b>, <b>for</b>, or <b>while</b> statement. For
For 1654 example:</p> 1655 <pre> 1656 <tt>{ print $1, 1657 $2 } 1658 </tt></pre> 1659 <h5><a name="tag_20_06_13_17" id="tag_20_06_13_17"></a>Lexical Conventions</h5> 1660 <p class="tent">The lexical conventions for <i>awk</i> programs, with respect to the preceding grammar, shall be as follows:</p> 1661 <ol> 1662 <li class="tent">Except as noted, <i>awk</i> shall recognize the longest possible token or delimiter beginning at a given 1663 point.</li> 1664 <li class="tent">A comment shall consist of any characters beginning with the <number-sign> character and terminated by, but 1665 excluding the next occurrence of, a <newline>. Comments shall have no effect, except to delimit lexical tokens.</li> 1666 <li class="tent">The <newline> shall be recognized as the token <b>NEWLINE</b>.</li> 1667 <li class="tent">A <backslash> character immediately followed by a <newline> shall have no effect.</li> 1668 <li class="tent">The token <b>STRING</b> shall represent a string constant. A string constant shall begin with the character 1669 <tt>'"'</tt>. Within a string constant, a <backslash> character shall be considered to begin an escape sequence as specified 1670 in the table in XBD <a href="../basedefs/V1_chap05.html#tag_05"><i>5. File Format Notation</i></a> (<tt>'\\'</tt>, <tt>'\a'</tt>, 1671 <tt>'\b'</tt>, <tt>'\f'</tt>, <tt>'\n'</tt>, <tt>'\r'</tt>, <tt>'\t'</tt>, <tt>'\v'</tt>). In addition, the escape sequences in 1672 <a href="#tagtcjh_15">Escape Sequences in awk</a> shall be recognized. A <newline> shall not occur within a string constant. 1673 A string constant shall be terminated by the first unescaped occurrence of the character <tt>'"'</tt> after the one that begins the 1674 string constant. 
The value of the string shall be the sequence of all unescaped characters and values of escape sequences between, 1675 but not including, the two delimiting <tt>'"'</tt> characters.</li> 1676 <li class="tent">The token <b>ERE</b> represents an extended regular expression constant. An ERE constant shall begin with the 1677 <slash> character. Within an ERE constant, a <backslash> character shall be considered to begin an escape sequence as 1678 specified in the table in XBD <a href="../basedefs/V1_chap05.html#tag_05"><i>5. File Format Notation</i></a> . In addition, the 1679 escape sequences in <a href="#tagtcjh_15">Escape Sequences in awk</a> shall be recognized. The application shall ensure that a 1680 <newline> does not occur within an ERE constant. An ERE constant shall be terminated by the first unescaped occurrence of the 1681 <slash> character after the one that begins the ERE constant. The extended regular expression represented by the ERE constant 1682 shall be the sequence of all unescaped characters and values of escape sequences between, but not including, the two delimiting 1683 <slash> characters.</li> 1684 <li class="tent">A <blank> shall have no effect, except to delimit lexical tokens or within <b>STRING</b> or <b>ERE</b> 1685 tokens.</li> 1686 <li class="tent">The token <b>NUMBER</b> shall represent a numeric constant. Its form and numeric value shall either be equivalent 1687 to the <b>decimal-floating-constant</b> token as specified by the ISO C standard, or it shall be a sequence of decimal digits 1688 and shall be evaluated as an integer constant in decimal. In addition, implementations may accept numeric constants with the form 1689 and numeric value equivalent to the <b>hexadecimal-constant</b> and <b>hexadecimal-floating-constant</b> tokens as specified by the 1690 ISO C standard. Note that these forms do not use the radix character from the current locale; they always use a 1691 <period>. 
1692 <p class="tent">If the value is too large or too small to be representable (see <a href= 1693 "../utilities/V3_chap01.html#tag_18_01_02"><i>1.1.2 Concepts Derived from the ISO C Standard</i></a> ), the behavior is 1694 undefined.</p> 1695 </li> 1696 <li class="tent">A sequence of underscores, digits, and alphabetics from the portable character set (see XBD <a href= 1697 "../basedefs/V1_chap06.html#tag_06_01"><i>6.1 Portable Character Set</i></a> ), beginning with an <underscore> or alphabetic 1698 character, shall be considered a word.</li> 1699 <li class="tent">The following words are keywords that shall be recognized as individual tokens; the name of the token is the same 1700 as the keyword: 1701 <table cellpadding="3"> 1702 <tr valign="top"> 1703 <td align="left"> 1704 <p class="tent"><b><br> 1705 BEGIN<br> 1706 break<br> 1707 continue<br></b></p> 1708 </td> 1709 <td align="left"> 1710 <p class="tent"><b><br> 1711 delete<br> 1712 do<br> 1713 else<br></b></p> 1714 </td> 1715 <td align="left"> 1716 <p class="tent"><b><br> 1717 END<br> 1718 exit<br> 1719 for<br></b></p> 1720 </td> 1721 <td align="left"> 1722 <p class="tent"><b><br> 1723 function<br> 1724 getline<br> 1725 if<br></b></p> 1726 </td> 1727 <td align="left"> 1728 <p class="tent"><b><br> 1729 in<br> 1730 next<br> 1731 nextfile<br></b></p> 1732 </td> 1733 <td align="left"> 1734 <p class="tent"><b><br> 1735 print<br> 1736 printf<br> 1737 return<br></b></p> 1738 </td> 1739 <td align="left"> 1740 <p class="tent"><b><br> 1741 while<br></b></p> 1742 </td> 1743 </tr> 1744 </table> 1745 </li> 1746 <li class="tent">The following words are names of built-in functions and shall be recognized as the token <b>BUILTIN_FUNC_NAME</b>: 1747 <table cellpadding="3"> 1748 <tr valign="top"> 1749 <td align="left"> 1750 <p class="tent"><b><br> 1751 atan2<br> 1752 close<br> 1753 cos<br> 1754 exp<br></b></p> 1755 </td> 1756 <td align="left"> 1757 <p class="tent"><b><br> 1758 fflush<br> 1759 gsub<br> 1760 index<br></b></p> 
1761 </td> 1762 <td align="left"> 1763 <p class="tent"><b><br> 1764 int<br> 1765 length<br> 1766 log<br></b></p> 1767 </td> 1768 <td align="left"> 1769 <p class="tent"><b><br> 1770 match<br> 1771 rand<br> 1772 sin<br></b></p> 1773 </td> 1774 <td align="left"> 1775 <p class="tent"><b><br> 1776 split<br> 1777 sprintf<br> 1778 sqrt<br></b></p> 1779 </td> 1780 <td align="left"> 1781 <p class="tent"><b><br> 1782 srand<br> 1783 sub<br> 1784 substr<br></b></p> 1785 </td> 1786 <td align="left"> 1787 <p class="tent"><b><br> 1788 system<br> 1789 tolower<br> 1790 toupper<br></b></p> 1791 </td> 1792 </tr> 1793 </table> 1794 <p class="tent">The above-listed keywords and names of built-in functions are considered reserved words.</p> 1795 </li> 1796 <li class="tent">The token <b>NAME</b> shall consist of a word that is not a keyword or a name of a built-in function and is not 1797 followed immediately (without any delimiters) by the <tt>'('</tt> character.</li> 1798 <li class="tent">The token <b>FUNC_NAME</b> shall consist of a word that is not a keyword or a name of a built-in function, 1799 followed immediately (without any delimiters) by the <tt>'('</tt> character. 
The <tt>'('</tt> character shall not be included as 1800 part of the token.</li> 1801 <li class="tent">The following two-character sequences shall be recognized as the named tokens: 1802 <center> 1803 <table border="1" cellpadding="3" align="center"> 1804 <tr valign="top"> 1805 <th align="center"> 1806 <p class="tent"><b>Token Name</b></p> 1807 </th> 1808 <th align="center"> 1809 <p class="tent"><b>Sequence</b></p> 1810 </th> 1811 <th align="center"> 1812 <p class="tent"><b>Token Name</b></p> 1813 </th> 1814 <th align="center"> 1815 <p class="tent"><b>Sequence</b></p> 1816 </th> 1817 </tr> 1818 <tr valign="top"> 1819 <td align="left"> 1820 <p class="tent"><b>ADD_ASSIGN</b></p> 1821 </td> 1822 <td align="center"> 1823 <p class="tent">+=</p> 1824 </td> 1825 <td align="left"> 1826 <p class="tent"><b>NO_MATCH</b></p> 1827 </td> 1828 <td align="center"> 1829 <p class="tent">!~</p> 1830 </td> 1831 </tr> 1832 <tr valign="top"> 1833 <td align="left"> 1834 <p class="tent"><b>SUB_ASSIGN</b></p> 1835 </td> 1836 <td align="center"> 1837 <p class="tent">-=</p> 1838 </td> 1839 <td align="left"> 1840 <p class="tent"><b>EQ</b></p> 1841 </td> 1842 <td align="center"> 1843 <p class="tent">==</p> 1844 </td> 1845 </tr> 1846 <tr valign="top"> 1847 <td align="left"> 1848 <p class="tent"><b>MUL_ASSIGN</b></p> 1849 </td> 1850 <td align="center"> 1851 <p class="tent">*=</p> 1852 </td> 1853 <td align="left"> 1854 <p class="tent"><b>LE</b></p> 1855 </td> 1856 <td align="center"> 1857 <p class="tent"><=</p> 1858 </td> 1859 </tr> 1860 <tr valign="top"> 1861 <td align="left"> 1862 <p class="tent"><b>DIV_ASSIGN</b></p> 1863 </td> 1864 <td align="center"> 1865 <p class="tent">/=</p> 1866 </td> 1867 <td align="left"> 1868 <p class="tent"><b>GE</b></p> 1869 </td> 1870 <td align="center"> 1871 <p class="tent">>=</p> 1872 </td> 1873 </tr> 1874 <tr valign="top"> 1875 <td align="left"> 1876 <p class="tent"><b>MOD_ASSIGN</b></p> 1877 </td> 1878 <td align="center"> 1879 <p class="tent">%=</p> 1880 </td> 
1881 <td align="left"> 1882 <p class="tent"><b>NE</b></p> 1883 </td> 1884 <td align="center"> 1885 <p class="tent">!=</p> 1886 </td> 1887 </tr> 1888 <tr valign="top"> 1889 <td align="left"> 1890 <p class="tent"><b>POW_ASSIGN</b></p> 1891 </td> 1892 <td align="center"> 1893 <p class="tent">^=</p> 1894 </td> 1895 <td align="left"> 1896 <p class="tent"><b>INCR</b></p> 1897 </td> 1898 <td align="center"> 1899 <p class="tent">++</p> 1900 </td> 1901 </tr> 1902 <tr valign="top"> 1903 <td align="left"> 1904 <p class="tent"><b>OR</b></p> 1905 </td> 1906 <td align="center"> 1907 <p class="tent">||</p> 1908 </td> 1909 <td align="left"> 1910 <p class="tent"><b>DECR</b></p> 1911 </td> 1912 <td align="center"> 1913 <p class="tent">--</p> 1914 </td> 1915 </tr> 1916 <tr valign="top"> 1917 <td align="left"> 1918 <p class="tent"><b>AND</b></p> 1919 </td> 1920 <td align="center"> 1921 <p class="tent">&&</p> 1922 </td> 1923 <td align="left"> 1924 <p class="tent"><b>APPEND</b></p> 1925 </td> 1926 <td align="center"> 1927 <p class="tent">>></p> 1928 </td> 1929 </tr> 1930 </table> 1931 </center> 1932 </li> 1933 <li class="tent">The following single characters shall be recognized as tokens whose names are the character: 1934 <pre> 1935 <tt><newline> { } ( ) [ ] , ; + - * % ^ ! > < | ? : ~ $ = 1936 </tt></pre></li> 1937 </ol> 1938 <p class="tent">There is a lexical ambiguity between the token <b>ERE</b> and the tokens <tt>'/'</tt> and <b>DIV_ASSIGN</b>. When 1939 an input sequence begins with a <slash> character in any syntactic context where the token <tt>'/'</tt> or <b>DIV_ASSIGN</b> 1940 could appear as the next token in a valid program, the longer of those two tokens that can be recognized shall be recognized. 
In 1941 any other syntactic context where the token <b>ERE</b> could appear as the next token in a valid program, the token <b>ERE</b> 1942 shall be recognized.</p> 1943 </blockquote> 1944 <h4 class="mansect"><a name="tag_20_06_14" id="tag_20_06_14"></a>EXIT STATUS</h4> 1945 <blockquote> 1946 <p>The following exit values shall be returned:</p> 1947 <dl compact> 1948 <dd></dd> 1949 <dt> 0</dt> 1950 <dd>All input files were processed successfully.</dd> 1951 <dt>>0</dt> 1952 <dd>An error occurred.</dd> 1953 </dl> 1954 <p class="tent">The exit status can be altered within the program by using an <b>exit</b> expression.</p> 1955 </blockquote> 1956 <h4 class="mansect"><a name="tag_20_06_15" id="tag_20_06_15"></a>CONSEQUENCES OF ERRORS</h4> 1957 <blockquote> 1958 <p>If any <i>file</i> operand is specified and the named file cannot be accessed, <i>awk</i> shall write a diagnostic message to 1959 standard error and terminate without any further action.</p> 1960 <p class="tent">If the program specified by either the <i>program</i> operand or a <i>progfile</i> operand is not a valid 1961 <i>awk</i> program (as specified in the EXTENDED DESCRIPTION section), the behavior is undefined.</p> 1962 </blockquote> 1963 <hr> 1964 <div class="box"><em>The following sections are informative.</em></div> 1965 <h4 class="mansect"><a name="tag_20_06_16" id="tag_20_06_16"></a>APPLICATION USAGE</h4> 1966 <blockquote> 1967 <p>Since <backslash> has a special meaning both in the <i>assignment</i> option-argument to the <b>-v</b> option and in the 1968 <i>assignment</i> operand, applications that need to pass strings to <i>awk</i> without special interpretation of <backslash> 1969 should not use these methods but should instead make use of the <b>ARGV</b> or <b>ENVIRON</b> array.</p> 1970 <p class="tent">The <b>index</b>, <b>length</b>, <b>match</b>, and <b>substr</b> functions should not be confused with similar 1971 functions in the ISO C standard; the <i>awk</i> versions deal with characters, 
while the ISO C standard deals with 1972 bytes.</p> 1973 <p class="tent">Because the concatenation operation is represented by adjacent expressions rather than an explicit operator, it is 1974 often necessary to use parentheses to enforce the proper evaluation precedence.</p> 1975 <p class="tent">When using <i>awk</i> to process pathnames, it is recommended that LC_ALL, or at least LC_CTYPE and LC_COLLATE, are 1976 set to POSIX or C in the environment, since pathnames can contain byte sequences that do not form valid characters in some locales, 1977 in which case the utility's behavior would be undefined. In the POSIX locale each byte is a valid single-byte character, and 1978 therefore this problem is avoided.</p> 1979 <p class="tent">Since the <tt>"=="</tt> operator checks if strings are identical, not whether they collate equally, applications 1980 needing to check whether strings collate equally can use:</p> 1981 <pre> 1982 <tt>a <= b && a >= b 1983 </tt></pre> 1984 <p class="tent">To specify a <i>file</i> operand naming a file with a name containing an <equals-sign>, users can use 1985 <tt>"./"</tt> as the first two characters of a relative file pathname that starts with an <underscore> or an alphabetic 1986 character to keep the <i>file</i> operand from being interpreted as an <i>assignment</i> operand. Similarly, <tt>"./-"</tt> can be 1987 used to access a file named <tt>'-'</tt> in the current directory rather than use standard input.</p> 1988 </blockquote> 1989 <h4 class="mansect"><a name="tag_20_06_17" id="tag_20_06_17"></a>EXAMPLES</h4> 1990 <blockquote> 1991 <p>The <i>awk</i> program specified in the command line is most easily specified within single-quotes (for example, 1992 '<i>program</i>') for applications using <a href="../utilities/sh.html"><i>sh</i></a>, because <i>awk</i> programs commonly contain 1993 characters that are special to the shell, including double-quotes. 
In the cases where an <i>awk</i> program contains single-quote 1994 characters, it is usually easiest to specify most of the program as strings within single-quotes concatenated by the shell with 1995 quoted single-quote characters. For example:</p> 1996 <pre> 1997 <tt>awk '/'\''/ { print "quote:", $0 }' 1998 </tt></pre> 1999 <p class="tent">prints all lines from the standard input containing a single-quote character, prefixed with <i>quote</i>:.</p> 2000 <p class="tent">The following are examples of simple <i>awk</i> programs:</p> 2001 <ol> 2002 <li class="tent">Write to the standard output all input lines for which field 3 is greater than 5: 2003 <pre> 2004 <tt>$3 > 5 2005 </tt></pre></li> 2006 <li class="tent">Write every tenth line: 2007 <pre> 2008 <tt>(NR % 10) == 0 2009 </tt></pre></li> 2010 <li class="tent">Write any line with a substring matching the regular expression: 2011 <pre> 2012 <tt>/(G|D)(2[0-9][[:alpha:]]*)/ 2013 </tt></pre></li> 2014 <li class="tent">Print any line with a substring containing a <tt>'G'</tt> or <tt>'D'</tt>, followed by a sequence of digits and 2015 characters. This example uses character classes <b>digit</b> and <b>alpha</b> to match language-independent digit and alphabetic 2016 characters respectively: 2017 <pre> 2018 <tt>/(G|D)([[:digit:][:alpha:]]*)/ 2019 </tt></pre></li> 2020 <li class="tent">Write any line in which the second field matches the regular expression and the fourth field does not: 2021 <pre> 2022 <tt>$2 ~ /xyz/ && $4 !~ /xyz/ 2023 </tt></pre></li> 2024 <li class="tent">Write any line in which the second field contains a <backslash>: 2025 <pre> 2026 <tt>$2 ~ /\\/ 2027 </tt></pre></li> 2028 <li class="tent">Write any line in which the second field contains a <backslash>. 
Note that <backslash>-escapes are 2029 interpreted twice; once in lexical processing of the string and once in processing the regular expression: 2030 <pre> 2031 <tt>$2 ~ "\\\\" 2032 </tt></pre></li> 2033 <li class="tent">Write the second to the last and the last field in each line. Separate the fields by a <colon>: 2034 <pre> 2035 <tt>{OFS=":";print $(NF-1), $NF} 2036 </tt></pre></li> 2037 <li class="tent">Write the line number and number of fields in each line. The three strings representing the line number, the 2038 <colon>, and the number of fields are concatenated and that string is written to standard output: 2039 <pre> 2040 <tt>{print NR ":" NF} 2041 </tt></pre></li> 2042 <li class="tent">Write lines longer than 72 characters: 2043 <pre> 2044 <tt>length($0) > 72 2045 </tt></pre></li> 2046 <li class="tent">Write the first two fields in opposite order separated by <b>OFS</b>: 2047 <pre> 2048 <tt>{ print $2, $1 } 2049 </tt></pre></li> 2050 <li class="tent">Same, with input fields separated by a <comma> or <space> and <tab> characters, or both: 2051 <pre> 2052 <tt>BEGIN { FS = ",[ \t]*|[ \t]+" } 2053 { print $2, $1 } 2054 </tt></pre></li> 2055 <li class="tent">Add up the first column, print sum, and average: 2056 <pre> 2057 <tt> {s += $1 } 2058 END {print "sum is ", s, " average is", s/NR} 2059 </tt></pre></li> 2060 <li class="tent">Write fields in reverse order, one per line (many lines out for each line in): 2061 <pre> 2062 <tt>{ for (i = NF; i > 0; --i) print $i } 2063 </tt></pre></li> 2064 <li class="tent">Write all lines between occurrences of the strings <b>start</b> and <b>stop</b>: 2065 <pre> 2066 <tt>/start/, /stop/ 2067 </tt></pre></li> 2068 <li class="tent">Write all lines whose first field is different from the previous one: 2069 <pre> 2070 <tt>$1 != prev { print; prev = $1 } 2071 </tt></pre></li> 2072 <li class="tent">Simulate <a href="../utilities/echo.html"><i>echo</i></a>: 2073 <pre> 2074 <tt>BEGIN { 2075 for (i = 1; i < ARGC; ++i) 2076 
printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ") 2077 } 2078 </tt></pre></li> 2079 <li class="tent">Write the path prefixes contained in the <i>PATH</i> environment variable, one per line: 2080 <pre> 2081 <tt>BEGIN { 2082 n = split (ENVIRON["PATH"], path, ":") 2083 for (i = 1; i <= n; ++i) 2084 print path[i] 2085 } 2086 </tt></pre></li> 2087 <li class="tent">If there is a file named <b>input</b> containing page headers of the form: Page # 2088 <p class="tent">and a file named <b>program</b> that contains:</p> 2089 <pre> 2090 <tt>/Page/ { $2 = n++; } 2091 { print } 2092 </tt></pre> 2093 then the command line: 2094 <pre> 2095 <tt>awk -f program n=5 input 2096 </tt></pre> 2097 <p class="tent">prints the file <b>input</b>, filling in page numbers starting at 5.</p> 2098 </li> 2099 </ol> 2100 </blockquote> 2101 <h4 class="mansect"><a name="tag_20_06_18" id="tag_20_06_18"></a>RATIONALE</h4> 2102 <blockquote> 2103 <p>This description is based on the new <i>awk</i>, "nawk", (see the referenced <i>The AWK Programming Language</i>), which 2104 introduced a number of new features to the historical <i>awk</i>:</p> 2105 <ol> 2106 <li class="tent">New keywords: <b>delete</b>, <b>do</b>, <b>function</b>, <b>return</b></li> 2107 <li class="tent">New built-in functions: <b>atan2</b>, <b>close</b>, <b>cos</b>, <b>gsub</b>, <b>match</b>, <b>rand</b>, 2108 <b>sin</b>, <b>srand</b>, <b>sub</b>, <b>system</b></li> 2109 <li class="tent">New predefined variables: <b>FNR</b>, <b>ARGC</b>, <b>ARGV</b>, <b>RSTART</b>, <b>RLENGTH</b>, <b>SUBSEP</b></li> 2110 <li class="tent">New expression operators: <b>?</b>, <b>:</b>, <b>,</b>, <b>^</b></li> 2111 <li class="tent">The <b>FS</b> variable and the third argument to <b>split</b>, now treated as extended regular expressions.</li> 2112 <li class="tent">The operator precedence, changed to more closely match the C language. Two examples of code that operate 2113 differently are: 2114 <pre> 2115 <tt>while ( n /= 10 > 1) ... 2116 if (!"wk" ~ /bwk/) ... 
2117 </tt></pre></li> 2118 </ol> 2119 <p class="tent">Several features have been added based on newer implementations of <i>awk</i>:</p> 2120 <ul> 2121 <li class="tent">Multiple instances of <b>-f</b> <i>progfile</i> are permitted.</li> 2122 <li class="tent">The new option <b>-v</b> <i>assignment.</i></li> 2123 <li class="tent">The new predefined variable <b>ENVIRON</b>.</li> 2124 <li class="tent">New built-in functions <b>toupper</b> and <b>tolower</b>.</li> 2125 <li class="tent">More formatting capabilities are added to <b>printf</b> to match the ISO C standard.</li> 2126 </ul> 2127 <p class="tent">Earlier versions of this standard required implementations to support multiple adjacent <semicolon>s, lines 2128 with one or more <semicolon> before a rule (<i>pattern-action</i> pairs), and lines with only <semicolon>(s). These are 2129 not required by this standard and are considered poor programming practice, but can be accepted by an implementation of <i>awk</i> 2130 as an extension.</p> 2131 <p class="tent">The overall <i>awk</i> syntax has always been based on the C language, with a few features from the shell command 2132 language and other sources. Because of this, it is not completely compatible with any other language, which has caused confusion 2133 for some users. It is not the intent of the standard developers to address such issues. A few relatively minor changes toward 2134 making the language more compatible with the ISO C standard were made; most of these changes are based on similar changes in 2135 recent implementations, as described above. There remain several C-language conventions that are not in <i>awk</i>. One of the 2136 notable ones is the <comma> operator, which is commonly used to specify multiple expressions in the C language <b>for</b> 2137 statement. Also, there are various places where <i>awk</i> is more restrictive than the C language regarding the type of expression 2138 that can be used in a given context. 
These limitations are due to the different features that the <i>awk</i> language does 2139 provide.</p> 2140 <p class="tent">Regular expressions in <i>awk</i> have been extended somewhat from historical implementations to make them a pure 2141 superset of extended regular expressions, as defined by POSIX.1-2024 (see XBD <a href="../basedefs/V1_chap09.html#tag_09_04"><i>9.4 2142 Extended Regular Expressions</i></a> ). The main extensions are internationalization features and interval expressions. Historical 2143 implementations of <i>awk</i> have long supported <backslash>-escape sequences as an extension to extended regular 2144 expressions, and this extension has been retained despite inconsistency with other utilities. The number of escape sequences 2145 recognized in both extended regular expressions and strings has varied (generally increasing with time) among implementations. The 2146 set specified by POSIX.1-2024 includes most sequences known to be supported by popular implementations and by the ISO C 2147 standard. One sequence that is not supported is hexadecimal value escapes beginning with <tt>'\x'</tt>. This would allow values 2148 expressed in more than 9 bits to be used within <i>awk</i> as in the ISO C standard. However, because this syntax has a 2149 non-deterministic length, it does not permit the subsequent character to be a hexadecimal digit. This limitation can be dealt with 2150 in the C language by the use of lexical string concatenation. In the <i>awk</i> language, concatenation could also be a solution 2151 for strings, but not for extended regular expressions (either lexical ERE tokens or strings used dynamically as regular 2152 expressions). 
Because of this limitation, the feature has not been added to POSIX.1-2024.</p> 2153 <p class="tent">When a string variable is used in a context where an extended regular expression normally appears (where the 2154 lexical token ERE is used in the grammar) the string does not contain the literal <slash> characters.</p> 2155 <p class="tent">Some versions of <i>awk</i> allow the form:</p> 2156 <pre> 2157 <tt>func name(args, ... ) { statements } 2158 </tt></pre> 2159 <p class="tent">This has been deprecated by the authors of the language, who asked that it not be specified.</p> 2160 <p class="tent">Historical implementations of <i>awk</i> produce an error if a <b>next</b> statement is executed in a <b>BEGIN</b> 2161 action, and cause <i>awk</i> to terminate if a <b>next</b> statement is executed in an <b>END</b> action. This behavior has not 2162 been documented, and it was not believed that it was necessary to standardize it.</p> 2163 <p class="tent">The specification of conversions between string and numeric values is much more detailed than in the documentation 2164 of historical implementations or in the referenced <i>The AWK Programming Language</i>. Although most of the behavior is designed 2165 to be intuitive, the details are necessary to ensure compatible behavior from different implementations. This is especially 2166 important in relational expressions since the types of the operands determine whether a string or numeric comparison is performed. 2167 From the perspective of an application developer, it is usually sufficient to expect intuitive behavior and to force conversions 2168 (by adding zero or concatenating a null string) when the type of an expression does not obviously match what is needed. The intent 2169 has been to specify historical practice in almost all cases. The one exception is that, in historical implementations, variables 2170 and constants maintain both string and numeric values after their original value is converted by any use. 
This means that
referencing a variable or constant can have unexpected side-effects. For example, with historical implementations the following
program:</p>
<pre>
<tt>{
    a = "+2"
    b = 2
    if (NR % 2)
        c = a + b
    if (a == b)
        print "numeric comparison"
    else
        print "string comparison"
}
</tt></pre>
<p class="tent">would perform a numeric comparison (and output numeric comparison) for each odd-numbered line, but perform a string
comparison (and output string comparison) for each even-numbered line. POSIX.1-2024 ensures that comparisons will be numeric if
necessary. With historical implementations, the following program:</p>
<pre>
<tt>BEGIN {
    OFMT = "%e"
    print 3.14
    OFMT = "%f"
    print 3.14
}
</tt></pre>
<p class="tent">would output <tt>"3.140000e+00"</tt> twice, because in the second <b>print</b> statement the constant
<tt>"3.14"</tt> would have a string value from the previous conversion. POSIX.1-2024 requires that the output of the second
<b>print</b> statement be <tt>"3.140000"</tt>. The behavior of historical implementations was seen as too unintuitive and
unpredictable.</p>
<p class="tent">It was pointed out that with the rules contained in early drafts, the following script would print nothing:</p>
<pre>
<tt>BEGIN {
    y[1.5] = 1
    OFMT = "%e"
    print y[1.5]
}
</tt></pre>
<p class="tent">Therefore, a new variable, <b>CONVFMT</b>, was introduced. The <b>OFMT</b> variable is now restricted to affecting
output conversions of numbers to strings and <b>CONVFMT</b> is used for internal conversions, such as comparisons or array
indexing.
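</p>
<p class="tent">The split between <b>OFMT</b> and <b>CONVFMT</b> can be observed from the shell. The following is a minimal sketch rather than part of the standard text: it assumes a POSIX-conforming <i>awk</i> is available as <tt>awk</tt>, and the shell variable names <tt>conv_out</tt> and <tt>index_out</tt> are hypothetical, chosen only for illustration.</p>

```shell
# CONVFMT governs number-to-string conversion inside expressions;
# OFMT governs how print formats a non-integer number.
conv_out=$(awk 'BEGIN {
    CONVFMT = "%.2g"
    OFMT = "%.4f"
    x = 3.14159
    s = x ""     # concatenation converts x to a string using CONVFMT
    print s      # s is already a string, so it is written unchanged
    print x      # x is still a number, so print formats it with OFMT
}')
printf '%s\n' "$conv_out"

# The subscript 1.5 is converted with CONVFMT, so changing OFMT
# between the store and the lookup does not change the array key.
index_out=$(awk 'BEGIN {
    y[1.5] = 1
    OFMT = "%e"
    print y[1.5]
}')
printf '%s\n' "$index_out"
```

<p class="tent">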
The default value of <b>CONVFMT</b> is the same as that for <b>OFMT</b>, so unless a program changes <b>CONVFMT</b> (which no historical
program would do), it will receive the historical behavior associated with internal string conversions.</p>
<p class="tent">The POSIX <i>awk</i> lexical and syntactic conventions are specified more formally than in other sources. Again, the
intent has been to specify historical practice. One convention that may not be as obvious from the formal grammar as in other verbal
descriptions is where <newline> characters are acceptable. There are several obvious placements such as terminating a
statement, and a <backslash> can be used to escape <newline> characters between any lexical tokens. In addition,
<newline> characters without <backslash> characters can follow a comma, an open brace, a logical AND operator
(<tt>"&&"</tt>), a logical OR operator (<tt>"||"</tt>), the <b>do</b> keyword, the <b>else</b> keyword, and the closing
parenthesis of an <b>if</b>, <b>for</b>, or <b>while</b> statement. For example:</p>
<pre>
<tt>{ print $1,
        $2 }
</tt></pre>
<p class="tent">The requirement that <i>awk</i> add a trailing <newline> to the program argument text is to simplify the
grammar, making it match a text file in form. There is no way for an application or test suite to determine whether a literal
<newline> is added or whether <i>awk</i> simply acts as if it did.</p>
<p class="tent">POSIX.1-2024 requires several changes from historical implementations in order to support internationalization.
Probably the most subtle of these is the use of the decimal-point character, defined by the <i>LC_NUMERIC</i> category of the
locale, in representations of floating-point numbers. This locale-specific character is used in recognizing numeric input, in
converting between strings and numeric values, and in formatting output.
However, regardless of locale, the <period> 2230 character (the decimal-point character of the POSIX locale) is the decimal-point character recognized in processing <i>awk</i> 2231 programs (including assignments in command line arguments). This is essentially the same convention as the one used in the 2232 ISO C standard. The difference is that the C language includes the <a href= 2233 "../functions/setlocale.html"><i>setlocale</i>()</a> function, which permits an application to modify its locale. Because of this 2234 capability, a C application begins executing with its locale set to the C locale, and only executes in the environment-specified 2235 locale after an explicit call to <a href="../functions/setlocale.html"><i>setlocale</i>()</a>. However, adding such an elaborate 2236 new feature to the <i>awk</i> language was seen as inappropriate for POSIX.1-2024. It is possible to execute an <i>awk</i> program 2237 explicitly in any desired locale by setting the environment in the shell.</p> 2238 <p class="tent">The undefined behavior resulting from NULs in extended regular expressions allows future extensions for the GNU 2239 <i>gawk</i> program to process binary data.</p> 2240 <p class="tent">The behavior in the case of invalid <i>awk</i> programs (including lexical, syntactic, and semantic errors) is 2241 undefined because it was considered overly limiting on implementations to specify. In most cases such errors can be expected to 2242 produce a diagnostic and a non-zero exit status. However, some implementations may choose to extend the language in ways that make 2243 use of certain invalid constructs. Other invalid constructs might be deemed worthy of a warning, but otherwise cause some 2244 reasonable behavior. Still other constructs may be very difficult to detect in some implementations. 
Also, different 2245 implementations might detect a given error during an initial parsing of the program (before reading any input files) while others 2246 might detect it when executing the program after reading some input. Implementors should be aware that diagnosing errors as early 2247 as possible and producing useful diagnostics can ease debugging of applications, and thus make an implementation more usable.</p> 2248 <p class="tent">The unspecified behavior from using multi-character <b>RS</b> values is to allow possible future extensions based 2249 on extended regular expressions used for record separators. Historical implementations take the first character of the string and 2250 ignore the others.</p> 2251 <p class="tent">Unspecified behavior when <a href= 2252 "../utilities/split.html"><i>split</i></a>(<i>string</i>,<i>array</i>,<null>) is used is to allow a proposed future extension 2253 that would split up a string into an array of individual characters.</p> 2254 <p class="tent">In the context of the <b>getline</b> function, equally good arguments for different precedences of the <b>|</b> and 2255 <b><</b> operators can be made. Historical practice has been that:</p> 2256 <pre> 2257 <tt>getline < "a" "b" 2258 </tt></pre> 2259 <p class="tent">is parsed as:</p> 2260 <pre> 2261 <tt>( getline < "a" ) "b" 2262 </tt></pre> 2263 <p class="tent">although many would argue that the intent was that the file <b>ab</b> should be read. However:</p> 2264 <pre> 2265 <tt>getline < "x" + 1 2266 </tt></pre> 2267 <p class="tent">parses as:</p> 2268 <pre> 2269 <tt>getline < ( "x" + 1 ) 2270 </tt></pre> 2271 <p class="tent">Similar problems occur with the <b>|</b> version of <b>getline</b>, particularly in combination with <b>$</b>. 
For
example:</p>
<pre>
<tt>$"echo hi" | getline
</tt></pre>
<p class="tent">(This situation is particularly problematic when used in a <b>print</b> statement, where the <b>|getline</b> part
might be a redirection of the <b>print</b>.)</p>
<p class="tent">Since in most cases such constructs are not used (or at least should not be used), because they have a natural ambiguity
for which there is no conventional parsing, the meaning of these constructs has been made explicitly unspecified. (The effect is
that a conforming application that runs into the problem must parenthesize to resolve the ambiguity.) There appeared to be few, if
any, actual uses of such constructs.</p>
<p class="tent">Grammars can be written that would cause an error under these circumstances. Where backwards-compatibility is not a
large consideration, implementors may wish to use such grammars.</p>
<p class="tent">Some historical implementations have allowed some built-in functions to be called without an argument list, the
result being a default argument list chosen in some "reasonable" way. Use of <b>length</b> as a synonym for <b>length($0)</b> is
the only one of these forms that is thought to be widely known or widely used; this particular form is documented in various places
(for example, most historical <i>awk</i> reference pages, although not in the referenced <i>The AWK Programming Language</i>) as
legitimate practice. With this exception, default argument lists have always been undocumented and vaguely defined, and it is not
at all clear how (or if) they should be generalized to user-defined functions. They add no useful functionality and preclude
possible future extensions that might need to name functions without calling them. Not standardizing them seems the simplest
course.
The standard developers considered that <b>length</b> merited special treatment, however, since it has been documented in
the past and sees possibly substantial use in historical programs. Accordingly, this usage has been made legitimate:
Issue 5 removed the obsolescent marking for XSI-conforming implementations, and many otherwise conforming applications depend
on this feature.</p>
<p class="tent">In <b>sub</b> and <b>gsub</b>, if <i>repl</i> is a string literal (the lexical token <b>STRING</b>), then two
consecutive &lt;backslash&gt; characters should be used in the string to ensure a single &lt;backslash&gt; will precede the
&lt;ampersand&gt; when the resultant string is passed to the function. (For example, to specify one literal &lt;ampersand&gt; in
the replacement string, use <b>gsub</b>(<b>ERE</b>, <tt>"\\&amp;"</tt>).)</p>
<p class="tent">Historically, the only special character in the <i>repl</i> argument of the <b>sub</b> and <b>gsub</b> string functions
was the &lt;ampersand&gt; (<tt>'&amp;'</tt>) character, and preceding it with the &lt;backslash&gt; character was used to turn off
its special meaning.</p>
<p class="tent">The description in the ISO POSIX-2:1993 standard introduced behavior such that the &lt;backslash&gt; character
was another special character and it was unspecified whether there were any other special characters. This description introduced
several portability problems, some of which are described below, and so it has been replaced with the more historical description.
Some of the problems include:</p>
<ul>
<li class="tent">Historically, to create the replacement string, a script could use <b>gsub</b>(<b>ERE</b>, <tt>"\\&amp;"</tt>),
but with the ISO POSIX-2:1993 standard wording, it was necessary to use <b>gsub</b>(<b>ERE</b>, <tt>"\\\\&amp;"</tt>).
The
&lt;backslash&gt; characters are doubled here because all string literals are subject to lexical analysis, which would reduce each
pair of &lt;backslash&gt; characters to a single &lt;backslash&gt; before being passed to <b>gsub</b>.</li>
<li class="tent">Since it was unspecified what the special characters were, for portable scripts to guarantee that characters are
printed literally, each character had to be preceded with a &lt;backslash&gt;. (For example, a portable script had to use
<b>gsub</b>(<b>ERE</b>, <tt>"\\h\\i"</tt>) to produce a replacement string of <tt>"hi"</tt>.)</li>
</ul>
<p class="tent">The description of comparisons in the ISO POSIX-2:1993 standard did not properly describe historical practice
because of the way numeric strings are compared as numbers. The rules in that standard cause the following code:</p>
<pre>
<tt>if (0 == "000")
    print "strange, but true"
else
    print "not true"
</tt></pre>
<p class="tent">to do a numeric comparison, causing the <b>if</b> to succeed. It should be intuitively obvious that this is
incorrect behavior, and indeed, no historical implementation of <i>awk</i> actually behaves this way.</p>
<p class="tent">To fix this problem, the definition of <i>numeric string</i> was enhanced to include only those values obtained
from specific circumstances (mostly external sources) where it is not possible to determine unambiguously whether the value is
intended to be a string or a number.</p>
<p class="tent">Variables that are assigned a numeric string shall also be treated as a numeric string. (That is, the notion
of a numeric string can be propagated across assignments.) In comparisons, all variables having the uninitialized value are to be
treated as a numeric operand evaluating to the numeric value zero.</p>
<p class="tent">Uninitialized variables include all types of variables including scalars, array elements, and fields.
The
definition of an uninitialized value in <a href="#tag_20_06_13_03">Variables and Special Variables</a> is necessary to describe the
value placed on uninitialized variables and on fields that are valid (for example, <b>&lt;</b> <b>$NF</b>) but have no characters
in them and to describe how these variables are to be used in comparisons. A valid field, such as <b>$1</b>, that has no characters
in it can be obtained from an input line of <tt>"\t\t"</tt> when <b>FS=</b><tt>'\t'</tt>. Historically, the comparison
(<b>$1&lt;</b>10) was done numerically after evaluating <b>$1</b> to the value zero.</p>
<p class="tent">The phrase "... also shall have the numeric value of the numeric string" was removed from several sections of the
ISO POSIX-2:1993 standard because it specifies an unnecessary implementation detail. It is not necessary for POSIX.1-2024 to
specify that these objects be assigned two different values. It is only necessary to specify that these objects may evaluate to two
different values depending on context.</p>
<p class="tent">Historical implementations of <i>awk</i> did not parse hexadecimal integer or floating constants like
<tt>"0xa"</tt> and <tt>"0xap0"</tt>. Due to an oversight, the 2001 through 2004 editions of this standard required support for
hexadecimal floating constants. This was due to the reference to <a href="../functions/atof.html"><i>atof</i>()</a>. This version
of the standard allows but does not require implementations to use <a href="../functions/atof.html"><i>atof</i>()</a> and includes
a description of how floating-point numbers are recognized as an alternative to match historic behavior.
The intent of this change
is to allow implementations to recognize floating-point constants according to either the ISO/IEC 9899:1990 standard or
ISO/IEC 9899:1999 standard, and to allow (but not require) implementations to recognize hexadecimal integer constants.</p>
<p class="tent">Historical implementations of <i>awk</i> did not support floating-point infinities and NaNs in <i>numeric
strings</i>; e.g., <tt>"-INF"</tt> and <tt>"NaN"</tt>. However, implementations that use the <a href=
"../functions/atof.html"><i>atof</i>()</a> or <a href="../functions/strtod.html"><i>strtod</i>()</a> functions to do the conversion
picked up support for these values if they used an ISO/IEC 9899:1999 standard version of the function instead of an
ISO/IEC 9899:1990 standard version. Due to an oversight, the 2001 through 2004 editions of this standard did not allow support
for infinities and NaNs, but in this revision support is allowed (but not required). This is a silent change to the behavior of
<i>awk</i> programs; for example, in the POSIX locale the expression:</p>
<pre>
<tt>("-INF" + 0 &lt; 0)
</tt></pre>
<p class="tent">formerly had the value 0 because <tt>"-INF"</tt> converted to 0, but now it may have the value 0 or 1.</p>
<p class="tent">Deleting all elements of an array one element at a time, via:</p>
<pre>
<tt>for (i in array)
    delete array[i]
</tt></pre>
<p class="tent">is usually not efficient. This standard requires <tt>delete array</tt> to have the same effects, and this was
supported in most implementations as a more efficient operation.
It is also possible to use <tt>split("", array)</tt> to achieve
the same effect and efficiency.</p>
</blockquote>
<h4 class="mansect"><a name="tag_20_06_19" id="tag_20_06_19"></a>FUTURE DIRECTIONS</h4>
<blockquote>
<p>If this utility is directed to create a new directory entry that contains any bytes that have the encoded value of a
&lt;newline&gt; character, implementations are encouraged to treat this as an error. A future version of this standard may require
implementations to treat this as an error.</p>
<p class="tent">A future version of this standard may require <b>srand</b> to accept any numeric value and calculate the seed by
taking the provided value, converting it to an integer, and calculating the integer value modulo
2<sup><small><i>n</i></small></sup> where <i>n</i> is an implementation-defined value greater than or equal to 32.</p>
<p class="tent">A future version of this standard may require the initial seed for the <b>rand</b> function (the seed value used if
<b>srand</b> is not called) to be an integer between 0 and 2<sup><small><i>n</i></small></sup>-1 inclusive where <i>n</i> is an
implementation-defined value greater than or equal to 32. Additionally, the initial seed value may be required to be a
(pseudo-)random value such that two invocations of <i>awk</i> are unlikely to emit the same sequence of random values (unless the
seed is explicitly set to the same value via <b>srand</b>).</p>
<p class="tent">A future version of this standard may define a new <b>posix_srand</b> function that enables application authors to
set the seed to a (pseudo-)random value generated by the system.
Alternatively, the specification of the <b>srand</b> function may
be altered to provide some means to set the default seed value to a (pseudo-)random value.</p>
</blockquote>
<h4 class="mansect"><a name="tag_20_06_20" id="tag_20_06_20"></a>SEE ALSO</h4>
<blockquote>
<p><a href="../utilities/V3_chap01.html#tag_18_03"><i>1.3 Grammar Conventions</i></a> , <a href=
"../utilities/grep.html#"><i>grep</i></a> , <a href="../utilities/lex.html#"><i>lex</i></a> , <a href=
"../utilities/sed.html#"><i>sed</i></a></p>
<p class="tent">XBD <a href="../basedefs/V1_chap05.html#tag_05"><i>5. File Format Notation</i></a> , <a href=
"../basedefs/V1_chap06.html#tag_06_01"><i>6.1 Portable Character Set</i></a> , <a href="../basedefs/V1_chap08.html#tag_08"><i>8.
Environment Variables</i></a> , <a href="../basedefs/V1_chap09.html#tag_09"><i>9. Regular Expressions</i></a> , <a href=
"../basedefs/V1_chap12.html#tag_12_02"><i>12.2 Utility Syntax Guidelines</i></a></p>
<p class="tent">XSH <a href="../functions/atof.html#"><i>atof</i></a> , <a href="../functions/exec.html#tag_17_129"><i>exec</i></a>
, <a href="../functions/isspace.html#"><i>isspace</i></a> , <a href="../functions/popen.html#"><i>popen</i></a> , <a href=
"../functions/setlocale.html#"><i>setlocale</i></a> , <a href="../functions/strtod.html#"><i>strtod</i></a></p>
</blockquote>
<h4 class="mansect"><a name="tag_20_06_21" id="tag_20_06_21"></a>CHANGE HISTORY</h4>
<blockquote>
<p>First released in Issue 2.</p>
</blockquote>
<h4 class="mansect"><a name="tag_20_06_22" id="tag_20_06_22"></a>Issue 5</h4>
<blockquote>
<p>The FUTURE DIRECTIONS section is added.</p>
</blockquote>
<h4 class="mansect"><a name="tag_20_06_23" id="tag_20_06_23"></a>Issue 6</h4>
<blockquote>
<p>The <i>awk</i> utility is aligned with the IEEE P1003.2b draft standard.</p>
<p class="tent">The normative text is reworded to avoid
use of the term "must" for application requirements.<br></p>
<p class="tent">IEEE PASC Interpretation 1003.2 #211 is applied, adding the sentence "An occurrence of two consecutive
&lt;backslash&gt; characters shall be interpreted as just a single literal &lt;backslash&gt; character." into the description of
the <b>sub</b> string function.</p>
</blockquote>
<h4 class="mansect"><a name="tag_20_06_24" id="tag_20_06_24"></a>Issue 7</h4>
<blockquote>
<p>PASC Interpretation 1003.2-1992 #107 (SD5-XCU-ERN-73) is applied, updating the description of the <b>OFS</b> variable.</p>
<p class="tent">Austin Group Interpretation 1003.1-2001 #189 is applied.</p>
<p class="tent">Austin Group Interpretation 1003.1-2001 #201 is applied, permitting implementations to support infinities and
NaNs.</p>
<p class="tent">SD5-XCU-ERN-79 is applied, restoring the horizontal lines to <a href="#tagtcjh_14">Expressions in Decreasing
Precedence in awk</a> , and SD5-XCU-ERN-80 is applied, changing the order of some table entries.</p>
<p class="tent">SD5-XCU-ERN-87 is applied, updating the descriptive text of the Grammar.</p>
<p class="tent">SD5-XCU-ERN-97 is applied, updating the SYNOPSIS.</p>
<p class="tent">The EXTENDED DESCRIPTION is changed to make the support of hexadecimal integer and floating constants optional.</p>
<p class="tent">POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0057 [224], XCU/TC1-2008/0058 [454], XCU/TC1-2008/0059 [224],
XCU/TC1-2008/0060 [224], XCU/TC1-2008/0061 [254], XCU/TC1-2008/0062 [254], XCU/TC1-2008/0063 [224], and XCU/TC1-2008/0064 [454] are
applied.</p>
<p class="tent">POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0058 [584], XCU/TC2-2008/0059 [963], XCU/TC2-2008/0060 [226],
XCU/TC2-2008/0061 [663], XCU/TC2-2008/0062 [963], XCU/TC2-2008/0063 [226], and XCU/TC2-2008/0064 [963] are applied.</p>
</blockquote>
<h4 class="mansect"><a name="tag_20_06_25"
id="tag_20_06_25"></a>Issue 8</h4>
<blockquote>
<p>Austin Group Defect 251 is applied, encouraging implementations to disallow the creation of filenames containing any bytes that
have the encoded value of a &lt;newline&gt; character.</p>
<p class="tent">Austin Group Defects 544 and 1136 are applied, requiring implementations to accept the <b>delete</b> statement with
an unsubscripted array name.</p>
<p class="tent">Austin Group Defect 607 is applied, adding the <b>nextfile</b> statement.</p>
<p class="tent">Austin Group Defect 634 is applied, adding the <b>fflush</b> function.</p>
<p class="tent">Austin Group Defects 974 and 1451 are applied, clarifying the <b>ARGC</b>, <b>ARGV</b> and <b>FILENAME</b>
variables, and adding to APPLICATION USAGE.</p>
<p class="tent">Austin Group Defect 983 is applied, changing the descriptions of the <b>rand</b> and <b>srand</b> functions and the
FUTURE DIRECTIONS section.</p>
<p class="tent">Austin Group Defect 1070 is applied, requiring the <tt>"!="</tt> and <tt>"=="</tt> operators to perform string
comparisons by checking if the strings are identical (and not by checking if they collate equally).</p>
<p class="tent">Austin Group Defect 1105 is applied, clarifying the requirements for &lt;backslash&gt; escaping.</p>
<p class="tent">Austin Group Defect 1122 is applied, changing the description of <i>NLSPATH</i>.</p>
<p class="tent">Austin Group Defect 1198 is applied, requiring comparisons to be performed numerically when both operands have
string values that are numeric strings.</p>
<p class="tent">Austin Group Defect 1277 is applied, clarifying that using a &lt;slash&gt; character within an ERE requires
escaping only if it is within the lexical token <b>ERE</b>.</p>
<p class="tent">Austin Group Defect 1320 is applied, clarifying the condition under which ERE matching is against input
records.</p>
<p class="tent">Austin Group Defect 1395 is
applied, changing the requirements for string to number conversion.</p>
<p class="tent">Austin Group Defect 1468 is applied, clarifying the behavior when <b>FS</b> is an ERE that can match the null
string.</p>
<p class="tent">Austin Group Defect 1566 is applied, specifying the behavior of the <b>length</b> function when passed an array
argument.</p>
</blockquote>
<div class="box"><em>End of informative text.</em></div>
<hr>
<p>&nbsp;</p>
<a href="#top"><span class="topOfPage">return to top of page</span></a><br>
<hr size="2" noshade>
<center><font size="2">UNIX® is a registered Trademark of The Open Group.<br>
POSIX™ is a Trademark of The IEEE.<br>
Copyright © 2001-2024 The IEEE and The Open Group, All Rights Reserved<br>
[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href=
"../functions/contents.html">XSH</a> | <a href="../utilities/contents.html">XCU</a> | <a href="../xrat/contents.html">XRAT</a>
]</font></center>
<hr size="2" noshade>
<div class="NAVHEADER">
<table summary="Header navigation table" class="nav" width="100%" border="0" cellpadding="0" cellspacing="0">
<tr class="nav">
<td class="nav" width="15%" align="left" valign="bottom"><a href="../utilities/at.html" accesskey="P">&lt;&lt;&lt;
Previous</a></td>
<td class="nav" width="70%" align="center" valign="bottom"><a href="contents.html">Home</a></td>
<td class="nav" width="15%" align="right" valign="bottom"><a href="../utilities/basename.html" accesskey="N">Next
&gt;&gt;&gt;</a></td>
</tr>
</table>
<hr align="left" width="100%"></div>
</body>
</html>