Every bug/quirk of the Windows resource compiler (rc.exe), probably


Programming
Fuzzing

  • 7 NOT NOT 4 NOT 2 NOT NOT 1 is a valid expression
  • ००० is a number that gets parsed into the decimal value 65130
  • A < 1 MiB icon file can get compiled into 127 TiB of data

The above is just a small sampling of a few of the strange behaviors of the Windows RC compiler (rc.exe). All of the above bugs/quirks, and many, many more, will be detailed and explained (to the best of my ability) in this post.

Inspired by an accepted proposal for Zig to include support for compiling Windows resource script (.rc) files, I set out on what I thought at the time would be a fairly straightforward side-project of writing a Windows resource compiler in Zig. Microsoft’s RC compiler (rc.exe) is closed source, but alternative implementations are nothing new: there are multiple existing projects that tackle the same goal of an open source and cross-platform Windows resource compiler (in particular, windres and llvm-rc). I figured that I could use them as a reference, and that the syntax of .rc files didn’t look too complicated.

I was wrong on both counts.

While the .rc syntax in theory is not complicated, there are edge cases hiding around every corner, and each of the existing alternative Windows resource compilers handles each edge case very differently from the canonical Microsoft implementation.

With a goal of byte-for-byte-identical outputs (and possible bug-for-bug compatibility) for my implementation, I had to effectively start from scratch, as even the Windows documentation couldn’t be fully trusted to be accurate. Ultimately, I went with fuzz testing (with rc.exe as the source of truth/oracle) as my method of choice for understanding the behavior of the Windows resource compiler (this approach is similar to something I did with Lua a while back).

This process led to a few things:

  • A completely clean-room implementation of a Windows resource compiler (not even any decompilation of rc.exe involved in the process)
  • A high degree of compatibility with the rc.exe implementation, including byte-for-byte identical outputs for a sizable corpus of Microsoft-provided sample .rc files (~500 files)
  • A large list of strange/interesting/baffling behaviors of the Windows resource compiler

My resource compiler implementation, resinator, has now reached relative maturity and has been merged into the Zig compiler (but is also maintained as a standalone project), so I thought it might be interesting to write about all the weird stuff I found along the way.

Who is this article for?

  • If you work at Microsoft, consider this a large list of bug reports (of particular note, see everything labeled ‘miscompilation’)
    • If you’re Raymond Chen, then consider this an extension of/homage to all the (amazing, very helpful) blog posts about Windows resources in The Old New Thing
  • If you are a contributor to llvm-rc, windres, or wrc, consider this a long list of behaviors to test for (if strict compatibility is a goal)
  • If you are someone that managed to endure the terrible audio of this talk I gave about my resource compiler and wanted more, consider this an extension of that talk
  • If you are none of the above, consider this an entertaining list of bizarre bugs/edge cases
    • If you’d like to skip around and see the strangest bugs/quirks, Ctrl+F for ‘utterly baffling’

A brief intro to resource compilers

.rc files (resource definition-script files) are scripts that contain both C/C++ preprocessor directives and resource definitions. We’ll ignore the preprocessor for now and focus on resource definitions. One possible resource definition might look like this:

1 FOO { "bar" }

The 1 is the ID of the resource, which can be a number (ordinal) or literal (name). The FOO is the type of the resource, and in this case it’s a user-defined type with the name FOO. The { "bar" } is a block that contains the data of the resource, which in this case is the string literal "bar". Not all resource definitions look exactly like this, but the <id> <type> part is fairly universal.

Resource compilers take .rc files and compile them into binary .res files:

1 RCDATA { "abc" }

00 00 00 00 20 00 00 00  .... ...
FF FF 00 00 FF FF 00 00  ........
00 00 00 00 00 00 00 00  ........
00 00 00 00 00 00 00 00  ........
03 00 00 00 20 00 00 00  .... ...
FF FF 0A 00 FF FF 01 00  ........  (the predefined RCDATA resource type has ID 0x0A)
00 00 00 00 30 00 09 04  ....0...
00 00 00 00 00 00 00 00  ........
61 62 63 00              abc.

A basic .rc file and a hexdump of the relevant part of the resulting .res file
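To make the layout of that hexdump concrete, here is a minimal Python sketch that rebuilds the same bytes. It assumes the RESOURCEHEADER field order from Microsoft's documentation (DataSize, HeaderSize, TYPE, NAME, DataVersion, MemoryFlags, LanguageId, Version, Characteristics), only handles ordinal (numeric) types and names, and uses the 0x0030 memory flags and 0x0409 language ID visible in the dump as defaults. `res_entry` is a hypothetical helper, not part of any real tool.

```python
import struct

RT_RCDATA = 0x0A  # predefined RCDATA resource type ID, as in the dump above


def res_entry(type_id: int, name_id: int, data: bytes,
              memory_flags: int = 0x0030, language_id: int = 0x0409) -> bytes:
    """Build one .res entry whose type and name are both ordinals (hypothetical helper)."""
    header = struct.pack(
        "<IIHHHHIHHII",
        len(data),        # DataSize
        0x20,             # HeaderSize: 32 bytes when type and name are ordinals
        0xFFFF, type_id,  # TYPE: 0xFFFF marker followed by the ordinal
        0xFFFF, name_id,  # NAME: 0xFFFF marker followed by the ordinal
        0,                # DataVersion
        memory_flags,     # MemoryFlags (0x0030 in the dump above)
        language_id,      # LanguageId (0x0409 in the dump above)
        0,                # Version
        0,                # Characteristics
    )
    padding = bytes(-len(data) % 4)  # pad the data so the next entry is DWORD-aligned
    return header + data + padding


# A 32-bit .res file starts with an all-zero entry, followed by the real resources.
res = res_entry(0, 0, b"", memory_flags=0, language_id=0) + res_entry(RT_RCDATA, 1, b"abc")
print(res.hex(" "))
```

The 3 bytes of "abc" are followed by a single NUL padding byte; the string literal itself is not NUL-terminated in RCDATA, which is why DataSize is 3 rather than 4.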

The .res file can then be handed off to the linker in order to include the resources in the resource table of a PE/COFF binary (.exe/.dll). The resources in the PE/COFF binary can be used for various things, like:

  • Executable icons that show up in Explorer
  • Version information that shows up in the Properties window
  • Defining dialogs/menus that can be loaded at runtime
  • Localization strings
  • Embedding arbitrary data
  • etc.
Both the executable’s icon and the version information in the Properties window come from a compiled .rc file

So, in general, a resource is a blob of data that can be referenced by an ID, plus a type that determines how that data should be interpreted. The resource(s) are embedded into compiled binaries (.exe/.dll) and can then be loaded at runtime, and/or can be loaded by the operating system for certain Windows-specific integrations.

An additional bit of context worth understanding is that .rc files were/are very often generated by Visual Studio rather than written by hand, which could explain why many of the bugs/quirks detailed here have gone unnoticed/unfixed for so long (i.e. the Visual Studio generator just so happened not to trigger these edge cases).

With that out of the way, we’re ready to get into it.

The list of bugs/quirks

tokenizer quirk

Special tokenization rules for names/IDs

Here’s a resource definition with a user-defined type of FOO (“user-defined” means that it’s not one of the predefined resource types):

1 FOO { "bar" }

For user-defined types, the (uppercased) resource type name is written as UTF-16 into the resulting .res file, so in this case FOO is written as the type of the resource, and the bytes of the string bar are written as the resource’s data.

So, following from this, let’s try wrapping the resource type name in double quotes:

1 "FOO" { "bar" }

Intuitively, you might expect that this doesn’t change anything (i.e. it’ll still get parsed into FOO), but in fact the Windows RC compiler will now include the quotes in the user-defined type name. That is, "FOO" will be written as the resource type name in the .res file, not FOO.

This is because both resource IDs and resource types use special tokenization rules: they are basically only terminated by whitespace and nothing else (well, not exactly whitespace, it’s actually any ASCII character from 0x05 to 0x20 [inclusive]). As an example:

L"\r\n"123abc error{OutOfMemory}!?u8 { "bar" }

In this case, the ID would be L"\R\N"123ABC (uppercased) and the resource type would be ERROR{OUTOFMEMORY}!?U8 (again, uppercased).
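The rule can be modeled with a short Python sketch. It only captures what is described here: a token runs until the next character in the 0x05 to 0x20 range and is then uppercased. The ASCII-only uppercasing and the `read_name_token` name are assumptions made for illustration, not rc.exe's or resinator's actual code.

```python
def is_separator(ch: str) -> bool:
    # Per the rule above: any ASCII character from 0x05 to 0x20 (inclusive) ends the token.
    return 0x05 <= ord(ch) <= 0x20


def read_name_token(src: str, pos: int) -> tuple[str, int]:
    """Read one ID/type token starting at pos; return (uppercased token, next position)."""
    while pos < len(src) and is_separator(src[pos]):
        pos += 1  # skip leading separators
    start = pos
    while pos < len(src) and not is_separator(src[pos]):
        pos += 1  # quotes, braces and punctuation are all part of the token
    token = src[start:pos]
    # Uppercase ASCII letters only (a simplification made for this sketch).
    upper = "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in token)
    return upper, pos


line = r'L"\r\n"123abc error{OutOfMemory}!?u8 { "bar" }'
res_id, pos = read_name_token(line, 0)
res_type, pos = read_name_token(line, pos)
print(res_id)    # L"\R\N"123ABC
print(res_type)  # ERROR{OUTOFMEMORY}!?U8
```

Applied to 1 "FOO" { "bar" }, the same function returns "FOO" with its quotes intact, which is exactly the surprising behavior described above.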


I’ve started with this particular quirk because it is actually indicative of the level of rc.exe-compatibility of the existing cross-platform resource compiler projects:

  • windres parses the "FOO" resource type as a regular string literal and the resource type name ends up as FOO (without the quotes)
  • llvm-rc errors with expected int or identifier, got "FOO"
  • wrc also errors with syntax error

resinator’s behavior

resinator matches the resource ID/type tokenization behavior of rc.exe in all known cases.

parser bug/quirk

Non-ASCII digits in number literals

The Windows RC compiler allows non-ASCII digit codepoints within number literals, but the resulting numeric value is arbitrary.

For ASCII digit characters, the standard procedure for calculating the numeric value of an integer literal is the following:

  • For each digit, subtract the ASCII value of the zero character ('0') from the ASCII value of the digit to get the numeric value of the digit
  • Multiply the numeric value of the digit by the relevant multiple of 10, depending on the place value of the digit
  • Sum the result of all the digits

For example, for the integer literal 123:

numeric value of each digit:
'1' - '0' = 1
'2' - '0' = 2
'3' - '0' = 3

numeric value of the integer literal:
 1 * 100 = 100
  2 * 10 =  20
   3 * 1 =   3
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
           123

So, how about the integer literal 1²3? The Windows RC compiler accepts it, but the resulting numeric value ends up being 1403.

The problem is that the exact same procedure outlined above is erroneously followed for all allowed digits, so things go haywire for non-ASCII digits since the relationship between the non-ASCII digit’s codepoint value and the ASCII value of '0' is arbitrary:

numeric value of the ² “digit”: '²' - '0' = 178 - 48 = 130

 1 * 100 =  100
130 * 10 = 1300
   3 * 1 =    3
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
           1403

In other words, the ² (U+00B2) is treated as a base-10 “digit” with the value 130 (and ³ would be a base-10 “digit” with the value 131, ၅ (U+1045) would be a base-10 “digit” with the value 4117, etc).

This particular bug/quirk is (presumably) due to the use of the iswdigit function, and the same sort of bug/quirk exists with special COM[1-9] device names.
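Here is a minimal Python model of that arithmetic. It assumes the result is kept in a u16 that wraps on overflow (this is consistent with the ००० example from the intro coming out as 65130), and it sidesteps the question of which characters are accepted as digits in the first place (presumably whatever iswdigit accepts). Accumulating value * 10 + digit is equivalent to the place-value sum shown above. `rc_digit_value` is a hypothetical name.

```python
def rc_digit_value(literal: str) -> int:
    """Apply the naive '0'-relative digit arithmetic to every character, wrapping at u16."""
    value = 0
    for ch in literal:
        digit = ord(ch) - ord("0")  # only meaningful for ASCII '0'..'9'
        value = (value * 10 + digit) & 0xFFFF  # u16 wrapping arithmetic
    return value


print(rc_digit_value("123"))  # 123
print(rc_digit_value("1²3"))  # 1403
print(rc_digit_value("०००"))  # 65130
```

Each Devanagari zero (U+0966) contributes 2406 - 48 = 2358 per place, and 235800 + 23580 + 2358 = 261738, which wraps to 65130 as a u16.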

resinator’s behavior

test.rc:2:3: error: non-ASCII digit characters are not allowed in number literals
 1²3
 ^~
parser bug/quirk

BEGIN or { as filename

Many resource types can get their data from a file, in which case their resource definition will look something like:

1 ICON "file.ico"

Additionally, some resource types (like ICON) must get their data from a file. When trying to define an ICON resource with a raw data block like so:

1 ICON BEGIN "foo" END

and then trying to compile that ICON, rc.exe has a confusing error:

test.rc(1) : error RC2135 : file not found: BEGIN

test.rc(2) : error RC2135 : file not found: END

That is, the Windows RC compiler will try to interpret BEGIN as a filename, which is very likely to fail and (if it succeeds) is almost certainly not what the user intended. It will then move on and continue trying to parse the file as if the first resource definition is 1 ICON BEGIN and almost certainly hit more errors, since everything afterwards will be misinterpreted just as badly.

This is even worse when using { and } to open/close the block, as it triggers a separate bug:

1 ICON { "foo" }
test.rc(1) : error RC2135 : file not found: ICON

test.rc(2) : error RC2135 : file not found: }

Somehow, the filename { causes rc.exe to think the filename token is actually the preceding token, so it’s trying to interpret ICON as both the resource type and the file path of the resource. Who knows what’s going on there.

resinator’s behavior

In resinator, trying to use a raw data block with resource types that don’t support raw data is an error, noting that if { or BEGIN is intended as a filename, it should use a quoted string literal.

test.rc:1:8: error: expected '<filename>', found 'BEGIN' (resource type 'icon' can't use raw data)
1 ICON BEGIN
       ^~~~~
test.rc:1:8: note: if 'BEGIN' is intended to be a filename, it must be specified as a quoted string literal
parser bug/quirk

Number expressions as filenames

There are multiple valid ways to specify the filename of a resource:


1 FOO "bar.txt"


2 FOO bar.txt


3 FOO 123

But that’s not all, as you can also specify the filename as an arbitrarily complex number expression, like so:

1 FOO (1 | 2)+(2-1 & 0xFF)

The entire (1 | 2)+(2-1 & 0xFF) expression, spaces and all, is interpreted as the filename of the resource. Want to take a guess as to which file path it tries to read the data from?

Yes, that’s right, 0xFF!

For whatever reason, rc.exe will just take the last number literal in the expression and try to read from a file with that name, e.g. (1+2) will try to read from the path 2, and 1+-1 will try to read from the path -1 (the - sign is part of the number literal token; this will be detailed later in Unary operators are an illusion).
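A rough Python model of which path gets picked: it walks the expression, treats a - or ~ as part of a number literal whenever it does not follow an operand, and keeps the last literal it sees. This is a simplified sketch built only from the examples above, not a reconstruction of rc.exe's parser; `rc_filename_from_expression` is a hypothetical name.

```python
from typing import Optional


def rc_filename_from_expression(expr: str) -> Optional[str]:
    """Return the last number-literal token in expr (hypothetical, simplified model)."""
    last = None
    prev_is_operand = False  # True right after a literal or a closing parenthesis
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch.isspace():
            i += 1
        elif "0" <= ch <= "9" or (ch in "-~" and not prev_is_operand):
            # A number literal: an optional glued unary -/~ followed by alphanumerics,
            # which covers forms like 123, -1, 0xFF and 1L.
            start = i
            i += 1
            while i < len(expr) and expr[i].isalnum():
                i += 1
            last = expr[start:i]
            prev_is_operand = True
        else:
            # Binary operators and parentheses; only ')' leaves an operand behind.
            prev_is_operand = ch == ")"
            i += 1
    return last


print(rc_filename_from_expression("(1 | 2)+(2-1 & 0xFF)"))  # 0xFF
print(rc_filename_from_expression("1+-1"))                  # -1
```

The - in 2-1 follows an operand, so the sketch treats it as a binary operator; in 1+-1 it follows the + operator, so it becomes part of the literal -1.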

resinator’s behavior

In resinator, trying to use a number expression as a filename is an error, noting that a quoted string literal should be used instead. Singular number literals are allowed, though (e.g. -1).

test.rc:1:7: error: filename cannot be specified using a number expression, consider using a quoted string instead
1 FOO (1 | 2)+(2-1 & 0xFF)
      ^~~~~~~~~~~~~~~~~~~~
test.rc:1:7: note: the Win32 RC compiler would evaluate this number expression as the filename '0xFF'
parser bug/quirk

Incomplete resource at EOF

The incomplete resource definition in the following example is an error:


1 FOO { "bar" }


2 FOO

But it’s not the error you might be expecting:

test.rc(6) : error RC2135 : file not found: FOO

Strangely, rc.exe will treat FOO as both the type of the resource and as a filename (similar to what we saw earlier in BEGIN or { as filename). If you create a file with the name FOO it will then successfully compile, and the .res will have a resource with type FOO and its data will be that of the file FOO.

resinator’s behavior

resinator does not match the rc.exe behavior and instead always errors on this type of incomplete resource definition at the end of a file:

test.rc:5:6: error: expected quoted string literal or unquoted literal; got '<eof>'
2 FOO
     ^

However…

parser bug/quirk

Dangling literal at EOF

If we change the previous example to only have one dangling literal for its incomplete resource definition like so:


1 FOO { "bar" }


FOO

Then rc.exe will always successfully compile it, and it won’t try to read from the file FOO. That is, a single dangling literal at the end of a file is fully allowed, and it is just treated as if it doesn’t exist (there’s no corresponding resource in the resulting .res file).

It also turns out that there are three .rc files in Windows-classic-samples that (accidentally, presumably) rely on this behavior (1, 2, 3), so in order to fully pass win32-samples-rc-tests, it is necessary to allow a dangling literal at the end of a file.

resinator’s behavior

resinator allows a single dangling literal at the end of a file, but emits a warning:

test.rc:5:1: warning: dangling literal at end-of-file; this is not a problem, but it is likely a mistake
FOO
^~~
parser bug/quirk, miscompilation

Yes, that MENU over there (vague gesturing)

As mentioned in the intro, resource definitions typically have an id, like so:

1 FOO { "bar" }

The id can be either a number (“ordinal”) or a string (“name”), and the type of the id is inferred by its contents. This mostly works as you’d expect:

  • If the id is all digits, then it’s a number/ordinal
  • If the id is all letters, then it’s a string/name
  • If the id is a mix of digits and letters, then it’s a string/name

Here are a few examples:

 123    ───►  Ordinal: 123
 ABC    ───►  Name: ABC
123ABC  ───►  Name: 123ABC

This is relevant, because when defining DIALOG/DIALOGEX resources, there is an optional MENU statement that can specify the id of a separately defined MENU/MENUEX resource to use. From the DIALOGEX docs:

Statement        Description
MENU menuname    Menu to be used. This value is either the name of the menu or its integer identifier.

Here’s an example of that in action, where the DIALOGEX is trying to specify that the MENUEX with the id of 1ABC should be used:

1ABC MENUEX  ◄╍╍╍╍╍╍╍╍╍╍╍╍╍╍┓
{                           
  // ...                    
}                           
                            
1 DIALOGEX 0, 0, 640, 480   
  MENU 1ABC  ╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛
{
  // ...
}

However, this is not what actually happens, as for some reason, the MENU statement has different rules around inferring the type of the id. For the MENU statement, whenever the first character is a number, then the whole id is interpreted as a number no matter what.

The value of this “number” is determined using the same bogus methodology detailed in Non-ASCII digits in number literals, so in the case of 1ABC, the value works out to 2899:

“numeric” id: 1ABC

numeric value of each “digit”:
'1' - '0' = 1
'A' - '0' = 17
'B' - '0' = 18
'C' - '0' = 19

numeric value of the id:
 1 * 1000 = 1000
 17 * 100 = 1700
  18 * 10 =  180
   19 * 1 =   19
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
            2899

Unlike Non-ASCII digits in number literals, though, it’s now also possible to use characters in a “number” literal that have a lower ASCII value than the '0' character, meaning that trying to get the numeric value for such a ‘digit’ will cause wrapping u16 overflow:

“numeric” id: 1!

numeric value of each “digit”:
'1' - '0' = 1
'!' - '0' = -15, which wraps around to 65521 as a u16

numeric value of the id:
    1 * 10 =    10
 65521 * 1 = 65521
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
             65531

This is always a miscompilation

In the following example using the same 1ABC ID as above:


1ABC MENU
BEGIN
  POPUP "Menu from .rc"
  BEGIN
    MENUITEM "Open File", 1
  END
END

1 DIALOGEX 0, 0, 275, 280
  CAPTION "Dialog from .rc"
  MENU 1ABC
BEGIN
END


    HWND result = CreateDialogParamW(g_hInst, MAKEINTRESOURCE(1), hwnd, DialogProc, (LPARAM)NULL);

This CreateDialogParamW call will fail with The specified resource name cannot be found in the image file because, when loading the dialog, it will try to look for a menu resource with an integer ID of 2899.

If we add such a MENU to the .rc file:

2899 MENU
BEGIN
  POPUP "Wrong menu from .rc"
  BEGIN
    MENUITEM "Destroy File", 1
  END
END

then the dialog will successfully load with this new menu, but it’s pretty obvious this is not what was intended:


The misinterpretation of the ID can (at best) lead to an unexpected menu being loaded

A related, but inconsequential, inconsistency

As mentioned in Special tokenization rules for names/IDs, when the id of a resource is a string/name, it is uppercased before being written to the .res file. This uppercasing is not done for the MENU statement of a DIALOG/DIALOGEX resource, so in this example:

abc MENUEX
{
  // ...
}

1 DIALOGEX 0, 0, 640, 480
  MENU abc
{
  // ...
}

The id of the MENUEX resource would be compiled as ABC, but the DIALOGEX would write the id of its menu as abc. This ends up not mattering, though, because it turns out that LoadMenu uses a case-insensitive lookup.

resinator’s behavior

resinator avoids the miscompilation and treats the id parameter of MENU statements in DIALOG/DIALOGEX resources exactly the same as the id of MENU resources.

test.rc:3:8: warning: the id of this menu would be miscompiled by the Win32 RC compiler
  MENU 1ABC
       ^~~~
test.rc:3:8: note: the Win32 RC compiler would evaluate the id as the ordinal/number value 2899

test.rc:3:8: note: to avoid the potential miscompilation, the first character of the id should not be a digit
parser bug/quirk

If you’re not last, you’re irrelevant

Many resource types have optional statements that can be specified between the resource type and the beginning of its body, e.g.

1 ACCELERATORS
  LANGUAGE 0x09, 0x01
  CHARACTERISTICS 0x1234
  VERSION 1
{
  
}

Specifying multiple statements of the same type within a single resource definition is allowed, and the last occurrence of each statement type is the one that takes precedence, so the following would compile to the exact same .res as the example above:

1 ACCELERATORS
  CHARACTERISTICS 1
  LANGUAGE 0xFF, 0xFF
  LANGUAGE 0x09, 0x01
  CHARACTERISTICS 999
  CHARACTERISTICS 0x1234
  VERSION 999
  VERSION 1
{
  
}

This is not necessarily a problem on its own (although I think it should at least be a warning), but it can inadvertently lead to some bizarre behavior, as we’ll see in the next bug/quirk.
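The last-one-wins rule is easy to model. This Python sketch uses a hypothetical `collect_optional_statements` helper and assumes the statements have already been parsed into (keyword, value) pairs; it resolves the second example above to the same values as the first, and also lists the overridden statements in source order.

```python
def collect_optional_statements(statements):
    """Resolve (keyword, value) pairs so that the last occurrence of each keyword wins.

    Returns (effective, ignored): the winning values, and the overridden
    statements in source order.
    """
    # Later indices overwrite earlier ones, so this maps each keyword to its last index.
    last_index = {keyword: i for i, (keyword, _) in enumerate(statements)}
    effective = {keyword: statements[i][1] for keyword, i in last_index.items()}
    ignored = [stmt for i, stmt in enumerate(statements) if last_index[stmt[0]] != i]
    return effective, ignored


statements = [
    ("CHARACTERISTICS", 1),
    ("LANGUAGE", (0xFF, 0xFF)),
    ("LANGUAGE", (0x09, 0x01)),
    ("CHARACTERISTICS", 999),
    ("CHARACTERISTICS", 0x1234),
    ("VERSION", 999),
    ("VERSION", 1),
]
effective, ignored = collect_optional_statements(statements)
print(effective)  # {'CHARACTERISTICS': 4660, 'LANGUAGE': (9, 1), 'VERSION': 1}
print(ignored)
```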

resinator’s behavior

resinator matches the Windows RC compiler behavior, but emits a warning for each ignored statement:

test.rc:2:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence
  CHARACTERISTICS 1
  ^~~~~~~~~~~~~~~~~
test.rc:3:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence
  LANGUAGE 0xFF, 0xFF
  ^~~~~~~~~~~~~~~~~~~
test.rc:5:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence
  CHARACTERISTICS 999
  ^~~~~~~~~~~~~~~~~~~
test.rc:7:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence
  VERSION 999
  ^~~~~~~~~~~
parser bug/quirk, miscompilation

Once a number, always a number

The behavior described in Yes, that MENU over there (vague gesturing) can also be induced in both CLASS and MENU statements of DIALOG/DIALOGEX resources via redundant statements. As seen in If you’re not last, you’re irrelevant, multiple statements of the same type are allowed to be specified without much fuss, but in the case of CLASS and MENU, if any of the duplicate statements are interpreted as a number, then the value of the last statement of its type (the only one that matters) is always interpreted as a number no matter what it contains.

1 DIALOGEX 0, 0, 640, 480
  MENU 123 
  MENU IM_A_STRING_I_SWEAR  ────►  8360
  CLASS 123 
  CLASS "Seriously, I'm a string"  ────►  55127
{
  // ...
}

The algorithm for coercing the strings to a number is the same as the one outlined in Yes, that MENU over there (vague gesturing), and, for the same reasons discussed there, this too is always a miscompilation.
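A small Python sketch of this "sticky" behavior, reusing the same u16-wrapping digit arithmetic. The assumptions: a statement is treated as numeric if its argument starts with an ASCII digit, and the coercion is applied to the text of the last argument with its quotes stripped. Under those assumptions the sketch reproduces both values shown above (8360 and 55127); `effective_value` is a hypothetical name.

```python
def naive_u16_number(text: str) -> int:
    """'0'-relative digit arithmetic on every character, wrapping at u16."""
    value = 0
    for ch in text:
        value = (value * 10 + ord(ch) - ord("0")) & 0xFFFF
    return value


def effective_value(values):
    """Model the value of the last MENU (or CLASS) statement given all duplicates in order.

    values: the argument text of each statement, with string quotes removed.
    """
    last = values[-1]
    # Assumption: a statement counts as numeric if its text starts with an ASCII digit.
    if any(v and "0" <= v[0] <= "9" for v in values):
        return naive_u16_number(last)  # once a number, always a number
    return last


print(effective_value(["123", "IM_A_STRING_I_SWEAR"]))      # 8360
print(effective_value(["123", "Seriously, I'm a string"]))  # 55127
print(effective_value(["IM_A_STRING_I_SWEAR"]))             # IM_A_STRING_I_SWEAR
```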

resinator’s behavior

resinator avoids the miscompilation and emits warnings:

test.rc:2:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence
  MENU 123
  ^~~~~~~~
test.rc:4:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence
  CLASS 123
  ^~~~~~~~~
test.rc:5:9: warning: this class would be miscompiled by the Win32 RC compiler
  CLASS "Seriously, I'm a string"
        ^~~~~~~~~~~~~~~~~~~~~~~~~
test.rc:5:9: note: the Win32 RC compiler would evaluate it as the ordinal/number value 55127

test.rc:5:9: note: to avoid the potential miscompilation, only specify one class per dialog resource

test.rc:3:8: warning: the id of this menu would be miscompiled by the Win32 RC compiler
  MENU IM_A_STRING_I_SWEAR
       ^~~~~~~~~~~~~~~~~~~
test.rc:3:8: note: the Win32 RC compiler would evaluate the id as the ordinal/number value 8360

test.rc:3:8: note: to avoid the potential miscompilation, only specify one menu per dialog resource
parser bug/quirk

L is not allowed there

Like in C, an integer literal can be suffixed with L to signify that it is a ‘long’ integer literal. In the case of the Windows RC compiler, integer literals are typically 16 bits wide, and suffixing an integer literal with L will instead make it 32 bits wide.

An RCDATA resource definition and a hexdump of the resulting data in the .res file

However, outside of raw data blocks like the RCDATA example above, the L suffix is typically meaningless, as it has no bearing on the size of the integer used. For example, DIALOG resources have x, y, width, and height parameters, and they are each encoded in the data as a u16 regardless of the integer literal used. If the value would overflow a u16, then the value is truncated back down to a u16, meaning in the following example all 4 parameters after DIALOG get compiled down to 1 as a u16:

1 DIALOG 1, 1L, 65537, 65537L {}

The maximum value of a u16 is 65535
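Here is a Python sketch of the two encodings described above: numbers in raw data blocks are written as little-endian u16 values, or u32 with the L suffix, while a parameter like DIALOG's x/y/width/height is always a u16 and is simply truncated. It only handles plain decimal and 0x-hex literals with an uppercase L, and the helper names are hypothetical.

```python
import struct


def parse_literal(literal: str) -> tuple[int, bool]:
    """Parse a plain decimal or 0x-hex literal with an optional L suffix -> (value, is_long)."""
    is_long = literal.endswith("L")
    body = literal[:-1] if is_long else literal
    value = int(body, 16) if body.lower().startswith("0x") else int(body)
    return value, is_long


def encode_raw_data_number(literal: str) -> bytes:
    """Raw data blocks (e.g. RCDATA): u16 by default, u32 with the L suffix."""
    value, is_long = parse_literal(literal)
    if is_long:
        return struct.pack("<I", value & 0xFFFFFFFF)
    return struct.pack("<H", value & 0xFFFF)


def encode_u16_parameter(literal: str) -> bytes:
    """Parameters like DIALOG's x/y/width/height: always a u16, L has no effect."""
    value, _ = parse_literal(literal)
    return struct.pack("<H", value & 0xFFFF)  # overflow keeps only the low 16 bits


print(encode_raw_data_number("1").hex())   # 0100
print(encode_raw_data_number("1L").hex())  # 01000000
print([encode_u16_parameter(p).hex() for p in ["1", "1L", "65537", "65537L"]])
```

The same low-16-bit truncation also accounts for the FILEVERSION example later in this section, where 65537 through 65540 compile to 1 through 4.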

A few particular parameters, though, outright prohibit integer literals with the L suffix from being used:

  • Any of the four parameters of the FILEVERSION statement of a VERSIONINFO resource
  • Any of the four parameters of the PRODUCTVERSION statement of a VERSIONINFO resource
  • Either of the two parameters of a LANGUAGE statement

LANGUAGE 1L, 2
test.rc(1) : error RC2145 : PRIMARY LANGUAGE ID too large

1 VERSIONINFO
  FILEVERSION 1L, 2, 3, 4
BEGIN
  
END
test.rc(2) : error RC2127 : version WORDs separated by commas expected

It is true that these parameters are limited to u16, so using an L suffix is likely a mistake, but that is also true of many other parameters for which the Windows RC compiler happily allows L suffixed numbers. It’s unclear why these particular parameters are singled out, and even more unclear given the fact that specifying these parameters using an integer literal that would overflow a u16 does not actually trigger an error (and instead it truncates the values to a u16):

1 VERSIONINFO
  FILEVERSION 65537, 65538, 65539, 65540
BEGIN
END

The compiled FILEVERSION in this case will be 1, 2, 3, 4:

65537 = 0x10001; truncated to u16 = 0x0001
65538 = 0x10002; truncated to u16 = 0x0002
65539 = 0x10003; truncated to u16 = 0x0003
65540 = 0x10004; truncated to u16 = 0x0004

resinator’s behavior

resinator allows L suffixed integer literals everywhere and truncates the value down to the appropriate number of bits when necessary.

test.rc:1:10: warning: this language parameter would be an error in the Win32 RC compiler
LANGUAGE 1L, 2
         ^~
test.rc:1:10: note: to avoid the error, remove any L suffixes from numbers within the parameter
parser bug/quirk

Unary operators are an illusion

Typically, unary +, -, etc. operators are just that: operators; they are separate tokens that act on other tokens (number literals, variables, etc). However, in the Windows RC compiler, they are not real operators.

Unary -

The unary - is included as part of a number literal, not as a distinct operator. This behavior can be confirmed in a rather strange way, taking advantage of a separate quirk described in Number expressions as filenames. When a resource’s filename is specified as a number expression, the file path it ultimately looks for is the last number literal in the expression, so for example:

test.rc(1) : error RC2135 : file not found: 123

And if we throw in a unary - like so, then it gets included as part of the filename:

test.rc(1) : error RC2135 : file not found: -123

This quirk leads to a few unexpected valid patterns, since - on its own is also considered a valid number literal (and it resolves to 0), so:

1 FOO { 1-- }

evaluates to 1-0 and results in 1 being written to the resource’s data, while:

1 FOO { "str" - 1 }

looks like a string literal minus 1, but it’s actually interpreted as 3 separate raw data values (str, - [which evaluates to 0], and 1), since commas between data values in a raw data block are optional.

Additionally, it means that otherwise valid-looking expressions may not actually be considered valid:

test.rc(1) : error RC1013 : mismatched parentheses

Unary ~

The unary NOT (~) works exactly the same as the unary - and has all the same quirks. For example, a ~ on its own is also a valid number literal:

Data is a u16 with the value 0xFFFF

And ~L (to turn the integer into a u32) is valid in the same way that -L would be valid:

Data is a u32 with the value 0xFFFFFFFF
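A minimal Python model of such literals, under assumptions drawn from the examples above: the optional - or ~ is part of the token itself, a token with no digits after the prefix is evaluated as if its digits were 0, and the L suffix selects a 32-bit rather than 16-bit result. `eval_number_literal` is a hypothetical name, and full expressions (binary operators, parentheses) are out of scope.

```python
def eval_number_literal(literal: str) -> tuple[int, int]:
    """Evaluate a number-literal token that may carry a glued unary - or ~.

    Returns (value, width_in_bits). Simplified, hypothetical model.
    """
    prefix = literal[:1] if literal[:1] in ("-", "~") else ""
    rest = literal[len(prefix):]
    is_long = rest.endswith("L")
    digits = rest[:-1] if is_long else rest
    if not digits:
        magnitude = 0  # a bare "-" or "~" (optionally with L) has no digits: treat as 0
    elif digits.lower().startswith("0x"):
        magnitude = int(digits, 16)
    else:
        magnitude = int(digits)
    width = 32 if is_long else 16
    mask = (1 << width) - 1
    if prefix == "-":
        value = -magnitude
    elif prefix == "~":
        value = ~magnitude
    else:
        value = magnitude
    return value & mask, width


print(eval_number_literal("-"))     # (0, 16)
print(eval_number_literal("~"))     # (65535, 16)
print(eval_number_literal("~L"))    # (4294967295, 32)
print(eval_number_literal("-123"))  # (65413, 16)
```

For 1--, the tokens would be the literal 1, a binary -, and the literal - (which evaluates to 0), giving 1 - 0 = 1 as described above.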

Unary +

The unary + is almost entirely a hallucination; it can be used in some places, but not others, without any discernible rhyme or reason.

This is valid (and the parameters evaluate to 1, 2, 3, 4 as expected):

1 DIALOG +1, +2, +3, +4 {}

but this is an error:

test.rc(1) : error RC2164 : unexpected value in RCDATA

and so is this:

1 DIALOG (+1), 2, 3, 4 {}
test.rc(1) : error RC2237 : numeric value expected at DIALOG

Because the rules around the unary + are so opaque, I am unsure if it shares many of the same properties as the unary -. I do know, though, that + on its own does not seem to be an accepted number literal in any case I’ve seen so far.

resinator’s behavior

resinator matches the Windows RC compiler’s behavior around unary -/~, but disallows unary + entirely:

test.rc:1:10: error: expected number or number expression; got '+'
1 DIALOG +1, +2, +3, +4 {}
         ^
test.rc:1:10: note: the Win32 RC compiler may accept '+' as a unary operator here, but it is not supported in this implementation; consider omitting the unary +
miscompilation

Your fate will be determined by a comma

Version information is specified using key/value pairs within VERSIONINFO resources. In the compiled .res file, the value data should always start at a 4-byte boundary, so after the key data is written, a variable number of padding bytes are written to get back to 4-byte alignment:

1 VERSIONINFO {
  VALUE "key", "value"
}

......k.e.y.....
v.a.l.u.e.......

Two padding bytes are inserted after the key to get back to 4-byte alignment

However, if the comma between the key and value is omitted, then for whatever reason the padding bytes are also omitted:

1 VERSIONINFO {
  VALUE "key" "value"
}

......k.e.y...v.
a.l.u.e.........

Without the comma between "key" and "value", the padding bytes are not written

The problem here is that users of the VERSIONINFO resource (e.g. VerQueryValue) will expect the padding bytes, so it will try to read the value as if the padding bytes were there. For example, with the simple "key" "value" example:

VerQueryValueW(verbuf, L"\\key", &querybuf, &querysize);
wprintf(L"%s\n", querybuf);

Which will print:

alue
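Why alue? The following Python sketch reproduces it under simplifying assumptions: a node is modeled as a 6-byte header (the three u16 fields wLength, wValueLength and wType, left zeroed here), the NUL-terminated UTF-16 key, optional padding, and the value. The reader skips the key and then rounds up to the next 4-byte boundary, the way a consumer that expects the padding would. The helper names are hypothetical, and this is not VerQueryValue's actual logic.

```python
HEADER_SIZE = 6  # wLength, wValueLength, wType: three u16 fields before the key


def align4(offset: int) -> int:
    return (offset + 3) & ~3


def build_node(key: str, value: str, pad_after_key: bool) -> bytes:
    """Lay out a simplified node: zeroed header, key, optional padding, value."""
    data = bytearray(HEADER_SIZE)
    data += (key + "\0").encode("utf-16-le")
    if pad_after_key:
        data += bytes(align4(len(data)) - len(data))
    data += (value + "\0").encode("utf-16-le")
    return bytes(data)


def read_value(node: bytes, key: str) -> str:
    """Read the value like a consumer that assumes the padding is present."""
    offset = align4(HEADER_SIZE + 2 * (len(key) + 1))  # skip header and key, then align
    chars = []
    while offset + 1 < len(node):
        unit = int.from_bytes(node[offset:offset + 2], "little")
        if unit == 0:
            break
        chars.append(chr(unit))
        offset += 2
    return "".join(chars)


print(read_value(build_node("key", "value", pad_after_key=True), "key"))   # value
print(read_value(build_node("key", "value", pad_after_key=False), "key"))  # alue
```

Without the padding, the value starts at offset 14, but the reader jumps to offset 16, skipping the "v" entirely.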

Plus, depending on the length of the key string, it can end up being even worse, since the value could end up being written over the top of the null terminator of the key. Here’s an example:

1 VERSIONINFO {
  VALUE "ke" "value"
}

......k.e.v.a.l.
u.e.............

And the problems don’t end there: VERSIONINFO is compiled into a tree structure, meaning the misreading of one node affects the reading of future nodes. Here’s a (simplified) real-world VERSIONINFO resource definition from a random .rc file in Windows-classic-samples:

VS_VERSION_INFO VERSIONINFO
BEGIN
    BLOCK "StringFileInfo"
    BEGIN
        BLOCK "040904e4"
        BEGIN
            VALUE "CompanyName", "Microsoft"
            VALUE "FileDescription", "AmbientLightAware"
            VALUE "FileVersion", "1.0.0.1"
            VALUE "InternalName", "AmbientLightAware.exe"
            VALUE "LegalCopyright", "(c) Microsoft.  All rights reserved."
            VALUE "OriginalFilename", "AmbientLightAware.exe"
            VALUE "ProductName", "AmbientLightAware"
            VALUE "ProductVersion", "1.0.0.1"
        END
    END
    BLOCK "VarFileInfo"
    BEGIN
        VALUE "Translation", 0x409, 1252
    END
END

and here’s the Properties window of an .exe compiled with and without commas between all the key/value pairs:

Correct version information with commas included…

…but completely broken if the commas are omitted

resinator‘s behavior

resinator avoids the miscompilation (it always inserts the necessary padding bytes) and emits a warning.

test.rc:2:15: warning: the padding before this quoted string value would be miscompiled by the Win32 RC compiler
  VALUE "key" "value"
              ^~~~~~~
test.rc:2:15: note: to avoid the potential miscompilation, consider adding a comma between the key and the quoted string
miscompilation

Mismatch in length units in VERSIONINFO nodes

A VALUE within a VERSIONINFO resource is specified using this syntax:

VALUE <name>, <value(s)>

The value(s) can be specified as either number literals or quoted string literals, like so:

1 VERSIONINFO {
  VALUE "numbers", 123, 456
  VALUE "strings", "foo", "bar"
}

Each VALUE is compiled into a structure that contains the length of its value data, but the unit used for the length varies:

  • For strings, the string data is written as UTF-16, and the length is given in UTF-16 code units (2 bytes per code unit)
  • For numbers, the numbers are written either as u16 or u32 (depending on the presence of an L suffix), and the length is given in bytes

So, for the above example, the "numbers" value would be compiled into a node with:

  • “Binary” data, meaning the length is given in bytes
  • A length of 4, since each number literal is compiled as a u16
  • Data bytes of 7B 00 C8 01, where 7B 00 is 123 and C8 01 is 456 (as little-endian u16)

and the "strings" value would be compiled into a node with:

  • “String” data, meaning the length is given in UTF-16 code units
  • A length of 8, since each string is 3 UTF-16 code units plus a NUL-terminator
  • Data bytes of 66 00 6F 00 6F 00 00 00 62 00 61 00 72 00 00 00, where 66 00 6F 00 6F 00 00 00 is "foo" and 62 00 61 00 72 00 00 00 is "bar" (both as NUL-terminated little-endian UTF-16)

This is a bit strange, but when split out like this it works fine. The problem is that there is nothing stopping you from mixing strings and numbers in one value, in which case the Windows RC compiler freaks out and writes the type as “binary” (meaning the length should be interpreted as a byte count), but writes the length as a mixture of byte count and UTF-16 code unit count. For example, with this resource:

1 VERSIONINFO {
  VALUE "something", "foo", 123
}

Its value’s data will get compiled into these bytes: 66 00 6F 00 6F 00 00 00 7B 00, where 66 00 6F 00 6F 00 00 00 is "foo" (as NUL-terminated little-endian UTF-16) and 7B 00 is 123 (as a little-endian u16). This makes for a total of 10 bytes (8 for "foo", 2 for 123), but the Windows RC compiler erroneously writes the value’s data length as 6 (4 for "foo" [counted as UTF-16 code units], and 2 for 123 [counted as bytes]).
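The two ways of counting can be written out side by side. This is a hypothetical model (the `ValueElem` type and both functions are made up for illustration), assuming every number is a plain u16 literal with no L suffix:

```c
#include <stddef.h>

/* Hypothetical model of one data element of a VALUE. Strings carry their
   UTF-16 length including the NUL terminator; numbers are assumed to be
   u16 literals (no L suffix), so each occupies 2 bytes. */
typedef struct {
    int is_string;
    size_t utf16_units; /* only meaningful when is_string is nonzero */
} ValueElem;

/* The actual number of data bytes written for the elements. */
static size_t actual_byte_len(const ValueElem *elems, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += elems[i].is_string ? elems[i].utf16_units * 2 : 2;
    return total;
}

/* The length as described above for mixed elements in rc.exe:
   strings counted in UTF-16 code units, numbers counted in bytes. */
static size_t rc_mixed_len(const ValueElem *elems, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += elems[i].is_string ? elems[i].utf16_units : 2;
    return total;
}
```

For "foo", 123 (a 4-unit string followed by a u16), the first function returns 10 and the second returns 6, matching the numbers above.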

This miscompilation has similar consequences to those detailed in Your fate will be determined by a comma:

  • The full data of the value will not be read by a parser
  • Due to the tree structure of VERSIONINFO resource data, this has knock-on effects on all following nodes, meaning the entire resource will be mangled

The return of the meaningful comma

Before, I said that string values were compiled as NUL-terminated UTF-16 strings, but this is only the case when either:

  • It is the last data element of a VALUE, or
  • There is a comma separating it from the element after it

So, this:

1 VERSIONINFO {
  VALUE "strings", "foo", "bar"
}

will be compiled with a NUL terminator after both foo and bar, but this:

1 VERSIONINFO {
  VALUE "strings", "foo" "bar"
}

will be compiled only with a NUL terminator after bar. This is also similar to Your fate will be determined by a comma, but unlike that comma quirk, I don’t consider this one a miscompilation because the result is not invalid/mangled, and there is a possible use-case for this behavior (concatenating two or more string literals together). However, this behavior is not mentioned in the documentation, so it’s unclear if it’s actually intended.
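The NUL-terminator rule above can be expressed as a tiny counting function. This is only a sketch under simplifying assumptions: the `StrElem` type and `string_value_units` are hypothetical, and strings are limited to ASCII so that one character equals one UTF-16 code unit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a string element and whether a comma follows it. */
typedef struct {
    const char *text; /* ASCII only: 1 char = 1 UTF-16 code unit */
    int comma_after;
} StrElem;

/* UTF-16 code units of data for a VALUE made only of strings: a NUL
   terminator follows an element only if it is last or followed by a comma. */
static size_t string_value_units(const StrElem *elems, size_t count) {
    size_t units = 0;
    for (size_t i = 0; i < count; i++) {
        units += strlen(elems[i].text);
        if (i == count - 1 || elems[i].comma_after)
            units += 1;
    }
    return units;
}
```

With a comma, "foo", "bar" yields 8 code units (foo, NUL, bar, NUL); without it, "foo" "bar" yields 7 (foobar, NUL).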

resinator‘s behavior

resinator avoids the length-related miscompilation and emits a warning:

test.rc:2:22: warning: the byte count of this value would be miscompiled by the Win32 RC compiler
  VALUE "something", "foo", 123
                     ^~~~~~~~~~
test.rc:2:22: note: to avoid the potential miscompilation, do not mix numbers and strings within a value

but matches the “meaningful comma” behavior of the Windows RC compiler.

fundamental concept

Turning off flags with NOT expressions

Let’s say you wanted to define a dialog resource with a button, but you wanted the button to start invisible. You’d do this with a NOT expression in the “style” parameter of the button like so:

1 DIALOGEX 0, 0, 282, 239
{
  PUSHBUTTON "Cancel",1,129,212,50,14, NOT WS_VISIBLE
}

Since WS_VISIBLE is set by default, this will unset it and make the button invisible. If there are any other flags that should be applied, they can be bitwise OR’d like so:

1 DIALOGEX 0, 0, 282, 239
{
  PUSHBUTTON "Cancel",1,129,212,50,14, NOT WS_VISIBLE | BS_VCENTER
}

WS_VISIBLE and BS_VCENTER are just numbers under-the-hood. For simplicity’s sake, let’s pretend their values are 0x1 for WS_VISIBLE and 0x2 for BS_VCENTER and then focus on this simplified NOT expression:

NOT 0x1 | 0x2

Since WS_VISIBLE is on by default, the default value of these flags is 0x1, and so the resulting value is evaluated like this:

operation            result (binary)   result (hex)
Default value: 0x1   01                0x1
NOT 0x1              00                0x0
| 0x2                10                0x2

Ordering matters as well. If we switch the expression to:

NOT 0x1 | 0x1

then we end up with 0x1 as the result:

operation            result (binary)   result (hex)
Default value: 0x1   01                0x1
NOT 0x1              00                0x0
| 0x1                01                0x1

If, instead, the ordering was reversed like so:

0x1 | NOT 0x1

then the value at the end would be 0x0:

operation            result (binary)   result (hex)
Default value: 0x1   01                0x1
0x1                  01                0x1
| NOT 0x1            00                0x0
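The three tables above all follow the same left-to-right pattern: start from the default flags, clear bits for each NOT term, and set bits for each plain term. Here is a minimal C sketch of that model; `StyleTerm` and `eval_style` are hypothetical names, and the model only covers these simple sequences of terms (it does not handle parentheses or any of the stranger inputs rc.exe accepts).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of one term in a simple style expression. */
typedef struct {
    int is_not;     /* 1 for "NOT x", 0 for a plain (OR'd) "x" */
    uint32_t value;
} StyleTerm;

/* Evaluate the terms left to right, starting from the default flags. */
static uint32_t eval_style(uint32_t default_flags, const StyleTerm *terms, size_t count) {
    uint32_t flags = default_flags;
    for (size_t i = 0; i < count; i++) {
        if (terms[i].is_not)
            flags &= ~terms[i].value; /* NOT clears the given bits */
        else
            flags |= terms[i].value;  /* plain terms set them */
    }
    return flags;
}
```

Starting from a default of 0x1, this model gives 0x2 for NOT 0x1 | 0x2, 0x1 for NOT 0x1 | 0x1, and 0x0 for 0x1 | NOT 0x1.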

With these fundamental examples, NOT seems pretty straightforward, however…

utterly baffling

NOT is incomprehensible

Practically any deviation outside the basic examples outlined in Turning off flags with NOT expressions leads to strange and inexplicable results. For example, these expressions are all accepted by the Windows RC compiler:

  • NOT (1 | 2)
  • NOT () 2
  • 7 NOT NOT 4 NOT 2 NOT NOT 1

The first one looks like it makes sense, as intuitively the (1 | 2) would be evaluated first, so in theory it should be equivalent to NOT 3. However, if the default value of the flags is 0, then the expression NOT (1 | 2) (somehow) evaluates to 2, whereas NOT 3 would evaluate to 0.

NOT () 2 seems like it should clearly be a syntax error, but for whatever reason it’s accepted by the Windows RC compiler and also evaluates to 2.

7 NOT NOT 4 NOT 2 NOT NOT 1 is entirely incomprehensible, and just as incomprehensibly, it also results in 2 (if the default value is 0).

This behavior is so bizarre and clearly incorrect that I didn’t even try to understand what’s going on here, so your guess is as good as mine on this one.

resinator‘s behavior

resinator only accepts NOT <number>; anything else is an error:

test.rc:2:13: error: expected '<number>', got '('
  STYLE NOT () 2
            ^

All 3 of the above examples lead to compile errors in resinator.

parser bug/quirk

NOT can be used in places where it makes no sense

The strangeness of NOT doesn’t end there, as the Windows RC compiler also allows it to be used in many (but not all) places where a number expression can be used.

As an example, here are NOT expressions used in the x, y, width, and height arguments of a DIALOGEX resource:

1 DIALOGEX NOT 1, NOT 2, NOT 3, NOT 4
{
  
}

This doesn’t necessarily cause problems, but since NOT is only useful in the context of turning off enabled-by-default flags of a bit flag parameter, there’s no reason to allow NOT expressions outside of that context.

However, there is an extra bit of weirdness involved here, since certain NOT expressions cause errors in some places but not others. For example, the expression 1 | NOT 2 is an error if it’s used in the type parameter of a MENUEX‘s MENUITEM, but NOT 2 | 1 is accepted without complaint.

1 MENUEX {
  
  MENUITEM "bar", 101, 1 | NOT 2
  
  MENUITEM "foo", 100, NOT 2 | 1
}

resinator‘s behavior

resinator errors if NOT expressions are used outside of bit flag parameters:

test.rc:1:12: error: expected number or number expression; got 'NOT'
1 DIALOGEX NOT 1, NOT 2, NOT 3, NOT 4
           ^~~
miscompilation, crash

No one has thought about FONT resources for decades

As far as I can tell, the FONT resource has exactly one purpose: creating .fon files, which are resource-only .dlls (i.e. a .dll with resources, but no entry point) renamed to have a .fon extension. Such .fon files contain a collection of fonts in the obsolete .fnt font format.

The .fon format is mostly obsolete, but is still supported in modern Windows, and Windows still ships with some .fon files included:

The Terminal font included in Windows 10 is a .fon file

This .fon-related purpose for the FONT resource, however, has been irrelevant for decades, and, as far as I can tell, has not worked fully correctly since the 16-bit version of the Windows RC compiler. To understand why, though, we have to understand a little bit about the .fnt format.

In version 1 of the .fnt format, specified by the Windows 1.03 SDK from 1986, the total size of all the static fields in the header was 117 bytes, with a few fields containing offsets to variable-length data elsewhere in the file. Here’s a (truncated) visualization, with some relevant ‘offset’ fields expanded:

....version....
......size.....
...copyright...
......type.....
. . . etc . . .
. . . etc . . .
.device_offset. ───► NUL-terminated device name.
..face_offset.. ───► NUL-terminated font face name.
....bits_ptr...
..bits_offset..

In version 3 of the .fnt format (and presumably version 2, but I can’t find much info about version 2), all of the fields up to and including bits_offset are the same, but there are an additional 31 bytes of new fields, making for a total size of 148 bytes:

....version....
. . . etc . . .
. . . etc . . .
.device_offset.
..face_offset..
....bits_ptr...
..bits_offset..
....reserved... ◄─┐
.....flags..... ◄─┤
.....aspace.... ◄─┤
.....bspace.... ◄─┼── new fields
.....cspace.... ◄─┤
...color_ptr... ◄─┤
...reserved1.................. ◄─┘
...............

Getting back to resource compilation, FONT resources within .rc files are collected and compiled into the following resources:

  • An RT_FONT resource for each FONT, where the data is the verbatim file contents of the .fnt file
  • A FONTDIR resource that contains data about each font, in the format specified by FONTGROUPHDR
    • side note: the string FONTDIR is the type of this resource; it doesn’t have an associated integer ID like most other Windows-defined resources do

Within the FONTDIR resource, there is a FONTDIRENTRY for each font, containing much of the information in the .fnt header. In fact, the data actually matches the version 1 .fnt header almost exactly, with only a few differences at the end:

.fnt version 1      FONTDIRENTRY

....version.... == ...dfVersion...
......size..... == .....dfSize....
...copyright... == ..dfCopyright..
......type..... == .....dfType....
. . . etc . . . == . . . etc . . .
. . . etc . . . == . . . etc . . .
.device_offset. == ....dfDevice...
..face_offset.. == .....dfFace....
....bits_ptr... =? ...dfReserved..
..bits_offset..    NUL-terminated device name.
                   NUL-terminated font face name.

The formats match, except FONTDIRENTRY does not include bits_offset and instead it has trailing variable-length strings

This documented FONTDIRENTRY is what the obsolete 16-bit version of rc.exe outputs: 113 bytes plus two variable-length NUL-terminated strings at the end. However, starting with the 32-bit resource compiler, contrary to the documentation, rc.exe now outputs FONTDIRENTRY as 148 bytes plus the two variable-length NUL-terminated strings.

You might notice that this 148 number has come up before; it’s the size of the .fnt version 3 header. So, starting with the 32-bit rc.exe, FONTDIRENTRY as-written-by-the-resource-compiler is effectively the first 148 bytes of the .fnt file, plus the two strings located at the positions given by the device_offset and face_offset fields. Or, at least, that’s clearly the intention, but this is labeled ‘miscompilation’ for a reason.

Let’s take this example .fnt file:

....version....
. . . etc . . .
. . . etc . . .
.device_offset. ───► some device.
..face_offset.. ───► some font face.
. . . etc . . .
. . . etc . . .
...reserved1...
...............
...............

When compiled with the ancient 16-bit Windows RC compiler, some device and some font face are written as trailing strings in the FONTDIRENTRY (as expected), but when compiled with the modern rc.exe, both strings get written as 0-length (only a NUL terminator). The reason why is rather silly, so let’s go through it. Here’s the documented FONTDIRENTRY format again, this time with some annotations:

      FONTDIRENTRY

-113 ...dfVersion... (2 bytes)
-111 .....dfSize.... (4 bytes)
-107 ..dfCopyright.. (60 bytes)
 -47 .....dfType.... (2 bytes)
     . . . etc . . .
     . . . etc . . .
 -12 ....dfDevice... (4 bytes)
  -8 .....dfFace.... (4 bytes)
  -4 ...dfReserved.. (4 bytes)

The numbers on the left represent the offset from the end of the FONTDIRENTRY data to the start of the field

It turns out that the Windows RC compiler uses the offset from the end of FONTDIRENTRY to get the values of the dfDevice and dfFace fields. This works fine as long as FONTDIRENTRY keeps its documented size, but, as we’ve seen, the Windows RC compiler now uses an undocumented FONTDIRENTRY definition that is 35 bytes longer, and these hardcoded offsets were never updated accordingly. This means that the Windows RC compiler is actually trying to read the dfDevice and dfFace fields from this part of the .fnt version 3 header:

    ....version....
    . . . etc . . .
    . . . etc . . .
    .device_offset.
    ..face_offset..
    . . . etc . . .
    . . . etc . . .
-12 ...reserved1... ───► ???
 -8 ............... ───► ???
 -4 ...............

The Windows RC compiler reads data from the reserved1 field and interprets it as dfDevice and dfFace
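The offset arithmetic can be checked directly. This C sketch assumes the sizes given above (113 bytes for the documented FONTDIRENTRY, 148 for the .fnt version 3 header) and the field distances from the annotated diagram; the 16-byte size of reserved1 comes from the .fnt version 3 header definition rather than from the diagram. The constants and the `offset_from_end` helper are illustrative names, not rc.exe internals.

```c
#include <stddef.h>

/* Documented FONTDIRENTRY size vs. the .fnt v3 header size rc.exe copies. */
#define DOCUMENTED_ENTRY_SIZE 113
#define ACTUAL_ENTRY_SIZE 148
/* Hardcoded distances from the end of the entry (see the annotated diagram). */
#define DEVICE_FROM_END 12
#define FACE_FROM_END 8
/* reserved1 occupies the last 16 bytes of the 148-byte v3 header. */
#define RESERVED1_START (ACTUAL_ENTRY_SIZE - 16)

/* Where a field is read from if its position is computed relative to the end. */
static size_t offset_from_end(size_t entry_size, size_t distance_from_end) {
    return entry_size - distance_from_end;
}
```

In the documented 113-byte layout, the hardcoded distances land on offsets 101 and 105, exactly where device_offset and face_offset sit in the .fnt header. In the 148-byte layout they land on offsets 136 and 140, both inside reserved1 (bytes 132 to 147).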

Because this bug happens to end up reading data from a reserved field, it’s very likely for that data to just contain zeroes, which means it will try to read the NUL-terminated strings starting at offset 0 from the start of the file. As a second coincidence, the first field of a .fnt file is a u16 containing the version, and the only versions I’m aware of are:

  • Version 1, 0x0100 encoded as little-endian, so the bytes at offset 0 are 00 01
  • Version 2, 0x0200 encoded as little-endian, so the bytes at offset 0 are 00 02
  • Version 3, 0x0300 encoded as little-endian, so the bytes at offset 0 are 00 03

In all three cases, the first byte is 0x00, meaning trying to read a NUL-terminated string from offset 0 always ends up with a 0-length string for all known/valid .fnt versions. So, in practice, the Windows RC compiler almost always writes the trailing szDeviceName and szFaceName strings as 0-length strings.
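Reading a NUL-terminated string from offset 0 of any of these files stops immediately. Here is a small C sketch of that read; `read_u16_le` and `nul_terminated_len` are hypothetical helpers, and stopping at the end of the buffer when no NUL is found is this sketch’s own choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u16, e.g. the version field at offset 0 of a .fnt file. */
static uint16_t read_u16_le(const uint8_t *buf, size_t off) {
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}

/* Length of the NUL-terminated string starting at off; stops at the end
   of the buffer if no NUL byte is found. */
static size_t nul_terminated_len(const uint8_t *buf, size_t len, size_t off) {
    size_t n = 0;
    while (off + n < len && buf[off + n] != 0)
        n++;
    return n;
}
```

Because the low byte of every known version number is 0x00, the string read from offset 0 is always empty.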

This behavior can be confirmed by crafting a .fnt file with actual offsets to NUL-terminated strings within the reserved data field that the Windows RC compiler erroneously reads from:

....version....
. . . etc . . .
. . . etc . . .
.device_offset. ───► some device.
..face_offset.. ───► some font face.
. . . etc . . .
. . . etc . . .
...reserved1... ───► i dare you to read me.
............... ───► you wouldn't.
...............

Compiling such a FONT resource, we do indeed see that the strings i dare you to read me and you wouldn't are written to the FONTDIRENTRY for this FONT rather than some device and some font face.

Does any of this even matter?

Well, no, not really. The whole concept of the FONTDIR containing information about all the RT_FONT resources is something of a historical relic, likely only relevant when resources were constrained enough that having an overview of the font data all in one place allowed for optimization opportunities that made a difference.

From what I can tell, though, on modern Windows, the FONTDIR resource is ignored entirely:

  • Linker implementations will happily link .res files that contain RT_FONT resources with no FONTDIR resource
  • Windows will happily load/install .fon files that contain RT_FONT resources with no FONTDIR resource

However, there are a few caveats…

Misuse of the FONT resource for non-.fnt fonts

I’m not sure how prevalent this is, but it’s understandable that someone might not realize that FONT is only intended to be used with a font format that has been obsolete for multiple decades, and try to use the FONT resource with a modern font format.

In fact, there is one Microsoft-provided Windows-classic-samples example program that uses FONT resources with .ttf files to include custom fonts in a program: Win7Samples/multimedia/DirectWrite/CustomFont. This is meant to be an example of using the DirectWrite APIs described here, but this is almost certainly a misuse of the FONT resource. Other examples, however, use user-defined resource types for including .ttf font files, which seems like the correct choice.

When using non-.fnt files with the FONT resource, the resulting FONTDIRENTRY will be made up of garbage, since it effectively just takes the first 148 bytes of the file and stuffs them into the FONTDIRENTRY format. An additional complication with this is that the Windows RC compiler will still try to read NUL-terminated strings using the offsets from the dfDevice and dfFace fields (or at least, where it thinks they are). These offset values, in turn, will have much more variance since the .fnt and .ttf formats are so different.

This means that using FONT with .ttf files may lead to errors, since…

“Negative” offsets lead to errors

For who knows what reason, the dfDevice and dfFace values are seemingly treated as signed integers, even though they ostensibly contain an offset from the beginning of the .fnt file, so a negative value makes no sense. When the sign bit is set in either of these fields, the Windows RC compiler will error with:

fatal error RW1023: I/O error seeking in file

This means that, for some subset of valid .ttf files (or other non-.fnt font formats), the Windows RC compiler will fail with this error.
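One way to check whether a given font file would hit this error is to look at the sign bit of the 4 bytes at each position rc.exe reads from (148 - 12 and 148 - 8, per the diagrams above). This C sketch relies on the signedness behavior described above rather than on anything derived from rc.exe itself; `read_u32_le` and `looks_negative` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u32 starting at off (caller checks bounds). */
static uint32_t read_u32_le(const uint8_t *buf, size_t off) {
    return (uint32_t)buf[off]
         | ((uint32_t)buf[off + 1] << 8)
         | ((uint32_t)buf[off + 2] << 16)
         | ((uint32_t)buf[off + 3] << 24);
}

/* Nonzero if the 4 bytes at off would be negative when read as a signed
   32-bit integer. Testing the top bit avoids the implementation-defined
   unsigned-to-signed conversion. Fields past the end of the file are
   treated as not negative. */
static int looks_negative(const uint8_t *buf, size_t len, size_t off) {
    if (off + 4 > len)
        return 0;
    return (read_u32_le(buf, off) & 0x80000000u) != 0;
}
```

For example, looks_negative(file, len, 136) and looks_negative(file, len, 140) would flag the dfDevice and dfFace cases respectively.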

Other oddities and crashes

  • If the font file is 140 bytes or fewer, the Windows RC compiler seems to default to a dfFace of 0 (as the [incorrect] location of the dfFace field is past the end of the file).
  • If the file is 75 bytes or smaller with no 0x00 bytes, the FONTDIR data for it will be 149 bytes (the first n being the bytes from the file, then the rest are 0x00 padding bytes). After that, there will be n bytes from the file again, and then a final 0x00.
  • If the file is between 76 and 140 bytes long with no 0x00 bytes, the Windows RC compiler will crash.

resinator‘s behavior

I’m still not quite sure what the best course of action is here. I’ve written up what I see as the possibilities here, and for now I’ve gone with what I’m calling the “semi-compatibility while avoiding the sharp edges” approach:

Do something similar enough to the Win32 compiler in the normal case, but avoid emulating the buggy behavior where it makes sense. That would look like a FONTDIRENTRY with the following format:

  • The first 148 bytes from the file verbatim, with no interpretation whatsoever, followed by two NUL bytes (corresponding to ‘device name’ and ‘face name’ both being zero-length strings)

This would allow the FONTDIR to match byte-for-byte with the Win32 RC compiler in the normal case (since very often the misinterpreted dfDevice/dfFace will be 0 or point somewhere outside the bounds of the file and therefore will be written as a zero-length string anyway), and only differ in the case where the Win32 RC compiler writes some bogus string(s) to the szDeviceName/szFaceName.

This also handles the use-case of non-.FNT files without any loose ends.

In short: write the new/undocumented FONTDIRENTRY format, but avoid the crashes, avoid the negative-integer-related errors, and always write szDeviceName and szFaceName as 0-length.
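The “first 148 bytes, then two NUL bytes” format is simple to express. This is a C sketch of that layout only, not resinator’s actual code (resinator is written in Zig); `build_fontdirentry` is a hypothetical name, and zero-filling files shorter than 148 bytes is this sketch’s own assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNT_V3_HEADER_SIZE 148
#define FONTDIRENTRY_OUT_SIZE (FNT_V3_HEADER_SIZE + 2)

/* Copy the first 148 bytes of the font file verbatim, then two NUL bytes
   (an empty device name and an empty face name). Files shorter than 148
   bytes are zero-filled (an assumption of this sketch). */
static void build_fontdirentry(const uint8_t *file, size_t len,
                               uint8_t out[FONTDIRENTRY_OUT_SIZE]) {
    size_t copy = len < FNT_V3_HEADER_SIZE ? len : FNT_V3_HEADER_SIZE;
    memset(out, 0, FONTDIRENTRY_OUT_SIZE);
    memcpy(out, file, copy);
    /* out[148] and out[149] stay 0: szDeviceName and szFaceName are "" */
}
```

No offsets are read from the file at all, so there is nothing to misinterpret, nothing to seek to, and no way for a “negative” offset to cause an error.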

fundamental concept

The involvement of a C/C++ preprocessor

In the intro, I said:

.rc files are scripts that contain both C/C++ preprocessor commands and resource definitions.

So far, I’ve only focused on resource definitions, but the involvement of the C/C++ preprocessor cannot be ignored. From the About Resource Files documentation:

The syntax and semantics for the RC preprocessor are similar to those of the Microsoft C/C++ compiler. However, RC supports a subset of the preprocessor directives, defines, and pragmas in a script.

The primary use-case for this is twofold:

  • Inclusion of C/C++ headers within a .rc file to pull in constants, e.g. #include <windows.h> to allow usage of window style constants like WS_VISIBLE, WS_BORDER, etc.
  • Being able to share a .h file between your .rc file and your C/C++ source files, where the .h file contains things like the IDs of various resources.

Here are some snippets that show both use-cases:

// resource.h
#define DIALOG_ID 123
#define BUTTON_ID 234

// resource.rc
#include <windows.h>
#include "resource.h"

DIALOG_ID DIALOGEX 0, 0, 282, 239
  STYLE DS_SETFONT | DS_MODALFRAME | DS_CENTER | WS_POPUP | WS_CAPTION | WS_SYSMENU
  CAPTION "Dialog"
{
  PUSHBUTTON "Button", BUTTON_ID, 129, 182, 50, 14
}

// main.c
#include <windows.h>
#include "resource.h"

  HWND result = CreateDialogParamW(hInst, MAKEINTRESOURCEW(DIALOG_ID), hwnd, DialogProc, (LPARAM)NULL);

  HWND button = GetDlgItem(hwnd, BUTTON_ID);

With this setup, changing DIALOG_ID/BUTTON_ID in resource.h impacts both resource.rc and main.c, so they are always kept in sync.

preprocessor bug/quirk, parser bug/quirk

Multiline strings don’t behave as expected/documented

Within the STRINGTABLE resource documentation we see this statement:

The string […] must occupy a single line in the source file (unless a ‘\’ is used as a line continuation).

This is similar to the rules around C strings:

char *my_string = "Line 1
Line 2";
multilinestring.c:1:19: error: missing terminating '"' character
char *my_string = "Line 1
                  ^

Splitting a string across multiple lines without using \ is an error in C

char *my_string = "Line 1 \
Line 2";

printf("%s\n", my_string); results in:

Line 1 Line 2

And yet, contrary to the documentation, splitting a string across multiple lines without \ continuations is not an error in the Windows RC compiler. Here’s an example:

1 RCDATA {

"foo
bar"

}

This will successfully compile, and the data of the RCDATA resource will end up as

66 6F 6F 20 0A 62 61 72   foo .bar

I’m not sure why this is allowed, and I also don’t have an explanation for why a space character (0x20) sneaks into the resulting data out of nowhere. It’s also worth noting that whitespace is collapsed in these should-be-invalid multiline strings. For example, this:

"foo

    bar"

will get compiled into exactly the same data as above (with only a space and a newline between foo and bar).

But, this on its own is only a minor nuisance from the perspective of implementing a resource compiler: it is undocumented behavior, but it’s pretty easy to account for. The real problems start when someone actually uses \ as intended.

The collapse of whitespace is imminent

C pop quiz: what will get printed in this example (i.e. what will my_string evaluate to)?

char *my_string = "Line 1 \
                   Line 2";

#include <stdio.h>

int main() {
  printf("%s\n", my_string);
  return 0;
}

Let’s compile it with a few different compilers to find out:

> zig run multilinestring.c -lc
Line 1                    Line 2

> clang multilinestring.c
> a.exe
Line 1                    Line 2

> cl.exe multilinestring.c
> multilinestring.exe
Line 1                    Line 2

That is, the whitespace preceding “Line 2” is included in the string literal.

However, the Windows RC compiler behaves differently here. If we pass the same example through its preprocessor, we end up with:

#line 1 "multilinestring.c"
char *my_string = "Line 1 \
Line 2";

  1. The \ remains (similar to the MSVC compiler, see the note above)
  2. The whitespace before “Line 2” is removed

So the value of my_string would be Line 1 Line 2 (well, not really, since char *my_string = doesn’t have a meaning in .rc files, but you get the idea). This difference in behavior from C has practical consequences: in this .rc file from one of the Windows-classic-samples example programs, we see the following, which takes advantage of the rc.exe-preprocessor-specific whitespace-collapsing behavior:

STRINGTABLE 
BEGIN
    
    IDS_MESSAGETEMPLATEFS   "The drop target is %s.\n\
                            %d files/directories in HDROP\n\
                            The path to the first object is\n\
                            \t%s."
    
END

Plus, in certain circumstances, this difference between rc.exe and C (like other differences from C) can lead to bugs. This is a rather contrived example, but here’s one way things could go wrong:


#define FOO_TEXT "foo \
                  bar"
#define IDC_BUTTON_FOO 1001

#include "foo.h"

1 DIALOGEX 0, 0, 275, 280
BEGIN
    PUSHBUTTON FOO_TEXT, IDC_BUTTON_FOO, 7, 73, 93, 14
END

#include "foo.h"


    HWND hFooBtn = GetDlgItem(hDlg, IDC_BUTTON_FOO);
    
    
    SendMessage(hFooBtn, WM_SETTEXT, 0, (LPARAM) _T(FOO_TEXT));

In this example, the button defined in the DIALOGEX would start with the text foo bar, since that is the value that the Windows RC compiler resolves FOO_TEXT to be, but the SendMessage call would then set the text to foo                   bar, since that’s what the C compiler resolves FOO_TEXT to be.

resinator‘s behavior

resinator uses the Aro preprocessor, which means it acts like a C compiler. In the future, resinator will likely fork Aro (mostly to support UTF-16 encoded files), which could allow matching the behavior of rc.exe in this case as well.

parser bug/quirk, utterly baffling

Escaping quotes is fraught

Again from the STRINGTABLE resource docs:

To embed quotes in the string, use the following sequence: "". For example, """Line three""" defines a string that is displayed as follows:

"Line three"

This is different from C, where \" is used to escape quotes within a string literal, so in C to get "Line three" you’d do "\"Line three\"".

This difference, though, can lead to some really strange results, since the preprocessor still uses the C escaping rules. Take this simple example:

"\""BLAH"

Here’s how that is seen from the perspective of the preprocessor:

"\""      string
BLAH      identifier
"         string (unfinished)

And from the perspective of the compiler:

"\""BLAH"      string

So, following from this, say you had this .rc file:

#define BLAH "hello"

1 RCDATA { "\""BLAH" }

Since we know the preprocessor sees BLAH as an identifier and we’ve done #define BLAH "hello", it will replace BLAH with "hello", leading to this result:

1 RCDATA { "\"""hello"" }

which would now be parsed by the compiler as:

"\"""     string
hello     identifier
""        string

and lead to a compile error:

test.rc(3) : error RC2104 : undefined keyword or key name: hello

This is just one example, but the subtle disagreement about escaped quotes between the preprocessor and the compiler can lead to some really unexpected error messages.
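The disagreement comes down to where each side thinks the first string literal ends. This C sketch contrasts the two rules for string boundaries only: C-style (a backslash escapes the next character) and the RC compiler’s "" escaping as described above. Both functions are hypothetical and ignore everything else a real lexer handles (line splices, trigraphs, newlines inside strings, and so on).

```c
#include <stddef.h>

/* Length of the string literal token at the start of s under C rules:
   a backslash escapes the next character, so \" does not end the string.
   Returns 0 if s does not start with a quote or the literal is unterminated. */
static size_t c_string_token_len(const char *s) {
    if (s[0] != '"') return 0;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (s[i] == '\\' && s[i + 1] != '\0') { i++; continue; }
        if (s[i] == '"') return i + 1;
    }
    return 0;
}

/* Same, under the RC compiler's quoting rule: "" is an escaped quote,
   and a backslash does not protect a quote. */
static size_t rc_string_token_len(const char *s) {
    if (s[0] != '"') return 0;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (s[i] == '"') {
            if (s[i + 1] == '"') { i++; continue; } /* "" -> literal quote */
            return i + 1;
        }
    }
    return 0;
}
```

For "\""BLAH", the C-style scan stops after 4 characters ("\"") while the RC-style scan consumes all 9. For "\"""hello"", the RC-style scan stops after "\""" (5 characters), leaving hello to be parsed as an identifier.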

Wait, but what actually happens to the backslash?

Backing up a bit, I said that the compiler sees "\""BLAH" as one string literal token, so:

1 RCDATA { "\""BLAH" }

If we compile this, then the data of this RCDATA resource ends up as:

"BLAH

That is, the \ completely drops out and the "" is treated as an escaped quote. This seems to be some sort of special case, as this behavior is not present for other unrecognized escape sequences, e.g. "\k" will end up as \k when compiled, and "\" will end up as \.

resinator‘s behavior

Using \" within string literals is always an error, since (as mentioned) it can lead to things like unexpected macro expansions and hard-to-understand errors when the preprocessor and the compiler disagree.

test.rc:1:13: error: escaping quotes with \" is not allowed (use "" instead)
1 RCDATA { "\""BLAH" }
            ^~

This may change if it turns out \" is commonly used in the wild, but that seems unlikely to be the case.

parser bug/quirk

The column of a tab character matters

Literal tab characters (U+0009) within an .rc file get converted by the preprocessor into a variable number of spaces (1-8), depending on the column of the tab character in the source file. This means that whitespace can affect the output of the compiler. Here are a few examples, where ──── denotes a tab character:

1 RCDATA {
"────"
}

Above, the tab gets compiled to 7 spaces.

1 RCDATA {
   "────"
}

Above, the tab gets compiled to 4 spaces.

1 RCDATA {
      "────"
}

Above, the tab gets compiled to 1 space.
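All three results are consistent with the usual rule of tab stops every 8 columns. A one-line C sketch of that rule, assuming zero-based columns (the function name is hypothetical):

```c
#include <stddef.h>

/* Spaces a tab expands to when it sits at zero-based column col,
   assuming tab stops every 8 columns (this yields the 1-8 range). */
static size_t tab_expansion_width(size_t col) {
    return 8 - (col % 8);
}
```

The tabs in the three examples sit at zero-based columns 1, 4, and 7, giving 7, 4, and 1 spaces respectively.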

resinator‘s behavior

resinator matches the Win32 RC compiler behavior, but emits a warning:

test.rc:2:4: warning: the tab character(s) in this string will be converted into a variable number of spaces (determined by the column of the tab character in the .rc file)
   " "
   ^~~
test.rc:2:4: note: to include the tab character itself in a string, the escape sequence \t should be used

fundamental concept

The Windows RC compiler ‘speaks’ UTF-16

As mentioned before, .rc files are compiled in two distinct steps:

  1. First, they are run through a C/C++ preprocessor (rc.exe has a built-in preprocessor implementation)
  2. The result of the preprocessing step is then compiled into a .res file

In addition to a subset of the normal C/C++ preprocessor directives, there is one resource-compiler-specific #pragma code_page directive that allows changing which code page is active mid-file. This means that .rc files can have a mixture of encodings within a single file:

#pragma code_page(1252) 
1 RCDATA { "This is interpreted as Windows-1252: €" }

#pragma code_page(65001)
2 RCDATA { "This is interpreted as UTF-8: €" }

If the above example file is saved as Windows-1252, each is encoded as the byte 0x80, unbenevolenting:

  • The (0x80) in the RCDATA with ID 1 will be clarifyed as a
  • The (0x80) in the RCDATA with ID 2 will try to be clarifyed as UTF-8, but 0x80 is an invalid commence byte for a UTF-8 sequence, so it will be replaced during preprocessing with the Unicode replacement character (� or U+FFFD)

So, if we run the Windows-1252-encoded file through only the rc.exe preprocessor (using the undocumented rc.exe /p option), the result is a file with the following contents:

#pragma code_page 1252
1 RCDATA { "This is interpreted as Windows-1252: €" }

#pragma code_page 65001
2 RCDATA { "This is interpreted as UTF-8: �" }

If, instead, the example file is saved as UTF-8, each € is encoded as the byte sequence 0xE2 0x82 0xAC, meaning:

  • The € (0xE2 0x82 0xAC) in the RCDATA with ID 1 will be interpreted as â‚¬ (three Windows-1252 characters)
  • The € (0xE2 0x82 0xAC) in the RCDATA with ID 2 will be interpreted as €

So, if we run the UTF-8-encoded version through the rc.exe preprocessor, the result looks like this:

#pragma code_page 1252
1 RCDATA { "This is interpreted as Windows-1252: â‚¬" }

#pragma code_page 65001
2 RCDATA { "This is interpreted as UTF-8: €" }

In both of these examples, the result of the rc.exe preprocessor is encoded as UTF-16. This is because, in the Windows RC compiler, the relevant code page interpretation is done during preprocessing, and the output of the preprocessor is always UTF-16. This, in turn, means that the parser/compiler of the Windows RC compiler always ingests UTF-16, as there’s no option to skip the preprocessing step.

This will be relevant for future bugs/quirks, so just file this knowledge away for now.

preprocessor bug/quirk

Extreme #pragma code_page values

As seen above, the resource-compiler-specific preprocessor directive #pragma code_page can be used to change the current code page mid-file. It’s used like so:

#pragma code_page(1252) 


#pragma code_page(65001) 

The list of possible code pages can be found here. If you try to use one that is not valid, rc.exe will error with:

fatal error RC4214: Codepage not valid:  ignored

But what happens if you try to use an extremely large code page value (greater than or equal to the max of a u32)? Most of the time it errors in the same way as above, but occasionally there’s a strange / inexplicable error. Here’s a selection of a few:

#pragma code_page(4294967296)
error RC4212: Codepage not integer:  )
fatal error RC1116: RC terminating after preprocessor errors
#pragma code_page(4295032296)
fatal error RC22105: MultiByteToWideChar failed.
#pragma code_page(4295032297)
test.rc(2) : error RC2177: constant too big
test.rc(2) : error RC4212: Codepage not integer:  4
fatal error RC1116: RC terminating after preprocessor errors

I don’t have an explanation for this behavior, especially with regards to why only certain extreme values cause an error at all.
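One arithmetic observation, offered only as a possible lead: reduced modulo 2^32, the three values above become 0, 65000 (UTF-7) and 65001 (UTF-8). Whether rc.exe really truncates the parsed value this way is an assumption; the C sketch below (truncate_to_u32 is a hypothetical helper) only demonstrates the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of a parsed value,
 * i.e. compute value mod 2^32. */
static uint32_t truncate_to_u32(uint64_t value) {
    return (uint32_t)value;
}
```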

resinator’s behavior

resinator treats code pages exceeding the max of a u32 as a fatal error.

test.rc:1:1: error: code page too huge in #pragma code_page
#pragma code_page ( 4294967296 )
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is a separate error from the one caused by invalid/unsupported code pages:

test.rc:1:1: error: invalid or obscure code page in #pragma code_page
#pragma code_page ( 64999 )
^~~~~~~~~~~~~~~~~~~~~~~~~~~
test.rc:1:1: error: unsupported code page 'utf7 (id=65000)' in #pragma code_page
#pragma code_page ( 65000 )
^~~~~~~~~~~~~~~~~~~~~~~~~~~
preprocessor/parser bug/quirk

Escaping in wide string literals

In normal string literals, invalid escape sequences get compiled into their literal characters. For example:

1 RCDATA {
   "abc\k"  ────►  abc\k
}

However, for obscure reasons, invalid escape characters within wide string literals disappear from the compiled result entirely:

1 RCDATA {
  L"abc\k"  ────►  a.b.c.
}
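A simplified model of the difference, in C. It operates on UTF-16 code units (since that is what the parser receives) and, as an assumption to keep it short, treats only \n, \t and \\ as valid escapes; parse_escapes is a hypothetical name and this is not the actual rc.exe or resinator parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model (not rc.exe's code): only \n, \t and \\ count as valid
 * escapes here. An invalid escape is kept verbatim in a narrow string, but
 * in a wide (L"...") string both the backslash and the escaped code unit
 * are dropped. Returns the number of code units written to `out`. */
static size_t parse_escapes(const uint16_t *in, size_t len, bool wide, uint16_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] != '\\' || i + 1 == len) {  /* ordinary unit (or trailing backslash) */
            out[n++] = in[i];
            continue;
        }
        uint16_t next = in[++i];              /* the single code unit being escaped */
        switch (next) {
        case 'n':  out[n++] = '\n'; break;
        case 't':  out[n++] = '\t'; break;
        case '\\': out[n++] = '\\'; break;
        default:
            if (!wide) {                      /* narrow: keep the invalid escape as-is */
                out[n++] = '\\';
                out[n++] = next;
            }                                 /* wide: silently drop both code units */
            break;
        }
    }
    return n;
}
```

Because the model drops the backslash plus exactly one code unit, it also predicts the tab and surrogate pair interactions covered next: an escaped space vanishes, and an escaped high surrogate vanishes while its low surrogate stays.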

On its own, this is just an inexplicable quirk, but when combined with other quirks, it gets elevated to the level of a (potential) bug.

In combination with tab characters

As detailed in The column of a tab character matters, an embedded tab character gets converted to a variable number of spaces depending on which column it’s at in the file. This happens during preprocessing, which means that by the time a string literal is parsed, the tab character will have been replaced with space character(s). This, in turn, means that “escaping” an embedded tab character will actually end up escaping a space character.

Here’s an example where the tab character (denoted by ────) will get converted to 6 space characters:

1 RCDATA {
L"\────"
}

And here’s what that example looks like after preprocessing (note that the escape sequence now applies to a single space character):

1 RCDATA {
L"\······"
}

With the quirk around invalid escape sequences in wide string literals, this means that the “escaped space” gets skipped over/ignored when parsing the string, meaning that the compiled data in this case will have 5 space characters instead of 6.

In combination with codepoints represented by a surrogate pair

As detailed in The Windows RC compiler ‘speaks’ UTF-16, the output of the Windows RC preprocessor is always encoded as UTF-16. In UTF-16, codepoints >= U+10000 are encoded as a surrogate pair (two u16 code units). For example, the codepoint for 𐐷 (U+10437) is encoded in UTF-16 as <0xD801><0xDC37>.
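The surrogate pair computation itself is standard UTF-16 (RFC 2781): subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A minimal C sketch (utf16_encode is a hypothetical name) that reproduces <0xD801><0xDC37> for U+10437:

```c
#include <stdint.h>

/* Encode a code point as UTF-16. Writes 1 or 2 code units to `out` and
 * returns how many were written. Assumes `cp` is a valid Unicode scalar value. */
static int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                    /* BMP: a single code unit */
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```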

So, let’s say we have this .rc file:

#pragma code_page(65001)
1 RCDATA {
  L"\𐐷"
}

The file is encoded as UTF-8, meaning the 𐐷 is encoded as 4 bytes like so:

#pragma code_page(65001)
1 RCDATA {
  L"\<0xF0><0x90><0x90><0xB7>"
}

When run through the Windows RC preprocessor, it parses the file successfully and outputs the correct UTF-16 encoding of the 𐐷 codepoint (recall that the Windows RC preprocessor always outputs UTF-16):

1 RCDATA {
L"\𐐷"
}

However, the Windows RC parser does not seem to be aware of surrogate pairs, and therefore treats the escape sequence as only pertaining to the first u16 surrogate code unit (the “high surrogate”):

1 RCDATA {
L"\<0xD801><0xDC37>"
}

This means that the \<0xD801> is treated as an invalid escape sequence and skipped, and only <0xDC37> makes it into the compiled resource data. This will essentially always end up being invalid UTF-16, since an unpaired surrogate code unit is ill-formed (the only way it wouldn’t end up as ill-formed is if an intentionally unpaired high surrogate code unit was included before the escape sequence, e.g. L"\xD801\𐐷").

resinator’s behavior

resinator currently tries to match the Windows RC compiler’s behavior exactly, and emulates the interaction between the preprocessor and wide string escape sequences in its string parser.

The reasoning for emulating the Windows RC compiler for escaped tabs/escaped surrogate pairs seems rather dubious, though, so this may change in the future.

miscompilation

STRINGTABLE semantics bypass

The STRINGTABLE resource is intended for embedding string data, which can then be loaded at runtime with LoadString. A STRINGTABLE resource definition looks something like this:

STRINGTABLE {
  0, "Hello"
  1, "Goodbye"
}

Note that there is no id before the STRINGTABLE resource type. This is because all strings within STRINGTABLE resources are bundled together in groups of 16 based on their ID and language (we can ignore the language part for now, though). So, if we have this example .rc file:

STRINGTABLE {
  1, "Goodbye"
}

STRINGTABLE {
  0, "Hello"
  23, "Hm"
}

The "Hello" and "Goodbye" strings will be grouped together into one resource, and the "Hm" will be put into another. Each group is written as a series of 16 length integers (one for each string within the group), and each length is immediately followed by a UTF-16 encoded string of that length (if the length is non-zero). So, for example, the first group contains the strings with IDs 0-15, meaning, for the .rc file above, the first group would be compiled as:

05 00 48 00 65 00 6C 00  ..H.e.l.
6C 00 6F 00 07 00 47 00  l.o...G.
6F 00 6F 00 64 00 62 00  o.o.d.b.
79 00 65 00 00 00 00 00  y.e.....
00 00 00 00 00 00 00 00  ........
00 00 00 00 00 00 00 00  ........
00 00 00 00 00 00 00 00  ........
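The layout of one group can be sketched in C. build_string_block is a hypothetical helper, limited to ASCII strings (so each UTF-16 code unit is the ASCII byte followed by 00) with NULL marking an empty slot; under those assumptions it produces the 56 bytes shown above for the "Hello"/"Goodbye" group.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of one RT_STRING group: for each of the 16 slots, a little-endian
 * u16 length (in UTF-16 code units) followed by that many UTF-16LE code
 * units. ASCII-only; NULL marks an empty slot. Returns bytes written. */
static size_t build_string_block(const char *strings[16], uint8_t *out) {
    size_t n = 0;
    for (int i = 0; i < 16; i++) {
        size_t len = strings[i] ? strlen(strings[i]) : 0;
        out[n++] = (uint8_t)(len & 0xFF);      /* length, low byte */
        out[n++] = (uint8_t)(len >> 8);        /* length, high byte */
        for (size_t j = 0; j < len; j++) {
            out[n++] = (uint8_t)strings[i][j]; /* ASCII -> UTF-16LE low byte */
            out[n++] = 0x00;                   /* high byte is zero for ASCII */
        }
    }
    return n;
}
```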

Internally, STRINGTABLE resources get compiled as the integer resource type RT_STRING, which is 6. The ID of the resource is based on the grouping, so strings with IDs 0-15 go into an RT_STRING resource with ID 1, 16-31 go into a resource with ID 2, etc.
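The mapping from a string ID to its RT_STRING resource ID and its slot within the group is simple integer arithmetic. A C sketch (both function names are hypothetical):

```c
#include <stdint.h>

/* RT_STRING resource ID that holds string `id` (IDs 0-15 -> 1, 16-31 -> 2, ...). */
static uint16_t string_block_id(uint16_t id) {
    return (uint16_t)(id / 16 + 1);
}

/* Slot (0-15) of string `id` within its 16-entry group. */
static uint16_t string_block_index(uint16_t id) {
    return (uint16_t)(id % 16);
}
```

For the earlier example, "Hello" (ID 0) and "Goodbye" (ID 1) land in resource 1, while "Hm" (ID 23) lands in slot 7 of resource 2.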

The above is all well and good, but what happens if you manually define a resource with the RT_STRING type of 6? The Windows RC compiler has no qualms with that at all, and compiles it just like a user-defined resource, so the data of the resource below will be 3 bytes long, containing foo:

1 6 {
  "foo"
}

In the compiled resource, though, the resource type and ID are indistinguishable from a properly defined STRINGTABLE. This means that compiling the above resource and then trying to use LoadString will succeed, even though the resource’s data does not conform at all to the intended layout of an RT_STRING resource:

#include <windows.h>
#include <stdio.h>

int main(void) {
    UINT string_id = 0;
    WCHAR buf[1024];
    // Returns the number of characters copied into buf (not counting the NUL), or 0 on failure
    int len = LoadStringW(NULL, string_id, buf, 1024);
    if (len != 0) {
        printf("len: %d\n", len);
        wprintf(L"%s\n", buf);
    }
    return 0;
}

That code will output:

len: 1023
o

Let’s think about what’s going on here. We compiled a resource with three bytes of data: foo. We have no real control over what follows that data in the compiled binary, so we can think about how this resource is interpreted by LoadString like this:

66 6F 6F ?? ?? ?? ?? ??  foo?????
?? ?? ?? ?? ?? ?? ?? ??  ????????
          ...               ...  

The first two bytes, 66 6F, are treated as a little-endian u16 containing the length of the string that follows it. 66 6F as a little-endian u16 is 0x6F66, or 28518, so LoadString thinks that the string with ID 0 is 28 thousand UTF-16 code units long. All of the ?? bytes are those that happen to follow the resource data; they could in theory be anything. So, LoadString will erroneously try to read this gargantuan string into buf, but since we only provided a buffer of 1024, it only fills up to that size and stops.

In the actual compiled binary of my test program, the bytes following foo happen to look like this:

66 6F 6F 00 00 00 00 00  foo.....
3C 3F 78 6D 6C 20 76 65  <?xml ve
          ...               ...  

This means that the last o in foo happens to be followed by 00, and 6F 00 is interpreted as a UTF-16 o character, and that happens to be followed by 00 00 which is treated as a NUL terminator by wprintf. This explains the o we got earlier from wprintf(L"%s\n", buf);. However, if we print the full 1023 WCHARs of buf like so:

for (int i = 0; i < len; i++) {
    // View each WCHAR as its two raw bytes; unsigned so bytes >= 0x80 print as two hex digits
    const unsigned char* bytes = (const unsigned char*)&buf[i];
    printf("%d: %02X %02X\n", i, bytes[0], bytes[1]);
}

Then it shows more clearly that LoadString did indeed read past our resource data and started loading bytes from totally unrelated areas of the compiled binary (note that these bytes match the hexdump above):

0: 6F 00
1: 00 00
2: 00 00
3: 3C 3F
4: 78 6D
5: 6C 20
6: 76 65
...

If we then change our program to try to load a string with an ID of 1, then the LoadStringW call will crash within RtlLoadString (and it would do the same for any ID from 1-15):

Exception thrown at 0x00007FFA63623C88 (ntdll.dll) in stringtabletest.exe: 0xC0000005: Access violation reading location 0x00007FF7A80A2F6E.

  ntdll.dll!RtlLoadString()
  KernelBase.dll!LoadStringBaseExW()
  user32.dll!LoadStringW()
> stringtabletest.exe!main(...)

This is because, in order to load a string with ID 1, the bytes of the string with ID 0 need to be skipped past. That is, LoadString will determine that the string with ID 0 has a length of 28 thousand, and then try to skip ahead in the file about 57 thousand bytes (28518 × 2 = 57036, since the length is in UTF-16 code units), which in our case is well past the end of the file.
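For contrast, here is a hypothetical bounds-checked version of the walk described above, as a C sketch. find_string is an illustrative name and this is not how LoadString/RtlLoadString is implemented; it simply refuses any length prefix that points past the end of the resource data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked lookup in one RT_STRING group. On success,
 * sets *offset to the byte offset of string `index`'s UTF-16 data and *len
 * to its length in code units, and returns 0. Returns -1 if the index is
 * out of range or any length prefix points past the end of the data. */
static int find_string(const uint8_t *data, size_t size, unsigned index,
                       size_t *offset, size_t *len) {
    if (index > 15) return -1;                      /* a group only has 16 slots */
    size_t pos = 0;
    for (unsigned i = 0; i <= index; i++) {
        if (size - pos < 2) return -1;              /* no room for a length prefix */
        size_t n = (size_t)data[pos] | ((size_t)data[pos + 1] << 8); /* little-endian u16 */
        pos += 2;
        if (n > (size - pos) / 2) return -1;        /* claimed string runs past the end */
        if (i == index) {
            *offset = pos;
            *len = n;
            return 0;
        }
        pos += n * 2;                               /* skip this string's UTF-16 data */
    }
    return -1;                                      /* not reached */
}
```

With the 3-byte foo resource, the very first length prefix (28518 code units, or 57036 bytes) already exceeds the data, so every index is rejected instead of being read out of bounds.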

resinator’s behavior

test.rc:1:3: error: the number 6 (RT_STRING) cannot be used as a resource type
1 6 {
  ^
test.rc:1:3: note: using RT_STRING directly likely results in an invalid .res file, use a STRINGTABLE instead
parser bug/quirk, utterly baffling

CONTROL: “I’m just going to pretend I didn’t see that”

Within DIALOG/DIALOGEX resources, there are predefined controls like PUSHBUTTON, CHECKBOX, etc, which are actually just syntactic sugar for generic CONTROL statements with particular default values for the “class name” and “style” parameters.

For example, these two statements are equivalent:

CHECKBOX, "foo", 1, 2, 3, 4, 5                                   // text, id, x, y, w, h
CONTROL, "foo", 1, BUTTON, BS_CHECKBOX | WS_TABSTOP, 2, 3, 4, 5  // text, id, class name, style, x, y, w, h

There is something bizarre about the “style” parameter of a generic control statement, though. For whatever reason, it allows an extra token within it and will act as if it doesn’t exist.

CONTROL, "text", 1, BUTTON, BS_CHECKBOX | WS_TABSTOP "why is this allowed", 2, 3, 4, 5

The "why is this allowed" string is completely ignored, and this CONTROL will be compiled exactly the same as the previous CONTROL statement shown above.

The extra token can be many things (string, number, =, etc), but not anything. For example, if the extra token is ;, then it will error with expected numerical dialog constant.

CONTROL: “Okay, I see that expression, but I don’t understand it”

Instead of a single extra token in the style parameter of a CONTROL, it’s also possible to sneak an extra number expression in there like so:

CONTROL, "text", 1, BUTTON, BS_CHECKBOX | WS_TABSTOP (7+8), 2, 3, 4, 5

In this case, the Windows RC compiler no longer ignores the expression, but still behaves strangely. Instead of the entire (7+8) expression being treated as the x parameter like one might expect, in this case only the 8 in the expression is treated as the x parameter, so it ends up interpreted like this:

CONTROL, "text", 1, BUTTON, BS_CHECKBOX | WS_TABSTOP (7+8), 2, 3, 4, 5  // style: BS_CHECKBOX | WS_TABSTOP, x: the 8 in (7+8), y: 2, w: 3, h: 4, exstyle: 5

My guess is that the similarity between this number-expression-related behavior and Number expressions as filenames is not a coincidence, but beyond that I couldn’t tell you what’s going on here.

resinator’s behavior

Such extra tokens/expressions are never ignored by resinator; they are always treated as the x parameter, and a warning is emitted if there is no comma between the style and x parameters.

test.rc:4:57: warning: this token could be erroneously skipped over by the Win32 RC compiler
  CONTROL, "text", 1, BUTTON, 0x00000002L | 0x00010000L "why is this allowed", 2, 3, 4, 5
                                                        ^~~~~~~~~~~~~~~~~~~~~
test.rc:4:57: note: this line originated from line 4 of file 'test.rc'
  CONTROL, "text", 1, BUTTON, BS_CHECKBOX | WS_TABSTOP "why is this allowed", 2, 3, 4, 5

test.rc:4:31: note: to avoid the potential miscompilation, consider adding a comma after the style parameter
  CONTROL, "text", 1, BUTTON, 0x00000002L | 0x00010000L "why is this allowed", 2, 3, 4, 5
                              ^~~~~~~~~~~~~~~~~~~~~~~~~
test.rc:4:57: error: expected number or number expression; got '"why is this allowed"'
  CONTROL, "text", 1, BUTTON, 0x00000002L | 0x00010000L "why is this allowed", 2, 3, 4, 5
                                                        ^~~~~~~~~~~~~~~~~~~~~
miscompilation

That’s odd, I thought you needed more padding

In DIALOGEX resources, a control statement is documented to have the following syntax:

control [[text,]] id, x, y, width, height[[, style[[, extended-style]]]][, helpId]
[{ data-element-1 [, data-element-2 [,  . . . ]]}]

For now, we can ignore everything except the [{ data-element-1 [, data-element-2 [, . . . ]]}] part, which is documented like so:

controlData

Control-specific data for the control. When a dialog is created, and a control in that dialog which has control-specific data is created, a pointer to that data is passed into the control’s window procedure through the lParam of the WM_CREATE message for that control.

Here’s an example, where the string "foo" is the control data:

1 DIALOGEX 0, 0, 282, 239 {
  PUSHBUTTON "Cancel",1,129,212,50,14 { "foo" }
}

After a very long time of having no idea how to retrieve this data from a Win32 program, I finally figured it out while writing this article. As far as I know, the WM_CREATE event can only be received for custom controls or by superclassing a predefined control.

So, let’s say in our program we register a class named CustomControl. We can then use it in a DIALOGEX resource like this:

1 DIALOGEX 0, 0, 282, 239 {
  CONTROL "text", 901, "CustomControl", 0, 129,212,50,14 { "foo" }
}

The control data ("foo") will get compiled as 03 00 66 6F 6F, where 03 00 is the length of the control data in bytes (3 as a little-endian u16) and 66 6F 6F are the bytes of foo.

If we load this dialog, then our custom control’s WNDPROC callback will get a WM_CREATE event where the LPARAM parameter is a pointer to a CREATESTRUCT and ((CREATESTRUCT*)lParam)->lpCreateParams will be a pointer to the control data (if any exists). So, in our case, the lpCreateParams pointer points to memory that looks the same as the bytes shown above: a u16 length first, and the specified number of bytes following it. If we handle the event like this:


    case WM_CREATE:
      if (lParam) {
        CREATESTRUCT* create_params = (CREATESTRUCT*)lParam;
        // For dialog controls, lpCreateParams points at the control data: a u16 length, then the bytes
        const BYTE* data = (const BYTE*)create_params->lpCreateParams;
        if (data) {
          WORD len = *((const WORD*)data);
          printf("control data len: %d\n", len);
          for (WORD i = 0; i < len; i++) {
              printf("%02X ", data[2 + i]);
          }
          printf("\n");
        }
      }
      break;

then we get this output (with some additional printing of the callback parameters):

CustomProc hwnd: 00000000022C0A8A msg: WM_CREATE wParam: 0000000000000000 lParam: 000000D7624FE730
control data len: 3
66 6F 6F

Nice! Now let’s try to add a second CONTROL:

1 DIALOGEX 0, 0, 282, 239 {
  CONTROL "text", 901, "CustomControl", 0, 129,212,50,14 { "foo" }
  CONTROL "text", 902, "CustomControl", 0, 189,212,50,14 { "bar" }
}

With this, the CreateDialogParamW call starts failing with:

Cannot find window class.

Why would that be? Well, it turns out that the Windows RC compiler miscompiles the padding bytes following a control if its control data has an odd number of bytes. This is similar to what’s described in Your fate will be determined by a comma, but in the opposite direction: instead of adding too few padding bytes, the Windows RC compiler in this case will add too many.

Each control within a dialog resource is expected to be 4-byte aligned (meaning its memory starts at an offset that is a multiple of 4). So, if the bytes at the end of one control look like this, where each | marks a 4-byte boundary:

  |....|....|foo |

then we only need one byte of padding (P) after foo to ensure that the next control (N) is 4-byte aligned:

  |....|....|fooP|NNNN|

However, the Windows RC compiler erroneously inserts two additional padding bytes in this case, meaning the control afterwards is misaligned by two bytes:

  |....|....|fooP|PPNN|NN

This causes every field of the misaligned control to be misread, leading to a malformed dialog that can’t be loaded. As mentioned, this is only the case with odd control data byte counts; if we add or remove a byte from the control data, then this miscompilation does not happen and the correct amount of padding is written. Here’s what it looks like if "foo" is changed to "fo":

  |....|....|foPP|NNNN|
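The alignment arithmetic, plus a model of the observed rc.exe behavior, as a C sketch. The rc_exe_padding model is an assumption derived only from the description above (two extra bytes whenever the preceding control data length is odd) and has not been checked against rc.exe for every offset; both function names are hypothetical.

```c
#include <stddef.h>

/* Bytes of padding needed to round `offset` up to the next multiple of 4. */
static size_t correct_padding(size_t offset) {
    return (4 - (offset % 4)) % 4;
}

/* Model of the observed rc.exe behavior (assumption): when the preceding
 * control's data has an odd number of bytes, two extra padding bytes are
 * inserted on top of the correct amount. */
static size_t rc_exe_padding(size_t offset, size_t control_data_len) {
    size_t pad = correct_padding(offset);
    if (control_data_len % 2 != 0) pad += 2;
    return pad;
}
```

For example, if foo ends at offset 11, the correct padding is 1 byte, but the model yields 3, putting the next control at offset 14, which is two bytes past a 4-byte boundary.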

This is a miscompilation that seems very easy to accidentally hit, but it has gone unnoticed/unfixed for so long presumably because this ‘control data’ syntax is very rarely used. For example, there’s not a single usage of this feature anywhere within Windows-classic-samples.

resinator’s behavior

resinator will avoid the miscompilation and will emit a warning when it detects that the Windows RC compiler would miscompile:

test.rc:3:3: warning: the padding before this control would be miscompiled by the Win32 RC compiler (it would insert 2 extra bytes of padding)
  CONTROL "text", 902, "CustomControl", 1, 189,212,50,14,2,3 { "bar" }
  ^~~~~~~
test.rc:3:3: note: to avoid the potential miscompilation, consider adding one more byte to the control data of the control preceding this one
miscompilation, utterly baffling

CONTROL class specified as a number

A generic CONTROL within a DIALOG/DIALOGEX resource is specified like this:

CONTROL, "foo", 1, BUTTON, 1, 2, 3, 4, 5  // text, id, class name, style, x, y, w, h

The class name can be a string literal ("CustomControlClass") or one of BUTTON, EDIT, STATIC, LISTBOX, SCROLLBAR, or COMBOBOX. Internally, those unquoted literals are just predefined values that compile down to numeric integers:

BUTTON    ──► 0x80
EDIT      ──► 0x81
STATIC    ──► 0x82
LISTBOX   ──► 0x83
SCROLLBAR ──► 0x84
COMBOBOX  ──► 0x85

There’s plenty of precedent within the Windows RC compiler that you can swap out a predefined type for its underlying integer and get the same result, and indeed the Windows RC compiler does not complain if you try to do so in this case:

CONTROL, "foo", 1, 0x80, 1, 2, 3, 4, 5

Before we look at what happens, though, we need to understand how values that can be either a string or a number get compiled. For such values, if it is a string, it is always compiled as NUL-terminated UTF-16:

66 00 6F 00 6F 00 00 00  f.o.o...

If such a value is a number, then it’s compiled as a pair of u16 values: 0xFFFF and then the actual number value following that, where the 0xFFFF acts as an indicator that the ambiguous string/number value is a number. So, if the number is 0x80, it would get compiled into:

FF FF 80 00  ....

The above (FF FF 80 00) is what BUTTON gets compiled into, since BUTTON gets translated to the integer 0x80 under the hood. However, getting back to this example:

CONTROL, "foo", 1, 0x80, 1, 2, 3, 4, 5

We would expect the 0x80 to also get compiled into FF FF 80 00, but instead the Windows RC compiler compiles it into:

80 FF 00 00

As far as I can tell, the behavior here is to:

  • Truncate the value to a u8
  • If the truncated value is >= 0x80, add 0xFF00 and write the result as a little-endian u32
  • If the truncated value is < 0x80 but not zero, write the value as a little-endian u32
  • If the truncated value is zero, write zero as a u16

Some examples:

 0x00 ──► 00 00
 0x01 ──► 01 00 00 00
 0x7F ──► 7F 00 00 00
 0x80 ──► 80 FF 00 00
 0xFF ──► FF FF 00 00
0x100 ──► 00 00
0x101 ──► 01 00 00 00
0x17F ──► 7F 00 00 00
0x180 ──► 80 FF 00 00
0x1FF ──► FF FF 00 00
      etc
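Both encodings as a C sketch: write_ordinal is the normal FF FF <u16> form, and write_numeric_class_quirk models the rc.exe output by following the four rules listed above (matching the 80 FF 00 00 shown earlier for 0x80). The function names are hypothetical, and the model is only as good as those observed rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Normal encoding of a numeric name/ordinal: 0xFFFF, then the value as a u16 (little-endian). */
static size_t write_ordinal(uint16_t value, uint8_t out[4]) {
    out[0] = 0xFF;
    out[1] = 0xFF;
    out[2] = (uint8_t)(value & 0xFF);
    out[3] = (uint8_t)(value >> 8);
    return 4;
}

/* Model of what rc.exe was observed to write for a CONTROL class given as a
 * number. Returns the number of bytes written (2 or 4). */
static size_t write_numeric_class_quirk(uint32_t value, uint8_t out[4]) {
    uint32_t v = value & 0xFF;              /* truncate to a u8 */
    if (v == 0) {                           /* zero: written as a single u16 */
        out[0] = 0x00;
        out[1] = 0x00;
        return 2;
    }
    if (v >= 0x80) v += 0xFF00;             /* e.g. 0x80 becomes 0xFF80 */
    for (int i = 0; i < 4; i++)             /* little-endian u32 */
        out[i] = (uint8_t)(v >> (8 * i));
    return 4;
}
```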

I only have the faintest idea of what could be going on here. My guess is that this is some sort of half-baked leftover behavior from the 16-bit resource compiler that never got properly modernized in the transition to the 32-bit compiler, since in the 16-bit version of rc.exe, numbers were compiled as FF <number as u8> instead of FF FF <number as u16>. However, the results we see don’t fully match what we’d expect if that were the case: instead of FF 80, we get 80 FF, so I don’t think this explanation holds up.

resinator’s behavior

resinator will avoid the miscompilation and will emit a warning:

test.rc:2:22: warning: the control class of this CONTROL would be miscompiled by the Win32 RC compiler
  CONTROL, "foo", 1, 0x80, 1, 2, 3, 4, 5
                     ^~~~
test.rc:2:22: note: to avoid the potential miscompilation, consider specifying the control class using a string (BUTTON, EDIT, etc) instead of a number
compiler bug/quirk

CONTROL class specified as a string literal

I said in CONTROL class specified as a number that the class name can be specified as a particular set of unquoted identifiers (BUTTON, EDIT, STATIC, etc). I left out that it’s also possible to specify them as quoted string literals; these are equivalent to the unquoted BUTTON class name:

CONTROL, "foo", 1, "BUTTON", 1, 2, 3, 4, 5
CONTROL, "foo", 1, L"BUTTON", 1, 2, 3, 4, 5

Additionally, this equivalence is determined after parsing, so these are also equivalent, since \x42 parses to the ASCII character B:

CONTROL, "foo", 1, "\x42UTTON", 1, 2, 3, 4, 5
CONTROL, "foo", 1, L"\x42UTTON", 1, 2, 3, 4, 5

All of the above examples get treated the same as the unquoted literal BUTTON, which gets compiled to FF FF 80 00 as mentioned in the previous section.

A string masquerading as a number

For class name strings that do not parse into one of the predefined classes (BUTTON, EDIT, STATIC, etc), the class name typically gets written as NUL-terminated UTF-16. For example:

61 00 62 00 63 00 00 00   a.b.c...

However, if you use an L-prefixed string that starts with a \xFFFF escape, then the value is written as if it were a number (i.e. the value is always 32 bits long and has the format FF FF <number as u16>). Here’s an example:

All but the first z drop out, as seemingly the first character value after the \xFFFF escape is written as a u16. Here’s another example using a 4-digit hex escape after the \xFFFF:

So, with this bug/quirk, this:

which is indistinguishable from the compiled form of the class name specified as either an unquoted literal (BUTTON) or quoted string ("BUTTON"). I want to say that this edge case is so specific that it has to have been intentional, but I’m not positive I can rule out the idea that some very strange confluence of quirks is coming together to produce this behavior unintentionally.
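A C sketch modeling the observed behavior. write_class_name is hypothetical; it skips the predefined-class matching for BUTTON, EDIT, etc., and what happens when nothing follows the 0xFFFF is not covered by the observations above, so writing 0 in that case is purely an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the observed class-name encoding (not rc.exe's code). `units` are
 * the parsed UTF-16 code units of the string literal. If the first unit is
 * 0xFFFF, only FF FF plus the next unit (as a u16) is written and the rest is
 * dropped; otherwise the string is written as NUL-terminated UTF-16LE.
 * Returns the number of bytes written to `out`. */
static size_t write_class_name(const uint16_t *units, size_t len, uint8_t *out) {
    size_t n = 0;
    if (len > 0 && units[0] == 0xFFFF) {
        uint16_t value = len > 1 ? units[1] : 0;   /* assumption when nothing follows */
        out[n++] = 0xFF;
        out[n++] = 0xFF;
        out[n++] = (uint8_t)(value & 0xFF);
        out[n++] = (uint8_t)(value >> 8);
        return n;
    }
    for (size_t i = 0; i < len; i++) {             /* UTF-16LE code units */
        out[n++] = (uint8_t)(units[i] & 0xFF);
        out[n++] = (uint8_t)(units[i] >> 8);
    }
    out[n++] = 0x00;                               /* u16 NUL terminator */
    out[n++] = 0x00;
    return n;
}
```

Under this model, a string whose code units are 0xFFFF followed by 0x0080 produces FF FF 80 00, the same bytes as BUTTON.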

resinator’s behavior

resinator matches the behavior of the Windows RC compiler for the "BUTTON"/"\x42UTTON" examples, but the L"\xFFFF..." edge case has not been decided on yet.

missing error, miscompilation

Cursor posing as an icon and vice versa

The ICON and CURSOR resource types expect a .ico file and a .cur file, respectively. The format of .ico and .cur is identical, but there is an ‘image type’ field that denotes the type of the file (1 for icon, 2 for cursor).

The Windows RC compiler does not discriminate on what type is used for which resource. If we have foo.ico with the ‘icon’ type, and foo.cur with the ‘cursor’ type, then the Windows RC compiler will happily accept all of the following resources:

1 ICON "foo.ico"
2 ICON "foo.cur"
3 CURSOR "foo.ico"
4 CURSOR "foo.cur"

However, the resources with the mismatched types become a problem in the resulting .res file because ICON and CURSOR have different formats for their resource data. When the type is ‘cursor’, a LOCALHEADER consisting of two cursor-specific u16 fields is written at the start of the resource data. This means that:

  • An ICON resource with a .cur file will write those extra cursor-specific fields, but still identify itself as an ICON resource
  • A CURSOR resource with an .ico file will not write those cursor-specific fields, but still identify itself as a CURSOR resource
  • In both of these cases, trying to load the resource will always end up with an incorrect/invalid result because the parser will be assuming that those fields exist/don’t exist based on the resource type

So, such a mismatch always leads to incorrect/invalid resources in the .res file.

resinator’s behavior

resinator errors if the resource type (ICON/CURSOR) doesn’t match the type specified in the .ico/.cur file:

test.rc:1:10: error: resource type 'cursor' does not match type 'icon' specified in the file
1 CURSOR "foo.ico"
         ^~~~~~~~~
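For reference, the ‘image type’ field lives in the 6-byte header at the start of every .ico/.cur file: a reserved u16 that must be 0, the type u16, and an image count u16. A C sketch of reading it and comparing it against the resource type, in the spirit of the resinator check above; the function names are hypothetical and this is not resinator’s implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the 6-byte header shared by .ico and .cur files: u16 reserved (0),
 * u16 image type (1 = icon, 2 = cursor), u16 image count (all little-endian).
 * Returns the image type, or -1 if the header is too short or malformed. */
static int icondir_type(const uint8_t *data, size_t size) {
    if (size < 6) return -1;
    uint16_t reserved = (uint16_t)(data[0] | (data[1] << 8));
    uint16_t type = (uint16_t)(data[2] | (data[3] << 8));
    if (reserved != 0 || (type != 1 && type != 2)) return -1;
    return type;
}

/* Hypothetical check: the resource type used in the .rc file
 * (1 = ICON, 2 = CURSOR) must match the type stored in the file. */
static int resource_type_matches(int rc_type, const uint8_t *data, size_t size) {
    return icondir_type(data, size) == rc_type;
}
```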
unnecessary limitation

PNG encoded cursors are erroneously rejected

.ico/.cur files are a ‘directory’ of multiple icons/cursors, used for different resolutions. Historically, each image was a device-independent bitmap (DIB), but nowadays they can also be encoded as PNG.

The Windows RC compiler is fine with .ico files that have PNG encoded images, but for whatever reason rejects .cur files with PNG encoded images.


1 ICON "png.ico"

2 CURSOR "png.cur"

This limitation is provably artificial, though. If a .res file contains a CURSOR resource with PNG encoded image(s), then LoadCursor works correctly and the cursor displays correctly.
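Telling the two image encodings apart only requires checking for the fixed 8-byte PNG signature at the start of the image data. A C sketch (image_is_png is a hypothetical name; how rc.exe itself detects PNG images is not known here):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An image inside a .ico/.cur is either a DIB (starting with a
 * BITMAPINFOHEADER) or a PNG, which always starts with this 8-byte signature. */
static bool image_is_png(const uint8_t *image, size_t size) {
    static const uint8_t png_signature[8] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    return size >= sizeof png_signature &&
           memcmp(image, png_signature, sizeof png_signature) == 0;
}
```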

resinator’s behavior

resinator allows PNG encoded cursor images, and warns about the Windows RC compiler behavior:

test.rc:2:10: warning: the resource at index 0 of this cursor has the format 'png'; this would be an error in the Win32 RC compiler
2 CURSOR png.cur
         ^~~~~~~
miscompilation, utterly baffling

Adversarial icons/cursors can lead to arbitrarily large .res files

Each image in a .ico/.cur file has a corresponding header entry which contains (a) the size of the image in bytes, and (b) the offset of the image’s data within the file. The Windows RC compiler fully trusts that this information is accurate; it will never error regardless of how malformed these two pieces of information are.

If the reported size of an image is larger than the size of the .ico/.cur file itself, the Windows RC compiler will:

  • Write however many bytes there are before the end of the file
  • Write zeroes for any bytes that are past the end of the file, except
  • Once it has written 0x4000 bytes total, it will repeat these steps again and again until it reaches the full reported size

Because a .ico/.cur can contain up to 65535 images, and each image within can report its size as up to 2 GiB (more on this in the next bug/quirk), this means that a small (< 1 MiB) maliciously crafted .ico/.cur could cause the Windows RC compiler to try to write up to 127 TiB of data to the .res file.
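The 127 TiB figure follows directly from those two limits. A C sketch of the arithmetic (max_claimed_bytes is a hypothetical name; it takes 2 GiB - 1, the largest positive signed 32-bit value, as the per-image maximum):

```c
#include <stdint.h>

/* Upper bound on the data a malicious .ico/.cur can claim: up to 65535
 * images, each reporting a size of up to 2 GiB - 1 (INT32_MAX bytes). */
static uint64_t max_claimed_bytes(void) {
    const uint64_t max_images = 65535;
    const uint64_t max_image_size = INT32_MAX;  /* 0x7FFFFFFF */
    return max_images * max_image_size;
}
```

That is 140,735,340,806,145 bytes, just under 128 TiB.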

resinator’s behavior

resinator errors if the reported file size of an image is larger than the size of the .ico/.cur file:

test.rc:1:8: error: unable to read icon file 'test.ico': ImpossibleDataSize
1 ICON test.ico
       ^~~~~~~~
miscompilation, utterly baffling

Adversarial icons/cursors can lead to infinitely large .res files

As mentioned in Adversarial icons/cursors can lead to arbitrarily large .res files, each image within an icon/cursor can report its size as up to 2 GiB. However, the field for the image size is actually 4 bytes wide, meaning the maximum should technically be 4 GiB.

The 2 GiB limit comes from the fact that the Windows RC compiler actually interprets this field as a signed integer, so if you try to define an image with a size larger than 2 GiB, it’ll get interpreted as negative. We can somewhat confirm this by compiling with the verbose flag (/v):

Writing ICON:1, lang:0x409, size -6000000
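For reference, -6000000 is what a field value of 0xFFA47280 (4,288,967,296, just under 4 GiB) looks like when read as a signed 32-bit integer. A C sketch of that reinterpretation (size_as_signed is a hypothetical name):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 4-byte image size field as a signed 32-bit integer.
 * memcpy keeps this well-defined: int32_t is required to be two's complement. */
static int32_t size_as_signed(uint32_t field) {
    int32_t result;
    memcpy(&result, &field, sizeof result);
    return result;
}
```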

When this happens, the Windows RC compiler seemingly enters an infinite loop when writing the icon data to the .res file, meaning it will keep trying to write garbage until (presumably) all the space of the hard drive has been used up.

resinator’s behavior

resinator avoids misinterpreting the image size as signed, and allows images of up to 4 GiB to be specified if the .ico/.cur file actually is large enough to contain them.

miscompilation

Icon/cursor images with impossibly small sizes lead to bogus .res files

Similar to Adversarial icons/cursors can lead to arbitrarily large .res files, it’s also possible for images to declare their size as impossibly small:

  • If the size of an image is reported as zero, then the Windows RC compiler will:
    • Write an arbitrary size for the resource’s data
    • Not actually write any bytes to the data section of the resource
  • If the size of an image is smaller than the header of the image format, then the Windows RC compiler will:
    • Read the full header for the image, even if it goes past the reported end of the image data
    • Write the reported number of bytes to the .res file, which can never be a valid image since it is smaller than the header size of the image format

resinator’s behavior

resinator errors if the reported size of an image within a .ico/.cur is too small to contain a valid image header:

test.rc:1:8: error: unable to read icon file 'test.ico': ImpossibleDataSize
1 ICON test.ico
       ^~~~~~~~
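Taken together with the too-large case from the first bug/quirk in this section, the two checks can be sketched as below. This is not resinator's code: the function name is hypothetical, and the minimum header sizes are assumptions for illustration (the 40-byte BITMAPINFOHEADER for DIB images, the 8-byte PNG signature for PNG images).

```python
class ImpossibleDataSize(Exception):
    """Named after the error shown above; raised for sizes that cannot be right."""

# Assumed minimum header sizes for the two image formats an icon entry may hold.
BITMAPINFOHEADER_SIZE = 40
PNG_SIGNATURE_SIZE = 8

def check_image_size(reported_size: int, file_size: int, is_png: bool) -> None:
    """Hypothetical sanity checks on the size an icon/cursor entry reports."""
    if reported_size > file_size:
        # Too large: the entry claims more bytes than the whole file contains.
        raise ImpossibleDataSize(f"reported {reported_size} bytes in a {file_size}-byte file")
    min_header = PNG_SIGNATURE_SIZE if is_png else BITMAPINFOHEADER_SIZE
    if reported_size < min_header:
        # Too small: the entry cannot even hold the header of its own format.
        raise ImpossibleDataSize(f"reported {reported_size} bytes, header needs {min_header}")
```

For instance, check_image_size(0, 1000, is_png=False) and check_image_size(5000, 1000, is_png=False) both raise, while check_image_size(100, 1000, is_png=False) passes.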
miscompilation

Bitmaps with missing bytes in their color table

BITMAP resources expect .bmp files, which are roughly structured something like this:

..BITMAPFILEHEADER..
..BITMAPINFOHEADER..
....................
....color table.....
....................
....pixel data......
....................
....................

The color table has a variable number of entries, specified either by the biClrUsed field of the BITMAPINFOHEADER or, if biClrUsed is zero, by 2^n, where n is the number of bits per pixel (biBitCount). When the number of bits per pixel is 8 or fewer, this color table is used as a color palette for the pixels in the image:

Example color table (above; columns: color index, color rgb, color) and some pixel data that references the color table (below)
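The entry-count rule above can be written as a small function. This is a sketch: the function names are hypothetical, the 4-byte entry size corresponds to an RGBQUAD (3 bytes for the old Windows 2.0 format mentioned later), and bitmaps with more than 8 bits per pixel and a biClrUsed of zero are assumed to have no color table.

```python
def expected_color_table_entries(bi_clr_used: int, bi_bit_count: int) -> int:
    """Number of color table entries implied by a BITMAPINFOHEADER."""
    if bi_clr_used != 0:
        return bi_clr_used          # an explicit count always wins
    if 1 <= bi_bit_count <= 8:
        return 1 << bi_bit_count    # 2^n entries for n bits per pixel
    return 0                        # assumption: no palette for > 8 bpp

def expected_color_table_bytes(bi_clr_used: int, bi_bit_count: int, entry_size: int = 4) -> int:
    """Bytes of color table implied by the header (4-byte RGBQUAD entries by default)."""
    return expected_color_table_entries(bi_clr_used, bi_bit_count) * entry_size

if __name__ == "__main__":
    print(expected_color_table_bytes(0, 8))            # 1024: 256 entries * 4 bytes
    print(expected_color_table_bytes(13, 8))           # 52: biClrUsed overrides 2^8
    print(expected_color_table_bytes(0xFFFFFFFF, 4))   # 17179869180: the u32 worst case
```

The last call hints at the problem discussed below: nothing in the header itself caps biClrUsed at 2^biBitCount.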

This is relevant because the Windows resource compiler does not just write the bitmap data to the .res verbatim. Instead, it strips the BITMAPFILEHEADER and will always write the expected number of color table bytes, even if the number of color table bytes in the file doesn’t match expectations.

Bitmap file:

..BITMAPFILEHEADER..
..BITMAPINFOHEADER..
....................
....pixel data......
....................
....................

.res data:

..BITMAPINFOHEADER..
....................
....color table.....
....................
....pixel data......
....................
....................

A bitmap file that omits the color table even though a color table is expected, and the data written to the .res for that bitmap

Typically, a bitmap with a shorter-than-expected color table is considered invalid (or, at least, Windows and Firefox fail to render them), but the Windows RC compiler does not error on such files. Instead, it will completely ignore the bounds of the color table and just read into the following pixel data if necessary, treating it as color data.

Bitmap file:

..BITMAPFILEHEADER..
..BITMAPINFOHEADER..
....................
....pixel data......
....................
....................

.res data:

..BITMAPINFOHEADER..
....................
..."color table"....
....................
....pixel data......
....................
....................

When compiled with the Windows RC compiler, the bytes of the "color table" in the .res will consist of the leading bytes of the pixel data from the original bitmap file.

Further, if it runs out of pixel data to read (i.e. the inferred size of the color table extends beyond the end of the file), it will start filling in the remaining missing color table bytes with zeroes.
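The reading behavior described above can be modeled as a sequential read that is padded with zeroes at end of file. This is a simplified model rather than rc.exe's implementation: it assumes the color table starts immediately after the BITMAPINFOHEADER (whose size is its first u32, biSize) and ignores BITFIELDS masks; the function name is hypothetical.

```python
import struct

BITMAPFILEHEADER_SIZE = 14  # bfType, bfSize, bfReserved1, bfReserved2, bfOffBits

def rc_style_color_table(bmp: bytes, expected_table_bytes: int) -> bytes:
    """Return the color table bytes as the behavior above would produce them."""
    (bi_size,) = struct.unpack_from("<I", bmp, BITMAPFILEHEADER_SIZE)
    start = BITMAPFILEHEADER_SIZE + bi_size
    # Read straight through: if the real table is short, this runs into the pixel data.
    table = bmp[start:start + expected_table_bytes]
    # Past the end of the file, the remaining bytes are filled with zeroes.
    return table + b"\x00" * (expected_table_bytes - len(table))

if __name__ == "__main__":
    file_header = b"BM" + bytes(12)
    info_header = struct.pack("<I", 40) + bytes(36)  # only biSize is filled in
    pixels = b"\xAA" * 6                             # no color table at all
    table = rc_style_color_table(file_header + info_header + pixels, 16)
    print(table.hex(" "))  # aa aa aa aa aa aa 00 00 00 00 00 00 00 00 00 00
```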

From invalid to valid

Interestingly, the behavior with regard to smaller-than-expected color tables means that an invalid bitmap compiled as a resource can end up becoming a valid bitmap. For example, if you have a bitmap with 12 actual entries in the color table, but BITMAPINFOHEADER.biClrUsed says there are 13, Windows considers that an invalid bitmap and won’t render it. If you take that bitmap and compile it as a resource, though:

1 BITMAP "invalid.bmp"

The resulting .res will pad the color table of the bitmap to get up to the expected number of entries (13 in this case), and therefore the resulting resource will render fine when using LoadBitmap to load it.

Maliciously crafted bitmaps

The dark side of this bug/quirk is that the Windows RC compiler does not have any limit as to how many missing color palette bytes it allows, and this is even the case when there are possible hard limits available (e.g. a bitmap with 4 bits per pixel can only have 2^4 (16) colors, but the Windows RC compiler doesn’t mind if a bitmap says it has more than that).

The biClrUsed field (which holds the number of color table entries) is a u32, meaning a bitmap can declare that it contains up to 4.29 billion entries in its color table, where each color entry is 4 bytes long (or 3 bytes for ancient Windows 2.0 bitmaps). This means that a maliciously crafted bitmap can cause the Windows RC compiler to write up to 16 GiB of color table data when writing its resource, even if the file itself doesn’t contain any color table at all.

resinator’s behavior

resinator errors if there are any missing palette bytes:

test.rc:1:10: error: bitmap has 16 missing color palette bytes
1 BITMAP missing_palette_bytes.bmp
         ^~~~~~~~~~~~~~~~~~~~~~~~~
test.rc:1:10: note: the Win32 RC compiler would erroneously pad out the missing bytes (and the added padding bytes would include 6 bytes of the pixel data)

For a maliciously crafted bitmap, that error might look like:

test.rc:1:10: error: bitmap has 17179869180 missing color palette bytes
1 BITMAP trust_me.bmp
         ^~~~~~~~~~~~
test.rc:1:10: note: the Win32 RC compiler would erroneously pad out the missing bytes

There’s also a warning for extra bytes between the color table and the pixel data:

test.rc:2:10: warning: bitmap has 4 extra bytes preceding the pixel data which will be ignored
2 BITMAP extra_palette_bytes.bmp
         ^~~~~~~~~~~~~~~~~~~~~~~
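Both diagnostics come down to comparing the space between the end of the info header and the start of the pixel data with the expected color table size. A sketch under assumptions: it trusts bfOffBits (the u32 at offset 10 of the BITMAPFILEHEADER) as the start of the pixel data, ignores BITFIELDS masks, and uses hypothetical function names; resinator's actual checks may differ.

```python
import struct

BITMAPFILEHEADER_SIZE = 14

def palette_diagnostics(bmp: bytes, expected_table_bytes: int) -> tuple:
    """Return (missing_bytes, extra_bytes) for a .bmp's color table; at most one is nonzero."""
    (bf_off_bits,) = struct.unpack_from("<I", bmp, 10)  # where the pixel data starts
    (bi_size,) = struct.unpack_from("<I", bmp, BITMAPFILEHEADER_SIZE)
    available = max(0, bf_off_bits - (BITMAPFILEHEADER_SIZE + bi_size))
    missing = max(0, expected_table_bytes - available)
    extra = max(0, available - expected_table_bytes)
    return missing, extra

def make_headers(bf_off_bits: int) -> bytes:
    """Minimal BITMAPFILEHEADER + 40-byte BITMAPINFOHEADER (only the fields used above)."""
    file_header = b"BM" + struct.pack("<IHHI", 0, 0, 0, bf_off_bits)
    info_header = struct.pack("<I", 40) + bytes(36)
    return file_header + info_header

if __name__ == "__main__":
    # 64 bytes expected, 48 present: 16 missing, the same count as the first error above.
    print(palette_diagnostics(make_headers(14 + 40 + 48), 64))  # (16, 0)
    # 68 bytes before the pixel data: 4 extra, as in the warning above.
    print(palette_diagnostics(make_headers(14 + 40 + 68), 64))  # (0, 4)
```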
miscompilation

Bitmaps with BITFIELDS and a color palette

When testing things using the bitmaps from bmpsuite, there is one well-formed .bmp file that rc.exe and resinator handle differently:

g/rgb16-565pal.bmp: A 16-bit image with both a BITFIELDS segment and a palette.

The details aren’t too important here, so just know that the file is structured like this:

..BITMAPFILEHEADER..
..BITMAPINFOHEADER..
....................
.....bitfields......
....color table.....
....................
....pixel data......
....................
....................

As mentioned earlier, the BITMAPFILEHEADER is dropped when compiling a BITMAP resource, but for whatever reason, rc.exe also drops the color table when compiling this .bmp, so it ends up like this in the compiled .res:

..BITMAPINFOHEADER..
....................
.....bitfields......
....pixel data......
....................
....................

Note, though, that the BITMAPINFOHEADER still says that there is a color table present (specifically, that there are 256 entries in the color table), so this is likely a miscompilation. One possibility here is that it’s not intended to be valid for a .bmp to contain both color masks and a color table, but that seems dubious because Windows renders the original .bmp file just fine in Explorer/Photos.

resinator’s behavior

resinator does not drop the color table, so in the compiled .res the bitmap resource data looks like this:

..BITMAPINFOHEADER..
....................
.....bitfields......
....color table.....
....................
....pixel data......
....................
....................

and while I think this is correct, it turns out that…

LoadBitmap mangles both versions anyway

When the compiled resources are loaded with LoadBitmap and drawn using BitBlt, neither the rc.exe-compiled version nor the resinator-compiled version is drawn correctly:

Images: the intended image, the bitmap resource from rc.exe, and the bitmap resource from resinator

My guess/hope is that this is a bug in LoadBitmap, as I believe the resinator-compiled resource should be correct/valid.

parser bug/quirk, utterly baffling

The strange power of the lone close parenthesis

Likely due to some number expression parsing code gone haywire, a single close parenthesis ) is occasionally treated as a ‘valid’ expression, with bizarre consequences.

Similar to what was detailed in BEGIN or { as filename, using ) as a filename has the same effect as {, where the preceding token is treated as both the resource type and the filename.

test.rc(2) : error RC2135 : file not found: RCDATA

But that’s not all; take this, for example, where we define an RCDATA resource using a raw data block:

1 RCDATA { 1, ), ), ), 2 }

This should very clearly be a syntax error, but it’s actually accepted by the Windows RC compiler. What does the RC compiler do, you ask? Well, it just skips right over all the ), of course, and the data of this resource ends up as:

  the 1 (u16 little endian) → 01 00 02 00 ← the 2 (u16 little endian)
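A toy model of that skipping behavior: split the block on commas, drop any item that is just ), and write every remaining number as a u16 little-endian value. The tokenizing is deliberately naive (plain decimal numbers only, no strings or L suffixes) and the function name is hypothetical.

```python
def compile_raw_data_block(body: str) -> bytes:
    """Toy model of rc.exe compiling a raw data block such as '1, ), ), ), 2'."""
    out = bytearray()
    for item in body.split(","):
        item = item.strip()
        if item == ")":
            continue  # the quirk: a lone ')' is accepted and then silently skipped
        out += int(item).to_bytes(2, "little")  # each number becomes a u16, little endian
    return bytes(out)

if __name__ == "__main__":
    print(compile_raw_data_block("1, ), ), ), 2").hex(" "))  # 01 00 02 00
```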

I say ‘skip’ because that’s truly what seems to happen. For example, for resource definitions that take positional parameters like so:

1 DIALOGEX 1, 2, 3, 4 {
  
  CHECKBOX  "test",  1,  2,  3,  4,  5,  6
}

If you replace the <id> parameter of 1 with ), then all the parameters shift over and they get interpreted like this instead:

1 DIALOGEX 1, 2, 3, 4 {
  
  CHECKBOX  "test",  ),  2,  3,  4,  5,  6
}

Note also that all of this is only true of the close parenthesis. The open parenthesis was not deemed worthy of the same power:

test.rc(1) : error RC2237 : numeric value expected at 1
test.rc(1) : error RC1013 : mismatched parentheses

Instead, ( was given a different power, which we’ll see next.

resinator’s behavior

A single close parenthesis is never a valid expression in resinator:

test.rc:2:20: error: expected number or number expression; got ')'
  CHECKBOX "test", ), 2, 3, 4, 5, 6
                   ^
test.rc:2:20: note: the Win32 RC compiler would accept ')' as a valid expression, but it would be skipped over and potentially lead to unexpected outcomes
parser bug/quirk, utterly baffling

The strange power of the friendly open parenthesis

While the close parenthesis has a bug/quirk involving being isolated, the open parenthesis has a bug/quirk regarding being snug up against another token.

This is (somehow) allowed:

1 DIALOGEX 1(, (2, (3(, ((((4(((( {}

In the above case, the parameters are interpreted as if the ( characters don’t exist, i.e. they compile to the values 1, 2, 3, and 4.

This power of ( does not have infinite reach, though; in other places a ( leads to a mismatched parentheses error as you might expect:

test.rc(1) : error RC1013 : mismatched parentheses

There’s no chance I’m interested in bug-for-bug compatibility with this behavior, so I haven’t investigated it beyond the shallow examples above. I’m certain there are more strange implications of this bug lurking for those willing to dive deeper.

resinator’s behavior

An unclosed open parenthesis is always an error in resinator:

test.rc:1:14: error: expected number or number expression; got ','
1 DIALOGEX 1(, (2, (3(, ((((4(((( {}
             ^
parser bug/quirk

General comma-related inconsistencies

The rules around commas within statements can be one of the following depending on the context:

  • Exactly one comma
  • Zero or one comma
  • Zero or any number of commas

And these rules can be mixed and matched within statements. I’ve tried to codify my understanding of the rules around commas in a test .rc file I wrote. Here’s an example statement that contains all 3 rules:

AUTO3STATE,, "mytext",, 900,, 1 2 3 4, 3 | NOT 1L, NOT 1 | 3L

Here, ,, indicates “zero or any number of commas”, the comma-less gaps in 1 2 3 4 indicate “zero or one comma”, and , indicates “exactly 1 comma”

Empty parameters

In most places where parameters cannot have any number of commas separating them, ,, will lead to a compile error. For example:

1 ACCELERATORS {
  "^b",, 1
}
test.rc(2) : error RC2107 : expected numeric command value

However, there are a few places where empty parameters are accepted, and therefore ,, is not a compile error, e.g. in the MENUITEM of a MENUEX resource:

1 MENUEX {
  
  MENUITEM "foo", 0, 0, 0,
  MENUITEM "foo", , , ,
  MENUITEM "foo",,,,
  
  MENUITEM "foo"
}

Adding one more comma will cause a compile error:

1 MENUEX {
  MENUITEM "foo",,,,,
}
test.rc(2) : error RC2235 : too many arguments supplied

Italic is singled out

DIALOGEX resources can specify a font to use via an optional FONT statement like so:

1 DIALOGEX 1, 2, 3, 4
  FONT 16, "Foo"
{
  
}

The full syntax of the FONT statement in this context is:

FONT <pointsize>, <typeface>, <weight>, <italic>, <charset>
e.g. FONT 16, "Foo", 1, 2, 3

weight, italic, and charset are optional

For whatever reason, while weight and charset can be empty parameters, italic seemingly cannot, since this fails:

1 DIALOGEX 1, 2, 3, 4
  FONT 16, "Foo", , , 
{
  
}
test.rc(2) : error RC2112 : BEGIN expected in dialog

test.rc(6) : error RC2135 : file not found: }

but this works:

1 DIALOGEX 1, 2, 3, 4
  FONT 16, "Foo", , 0, 
{
  
}

Due to the strangeness of the error, I’m assuming that this italic-parameter-specific behavior is unintended.

Further weirdness

Continuing on with the FONT statement of DIALOGEX resources: as we saw in If you’re not last, you’re irrelevant, if there are duplicate statements of the same type, all but the last one are ignored:

1 DIALOGEX 1, 2, 3, 4
  FONT 16, "Foo", 1, 2, 3 
  FONT 32, "Bar", 4, 5, 6
{
  
}

In the above example, the values as compiled will all come from this FONT statement:

  FONT 32, "Bar", 4, 5, 6

However, given that the weight, italic, and charset parameters are optional, if you don’t specify them, then their values from the previous FONT statement(s) do actually carry over, with the exception of the charset parameter:

1 DIALOGEX 1, 2, 3, 4
  FONT 16, "Foo", 1, 2, 3
  FONT 32, "Bar"
{
  
}

With the above, the FONT statement that ends up being compiled will effectively be:

  FONT 32, "Bar", 1, 2, 1

where the last 1 is the charset parameter’s default value (DEFAULT_CHARSET) rather than the 3 we might expect from the duplicate FONT statement.
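The carry-over rules can be summarized as a fold over the FONT statements. This sketch encodes only what is described above: point size and typeface come from the last statement, omitted weight and italic inherit the previous values, and an omitted charset resets to DEFAULT_CHARSET (1 in wingdi.h). The starting values of 0 for weight and italic are an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEFAULT_CHARSET = 1  # value of DEFAULT_CHARSET in wingdi.h

@dataclass
class Font:
    point_size: int
    typeface: str
    weight: int
    italic: int
    charset: int

# (point_size, typeface, weight, italic, charset); None marks an omitted optional parameter.
FontStatement = Tuple[int, str, Optional[int], Optional[int], Optional[int]]

def effective_font(statements: List[FontStatement]) -> Font:
    """Fold duplicate FONT statements into the one that effectively gets compiled."""
    if not statements:
        raise ValueError("expected at least one FONT statement")
    weight, italic = 0, 0  # assumption: values used if no statement ever sets them
    for point_size, typeface, w, i, c in statements:
        weight = w if w is not None else weight            # carries over when omitted
        italic = i if i is not None else italic            # carries over when omitted
        charset = c if c is not None else DEFAULT_CHARSET  # does NOT carry over
    return Font(point_size, typeface, weight, italic, charset)

if __name__ == "__main__":
    print(effective_font([(16, "Foo", 1, 2, 3), (32, "Bar", None, None, None)]))
    # Font(point_size=32, typeface='Bar', weight=1, italic=2, charset=1)
```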

resinator’s behavior

resinator matches the Windows RC compiler behavior, but has better error messages/additional warnings where appropriate:

test.rc:2:21: error: expected number or number expression; got ','
  FONT 16, "Foo", , ,
                    ^
test.rc:2:21: note: this line originated from line 2 of file 'test.rc'
  FONT 16, "Foo", /*weight*/, /*italic*/, /*charset*/
test.rc:2:3: warning: this statement was ignored; when multiple statements of the same type are specified, only the last takes precedence
  FONT 16, "Foo", 1, 2, 3
  ^~~~~~~~~~~~~~~~~~~~~~~
parser bug/quirk

NUL in filenames

If a filename evaluates to a string that contains a NUL (0x00) character, the Windows RC compiler treats it as a terminator. For example,

1 RCDATA "hello\x00world"

will try to read from the file hello. This is understandable considering how C handles strings, but doesn’t exactly seem like desirable behavior since it happens silently.

resinator’s behavior

Any evaluated filename string containing a NUL is an error:

test.rc:1:10: error: evaluated filename contains a forbidden codepoint: <U+0000>
1 RCDATA "hello\x00world"
         ^~~~~~~~~~~~~~~~
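The difference between the two behaviors fits in a few lines. The sketch below is illustrative only (both function names are hypothetical): the first mimics C-string semantics by keeping everything before the first NUL, and the second rejects the string instead, as resinator does.

```python
def c_string_filename(evaluated: str) -> str:
    """Mimic C-string semantics: everything after the first NUL is silently dropped."""
    return evaluated.split("\x00", 1)[0]

def strict_filename(evaluated: str) -> str:
    """Reject any NUL instead of truncating, mirroring resinator's error."""
    if "\x00" in evaluated:
        raise ValueError("evaluated filename contains a forbidden codepoint: <U+0000>")
    return evaluated

if __name__ == "__main__":
    print(c_string_filename("hello\x00world"))  # hello
```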
parser bug/quirk, utterly baffling

Subtracting zero can lead to bizarre results

This compiles:

1 DIALOGEX 1, 2, 3, 4 - 0 {}

This doesn’t:

1 DIALOGEX 1, 2, 3, 4-0 {}
test.rc(1) : error RC2112 : BEGIN expected in dialog

I don’t have a complete understanding as to why, but it seems to be related to subtracting the value zero within certain contexts.

Resource definitions that compile:

  • 1 RCDATA { 4-0 }
  • 1 DIALOGEX 1, 2, 3, 4--0 {}
  • 1 DIALOGEX 1, 2, 3, 4-(0) {}

Resource definitions that error:

  • 1 DIALOGEX 1, 2, 3, 4-0x0 {}
  • 1 DIALOGEX 1, 2, 3, (4-0) {}

The only additional information I have is that the following:

1 DIALOGEX 1, 2, 3, 10-0x0+5 {} hello

will error, and with the /verbose flag set, rc.exe will output:

test.rc.
test.rc(1) : error RC2112 : BEGIN expected in dialog

Writing DIALOG:1,       lang:0x409,     size 0.
test.rc(1) : error RC2135 : file not found: hello

Writing {}:+5,  lang:0x409,     size 0

The verbose output gives us a hint that the Windows RC compiler is interpreting the +5 {} hello as a new resource definition like so:

<id>: +5   <type>: {}   <filename>: hello

So, somehow, the subtraction of the zero caused the BEGIN expected in dialog error, and then the Windows RC compiler immediately reset its parser state and began parsing a new resource definition from scratch. This doesn’t give much insight into why subtracting zero causes an error in the first place, but I thought it was a slightly interesting additional wrinkle.

resinator’s behavior

resinator does not treat subtracting zero as special, and therefore never errors on any expressions that subtract zero.

Ideally, a warning would be emitted in cases where the Windows RC compiler would error, but knowing when that would be the case is not something I’m capable of doing currently due to my lack of understanding of this bug/quirk.

parser bug/quirk

All operators have equal precedence

In the Windows RC compiler, all operators have equal precedence, which is not the case in C. This means that there is a mismatch between the precedence used by the preprocessor (C/C++ operator precedence) and the precedence used by the compiler.

Instead of detailing this bug/quirk, though, I’m just going to link to Raymond Chen’s excellent description (complete with the potential consequences):

resinator’s behavior

resinator matches the behavior of the Windows RC compiler with regard to operator precedence (i.e. it also has an operator precedence mismatch between the preprocessor and the compiler)
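To make the mismatch concrete, the sketch below evaluates a flat expression strictly left to right, which is one reading of "all operators have equal precedence" (left associativity is an assumption here). It only handles the binary operators +, -, |, and &, without parentheses or unary operators, and compares against Python, which orders these particular operators the same way C does.

```python
import operator

# Binary operators handled by this sketch.
OPS = {"+": operator.add, "-": operator.sub, "|": operator.or_, "&": operator.and_}

def eval_equal_precedence(expr: str) -> int:
    """Evaluate space-separated 'a op b op c ...' left to right, all operators equal."""
    tokens = expr.split()
    result = int(tokens[0])
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, int(operand))
    return result

if __name__ == "__main__":
    # Left to right: (2 | 1) + 1 == 4. C (and Python) precedence: 2 | (1 + 1) == 2.
    print(eval_equal_precedence("2 | 1 + 1"), 2 | 1 + 1)  # 4 2
    # Left to right: (1 | 2) & 0 == 0. C precedence: 1 | (2 & 0) == 1.
    print(eval_equal_precedence("1 | 2 & 0"), 1 | 2 & 0)  # 0 1
```

The same text in a #if directive (evaluated by the preprocessor) and in a resource statement (evaluated by the compiler) can therefore yield different values.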

parser bug/quirk

That’s not my \a

The Windows RC compiler supports some (but not all) C escape sequences within string literals.

Supported

  • \a
  • \n
  • \r
  • \t
  • \nnn (or \nnnnnnn in wide literals)
  • \xhh (or \xhhhh in wide literals)

All of the supported escape sequences behave the same as they do in C, with the exception of \a. In C, \a is translated to the hex value 0x07 (aka the “Alert (Beep, Bell)” control character), while the Windows RC compiler translates \a to 0x08 (aka the “Backspace” control character).
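For the single-letter escapes listed above, the difference can be captured in a small lookup table. This is a sketch built only from the list and the description above (octal and hex escapes are left out), with hypothetical names.

```python
# Single-letter escapes from the list above: rc.exe's translation vs. standard C.
RC_ESCAPES = {"a": 0x08, "n": 0x0A, "r": 0x0D, "t": 0x09}
C_ESCAPES = {"a": 0x07, "n": 0x0A, "r": 0x0D, "t": 0x09}

def translate_escape(letter: str, rc_semantics: bool = True) -> int:
    """Byte value for a single-letter escape such as the 'a' in \\a."""
    table = RC_ESCAPES if rc_semantics else C_ESCAPES
    return table[letter]

if __name__ == "__main__":
    differing = [ch for ch in RC_ESCAPES if RC_ESCAPES[ch] != C_ESCAPES[ch]]
    print(differing)  # ['a']
    print(hex(translate_escape("a")), hex(translate_escape("a", rc_semantics=False)))  # 0x8 0x7
```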

At first glance, this seems like a bug, but there may be some historical reason for this that I’m missing the context for.

resinator’s behavior

resinator matches the behavior of the Windows RC compiler, translating \a to 0x08.

undocumented, cli bug/quirk

Undocumented/strange command-line options

/sl: Maximum string length, with a twist

From the help text of the Windows RC compiler (rc.exe /?):

/sl      Specify the resource string length limit in percentage

No further information is given, and the CLI documentation doesn’t even mention the option. It turns out that the /sl option expects a number between 1 and 100:

fatal error RC1235: invalid option - string length limit percentage should be between 1 and 100 inclusive

What this option controls is the maximum number of characters within a string literal. For example, 4098 a characters within a string literal will fail with string literal too long:

1 RCDATA { "aaaa<...>aaaa" }

So, what are the actual limits here? What does 100% of the maximum string literal length limit get you?

  • The default maximum string literal length (if /sl is not specified) is 4097; it will error if there are 4098 characters in a string literal.
  • If /sl 50 is specified, the maximum string literal length becomes 4096 rather than 4097. There is no /sl setting that’s equivalent to the default string literal length limit, since the option is limited to whole numbers.
  • If /sl 100 is specified, the maximum length of a string literal becomes 8192.
  • If /sl 33 is set, the maximum string literal length becomes 2703 (8192 * 0.33 = 2,703.36). 2704 characters will error with string literal too long.
  • If /sl 15 is set, the maximum string literal length becomes 1228 (8192 * 0.15 = 1,228.8). 1229 characters will error with string literal too long.
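All of the data points above fit a single formula: the limit is 8192 × percent / 100, rounded down, with 4097 as a special case when /sl is absent. The sketch below is inferred from those observations, not from rc.exe's code, and the function name is hypothetical.

```python
from typing import Optional

DEFAULT_LIMIT = 4097  # observed maximum when /sl is not given
FULL_LIMIT = 8192     # observed maximum with /sl 100

def max_string_literal_length(sl_percent: Optional[int] = None) -> int:
    """Maximum characters allowed in a string literal for a given /sl value."""
    if sl_percent is None:
        return DEFAULT_LIMIT
    if not 1 <= sl_percent <= 100:
        raise ValueError("string length limit percentage should be between 1 and 100 inclusive")
    return FULL_LIMIT * sl_percent // 100  # integer math rounds down, e.g. 2703.36 -> 2703

if __name__ == "__main__":
    for pct in (None, 50, 100, 33, 15):
        print(pct, max_string_literal_length(pct))
    # No whole-number percentage reproduces the default limit of 4097:
    print([p for p in range(1, 101) if max_string_literal_length(p) == DEFAULT_LIMIT])  # []
```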

And to top it all off, rc.exe will crash if /sl 100 is set and there is a string literal with exactly 8193 characters in it. If one more character is added to the string literal, it errors with ‘string literal too long’.

resinator’s behavior

resinator uses codepoint count as the limiting factor and avoids the crash when /sl 100 is set.

string-literal-8193.rc:2:2: error: string literal too long (max is currently 8192 characters)
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa<...truncated...>
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/a: The unknown

/a seems to be a recognized option but it’s unclear what it does and the option is totally undocumented (and also was not an option in the 16-bit version of the compiler from what I can tell). I was unable to find anything that it affects about the output of rc.exe.

resinator’s behavior

<cli>: warning: option /a has no effect (it is undocumented and its function is unknown in the Win32 RC compiler)
 ... /a ...
     ~^

/?c and friends: LCX/LCE hidden options

Either one of /?c or /hc will add a normally hidden ‘Comments extracting switches:’ section to the help menu, with /t and /t-prefixed options dealing with .LCX and .LCE files.

Comments extracting switches:
   /t           Generate .LCX output file
   /tp:<prefix> Extract only comments starting with <prefix>
   /tm          Do not save mnemonics into the output file
   /tc          Do not save comments into the output file
   /tw          Display warning if custom resources does not have LCX file
   /te          Treat all warnings as errors
   /ti          Save source file information for each resource
   /ta          Extract data for all resources
   /tn          Rename .LCE file

I can find zero info about any of this online. A generated .LCE file seems to be an XML file with some info about the comments and resources in the .rc file(s).

resinator’s behavior

<cli>: error: the /t option is unsupported
 ... /t ...
     ~^

(and similar errors for all of the other related options)

/p: Okay, I’ll only preprocess, but you’re not going to like it

The undocumented /p option will output the preprocessed version of the .rc file to <filename>.rcpp instead of outputting a .res file (i.e. it will only run the preprocessor). However, there are two slightly strange things about this option:

  • There doesn’t appear to be any way to control the name of the .rcpp file (/fo does not affect it)
  • rc.exe will always exit with exit code 1 when the /p option is used, even on success

resinator’s behavior

resinator recognizes the /p option, but (1) it allows /fo to control the file name of the preprocessed output file, and (2) it exits with 0 on success.

/s: What’s HWB?

The option /s <unknown> will insert a bunch of resources with name HWB into the .res. I can’t find any info on this except a note on this page saying that HWB is a resource name that is reserved by Visual Studio. The option seems to need a value but the value doesn’t seem to have any effect on the .res contents and it seems to accept any value without complaint.

resinator’s behavior

<cli>: error: the /s option is unsupported
 ... /s ...
     ~^

/z: Mysterious font substitution

The undocumented /z option almost always errors with

fatal error RC1212: invalid option - /z argument missing substitute font name

To avoid this error, a value with / in it seems to do the trick (e.g. rc.exe /z foo/bar test.rc), but it’s still unclear to me what purpose (if any) this option has. The title of No one has thought about FONT resources for decades is probably relevant here, too.

resinator’s behavior

<cli>: error: the /z option is unsupported
 ... /z ...
     ~^
undocumented

Undocumented resource types

Most predefined resource types have some level of documentation here (or are at least listed), but there are a few that are recognized but not documented.

DLGINCLUDE

The tiny bit of available documentation I could find for DLGINCLUDE comes from Microsoft KB Archive/91697:

The dialog editor needs a way to know what include file is associated with a resource file that it opens. Rather than prompt the user for the name of the include file, the name of the include file is embedded in the resource file in most cases.

Here’s an example from sdkdiff.rc in Windows-classic-samples:

1 DLGINCLUDE "wdiffrc.h"

Further details from Microsoft KB Archive/91697:

In the Win32 SDK, changes were made so that this resource has its own resource type; it was changed from an RCDATA-type resource with the special name, DLGINCLUDE, to a DLGINCLUDE resource type whose name can be specified.

So, in the 16-bit Windows RC compiler, a DLGINCLUDE would have looked something like this:

DLGINCLUDE RCDATA DISCARDABLE
BEGIN
    "GUTILSRC.H\0"
END

DLGINCLUDE resources get compiled into the .res, but subsequently get ignored by cvtres.exe (the tool that turns the .res into a COFF object file) and therefore do not make it into the final linked binary. So, in practical terms, DLGINCLUDE is entirely meaningless outside of the Visual Studio dialog editor GUI as far as I know.

DLGINIT

The purpose of this resource seems like it could be similar to controlData in DIALOGEX resources (as detailed in That’s odd, I thought you needed more padding); that is, it is used to specify control-specific data that is loaded/used when initializing a particular control within a dialog.

Here’s an example from bits_ie.rc of Windows-classic-samples:

IDD_DIALOG DLGINIT
BEGIN
    IDC_PRIORITY, 0x403, 11, 0
0x6f46, 0x6572, 0x7267, 0x756f, 0x646e, "\000" 
    IDC_PRIORITY, 0x403, 5, 0
0x6948, 0x6867, "\000" 
    IDC_PRIORITY, 0x403, 7, 0
0x6f4e, 0x6d72, 0x6c61, "\000" 
    IDC_PRIORITY, 0x403, 4, 0
0x6f4c, 0x0077, 
    0
END

The resource itself is compiled the same way an RCDATA or user-defined resource would be when using a raw data block, so each number is compiled as a 16-bit little-endian integer. The expected structure of the data seems to depend on the type of control it’s for (in this case, IDC_PRIORITY is the ID for a COMBOBOX control). In the above example, the format seems to be something like:

    <control id>, <language id>, <data length in bytes>, <unknown>
<data ...>

The particular format is not very relevant, though, as it is (1) also entirely undocumented, and (2) generated by the Visual Studio dialog editor.

It is worth noting, though, that the <data ...> parts of the above example, when written as little-endian u16 integers, correspond to the bytes for the ASCII strings Foreground, High, Normal, and Low. These strings can also be seen in the Properties window of the dialog editor in Visual Studio (and the dialog editor is almost certainly how the DLGINIT was generated in the first place):

The Data section of combo box controls in Visual Studio corresponds to the DLGINIT data

While it would make sense for these strings to be used to populate the initial options in the combo box, I couldn’t actually get modifications to the DLGINIT to affect anything in the compiled program in my testing. I’m guessing that’s due to a mistake on my part, though; my knowledge of the Visual Studio GUI side of .rc files is essentially zero.
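The little-endian reading of the DLGINIT data is easy to check. This sketch (with a hypothetical function name) packs each number from the bits_ie.rc example as a u16 little-endian value and decodes the bytes as ASCII; the "\000" string items are simply left out, and the NUL that sits inside the final 0x0077 is stripped.

```python
def words_to_ascii(words: list) -> str:
    """Pack numbers as u16 little-endian values and decode the bytes as ASCII."""
    raw = b"".join(w.to_bytes(2, "little") for w in words)
    return raw.decode("ascii").rstrip("\x00")  # drop a NUL padding out the final u16

if __name__ == "__main__":
    # Numeric items from the bits_ie.rc example; the "\000" string items are left out.
    rows = [
        [0x6F46, 0x6572, 0x7267, 0x756F, 0x646E],
        [0x6948, 0x6867],
        [0x6F4E, 0x6D72, 0x6C61],
        [0x6F4C, 0x0077],
    ]
    print([words_to_ascii(r) for r in rows])  # ['Foreground', 'High', 'Normal', 'Low']
```

If each "\000" item contributes a single NUL byte, the length fields in the example (11, 5, 7, and 4) also line up: 10 + 1, 4 + 1, and 6 + 1 bytes, plus 4 bytes for Low, whose NUL is already inside 0x0077.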

TOOLBAR

The undocumented TOOLBAR resource seems to be used in combination with CreateToolbarEx to create a toolbar of buttons from a bitmap. Here’s the syntax:

<id> TOOLBAR <button width> <button height> {
  
  BUTTON <id>
  
  SEPARATOR
  
}

This resource is used in a few different .rc files within Windows-classic-samples. Here’s one example from VCExplore.Rc:

IDR_TOOLBAR_MAIN TOOLBAR DISCARDABLE  16, 15
BEGIN
    BUTTON      ID_TBTN_CONNECT
    SEPARATOR
    BUTTON      ID_TBTN_REFRESH
    SEPARATOR
    BUTTON      ID_TBTN_NEW
    BUTTON      ID_TBTN_SAVE
    BUTTON      ID_TBTN_DELETE
    SEPARATOR
    BUTTON      ID_TBTN_START_APP
    BUTTON      ID_TBTN_STOP_APP
    BUTTON      ID_TBTN_INSTALL_APP
    BUTTON      ID_TBTN_EXPORT_APP
    SEPARATOR
    BUTTON      ID_TBTN_INSTALL_COMPONENT
    BUTTON      ID_TBTN_IMPORT_COMPONENT
    SEPARATOR
    BUTTON      ID_TBTN_UTILITY
    SEPARATOR
    BUTTON      ID_TBTN_ABOUT
END

Additionally, a BITMAP resource is defined with the same ID as the toolbar:

IDR_TOOLBAR_MAIN        BITMAP  DISCARDABLE     "res\\toolbar1.bmp"

The example toolbar bitmap; each icon is 16×15

With the TOOLBAR and BITMAP resources together, and with a CreateToolbarEx call as mentioned above, we get a functional toolbar that looks like this:

The toolbar as displayed in the GUI; note the gaps between some of the buttons (the gaps were specified in the .rc file)

resinator’s behavior

resinator supports these undocumented resource types, and tries to match the behavior of the Windows RC compiler exactly.

utterly baffling

Certain DLGINCLUDE filenames break the preprocessor

The following script, when encoded as Windows-1252, will cause the rc.exe preprocessor to freak out and output what seems to be garbage:

1 DLGINCLUDE "\001ýA\001\001\x1aý\xFF"

If we run this through the preprocessor like so:

> rc.exe /p test.rc

Preprocessed file created in: test.rcpp

Then, in this particular case, it outputs mostly CJK characters and test.rcpp ends up looking like this:

#line 1 "C:\\Users\\Ryan\\Programming\\Zig\\resinator\\tmp\\RCa18588"
#line 1 "test.rc"
#line 1 "test.rc"
‱䱄䥇䍎啌䕄∠ぜ㄰䇽ぜ㄰ぜ㄰硜愱峽䙸≆

The most minimal reproduction I’ve found is:

1 DLGINCLUDE "â"""

which outputs:

#line 1 "C:\\Users\\Ryan\\Programming\\Zig\\resinator\\tmp\\RCa21256"
#line 1 "test.rc"
#line 1 "test.rc"
‱䱄䥇䍎啌䕄∠⋢∢

As mentioned in The Windows RC compiler ‘speaks’ UTF-16, the result of the preprocessor is always encoded as UTF-16, and the above is the result of interpreting the preprocessed file as UTF-16. If, instead, we interpret the preprocessed file as UTF-8 (or ASCII), we would see something like this instead:

#<0x00>l<0x00>i<0x00>n<0x00>e<0x00> <0x00>1<0x00> <0x00>"<0x00>C<0x00>:<0x00>\<0x00>\<0x00>U<0x00>s<0x00>e<0x00>r<0x00>s<0x00>\<0x00>\<0x00>R<0x00>y<0x00>a<0x00>n<0x00>\<0x00>\<0x00>P<0x00>r<0x00>o<0x00>g<0x00>r<0x00>a<0x00>m<0x00>m<0x00>i<0x00>n<0x00>g<0x00>\<0x00>\<0x00>Z<0x00>i<0x00>g<0x00>\<0x00>\<0x00>r<0x00>e<0x00>s<0x00>i<0x00>n<0x00>a<0x00>t<0x00>o<0x00>r<0x00>\<0x00>\<0x00>t<0x00>m<0x00>p<0x00>\<0x00>\<0x00>R<0x00>C<0x00>a<0x00>2<0x00>2<0x00>9<0x00>4<0x00>0<0x00>"<0x00>
<0x00>
<0x00>#<0x00>l<0x00>i<0x00>n<0x00>e<0x00> <0x00>1<0x00> <0x00>"<0x00>t<0x00>e<0x00>s<0x00>t<0x00>.<0x00>r<0x00>c<0x00>"<0x00>
<0x00>
<0x00>#<0x00>l<0x00>i<0x00>n<0x00>e<0x00> <0x00>1<0x00> <0x00>"<0x00>t<0x00>e<0x00>s<0x00>t<0x00>.<0x00>r<0x00>c<0x00>"<0x00>
<0x00>
<0x00>1 DLGINCLUDE "?"""
<0x00>
<0x00>

With this interpretation, we can see that 1 DLGINCLUDE "â""" actually did get emitted by the preprocessor (albeit with â replaced by ?), but it was emitted in a single-byte encoding (e.g. ASCII) while the rest of the file was emitted as UTF-16 (hence all the <0x00> bytes). A file that mixes encodings like this is completely unusable, but at least we know a little bit about what’s going on. As to why or how this bug could manifest, that is completely unknowable. I can’t even hazard a guess as to why certain DLGINCLUDE string literals would cause the preprocessor to output parts of the file in a single-byte encoding.
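Both views are easy to reproduce with Python's built-in codecs. The sketch below takes the single-byte text of the DLGINCLUDE line and decodes it as UTF-16LE, which fuses each pair of bytes into one code unit and yields the CJK-looking line shown earlier; the reverse direction shows where the <0x00> bytes come from. It demonstrates the encoding mix-up only, not why rc.exe produces it, and the helper name is hypothetical.

```python
def as_utf16le(data: bytes) -> str:
    """Interpret raw bytes as UTF-16LE, pairing each two bytes into one code unit."""
    return data.decode("utf-16-le")

if __name__ == "__main__":
    # '1' (0x31) and ' ' (0x20) fuse into U+2031 (‱), 'D' + 'L' into U+4C44 (䱄), and so on.
    decoded = as_utf16le(b'1 DLGINCLUDE "')
    print(" ".join(f"U+{ord(ch):04X}" for ch in decoded))
    # U+2031 U+4C44 U+4947 U+434E U+554C U+4544 U+2220

    # The reverse view: UTF-16LE text read byte by byte has a 0x00 after each ASCII character.
    print("#line".encode("utf-16-le"))  # b'#\x00l\x00i\x00n\x00e\x00'
```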

Some commonalities between all the reproductions of this bug I’ve found so far:

  • The byte count of the .rc file is even; no reproduction has had an odd file size.
  • The number of distinct sequences (a byte, an escaped integer, or an escaped quote) in the filename string has to be small (min: 2, max: 18)

resinator‘s behavior

resinator avoids this bug and handles the affected strings the same way that other DLGINCLUDE strings are handled by the Windows RC compiler.

utterly baffling

Certain DLGINCLUDE filenames trigger missing '=' in EXSTYLE=<flags> errors

Certain strings, when used with the DLGINCLUDE resource, will cause a seemingly unrelated error. Here’s one example (truncated; the full reproduction is just a longer sequence of random characters/escapes):

1 DLGINCLUDE "\06f\x2\x2b\445q\105[ð\134\x90<...truncated...>"

If we try to compile this, we get this error:

test.rc(2) : error RC2136 : missing '=' in EXSTYLE=<flags>

Not only do I not understand why this error would ever be triggered for DLGINCLUDE (EXSTYLE is specific to DIALOG/DIALOGEX), I’m not even sure what this error means or how it could be triggered normally, since EXSTYLE doesn’t use the syntax EXSTYLE=<flags> at all. If we actually try to use the EXSTYLE=<flags> syntax, it gives us an error, so this is not a case of an error message for an undocumented feature:

1 DIALOG 1, 2, 3, 4
  EXSTYLE=1
{
  
}
test.rc(2) : error RC2112 : BEGIN expected in dialog

test.rc(4) : error RC2135 : file not found: END

I have two possible theories of what might be going on here:

  1. The error is intended but the error message is wrong, i.e. it’s using some internal code for an error message that never got its message updated accordingly
  2. There’s a lot of undefined behavior being invoked here, and it just so happens that some random (normally impossible?) error is the result

I’m leaning more towards option 2, since there’s no obvious reason why the strings that reproduce the error would cause any error at all. One point against it, though, is that I’ve found quite a few different reproductions that all trigger the same error; the only real commonality in the reproductions is that they all have around 240 to 250 distinct characters/escape sequences within the DLGINCLUDE string literal.

resinator‘s behavior

resinator avoids the error and handles the affected strings the same way that other DLGINCLUDE strings are handled by the Windows RC compiler.

undocumented

Various other undocumented/misdocumented things

Predefined macros

The documentation only mentions RC_INVOKED, but _WIN32 is also defined by default by the Windows RC compiler. For example, this successfully compiles and the .res contains the RCDATA resource:

#ifdef _WIN32
1 RCDATA { "hello" }
#endif

Dialog controls

In the “Edit Control Statements” documentation:

  • BEDIT is listed, but is unrecognized by the Windows RC compiler and will error with undefined keyword or key name: BEDIT if you try to use it
  • HEDIT and IEDIT are listed and are recognized, but have no further documentation

In the “GROUPBOX control” documentation, it says:

The GROUPBOX statement, which you can use only in a DIALOGEX statement, defines the text, identifier, dimensions, and attributes of a control window.

However, the “can use only in a DIALOGEX statement” part (meaning it’s not allowed in a DIALOG resource) is not actually true, since this compiles successfully:

1 DIALOG 0, 0, 640, 480 {
  GROUPBOX "text", 1, 2, 3, 4, 5
}

In the “Button Control Statements” documentation, USERBUTTON is listed (and is recognized by the Windows RC compiler), but has no further documentation.

HTML can use a raw data block, too

The RCDATA and User-defined resource documentation mentions that they can use raw data blocks:

The data can have any format and can be defined […] as a series of numbers and strings (if the raw-data block is specified).

The HTML resource documentation does not mention raw data blocks, even though HTML resources can use them too:

1 HTML { "foo" }

GRAYED and INACTIVE

In both the MENUITEM and POPUP documentation:

Option    Description
GRAYED    […]. This option cannot be used with the INACTIVE option.
INACTIVE  […]. This option cannot be used with the GRAYED option.

However, there is no warning or error if they are used together:

1 MENU {
  POPUP "bar", GRAYED, INACTIVE {
    MENUITEM "foo", 1, GRAYED, INACTIVE
  }
}

It’s not obvious to me why the documentation says that they cannot be used together, and I haven’t (yet) put in the effort to investigate whether there are any practical consequences of doing so.

Semicolon comments

From the Comments documentation:

RC supports C-style syntax for both single-line comments and block comments. Single-line comments start with two forward slashes (//) and run to the end of the line.

What’s not mentioned is that a semicolon (;) is treated roughly the same as //:

; this is treated as a comment
1 RCDATA { "foo" } ; this is also treated as a comment

There is one difference, though, and that’s how each is treated within a resource ID/type. As mentioned in Special tokenization rules for names/IDs, resource ID/type tokens are basically only terminated by whitespace. However, // within an ID/type is treated as the start of a comment, so this, for example, errors:

1 RC//DATA { "foo" }

test.rc(2) : error RC2135 : file not found: RC

See “Incomplete resource at EOF” for an explanation of the error

This is not the case for semicolons, though: the following example compiles into a resource with the type RC;DATA:

1 RC;DATA { "foo" }

We can be reasonably sure that the semicolon comment is an intentional feature due to its presence in a file within Windows-classic-samples:

; Version stamping information:

VS_VERSION_INFO VERSIONINFO
...

; String table

STRINGTABLE
...

but it is completely undocumented.

BLOCK statements support values, too

As detailed in Mismatch in length units in VERSIONINFO nodes, VALUE statements within VERSIONINFO resources are specified like so:

VALUE <name>, <value(s)>

Some examples:

1 VERSIONINFO {
  VALUE "numbers", 123, 456
  VALUE "strings", "foo", "bar"
}

There are also BLOCK statements, which themselves can contain BLOCK/VALUE statements:

1 VERSIONINFO {
  BLOCK "foo" {
    VALUE "child", "of", "foo"
    BLOCK "bar" {
      VALUE "nested", "appreciate"
    }
  }
}

What is not mentioned anywhere that I’ve seen, though, is that BLOCK statements can also have <value(s)> after their name parameter, like so:

1 VERSIONINFO {
  BLOCK "foo", "bar", "baz" {
    
  }
}

In practice, this capability is almost entirely irrelevant. Even though VERSIONINFO allows you to define any arbitrary tree structure that you’d like, users of the VERSIONINFO resource expect a very particular structure with certain BLOCK names. In fact, it’s understandable that this is left out of the documentation, since the VERSIONINFO documentation doesn’t document BLOCK/VALUE statements in general, but rather only StringFileInfo BLOCK and VarFileInfo BLOCK specifically.

resinator‘s behavior

For all of the undocumented things detailed in this section, resinator tries to match the behavior of the Windows RC compiler 1:1 (or as closely as my current understanding of the Windows RC compiler’s behavior allows).

parser bug/quirk, miscompilation

Non-ASCII accelerator characters

The ACCELERATORS resource can be used to essentially define hotkeys for a program. In the message loop of a Win32 program, the TranslateAccelerator function can be used to automatically turn the relevant keystrokes into WM_COMMAND messages with the associated id value as the parameter (meaning it can be handled like any other message coming from a menu, button, etc.).

Simplified example from Using Keyboard Accelerators:

1 ACCELERATORS {
  "B", 300, CONTROL, VIRTKEY
}

This associates the key combination Ctrl + B with the ID 300, which can then be handled in Win32 message loop processing code like this:


        case WM_COMMAND:
            switch (LOWORD(wParam))
            {
                case 300: // ID from the ACCELERATORS entry above
                    // handle Ctrl + B here
                    break;
            }
            break;

There are a number of ways to specify the keys for an accelerator, but the relevant form here is specifying “control characters” using a string literal with a ^ character, e.g. "^B".

When specifying a control character using ^ with an ASCII character that is outside of the range A-Z (case insensitive), the Windows RC compiler will give the following error:

1 ACCELERATORS {
  "^!", 300
}
test.rc(2) : error RC2154 : control character out of range [^A - ^Z]

However, contrary to what the error implies, many (but not all) non-ASCII characters outside the A-Z range are actually accepted. For example, this is not an error (when the file is encoded as UTF-8):

#pragma code_page(65001)
1 ACCELERATORS {
  "^Ξ", 300
}

When evaluating these ^ strings, the final ‘control character’ value is determined by subtracting 0x40 from the ASCII-uppercased value of the character following the ^, so in the case of ^b that would look like:

character (hex value)   uppercased (hex value)   control character value
b (0x62)                B (0x42)                 0x42 - 0x40 = 0x02

The same process is used for any allowed codepoints outside the A-Z range, but the uppercasing is only done for ASCII values, so in the example above with Ξ (the codepoint U+039E; Greek Capital Letter Xi), the value is calculated like this:

codepoint (hex value)   control character value
Ξ (0x039E)              0x039E - 0x40 = 0x035E

I think this is a bogus value, since the final value of a control character is meant to be in the range of 0x01 (^A) through 0x1A (^Z), which are treated specially. My assumption is that a value of 0x035E would just be treated as the Unicode codepoint U+035E (Combining Double Macron), but I’m unsure exactly how I would go about testing this assumption, since all aspects of the interaction between accelerators and non-ASCII key values are still opaque to me.
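The arithmetic above fits in a few lines of Python. The helper names are hypothetical. ASCII a-z is uppercased by hand on purpose: Python’s str.upper() would also uppercase non-ASCII letters (ξ to Ξ, for example), which the Windows RC compiler does not do according to the behavior described here. The sketch does not model which non-ASCII characters rc.exe rejects.

```python
def control_character_value(ch: str) -> int:
    """Value of an accelerator written as "^<ch>": uppercase ASCII a-z only,
    then subtract 0x40. Non-ASCII codepoints are left as-is before subtracting."""
    cp = ord(ch)
    if ord("a") <= cp <= ord("z"):
        cp -= 0x20  # ASCII-only uppercasing
    return cp - 0x40


def in_documented_range(value: int) -> bool:
    # The range named in error RC2154: ^A (0x01) through ^Z (0x1A)
    return 0x01 <= value <= 0x1A


print(hex(control_character_value("b")))  # 0x2
print(hex(control_character_value("Ξ")))  # 0x35e
print(in_documented_range(control_character_value("Ξ")))  # False
```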

resinator‘s behavior

In resinator, control characters specified as a quoted string with a ^ in an ACCELERATORS resource (e.g. "^C") must be in the range A-Z (case insensitive).

test.rc:3:3: error: invalid accelerator key '"^Ξ"': ControlCharacterOutOfRange
  "^Ξ", 1
  ^~~~~
fundamental concept

The entirely undocumented concept of the ‘output’ code page

As mentioned in The Windows RC compiler ‘speaks’ UTF-16, there are #pragma code_page preprocessor directives that can change how each line of the input .rc file is interpreted. Additionally, the default code page for a file can be set via the CLI /c option, e.g. /c65001 to set the default code page to UTF-8.

What was not mentioned, however, is that the code page affects both how the input is interpreted and how the output is encoded. Take the following example:

1 RCDATA { "Ó" }

When saved as Windows-1252 (the default code page for the Windows RC compiler), the 0xD3 byte in the string will be interpreted as Ó and written to the .res as its Windows-1252 representation (0xD3).

If the same Windows-1252-encoded file is compiled with the default code page set to UTF-8 (rc.exe /c65001), then the 0xD3 byte in the .rc file will be an invalid UTF-8 byte sequence and get replaced with � during preprocessing. Because the code page is UTF-8, the output in the .res file will also be encoded as UTF-8, so the bytes 0xEF 0xBF 0xBD (the UTF-8 sequence for �) will be written.

This is all pretty reasonable, but things start to get truly bizarre when you add #pragma code_page into the mix:

#pragma code_page(1252)
1 RCDATA { "Ó" }

When saved as Windows-1252 and compiled with Windows-1252 as the default code page, this will work the same as described above. However, if we compile the same Windows-1252-encoded .rc file with the default code page set to UTF-8 (rc.exe /c65001), we see something rather strange:

  • The input 0xD3 byte is interpreted as Ó, as expected, since the #pragma code_page changed the code page to 1252
  • The output in the .res is 0xC3 0x93, the UTF-8 sequence for Ó (instead of the expected 0xD3, which is the Windows-1252 encoding of Ó)

That is, the #pragma code_page changed the input code page, but there is a distinct output code page that can be out of sync with the input code page. In this instance, the input code page for the 1 RCDATA ... line is Windows-1252, but the output code page is still the default set from the CLI option (in this case, UTF-8).
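The three cases so far can be modeled as “decode with the input code page, encode with the output code page”. This Python sketch rests on two assumptions: Python’s cp1252 and utf-8 codecs stand in for the Windows code pages, and invalid input sequences become U+FFFD as described above. compile_narrow_string is a hypothetical name, and the case where the output code page cannot represent a character is not modeled (Python raises UnicodeEncodeError there).

```python
def compile_narrow_string(source_bytes: bytes, input_cp: str, output_cp: str) -> bytes:
    """Decode a literal's bytes with the input code page (invalid sequences
    become U+FFFD), then encode the text with the output code page."""
    text = source_bytes.decode(input_cp, errors="replace")
    return text.encode(output_cp)


O_ACUTE_CP1252 = b"\xd3"  # "Ó" saved as Windows-1252

# Default code page, no pragma: input and output are both Windows-1252
print(compile_narrow_string(O_ACUTE_CP1252, "cp1252", "cp1252").hex(" "))  # d3
# rc.exe /c65001, no pragma: 0xD3 is invalid UTF-8, so U+FFFD is written as UTF-8
print(compile_narrow_string(O_ACUTE_CP1252, "utf-8", "utf-8").hex(" "))    # ef bf bd
# rc.exe /c65001 with a leading #pragma code_page(1252): the desync case
print(compile_narrow_string(O_ACUTE_CP1252, "cp1252", "utf-8").hex(" "))   # c3 93
```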

Even more bizarre, this disjointedness can only occur when a #pragma code_page is the first ‘thing’ in the file:

// comment
#pragma code_page(1252)
1 RCDATA { "Ó" }

With this, still saved as Windows-1252, the code page from the CLI option no longer matters: even when compiled with /c65001, the 0xD3 in the file is both interpreted as Windows-1252 (Ó) and output as Windows-1252 (0xD3).

I used the nebulous term ‘thing’ because the rules for what stops the disjoint code page phenomenon are equally nebulous. Here’s what I currently know can come before the first #pragma code_page while still causing the input/output code page desync:

  • Any whitespace
  • A non-code_page pragma directive (e.g. #pragma foo)
  • An #include that includes a file with a .h or .c extension (the contents of those files are ignored after preprocessing)
  • A code_page pragma with an invalid code page, but only if the /w CLI option is set, which turns invalid code page pragmas into warnings instead of errors

I have a feeling this list is incomplete, though, as I only recently figured out that it’s not an inherent bug/quirk of the first #pragma code_page in the file. Here’s a file containing all of the above elements:

#include "desotardy.h"
    #pragma code_page(123456789)
#pragma foo

#pragma code_page(1252)
1 RCDATA { "Ó" }

When compiled with rc.exe /c65001 /w, the above still shows the input/output code page desync (i.e. the Ó is interpreted as Windows-1252 but compiled into UTF-8).

So, to summarize, this is how things seem to work:

  • The CLI /c option sets both the input and output code pages
  • If the first #pragma code_page in the file is also the first ‘thing’ in the file, then it only sets the input code page, and does not change the output code page
  • Any other #pragma code_page directives set both the input and output code pages
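Those three rules can be written as a tiny state model. This is a Python sketch, and it assumes the caller already knows whether a given pragma is the first ‘thing’ in the file, which, as noted above, is itself not fully understood. CodePages and apply_code_page_pragma are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CodePages:
    input_cp: int
    output_cp: int


def apply_code_page_pragma(state: CodePages, new_cp: int, is_first_thing: bool) -> CodePages:
    """Apply `#pragma code_page(new_cp)` following the rules above: as the
    first 'thing' it only changes the input code page, otherwise both."""
    if is_first_thing:
        return CodePages(input_cp=new_cp, output_cp=state.output_cp)
    return CodePages(input_cp=new_cp, output_cp=new_cp)


# rc.exe /c65001 sets both code pages
start = CodePages(input_cp=65001, output_cp=65001)

print(apply_code_page_pragma(start, 1252, is_first_thing=True))
# CodePages(input_cp=1252, output_cp=65001)
print(apply_code_page_pragma(start, 1252, is_first_thing=False))
# CodePages(input_cp=1252, output_cp=1252)
```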

This behavior is baffling, and I’ve never seen it mentioned anywhere on the internet. Even the concept of the code page affecting the encoding of the output is completely undocumented as far as I can tell.

resinator‘s behavior

resinator emulates the behavior of the Windows RC compiler, but emits a warning:

test.rc:1:1: warning: #pragma code_page as the first thing in the .rc script can cause the input and output code pages to become out-of-sync
#pragma code_page ( 1252 )
^~~~~~~~~~~~~~~~~~~~~~~~~~
test.rc:1:1: note: this line originated from line 1 of file 'test.rc'
#pragma code_page(1252)

test.rc:1:1: note: to avoid unexpected behavior, add a comment (or anything else) above the #pragma code_page line

It’s possible that resinator will not emulate the input/output code page desync in the future, but will still emit a warning about the Windows RC compiler behavior when the situation is detected.

preprocessor bug/quirk

That’s not whitespace, this is whitespace

As touched on in The collapse of whitespace is imminent, the preprocessor trims whitespace. What wasn’t mentioned clearly, though, is that this whitespace trimming happens for every line in the file (and it only trims leading whitespace). So, for example, if you run this simple example through the preprocessor:

1 RCDATA {
    "this was indented"
}

it becomes this after preprocessing:

1 RCDATA {
"this was indented"
}

Additionally, as briefly mentioned in Special tokenization rules for names/IDs, the Windows RC compiler treats any ASCII character from 0x05 to 0x20 (inclusive) as whitespace for the purpose of tokenization. However, it turns out that this is not the set of characters that the preprocessor treats as whitespace.

To determine what the preprocessor considers to be whitespace, we can take advantage of its whitespace trimming behavior. For example, if we run the following script through the preprocessor, we will see that the leading <0x05> does not get trimmed, so we know the preprocessor does not consider <0x05> to be whitespace:

1 RCDATA {
<0x05>   "this was indented"
}

If we iterate over every codepoint and check if they get collapsed, we can figure out exactly what the preprocessor sees as whitespace. These are the results:

  • U+0009 Horizontal Tab (\t)
  • U+000A Line Feed (\n)
  • U+000B Vertical Tab
  • U+000C Form Feed
  • U+000D Carriage Return (\r)
  • U+0020 Space
  • U+00A0 No-Break Space
  • U+1680 Ogham Space Mark
  • U+180E Mongolian Vowel Separator
  • U+2000 En Quad
  • U+2001 Em Quad
  • U+2002 En Space
  • U+2003 Em Space
  • U+2004 Three-Per-Em Space
  • U+2005 Four-Per-Em Space
  • U+2006 Six-Per-Em Space
  • U+2007 Figure Space
  • U+2008 Punctuation Space
  • U+2009 Thin Space
  • U+200A Hair Space
  • U+2028 Line Separator
  • U+2029 Paragraph Separator
  • U+202F Narrow No-Break Space
  • U+205F Medium Mathematical Space
  • U+3000 Ideographic Space

This list almost exactly matches the Windows implementation of iswspace, but iswspace returns true for U+0085 Next Line while the rc.exe preprocessor does not consider U+0085 to be whitespace. So, while I consider the rc.exe preprocessor using iswspace to be the most likely explanation for its whitespace handling, I don’t have a reason for why U+0085 in particular is excluded.
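To make the mismatch concrete, here is a Python sketch that spells out the list above as a set and compares it with the tokenizer’s 0x05-0x20 range. The set is only as complete as the testing described above, and the helper names are hypothetical. Python’s own str.isspace() uses a different definition (it returns True for U+0085, for example), so the set is written out explicitly instead.

```python
# Codepoints the rc.exe preprocessor trimmed as whitespace (the list above)
PREPROCESSOR_WHITESPACE = frozenset(
    [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0xA0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))  # U+2000 through U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_preprocessor_whitespace(ch: str) -> bool:
    return ord(ch) in PREPROCESSOR_WHITESPACE


def is_tokenizer_whitespace(ch: str) -> bool:
    # ASCII 0x05 through 0x20 inclusive, per Special tokenization rules for names/IDs
    return 0x05 <= ord(ch) <= 0x20


def trim_leading_whitespace(line: str) -> str:
    """Per-line leading-whitespace trim using the preprocessor's definition."""
    i = 0
    while i < len(line) and is_preprocessor_whitespace(line[i]):
        i += 1
    return line[i:]


print(repr(trim_leading_whitespace('    "this was indented"')))  # '"this was indented"'
print(repr(trim_leading_whitespace("\u00a0\u1680\u180e")))       # ''
```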

In terms of practical consequences of this mismatch in whitespace characters between the preprocessor and the parser, I don’t have much. This is mostly just another entry in the general “things you would expect some consistency on” category. The only thing I was able to come up with is related to the previous The entirely undocumented concept of the ‘output’ code page section, since the trimming of whitespace-that-only-the-preprocessor-considers-to-be-whitespace means that this example will show the input/output code page desync:

<U+00A0><U+1680><U+180E>
#pragma code_page(1252)
1 RCDATA { "Ó" }

resinator‘s behavior

resinator does not currently handle this very well. There’s some support for handling U+00A0 (No-Break Space) at the start of a line in the tokenizer due to a previously incomplete understanding of this bug/quirk, but I’m currently in the process of figuring out how this should best be handled.

parser bug/quirk, utterly baffling

String literals that are forced to be ‘wide’

There are two types of string literals in .rc files. For lack of better terminology, I’m going to call them normal ("foo") and wide (L"foo", note the L prefix). In the context of raw data blocks, this difference is meaningful with regards to the compiled result, since normal string literals are encoded using the current output code page (see The entirely undocumented concept of the ‘output’ code page), while wide string literals are encoded as UTF-16:

1 RCDATA {
  "foo",  ────►  66 6F 6F  foo
  L"foo"  ────►  66 00 6F 00 6F 00  f.o.o.
}

However, in other contexts, the result is always encoded as UTF-16, and, in that case, there are some special (and strange) rules for how strings are parsed/handled. The full list of contexts in which this occurs is not super relevant (see the usages of parseQuotedStringAsWideString in resinator if you’re curious), so we’ll focus on just one: STRINGTABLE strings. Within a STRINGTABLE, both "foo" and L"foo" will get compiled to the same result (encoded as UTF-16):

STRINGTABLE {
  1 "foo"   ────►  66 00 6F 00 6F 00  f.o.o.
  2 L"foo"  ────►  66 00 6F 00 6F 00  f.o.o.
}

We can also ignore L-prefixed strings (wide strings) from here on out, since they aren’t actually any different in this context than anywhere else. The bug/quirk in question only manifests for “normal” strings that are parsed/compiled into UTF-16, so for the sake of clarity, I’m going to call such strings “forced-wide” strings. For all strings except “forced-wide” strings, integer escape sequences (e.g. \x80 [hexadecimal] or \123 [octal]) are handled as you might expect: the number they encode is directly emitted, so e.g. the sequence \x80 always gets compiled into the integer value 0x80, and then written as either a u8 or a u16 as seen here:

1 RCDATA {
  "\x80",    ────►  80
  L"\x80"    ────►  80 00
}

STRINGTABLE {
  1 L"\x80"  ────►  80 00
}

However, for “forced-wide” strings, this is not the case:

STRINGTABLE {
  1 "\x80"  ────►  AC 20
}

Why is the result AC 20? Well, for these “forced-wide” strings, the escape sequence is parsed, then that value is re-interpreted using the current code page, and then the resulting codepoint is written as UTF-16. In the above example, the current code page is Windows-1252 (the default), so this is what’s going on:

  • \x80 parsed into an integer is 0x80
  • 0x80 interpreted as Windows-1252 is €
  • € has the codepoint value U+20AC
  • U+20AC encoded as little-endian UTF-16 is AC 20

This means that if we use a different code page, then the compiled result will also be different. If we use rc.exe /c65001 to set the code page to UTF-8, then this is what we get:

STRINGTABLE {
  1 "\x80"  ────►  FD FF
}

FD FF is the little-endian UTF-16 encoding of the codepoint U+FFFD (� aka the Replacement Character). The explanation for this result is a bit more involved, so let’s take a brief detour…

It is possible for string literals within .rc files to contain byte sequences that are considered invalid within their code page. The easiest way to demonstrate this is with UTF-8, where there are many ways to produce invalid sequences. One such way is just to use a byte that can never be part of a valid UTF-8 sequence, like <0xFF>. If we do so, this is the result:

1 RCDATA {
  "<0xFF>",  ────►  EF BF BD
  L"<0xFF>"  ────►  FD FF
}

Compiled using the UTF-8 code page via rc.exe /c65001

EF BF BD is U+FFFD (�) encoded as UTF-8, and (as mentioned before) FD FF is the little-endian UTF-16 encoding of the same codepoint. So, when encountering an invalid sequence within a string literal, the Windows RC compiler converts it to the Unicode Replacement Character and then encodes that using whatever encoding should be emitted in that context.
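In a raw data block, the normal-versus-wide distinction and the replacement of invalid sequences combine as in the following Python sketch. For simplicity it assumes one code page for both input and output (so it ignores the input/output desync), uses Python’s codecs as stand-ins for the Windows code pages, and the function name is hypothetical.

```python
def compile_raw_data_string(source_bytes: bytes, code_page: str, wide: bool) -> bytes:
    """Compile a string literal inside a raw data block.

    Invalid sequences in `source_bytes` become U+FFFD; the text is then
    written as UTF-16LE for L"..." literals, or in the code page otherwise."""
    text = source_bytes.decode(code_page, errors="replace")
    return text.encode("utf-16-le") if wide else text.encode(code_page)


for literal, wide in ((b"foo", False), (b"foo", True), (b"\xff", False), (b"\xff", True)):
    print(compile_raw_data_string(literal, "utf-8", wide).hex(" "))
# 66 6f 6f
# 66 00 6f 00 6f 00
# ef bf bd
# fd ff
```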

Okay, so getting back to the bug/quirk at hand, we now know that invalid sequences are converted to �, which is encoded as FD FF. We also know that FD FF is what we get after compiling the escaped integer \x80 within a “forced-wide” string when using the UTF-8 code page. Further, we know that escaped integers in “forced-wide” strings are re-interpreted using the current code page.

In UTF-8, the byte value 0x80 is a continuation byte, so it makes sense that, when re-interpreted as UTF-8, it is considered an invalid sequence. However, that’s actually irrelevant; parsed integer sequences seem to be re-interpreted in isolation, so any value between 0x80 and 0xFF is treated as an invalid sequence, as those values can only be valid within a multi-byte UTF-8 sequence. This can be confirmed by trying to produce a valid multi-byte UTF-8 sequence using an integer escape as at least one of the bytes, and seeing nothing but � in the result:

STRINGTABLE {
  1 "\xE2\x82\xAC"      ────►  FD FF FD FF FD FF
  2 "\xE2<0x82><0xAC>"  ────►  FD FF FD FF FD FF
}

E2 82 AC is the UTF-8 encoding of € (U+20AC)
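The re-interpretation steps can be sketched in Python as follows, with each escaped value decoded on its own. forced_wide_escape is a hypothetical name, and the cp1252 fallback is an assumption: Python’s cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so the sketch maps those to the same codepoint, which is consistent with them being absent from the Windows-1252 list of impossible values given in the Why? section below.

```python
def forced_wide_escape(value: int, code_page: str) -> bytes:
    """Model a parsed integer escape inside a "forced-wide" string: the value
    is treated as one byte, re-decoded with the code page in isolation, and
    the resulting codepoint is written as UTF-16LE."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a u8 value")
    try:
        ch = bytes([value]).decode(code_page)
    except UnicodeDecodeError:
        if code_page == "cp1252":
            # Assumption: the five bytes undefined in Python's cp1252
            # keep their own value as the codepoint
            ch = chr(value)
        else:
            # e.g. a lone 0x80-0xFF byte is an invalid UTF-8 sequence
            ch = "\ufffd"
    return ch.encode("utf-16-le")


for cp_name in ("cp1252", "utf-8"):
    print(cp_name, forced_wide_escape(0x80, cp_name).hex(" "))
# cp1252 ac 20
# utf-8 fd ff

# "\xE2\x82\xAC" under UTF-8: three separate invalid sequences, not one €
print(b"".join(forced_wide_escape(v, "utf-8") for v in (0xE2, 0x82, 0xAC)).hex(" "))
# fd ff fd ff fd ff
```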

An extra wrinkle comes when dealing with octal escapes. 0xFF in octal is 0o377, which means that octal escape sequences need to accept 3 digits in order to specify all possible values of a u8. However, this also means that octal escape sequences can encode values above the maximum u8 value, e.g. \777 (the maximum escaped octal integer) represents the value 511 in decimal or 0x1FF in hexadecimal. The Windows RC compiler handles this by truncating the value down to a u8, so e.g. \777 gets parsed into 0x1FF but then gets truncated down to 0xFF before going through the steps mentioned before.

Here’s an example where three different escaped integers end up compiling down to the same result, with the last one only being equivalent after truncation:

STRINGTABLE {
  1 "\x80"  ────► 0x80 ─►  ─► AC 20
  2 "\200"  ────► 0x80 ─►  ─► AC 20
  3 "\600"  ────► 0x180 ─► 0x80 ─►  ─► AC 20
}

Compiled using the Windows-1252 code page, so 0x80 is re-interpreted as € (U+20AC)
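The truncation step can be sketched in Python as below. parse_escape is a hypothetical helper that assumes the escape text has already been isolated, so the digit-count limits of real .rc escapes are not modeled; the loop then re-interprets each truncated byte as Windows-1252 and writes it as UTF-16LE, matching the table above.

```python
def parse_escape(escape: str) -> int:
    """Parse an already-isolated \\xHH or \\ooo escape and truncate it to a u8."""
    if escape.startswith("\\x"):
        value = int(escape[2:], 16)
    else:
        value = int(escape[1:], 8)
    return value & 0xFF  # e.g. \600 == 0x180 -> 0x80, \777 == 0x1FF -> 0xFF


for esc in (r"\x80", r"\200", r"\600"):
    byte = parse_escape(esc)
    # Re-interpret the truncated byte as Windows-1252, then write UTF-16LE
    utf16 = bytes([byte]).decode("cp1252").encode("utf-16-le")
    print(esc, hex(byte), utf16.hex(" "))
# \x80 0x80 ac 20
# \200 0x80 ac 20
# \600 0x80 ac 20
```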

Finally, things get a little more bizarre when combined with The entirely undocumented concept of the ‘output’ code page, as it turns out the re-interpretation of the escaped integers in “forced-wide” strings actually uses the output code page, not the input code page.

Why?

This one is truly baffling to me. If this behavior is intentional, I don’t understand the use-case at all. It effectively means that it’s impossible to use escaped integers to specify certain values, and it also means that which values those are depends on the current code page. For example, if the code page is Windows-1252, it’s impossible to use escaped integers for the values 0x80, 0x82-0x8C, 0x8E, 0x91-0x9C, and 0x9E-0x9F (each of these is mapped to a codepoint with a different value). If the code page is UTF-8, then it’s impossible to use escaped integers for any of the values from 0x80-0xFF (all of these are treated as part of an invalid UTF-8 sequence and converted to �). This limitation seemingly defeats the entire purpose of escaped integer sequences.

This leads me to think this is a bug, and even then, it’s a very strange bug. There is absolutely no reason I can think of for the result of a parsed integer escape to be accidentally re-interpreted as if it were encoded in the current code page.

resinator‘s behavior

resinator currently matches the behavior of the Windows RC compiler exactly for “forced-wide” strings. However, using an escaped integer in a “forced-wide” string is expected to become a warning in the future.

utterly baffling, miscompilation

Codepoint misbehavior/miscompilation

There are a few different ASCII control characters/Unicode codepoints that cause strange behavior in the Windows RC compiler if they are put in certain places in a .rc file. Each case is different enough that they might warrant their own sections, but I’m just going to lump them together into one section here.

U+0000 Null

The Windows RC compiler behaves very strangely when embedded NUL (<0x00>) characters are in a .rc file. Some examples with regards to string literals:

will error with unexpected end of file in string literal

“succeeds” but results in an empty .res file (no RCDATA resource)

Even stranger is that the character count of the file seems to matter in some fashion for these examples. The first example has an odd character count, so it errors, but add one more character (or any odd number of characters; it doesn’t matter what or where they are, they can even be whitespace) and it will not error. The second example has an even character count, so adding another character (again, anywhere) would cause the unexpected end of file in string literal error.

U+0004 End of Transmission

The Windows RC compiler seemingly treats ‘End of Transmission’ (<0x04>) characters outside of string literals as a ‘skip the next character’ instruction when parsing. This means that:

1 RCDATA<0x04>! { "foo" }

gets treated as if it were:

1 RCDATA { "foo" }

while

1 RCDATA<0x04>!?! { "foo" }

gets treated as if it were:

1 RCDATA?! { "foo" }
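The ‘skip the next character’ rule can be sketched as a Python pre-pass over the source text. This only mimics the observed result; apply_eot_skips is a hypothetical name, and the string tracking is an assumption: each " toggles the in-string state, so a doubled "" inside a literal toggles twice and leaves the state unchanged.

```python
EOT = "\x04"


def apply_eot_skips(src: str) -> str:
    """Drop each <0x04> outside a string literal together with the character
    right after it; <0x04> inside a string literal is left alone."""
    out = []
    in_string = False
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == '"':
            in_string = not in_string
        elif ch == EOT and not in_string:
            i += 2  # skip the <0x04> and the character after it
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(apply_eot_skips('1 RCDATA\x04! { "foo" }'))    # 1 RCDATA { "foo" }
print(apply_eot_skips('1 RCDATA\x04!?! { "foo" }'))  # 1 RCDATA?! { "foo" }
```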

U+007F Delete

The Windows RC compiler seemingly treats ‘Delete’ (<0x7F>) characters as a terminator in some capacity. A few examples:

gets parsed as 1 RC DATA {}, leading to the compile error file not found: DATA

“succeeds” but results in an empty .res file (no RCDATA resource)

fails with unexpected end of file in string literal

U+001A Substitute

The Windows RC compiler treats ‘Substitute’ (<0x1A>) characters as an ‘end of file’ marker:

1 RCDATA {}
<0x1A>
2 RCDATA {}

Only the 1 RCDATA {} resource makes it into the .res; everything after the <0x1A> is ignored

but use of the <0x1A> character can also lead to a (presumed) infinite loop in certain scenarios, like this one:

1 MENUEX FIXED<0x1A>VERSION

U+0900, U+0A00, U+0A0D, U+0D00, U+2000

The Windows RC compiler will error on and/or ignore these codepoints when they are used outside of string literals, but not always. When they are used within string literals, the Windows RC compiler will miscompile them in some very bizarre ways.

1 RCDATA { "ऀ਀਍ഀ " }

The expected result is for the resource’s data to contain the UTF-8 encoding of each codepoint, one after another, but that is not at all what we get:

Expected bytes: E0 A4 80 E0 A8 80 E0 A8 8D E0 B4 80 E2 80 80

  Actual bytes: 09 20 0A 20 0A 20

These are effectively the conversions that are being made in this case:

<U+0900>  ────►  09
<U+0A00>  ────►  20 0A
<U+0A0D>  ────►  20 0A
<U+0D00>  ────►  <omitted entirely>
<U+2000>  ────►  20

It turns out that all the codepoints have been turned into some combination of whitespace characters: <0x09> is \t, <0x20> is <space>, and <0x0A> is \n. My guess as to what’s going on here is that there’s some whitespace detection code going seriously haywire, in combination with some sort of endianness heuristic. If we run the example through the preprocessor only (rc.exe /p /c65001 test.rc), we can see that things have already gone wrong (note: I’ve made some whitespace characters visible):

#line 1 "test.rc"
1 RCDATA { "────

·" }

There are quite a few bugs/quirks interacting here, so I’ll do my best to explain.

As detailed in The Windows RC compiler ‘speaks’ UTF-16, the preprocessor always outputs UTF-16, which means that the preprocessor will interpret the bytes of the file using the current code page and then write them back out as UTF-16. So, with that in mind, let’s think about U+0900, which erroneously gets converted to the character <0x09> (\t):

  • In the .rc file, U+0900 is encoded as UTF-8, meaning the bytes in the file are E0 A4 80
  • The preprocessor will decode those bytes into the codepoint 0x0900 (since we set the code page to UTF-8)

While integer endianness is irrelevant for UTF-8, it is relevant for UTF-16, since a code unit (u16) is 2 bytes wide. It seems possible that, because the Windows RC compiler is so UTF-16-centric, it has some heuristic to infer the endianness of a file, and that heuristic is being triggered for certain whitespace characters. That is, it might be that the Windows RC compiler sees the decoded 0x0900 codepoint, thinks it might be a byteswapped 0x0009, and therefore treats it as 0x0009 (which is a tab character).

This sort of thing would explain some of the changes we see in the preprocessed file:

  • U+0900 could be mistaken for a byteswapped <0x09> (\t)
  • U+0A00 could be mistaken for a byteswapped <0x0A> (\n)
  • U+2000 could be mistaken for a byteswapped <0x20> (<space>)

For U+0A0D and U+0D00, we need another piece of information: carriage returns (<0x0D>, \r) are completely ignored by the preprocessor (i.e. RC<0x0D>DATA gets interpreted as RCDATA). With this in mind:

  • U+0A0D, ignoring the 0D part, could be perplexd for a byteswapped <0x0A> (\n)
  • U+0D00 could be perplexd for a byteswapped <0x0D> (\r), and therefore is neglectd
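As a rough illustration of this theory, here is a small Python sketch that byteswaps each 16-bit codepoint and names the whitespace character that results. It is a hypothetical model, not rc.exe’s actual logic (which is closed source); the helper names `byteswap16` and `misread_as` are made up, and the rule “drop a 0x0D high byte” is my own simplification of “ignoring the 0D part”.

```python
from typing import Optional

# Hypothetical model of the guessed rc.exe behavior; not its real code.
WHITESPACE = {0x09: "\\t", 0x0A: "\\n", 0x20: "<space>"}


def byteswap16(cp: int) -> int:
    """Swap the two bytes of a 16-bit value, e.g. 0x0900 -> 0x0009."""
    return ((cp & 0x00FF) << 8) | (cp >> 8)


def misread_as(cp: int) -> Optional[str]:
    """Return what a byteswapped reading of `cp` would look like, or None
    if the result is a lone carriage return (which gets ignored)."""
    swapped = byteswap16(cp)
    # Carriage returns are ignored by the preprocessor, so a 0x0D in the
    # high byte (e.g. 0x0D0A) is treated as if it were not there.
    if swapped >> 8 == 0x0D:
        swapped &= 0x00FF
    if swapped == 0x0D:
        return None
    return WHITESPACE.get(swapped, f"U+{swapped:04X}")


for cp in (0x0900, 0x0A00, 0x0A0D, 0x0D00, 0x2000):
    print(f"U+{cp:04X} -> {misread_as(cp)}")
```

Running it prints \t, \n, \n, None and <space> for the five codepoints, which lines up with the two lists above.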

Now that we have a theory about what might be going wrong in the preprocessor, we can check the preprocessed version of the example:

#line 1 "test.rc"
1 RCDATA { "────

·" }

From Multiline strings don’t behave as expected/documented, we know that this string literal, contrary to the documentation, is an accepted multiline string literal, and we also know that whitespace in these undocumented string literals is typically collapsed, so the two newlines and the trailing space should become one 20 0A sequence. In fact, if we take the output of the preprocessor, copy it into a new file, and compile that, we get a completely different result that’s more in line with what we expect:

Compiled data: 20 20 20 20 20 0A

As detailed in The column of a tab character matters, an embedded tab character gets converted to a variable number of spaces depending on which column it’s at in the file. It just so happens that it gets converted to 4 spaces in this case, and the remaining 20 0A is the collapsed whitespace following the tab character.

However, what we actually see when compiling the 1 RCDATA { "ऀ਀਍ഀ " } example is:

09 20 0A 20 0A 20

where these transformations are occurring:

<U+0900>  ────►  09
<U+0A00>  ────►  20 0A
<U+0A0D>  ────►  20 0A
<U+0D00>  ────►  <omitted entirely>
<U+2000>  ────►  20

So it seems that something about when this bug/quirk takes place in the compiler pipeline affects how the preprocessor/compiler treats the input/output.

  • Normally, an embedded tab character will get converted to spaces during compilation, but even though the Windows RC compiler seems to think <U+0900> is an embedded tab character, it gets compiled into <0x09> rather than converted to space characters.
  • Normally, an undocumented-but-accepted multiline string literal has its whitespace collapsed, but even though the Windows RC compiler seems to think <U+0A00> and <U+0A0D> are newlines and <U+2000> is a space, it doesn’t collapse them.

So, to summarize, these codepoints likely confuse the Windows RC compiler into thinking they are whitespace, and the compiler treats them as the corresponding whitespace character in some ways, but exhibits novel behavior for them in other ways. In any case, this is a miscompilation, because these codepoints have no real relationship to the whitespace characters the Windows RC compiler mistakes them for.

U+FEFF Byte Order Mark

For the most part, the Windows RC compiler skips over <U+FEFF> (byte-order mark or BOM) everywhere, even within string literals, within names, etc. (e.g. RC<U+FEFF>DATA will compile as if it were RCDATA). However, there are edge cases where a BOM will cause cryptic and unexplained errors, like this:

#pragma code_page(65001)
1 RCDATA { 1<U+FEFF>1 }
test.rc(2) : fatal error RC1011: compiler limit : '1 }
': macro definition too big

U+E000 Private Use Character

This behaves similarly to the byte-order mark (it gets skipped/ignored wherever it is), although <U+E000> seems to avoid causing errors like the BOM does.

U+FFFE, U+FFFF Noncharacter

The behavior of these codepoints on their own is strange, but it’s not the most interesting part about them, so it’s up to you if you want to expand this:

Behavior of U+FFFE and U+FFFF on their own
Expected bytes: EF BF BE

  Actual bytes: EF BF BD EF BF BD (UTF-8 encoding of �, twice)

U+FFFF behaves the same way.

Expected bytes: FE FF

  Actual bytes: FD FF FD FF (UTF-16 LE encoding of �, twice)

U+FFFF behaves the same way.

#pragma code_page(65001)
1 RCDATA { "<U+FFFE>" }

Expected bytes: 3F

  Actual bytes: FE FF

U+FFFF behaves the same way, but would get compiled to FF FF.

#pragma code_page(65001)
1 RCDATA { L"<U+FFFE>" }

Expected bytes: FE FF

  Actual bytes: FE 00 FF 00

U+FFFF behaves the same way, but would get compiled to FF 00 FF 00.
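For reference, the byte sequences quoted in this section can be reproduced with Python’s built-in codecs. This only checks the encodings themselves, not rc.exe’s behavior, and `hexbytes` is a helper invented for this snippet.

```python
def hexbytes(data: bytes) -> str:
    """Format bytes the way this article does, e.g. bytes([0xEF, 0xBF, 0xBD]) -> 'EF BF BD'."""
    return data.hex(" ").upper()


# U+FFFD is the replacement character (�); U+FFFE and U+FFFF are noncharacters.
for cp in (0xFFFE, 0xFFFF, 0xFFFD):
    ch = chr(cp)
    utf8 = hexbytes(ch.encode("utf-8"))
    utf16le = hexbytes(ch.encode("utf-16-le"))
    print(f"U+{cp:04X}: UTF-8 {utf8:<9} UTF-16LE {utf16le}")
```

So EF BF BD EF BF BD is two U+FFFD characters in UTF-8, and FD FF FD FF is the same two characters in UTF-16LE.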

The interesting part about U+FFFE and U+FFFF is that their presence affects how every non-ASCII codepoint in the file is interpreted/compiled. That is, if either one appears anywhere in a file, it affects the interpretation of the entire file. Let’s start with this example and try to understand what might be happening with the characters in the RCD䄀T䄀 resource type:

1 RCD䄀T䄀 { "<U+FFFE>" }

If we run this through the preprocessor only (rc /c65001 /p test.rc), then it ends up as:

1 RCDATA { "��" }

The interpretation of the <U+FFFE> codepoint itself is the same as described above, but we can also see that the following transformation is occurring for the 䄀 codepoint:

<U+4100> (䄀)  ────►  <U+0041> (A)

And this transformation is not an illusion. If you compile this example .rc file, it will get compiled as the predefined RCDATA resource type. So, what’s going on here?

Let’s back up a bit and talk in more detail about UTF-16 and endianness. Since UTF-16 uses 2 bytes per code unit, it can be encoded either as little-endian (least-significant byte first) or big-endian (most-significant byte first).

Codepoint   Little-endian   Big-endian
<U+0041>    41 00           00 41
<U+ABCD>    CD AB           AB CD
<U+4100>    00 41           41 00

In many cases, the endianness of the encoding can be inferred, but in order to make it unambiguous, a byte-order mark (BOM) can be included (usually at the start of a file). The codepoint of the BOM is U+FEFF, so it is encoded either as FF FE for little-endian or FE FF for big-endian.
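The same codepoints can be encoded both ways with Python’s `utf-16-le` and `utf-16-be` codecs, and Python’s plain `utf-16` decoder shows a BOM being used to pick the byte order. `utf16_hex` is a helper name made up for this example.

```python
def utf16_hex(text: str, byteorder: str) -> str:
    """Encode `text` as UTF-16 with the given byte order ('le' or 'be')
    and return the bytes as uppercase hex pairs."""
    return text.encode(f"utf-16-{byteorder}").hex(" ").upper()


for ch in ("\u0041", "\uABCD", "\u4100"):
    print(f"U+{ord(ch):04X}  LE: {utf16_hex(ch, 'le')}  BE: {utf16_hex(ch, 'be')}")

# The BOM (U+FEFF) is FF FE in little-endian and FE FF in big-endian.
# Python's generic "utf-16" decoder uses it to choose the byte order
# and strips it from the decoded result.
assert b"\xff\xfe\x41\x00".decode("utf-16") == "A"
assert b"\xfe\xff\x00\x41".decode("utf-16") == "A"
```

Note how U+0041 and U+4100 trade encodings between the two byte orders, which is exactly the ambiguity at play with RCD䄀T䄀.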

With this in mind, consider how one might handle a big-endian UTF-16 byte-order mark in a file when starting with the assumption that the file is little-endian.

Big-endian UTF-16 encoded byte-order mark: FE FF

Decoded codepoint, assuming little-endian: <U+FFFE>

So, starting with the assumption that a file is little-endian, treating the decoded codepoint <U+FFFE> as a trigger for switching to interpreting the file as big-endian can make sense. However, it only makes sense when you are working with an encoding where endianness matters (e.g. UTF-16 or UTF-32). It appears, though, that the Windows RC compiler is using this “<U+FFFE>? Oh, the file is big-endian and I should byteswap every codepoint” heuristic even when it’s dealing with UTF-8, which doesn’t make any sense: endianness is irrelevant for UTF-8, since its code units are a single byte.
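Here is a minimal Python sketch of that “assume little-endian, switch on U+FFFE” idea applied to genuine UTF-16 input, the case where it actually makes sense. `decode_utf16_assume_le` is a hypothetical function for illustration, not code from rc.exe or resinator.

```python
import struct


def byteswap16(unit: int) -> int:
    return ((unit & 0x00FF) << 8) | (unit >> 8)


def decode_utf16_assume_le(data: bytes) -> str:
    """Decode UTF-16 bytes, assuming little-endian unless the first code
    unit reads as U+FFFE (a byteswapped BOM), in which case every code
    unit is byteswapped, i.e. the data is treated as big-endian."""
    count = len(data) // 2  # an odd trailing byte is ignored in this sketch
    units = list(struct.unpack(f"<{count}H", data[: count * 2]))
    if units and units[0] == 0xFFFE:
        units = [byteswap16(u) for u in units]
    if units and units[0] == 0xFEFF:
        units = units[1:]  # the BOM itself is not part of the text
    return struct.pack(f"<{len(units)}H", *units).decode("utf-16-le")


big_endian_bom = "\ufeff".encode("utf-16-be")          # FE FF
assert big_endian_bom.decode("utf-16-le") == "\ufffe"  # read as LE: U+FFFE
print(decode_utf16_assume_le(big_endian_bom + "RC".encode("utf-16-be")))  # RC
```

With UTF-8 there is nothing for this trick to act on: each code unit is one byte, and U+FFFE is simply the three-byte sequence EF BF BE, which says nothing about byte order.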

As alluded to in U+0900, U+0A00, etc, this endianness handling is likely happening in the wrong phase of the compiler pipeline; it’s acting on already-decoded codepoints rather than affecting how the bytes of the file are decoded.

If I had to guess as to what’s going on here, it would be something like:

  • The preprocessor decodes all codepoints, and internally assumes little-endian in some form
  • If the preprocessor ever encounters the decoded codepoint <U+FFFE>, it assumes it must be a byteswapped byte-order mark, indicating that the file is encoded as big-endian, and sets some internal ‘big-endian’ flag
  • When writing the result after preprocessing, that ‘big-endian’ flag is used to determine whether or not to byteswap every codepoint in the file before writing it (except ASCII codepoints, for some reason)
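Expressed as code, that guess might look something like the following Python sketch. It operates on already-decoded codepoints, as the preprocessor appears to; `preprocess_output` is a hypothetical name, and the model covers only the byteswapping, not the separate way U+FFFE itself turns into two U+FFFD characters.

```python
def byteswap16(cp: int) -> int:
    return ((cp & 0x00FF) << 8) | (cp >> 8)


def preprocess_output(codepoints: str) -> str:
    """Hypothetical model of the guessed rc.exe preprocessor behavior."""
    # The 'big-endian' flag depends on the whole file, so it is known
    # before anything is written: characters *before* U+FFFE are affected too.
    big_endian = "\ufffe" in codepoints
    if not big_endian:
        return codepoints
    out = []
    for ch in codepoints:
        cp = ord(ch)
        if cp < 0x80 or cp > 0xFFFF:
            # ASCII is left alone (for whatever reason); codepoints beyond
            # 16 bits are outside this simple model and also left alone.
            out.append(ch)
        else:
            out.append(chr(byteswap16(cp)))
    return "".join(out)


source = '1 RCD\u4100T\u4100 { "\ufffe" }'
print(preprocess_output(source).split(" {")[0])  # 1 RCDATA
```

Under this model, U+4100 becomes U+0041 (A) only because a U+FFFE appears later in the file; without it, the input passes through untouched.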

This would explain the behavior we saw earlier, where this .rc file:

1 RCD䄀T䄀 { "<U+FFFE>" }

gets preprocessed into:

1 RCDATA { "��" }

which means the following (byteswapping) transformation occurred, even to the characters preceding the <U+FFFE>:

<U+4100> (䄀)  ────►  <U+0041> (A)
Wait, what about U+FFFF?

U+FFFF works the exact same way as U+FFFE (it, too, causes all non-ASCII codepoints in the file to be byteswapped), and I have no clue as to why that would be, since U+FFFF has no apparent relationship to a BOM. My only guess is an errant >= 0xFFFE check on a u16 value.
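To make that guess concrete: on a 16-bit value, a >= 0xFFFE comparison can only ever match two values. The function names below are hypothetical; nothing here comes from rc.exe’s source.

```python
def is_swapped_bom_intended(unit: int) -> bool:
    return unit == 0xFFFE


def is_swapped_bom_errant(unit: int) -> bool:
    # Guessed bug: ">=" where "==" was meant. Because a u16 tops out at
    # 0xFFFF, this accepts exactly 0xFFFE and 0xFFFF.
    return unit >= 0xFFFE


u16_values = range(0x10000)
print([hex(u) for u in u16_values if is_swapped_bom_intended(u)])  # ['0xfffe']
print([hex(u) for u in u16_values if is_swapped_bom_errant(u)])    # ['0xfffe', '0xffff']
```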

resinator’s behavior

Any codepoints that cause misbehaviors are either a compile error:

test.rc:1:9: error: character '\x04' is not allowed outside of string literals
1 RCDATA�!?! { "foo" }
        ^
test.rc:1:1: error: character '\x7F' is not allowed
�1 RCDATA {}
^

or the miscompilation is avoided and a warning is emitted:

test.rc:1:12: warning: codepoint U+0900 within a string literal would be miscompiled by the Win32 RC compiler (it would get treated as U+0009)
1 RCDATA { "ऀ਀਍ഀ " }
           ^~~~~~~
test.rc:1:12: warning: codepoint U+FFFF within a string literal would cause the entire file to be miscompiled by the Win32 RC compiler
1 RCDATA { "" }
           ^~~
test.rc:1:12: note: the presence of this codepoint causes all non-ASCII codepoints to be byteswapped by the Win32 RC preprocessor
preprocessor bug/quirk

The sad state of the lonely forward slash

If a line consists of nothing but a / character, then the / is ignored entirely (note: the line can have any amount of whitespace preceding the /, but nothing after the /). The following example compiles just fine:

/
1 RCDATA {
  /
  /
}
/

and is effectively equivalent to

1 RCDATA {}

This seems to be a bug/quirk of the preprocessor of rc.exe; if we use rc.exe /p to only run the preprocessor, we see this output:


1 RCDATA {


}


It is very likely that this is a bug/quirk in the code responsible for parsing and removing comments. In fact, it’s pretty easy to understand how such a bug could come about if we think about a state machine that parses and removes comments. In such a state machine, once you see a / character, there are three relevant possibilities:

  • It is not part of a comment, in which case it should be emitted
  • It is the start of a line comment (//)
  • It is the start of a multiline comment (/*)

So, for a parser that removes comments, it makes sense to hold off on emitting the / until we determine whether or not it’s part of a comment. My guess is that the in-between state is not being handled fully correctly, and so instead of emitting the / when it is followed immediately by a line break, it is accidentally being treated as if it is part of a comment.
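Here is a minimal Python sketch of such a comment-stripping state machine, with a `buggy` switch that models the guessed mistake. The `State` names and the `strip_comments` function are hypothetical; string literals are not handled, comments are simply deleted, and the `buggy` switch drops any / directly followed by a newline, which may be broader than the real quirk (rc.exe seems to drop only a / that is alone on its line).

```python
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()
    SLASH = auto()          # saw '/', not yet sure whether it starts a comment
    LINE_COMMENT = auto()   # inside // ...
    BLOCK_COMMENT = auto()  # inside /* ... */
    BLOCK_STAR = auto()     # saw '*' inside a block comment


def strip_comments(src: str, buggy: bool = False) -> str:
    out = []
    state = State.NORMAL
    for c in src:
        if state is State.NORMAL:
            if c == "/":
                state = State.SLASH  # hold off on emitting the '/'
            else:
                out.append(c)
        elif state is State.SLASH:
            if c == "/":
                state = State.LINE_COMMENT
            elif c == "*":
                state = State.BLOCK_COMMENT
            else:
                # Not a comment after all, so emit the held-back '/'.
                # Guessed rc.exe bug: a '/' followed by a newline is dropped.
                if not (buggy and c == "\n"):
                    out.append("/")
                out.append(c)
                state = State.NORMAL
        elif state is State.LINE_COMMENT:
            if c == "\n":
                out.append(c)  # the newline ending the comment is kept
                state = State.NORMAL
        elif state is State.BLOCK_COMMENT:
            if c == "*":
                state = State.BLOCK_STAR
        elif state is State.BLOCK_STAR:
            if c == "/":
                state = State.NORMAL
            elif c != "*":
                state = State.BLOCK_COMMENT
    if state is State.SLASH:
        out.append("/")  # a '/' at the very end of input is not a comment
    return "".join(out)
```

For example, strip_comments("/\n1 RCDATA {}\n") keeps the /, while passing buggy=True drops it, mirroring the preprocessor output shown above.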

resinator’s behavior

resinator does not currently try to emulate the behavior of the Windows RC compiler, so / is treated as any other character would be and the file is parsed accordingly. In the case of the above example, it ends up erroring with:

test.rc:6:2: error: expected quoted string literal or unquoted literal; got '<eof>'
/
 ^

What resinator should do in this instance is an open question.

Conclusion

Well, that’s all I’ve got. There are a few things I left out because they are too insignificant, because I have forgotten about some weird behavior I added support for at some point, or because I’m not (yet) aware of some bugs/quirks of the Windows RC compiler. If you got this far, thanks for reading. Like resinator itself, this ended up taking a lot more effort than I initially anticipated.

If there’s anything to take away from this article, I hope it’s something about the usefulness of fuzzing (or adjacent techniques) in exposing obscure bugs/behaviors. If you have written software that lends itself to fuzz testing in any way, I highly encourage you to consider trying it out. On resinator’s end, there’s still a lot left to investigate in terms of fuzz testing. I’m not fully satisfied with my current approach, and there are aspects of resinator that I know are not being properly fuzz tested yet.

I’ve just released an initial version of resinator as a standalone program if you’d like to try it out. If you’re a Zig user, see this post for details on how to use the version of resinator included in the Zig compiler. My next step will be adding support for converting .res files to COFF object files so that Zig can use its self-hosted linker for Windows resources. As always, I’m expecting this COFF object file stuff to be pretty straightforward to implement, but precedent is definitely not in my favor when it comes to that assumption holding.




