| ucpp-1.2 is a C preprocessor mostly compliant to ISO-C99. |
ucpp-1.3 is a C preprocessor compliant to ISO-C99. |
| |
|
| Author: Thomas Pornin <pornin@bolet.org> |
Author: Thomas Pornin <pornin@bolet.org> |
| Main site: http://www.di.ens.fr/~pornin/ucpp/ |
Main site: http://pornin.nerim.net/ucpp/ |
| |
|
| |
|
| |
|
| replacement, conditional compilation and inclusion of header files. |
replacement, conditional compilation and inclusion of header files. |
| It is often found as a stand-alone program on Unix systems. |
It is often found as a stand-alone program on Unix systems. |
| |
|
| Ucpp is such a preprocessor; it is designed to be quick and light, |
ucpp is such a preprocessor; it is designed to be quick and light, |
| but anyway fully compliant to the ISO standard 9899:1999, also known |
but anyway fully compliant to the ISO standard 9899:1999, also known |
| as C99. Ucpp can be compiled as a stand-alone program, or linked to |
as C99. ucpp can be compiled as a stand-alone program, or linked to |
| some other code; in the latter case, ucpp will output tokens, one |
some other code; in the latter case, ucpp will output tokens, one |
| at a time, on demand, as an integrated lexer. |
at a time, on demand, as an integrated lexer. |
| |
|
| Ucpp operates in two modes: |
ucpp operates in two modes: |
| -- lexer mode: ucpp is linked to some other code and outputs a stream of |
-- lexer mode: ucpp is linked to some other code and outputs a stream of |
| tokens (each call to the lex() function will give one token) |
tokens (each call to the lex() function will yield one token) |
| -- non-lexer mode: ucpp preprocesses text and outputs the resulting text |
-- non-lexer mode: ucpp preprocesses text and outputs the resulting text |
| on a file descriptor; if linked to some other code, the cpp() function |
to a file descriptor; if linked to some other code, the cpp() function |
| must be called repeatedly, otherwise ucpp is a stand-alone binary. |
must be called repeatedly, otherwise ucpp is a stand-alone binary. |
| |
|
| |
|
| NO_LIBC_BUF |
NO_LIBC_BUF |
| NO_UCPP_BUF |
NO_UCPP_BUF |
| Two options used to disable the two bufferings inside ucpp. Define |
Two options used to disable the two bufferings inside ucpp. Define |
| both options for maximum memory saving but you will probably want |
both options for maximum memory savings but you will probably want |
| to keep libc buffering if you want decent performance. Define none |
to keep libc buffering for decent performance. Define none on large |
| on large systems (modern 32 or 64-bit systems). |
systems (modern 32 or 64-bit systems). |
| UCPP_MMAP |
UCPP_MMAP |
| With this option, if ucpp internal buffering is active, ucpp will |
With this option, if ucpp internal buffering is active, ucpp will |
| try to mmap() the input files. This might give a slight performance |
try to mmap() the input files. This might yield a slight performance |
| improvement, but will work only on a limited set of architectures. |
improvement, but will work only on a limited set of architectures. |
| PRAGMA_TOKENIZE |
PRAGMA_TOKENIZE |
| Make ucpp generate tokenized PRAGMA tokens on #pragma and _Pragma(); |
Make ucpp generate tokenized PRAGMA tokens on #pragma and _Pragma(); |
| Do not evaluate _Pragma() inside #if, #include, #include_next and #line |
Do not evaluate _Pragma() inside #if, #include, #include_next and #line |
| directives; instead, emit an error (since the remaining _Pragma will |
directives; instead, emit an error (since the remaining _Pragma will |
| surely imply a syntax error). |
surely imply a syntax error). |
| |
DSHARP_TOKEN_MERGE |
| |
When two tokens are to be merged with the `##' operator, but fail |
| |
because they do not merge into a single valid token, ucpp keeps those |
| |
two tokens separate by adding an extra space between them in text |
| |
output. With this option on, that extra space is not added, which means |
| |
that some tokens may merge partially if the text output is preprocessed |
| |
again. See tune.h for details. |
| INMACRO_FLAG |
INMACRO_FLAG |
| In lexer mode, set the inmacro flag to 1 if the current token comes |
In lexer mode, set the inmacro flag to 1 if the current token comes |
| from a macro replacement, 0 otherwise. macro_count maintains an |
from a macro replacement, 0 otherwise. macro_count maintains an |
| Default predefined macros in stand-alone ucpp. |
Default predefined macros in stand-alone ucpp. |
| STD_ASSERT |
STD_ASSERT |
| Default assertions in stand-alone ucpp. |
Default assertions in stand-alone ucpp. |
| NATIVE_INTMAX |
NATIVE_SIGNED |
| NATIVE_UINTMAX |
NATIVE_UNSIGNED |
| SIMUL_UINTMAX |
NATIVE_UNSIGNED_BITS |
| |
NATIVE_SIGNED_MIN |
| |
NATIVE_SIGNED_MAX |
| |
SIMUL_ARITH_SUBTYPE |
| |
SIMUL_SUBTYPE_BITS |
| |
SIMUL_NUMBITS |
| WCHAR_SIGNEDNESS |
WCHAR_SIGNEDNESS |
| Those options define how #if expressions are evaluated; see the |
Those options define how #if expressions are evaluated; see the |
| cross-compilation section of this file for more info. |
cross-compilation section of this file for more info, and the |
| |
comments in tune.h. Extra info is found in arith.h and arith.c, |
| |
at the possible expense of your mental health. |
| DEFAULT_LEXER_FLAGS |
DEFAULT_LEXER_FLAGS |
| DEFAULT_CPP_FLAGS |
DEFAULT_CPP_FLAGS |
| Default flags in respectively lexer and non-lexer modes. |
Default flags in lexer and non-lexer modes, respectively.
| siglongjmp(); it is known to (very slightly) improve performance |
siglongjmp(); it is known to (very slightly) improve performance |
| on AIX systems. |
on AIX systems. |
| MAX_CHAR_VAL |
MAX_CHAR_VAL |
| Ucpp will consider characters whose value is equal or above |
ucpp will consider characters whose value is equal to or above
| MAX_CHAR_VAL as outside the C source charset (so they will be |
MAX_CHAR_VAL as outside the C source charset (so they will be |
| treated just like '@', for instance). For ASCII systems, 128 |
treated just like '@', for instance). For ASCII systems, 128 |
| is fine. 256 is a safer value, but uses more (static) memory. |
is fine. 256 is a safer value, but uses more (static) memory. |
| true content; this is intended for reconstruction of the source |
true content; this is intended for reconstruction of the source |
| line. Beware that some comments may have embedded newlines. |
line. Beware that some comments may have embedded newlines. |
| COPY_LINE_LENGTH |
COPY_LINE_LENGTH |
| Ucpp can maintain a copy of the current source line, up to that |
ucpp can maintain a copy of the current source line, up to that |
| length. Irrelevant to stand-alone version. |
length. Irrelevant to stand-alone version. |
| *_MEMG |
*_MEMG |
| Those settings modify ucpp behaviour, wrt memory allocations. With |
Those settings modify ucpp behaviour, wrt memory allocations. With |
| With this setting, ucpp will check for the return value of malloc() |
With this setting, ucpp will check for the return value of malloc() |
| and exit with a diagnostic when out of memory. MEM_CHECK is implied |
and exit with a diagnostic when out of memory. MEM_CHECK is implied |
| by AUDIT. |
by AUDIT. |
| |
-DMEM_DEBUG |
| |
Enable memory debug code. This will track memory leaks and several |
| |
occurrences of memory management errors; it will also slow down |
| |
things and increase memory consumption, so you probably do not |
| |
want to use this option. |
| -DINLINE=foobar |
-DINLINE=foobar |
| The ucpp code uses "inline" qualifier for some functions; by |
The ucpp code uses "inline" qualifier for some functions; by |
| default, that qualifier is macro-replaced with nothing. Define |
default, that qualifier is macro-replaced with nothing. Define |
| gcc believes some variables might be used prior to their |
gcc believes some variables might be used prior to their |
| initialization; ignore those messages. |
initialization; ignore those messages. |
| |
|
| 5. Install wherever you want the binary and the man page ucpp.1. |
5. Install wherever you want the binary and the man page ucpp.1. I |
| |
have not provided an install sequence because I didn't bother. |
| |
|
| 6. If you do not have the make utility, compile each file seperately |
6. If you do not have the make utility, compile each file separately |
| and link them together. The exact details depend on your compiler. |
and link them together. The exact details depend on your compiler. |
| You must define the macro STAND_ALONE when compiling cpp.c (there |
You must define the macro STAND_ALONE when compiling cpp.c (there |
| is such a definition, commented out, in cpp.c, line 34). |
is such a definition, commented out, in cpp.c, line 34). |
| |
|
| There is no "configure" script since: |
There is no "configure" script because: |
| -- I do not like the very idea of a "configure" script. |
-- I do not like the very idea of a "configure" script. |
| -- Ucpp is written in ANSI-C and should be fairly portable. |
-- ucpp is written in ANSI-C and should be fairly portable. |
| -- There is no such thing as "standard" settings for a C preprocessor. |
-- There is no such thing as "standard" settings for a C preprocessor. |
| The predefined system macros, standard assertions,... must be tuned |
The predefined system macros, standard assertions,... must be tuned |
| by the sysadmin. |
by the sysadmin. |
| not C99 (or later), read the cross-compilation section in this README |
not C99 (or later), read the cross-compilation section in this README |
| file. |
file. |
| |
|
| The C90 and C99 standards state that external linkage names might |
The C90 and C99 standards state that external linkage names might be |
| be considered equal or different based upon only their first 6 |
considered equal or different based upon only their first 6 characters; |
| characters; this rule might make ucpp not to compile on a conformant C |
this rule might make ucpp not compile on a conformant C implementation. |
| implementation. I have yet to see such an implementation, however. |
I have yet to see such an implementation, however. |
| |
|
| If you want to use ucpp as an integrated preprocessor and lexer, see the |
If you want to use ucpp as an integrated preprocessor and lexer, see the |
| section REUSE. Compiling ucpp as a library is an exercise left to the |
section REUSE. Compiling ucpp as a library is an exercise left to the |
| reader. |
reader. |
| |
|
| With the LOW_MEM code enabled, ucpp can run on a Minix-86 or Msdos |
With the LOW_MEM code enabled, ucpp can run on a Minix-i86 or Msdos |
| 16-bit small-memory-model machine. It will not be fully compliant |
16-bit small-memory-model machine. It will not be fully compliant |
| on such an architecture to C99, since C99 states that at least one |
on such an architecture to C99, since C99 states that at least one |
| source code with 4095 simultaneously defined macros must be processed; |
source code with 4095 simultaneously defined macros must be processed; |
| subclause (which BSD dropped recently anyway) and with no reference to |
subclause (which BSD dropped recently anyway) and with no reference to |
| Berkeley (since the code is all mine, written from scratch). Informally, |
Berkeley (since the code is all mine, written from scratch). Informally, |
| this means that you can reuse and redistribute the code as you want, |
this means that you can reuse and redistribute the code as you want, |
| provided that you states in the documentation (or any substantial part |
provided that you state in the documentation (or any substantial part of |
| of the software) of redistributed code that I am the original author. |
the software) of redistributed code that I am the original author. (If |
| (If you press a cdrom with 200 software packages, I do not insist on |
you press a cdrom with 200 software packages, I do not insist on having |
| having my name on the cover of the cdrom -- just keep a Readme file |
my name on the cover of the cdrom -- just keep a Readme file somewhere |
| somewhere on the cdrom, with the copyright notice included.) |
on the cdrom, with the copyright notice included.) |
| |
|
| As a courteous gesture, if you reuse my code, please drop me a mail. |
As a courteous gesture, if you reuse my code, please drop me a mail. |
| It raises my self-esteem. |
It raises my self-esteem. |
| |
|
| |
|
| Afterwards: |
Afterwards: |
| |
|
| -- if you are in lexer mode, call lex(); each call will make the ctok |
-- if you are in lexer mode, call lex(); each call will make the ctok |
| field point to the next token. A non-zero return value is an error. |
field point to the next token. A non-zero return value is an error. |
| lex() skips whitespace tokens. The memory used by the string value |
lex() skips whitespace tokens. The memory used by the string value |
| ignore the error. |
ignore the error. |
| |
|
| -- otherwise, call cpp(); each call will analyze one or more tokens |
-- otherwise, call cpp(); each call will analyze one or more tokens |
| (one token if it did not find a cpp directive, or a macro name). |
(one token if it found neither a cpp directive nor a macro name).
| A positive return value is an error. |
A positive return value is an error. |
| |
|
| For both functions, if the return value is CPPERR_EOF (which is a |
For both functions, if the return value is CPPERR_EOF (which is a |
| This will add a trailing 0 if the line was not read entirely. |
This will add a trailing 0 if the line was not read entirely. |
| |
|
| |
|
| |
ucpp may be configured at runtime to accept alternate characters as |
| |
possible parts of identifiers. Typical intended usage is for the '$' |
| |
and '@' characters. The two relevant functions are set_identifier_char() |
| |
and unset_identifier_char(). When this call is issued: |
| |
set_identifier_char('$'); |
| |
then for all the remaining input, the '$' character will be considered |
| |
as just another letter, as far as identifier tokenizing is concerned. This |
| |
is for identifiers only; numeric constants are not modified by that setting. |
| |
This call resets things back: |
| |
unset_identifier_char('$'); |
| |
Those two functions modify the static table which is initialized by |
| |
init_cpp(). You may call init_cpp() at any time to restore the table |
| |
to its standard state. |
| |
|
| |
When using this feature, take care of the following points: |
| |
|
| |
-- Do NOT use a character whose numeric value (as an `unsigned char' |
| |
cast into an `int') is greater than or equal to MAX_CHAR_VAL (in tune.h). |
| |
This would lead to unpredictable results, including an abrupt crash of |
| |
ucpp. ucpp makes absolutely no check whatsoever on that matter: this is |
| |
the programmer's responsibility. |
| |
|
| |
-- If you use a standard character such as '+' or '{', tokens which |
| |
begin with those characters cease to exist. This can be troublesome. |
| |
If you use set_identifier_char() on the '<' character, the handling of |
| |
#include directives will be greatly disturbed. Therefore the use of any |
| |
standard C character in set_identifier_char() or unset_identifier_char()
| |
is declared unsupported, forbidden and altogether unwise. |
| |
|
| |
-- Stricto sensu, when an extra character is declared as part of an |
| |
identifier, ucpp behaviour ceases to conform to C99, which mandates that
| |
characters such as '$' or '@' must be treated as independent tokens of
| |
their own. Therefore, if your purpose is to use ucpp in a conformant |
| |
C implementation, the use of set_identifier_char() should be made at |
| |
least a runtime option. |
| |
|
| |
-- When enabling a new character in the middle of a macro replacement, |
| |
the effect of that replacement may be delayed up to the end of that |
| |
macro (but this is a "may" !). If you wish to trigger this feature with |
| |
a custom #pragma or _Pragma(), you should remember it (for instance, |
| |
using _Pragma() in a macro replacement, and then the extra character
| |
in the same macro replacement, is not reliable). |
| |
|
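The table-driven behaviour described above can be illustrated with a
small stand-alone sketch. This is NOT the ucpp source: every demo_*
name and DEMO_MAX_CHAR_VAL are hypothetical stand-ins, 8-bit chars are
assumed, and, unlike ucpp as described above, the sketch does check
its argument against the table size.

```c
#include <string.h>

/* Hypothetical stand-in for the MAX_CHAR_VAL setting of tune.h. */
#define DEMO_MAX_CHAR_VAL 256

/* One flag per character value: non-zero means "identifier character". */
static unsigned char demo_ident_table[DEMO_MAX_CHAR_VAL];

/* Restore the standard alphabet: letters, digits and underscore. */
static void demo_init_table(void)
{
    const char *std = "abcdefghijklmnopqrstuvwxyz"
                      "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                      "0123456789_";
    const char *p;

    memset(demo_ident_table, 0, sizeof demo_ident_table);
    for (p = std; *p != '\0'; p++)
        demo_ident_table[(unsigned char)*p] = 1;
}

static int demo_in_range(int c)
{
    return c >= 0 && c < DEMO_MAX_CHAR_VAL;
}

/* Returns 0 on success, -1 if c does not fit in the table. */
static int demo_set_identifier_char(int c)
{
    if (!demo_in_range(c))
        return -1;
    demo_ident_table[c] = 1;
    return 0;
}

/* Returns 0 on success, -1 if c does not fit in the table. */
static int demo_unset_identifier_char(int c)
{
    if (!demo_in_range(c))
        return -1;
    demo_ident_table[c] = 0;
    return 0;
}

/* Length of the identifier at the start of s, 0 if there is none.
   A leading digit never starts an identifier. */
static size_t demo_identifier_length(const char *s)
{
    size_t n = 0;

    if (s[0] >= '0' && s[0] <= '9')
        return 0;
    while (s[n] != '\0' && demo_ident_table[(unsigned char)s[n]])
        n++;
    return n;
}
```

After demo_init_table(), the text "foo$bar + 1" starts with the
identifier "foo"; once demo_set_identifier_char('$') has been called,
the same text starts with the identifier "foo$bar".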
| |
|
| |
|
| COMPATIBILITY NOTES |
COMPATIBILITY NOTES |
| ------------------- |
------------------- |
| -- Traditional C, aka "K&R". This is the language first described by |
-- Traditional C, aka "K&R". This is the language first described by |
| Brian Kernighan and Dennis Ritchie, and implemented in the first C |
Brian Kernighan and Dennis Ritchie, and implemented in the first C |
| compiler that was ever coded. There are actually several dialects of |
compiler that was ever coded. There are actually several dialects of |
| K&R, and all of them are considered as deprecated. |
K&R, and all of them are considered deprecated. |
| |
|
| -- ISO 9899:1990, aka C90, aka C89, aka ANSI-C. Formalized by ANSI |
-- ISO 9899:1990, aka C90, aka C89, aka ANSI-C. Formalized by ANSI |
| in 1989 and adopted by ISO the next year, it is the C flavour many C |
in 1989 and adopted by ISO the next year, it is the C flavour many C |
| with enhancements, clarifications and several new features. |
with enhancements, clarifications and several new features. |
| |
|
| -- ISO 9899:1999, aka C99. This is an evolution on C90, almost fully |
-- ISO 9899:1999, aka C99. This is an evolution on C90, almost fully |
| backward compatible with C90 (exhibitting a code that makes a difference |
backward compatible with C90. C99 introduces many new and useful |
| is a tricky exercise). C99 introduces many new and useful features, |
features, however, including in the preprocessor. |
| however, including in the preprocessor. |
|
| |
|
| There was also a normative addendum in 1995, that added a few features |
There was also a normative addendum in 1995, that added a few features |
| to C90 (for instance, digraphs) that are also present in C99. |
to C90 (for instance, digraphs) that are also present in C99. It is |
| |
sometimes referred to as "C95" or "AMD 1".
| |
|
| |
|
| Ucpp implements the C99 standard, but can be used in a stricter mode, |
ucpp implements the C99 standard, but can be used in a stricter mode, |
| to enforce C90 compatibility (it will, however, still recognize some |
to enforce C90 compatibility (it will, however, still recognize some |
| constructions that are not in plain C90). |
constructions that are not in plain C90). |
| |
|
| Ucpp also knows several extensions to C99: |
ucpp also knows about several extensions to C99: |
| |
|
| -- Assertions: this is an extension to the defined() operator, with |
-- Assertions: this is an extension to the defined() operator, with |
| its own namespace. Assertions seem to be used in several places, |
its own namespace. Assertions seem to be used in several places, |
| support is always active. |
support is always active. |
| |
|
| The ucpp code itself should be compatible with any ISO-C90 compiler. |
The ucpp code itself should be compatible with any ISO-C90 compiler. |
| The cpp.c file is rather big (~ 53kB), it might confuse old 16-bit C |
The cpp.c file is rather big (~ 64kB), it might confuse old 16-bit C |
| compilers; the macro.c file is somewhat large also (~ 43kB). |
compilers; the macro.c file is somewhat large also (~ 47kB). |
| |
|
| The evaluation of #if expressions is subject to some subtleties, see the |
The evaluation of #if expressions is subject to some subtleties, see the |
| section "cross-compilation". |
section "cross-compilation". |
| strict positivity is already assured by the C standard, so you just need |
strict positivity is already assured by the C standard, so you just need |
| to adjust MAX_CHAR_VAL. |
to adjust MAX_CHAR_VAL. |
| |
|
| Ucpp has been tested succesfully on ASCII/ISO-8859-1 and EBCDIC systems. |
ucpp has been tested successfully on ASCII/ISO-8859-1 and EBCDIC systems.
| Beware that UTF-8 is NOT compatible with EBCDIC. |
Beware that UTF-8 is NOT compatible with EBCDIC. |
| |
|
| Pragma handling: when used in non-lexer mode, ucpp tries to output |
Pragma handling: when used in non-lexer mode, ucpp tries to output a |
| a source text that, read again, will give the exact same stream of |
source text that, when read again, will yield the exact same stream of |
| tokens. This is not completely true with regards to line numbering in |
tokens. This is not completely true with regard to line numbering in
| some tricky macro replacements, but it should work correctly otherwise, |
some tricky macro replacements, but it should work correctly otherwise, |
| especially with pragma directives if the compile-time option PRAGMA_DUMP |
especially with pragma directives if the compile-time option PRAGMA_DUMP |
| was set: #pragma are dumped, non-void _Pragma() are converted to the |
was set: #pragma are dumped, non-void _Pragma() are converted to the |
| corresponding #pragma and dumped also. |
corresponding #pragma and dumped also. |
| |
|
| Ucpp does not macro-replace the contents of #pragma and _Pragma(); |
ucpp does not macro-replace the contents of #pragma and _Pragma().
| If you want a macro-replaced pragma, use this: |
If you want a macro-replaced pragma, use this: |
| |
|
| #define pragma_(x) _Pragma(#x) |
#define pragma_(x) _Pragma(#x) |
| inside a #pragma or another _Pragma). |
inside a #pragma or another _Pragma). |
| |
|
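The reason a wrapper is involved at all is the general C rule that an
argument used with the # operator is stringified as written, without
prior macro replacement; it is only replaced first when it goes through
one more macro level. The sketch below shows that rule in isolation.
The DEMO_* macros and demo_* functions are hypothetical names used for
illustration only; they are not part of ucpp.

```c
#include <string.h>

/* One level: the argument is stringified as written. */
#define DEMO_STR_(x)  #x
/* Two levels: the argument is macro-replaced, then stringified. */
#define DEMO_STR(x)   DEMO_STR_(x)

#define DEMO_LEVEL 3

/* Yields "opt level DEMO_LEVEL": DEMO_LEVEL is not replaced. */
static const char *demo_direct(void)
{
    return DEMO_STR_(opt level DEMO_LEVEL);
}

/* Yields "opt level 3": DEMO_LEVEL is replaced before stringification. */
static const char *demo_indirect(void)
{
    return DEMO_STR(opt level DEMO_LEVEL);
}

/* Non-zero if both forms produce the same string (they do not). */
static int demo_same(void)
{
    return strcmp(demo_direct(), demo_indirect()) == 0;
}
```

A string built the second way is what one would hand to _Pragma() when
the pragma text must contain the value of a macro rather than its name.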
| |
|
| I wrote ucpp according to what is found in "The Language C" from Brian |
I wrote ucpp according to what is found in "The C Programming Language" |
| Kernighan and Dennis Ritchie (2nd edition) and the C99 standard; but I |
from Brian Kernighan and Dennis Ritchie (2nd edition) and the C99 |
| could have misinterpreted some points. On some tricky points I got help |
standard; but I could have misinterpreted some points. On some tricky |
| from the helpful people from the comp.std.c newsgroup. For assertions |
points I got help from the helpful people from the comp.std.c newsgroup. |
| and #include_next, I mimicked the behaviour of GNU cpp, as is stated |
For assertions and #include_next, I mimicked the behaviour of GNU cpp, |
| in the GNU cpp info documentation. An open question is related to the |
as is stated in the GNU cpp info documentation. An open question is |
| following code: |
related to the following code: |
| |
|
| #define undefined ! |
#define undefined ! |
| #define makeun(x) un ## x |
#define makeun(x) un ## x |
| bar |
bar |
| #endif |
#endif |
| |
|
| Ucpp will replace 'defined foo' with 0 first (since foo is not defined), |
ucpp will replace 'defined foo' with 0 first (since foo is not defined), |
| then it will replace the macro makeun, and the expression will become |
then it will replace the macro makeun, and the expression will become |
| 'un0', which is replaced by 0 since this is a remaining identifier. The |
'un0', which is replaced by 0 since this is a remaining identifier. The |
| expression evaluates to false, and 'bar' is emitted. |
expression evaluates to false, and 'bar' is emitted. |
| behaviour). |
behaviour). |
| |
|
| |
|
| |
Another point about macro replacement has been discussed at length on
| |
several occasions. It is about the following code: |
| |
|
| |
#define CAT(a, b) CAT_(a, b) |
| |
#define CAT_(a, b) a ## b |
| |
#define AB(x, y) CAT(x, y) |
| |
CAT(A, B)(X, Y) |
| |
|
| |
ucpp will produce `CAT(X,Y)' as replacement for the last line, whereas |
| |
some other preprocessors output `XY'. The answer to the question |
| |
"which behaviour is correct" seems to be "this is not defined by the |
| |
C standard". It is the answer that has been actually given by the C |
| |
standardization committee in 1992, to the defect report #017, question |
| |
23, which asked that very same question. Since the wording of the |
| |
standard has not changed in these parts from the 1990 to the 1999 |
| |
version, the preprocessor behaviour on the above-stated code should |
| |
still be considered as undefined. |
| |
|
| |
It seems, however, that there used to be a time (around 1988) when the |
| |
committee members agreed upon a precise macro-replacement algorithm, |
| |
which specified quite clearly the preprocessor behaviour in such |
| |
situations. ucpp behaviour is occasionally claimed to be "incorrect" with
| |
regards to that algorithm. Since that macro replacement algorithm has |
| |
never been published, and the committee itself backed out from it in |
| |
1992, I decided to disregard those feeble claims. |
| |
|
| |
It is possible, however, that at some point in the future I will rewrite the
| |
ucpp macro replacement code, since that code is a bit messy and might be |
| |
made to use less memory on some occasions. It is then possible that, in
| |
the aftermath of such a rewrite, the ucpp behaviour for the above stated |
| |
code becomes tunable. Don't hold your breath, though.
| |
|
| |
|
| About _Pragma: the standard is not clear about when this operator is |
About _Pragma: the standard is not clear about when this operator is |
| evaluated, and if it is allowed inside #if directives and such. For |
evaluated, and if it is allowed inside #if directives and such. For |
| ucpp, I coded _Pragma as a special macro with lazy replacement: it will |
ucpp, I coded _Pragma as a special macro with lazy replacement: it will |
| equivalent, except that the types used are intmax_t and uintmax_t, as |
equivalent, except that the types used are intmax_t and uintmax_t, as |
| defined in <stdint.h>. |
defined in <stdint.h>. |
| |
|
| Ucpp can use two expression evaluators: one uses native integer types |
ucpp can use two expression evaluators: one uses native integer types |
| (one signed and one unsigned), the other evaluator emulates big integer |
(one signed and one unsigned), the other evaluator emulates big integer |
| numbers by representing them with two "unsigned long". By default, it |
numbers by representing them with two values of some unsigned type. The |
| will use the first evaluator, using (u)intmax_t as native types if the |
emulated type handles signed values in two's complement representation, |
| compiler is C99-compliant, or (unsigned) long otherwise. If you want |
and can be any width ranging from 2 bits to twice the size of the |
| another behaviour, modify the relevant section in tune.h. Here are |
underlying native unsigned type used. An odd width is allowed. When |
| examples of definitions: |
right shifting an emulated signed negative value, it is left-padded with |
| |
bits set to 1 (this is sign extension). |
| /* evaluate natively with type "long long" */ |
|
| #define NATIVE_UINTMAX unsigned long long |
When the ARITHMETIC_CHECKS macro is defined in tune.h, all occurrences |
| #define NATIVE_INTMAX long long |
of implementation-defined or undefined behaviour during arithmetic |
| |
evaluation are reported as errors or warned upon. This includes all |
| /* evaluate natively with type "long" (even if bigger is available) */ |
overflows and underflows on signed quantities, constants too large, |
| #define MATIVE_UINTMAX unsigned long |
and so on. Errors (which immediately terminate evaluation) are emitted
| #define MATIVE_INTMAX long |
for division by 0 (on / and % operators) and overflow (on / operator); |
| |
otherwise, warnings are emitted and the faulty evaluation takes place. |
| /* evaluate with bignum evaluation */ |
This prevents ucpp from crashing on typical x86 machines, while still |
| #undef NATIVE_UINTMAX |
allowing the use of some extensions.
| #define SIMUL_UINTMAX |
|
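The emulated arithmetic described above can be pictured with a reduced
sketch. It is NOT the code from arith.c: it stores a value in a single
uint32_t instead of two values, the width DEMO_BITS is a hypothetical
setting (5 here, to show that an odd width works; any width from 2 to
31 fits this sketch), and all demo_* names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical emulated width, from 2 to 31 in this sketch. */
#define DEMO_BITS 5
#define DEMO_MASK ((UINT32_C(1) << DEMO_BITS) - 1)
#define DEMO_SIGN (UINT32_C(1) << (DEMO_BITS - 1))

enum demo_status { DEMO_OK, DEMO_DIV_BY_ZERO, DEMO_OVERFLOW };

/* Encode a signed value as DEMO_BITS-bit two's complement. */
static uint32_t demo_encode(int32_t v)
{
    return (uint32_t)v & DEMO_MASK;
}

/* Decode a DEMO_BITS-bit two's complement pattern. */
static int32_t demo_decode(uint32_t u)
{
    u &= DEMO_MASK;
    if (u & DEMO_SIGN)
        return -(int32_t)(DEMO_MASK - u) - 1;
    return (int32_t)u;
}

/* Right shift with sign extension: a negative value is left-padded
   with bits set to 1. */
static uint32_t demo_shr(uint32_t u, unsigned n)
{
    uint32_t r;

    u &= DEMO_MASK;
    if (n >= DEMO_BITS)
        return (u & DEMO_SIGN) ? DEMO_MASK : 0;
    r = u >> n;
    if (u & DEMO_SIGN)
        r |= DEMO_MASK & ~(DEMO_MASK >> n);
    return r;
}

/* Checked division: division by 0 and the single overflowing case
   (minimum value divided by -1) are reported instead of evaluated. */
static enum demo_status demo_div(uint32_t a, uint32_t b, uint32_t *out)
{
    int32_t x = demo_decode(a);
    int32_t y = demo_decode(b);

    if (y == 0)
        return DEMO_DIV_BY_ZERO;
    if ((a & DEMO_MASK) == DEMO_SIGN && y == -1)
        return DEMO_OVERFLOW;
    *out = demo_encode(x / y);
    return DEMO_OK;
}
```

With a 5-bit width the representable range is -16 to 15, so shifting -4
right by one yields -2, and dividing -16 by -1 is reported as an
overflow rather than computed.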
| |
|
| The bignum evaluation handles signed integers in two's complement |
|
| representation, whether this is the native integer representation or |
|
| not. The code makes the non-standard assumption that unsigned long are |
|
| represented unpadded in memory, that is, unsigned long are made up of |
|
| exactly sizeof(unsigned long) * CHAR_BIT bits. I have never heard of any |
|
| architecture where this assumption would be false. |
|
| |
|
| |
|
| |
|
| FUTURE EVOLUTIONS |
FUTURE EVOLUTIONS |
| ----------------- |
----------------- |
| |
|
| Ucpp is quite complete now. There was a longstanding project of |
ucpp is quite complete now. There was a longstanding project of |
| "traditional" preprocessing, but I dropped it because it would not |
"traditional" preprocessing, but I dropped it because it would not |
| map cleanly on the token-based ucpp structure. Maybe I will code a |
map cleanly on the token-based ucpp structure. Maybe I will code a |
| string-based preprocessor one day; it would certainly use some of the |
string-based preprocessor one day; it would certainly use some of the |
| code from lexer.c, eval.c, mem.c and hash.c. However, making such a tool |
code from lexer.c, eval.c, mem.c and nhash.c. However, making such a |
| is almost irrelevant nowadays. If one wants to handle such project, |
tool is almost irrelevant nowadays. If one wants to handle such a project,
| using ucpp as code base, I would happily provide some help, if needed. |
using ucpp as code base, I would happily provide some help, if needed. |
| |
|
| |
|
| CHANGES |
CHANGES |
| ------- |
------- |
| |
|
| |
From 1.2 to 1.3: |
| |
|
| |
* brand new integer evaluation code, with precise evaluation and checks |
| |
* new hash table implementation, with binary trees |
| |
* relaxed attitude on failed `##' operators |
| |
* bugfix on macro definition on command-line wrt nesting macros |
| |
* support for up to 32766 macro arguments in LOW_MEM code |
| |
* support for optional additional "identifier" characters such as '$' or '@' |
| |
* bugfix: memory leak on void #assert |
| |
|
| From 1.1 to 1.2: |
From 1.1 to 1.2: |
| |
|
| * bugfix: numerous memory leaks |
* bugfix: numerous memory leaks |
| --------- |
--------- |
| |
|
| Volker Barthelmann, Neil Booth, Stephen Davies, Stéphane Ecolivet, |
Volker Barthelmann, Neil Booth, Stephen Davies, Stéphane Ecolivet, |
| Marcus Holland-Moritz, Antoine Leca, Cyrille Lefevre, Dave Rivers, Loic |
Marc Espie, Marcus Holland-Moritz, Antoine Leca, Cyrille Lefevre, |
| Tortay and Laurent Wacrenier, for suggestions and beta-testing. |
Dave Rivers, Loic Tortay and Laurent Wacrenier, for suggestions and |
| |
beta-testing. |
| |
|
| Paul Eggert, Douglas A. Gwyn, Clive D.W. Feather, and the other guys from |
Paul Eggert, Douglas A. Gwyn, Clive D.W. Feather, and the other guys from |
| comp.std.c, for explanations about the standard. |
comp.std.c, for explanations about the standard. |