Flex is a tool that generates C or C++ source code for lexical analyzers (scanners). The generated scanner breaks its input into tokens, whereas another program, Bison, can be used to generate the parser that checks how those tokens fit together (the grammar). When both are used, much of the work of writing a compiler is done for you. Flex can be used for many purposes besides compilers; often we don't need a parser at all and Flex alone is enough. Regardless of whether Bison is used, it is worthwhile to learn about Flex.
There is a lot of material about Flex and Bison available, including books. This page is intended to help with use of the Windows ("WIN32") version. In particular, the samples shown in the documentation do not work as shown. Perhaps there are some minor details that are not documented very well or perhaps there are minor bugs that need to be fixed in Flex. The simple samples I have here work for me using Flex 2.5.2.
Flex ("Fast Lex") is a free version of Lex and Bison is a free version of YACC ("Yet Another Compiler Compiler"). Lex and (especially) YACC are classic tools with many years of use.
For Windows versions go to Lex/YACC (actually FLEX and BISON). To use Flex to generate a C scanner, we only need the program (Flex.exe). To generate a C++ scanner, we also need FlexLexer.h, which is available with the Flex source code. Since that is the only file from the source that is needed to use the pre-built version of Flex, it is unfortunate that it is not available separately. If you generate a C language scanner, FlexLexer.h is not needed.
Another file that is needed for a C++ scanner is unistd.h. It is a Unix header: the scanner generated by the Windows version of Flex should not include it, but it does, and Visual C++ does not provide it. So you just need to create the file yourself. I don't know whether the file can be left totally empty, but now that you are aware of the problem you should be able to get it to work.
It is possible to set up a VC project so that Flex is executed automatically to generate the C or CPP file. The following works for VC 6 at least. Let's assume that the Flex input file has an "l" (as in lexical) extension and that the output will have a "c" (for C) or "cpp" (for C++) extension.
Don't use precompiled headers unless you know them well enough to make them work here. I got errors about macro redefinitions and a premature end-of-file until I turned off precompiled headers.
Generate a "Win32 Console Application" and make it an empty project (no generated source code). For example, I am using "SimpleFlex" for my project name.
Optional: you can customize the FileView folders so that the Flex input file is shown in the Source Files folder. In the properties of the Source Files folder, add the extension ("l") to the list of extensions.
Then create a file with an "l" extension for the project, for example "SimpleFlex.l", and put one of the samples from below in it. Then, in the project settings, create a Custom Build step for the .l file. If you are not familiar with custom builds, look for the "Custom Build" tab in the project settings. Use the following for the Custom Build step:
Description: Generating lexical analyzer
Commands: C:\Software\FLEX252\flex.exe -o$(ProjDir)\$(InputName).c $(InputPath)
Outputs: $(ProjDir)\$(InputName).c
Where C:\Software\FLEX252\flex.exe is the location of flex.exe on your system (note that there is no space after -o). For a C++ scanner, use a "cpp" extension instead of "c" in both the command and the output.
After putting one of the samples below in the file and creating the Custom Build step, compile the file; you can use Ctrl-F7 to just compile it. Actually, at this point you can simply build the project, since there is nothing for the build to do except generate the scanner (the c or cpp file). The custom build should execute Flex, but the only sign that it did is the description shown in the Build output. Once the c or cpp file has been generated, add it to the project. From then on, building the project regenerates it from the Flex input as needed. If you get the errors described above (macro redefinitions and premature end-of-file), turn off precompiled headers for the project.
There are two samples, and each has a C version and a C++ version. The first one is so simple that it has little value except as a sample.
The following are C and C++ versions of a very simple sample that reads from standard input and writes to standard output. The data is copied as-is, except that wherever "old" occurs, "new" is written instead. Note that in the samples, old is not in quotes: it is the text to scan for, and it does not need quotes. Refer to the Flex documentation for details of the Flex input format.
Flex and Bison were developed before the C++ standard, so they originally only supported C. The following is a very simple sample of Flex input for generating a C scanner.
```
%option main
%{
#include <stdlib.h>
%}
%%
old     fputs("new", stdout);
%%
```
Note that "%option main" causes a default "main" to be generated; it also implies "%option noyywrap", so the other samples, which supply their own main, specify noyywrap explicitly. As far as I know, the #include should not be needed and is not shown in the samples in the documentation; however, it was needed when I tried to use Flex.
It is, however, possible to generate a C++ file that uses a class (named "yyFlexLexer" by default) to do the scanning. This does not help much for a simple scanner, but can help in a much larger project. The following is a C++ version of the above:
```
%option noyywrap
%option c++
%{
#include <stdio.h>
%}
%%
old     cout << "new";
%%
int main(int argc, char* argv[]) {
    yyFlexLexer Lexer;
    Lexer.yylex();
    return 0;
}
```
Note that the default "main" does not use the yyFlexLexer class, so we must provide a main. Also note that stdio.h is being included, whereas for a C language scanner, only stdlib.h is needed.
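To make the first sample's behavior concrete, here is a hand-written C function that performs the same transformation on a string. It is only an illustration, not code that Flex generates, and the name replace_old is hypothetical. Text that matches no rule is copied unchanged (Flex's default action), and old is matched anywhere, including inside words, so "bold" becomes "bnew".

```c
#include <string.h>

/* Illustration only (not Flex output): copy `in` to `out`, writing "new"
   wherever "old" occurs, the way the rule `old  fputs("new", stdout);`
   plus Flex's default echo of unmatched text behaves.
   `out` must have room for strlen(in) + 1 bytes. */
void replace_old(const char *in, char *out)
{
    while (*in != '\0') {
        if (strncmp(in, "old", 3) == 0) {
            memcpy(out, "new", 3);  /* the rule's action replaces the match */
            in += 3;
            out += 3;
        } else {
            *out++ = *in++;         /* unmatched character is copied as-is */
        }
    }
    *out = '\0';
}
```

For example, replace_old("old bold folder", buf) leaves "new bnew fnewer" in buf. Because "new" has the same length as "old", the output never needs more than strlen(in) + 1 bytes.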
Now let's do something useful. This sample can be used to remove trailing blanks from each line of a file. For testing purposes, we can replace trailing blanks and tabs with a specific character (such as '|', the vertical bar), just to make it easier to see it working. Later, the action that writes that character can be changed so that nothing replaces the trailing blanks.
Start with the following C version:
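```
%option noyywrap
%{
#include <stdlib.h>
%}
%%
[ \t]+$     fputc('|', yyout);
%%
int main(int argc, char* argv[]) {
    /* read the file named on the command line, else standard input */
    if (argc > 1) {
        yyin = fopen(argv[1], "r");
        if (yyin == NULL) {
            fprintf(stderr, "Input file cannot be opened\n");
            return 1;
        }
    }
    /* write the result to Output.txt */
    yyout = fopen("Output.txt", "w");
    if (yyout == NULL) {
        fprintf(stderr, "Output file cannot be opened\n");
        return 1;
    }
    yylex();
    fclose(yyout);
    return 0;
}
```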
Notice that this sample provides its own main, which opens the file named on the command line, if there is one; otherwise the input defaults to standard input. It writes its output to the file "Output.txt".
For a C++ scanner, start with the following instead:
```
%option noyywrap
%option c++
%{
#include <fstream.h>
%}
%%
[ \t]+$     LexerOutput("|", 1);
%%
int main(int argc, char* argv[]) {
    ifstream LexerFIn;
    ofstream LexerFOut;
    // open input
    if (argc > 1) {
        LexerFIn.open(argv[1]);
        if (LexerFIn.fail()) {
            cerr << "Input file cannot be opened\n";
            return 1;
        }
    }
    else {
        cerr << "Input file not specified\n";
        return 1;
    }
    // open output
    LexerFOut.open("Output.txt");
    if (LexerFOut.fail()) {
        cerr << "Output file cannot be opened\n";
        return 1;
    }
    // call scanner, reading LexerFIn and writing LexerFOut
    yyFlexLexer Lexer(&LexerFIn, &LexerFOut);
    Lexer.yylex();
    return 0;
}
```
Note that Flex (at least version 2.5.2 and earlier) uses the old-style, pre-standard iostreams and therefore the old-style headers (such as fstream.h)! That makes it less likely that you will want to use a C++ scanner.
Also note that, unlike the C version, this sample requires an input filename on the command line; it writes its output to the file "Output.txt".
The samples above replace trailing blanks and tabs with '|' (the vertical bar), but you can use whatever character you want. After testing, you can change the rule so that trailing blanks are simply removed. Change the rules section to:
```
%%
[ \t]+$     ;
%%
```
The empty action (a lone semicolon) does nothing with the matched text, so the trailing blanks are discarded instead of being copied to the output. See Lex - A Lexical Analyzer Generator for an explanation of how that rule works.
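The following hand-written C function shows what the rule [ \t]+$ matches, applied to a string instead of a file. It is an illustration only, not Flex output, and the name strip_trailing is hypothetical. Passing '|' as the marker mimics the testing version of the rule; passing '\0' mimics the empty action. One detail worth knowing: the Flex documentation describes r$ as equivalent to r/\n, so blanks at the very end of a file that has no final newline are not matched, and the function reproduces that.

```c
#include <string.h>

/* Illustration only (not Flex output) of the rule `[ \t]+$`.
   Copies `in` to `out`; each run of blanks/tabs immediately followed by
   '\n' is replaced by `marker`, or dropped when `marker` is '\0'.
   A run at the end of input with no following '\n' is left alone,
   matching Flex's `$` (trailing context of a newline).
   `out` needs room for strlen(in) + 1 bytes. */
void strip_trailing(const char *in, char *out, char marker)
{
    while (*in != '\0') {
        if (*in == ' ' || *in == '\t') {
            size_t run = strspn(in, " \t");  /* length of this blank run */
            if (in[run] == '\n') {
                if (marker != '\0')
                    *out++ = marker;         /* e.g. '|' while testing */
                in += run;                   /* skip the trailing blanks */
                continue;
            }
            memcpy(out, in, run);            /* blanks inside a line are kept */
            out += run;
            in += run;
        } else {
            *out++ = *in++;                  /* everything else is copied */
        }
    }
    *out = '\0';
}
```

For example, with the marker '|' the string "a b  \nc\t\n" becomes "a b|\nc|\n", and with '\0' it becomes "a b\nc\n". The blank between "a" and "b" is kept because it is not followed by a newline.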
You should now have a good enough start to be able to make other improvements too, such as changing where the output is written to.
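For example, the output destination could be taken from the command line. The following sketch uses a hypothetical helper, open_streams, which is not part of Flex. It assumes the C scanner and a convention of argv[1] for the input file and argv[2] for the output file, falling back to standard input and standard output when they are not given.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Flex): choose input and output streams.
   argv[1] = input file (default stdin), argv[2] = output file (default stdout).
   Returns 0 on success, or -1 after printing a message on failure. */
int open_streams(int argc, char *argv[], FILE **in, FILE **out)
{
    *in = stdin;
    *out = stdout;
    if (argc > 1) {
        *in = fopen(argv[1], "r");
        if (*in == NULL) {
            fprintf(stderr, "Input file %s cannot be opened\n", argv[1]);
            return -1;
        }
    }
    if (argc > 2) {
        *out = fopen(argv[2], "w");
        if (*out == NULL) {
            fprintf(stderr, "Output file %s cannot be opened\n", argv[2]);
            if (*in != stdin)
                fclose(*in);   /* don't leak the input file */
            return -1;
        }
    }
    return 0;
}
```

In the user-code section of the C scanner, main could call open_streams(argc, argv, &yyin, &yyout) and, if it returns 0, call yylex(); yyin and yyout are the FILE * variables the generated C scanner reads from and writes to.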
The following two books might be helpful.
| ISBN | Author | Publisher | Title |
|---|---|---|---|
| 007709221X | J. P. Bennett | McGraw-Hill | Introduction to compiling techniques : a first course using ANSI C, LEX, and YACC |
| 1565920007 | John Levine | O'Reilly | lex & yacc, 2nd Edition |
I have the lex & yacc book from O'Reilly. Even though it is categorized as a "UNIX Programming Tools" book, it has very little that is specific to UNIX. It is nearly all about just Lex and YACC, which are very compatible with Flex and Bison.
Most (probably all) of the following I found by searching the internet.
See my Visual C++ Programmer Stuff page for more C++ stuff.