Poll

Opinion?

Good
1 (33.3%)
Bad
0 (0%)
Ugly
2 (66.7%)

Total Members Voted: 1

Author Topic: Programmers Opinion: enums  (Read 7405 times)

daniel.santos

  • Guest
Programmers Opinion: enums
« on: 23 February 2009, 22:15:09 »
As I've  been reconstructing the networking code, I've added a very pretty object printing mechanism that can be compiled to zero-impact if desired (perhaps based upon rather or not DEBUG, DEBUG_NETWORK or some such macro is defined).  I want this in place so that very good tracing can be done to simplify and easy network protocol troubleshooting.  I suppose that something similar could be done by defining this for wireshark in whatever language you use to define this stuff, but I really think it's better to be able to do it from the program its self, especially since it keeps it bound to the code (there's no dual maintenance to worry about).

So the new networking stuff uses a lot of enum values and when printing an object, I would prefer to use the actual value name rather than printing a numeric value.  This leads me back to an issue that's often bothered me, both in C or C++ and in Java, and that's printing the names of enum values.  The solution I've been using (a very common one) is to define separately the enum body and a const char *[] array with the names like this:

Header File:
Code: [Select]
enum State {
    STATE_UNCONNECTED,  /** not yet connected */
    STATE_LISTENING,    /** not yet connected, but listening for connections */
    STATE_CONNECTED,    /** established a connection and sent (or sending) handshake */
    STATE_INTRODUCED,   /** handshake completed */
    STATE_INITIALIZED,  /** has received game settings (still in lobby) */
    STATE_LAUNCH_READY, /** ready to launch game (when given the O.K.) */
    STATE_LAUNCHING,    /** launching game */
    STATE_READY,        /** ready to begin play */
    STATE_PLAY,         /** game started (normal play state) */
    STATE_PAUSED,       /** game paused */
    STATE_QUIT,         /** quit game requested/initiated */
    STATE_END,          /** game terminated */

    STATE_COUNT
};

extern const char *StateEnumNames[STATE_COUNT];

Implementation File:
Code: [Select]
const char *StateEnumNames[STATE_COUNT] = {
    "STATE_UNCONNECTED",
    "STATE_LISTENING",
    "STATE_CONNECTED",
    "STATE_INTRODUCED",
    "STATE_INITIALIZED",
    "STATE_LAUNCH_READY",
    "STATE_LAUNCHING",
    "STATE_READY",
    "STATE_PLAY",
    "STATE_PAUSED",
    "STATE_QUIT",
    "STATE_END"
};


As you may be aware, code duplication really bothers me.  If something has copy and paste code in it, then there should probably be a generic implementation somewhere, rather using templates, base classes, etc., to encapsulate the commonalities (just gotta be aware of template code bloat issues).  I really hate having something like this where one depends upon the other because it's easy to screw up.  I've been examining the alternatives and I have one, but I'm not sure I like it.  So I ask you guy's opinion (and I intend this question for C/C++ programmers only please.  But trust me, I value all of you guy's opinions where it counts!)

Code: [Select]
/*
    This is a sample way to implement an enum so that the list only appears once in the code using
    a preprocessor macro and that same list is used for both the enum body and a string array
    that contains the corresponding name of each value.

    Advantages
    * No room for error
      o Renaming any of the values automatically renames in both the enum body and the string array
      o Reordering values automatically updated in both
    * Don't have to have redundant prefix appended to each value name
    * Less code text (no copy & paste)
    * No need to maintain both enum body and string array separately

    Disadvantages
    * Not very Doxygen friendly (see notes below)
    * Relies heavily on C preprocessor
    * Added code complexity
    * Is not immediately intutive (until you get used to it)
    * It adds one more project-specific paradigm for the new programmer to learn

    Other Notes
    The use of "static const char *[]" is wasteful as (I believe) it would cause strings to be
    embedded in every object file that includes them.  I'm mostly using it here to keep all of this
    code in one place.  Most probably, the correct solution is to use "extern const char *[]" and
    then use an implementation file to expand the names.  However, if it were used for debug only
    (i.e., had #ifdef DEBUG around it) the overhead wouldn't matter and it could be kept all
    together.

    Finally, it may be possible to get Doxygen to like this by feeding it MACRO_EXPANSION=YES,
    EXPAND_ONLY_PREDEF=YES, and EXPAND_AS_DEFINED=<list> where <list> contains "EV" as well as
    each of the xxx_ENUM_VALUES macros used.
*/

#define STATE_ENUM_VALUES                                                           \
    EV(UNCONNECTED) /** not yet connected */                                        \
    EV(LISTENING)   /** not yet connected, but listening for connections */         \
    EV(CONNECTED)   /** established a connection and sent (or sending) handshake */ \
    EV(INTRODUCED)  /** handshake completed */                                      \
    EV(INITIALIZED) /** has received game settings (still in lobby) */              \
    EV(LAUNCH_READY)/** ready to launch game (when given the O.K.) */               \
    EV(LAUNCHING)   /** launching game */                                           \
    EV(READY)       /** ready to begin play */                                      \
    EV(PLAY)        /** game started (normal play state) */                         \
    EV(PAUSED)      /** game paused */                                              \
    EV(QUIT)        /** quit game requested/initiated */                            \
    EV(END)         /** game terminated*/


enum State {
#define EV(value_name) STATE_ ## value_name,
    STATE_ENUM_VALUES
#undef EV
    STATE_COUNT
};

static const char * StateEnumNames[STATE_COUNT] = {
#define EV(value_name) "STATE_" #value_name,
    STATE_ENUM_VALUES
#undef EV
};

This is the simplest that I've been able to refine it to.  It *might* be possible to further refine it with a generic macro that expands the enum and string array, but I actually don't know how to do it because I would have to redefine EV and I don't believe there is any way to undefine and redefine macros within a macro expansion, plus the fact that it would further obfuscate it.  If it was possible, it would look something like this:

Code: [Select]
#ifdef SOME_DEBUG_OPTION
#   define STRINGY_ENUM_MACRO(name, prefix, values) \
        enum name { /* insert magic here */, prefix ## COUNT}; \
        static const char *name ## EnumNames[prefix ## COUNT] = { /* insert magic here */ };
#else
#   define STRINGY_ENUM_MACRO(name, prefix, values) \
        enum name { /* insert magic here */, prefix ## COUNT};
#endif

#define STATE_ENUM_VALUES /* same as above */
 STRINGY_ENUM_MACRO(State, STATE_, STATE_ENUM_VALUES)

If that's possible, and I don't know how to do it (thus the /* insert magic here */), then it would further reduce the number of lines of code (and at this point, there are some 20-ish similar enums in the code) but as I said, it would further obfuscate it.

For now, I'm going the copy & paste route, but I would love to hear your opinions.  Thanks!

hailstone

  • Local Moderator
  • Battle Machine
  • ********
  • Posts: 1,568
    • View Profile
Re: Programmers Opinion: enums
« Reply #1 on: 24 February 2009, 10:56:36 »
It would be nice if there was a toString function like in C#. I think you've done it the best that I know of. Any other way I can think of will require specifying the enum and a string.

Maybe http://www.codeproject.com/KB/cpp/C___enums_to_strings.aspx or http://www.gamedev.net/community/forums/topic.asp?topic_id=299161&whichpage=1&#1898948 has something useful.
Glest Advanced Engine - Admin/Programmer
https://sourceforge.net/projects/glestae/

jrepan

  • Guest
Re: Programmers Opinion: enums
« Reply #2 on: 24 February 2009, 11:20:49 »
I think that using preprocessor(especially #if-s and #define-s) outside #include-s part is ugly. But at the same time duplicating code is bad. So I'm not sure what's the best solution.
By the way, I see you are using STATE_UNCONNECTED instead of sUnconnected unlike elsewhere in Glest.

daniel.santos

  • Guest
Re: Programmers Opinion: enums
« Reply #3 on: 24 February 2009, 15:51:34 »
Thanks for feedback all.  Especially thanks hailstone for some very good examples of encapsulating this!  I think that in the end it's the lessor of evils to just do the copy & paste and keep it simple.  None the less, I think I'll encapsulate all of the strings into the GameConstants class rather than leaving them as stray globals as that would seem to be slightly better.

Just to follow up, here's a snippet (propritary information removed) from a Java enum-as-a-class style that I use.  When I wrote this originally, Java 5 was still not final and the company I worked for was slow to accept new standards.  However, once I studied the new Java 5 enums it became apparent that I wouldn't be able convert my paradigm to the Java 5 enum, so I left them all as classes.

Code: [Select]
/**
 *  <p>A pseudo-Emun for the states of each component.  This is not a real Java
 *  5.0 Enum because we're using inheritance to better clarify the states that
 *  apply only to certain components.  Thus, each component can derive from
 *  my.package.State and specify their custom states.</p>
 */
public class State implements Serializable {
    private static byte nextId = 0;
    private static int maxNumParams = 0;
    private static final ArrayList<State> states = new ArrayList<State>();

    private final byte id;
    private final String name;
    private final String description;
    private final byte numParams;

    public static final State UNINITIALIZED = new State(
            "UNINITIALIZED",
            "Uninitialized", 0);
    public static final State INITIALIZING = new State(
            "INITIALIZING",
            "Initializing", 0);
    public static final State INITIALIZED = new State(
            "INITIALIZED",
            "Initialized", 0);
    public static final State STARTING = new State(
            "STARTING",
            "Starting up", 0);
    public static final State RUNNING = new State(
            "RUNNING",
            "Running", 0);
    public static final State PAUSED = new State(
            "PAUSED",
            "Paused", 0);
    public static final State IDLE = new State(
            "IDLE",
            "Idle", 0);
    public static final State STOPPING = new State(
            "STOPPING",
            "Stopping", 0);
    public static final State STOPPED = new State(
            "STOPPED",
            "Stopped", 0);

    protected State(String name, String description, int numParams) {
        assert numParams <= Byte.MAX_VALUE;

        synchronized(State.class) {
            if(nextId == Byte.MAX_VALUE) {
                throw new IndexOutOfBoundsException("Only " + Byte.MAX_VALUE
                        + " distinct States are currently supported.  You "
                        + "must change the implementation of my.package.State, "
                        + "<some other class> to support to "
                        + "support a greater number of states.  This "
                        + "limitation was implemented to reduce packet size of "
                        + "status updates.");
            }
            this.id = nextId++;
            this.name = name;
            this.description = description;
            this.numParams = (byte)numParams;

            if(maxNumParams < numParams) {
                maxNumParams = numParams;
            }
            this.id = nextId++;
            this.name = name;
            this.description = description;
            this.numParams = (byte)numParams;

            if(maxNumParams < numParams) {
                maxNumParams = numParams;
            }

            assert this.id == states.size();
            states.add(this);
        }
    }

    /**
     *  Retrieve a State object by it's id.
     *  @return the State object associated with the specified id.
     *  @throws IndexOutOfBoundsException if id does not refer to a valid state.
     */
    public static State getById(byte id) throws IndexOutOfBoundsException {
        return states.get(id);
    }

    public static synchronized int getMaxNumParams() {
        return maxNumParams;
    }

    public static State[] getAllStates() {
        return states.toArray(new State[states.size()]);
    }

    public byte getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public String getDescription() {
        return description;
    }

    public byte getNumParams() {
        return numParams;
    }

    public String toString() {
        return name;
    }

    public String toString(String[] params) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(baos);
        // ps.printf(description, (Object[])params);
        pw.print(description);
        for(int i = 0; i < getNumParams(); i++) {
            pw.print((i > 0 ? ", " : ": ") + params[i]);
        }
        pw.flush();
        return baos.toString();
    }
}

oops, would post a derived class but I'm out of time, I gotta run!


daniel.santos

  • Guest
Re: Programmers Opinion: enums
« Reply #4 on: 24 February 2009, 17:41:34 »
Ok, so this is still a bit OT, but this is an example of a derived Java class from the previous State example:

Code: [Select]
package my.package.;

import my.package.State;

public class LogFileMonitorState extends State {
    public static final State READING_DIR = new LogFileMonitorState (
            "READING_DIR",
            "Reading directory %s", 1);
    public static final State PROCESSING_LOG_FILE = new LogFileMonitorState (
            "PROCESSING_LOG_FILE",
            "Processing log file %s", 1);

    protected LogFileMonitorState(String name, String description, int numParams) {
        super(name, description, numParams);
    }
}

This is the type of functionality I had hoped out of Java 5 enums, but when they emerged, they didn't support polymorphism (which I know is tricky anyway).

So C++0x promises type safe enumerations, but I'm still not seeing any functionality to retrieve a name from the language it's self.  Also, this information can be access from the debug information in some cases.  With gcc, I think you have to compile with -g3 to get enum names (I know that's the case for preprocessor macros, but I'm not actually certain on enums since they are a part of the C/C++ language its self).  glibc offers a limited self-debugging library (mostly for backtraces) that will attempt to retrieve debug information, but I don't know much about it.  Either way, I agree with the simplicity of just accepting the copy & paste solution for now -- at least we can keep all of this encapsulated in one place.  I'm thinking about moving all global enumerations to the game_constants.h file.

I think that using preprocessor(especially #if-s and #define-s) outside #include-s part is ugly. But at the same time duplicating code is bad. So I'm not sure what's the best solution.
By the way, I see you are using STATE_UNCONNECTED instead of sUnconnected unlike elsewhere in Glest.

Yea, I'm renaming all of the enum values to follow more standard naming conventions.  Generally, in most languages, constants are expressed as ALL_UPPER_CASE with underscores delimiting words.  Each enum value will continue to have a unique prefix for the enum and contain PREFIX_COUNT as the last element.  The enum type names will be ProperCase as they are now.  I'm going to eventually write up a coding standards document and post it as an open proposal for feedback so that it isn't just me making these decisions.  In brief, I'm basically following K&R coding standards with tabs instead of spaces for indentation, tab size of 4, and a max line width of 100 characters.  80 Characters is the typical line limit, but there are a lot of long names in this code and I'm just feeling at this point in technology, most people have sufficient monitors & video cards that 100 characters is more appropriate.  Heck, as powers-of-two obsessive as I am, I wouldn't mind going to 128!! :) lol!

Thanks again for the feedback, I really appreciate it!  (And I was leanin towards "it's too ugly" myself).


hailstone

  • Local Moderator
  • Battle Machine
  • ********
  • Posts: 1,568
    • View Profile
Re: Programmers Opinion: enums
« Reply #5 on: 25 February 2009, 05:02:43 »
From http://stackoverflow.com/questions/201593/is-there-a-simple-script-to-convert-c-enum-to-string

Quote from: James Curran
The problem is, once a C program is compiled, the binary value of the enum is all that is used, and the name is gone.

Quote from: hydros
QT is able to pull that of (thanks to the meta object compiler): http://labs.trolltech.com/blogs/2008/10/09/coding-tip-pretty-printing-enum-values/
Glest Advanced Engine - Admin/Programmer
https://sourceforge.net/projects/glestae/

PolitikerNEU

  • Guest
Re: Programmers Opinion: enums
« Reply #6 on: 25 February 2009, 11:15:37 »
Hello, I have done a different macro-based approach. The code for generating the macro is really ugly and it is slow but the code for using the macro doesn't seem too bad to me (of course it is worse than normal code).
What do you think - this is my macro code
Code: [Select]
#define ENUM(name,...)\
class unimportant_##name{\
private:\
char** strings;\
int* number;\
int num_char;\
public:\
unimportant_##name(const char* init) {\
num_char = 1;\
int curpos = 0;\
while (init[curpos]) {\
if (init[curpos] == ',') {\
num_char++;\
}\
curpos++;\
}\
int * beg_count = new int[num_char];\
int * name_count = new int[num_char];\
int * end_count = new int[num_char];\
number = new int[num_char];\
int cur_number = 0;\
strings = new char*[num_char];\
for (int i = 0; i < num_char; i++) {\
beg_count[i] = 0;\
name_count[i] = 0;\
end_count[i] = 0;\
}\
number[0] = 0;\
curpos = 0;\
int cur_length;\
int count = 0;\
while (init[curpos]) {\
if(count) {\
number[count] = number[count-1]+1;\
}\
cur_length = 0;\
while (init[curpos] && !((init[curpos] >= 'a' && init[curpos]\
<= 'z') || (init[curpos] >= 'A' && init[curpos] <= 'Z')\
|| init[curpos] == '_')) {\
curpos++;\
beg_count[count]++;\
}\
while (init[curpos]\
&& ((init[curpos] >= 'a' && init[curpos] <= 'z')\
|| (init[curpos] >= 'A' && init[curpos] <= 'Z')\
|| init[curpos] == '_')) {\
name_count[count]++;\
curpos++;\
}\
while (init[curpos] && init[curpos] != ',') {\
if (init[curpos] == '=') {\
curpos++;\
end_count[count]++;\
number[count] = 0;\
bool negative = false;\
for(;init[curpos] && init[curpos] != '-' && (init[curpos] < '0' || init[curpos] > '9');curpos++,end_count[count]++) {}\
if (init[curpos] && init[curpos] == '-') {\
negative = true;\
curpos++;\
end_count[count]++;\
}\
while(init[curpos] && init[curpos] >= '0' && init[curpos] <= '9') {\
number[count]*= 10;\
number[count]+= init[curpos] - '0';\
curpos++;\
end_count[count]++;\
}\
if (negative) {\
number[count]=-number[count];\
}\
} else {\
end_count[count]++;\
curpos++;\
}\
}\
if (init[curpos]) {\
curpos++;\
}\
strings[count] = new char[name_count[count] + 1];\
count++;\
cur_number++;\
}\
curpos = 0;\
for (int i = 0; i < num_char; i++, curpos++) {\
curpos += beg_count[i];\
for (int j = 0; j < name_count[i]; j++, curpos++) {\
strings[i][j] = init[curpos];\
}\
curpos += end_count[i];\
strings[i][name_count[i]] = '\000';\
}\
delete[] beg_count;\
delete[] name_count;\
delete[] end_count;\
}\
const char* operator[](int i) const {\
if (i >= 0 && i < num_char && number[i] == i) { /*Most of the time "standard" enums are used*/\
return strings[i];\
}\
for (int j = 0; j < num_char; j++) {\
if (number[j] == i) {\
return strings[j];\
}\
}\
return NULL;\
}\
~unimportant_##name() {\
for (int i = 0; i < num_char; i++) {\
delete[] strings[i];\
}\
delete[] strings;\
delete[] number;\
}\
} name##_get(#__VA_ARGS__);\
enum name {\
__VA_ARGS__\
}

And you define an enum as following:
Code: [Select]
ENUM(the_name, CONST_1, CONST_2); so you can get the values using
Code: [Select]
the_name_get[CONST_1]
* Disadvantages:
 - I haven't managed to disallow i.e. the_name_get[5]
 - Instances of unimportant_the_name can be created (maybe this can be avoided, I am not yet a that good programmer)
 - The semicolon at the end is necessary
 - Maybe other disadvantages

* Advantages
 - You can do many things you can do with "normal" enums what you can't do with the initial solution (at least I think so) like
 
Code: [Select]
ENUM(the_name, CONST_1 = 5, CONST_2) a_enum = CONST_2;
EDIT:
 - Since names of enums are only used for debug purposes, the whole code could be put into an ifdef so it is not created for release versions so there would not be overhead

EDIT II (Short description of what the code does):
 - The class constructor initialises the class using the string "CONST_1 = 5, CONST_2", the greatest part of it is string processing
 - The operator[] returns the string corresponding to the value or null if no element exists
 - The destructor (which gets never called?) frees some memory

EDIT III:
 - The operator []-function was wrong (haven't thought of something like enum xyz{one = 5, two= 1};) - For large enums, it would be faster to store the strings in a sorted way so logarithmic access would be possible

EDIT IV:
 - Fixed a bug and made it possible to start with negative values
« Last Edit: 25 February 2009, 18:49:10 by PolitikerNEU »

hailstone

  • Local Moderator
  • Battle Machine
  • ********
  • Posts: 1,568
    • View Profile
Re: Programmers Opinion: enums
« Reply #7 on: 25 February 2009, 12:24:29 »
Quote
unimportant_##name(const char* init) {\
      num_char = 1;\
      int curpos = 0;\
      while (init[curpos]) {\
         if (init[curpos] == ',') {\
            num_char++;\
         }\
         curpos++;\
      }
I'm thinking this loop will either get an index out of bounds error (maybe in the form of not allowed to access memory) or loop infinitely.
Glest Advanced Engine - Admin/Programmer
https://sourceforge.net/projects/glestae/

PolitikerNEU

  • Guest
Re: Programmers Opinion: enums
« Reply #8 on: 25 February 2009, 12:41:16 »
I don't think so because c-strings (char*) are ended with a NULL-Char ('\000') per definition and since we only get valid strings because they are created by ""-ing the argument list
init[curpos] is NULL at the last char (and 0 = false), so the loop aborts correctly.

EDIT:
I mean: If you see "CONST_1 = 5, CONST_2" it is really "CONST_1 = 5, CONST_2\000" in memory (how else would any function know when a c-string ends?)
« Last Edit: 25 February 2009, 12:42:49 by PolitikerNEU »

daniel.santos

  • Guest
Re: Programmers Opinion: enums
« Reply #9 on: 26 February 2009, 06:01:13 »
Well, I must say that this has been a most useful discussion!  I originally started writing a response to this thread about 6 hours ago, but since discovered a mechanism that would work, figured out that it wouldn't and then discovered how to make it work anyway! lol!

So, first off I shall say to PolitikerNEU that I like the basics of your approach (I wrote this part several hours ago).  But it seems that it would be better to create a single class and have your macro just use that.  As it is, your code will create a new class for each enum and that means more code being generated by the compiler (and a larger executable).  And the more executable code you have, the less likely you are to get a cache hit (in the CPU) when fetching instructions for execution.  Even though you may not see the code in your source files, the compiler will generate all of the code for the class each time, even though each enum class will do the exact same thing.  Another ugly hacky thing you might want to consider is modifying the data segment of your app!  I think that in some cases (maybe with some security hardened compilers & kernel modifications, like PaX, GRSecurity, and some of the Hardened Gentoo's gcc patch, etc.) this will cause some type of fatal exception.  But in short, you may have shown me a true viable option that isn't as ugly as my approach! :)

So, here's the scoop:

First off, I wrote a generic class (as I suggested to PolitikerNEU above) and gave it the option as a constructor paramter as to rather or not to copy the data to the heap.  If you specify false, it just modifies the data in the data segment of the executable -- an area that supposed to be read only, but that's only in theory.  But if you get a crash (as I mentioned before, I think some compilers or Linux Kernel mods can optionally give you a fatal exception for trying to do this) then you can just pass true and it will never try to modify it.

In addition, I used lazy initialization so that if the names are never used, the footprint is almost non-existent.

shared/include/util/util.h:
Code: [Select]
class EnumNames : Uncopyable {
private:
    const char *valueList;
    const char **names;
    size_t count;
    bool copyStrings;

public:
    EnumNames(const char *valueList, size_t count, bool copyStrings);
    ~EnumNames();
    const char*getName(int i) const {
        assert(i < count);
        if(!names) {
            const_cast<EnumNames*>(this)->init();
        }
        return names[i];
    }

private:
    void init();
};

So lazy initialization works by checking if names is NULL or not.  If it's NULL, then we known we aren't initialized yet.  Note that it's not thread-safe, so if there's a possibility for contention, you can just call the getName() method for each object in your main thread before forking or creating your other threads -- this will force initialization.  Since it's really read-only after that, there's no danger of race condition if it's already initialized.  Alternately, a value could be added to specify rather or not to use lazy initialization.

from shared/sources/util/util.cpp:
Code: [Select]
EnumNames::EnumNames(const char *valueList, size_t count, bool copyStrings)
        : valueList(valueList)
        , names(NULL)
        , count(count)
        , copyStrings(copyStrings) {
}

EnumNames::~EnumNames() {
    if(names) {
        delete[] names;
        if(copyStrings) {
            delete[] valueList;
        }
    }
}

void EnumNames::init() {
    size_t curName = 0;
    bool needStart = true;

    assert(!names);
    names = new const char *[count];
    if(copyStrings) {
        valueList = strcpy(new char[strlen(valueList) + 1], valueList);
    }

    for(char *p = const_cast<char*>(valueList); *p; ++p) {
        if(isspace(*p)) { // I don't want to have any leading or trailing whitespace
            *p = 0;
        } else if(needStart) {
            // do some basic sanity checking, even though the compiler should catch any such errors
            assert(isalpha(*p) || *p == '_');
            assert(curName < count);
            names[curName++] = p;
            needStart = false;
        } else if(*p == ',') {
            assert(!needStart);
            needStart = true;
            *p = 0;
        }
    }
    assert(curName == count);
}
So I'm basically doing what PolitikerNEU does in his class except that I'm also stripping out any white space and doing some sanity checks.  The sanity checks are mostly to make sure my code is correct because if the macros are called incorrectly, it should result in a compiler error.

So here is where we actually define our enums.  I'm putting them in game/game/game_constants.h and .cpp.  Here's the header file snippet:
Code: [Select]
#ifdef GAME_CONSTANTS_DEF
#   define STRINGY_ENUM(name, enumNamesObject, countValue, ...) \
        enum name {__VA_ARGS__, countValue}; \
        Shared::Util::EnumNames enumNamesObject(#__VA_ARGS__, countValue, false)
#else
#   define STRINGY_ENUM(name, enumNamesObject, countValue, ...) \
        enum name {__VA_ARGS__, countValue}; \
        extern Shared::Util::EnumNames enumNamesObject
#endif

/**
 * Enumeration to manage the various states of a network interface.  This value is used
 * for both local and remote interfaces.  The enum value can be converted into a string
 * containing it's name by calling stateEnumNames.getName(value);
 */
STRINGY_ENUM(State, stateEnumNames, STATE_COUNT,
    STATE_UNCONNECTED,  /** not yet connected */
    STATE_LISTENING,    /** not yet connected, but listening for connections */
    STATE_CONNECTED,    /** established a connection and sent (or sending) handshake */
    STATE_INTRODUCED,   /** handshake completed */
    STATE_INITIALIZED,  /** has received game settings (still in lobby) */
    STATE_LAUNCH_READY, /** ready to launch game (when given the O.K.) */
    STATE_LAUNCHING,    /** launching game */
    STATE_READY,        /** ready to begin play */
    STATE_PLAY,         /** game started (normal play state) */
    STATE_PAUSED,       /** game paused */
    STATE_QUIT,         /** quit game requested/initiated */
    STATE_END           /** game terminated */
);

I wish I had thought of this before.  So in game_constants.cpp, we just define GAME_CONSTANTS_DEF before including the header file and it changes the "extern EnumNames" declarations to definitions.

from game/game/game_constants.cpp:
Code: [Select]
#define GAME_CONSTANTS_DEF
#include "game_constants.h"

So now this initialization is less ugly, it's still compact & efficient and it should still be Doxygen-compatibile, as long as those arguments are passed.  Oops, I forgot to mention the arguments.  I put them in the comments of my original post.  But here is what goes in the Doxygen.ini:
Code: [Select]
MACRO_EXPANSION=YES,
EXPAND_ONLY_PREDEF=YES
EXPAND_AS_DEFINED=STRINGY_ENUM

In addition, if we wanted to nullify this for release builds we could use #ifdef DEBUG around the EnumNames declaration & implementation and make it into:
Code: [Select]
#ifndef SOME_DEBUG_MACRO
class EnumNames {
public:
    EnumNames(const char *valueList, size_t count, bool copyStrings) {}
    const char *getName(int i) {return "";}
};
#else
class EnumNames { // the real declaration here
.
.
.
#endif
Then, as long as inlining is turned on, it should get completely compiled out and the strings should never appear in any object files.  Calls to EnumName::getName() will likely be replaced with a pointer to a null value somewhere in the executable's data segment and each instance will end up a zero-sized object.

So thanks again PolitikerNEU for sharing your ideas!!

PolitikerNEU

  • Guest
Re: Programmers Opinion: enums
« Reply #10 on: 26 February 2009, 09:41:35 »
At first I have to say: Wow, this code is really good.

I got a SIGSEGV if I tried to run this code with "false" - I didn't got it when i used "true" (and am using Linux) - so I think it would be better to let this value be true if compiling on linux.

One "problem" I found is that this code doesn't handle enum-code like
enum {first = -1, second = 3, third}; correctly (It should compile but will (maybe) give an Array-Out-Of-Bounds-Exception - but it will not return the correct name).
My assumptions are:
 * It is not common to use negative values in enums
 * If values are set, then mostly there are no large "gaps" inbetween
 * Most of the time the values of enums are sorted or nearly sorted.

Therefore we need an implementation which is fast in the common case (and I think daniel's implementation is fast in this case) but does not fail in the not-so-common-cases because even if there is currently no such enum in glest, it is possible that at some point in the future this is added and the code should not fail then.

I am trying to change daniel's code a bit so it doesn't fail in the not-common cases and I'll post it when it is ready (so that someone can make it better) - but I don't know if I'll manage to to it in the near future (meaning: next 24 hours or so) because I got an exam tomorrow.

EDIT:
Removed code
« Last Edit: 26 February 2009, 10:50:58 by PolitikerNEU »

daniel.santos

  • Guest
Re: Programmers Opinion: enums
« Reply #11 on: 26 February 2009, 10:43:14 »
At first I have to say: Wow, this code is really good.
*blush* thank you! :)

I got a SIGSEGV if I tried to run this code with "false" - I didn't got it when i used "true" (and am using Linux) - so I think it would be better to let this value be true if compiling on linux.
Ahh, good to know.  I haven't actually gotten a chance to run it yet.  I know it's bad form to modify data that's supposed to be read-only, maybe it was on windows that I got away with this all of the time before.

One "problem" I found is that this code doesn't handle enum-code like
enum {first = -1, second = 3, third}; correctly
I haven't actually run it yet because I have my sources broken with some stuff right now, but I forgot to mention that it's not designed to handle anything other than the 0, 1, 2, 3 types of linear values.  To support that it would need a lot of modifications and maybe use a std:map<int, const char*>.  In fact, I'm a little surprised that it would even compile with that, but when I think about it it makes sense.  The text parsing would have to look for an equals sign and then evaluate the expression, which doesn't have to be an actual number.  For instance, this is valid C++ code (but not valid in C).
Code: [Select]
const int BLAH = 1;
enum eblah {
    yada = BLAH,
    fred = BLAH + 2,
    jack = (BLAH - 42 * 16) / 3,
    jill = (int)(BLAH * 4.6 - 2) | 0xff334455ll + (int)4.3 ^ jack
};
I can tell you that I'm not going to write code that parses that!  The compiler will actually evaluate this at compile time and just insert the results, which will be some screwed up number.  In C++ and especially with more modern compilers (that support C99 & such) it's better to prefer const values to #define for constants.  If you are a student of the C++ language, I highly suggest Scott Meyer's books, the man is a true archon of the C++ language.  I'm working through Effective C++ right now and I've learned soooo much! :)

PolitikerNEU

  • Guest
Re: Programmers Opinion: enums
« Reply #12 on: 26 February 2009, 11:06:42 »
Just a quick idea (I hope it could work):
How about using
Code: [Select]
namespace anyother {
void function() {
int __VA_ARGS__;
int xyz[] = {__VA_ARGS};
}
}

I think this could work and then we have all values in the array xyz and could read from them without having to parse the code.

EDIT: I'll try to adopt your code to this - maybe it works

EDIT II: The non-initialised values should not be a problem because they can be detected (they have no =) and calculated from the known values

EDIT III: The namespace could be a problem - EDIT IV: No, it's no problem

EDIT: It seems to work (:-)) - I have tested it with your yada, fred etc. code ), but the problem is: I need to create an additional function for each macro which of course adds overhead independant if I need it or not - so the whole is slower
The value-getting is slower too - and it is much slower than before if only a single = exists, so my code needs a lot of optimizing

This is the code:
For the .h:
Code: [Select]
class EnumNames {
private:
    const char *valueList;
    int *numbers;
    char **names;
    bool *equalSigns;
    size_t count;
    bool copyStrings;

    void (*tocall)(size_t&, int*&);

public:
    EnumNames(const char *valueList, bool copyStrings, void (*func)(size_t&, int*&));
    ~EnumNames();
    const char*getName(int i) const {
        if(!names) {
            const_cast<EnumNames*>(this)->init();
        }
        if(!numbers) {
         return names[i];
        } else {
        int left = 0;
        int right = count-1;
        while(left != right && numbers[left] != i) {
        if(numbers[(left + right) / 2] < i) {
        left = (left + right + 1) / 2;
        } else {
        right = (left + right) / 2;
        }
        }
        if(numbers[left] == i) {
        return names[left];
        }
        return NULL;
        }
    }

private:
    void init();
    void sort();
};

For the .cpp:
Code: [Select]
EnumNames::EnumNames(const char *valueList, bool copyStrings, void (*func)(size_t&, int*&))
        : valueList(valueList)
        , names(NULL)
        , copyStrings(copyStrings)
        , tocall(func){
}

EnumNames::~EnumNames() {
    if(names) {
        delete[] names;
        if(copyStrings) {
            delete[] valueList;
        }
    }
}

void EnumNames::sort() {
int * arr = numbers;
char **arrC = const_cast<char**>(names);
int *bufArray = new int[count];
char **bufArrayC = new char*[count];

for(size_t width = 1; width < count; width *= 2) {
char **jC = arrC;
char **targetC = bufArrayC;
for(int *j = arr, *target = bufArray; j < arr + count; j += 2*width, jC += 2*width) {
if(j + width >= arr + count) {
for(;j < arr + count; ++j, ++jC) {
*target = *j;
*targetC = *jC;
++target;
++targetC;
}
break;
}

int *left = j;
char **leftC = jC;
int *right = j + width;
char **rightC = jC + width;
int *leftend = right;
int *rightend = (j + 2 * width >= arr + count)?(arr + count):(j + 2 * width);

for(;left < leftend && right < rightend;) {
if (*left < *right) {
*targetC = *leftC;
*target = *left;
++left;
++leftC;
} else {
*targetC = *rightC;
*target = *right;
++right;
++rightC;
}
++target;
++targetC;
}

for(;left < leftend;++left,++leftC) {
*target = *left;
*targetC = *leftC;
++target;
++targetC;
}
for(;right < rightend;++right,++rightC) {
*targetC = *rightC;
    *target = *right;
++target;
++targetC;
}
}
int *swap = arr;
arr = bufArray;
bufArray = swap;
char **swapC = arrC;
arrC = bufArrayC;
bufArrayC = swapC;
}

numbers = const_cast<int*>(arr);
names = arrC;
delete[] bufArray;
delete[] bufArrayC;
}

void EnumNames::init() {
tocall(count,numbers);
    size_t curName = 0;
    bool needStart = true;
    bool equalSign = false;
    int openBraces = 0;

    assert(!names);
    equalSigns = new bool[count];
    names = new char *[count];
    if(copyStrings) {
        valueList = strcpy(new char[strlen(valueList) + 1], valueList);
    }

    for(char *p = const_cast<char*>(valueList); *p; ++p) {
        if(isspace(*p)) { // I don't want to have any leading or trailing whitespace
            *p = 0;
        } else if(needStart) {
            // do some basic sanity checking, even though the compiler should catch any such errors
            assert(isalpha(*p) || *p == '_');
            assert(curName < count);
            equalSigns[curName] = false;
            names[curName++] = p;
            needStart = false;
        } else if(!(openBraces) && *p == '=') {
        equalSigns[curName-1] = true;
        equalSign = true;
        } else if(*p == '(') {
        ++openBraces;
        } else if (*p == ')') {
        --openBraces;
        } else if(!(openBraces) && *p == ',') {
            assert(!needStart);
            needStart = true;
            *p = 0;
        }
    }
    if (equalSign) {
    //"Correct" the wrong values (which had no assignment)
    for (size_t i = 0; i < count; ++i) {
if (!(equalSigns[i])) {
if (i) {
numbers[i] = numbers[i-1]+1;
} else {
numbers[0] = 0;
}
}
}
    sort(); //Sort the values using mergesort
    } else {
    delete[] numbers;
    numbers = 0;
    }
    assert(curName == count);
}

But my macro-definition is a lot longer now :-( - but some parts could be written to another function (sorry, this is only a part of the definition, I'll extend it to what it was (including the ifdefs, I mean) soon - If memcpy is allowed, it could be used for copying the array
Code: [Select]
#ifdef GAME_CONSTANTS_DEF
#   define STRINGY_ENUM(name, enumNamesObject,  ...) \
        enum name {__VA_ARGS__}; \
        void name##NumbersInit(size_t & count, int*& numbers) {\
        assert(!numbers)\
        int __VA_ARGS__;\
        int temp[] = {__VA_ARGS__};\
        count = sizeof(temp) / sizeof(int);\
        numbers = new int[count];\
        for(size_t i = 0; i < count; ++i) {\
        numbers[i] = temp[i];\
        }\
        }\
        Shared::Util::EnumNames enumNamesObject(#__VA_ARGS__, true, name##NumbersInit)
#else
#   define STRINGY_ENUM(name, enumNamesObject, ...) \
        enum name {__VA_ARGS__}; \
        extern Shared::Util::EnumNames enumNamesObject
#endif

PS: I have  tested it using a simple new C++-Project (and deleting some namespaces and adding the iswhite etc. function) - and I didn't test the macro code, but I hope it is correct

There is no "count"-variable any longer, which makes the structure a little bit "more beatiful", but of course slower.
But generally I think performance is not so important here because everything is only used when debugging

EDIT: Fixed the macro thing, soon I'll add sorting the array

EDIT II: Added the sorting
I have tested the code using daniel's blahs:
Code: [Select]
const int BLAH = 1;
STRINGY_ENUM(eblah,eblah_get,
first,
second,
third = 4,
forth,
    yada = BLAH,
    fred = BLAH + 2,
    jack = (BLAH - 42 * 16) / 3,
    jill = (int)(BLAH * 4.6 - 2) | 0xff334455ll + (int)4.3 ^ jack,
    end
);

int main() {
cout << eblah_get.getName(first) << " = " << first << endl;
cout << eblah_get.getName(second) << " = " << second << endl;
cout << eblah_get.getName(third) << " = " << third << endl;
cout << eblah_get.getName(forth) << " = " << forth << endl;
cout << eblah_get.getName(yada) << " = " << yada << endl;
cout << eblah_get.getName(fred) << " = " << fred << endl;
cout << eblah_get.getName(jack) << " = " << jack << endl;
cout << eblah_get.getName(jill) << " = " << jill << endl;
cout << eblah_get.getName(end) << " = " << end << endl;
}
Note that is is not a "bug" (at least I don't think so) that you get "yada" instead of second because both have value 1
« Last Edit: 27 February 2009, 13:26:17 by PolitikerNEU »

hailstone

  • Local Moderator
  • Battle Machine
  • ********
  • Posts: 1,568
    • View Profile
Re: Programmers Opinion: enums
« Reply #13 on: 26 February 2009, 13:30:06 »
I don't think so because c-strings (char*) are ended with a NULL-Char ('\000') per definition and since we only get valid strings because they are created by ""-ing the argument list
init[curpos] is NULL at the last char (and 0 = false), so the loop aborts correctly.

EDIT:
I mean: If you see "CONST_1 = 5, CONST_2" it is really "CONST_1 = 5, CONST_2\000" in memory (how else would any function know when a c-string ends?)
OK. I forgot about that but why are you using c-strings instead of strings?
Glest Advanced Engine - Admin/Programmer
https://sourceforge.net/projects/glestae/

PolitikerNEU

  • Guest
Re: Programmers Opinion: enums
« Reply #14 on: 26 February 2009, 13:56:40 »
I know it is obsolete because I now have used daniel's code but - I haven't studied the glest source yet so I wanted to keep external dependencies (and using std::string is a such I think) as low as possible - that's why I used cstrings. (BTW: Daniel's method of using pointer to a string was better anyway)
« Last Edit: 26 February 2009, 14:02:02 by PolitikerNEU »

hailstone

  • Local Moderator
  • Battle Machine
  • ********
  • Posts: 1,568
    • View Profile
Re: Programmers Opinion: enums
« Reply #15 on: 26 February 2009, 14:17:13 »
OK, thanks. I'm not quite sure what the benefits of different things are yet. I'm studying data structures and algorithms with c++ and another unit on large scale software development (with Java) this semester which I'm hoping will fill in some blanks.
Glest Advanced Engine - Admin/Programmer
https://sourceforge.net/projects/glestae/

PolitikerNEU

  • Guest
Re: Programmers Opinion: enums
« Reply #16 on: 27 February 2009, 13:30:51 »
hmm ... at least I haven't learned the differences between implemenations of different data structures in algorithms and data structures - we have only learned about the advantages/disadvantages of different sorting algorithms, heuristics etc.

In short: A c-string is simply an array of characters, meaning: you cannot increase the length of the string etc.
I think the c++-String uses a character-array too (?), but allocates more memory than needed for the string so you can add some characters (and of course methods are more "beautiful")

daniel.santos

  • Guest
Re: Programmers Opinion: enums
« Reply #17 on: 27 March 2009, 00:19:04 »
Yes, C++'s std::string class is a slightly heavier weight object than a standard C-style string.  I suppose it's a trade-off issue.  But when you have a large string that you will be breaking up into multiple strings, the efficiency can be multiplied by using a single buffer to store all of those strings and then allocate an array of pointers to those strings.  In the case of my enum example, I'm wanting this particular snippet of code to be very lightweight because it may be of limited or no use for release apps, yet it may be included in released builds.  Thus, I'm using some of these lighter weight short-cuts to reduce overhead.

I like how Scott Meyers puts it in his Effective C++ book, Item #1: View C++ as a federation of languages.  This federation consisting of C, C++, STL and io streams.  In the EnumNames class, I'm exploiting features of the C language (which C++ is built upon) to gain efficiency.  In general, C *can be* (but not necessarily *is*) more efficient, at the price of greater difficulty to use and being easier to use incorrectly resulting in memory leaks, buffer overruns, etc.  Remember that every memory allocation must be managed individually by your memory management sub-system(s).  There is a small price to pay for each individual allocation.  In a high use situation, you can often gain substantial performance benefits by allocating a large block of data that you use in a customized fashion over allocating multiple small objects.  Again, the price is that you invite errors and have to do heavy scrutinization of your code to make sure your custom memory management is working correctly.  A good example of this is the id Software's Quake engine, which performed it's own memory management to a large extent.

As a personal example, I had an app that had to monitor a file system directory and notify other parts of the app if a file was added or removed from the directory.  I initially wrote this half-heartedly and later discovered that in production, there were upwards of 160k files in these directories and my app was chewing up CPU and memory like mad! (it was in Java).  Thus, I had to write a mechanism using a combination of C and C++ (and link it in via Java Native Interface) where I did such custom memory management because the price of when there was 160k objects was high enough that it was worth the time and effort.  Of course, if this had been on Linux (it was HP-UX) I could have just used FAM or some such to request that the kernel inform me of changes.

In general, I don't promote attempting to perform your own memory management (like in Quake & such) but there are times when it's appropriate.
« Last Edit: 27 March 2009, 00:21:34 by daniel.santos »

 

anything