Hideki A. Ikeda (HidekiAI) [池田英紀] ["Tony" Ikeda] –  BLog

Tag: coding

Most productive with 8+ hours of sleep…

by HidekiAI on Sep.16, 2009, under Technology Opinions

I just saw a very interesting survey at www.CodeProject.com.  It was/is interesting to me because I used to keep track of daily logs and journals to determine why I felt so unproductive (to me, as a programmer, productivity is based on getting my tasks done on time).

Here’s the snapshot of the survey:

When do you do your best coding?

Survey period: 31 Aug 2009 to 7 Sep 2009

Are you a night owl? An early riser? Or does 9-5 work for you?
Choose the periods in which you feel you code best.

Option         Votes      %
6am – 9am        284  18.39
9am – 12pm       615  39.83
12pm – 3pm       256  16.58
3pm – 6pm        381  24.68
6pm – 9pm        350  22.67
9pm – 12am       430  27.85
12am – 3am       302  19.56
3am – 6am        118   7.64
Responses       1544
This is a multiple choice question. Totals may not add up to 100%

From: http://www.codeproject.com/script/Surveys/Results.aspx?srvid=953

Notice that for most respondents the most productive period is 0900-1200 hrs.  When I did contracting for a while working from home, I realized that when I woke up at 0700 hrs and went straight to work, I was able to solve problems and issues much more quickly and efficiently in the mornings.

So based on that pattern, I began to make sure that any issues which involved more analytical thinking were reserved for the mornings (or put off until the next day if needed), and anything that was more mechanical was scheduled after 1200 (noon).

I dislike mornings; I prefer to sleep in until 0800 or later on weekends.  So I’m not really a morning person, nor would I say that I am productive in the early mornings as such.  I believe it is more that having a good rest and stepping away from the problems for a period of time is what makes me productive.

This is why I never pull all-nighters (if I can help it): if I cannot solve the problems because my “big muscle” (the brain) is tired, it’s just unproductive and inefficient in the long run.

I’ve made dumb coding mistakes in the past due to lack of sleep.  The one that I remember best, and use as my example, is multiplying an integer by 8 by using a loop (in Assembly Language, I think it was on the Zilog Z80).  Yes, if I had been wide awake (or even half awake), I would have just left-shifted the register (the integer value) by 3 and been done with it…  The next day when I saw that code, I was horrified…

Unfortunately, I did not learn any lesson back then; I was young and I always assumed that putting more hours into it in a single sitting meant working hard at it.  I did not realize that it was important to work smarter, not harder…
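For illustration only, here is the same mistake sketched in C++ rather than Z80 assembly. The function names are made up for this example; both functions return the same value, but the second one does it with a single shift.

```cpp
#include <cstdint>

// The sleep-deprived version: add the value to an accumulator eight times.
std::uint32_t times_eight_loop(std::uint32_t value)
{
    std::uint32_t result = 0;
    for (int i = 0; i < 8; ++i)
    {
        result += value;
    }
    return result;
}

// The wide-awake version: shifting left by 3 bits multiplies by 2^3 = 8.
std::uint32_t times_eight_shift(std::uint32_t value)
{
    return value << 3;
}
```

Both versions wrap around the same way on overflow because the type is unsigned, so they agree for every input.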


Why KeepAlive Heartbeats should be on Dedicated Socket Stream

by HidekiAI on Mar.15, 2009, under

This may sound like a trivial thing, just common sense, yet this is not the first time I’ve seen this issue, even from an experienced game developer who has worked on more than one networking project.

It may be more of an oversight, or possibly it’s so trivial and common sense (once you’ve explained the issue) that the developer(s) simply assumed that keep-alive heartbeat messages would be handled asynchronously and on a separate thread.

My pages and blogs tend to be more about anti-patterns than design patterns. To me, learning from my own and/or others’ mistakes is more educational because there is a story to tell of “this is what happened and these are the consequences I’ve paid for the mistakes”.

In any case, any programmer who has worked (at least a little) on networked games will (most likely) tell you that it is just pure common sense that heartbeat messages (a.k.a. keepalive, pingpong, etc) should at least be asynchronous (bi-directional, so that incoming and outgoing messages do not block each other) and preferably on a separate thread (non-blocking).  IMHO, it’s the opposite: it should be on a separate thread but optionally asynchronous or, to be more specific, on a separate stream dedicated just to keep-alive/heartbeat (not necessarily a separate thread).

Normally (in my opinion based on experiences) a heartbeat/keep-alive message should be:

  • On a separate thread – on the client side, the heartbeat should be scheduled (sleep) and sent on a separate thread, while on the server side something should poll or be triggered to process it (commonly a worker thread if polling, or some kind of I/O Completion Port trigger that wakes up a sleeping thread).
  • From the client side – it should be up to the client to inform the server that it wants to stay alive.  One of the main purposes of heartbeat messages (some would say the only purpose) is that if the server does not hear from that client within the expected window of time, it will disconnect it.  On a WAN-based server this is more obvious, because you want to avoid DoS (unless you have some kind of tar-pit mechanism which does not consume resources), so you release the resources allocated to the socket(s) before you run out of heap.
  • Keep-alive messages should only be sent from the client when it is idle (optional) – why bother sending a keep-alive message to the server if other messages (such as positional or action messages) are already being sent?  Again, the purpose of keep-alive is to let the server know that the client is not sleeping (i.e. hung, even though the TCP/IP socket is still open).  This is a bit more tedious since it requires some state tracking of when the last message was sent to the server.
  • Preferably asynchronous – on both the client side sending the heartbeat message and the server side receiving it.  In case the sender (client) requires an acknowledgment or response from the server, you want it asynchronous so that the incoming and outgoing streams do not block each other.  This is somewhat optional, for it is usually based on design (i.e. if you want to determine the ping time so that you can compensate with dead reckoning, or want some kind of dashboard or HUD to show the round-trip time to the user).  You should not count on it, for I’ve seen situations in the past where the network was set up with a synchronous HUB, in which all messages were forced to be uni-directional even if the code was capable of handling bidirectional traffic.  Sometimes, you become limited by hardware…
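The "state tracking" needed for the idle-only bullet can be quite small. Below is a minimal sketch, assuming a hypothetical helper class named IdleHeartbeat (it is not from any real project or library): the client tells it whenever any message goes out, and the heartbeat thread asks it whether a keep-alive is due. Time is passed in explicitly so the logic does not depend on any particular socket code.

```cpp
#include <chrono>

// Hypothetical helper: remembers when the client last sent anything and
// decides whether a keep-alive is needed.
class IdleHeartbeat
{
public:
    using Clock = std::chrono::steady_clock;

    IdleHeartbeat(std::chrono::milliseconds interval, Clock::time_point now)
        : interval_(interval), lastSent_(now)
    {
    }

    // Call whenever ANY message (positional, action, or keep-alive) goes out.
    void NoteMessageSent(Clock::time_point now)
    {
        lastSent_ = now;
    }

    // True once the client has been silent for a full interval.
    bool HeartbeatDue(Clock::time_point now) const
    {
        return (now - lastSent_) >= interval_;
    }

private:
    std::chrono::milliseconds interval_;
    Clock::time_point         lastSent_;
};
```

In a real client the heartbeat thread would sleep, wake up, call HeartbeatDue(Clock::now()), send the keep-alive if it returns true, and then call NoteMessageSent(). If two threads touch the object, access to it needs to be protected by a lock, which this sketch leaves out.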

Just as this definition of KeepAlive describes (although that page is about HTTP servers, the concept is the same), a keep-alive mechanism should at least have intervals which are negotiated at the beginning of the connection (i.e. the server side informs the client that it expects keep-alive messages every n seconds).  I usually also give some mercy time on the server side, for I’ve seen cases where the client squeezes in its KeepAlive message at the last millisecond of the window of opportunity, network delay causes the server to disconnect, and milliseconds later that KeepAlive message arrives.

Because keep-alive/heartbeat messages are so small, it may be argued that one should be sent (from the client side) every interval even if other messages were sent, to indicate that it’s still connected; some may counter-argue that this is wasteful.  Whichever way it is done is irrelevant; the only thing that matters is that it is cleanly integrated and can be trusted to do as expected/designed.  Meaning:
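The negotiated interval plus mercy time boils down to one comparison on the server side. A minimal sketch, assuming made-up names (KeepAlivePolicy, ShouldDisconnect) and example values chosen only for illustration:

```cpp
#include <chrono>

// Hypothetical per-connection settings.
struct KeepAlivePolicy
{
    std::chrono::milliseconds interval;  // negotiated with the client at connect time
    std::chrono::milliseconds mercy;     // extra slack to absorb network delay
};

// True when the client has been silent for longer than interval + mercy.
inline bool ShouldDisconnect(const KeepAlivePolicy& policy,
                             std::chrono::milliseconds sinceLastKeepAlive)
{
    return sinceLastKeepAlive > policy.interval + policy.mercy;
}
```

With an interval of 5000 ms and a mercy of 500 ms, a KeepAlive that shows up 5001 ms after the previous one is still accepted; without the mercy time that client would already have been dropped.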

  • Client sends a keep-alive on schedule
  • Server receives keep-alive message within the expected time-window so it won’t forcefully disconnect the client.

From my experience, those who counter-argue that they should not have to send keep-alive as long as any kind of message is being sent to the server (to indicate that the client is alive) have probably made the naive assumption that the server is implemented to treat the message stream in non-serialized form.

The problem I’ve seen from serializing the stream is the following scenario (and it does not matter whether it is on the server or client side, the effect is the same):

  1. Client is sending messages and the server updates its lastKeepAliveTime (or resets its nextExpectedKeepAliveTime – the implementation is irrelevant, for there are many ways to do this).
  2. The server model (although asynchronous) is serialized on each client’s incoming message stream, and processes the requests in the order received.
  3. One of the messages just happens to be a type of message which has a bug, or does not comply with the time-slice agreed upon by design, so it blocks for some time.  For example, that message just happens to query an SQL database or another server for data, which has the potential to block (a short time-out is the key to resolving this, but it makes things quite difficult to handle, for there will be more error cases than successes).
  4. The same type of message is stacked a few more times in the message queue, and the client goes idle.
  5. The server has been blocked for a while now, busy processing these (incoming) messages.
  6. Client sends a KeepAlive message because it hasn’t sent any messages in a while.
  7. That KeepAlive goes to the tail of the incoming message queue.
  8. The scheduler kicks in (on a separate thread) because it is time to check whether the client has sent its KeepAlive.  Keep in mind that the client is sending both normal messages and KeepAlive messages on the same stream.
  9. Because the KeepAlive message is still at the back of the queue, even though the client was disciplined enough to send it, the server (on the other thread) decides to disconnect the client.

All in all, dedicating a separate incoming stream (which would mean two or more ports/sockets per client) would cleanly prevent the heartbeat thread from disconnecting the client on the server side.

Again, all these issues I’ve mentioned seem quite trivial and common sense, yet they have been ignored for one reason or another.  Having a separate stream for keep-alive/heartbeat messages would resolve these issues, but that can be difficult (actually, more like “risky”) to integrate late in development (it’s harder to add code late in the project, especially multi-threaded implementations, for it can introduce new and more complex bugs).  It also requires negotiation between the client team and the server team (if there is such a separation, rather than just a “network team”) because both sides have to agree that they will dedicate a separate socket to just the heartbeat message.
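The scenario above can be replayed with a simulated clock. This is only a sketch under simplifying assumptions: the struct and function names are invented, all times are made-up milliseconds, and only the moment the server actually handles the KeepAlive is modeled (not the lastKeepAliveTime updates from step 1).

```cpp
#include <vector>

// One incoming message on the simulated server (all names are hypothetical).
struct Message
{
    bool isKeepAlive;       // true for a heartbeat, false for a normal request
    int  arrivalMs;         // simulated time at which the server received it
    int  processingCostMs;  // how long its handler blocks the queue
};

// Shared, serialized queue: the KeepAlive is only seen after everything
// queued in front of it has been processed. Returns -1 if there is none.
int KeepAliveSeenAtSharedQueue(const std::vector<Message>& inOrder)
{
    int clockMs = 0;
    for (const Message& m : inOrder)
    {
        if (clockMs < m.arrivalMs)
        {
            clockMs = m.arrivalMs;  // the server sat idle until this arrived
        }
        if (m.isKeepAlive)
        {
            return clockMs;
        }
        clockMs += m.processingCostMs;
    }
    return -1;
}

// Dedicated stream: the KeepAlive is not queued behind anything.
int KeepAliveSeenAtDedicatedStream(const std::vector<Message>& inOrder)
{
    for (const Message& m : inOrder)
    {
        if (m.isKeepAlive)
        {
            return m.arrivalMs;
        }
    }
    return -1;
}

// The scheduler's check on the other thread.
bool WatchdogDisconnects(int keepAliveSeenAtMs, int deadlineMs)
{
    return keepAliveSeenAtMs < 0 || keepAliveSeenAtMs > deadlineMs;
}
```

Take three slow requests arriving at 0, 10 and 20 ms that each block for 4000 ms, a KeepAlive arriving at 5000 ms, and a deadline of 6000 ms. On the shared queue the KeepAlive is not handled until 12000 ms, so the client is disconnected even though it did its part; on a dedicated stream it is seen at 5000 ms and the client stays connected.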

Advantages and Disadvantages of Separate Socket

There are some (possibly quite a few) disadvantages to having a separate socket for keep-alive/heartbeat messages, just as there are advantages.

I’ve seen edge-cases in the past (on the client side) in which DirectX had hung (blocked indefinitely) on the main thread but the separate thread for KeepAlive messages hadn’t.  This caused an issue where the server kept receiving the heartbeat messages, so it would not disconnect the client, yet there were no transaction messages coming in.

From the server side perspective, as long as the heartbeat message is being processed, it cannot (will not) disconnect the client.  With a stateless server mechanism, where the server trusts the client to tell it what it is up to, the server has no idea what to expect, so it cannot make predictions such as “I’ve not received a positional update in a while so I’d better disconnect it”, for it does not know whether the client is paused or not.

These kinds of situations are case-by-case and depend on design more than anything else.  If the design has specified that the server is not stateless, it can possibly make more intelligent decisions based on conditions.  It’s always (and should be) based on what the design demands, but as a programmer, it is up to you to inform the architect or designers of the consequences of such edge-cases.

Or what about cases of DoS where the attacker has figured out that as long as they keep the heartbeat socket updated with heartbeat messages (while also keeping the main socket alive), the server will not disconnect, causing resource starvation?

What about a bug on the client side in which (due to the complexities of multi-threading) the client has disconnected from the main socket but kept the heartbeat socket alive, then reconnected and opened another socket each for main and heartbeat (a total of 3 sockets, counting the zombie/dangling socket from the bug)?  Of course, it should be the discipline of the server side to disconnect that zombie socket if the main socket is disconnected, but you’d probably not catch that bug until you’ve encountered it (it depends on the implementation of course; if done right, you’d never encounter this bug).

I’ve seen implementations which treated each socket as a separate thread on the server side; that can cause some hair-pulling experiences if the decision to allow multiple sockets per client is made as an afterthought, too late in the project (near production release).

Heartbeat/keep-alive messages are a trivial concept, yet if not thought out and designed early in the project, they can backfire, so don’t forget to at least have some write-up of your networking structure in your design stage, for both client and server, so that it won’t bite you late in your project.  At the end of the day, both the client and server side programmer(s) should be making sure that clients won’t get disconnected for the wrong reason, and that will require some cooperation and negotiation.
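One way the server can keep that discipline is to pair the two sockets in a single session record, so that losing the main socket always hands back the heartbeat socket for closing as well. A minimal sketch, assuming invented names (ClientSockets, SessionRegistry) and plain integers standing in for socket handles:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical pairing of the two sockets that belong to one session.
struct ClientSockets
{
    int mainSocket;
    int heartbeatSocket;
};

class SessionRegistry
{
public:
    void OnConnected(int sessionId, int mainSocket, int heartbeatSocket)
    {
        sessions_[sessionId] = ClientSockets{mainSocket, heartbeatSocket};
    }

    // Called when the main socket drops. Returns every socket the server
    // should now close, so the heartbeat socket cannot linger as a zombie.
    std::vector<int> OnMainSocketClosed(int sessionId)
    {
        std::vector<int> toClose;
        std::map<int, ClientSockets>::iterator it = sessions_.find(sessionId);
        if (it != sessions_.end())
        {
            toClose.push_back(it->second.mainSocket);
            toClose.push_back(it->second.heartbeatSocket);
            sessions_.erase(it);
        }
        return toClose;
    }

    std::size_t ActiveSessions() const
    {
        return sessions_.size();
    }

private:
    std::map<int, ClientSockets> sessions_;
};
```

When the buggy client reconnects, it gets a fresh session id and a fresh pair of sockets, and the old pair has already been handed back for closing.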


Use your O/S to Multi-Process

by HidekiAI on Jun.15, 2008, under

Introduction

This page is not about how to program multi-threading, multi-processing, parallel processing, prevention of deadlocks, how to work with a MUTEX (semaphores, etc), race conditions, good practices for working with threads, etc. Nor is this page about encouraging you to evolve your applications to use multi-processors to your advantage (Intel wrote a great overall intro on why game developers should think about parallel processing, too). You can find all these kinds of documentation, white papers, etc all over the web (my recommendation is to go to MSDN, IBM, Intel, and even developers.net to search for what you are looking for).

If you are an advanced programmer, familiar with parallel processing, distributed (and grid) computing (Microsoft calls it HPC), or multi-threading, this page is not for you. Most likely, you’ve Googled for a keyword such as “CreateThread” or “CreateProcess” and stumbled upon this page while searching for sample code, in which case my recommendation is to go to MSDN and use their sample code.

What this page is about is demonstrating how easy it is to take advantage of multi-processors in code using the Windows APIs CreateThread() and CreateProcess(). The sample code is not truly thread-safe and calls printf() without any consideration of a MUTEX, possibly causing deadlocks, but that’s up to you (as a programmer) to make it work.

The sample code is barely a useful application, but it was written with proof-of-concept in mind to let you run it and realize for yourselves that just by creating a thread, thanks to the Operating System (in this case, Windows XP Pro, Vista, Server) mechanism, you automatically get a free parallel processing feature on a multi-processor CPU system (if the O/S recognizes that you have multiple processors, that is).

Sample Code

Before we go into more depth, we’ll start off with the sample code. It is unfortunate that such an intuitive and useful application as WordPress cannot display source code well. I’m now using the Highlight Source Pro plug-in, but it is still difficult to read (it’s way better than “Code Snippet” at least, because it does not overflow the view), thus you’ll find the link to the actual code here.

The code was written and tested on the following configurations:

  • Visual Studio Express Edition 2008 Beta (VC9) – I cannot afford Visual Studio (that’s why I’ve switched to Linux for home – at work, I’m spoiled with Visual Studio) but this Express Edition is superior! Almost every common feature I use at work in Visual Studio Professional (or Enterprise or Architect) is in the Express Edition. So what if I cannot have C# and C++ (Managed) in the same solution, how often does that happen? I’m so surprised that Microsoft would give away such a high quality IDE for free!
  • Windows Vista (32 bits) – My laptop came with 32-bit Vista (although most of the time, I’m using XUbuntu). I could have fired up Windows XP Home to test it on my desktop (this one is Gentoo compiled for x86_64 AMD Opteron X2) but from what I recall, the Home edition does not support multi-processor (I may be wrong).
  • AMD Turion 64 X2 – It’s an HP Pavilion dv6000 series. I was kind of upset, because when I ordered from HP I specifically requested 64-bit Vista just to try it out (since I knew that most of the time I’d be on Linux, I didn’t care if the 64-bit version was incompatible with drivers, etc). But in any case, the only reason why you’d want 64-bit Windows is to deal with the memory barrier of greater than 3GBytes, and I’d be insane to have a 4GByte laptop! I also could have tested it on my Opteron 64 X2 but well, I’m lazy…

You should be able to just create a Console Application in Visual Studio (should work on VC7, VC8, and VC9Beta), copy-and-paste the entire code and compile it. I don’t remember if VC7 (Visual Studio 2003) went with _tmain() or main(), but if it doesn’t support UNICODE mode, just edit the sections that use the UNICODE macros.

Speaking of UNICODE, all char* (string) based functions (such as printf(), etc) are called through Visual Studio macros so that whether you compile with UNICODE defined or not, it should work transparently. For more information on this, see Routine Mappings (I think this link is for VC8).

//---------------------------------------------------------------------------------------------------------------------------------
//       Author: Hideki A. Ikeda
//      Purpose: Demonstrate and test the usage of CreateThread() and CreateProcess()
// Date Created: 06.14.2008
//---------------------------------------------------------------------------------------------------------------------------------
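#include "stdafx.h"
#include <windows.h>  // for CreateThread(), CreateProcess(), Sleep(), etc (in case stdafx.h does not already include it)
#include <ctype.h>    // for usage of toupper()
#include <string>     // for std::string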
 
//---------------------------------------------------------------------------------------------------------------------------------
const unsigned int    MINIMUM_PROCESSORS_COUNT = 2;    // default of 2 process minimum
 
//---------------------------------------------------------------------------------------------------------------------------------
bool            DoMasterProcess(_TCHAR * szApplicationName);
bool            DoMasterThread(void);
bool            DoTask(void);
DWORD WINAPI    DoWorkerThread(LPVOID lpParam);
 
//---------------------------------------------------------------------------------------------------------------------------------
int _tmain(int argc, _TCHAR* argv[])
{
    // argv[1]:
    //    * P = Multi Processor mode
    //    ** S = SubProcess created from main Process - this is because CreateProcess() runs another instance(s) of this application (as another process)
    //    * T = Multi Threaded mode
    //    * O = OpenMP mode (let the interface parallelize the methods)
    if (argc > 1)
    {
        LARGE_INTEGER    startQPC;
        LARGE_INTEGER    frequency;
        QueryPerformanceCounter(&startQPC);
        QueryPerformanceFrequency(&frequency);
 
        // We only care about the first character of argv[1] (also force it to upper case so we only do a single compare per test)
        _TCHAR    parameter1 = _totupper(argv[1][0]);
        if (parameter1 == _T('P'))
        {
            if (DoMasterProcess(argv[0]) == false)
            {
                return(-1);
            }
        }
        else if (parameter1 == _T('S'))
        {
            if (DoTask() == false)
            {
                return(-2);
            }
        }
        else if (parameter1 == _T('T'))
        {
            if (DoMasterThread() == false)
            {
                return(-3);
            }
        }
        else if (parameter1 == _T('O'))
        {
            // Cannot utilize OpenMP on Express Edition.  For more details, see https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=98939&wa=wsignin1.0
        }
        else
        {
        }
 
        LARGE_INTEGER    endQPC;
        QueryPerformanceCounter(&endQPC);
        const double    totalTimeInSeconds = (endQPC.QuadPart - startQPC.QuadPart) / (double) frequency.QuadPart;
        _tprintf(_T("\n\nTotal time was %f seconds\n\n"), totalTimeInSeconds);
    }
    return(0);
}
//---------------------------------------------------------------------------------------------------------------------------------
bool    DoMasterProcess(_TCHAR * szApplicationName)
{
    bool    processCompleted = false;
 
    if (szApplicationName && _tcsclen(szApplicationName))
    {
        SYSTEM_INFO    systemInfo;
        GetSystemInfo(&systemInfo);    // query how many processor this machine has
 
        // NOTE: CreateProcess() will not spawn a process if the application name is passed incorrectly (specifically UNICODE versus ASCII).
        //       Also, because argv[0] may contain spaces (because of a full path name that has subfolders with spaces), it needs to
        //       be wrapped in quotes, or else any string separated by a space would be treated as a separate argv[n] command line argument.
        //       Secondly, because UNICODE mode actually alters the command line arg, you cannot pass const data or else it will
        //       cause access violations.
        const unsigned int    sizeofBuffer = 1024;
        _TCHAR    appName[sizeofBuffer];        appName[0] = 0;
        _TCHAR    commandArgs[sizeofBuffer];    commandArgs[0] = 0;
        _stprintf_s(appName, sizeofBuffer, _T("%s"), szApplicationName);            // no need to wrap it with double-quotes?
        _stprintf_s(commandArgs, sizeofBuffer, _T("\"%s\" S"), szApplicationName);    // make sure arg[0] part of the command line (must be wrapped with double-quotes)
#if defined(_DEBUG)
        _tprintf(_T("Start: MultiProcess mode - Creating two processes of '%s'\n"), szApplicationName);
#endif
        // create one sub-process per processor (with a minimum of MINIMUM_PROCESSORS_COUNT)
        unsigned int        processCount = MINIMUM_PROCESSORS_COUNT;
        if (processCount < systemInfo.dwNumberOfProcessors)
        {
            processCount = systemInfo.dwNumberOfProcessors;
        }
        _tprintf(_T("Creating %d Sub-Processes\n"), processCount);
        STARTUPINFO *            subProcessStartupInfo = new STARTUPINFO[processCount];
        PROCESS_INFORMATION *    subProcessInfo        = new PROCESS_INFORMATION[processCount];
        for (unsigned int currentProcessIndex = 0; currentProcessIndex < processCount; ++currentProcessIndex)
        {
            ZeroMemory(&subProcessStartupInfo[currentProcessIndex], sizeof(STARTUPINFO));
            subProcessStartupInfo[currentProcessIndex].cb = sizeof(STARTUPINFO);
            ZeroMemory(&subProcessInfo[currentProcessIndex], sizeof(PROCESS_INFORMATION));
 
            // For more details, see http://msdn.microsoft.com/en-us/library/ms682425(VS.85).aspx
            BOOL createSuccess = CreateProcess(
                                                appName,                // application name - this is without the double-quotes wrapped
                                                commandArgs,            // command line argument with argv[0] being the actual module (program) name.
                                                NULL,                    // ProcessAttributes
                                                NULL,                    // ThreadAttributes
                                                FALSE,                    // InheritHandles
                                                NORMAL_PRIORITY_CLASS,    // CreationFlags
                                                NULL,                    // Environment
                                                NULL,                    // CurrentDirectory - set this to NULL to use the same directory as the calling process!
                                                &subProcessStartupInfo[currentProcessIndex],
                                                &subProcessInfo[currentProcessIndex]);
            if (createSuccess == FALSE)
            {
                DWORD    lastError = GetLastError();
                _tprintf(_T("Unable to create SubProcess #%d with error #%d (0X%08X)\n"), currentProcessIndex, lastError, lastError);
                return(false);
            }
            else
            {
#if defined(_DEBUG)
                _tprintf(_T("\tSubProcess %d created successfully\n"), currentProcessIndex);
#endif
            }
        }
 
        // wait until all of the sub-processes complete
        for (unsigned int currentProcessIndex = 0; currentProcessIndex < processCount; ++currentProcessIndex)
        {
            WaitForSingleObject(subProcessInfo[currentProcessIndex].hProcess, INFINITE);
#if defined(_DEBUG)
            _tprintf(_T("\t\tSubprocess %d completed\n"), currentProcessIndex);
#endif
        }
 
        // Close process and thread handles.
        for (unsigned int currentProcessIndex = 0; currentProcessIndex < processCount; ++currentProcessIndex)
        {
            CloseHandle(subProcessInfo[currentProcessIndex].hProcess);
            CloseHandle(subProcessInfo[currentProcessIndex].hThread);
        }
#if defined(_DEBUG)
        _tprintf(_T("Done: MultiProcess mode\n"));
#endif
        delete [] subProcessStartupInfo;
        delete [] subProcessInfo;
        processCompleted = true;
    }
    return(processCompleted);
}
//---------------------------------------------------------------------------------------------------------------------------------
bool    DoMasterThread(void)
{
    bool    threadCompleted = true;
 
    SYSTEM_INFO    systemInfo;
    GetSystemInfo(&systemInfo);                // query how many processor this machine has
    unsigned int        processCount = MINIMUM_PROCESSORS_COUNT;
    if (processCount < systemInfo.dwNumberOfProcessors)
    {
        processCount = systemInfo.dwNumberOfProcessors;
    }
    _tprintf(_T("Creating %d Worker Threads\n"), processCount);
 
    HANDLE *    hThreadArray    = new HANDLE[processCount];    // handles returned by CreateThread()
    DWORD *        dwThreadIdArray = new DWORD[processCount];    // ThreadID is optional but useful for debugging (but not really used in this demo code)
 
    // Create worker threads.
    for (unsigned int currentThreadIndex = 0; currentThreadIndex < processCount; ++currentThreadIndex)
    {
        // Create the thread to begin execution on its own - For more details, see http://msdn.microsoft.com/en-us/library/ms682453(VS.85).aspx
        hThreadArray[currentThreadIndex] = CreateThread(
                                                        NULL,                                    // lpThreadAttributes - Use default security attributes
                                                        0,                                        // dwStackSize - Use default stack size
                                                        DoWorkerThread,                            // lpStartAddress - My thread function name
                                                        NULL,                                    // lpParameter - My argument to thread function
                                                        0,                                        // dwCreationFlags - Use default creation flags
                                                        &dwThreadIdArray[currentThreadIndex]);    // lpThreadId - CreateThread() returns the thread ID and is useful for debugging
 
        if (hThreadArray[currentThreadIndex] == NULL)
        {
            // No need to call ExitProcess() (although you'd normally want to) because the main function will bail out upon failure
            DWORD    lastError = GetLastError();
            _tprintf(_T("Unable to create Thread #%d with error #%d (0X%08X)\n"), currentThreadIndex, lastError, lastError);
            return(false);
        }
    } // for()
 
    // Wait until all threads have terminated.
    WaitForMultipleObjects(processCount, hThreadArray, TRUE, INFINITE);    // bWaitAll = TRUE
 
    // Close thread handles
    for (unsigned int currentThreadIndex = 0; currentThreadIndex < processCount; ++currentThreadIndex)
    {
        CloseHandle(hThreadArray[currentThreadIndex]);
    }
    // clean up
    delete [] dwThreadIdArray;
    delete [] hThreadArray;
    return(threadCompleted);
}
//---------------------------------------------------------------------------------------------------------------------------------
// Normally, the lpParam is the data pointer to lpParameter passed via CreateThread()
DWORD WINAPI  DoWorkerThread(LPVOID /*lpParam*/)
{
    if (DoTask() == false)
    {
        return(1);
    }
    return(0);
}
//---------------------------------------------------------------------------------------------------------------------------------
// The task is simple.  Just wait 1 second and leave.
// The idea is that if the system is truly parallel, as long as we spawn equal number of processes or threads for the number of
// processors on the system, it should return all at once.
// For example, if you have a dual-core and you spawn 2 threads.  It should execute these two threads simultaneously, thus
// the time it should take to execute both threads would be 1 second (plus small overhead) because they should run in parallel.
bool    DoTask(void)
{
    bool    processCompleted = false;
    // TODO: Set affinity per process/thread so that each one is pinned to its own processor
    // SetThreadAffinityMask(GetCurrentThread(), bitProcessorAffinityMaskFlag);
    Sleep(1000);
    processCompleted = true;    // the wait finished, so report success to the caller
    return(processCompleted);
}

About the code

So the demo basically has two modes. You use the command line parameter “P” to create process and “T” for threads. I was going to demonstrate OpenMP but it is not supported for Express Edition (I think on retail version, OpenMP is only supported for Enterprise and above).

In any case, the code assumes you have at least a dual processor. But for both CreateThread() and CreateProcess() mode, even on a single processor system, it will at the minimum create two. On a quad processor, it will instantiate 4 threads and/or processes.
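The rule for how many workers get created is just "the larger of the minimum and the detected processor count". A minimal sketch of that rule on its own; the function name WorkerCount is made up for this example, and the detected count is passed in (in the sample code it comes from GetSystemInfo()):

```cpp
#include <algorithm>

// Returns how many processes/threads to spawn: one per detected processor,
// but never fewer than the minimum (2 by default, as in the sample code).
unsigned int WorkerCount(unsigned int detectedProcessors, unsigned int minimum = 2)
{
    return std::max(minimum, detectedProcessors);
}
```

So a single processor machine still gets 2 workers, and a quad processor machine gets 4.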

The task job and concept is simple, just wait 1 second (1000 milliseconds) and bail out. Before the processes or threads are created, we record the starting time (QueryPerformanceCounter()) and then calculate the deltaT upon all threads (or processes) have completed.

If all the tasks are in fact ran in parallel, the total time spent on the application should be close to 1 second plus the small overhead of creating and cleaning up. If the total time of execution is way over 1 second (i.e. 2 seconds), then most likely, the threads got serialized (meaning each thread ran one after next in order of creation) rather than parallelized.
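For anyone not on Windows, the same measurement can be sketched with the standard C++11 library instead of CreateThread() and QueryPerformanceCounter(). This is a stand-in, not the sample code above, and the function name TimeSleepingWorkers is made up. Keep in mind that a sleeping thread does not occupy a core, so this shows that the waits overlap rather than proving that each thread ran on its own processor.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Starts workerCount threads that each sleep for taskDuration, waits for
// all of them, and returns the total wall-clock time in seconds.
double TimeSleepingWorkers(unsigned int workerCount, std::chrono::milliseconds taskDuration)
{
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (unsigned int i = 0; i < workerCount; ++i)
    {
        workers.emplace_back([taskDuration] { std::this_thread::sleep_for(taskDuration); });
    }
    for (std::thread& worker : workers)
    {
        worker.join();  // same role as WaitForMultipleObjects() in the sample
    }

    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

If the workers overlap, four workers with a 200 ms task should come back in a little over 0.2 seconds; if they were run one after another it would be closer to 0.8 seconds.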

The screen-shot below are an example on my dual processor (AMD Turion 64 X2), first with parameter “P” (CreateProcess()) and followed by “T” (CreateThread()). I then experimented by increasing the MINIMUM_PROCESSORS_COUNT = 8 just to demonstrate that there are no prohibition of just because you have single processor, you are not restricted to creating multiple processes or threads. It just demonstrates that your application will scale as you get more processors on your platform.

Of course, the more scientific and academic approach would be to run the two-thread test on a single-processor system (with an almost identical setup and a similar clock frequency), collect the data, then run it on a dual-processor system and compare. In theory you should see double the performance, or half the processing time. Note that I am exaggerating when I say it will double; see Parallel Computing in regards to Amdahl’s law.
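
Amdahl's law puts a number on that exaggeration: if only a fraction p of the work can run in parallel, the speedup on n processors is 1 / ((1 - p) + p / n). A small sketch follows. AmdahlSpeedup() is a hypothetical helper for illustration and is not part of the demo.

```cpp
#include <cassert>

// Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction
// of the work that can run in parallel and n is the number of processors.
double AmdahlSpeedup(double parallelFraction, unsigned processorCount)
{
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    assert(processorCount > 0);
    const double serialFraction = 1.0 - parallelFraction;
    return 1.0 / (serialFraction + parallelFraction / processorCount);
}
```

Only a fully parallel job (p = 1) doubles on two processors. If 90% of the job is parallel, AmdahlSpeedup(0.9, 2) comes out at roughly 1.82 rather than 2.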

Console Output

In any case, the above screen-shot shows that the overhead of the CreateProcess() method is far higher (0.043297 seconds = 43 milliseconds) than that of CreateThread() (0.000405 seconds = 405 microseconds), roughly a hundred times as much. That’s a significant difference! But that doesn’t mean you should dismiss CreateProcess() yet, nor is that the point here. The point is that there are many ways to process tasks in parallel. Even with 43 milliseconds of overhead, it was still able to process the tasks in parallel (or else the two 1-second tasks would have taken 2000+ milliseconds in total rather than about 1043).

It is fair to suspect that “perhaps Windows Vista is doing a great job with time-slicing and it’s not really doing multi-processing” (see the time slice comment on Multitasking (Windows)). According to that link, Windows time-slices at 20 ms per thread per process (I’ve also read in a Sysinternals article that Vista has “fairer” scheduling than XP and older versions of Windows). That could explain the CreateProcess() result: 2 processes, each owning 1 thread, time-sliced at 20 ms each, totals 40 ms, plus 3 ms of overhead. To be honest, I cannot prove it.

What bothers me more is that when I actually create 8 processes and watch Task Manager, I do see a flash of 8 processes being created, but most of them get assigned to CPU#0 and usually only one gets assigned to CPU#1. Then again, I have not investigated much, because I’d use CreateProcess() for a different purpose and I’d probably force the thread affinity (see Multiple Processors (Windows) for some info on setting affinities). My guess is that because my tasks are so small, Windows was able to push all or most of them onto CPU#0. I’m confident that if your tasks are more heavy-duty, Windows will probably do a fair job of assigning each task to its own CPU.

In any case, that’s the point of this page: make the O/S do the work of processing jobs in parallel, and trust it. You do your part, and let the smart guys at Microsoft (or, on Linux SMP, the kernel developers) do theirs and deliver what they advertise on the box. (From what I recall, only the “Business”, “Ultimate” and “Enterprise” editions of Vista support dual processors. I have “Vista Home Premium”, though, and it apparently supports my dual-core CPU, since Task Manager shows both cores and I can set affinities. As far as I know, the edition limit counts physical processor sockets rather than cores.)

With that said, just as with SMP (Symmetric MultiProcessing), you should be able to trust Windows to distribute threads across multiple processors, and this should be transparent to you. You should just be able to create threads and trust that it will work. From several (work-related) experiences, I’ve verified for myself that even XP Pro does a great job of distributing threads across multiple processors. Again, the point is to create the threads, let them do the processing, and trust the O/S to distribute the tasks to the appropriate or available processors. You can then brag that (thanks to Windows) your application supports multiple processors and scales up in performance (let’s hope) as you run it on systems with more processors.

Finally, when is there a good reason to create sub-processes rather than threads? If you’ve studied the CreateProcess() API, you’ll have noticed immediately that its parameters describe the execution of a “module” (as Microsoft calls it), with data passed in as command-line arguments. Folks who are familiar with *NIX-style O/Ses are much more comfortable with this method because they are used to writing small applications (sometimes wrapped with a GUI front-end). No offense to Windows application writers, but Windows .EXEs are fat! Remember the days of .COM files? A .COM file was required to fit into a small footprint, and a smaller footprint means fewer chances of (instruction) cache misses. I believe that is the formula for parallel processing: keep the jobs small and quick.

Another advantage of parallelizing with processes rather than threads is that it’s easier to implement distributed computing. Each run of the .EXE is a small job, specific to its own purpose and nothing more. Another approach I’ve seen is for the .EXE to be a standalone tool that other tools can invoke to run a job. Have you ever written a simple command-line tool, and then, while writing another program, decided it would be easier to just call that tool from your C code, using the C run-time library’s exec() family or the Win32 WinExec() function? Microsoft recommends using CreateProcess() rather than WinExec().
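
Since CreateProcess() receives the child's arguments as one command-line string, the calling tool has to assemble that string first. Here is a deliberately simplified sketch. BuildCommandLine() and QuoteIfNeeded() are hypothetical helpers, not part of the demo or of any Windows API, and the quoting handles spaces only (no embedded quotes or trailing backslashes).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Wraps text in double quotes when it is empty or contains a space.
// Simplification: embedded quotes and trailing backslashes are NOT handled.
static std::string QuoteIfNeeded(const std::string& text)
{
    const bool needsQuotes = text.empty() || text.find(' ') != std::string::npos;
    return needsQuotes ? "\"" + text + "\"" : text;
}

// Joins a program name and its arguments into a single command-line string,
// separated by single spaces.
std::string BuildCommandLine(const std::string& program,
                             const std::vector<std::string>& arguments)
{
    assert(!program.empty());
    std::string commandLine = QuoteIfNeeded(program);
    for (const std::string& argument : arguments) {
        commandLine += ' ';
        commandLine += QuoteIfNeeded(argument);
    }
    return commandLine;
}
```

For instance, BuildCommandLine("demo.exe", {"P"}) produces the string demo.exe P, which could then be passed as the command-line parameter of CreateProcess(). The program name demo.exe is made up for the example.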

Oh, and keep in mind that nothing prevents you from running multiple threads inside each process. The art of multi-threaded programming takes time to adjust to and master (especially debugging to catch and determine the causes of deadlocks), but it becomes easier once you get the hang of it. You just need to keep practicing.
