Lead Developer, Stardock Entertainment
Published on July 22, 2005 By CariElf In GalCiv Journals
The crashes and memory errors made it impossible for me to test to see if the AI was building starbases, so my main priority this week was debugging the crashes and memory errors. I did some multi-tasking and worked on the fleets in a clean copy of the project while running the game in the AI test mode and waiting for it to crash, but more on that later.

At first, I tried using BoundsChecker, which works well for memory and resource leaks, but doesn't really help much for heap errors, which is what I was trying to find. There were places where the heap errors were more likely to show up. So running in the debugger without BoundsChecker, I started adding in a lot of calls to _CrtCheckMemory and commenting out or otherwise disabling code. While I was narrowing down the cause of the heap errors, I was able to find and fix areas of the code that caused crashes or a lockup, or were potentially non-threadsafe.
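
As a rough illustration of the checkpoint approach described above (a sketch, not the actual GalCiv2 code): the helper names `heapLooksOk`, `checkHeapAt`, `HEAP_CHECKPOINT`, and `updateFleets` are made up for this example. Only `_CrtCheckMemory` is a real debug CRT call, and it is only active in MSVC debug builds, so the sketch falls back to a no-op elsewhere.

```cpp
#include <cassert>
#include <cstdio>

#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
// MSVC debug build: _CrtCheckMemory() walks the debug heap and returns
// FALSE if it finds damaged guard bytes or inconsistent block headers.
inline bool heapLooksOk() { return _CrtCheckMemory() != 0; }
#else
// Other compilers or release builds: there is no debug heap to inspect.
inline bool heapLooksOk() { return true; }
#endif

// Hypothetical helper: report the checkpoint at which the heap was first bad.
inline bool checkHeapAt(const char* file, int line) {
    if (heapLooksOk()) return true;
    std::fprintf(stderr, "heap corrupted at or before %s:%d\n", file, line);
    return false;
}
#define HEAP_CHECKPOINT() checkHeapAt(__FILE__, __LINE__)

// Hypothetical game-loop step with checkpoints bracketing the suspect code.
inline int updateFleets(int fleetCount) {
    assert(fleetCount >= 0);
    HEAP_CHECKPOINT();            // heap was fine before this step
    int moved = fleetCount * 2;   // stand-in for the real work
    HEAP_CHECKPOINT();            // if this one fails, the step above is suspect
    return moved;
}
```

The closer together two checkpoints are, the smaller the stretch of code that can be blamed when the second one fails, which is why adding more calls narrows things down.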

The errors and crashes ended up having nothing to do with the STL. About 20 minutes ago, I thought that I had determined the cause of the heap errors, but the changes I made either didn't fix the problem, or there are other things still in the code causing heap errors. So I'll have to run some more tests on Monday.

I got a good portion of the fleet code in, but I'm still working on the functions that handle collision detection. I'll also still need to hook up the fleet combat window. The starbase animations are going to have to wait until I can get rid of these crashes and have enough of the fleet code in for Jesse to work on fleet battles.

Comments
on Jul 22, 2005
Is it dangerous to hunt bugs? Do they ever fight back or are they more into running away and hiding? What do you do with them when you find them? Eat them, stuff them and mount them on the wall, or is it all about the hunt? Maybe you should share them with your friends and family, I'm sure bugs will make a great gift. Happy hunting
on Jul 23, 2005
Why did you think it would have something to do with the STL? I tend to think that the STL is quite stable and secure to use (it's just that the C++ template syntax was created in order to show programmers what sadism meant).
Don't you have a tool like Purify that helps find memory corruptions? Or is Purify too slow and bloated to work with GC?
I remember hunting a nice function pointer cast for several days. Do you pass function pointers as arguments to functions? A wrong signature can do weird things to the registers.
on Jul 25, 2005

Strange, it didn't post my reply the first time around.

LDiCesare,

Because most of the time when it was crashing, the stack was all STL code, and I had read an article about previous versions of the STL not being threadsafe.  There wasn't anything specific in either the MSDN or on Dinkum's site that said that these issues had been resolved for the newer versions of Visual Studio. 

I hadn't heard about Purify before, but it sounds like BoundsChecker.  The problem with debugging heap errors with either _CrtCheckMemory or BoundsChecker is that it's difficult to tell what actually caused the error, because the error only shows up when something (like the STL code) checks to see if the heap is ok.  You can narrow it down by checking the heap a lot more often, which is what I did. It probably wouldn't be so bad if someone ran it on a regular basis, but we tend to not run it until we have a problem because it slows things down horribly when you debug with BoundsChecker enabled.

We do pass function pointers as arguments, mostly for sorting lists, but we've never had a problem with it.

It looks as if Joe found the last of the areas where deleted memory was being modified.  He told me where it was and I fixed it, and now GC2 seems happy.   So I committed all of my fixes, and I can go back to coding features.

on Jul 25, 2005
Hi CariElf,

I am neither a programmer nor someone in the know, just a gamer.

I missed my calling twenty years ago and love listening to you guys (gals) talk about the programming profession. I love playing your work and hope that you don't stop! You people amaze me and have nothing but my greatest respect. Plus, you get to do what you love, which earns nothing but my envy.

Take Care,

---Ray
on Jul 26, 2005
In debug builds, Microsoft's C runtime adds guard bytes (0xFD) to the start and end of each block allocated on the heap. One way you can sometimes get useful checking done is by seeing if any of these bytes have been overwritten in the areas where _CrtCheckMemory is being called. Also, if you are in _CrtCheckMemory, sometimes getting the pointer at which the check failed can give you some insight into what variables were set in the area. If you know what the values of the variables or objects are, you can often work out which part of the code last manipulated them, and then debug that section of code to find the problem.
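
To make the guard-byte idea concrete, here is a minimal, portable sketch of how such a check can work. It is not the CRT's actual implementation: `guardedAlloc`, `guardsIntact`, and `guardedFree` are hypothetical names, and the block layout is an assumption chosen for the example. The only detail borrowed from the MSVC debug heap is the 0xFD fill value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

const unsigned char kGuardByte = 0xFD;   // fill value the MSVC debug heap uses
const std::size_t kGuardSize = 4;        // guard bytes on each side
// Header keeps the user pointer aligned: [size ... front guard][user][back guard]
const std::size_t kHeaderSize = alignof(std::max_align_t);
static_assert(kHeaderSize >= sizeof(std::size_t) + kGuardSize, "header too small");

// Allocate `size` usable bytes surrounded by guard bytes.
inline void* guardedAlloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kHeaderSize + size + kGuardSize));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof size);                 // remember the size
    unsigned char* user = raw + kHeaderSize;
    std::memset(user - kGuardSize, kGuardByte, kGuardSize);  // front guard
    std::memset(user + size, kGuardByte, kGuardSize);        // back guard
    return user;
}

// True if neither guard region has been overwritten.
inline bool guardsIntact(const void* userPtr) {
    const unsigned char* user = static_cast<const unsigned char*>(userPtr);
    std::size_t size = 0;
    std::memcpy(&size, user - kHeaderSize, sizeof size);
    const unsigned char* front = user - kGuardSize;
    const unsigned char* back = user + size;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (front[i] != kGuardByte || back[i] != kGuardByte) return false;
    }
    return true;
}

inline void guardedFree(void* userPtr) {
    if (userPtr) std::free(static_cast<unsigned char*>(userPtr) - kHeaderSize);
}
```

A write one byte past the end of an 8-byte block lands in the back guard, so `guardsIntact` returns false the next time it is called. As with _CrtCheckMemory, the check only tells you that the damage happened some time before the call, not which statement did it.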
on Jul 26, 2005

Ray, thanks for the good word.

James,

Isn't that how _CrtCheckMemory works? It knows if someone did something bad if the bytes are overwritten, and it throws an exception?  It sounds like your advice would work well for buffer overruns, which BoundsChecker detects, but we were dealing with memory being modified after it was deleted, so the memory was always junk when I looked it up in the memory window.   
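
For the write-after-delete case specifically, one common technique is to hold on to freed blocks for a while, fill them with a known pattern, and later check whether anything has changed them. The sketch below is only an illustration of that idea: `FreeQuarantine` and its members are hypothetical names, and the 0xDD fill is borrowed from the MSVC debug heap, which offers a similar option through the `_CRTDBG_DELAY_FREE_MEM_DF` flag of `_CrtSetDbgFlag`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

const unsigned char kDeadByte = 0xDD;  // pattern written over "freed" memory

// Hypothetical quarantine: blocks handed to retire() are not released
// until the quarantine itself is destroyed, so stale writes stay visible.
class FreeQuarantine {
public:
    FreeQuarantine() = default;
    FreeQuarantine(const FreeQuarantine&) = delete;
    FreeQuarantine& operator=(const FreeQuarantine&) = delete;
    ~FreeQuarantine() {
        for (const Block& b : blocks_) std::free(b.ptr);
    }

    // Call this instead of free(): fill the block with the dead pattern
    // and keep it around. `p` must have come from malloc.
    void retire(void* p, std::size_t size) {
        std::memset(p, kDeadByte, size);
        blocks_.push_back(Block{static_cast<unsigned char*>(p), size});
    }

    // Index of the first retired block that no longer holds the dead
    // pattern (someone wrote to it after it was retired), or -1 if none.
    int firstModified() const {
        for (std::size_t i = 0; i < blocks_.size(); ++i) {
            for (std::size_t j = 0; j < blocks_[i].size; ++j) {
                if (blocks_[i].ptr[j] != kDeadByte) return static_cast<int>(i);
            }
        }
        return -1;
    }

private:
    struct Block {
        unsigned char* ptr;
        std::size_t size;
    };
    std::vector<Block> blocks_;
};
```

Because the quarantine still owns the memory, a write through a stale pointer changes the pattern instead of corrupting some unrelated live object, and the block index tells you which allocation was touched. The cost is extra memory and the time spent scanning.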

 

on Jul 26, 2005
I'm assuming you guys are taking advantage of GFLAGS (which allows you to do all sorts of heap debugging)?
on Jul 26, 2005
Are you sure it was junk and not just an object that looked like junk? Hmmm . . . I've never had to deal with using memory after it's deleted; since I'm still in school I have thus far managed to stay on the stack most of the time.

If the part of memory that is causing the problem is deterministic (i.e. it happens at the same address every time you do it), then one possibility is laying down a watchpoint for when that memory is accessed. If you can do that, then it might be possible to locate the problem.

Anyhow, glad to hear you guys solved it. One other possible solution is to wrap things that have a good chance of getting deleted sometime in a smart pointer such as boost::shared_ptr, if you guys are using boost at all. Looking forward to the final product.
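
A small sketch of what that could look like. It uses `std::shared_ptr` and `std::weak_ptr`, the standard-library descendants of the Boost smart pointers, so that it compiles without Boost; `Ship` and `TargetingComputer` are invented for the example and have nothing to do with the real game code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical game object that may be destroyed at any time.
struct Ship {
    std::string name;
    explicit Ship(std::string n) : name(std::move(n)) {}
};

// Hypothetical observer that must not keep the ship alive, and must not
// touch it after it is gone. It holds a weak_ptr instead of a raw pointer.
struct TargetingComputer {
    std::weak_ptr<Ship> target;

    std::string targetName() const {
        // lock() yields an empty shared_ptr once the ship has been destroyed,
        // so there is no way to read or write freed memory through it.
        if (std::shared_ptr<Ship> ship = target.lock()) return ship->name;
        return "<no target>";
    }
};
```

After the last `shared_ptr` to the ship is reset, `targetName()` reports that there is no target instead of dereferencing a dangling pointer.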
on Jul 27, 2005
It looks as if Joe found the last of the areas where deleted memory was being modified. He told me where it was and I fixed it, and now GC2 seems happy.

Well, if it was accessing freed memory, Purify would have found it. I don't know how BoundsChecker works, but it seems to be only memory coloration? At work we used to use Purify, but it was too slow (instrumenting the DLLs takes a lot of time and eats space); we stopped using it only because we replaced it with an in-house tool.
on Jul 27, 2005

Are you sure it was junk and not just an object that looked like junk? Hmmm . . . I've never had to deal with using memory after it's deleted; since I'm still in school I have thus far managed to stay on the stack most of the time.
  When you start having a lot of pointers to something, it's easier to do, particularly if you're using multiple threads.  Direct3D is based on COM, and the objects track themselves so that they don't get deleted until everything that is pointing to them releases the pointer.  I rather like that system.
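
A stripped-down sketch of that kind of self-tracking object, for anyone who hasn't seen it. This is not real COM (there is no IUnknown or QueryInterface here), and `RefCounted` and `Texture` are made-up names; it only shows the AddRef/Release pattern. The counter is a plain integer, so this version is not thread-safe.

```cpp
#include <cassert>

// Minimal COM-style reference counting: the object deletes itself when
// the last holder releases it.
class RefCounted {
public:
    unsigned long AddRef() { return ++refs_; }

    unsigned long Release() {
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;   // last reference is gone
        return remaining;
    }

protected:
    // Non-public destructor: callers cannot delete the object directly,
    // they have to go through Release().
    virtual ~RefCounted() {}

private:
    unsigned long refs_ = 1;  // the creator holds the first reference
};

// Hypothetical resource that records when it is actually destroyed.
class Texture : public RefCounted {
public:
    explicit Texture(bool* destroyedFlag) : destroyed_(destroyedFlag) {}

private:
    ~Texture() override { *destroyed_ = true; }
    bool* destroyed_;
};
```

Everything that stores the pointer calls `AddRef()`, and calls `Release()` when it is done, so the object cannot disappear while something still refers to it.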

If the part of memory that is causing the problem is deterministic (i.e. it happens at the same address every time you do it), then one possibility is laying down a watchpoint for when that memory is accessed. If you can do that, then it might be possible to locate the problem.

Nope, it was in different places; it was occurring in objects or data types that are constantly being added or deleted.

I'm assuming you guys are taking advantage of GFLAGS (which allows you to do all sorts of heap debugging)?
  Actually, no, I hadn't heard of it before.  I'll check into it. 

At work we used to use Purify, but it was too slow (instrumenting the DLLs takes a lot of time and eats space); we stopped using it only because we replaced it with an in-house tool.

BoundsChecker also uses instrumentation, which is why it's so slow.  There are various settings that you can check for memory debugging, but they are all pretty slow.  The slowness is the main thing that detracts from its usefulness.  For memory leaks, I can let GC2 run in BoundsChecker in AI test mode on one machine and ignore it while I work on other stuff on another computer, and then fix the memory leaks after shutting down GalCiv2.  When checking for the heap errors, I was constantly changing code and restarting GalCiv2 after a heap error was detected.

on Jul 27, 2005
link about GFLAGS: http://www.osronline.com/ddkx/ddtools/gflags_00s3.htm

"GFlags (gflags.exe), the Global Flag Editor, enables and disables advanced internal system diagnostic and troubleshooting features. You can run GFlags from a Command Prompt window or use its graphical user interface dialog box."

Highlights:

-The Enable heap free checking flag validates each heap allocation when it is freed.
-The Enable heap tagging flag assigns unique tags to heap allocations.
-The Enable heap tail checking flag checks for buffer overruns when the heap is freed.
-The Enable heap validation on call flag validates the entire heap each time a heap function is called.
-The Enable page heap flag turns on page heap verification, which monitors dynamic heap memory operations, including allocate and free operations, and causes a debugger break when the verifier detects a heap error.