Lead Developer, Stardock Entertainment
Published on May 1, 2006 By CariElf In GalCiv Journals
Our top priority since 1.1 came out has been resolving the memory and performance issues that people have been having. BoundsChecker didn't come up with any significant memory leaks, so that left us with checking the change logs to see what we might have done to cause the issues. We were obviously doing something wacky, because people with 4 GB page files were still getting out of virtual memory errors.

One of the changes we made was to the way that the shipcfg files were parsed. The shipcfg files are really just ini files, which I think Joe picked for the shipcfg format because ini files are supposed to be fast for reading and writing. They have two main disadvantages. The first is that you can't have an ini file greater than 64 KB on Windows ME and Windows 98; it won't be able to read anything past the 64 KB mark. The second is that you either have to know the size of the longest section name and the longest key name, or else know what all the section and key names are, before you go to read in the data from the file. If you don't know the section and key names, you have to call APIs to get all the section and key names that are in the file, and you need a buffer to store them.
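To illustrate why a buffer is needed: the post does not name the APIs, but on Windows the usual candidates (`GetPrivateProfileSectionNames`, or `GetPrivateProfileString` with a null key) fill a caller-supplied buffer with names separated by NUL characters and terminated by an extra NUL. Below is a minimal, portable sketch of walking such a buffer. `splitNameList` is a hypothetical helper written for this example, not GC2 code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a buffer laid out as "name1\0name2\0\0"
// into individual strings. `size` is the number of valid bytes in `buf`.
std::vector<std::string> splitNameList(const char* buf, std::size_t size) {
    assert(buf != nullptr);
    std::vector<std::string> names;
    std::size_t pos = 0;
    // An empty name (a NUL right where a name should start) ends the list.
    while (pos < size && buf[pos] != '\0') {
        // Find the end of the current name without running past the buffer.
        std::size_t end = pos;
        while (end < size && buf[end] != '\0') ++end;
        names.emplace_back(buf + pos, end - pos);
        pos = end + 1;  // skip the separator
    }
    return names;
}
```

If the buffer handed to the API is too small, the list is simply cut short, which is the failure mode a fixed-size buffer runs into once a file has enough sections.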

There are two ways to allocate memory: static and dynamic. Static means that you always allocate the same amount of memory, no matter how much of it you actually need to use. If you allocate too much, you'll be taking up memory you aren't using, but if you don't allocate enough, you'll run into problems because you don't have enough to use. Dynamic memory is allocated as you need it, when you need it. Since you're in charge of dynamic memory, you have to remember to deallocate (release) it. When you forget to deallocate it, it becomes a memory leak. As far as your program and OS are concerned, that memory is still being used and is unavailable to be allocated to something else. Another bad thing about dynamic allocation is that you can fragment memory.
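As a small sketch of the difference, assuming a made-up 16-byte limit and hypothetical function names: the fixed-size version silently truncates anything that does not fit, while the dynamically sized version takes exactly what it needs. Using `std::vector` here also means the release happens automatically, which sidesteps the forgotten-deallocation leak.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Fixed-size ("static") buffer: always 16 bytes, however long the input is.
// Returns how many characters actually fit; anything longer is truncated.
std::size_t copyToFixed(const std::string& text, char (&out)[16]) {
    const std::size_t room = sizeof(out) - 1;  // keep one byte for the NUL
    const std::size_t n = text.size() < room ? text.size() : room;
    assert(n < sizeof(out));
    std::memcpy(out, text.data(), n);
    out[n] = '\0';
    return n;
}

// Dynamically sized buffer: exactly as large as the input requires.
// std::vector frees its memory when it goes out of scope.
std::vector<char> copyToDynamic(const std::string& text) {
    return std::vector<char>(text.begin(), text.end());
}
```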

I'm assuming that most of you have seen Windows' disk defrag program. If you haven't, you might want to go to Start->All Programs->Accessories->System Tools and run it, because your hard drive probably needs it. It will also give you a visual indication of what I'm talking about in this paragraph. When you create or copy files to your hard drive, the files are copied into consecutive blocks of memory. If you delete one or more of those files, that leaves a hole in memory. Depending on how big the hole is, it might get filled with other files. But if it's too small, it's just wasted. Disk defrag goes through your hard drive and tries to move stuff around so that it is stored more efficiently, with bigger blocks of available memory. Fragmentation can also happen in system memory, aka RAM. So even if you deallocate dynamic memory, there's a chance that the memory you released won't be used again by your program, so it keeps using more memory. When the program runs out of available RAM, it will start using virtual memory. Virtual memory is really just a section of the hard drive that is set aside for the operating system's use when it runs out of RAM.
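The effect can be reproduced with a toy model. The sketch below is a deliberately simplified first-fit allocator over a small arena (the `ToyHeap` class and its unit sizes are invented for illustration and say nothing about how the Windows heap actually works). It shows the core symptom: plenty of memory is free in total, but no single hole is big enough for the next request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy arena: each element is one unit of memory, true = in use.
class ToyHeap {
public:
    explicit ToyHeap(std::size_t units) : used_(units, false) {}

    // First-fit: return the offset of the first free run of `n` units, or -1.
    long allocate(std::size_t n) {
        assert(n > 0);
        std::size_t run = 0;
        for (std::size_t i = 0; i < used_.size(); ++i) {
            run = used_[i] ? 0 : run + 1;
            if (run == n) {
                const std::size_t start = i + 1 - n;
                for (std::size_t j = start; j <= i; ++j) used_[j] = true;
                return static_cast<long>(start);
            }
        }
        return -1;  // no contiguous run is large enough
    }

    // Mark `n` units starting at `offset` as free again.
    void release(long offset, std::size_t n) {
        assert(offset >= 0);
        for (std::size_t j = 0; j < n; ++j) {
            used_[static_cast<std::size_t>(offset) + j] = false;
        }
    }

    std::size_t totalFree() const {
        std::size_t count = 0;
        for (bool u : used_) if (!u) ++count;
        return count;
    }

    std::size_t largestFreeRun() const {
        std::size_t best = 0, run = 0;
        for (bool u : used_) {
            run = u ? 0 : run + 1;
            if (run > best) best = run;
        }
        return best;
    }

private:
    std::vector<bool> used_;
};
```

For example, fill a 16-unit heap with four 4-unit blocks and release the first and third. Eight units are then free, but the largest hole is four units, so a request for eight fails.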

So how does all of this relate to GC2's issues? The shipcfg files originally used static memory allocation for the buffer that was used to get the section and keynames, but if you added enough jewelry and components to the ship, the buffer wasn't big enough. We needed a quick way to fix this that wouldn't involve re-engineering the shipcfg code and that wouldn't make reading in the shipcfg files take longer. The quickest change to implement was to switch from using static allocation to using dynamic allocation. In order to make sure that the buffer was big enough, I suggested that we dynamically allocate a buffer that was the same size as the file. The bad thing about this solution was that it didn't take into account that we weren't saving the parsed data values in memory after reading in the file for the first time. Every time you built a colony ship, the colony ship cfg file was read in and a buffer was allocated and deallocated. So I'm thinking that we were probably fragmenting memory.

Last week, I wrote code to make GC2 save the parsed values from the shipcfg files in memory, so that they would only have to be read in once. It would mean that we're hanging on to a little more memory than we were before, but it should cut down on the fragmentation. It definitely cuts down on the amount of time spent creating ship graphics, which you will notice when loading a save game. If it's not enough, we may still have to change the shipcfg file format, but keeping the parsed results in memory will help keep load times down. I wrote the new shipcfg code in such a way that I should only have to replace one function if we need to switch the file format, the one that actually reads in the data. The code that uses the stored data to put together a ship with all its components will not need to be changed.
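A minimal sketch of that idea, under stated assumptions: the `ShipConfig` type, the `ShipConfigCache` class, and the loader callback are all hypothetical names chosen for this example, and the real GC2 data structures are not shown in this journal. The point is the shape: one replaceable function knows the file format, and everything else only sees the stored result.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical parsed form of a shipcfg file: key -> value.
using ShipConfig = std::map<std::string, std::string>;

class ShipConfigCache {
public:
    // `loader` is the only piece that knows the on-disk format, so switching
    // from ini to something else means swapping this one function.
    using Loader = std::function<ShipConfig(const std::string& path)>;

    explicit ShipConfigCache(Loader loader) : loader_(std::move(loader)) {
        assert(loader_);
    }

    // Parse the file on first request; afterwards return the stored copy.
    const ShipConfig& get(const std::string& path) {
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            it = cache_.emplace(path, loader_(path)).first;
        }
        return it->second;
    }

private:
    Loader loader_;
    std::map<std::string, ShipConfig> cache_;
};
```

With this in place, building a hundred colony ships triggers one parse and one temporary read buffer instead of a hundred, at the cost of keeping the parsed values resident.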

Another problem area is the save game code. Writing to the hard disk is slow, so it's quicker to create the file in memory and then write it out to the file in one fell swoop rather than writing out each datum as you go. Since the exact file size of a given save game is unknown, the save game code uses dynamic memory allocation. Each object in the game (i.e. ships, planets, civs, etc.) has its own function to create a block of memory containing all the data it needs to store, which it then passes to the main save game function, and is added to the main block. This is using the same code as in GC1, but we had less dynamic data in GC1. Originally, all of the buffers started out as 1 KB and whenever the buffers needed to increase in size, they would allocate their current size + 1 KB and copy the data from the old buffer to the new buffer, then deallocate the old buffer. The process of growing and copying the buffers was taking up more time than the actual saving of data, and I needed a way to improve performance without doing major surgery to the save game code before version 1.0 came out. So I did some profiling on how big the buffers were for a gigantic galaxy in the first few turns of the game and how much they grew, and used those numbers to change the initial buffer sizes and how much they grew by for each data object. This was, admittedly, more of a band-aid than an actual fix.
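To see why growing by a fixed 1 KB is so costly, here is a small simulation that counts reallocations and bytes copied for a single buffer. The function names are invented for this sketch, it assumes the old buffer is full every time it grows, and the doubling strategy is shown only as a common point of comparison, not as what GC2 does.

```cpp
#include <cassert>
#include <cstddef>

struct GrowthCost {
    std::size_t reallocations;  // how many times a new buffer was allocated
    std::size_t bytesCopied;    // total bytes moved from old buffers to new ones
};

// Simulate filling a buffer up to `total` bytes when it starts at `initial`
// bytes and grows by a fixed `step` each time it is full.
GrowthCost fixedStepGrowth(std::size_t total, std::size_t initial, std::size_t step) {
    assert(initial > 0 && step > 0);
    GrowthCost cost{0, 0};
    std::size_t capacity = initial;
    while (capacity < total) {
        cost.bytesCopied += capacity;  // the full old buffer is copied over
        capacity += step;
        ++cost.reallocations;
    }
    return cost;
}

// Same simulation, but the capacity doubles each time instead.
GrowthCost doublingGrowth(std::size_t total, std::size_t initial) {
    assert(initial > 0);
    GrowthCost cost{0, 0};
    std::size_t capacity = initial;
    while (capacity < total) {
        cost.bytesCopied += capacity;
        capacity *= 2;
        ++cost.reallocations;
    }
    return cost;
}
```

For a single 2 MB buffer starting at 1 KB, the fixed 1 KB step needs 2,047 reallocations and copies about 2 GB in total, while doubling needs 11 reallocations and copies about 2 MB. Larger initial sizes and step sizes, as in the profiling-based fix, shrink the first number without changing its quadratic shape.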

Apart from adding some new things that needed to be saved, I don't think that we've really touched the save game code much. However, it is still fairly inefficient because of all the buffer growing and copying. So the next change I have started to make is to make all the data objects use one buffer. Once that is done, I can make further optimizations to the code like initializing the buffer size based on the galaxy size, and see if it does more good than harm to keep the buffer in memory so that it doesn't have to allocate and deallocate 2-13 MB (or more) every time the game saves. At the very least, making everything use one buffer should cut down on fragmentation and make the saving go quicker. I will also be reviewing the code to make sure that only necessary data is being saved rather than being recalculated, in an effort to cut down loading time.
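One possible shape for the single-buffer approach is sketched below. Everything here is an assumption made for illustration: the `SaveBuffer` class, the `Ship` and `Planet` structs, and their fields are hypothetical and much simpler than real game objects. Each object appends straight into the shared buffer, the buffer is sized once from an estimate, and `reset` keeps the allocation around for the next save.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One buffer shared by every object that saves itself.
class SaveBuffer {
public:
    // Reserve an estimate up front (for example, derived from galaxy size)
    // so that most saves never have to grow the buffer at all.
    explicit SaveBuffer(std::size_t expectedBytes) { bytes_.reserve(expectedBytes); }

    void write(const void* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        const auto* p = static_cast<const std::uint8_t*>(data);
        bytes_.insert(bytes_.end(), p, p + size);
    }

    template <typename T>
    void writeValue(const T& value) { write(&value, sizeof(T)); }

    // Discard the contents but keep the allocation for the next save.
    void reset() { bytes_.clear(); }

    std::size_t size() const { return bytes_.size(); }
    std::size_t capacity() const { return bytes_.capacity(); }

private:
    std::vector<std::uint8_t> bytes_;
};

// Hypothetical game objects: each appends directly to the shared buffer
// instead of building a private block that gets copied afterwards.
struct Ship {
    std::int32_t id;
    std::int32_t hitPoints;
    void save(SaveBuffer& out) const { out.writeValue(id); out.writeValue(hitPoints); }
};

struct Planet {
    std::int32_t id;
    std::int32_t population;
    void save(SaveBuffer& out) const { out.writeValue(id); out.writeValue(population); }
};
```

Whether keeping the buffer alive between saves is worth the resident memory is exactly the trade-off mentioned above; the sketch only shows that the choice comes down to calling `reset` versus destroying the object.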

Once I've finished making our memory usage more efficient, I'll start working on the modding stuff again.



Edit: Ok, since I'm getting e-mails and comments about this, I would like to clarify that I am not blaming your hard drives for causing the crashes. The point of this article is that I am working on resolving the memory issues. The comment about running disk defrag was meant as a general statement that you should regularly defrag your hard drives, and to provide a visual representation of what is happening when memory is fragmented.

Update: In my sticky thread here Link I put instructions and a link for an unofficial test exe.
--Cari

Comments (Page 3)
on May 02, 2006
I'm surprised though that heap fragmentation is actually an issue... would it not be required that the allocations be both asymptotically increasing in size, and their deallocations be sporadic? Maybe do a profile to see if things are hanging around too long and try and deallocate them earlier if possible?


I'm not sure about the requirements for memory fragmentation. The symptoms seemed to match the problems we're seeing, though, and the changes to how the shipcfg files were read in seemed like a likely candidate. I think that we will need to do profiles even with these changes to see what is using all the memory, because GalCiv2 does seem to be using a lot.

So the bottom line is that quick solutions are very often NOT the BEST solutions. So avoid quick solutions which cause your customers unnecessary grief. Secondly, testing with nothing less than 2 GB of physical memory was a very dumb move. Inexcusable. I have 1 GB memory and most people have 500 MB. Since I saw posts on how GC II 1.0 was optimized for lesser memory configurations, the same should have applied to your testing process for 1.1. Assumption is the Mother of all f-ups as the vulgar expression goes and Stardock blew it royally here. Hopefully this will be an OBJECT lesson that Windows blinded developers will learn for a long time to come.


os2wiz, would you like some help down from your soapbox? It's this kind of response that makes me not want to write dev journals because there's always someone like you with a holier than thou attitude to make condescending remarks. I don't mind criticism, when it's not handed down in such an insulting manner. If you can't be civil, keep your snotty remarks to yourself.

That being said, yes, we should have tested it on our lower end test machines, which we will definitely be doing before we release another update.
on May 02, 2006
Actually I have always been scared of using defrag. On a large drive it is an overnight job, and if the power goes out your system is pretty much screwed ... no? Please correct me if I am wrong here ...


Hmm, I don't know. I've never had that happen. I may have to go out and buy an automatic power backup now. My new computer has 250 GB hard drives.

on May 02, 2006
So the bottom line is that quick solutions are very often NOT the BEST solutions. So avoid quick solutions which cause your customers unnecessary grief. Secondly, testing with nothing less than 2 GB of physical memory was a very dumb move. Inexcusable. I have 1 GB memory and most people have 500 MB. Since I saw posts on how GC II 1.0 was optimized for lesser memory configurations, the same should have applied to your testing process for 1.1. Assumption is the Mother of all f-ups as the vulgar expression goes and Stardock blew it royally here. Hopefully this will be an OBJECT lesson that Windows blinded developers will learn for a long time to come.


That seems rather uncalled for. The statement was that the developers' and artists' computers were state of the art, not the testers'. They also beta tested on HUNDREDS of computers. That is what a PUBLIC beta is for. The reason this slipped through is not due to not testing on lower end systems (this occurs on high end systems as well), but due to the fact that none of the beta testers wrote in about it.

Quick solutions may not be the best, but I'd rather have a quick solution for the short term while they work on a better one than not play the game for three months. Stardock is doing 10 times the support most companies do, and you deserve to be flamed for treating them with such disrespect after that.
on May 02, 2006
Hmm, I don't know. I've never had that happen. I may have to go out and buy an automatic power backup now. My new computer has 250 GB hard drives.

I, unfortunately, had the power go out when defragging. Kills the computer, pretty much. Now, I'm a firm believer in large backup power supplies. (Losing 4 term papers sucks)

on May 02, 2006
You are right about that. The real danger is brown outs though, as they can destroy parts not just corrupt files. I have a UPS system with AVR that will run my whole office (5+ 19" CRT monitors) for more than an hour should I need it. Well worth the $250 when you consider that one of those monitors alone is worth more than that.

I would suggest a device from APC's RS series. Excellent devices with plenty of AVR capabilities and extra power for those extra fans and hard drives.
on May 03, 2006
why don't you save the ship configs as binaries instead of parsing ascii files?
on May 03, 2006
Strange, that, as I have had a power cut in the past whilst defragging the HD. Never had a problem with the PC since. Just fired it up in safe mode and redid the defrag again.

BTW Cari, is there an ETA for this fix? A lot of my members and others here have now 'hung up' the game until the mem leak is fixed, as they say it is not now worth playing. Some even had it happen on a small map too. Personally I have yet to see this memory error, although I do have 1024 MB of RAM and 3 GB virtual memory allocation on a separate drive.

Thanks

DG
on May 03, 2006
why don't you save the ship configs as binaries instead of parsing ascii files?


Actually, I guess that would be a better solution in terms of fast I/O than changing to XML if the changes I already made aren't enough. We usually don't use binary format for data files, though, so that people have an easier time modding them.

BTW Cari is there an ETA for this fix


I may be able to put up an unofficial test build with my changes later today as a link.
on May 03, 2006
I may be able to put up an unofficial test build with my changes later today as a link.

That would be great!
on May 03, 2006
I may be able to put up an unofficial test build with my changes later today as a link.


Looking forward to it as are most of my members

on May 03, 2006
I have had very long save times since before the 1.1 beta. It never crashed though until recently. By the way, it's not too difficult to pull out 1 GB of memory from the developers' computers. It takes about 3 minutes to open the case, pop out the one or two memory modules involved, and close the case again. I think my remarks were not only on the money, but also humorously sarcastic. You failed to note the capitalized OBJECT lesson and the "windows-blinded" developers. I also would enjoy pointing out that OS/2 uses memory FAR more efficiently than Windows, and that the same code written to the OS/2 platform would not have required the same physical memory, and the swap file works far better than the so-called virtual memory of Windows. My remarks were actually a triple entendre: a swipe at Stardock on the specific problem, a swipe using the name of Stardock products, and a swipe at the Windows platform as still quite inept in certain APIs and efficiency of resource utilization.
on May 03, 2006
Cari, does the test build rectify the issue with the saved game files not loading? Or is that issue something else completely?

Thought I would ask so you hopefully don't get a hundred people asking the same thing
on May 03, 2006
I'm not sure about the requirements for memory fragmentation. The symptoms seemed to match the problems we're seeing, though, and the changes to how the shipcfg files were read in seemed like a likely candidate. I think that we will need to do profiles even with these changes to see what is using all the memory, because GalCiv2 does seem to be using a lot.


Neither am I, but I think sporadic deallocation would be the key to any heap free-space fragmentation. But looking back at your journal you say you deallocate the buffer after parsing, so I'd say something else must be at work. Unless the allocator isn't consolidating the free space blocks? I can see why they might do that for the sake of speed, but I would imagine that would make for a really poor allocator. And even if they did, as long as a ship parse completed once subsequent parses (of the same file) should be able to re-use the same chunk of memory, so they shouldn't contribute to any fragmentation.

I can see why you think of heap fragmentation when the virtual mem is being eaten up, as you'll see that when there's fragmentation, but unless the allocator is brain-dead and not consolidating free blocks there must be a large amount of memory that is staying allocated to cause any free-space fragments. And if that's the case then the real problem is that memory usage is continually increasing... any fragmentation would just cause the allocations to fail sooner. i.e., eliminating them would only delay the allocation failures, not solve them.

Heh, I guess this qualifies as armchair debugging eh? Please keep up with the journals, I for one enjoy reading them.
on May 03, 2006
Before I read the entire list of comments I wanted to add this.

I have a new dual CPU Pentium D 820 2.8 GHz x2 with 2 GB RAM.

I noticed that the load and save times were dramatically increased when I used the Terran Fleet Pack from the download library and I had a lot of transports and constructors with that extra jewelry running around in a large or better universe in the late game.

1.10 seems to have made this better, but I definitely concur that the extra ship components were the severe memory problem. It was very obvious in playing a game with and without those very detailed ship designs. Without them the game flew on my machine; with them it started to crawl during the save games and the load times took minutes instead of seconds.
on May 03, 2006
I really do not see the uproar over my comments as being warranted. They were stated forcefully, but not insultingly. Carielf has a very thin skin, indeed, if my caustic comments, which were entirely accurate, get under her skin. There is NO reason for Stardock not to have a handful of computers used for testing other than the state of the art developers' machines. I have participated in betas before for other software firms and most of the better ones do have a few variable configuration machines for internal testing, not a complete reliance on the public testers. It is always better public relations to catch the mistake in house before the release of a finished product. I will not apologize for my comments, which were accurate, insightful, and wryly humorous. I am not in the habit of kissing butt to have my views respected. They stand on their own merit. If you can't take the heat, get out of the kitchen.
I do not wish an antagonistic relationship, but I am direct in what I say and I am not going to change that for anyone. Unless of course you have a team of assassins ready to be dispatched to my home. Then slap me silly please and accept my humble apologies. Kill the messenger, kill the truth.