Adventures with Fritz

This is an article about testing and some of the problems I encountered during engine-engine matches using FRITZ as base software. I believe it is a must-read for users who like to play engine-engine matches with Pro Deo. It may also be important to my fellow chess programmers, because I don't know whether these problems also occur when they test their own engines.

This article will be posted on the CCC discussion board in the hope of raising awareness, receiving useful comments, and asking other testers and chess programmers to confirm or deny the problem described below.


For 4-5 years I have been using the eng-eng match technique as the final step in testing the changes I make. During the first 3-4 years the eng-eng testing was done under the REBEL DOS interface, but this testing was limited because the engine could only play against itself. Once I had made my engine available to run under other interfaces, I thought it would be an improvement to move to a different eng-eng testing environment that allowed me to test against more opponents.

Of the alternatives I chose the FRITZ software, mainly because of its user-friendly eng-eng match software. I created a set of 100 balanced opening positions and 4 fixed sparring engines (Fritz8, Shredder7, Junior8 and Hiarcs8) and let them play on 4 PCs at various levels, each producing 200 games, thus 4x200 = 800 games in total.
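The arithmetic of this setup (100 openings, 200 games per opponent) is consistent with every opening being played once with each color. The article does not state this explicitly, so treat it as an assumption. A minimal Python sketch of such a schedule; the engine names come from the text, while `build_schedule` is a hypothetical helper:

```python
from itertools import product

# Sparring engines named in the article.
OPPONENTS = ["Fritz8", "Shredder7", "Junior8", "Hiarcs8"]
NUM_OPENINGS = 100  # balanced opening positions


def build_schedule(opponents, num_openings):
    """Return (opponent, opening_index, color_of_tested_engine) tuples.

    Assumption (not stated in the article): each opening is played twice
    against each opponent, once with each color.
    """
    schedule = []
    for opponent, opening in product(opponents, range(num_openings)):
        for color in ("white", "black"):
            schedule.append((opponent, opening, color))
    return schedule


schedule = build_schedule(OPPONENTS, NUM_OPENINGS)
games_per_opponent = {
    name: sum(1 for game in schedule if game[0] == name) for name in OPPONENTS
}
print(len(schedule))        # 800
print(games_per_opponent)   # 200 games for each opponent
```

Playing each opening with both colors cancels out any bias in the opening positions themselves, which is one reason such schedules are common in engine testing.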

Testing is done without any learning activated, with no opening books, the same hash table size and the same engine parameters; in short, all randomness that might influence the course of a game is excluded. Re-running the test should therefore produce an identical result, or something very close to it.
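Checking whether a re-run "produces an equal result" can be done on the total score and, more revealingly, per opponent. A minimal sketch, assuming results are recorded as points and games per opponent; `compare_runs` and all numbers below are hypothetical illustrations, not data from these matches:

```python
def compare_runs(run_a, run_b, tolerance=1.0):
    """Compare two runs given as {opponent: (points, games)} dicts.

    Returns the total score of each run (in percent), whether the totals
    agree within `tolerance` percentage points, and the per-opponent change.
    Assumes both runs contain the same opponents.
    """
    def total(run):
        points = sum(p for p, _ in run.values())
        games = sum(g for _, g in run.values())
        return 100.0 * points / games

    per_opponent = {
        name: 100.0 * (run_b[name][0] / run_b[name][1] - run_a[name][0] / run_a[name][1])
        for name in run_a
    }
    score_a, score_b = total(run_a), total(run_b)
    return score_a, score_b, abs(score_a - score_b) <= tolerance, per_opponent


# Hypothetical results: (points, games) per sparring engine.
run_a = {"Fritz8": (80, 200), "Shredder7": (78, 200), "Junior8": (84, 200), "Hiarcs8": (86, 200)}
run_b = {"Fritz8": (81, 200), "Shredder7": (76, 200), "Junior8": (83, 200), "Hiarcs8": (89, 200)}

score_a, score_b, reproducible, per_opponent = compare_runs(run_a, run_b)
print(f"{score_a:.2f}% vs {score_b:.2f}%, within tolerance: {reproducible}")
print({name: round(delta, 2) for name, delta in per_opponent.items()})
```

The per-opponent breakdown matters because the total can stay inside the tolerance while individual opponents drift in opposite directions.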

This procedure was repeated several times to verify its reliability, and without exception all of the replayed 800-game matches stayed within an acceptable error margin of -1% to +1%. The system seemed to work: I had created a reliable environment for testing program changes, simply running the 800-game eng-eng match to see whether a change produced a higher match score. So far so good.
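For comparison, here is a sketch of the standard error a match score would have if the games behaved like independent random samples. This is an assumption: with learning, books and randomness disabled, repeated games are not independent draws in this sense. The win/draw/loss counts are hypothetical:

```python
import math


def score_and_standard_error(wins, draws, losses):
    """Match score and its standard error, both as fractions of 1.

    Assumes every game is an independent sample scoring 1, 0.5 or 0.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    variance = (wins * (1 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0 - score) ** 2) / n
    return score, math.sqrt(variance / n)


# Hypothetical 800-game match.
score, stderr = score_and_standard_error(wins=200, draws=240, losses=360)
print(f"score {score:.1%}, standard error {stderr:.2%}")
```

With these made-up counts the standard error is about 1.4 percentage points. Whether a fixed-condition re-run should agree more tightly than that depends on how deterministic the setup really is.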


Over time I noticed something odd: the match results against Shredder7 and Junior8 went down considerably, while the match score against Hiarcs8 went up, also considerably, and this happened as a consistent pattern. The pattern was so constant that it made me suspicious, so I ran the initial match again, and there it was: it produced a -3% match result, meaning a loss of about 20 Elo points for no good reason. My test environment was no longer reliable. Houston, there is a problem.
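The step from "-3%" to "about 20 Elo" follows from the standard logistic Elo model, which relates an expected score p to a rating difference D. A worked sketch, assuming that model and a match score near 40% (the range of the results reported in this article):

```latex
p = \frac{1}{1 + 10^{-D/400}}
\quad\Longleftrightarrow\quad
D = -400 \log_{10}\!\left(\frac{1}{p} - 1\right)

\frac{dD}{dp} = \frac{400}{\ln 10 \; p\,(1-p)}
\approx \frac{400}{2.3026 \times 0.40 \times 0.60}
\approx 724 \ \text{Elo per unit of score}

\Delta D \approx 724 \times (-0.03) \approx -22 \ \text{Elo}
```

Near a 50% score the slope is slightly smaller (about 695 Elo per unit), so a 3% drop corresponds to roughly 20 Elo in either case.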

I double-checked all the settings that could explain this sudden fluctuation in score and found nothing; all the conditions were the same. Then I noticed that there had been one seemingly unimportant change after all: at a certain moment I had set the main engine (the one that is loaded at program start) on all 4 PCs to FRITZ8.
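Keeping a snapshot of the test conditions for every run makes this kind of hidden change easy to spot. A minimal sketch, assuming the conditions are recorded by hand as key/value pairs; the setting names and values are hypothetical, and the article does not say which engine was loaded before the switch to FRITZ8:

```python
def diff_settings(before, after):
    """Return {setting: (old, new)} for every setting that differs between two runs."""
    keys = before.keys() | after.keys()
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


# Hypothetical snapshots of the test conditions.
baseline = {
    "learning": False,
    "opening_book": None,
    "hash_mb": 64,              # hypothetical value
    "main_engine": "Pro Deo",   # hypothetical: previous main engine not stated
}
later = dict(baseline, main_engine="Fritz8")

print(diff_settings(baseline, later))  # {'main_engine': ('Pro Deo', 'Fritz8')}
```

A setting that nobody expects to matter, such as the main engine, is exactly the kind of entry a full snapshot catches and a manual check overlooks.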

I couldn't believe this change could make any difference at all; if it did, it would mean that one or two of the engines were not loaded correctly, and we would be entering the world of bugs. I decided to find out nevertheless; after all, I had no other clue.

The experiment

I took an older version (Rebel 12.00.01) and ran three identical 800-game (4x200) test matches at time control 40/5, differing only in which engine was loaded at program start:
  • Match-1, FRITZ8 loaded at program start.
  • Match-2, own engine loaded at program start (Shredder loaded with Shredder, Junior with Junior, etc.)
  • Match-3, Pro Deo loaded at program start.
All three matches should produce scores within an error margin of -1% to +1%; otherwise something is seriously wrong with the testing technique itself, caused either by bugs or by the fact that 800 games are still not enough to guarantee a -1% to +1% error margin. The results are telling and leave no room for speculation: there is something wrong with the testing environment.

  Match-1, FRITZ8 loaded at program start            38.1%
  Match-2, own engine loaded at program start        40.8%
  Match-3, Pro Deo loaded at program start           42.8%

An unbelievable and unacceptable difference of 4.7 percentage points, corresponding to an Elo difference of more than 30 points, depending solely on which engine is loaded at program start.
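The "more than 30 Elo" figure can be checked by converting each match score to a rating difference. A short sketch, assuming the standard logistic Elo model; `elo_from_score` is a hypothetical helper name, and the scores are the three match results above:

```python
import math


def elo_from_score(score):
    """Rating difference implied by a match score (0 < score < 1), logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


results = {
    "Match-1, FRITZ8 loaded": 0.381,
    "Match-2, own engine loaded": 0.408,
    "Match-3, Pro Deo loaded": 0.428,
}
for name, score in results.items():
    print(f"{name:28s} {score:.1%}  {elo_from_score(score):+.1f} Elo")

spread = elo_from_score(0.428) - elo_from_score(0.381)
print(f"Spread between Match-1 and Match-3: {spread:.1f} Elo")  # 33.9
```

The values are relative to the sparring engines as a group; only the differences between the three matches are meaningful here.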

Where to go from here?

It's tempting to advise users to keep Pro Deo loaded at program start all the time (eng-eng and auto232) to ensure the best results, but that is an unsatisfactory answer. It is more constructive to search for the underlying reasons and look for watertight solutions, hence I am putting this article on the CCC forum for discussion. I would particularly like to hear about the experiences of fellow programmers and testers; maybe the problem is entirely Pro Deo related after all.

My conclusion so far is that I could not find any satisfactory explanation for why the match scores against Hiarcs8, Junior8 and Fritz8 fluctuate so much. For Shredder7 there is a possible explanation: its settings are not remembered correctly, so from time to time Shredder7 uses "position learning" even though it is turned off. Other engines have this settings problem as well; for instance, Chess Tiger 15 starts with the Gambit style as its default setting, and when you change it to Normal, exit and restart the program, the Gambit style is active again.

More ChessBase oddities

The following ChessBase oddities are NOT related to this topic (which engine is loaded at program start), but they are general hints for accurate testing, and they are easy to overcome.

  • This article is based on my experiences with the Fritz7 interface; the Fritz8 interface might be a different story. I prefer the Fritz7 interface mainly because Fritz8 doesn't save Pro Deo's current personality correctly, while Fritz7 does.

  • Fritz7 sometimes starts with the wrong Pro Deo personality. Although the WB2UCI.ENG adaptor clearly specifies engine_X, the Fritz7 interface ignores this and starts another engine. The problem occurs about 10% of the time, and I don't know whether it still exists in the Fritz8 interface. The cure is to exit and restart Fritz7, so always check the param.txt file to see whether a match has been initialized correctly; see the Pro Deo FAQ for details.

