We are not POLYBIUS
Hi folks!
Cody here, I hope you are doing well and that you're all safe and sound during these troubling times!
I promised you guys a post about what's going on with the hard AI, so here goes.
First things first: I have been programming for more than 10 years now and I have a fair bit of experience, but AI is not my field of focus. Before Lorenzo, all my knowledge on the matter came mostly from things I studied in my free time while making AIs for personal projects, and from working closely with people who were developing AIs for games I was somehow involved with. That said, I've always wanted to program an AI for a "real" game, so when Lorenzo came about I decided not to pass on the occasion and to take care of the matter myself. Cocky, I know. But it made perfect sense budget-wise: hiring an external expert would have taken a big slice of our budget, there is always the risk of not getting exactly what you want, and reworks can get really expensive.
Now, bear in mind that this is not a technical dissertation, so I will try to keep it simple and short, without going into too much detail.
The first thing to do when making an AI is to analyze the challenges the AI will be solving, in order to decide what kind, or pattern, of AI to implement.
In our case, the main goal of the AI is to find an optimal sequence of actions within a finite pool of options. Luckily for us, there is an algorithm that's pretty good at that:
Monte Carlo Tree Search, or MCTS for short.
MCTS is usually the preferred method for writing AIs for board games and card games (including Google's AlphaGo).
I won't go into the details of MCTS, but in a nutshell the principle behind it is really simple: from the point we are at in the game, we try (simulate) different sequences of actions over and over until we find the best one.
Ideally, with MCTS we would simulate a sequence of actions until we reach the end of the game, so that we can evaluate the actions we took based on the outcome. If an action led us to a win, we increase its value. At every simulation we try different actions, so that after a decent number of simulations we have a good idea of which actions are the juicy ones to take, and in what order.
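To give a rough idea of that loop, here's a minimal Python sketch of the "simulate and average" core (a full MCTS would also grow a tree and balance exploration with something like UCT); `GameState` and its methods (`clone`, `legal_actions`, `apply`, `is_terminal`, `result`) are hypothetical stand-ins, not our actual engine:

```python
import random

# Minimal sketch: simulate many random action sequences from the current
# state and track how well each first action performs on average.
def evaluate_actions(state, simulations=1000):
    stats = {}  # first action -> [total_score, visits]
    for _ in range(simulations):
        sim = state.clone()
        first = random.choice(sim.legal_actions())
        sim.apply(first)
        while not sim.is_terminal():                 # play out to the end
            sim.apply(random.choice(sim.legal_actions()))
        entry = stats.setdefault(first, [0.0, 0])
        entry[0] += sim.result()                     # e.g. 1 for a win, 0 otherwise
        entry[1] += 1
    # the first action with the best average outcome wins
    return max(stats, key=lambda a: stats[a][0] / stats[a][1])
```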
Given that I couldn't simulate action after action until the end of the game, at least not without cheating or putting in place a really good heuristic for the randomness of the dice further down the line, I had to limit the "depth" of the simulation to a single round and come up with a method to determine whether a sequence of actions was good enough; in short, I had to settle for a local maximum.
That presented a number of challenges, and I had to try different approaches. I tried evaluating the final score as if the game ended right after the sequence of actions under evaluation, the number of cards, the resources, and a lot of other combinations. Another parameter to take into account when evaluating the goodness of a sequence is its length, i.e. how many family members am I using to reach the result I'm evaluating? Is a sequence that gets me to 50 points and 3 cards with 4 family members better than one that gets me 30 points and 1 card with only 2 family members? Sometimes the answer is not as easy as it might appear. At the moment, I have settled for the points-per-card ratio, without taking the length of the sequence into account.
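In code terms, the current evaluation boils down to something as simple as this (the field names are illustrative, not our actual ones):

```python
# Score a simulated round by its points-per-card ratio, ignoring how many
# family members the sequence used.
def evaluate_sequence(end_state):
    points = end_state.victory_points      # score as if the game ended here
    cards = max(end_state.cards_owned, 1)  # guard against dividing by zero
    return points / cards
```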
But it doesn't end there: how do we factor in faith?
This was, and still is, one of my biggest problems. Faith, if we don't want the AI to be excommunicated, needs to be collected over two separate rounds, and we can only simulate one. So, do we take all the faith we can now, or do we wait for the next round? Do we really want to collect the faith, or is it better to concentrate on other cards? And if I do want to collect faith, what is the best way to do it?
All these questions, and many more nuances, had to be taken into account when evaluating which actions to take.
I tried out different solutions and algorithms, each one with its ups and downs.
You might start to see why the development process took so long and why we don't have a hard AI yet. The further I got in developing and trying out solutions, the more I ran into particular situations that needed to be accounted for.
And every further evaluation the AI had to take into account added to its computation time, reducing the number of tries/simulations I could run.
To make things worse, even when simulating at most four family member actions, the number of combinations is pretty high, considering that some cards offer options for payment or activation, other cards let you acquire more cards or privileges, and each council privilege can be resolved in 5 different ways.
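Just to get a feel for the order of magnitude, here is a purely illustrative back-of-envelope calculation; every number below is made up except the 5 ways to resolve a council privilege:

```python
spaces = 16          # hypothetical number of placement options per action
payment_options = 2  # some cards offer alternative costs
privilege_ways = 5   # each council privilege resolves in 5 different ways

per_action = spaces * payment_options * privilege_ways   # 160 variants
print(per_action ** 4)  # four family member actions: 655,360,000 sequences
```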
And then we have leaders.
This was another big challenge, because the advantage of activating a leader is not something that's easy to put into numbers, and it depends on a whole set of factors.
Put all that together and you can see for yourself that blindly simulating stuff, with a limited number of simulations due to time constraints, was not going to cut it.
So I had to build, on top of everything, a way to pre-select the actions to try, based on a set of weighted parameters.
I decided to settle on the following approach:
- At each round, pick a good objective, either a card to obtain or an amount of resources to reach (mostly for activating leaders or for avoiding excommunication), based on a series of weights that I called "strategic weights".
- Then weight the available actions based on how close they would get me to my objective; these are the "tactical weights".
- Try out the best N actions and pick the best one based on the evaluation formula we talked about before (N being another parameter that I could vary).
I started with a few parameters to determine what a good objective would be, and then I added more parameters to evaluate the goodness of a move in relation to the selected objective.
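Put together, the pre-selection looks roughly like this (hypothetical names again; `score` and `progress_toward` stand in for the weighted evaluations):

```python
# Sketch of the pre-selection: strategic weights pick an objective,
# tactical weights rank the actions toward it, and only the top N
# candidates go on to full simulation and evaluation.
def preselect_actions(state, strategic_weights, tactical_weights, n):
    objective = max(state.candidate_objectives(),
                    key=lambda o: o.score(strategic_weights))
    ranked = sorted(state.legal_actions(),
                    key=lambda a: a.progress_toward(objective, tactical_weights),
                    reverse=True)
    return ranked[:n]   # N is itself a tunable parameter
```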
From there I kept adding more and more parameters, trying to fine-tune the evaluation systems. I got to a point where the number of parameters got a bit out of control, and keeping them balanced started to be a challenge. At one point I started using genetic algorithms to try to find the optimal values for the weights, but it was a bust, especially because if I had to add a new weight after running the genetic algorithm, I potentially had to recalculate all the other weights, and a genetic algorithm is not fast at finding solutions.
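For the curious, the weight tuning looked conceptually like any textbook genetic algorithm; this sketch is generic, not my actual code, and `fitness` would play a batch of games with the candidate weights and return, say, the median score:

```python
import random

# Generic sketch of tuning a weight vector with a genetic algorithm:
# keep the fittest half, breed children by crossover, mutate, repeat.
def evolve(fitness, n_weights, pop_size=50, generations=100, mut=0.1):
    pop = [[random.uniform(0, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(n_weights)  # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_weights):         # random mutation
                if random.random() < mut:
                    child[i] = random.uniform(0, 1)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

The pain point described above is visible right in the signature: `n_weights` is baked into the whole run, so adding one more weight means evolving everything again from scratch.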
Things started getting crazy, but I thought I was getting closer to having a hard AI.
(here's a gif showing the piece of code that populates all the parameters/weights of the AI)
( example of weights being calculated by the genetic algorithm )
Before continuing, I have to explain how I test an AI:
- First I try it out in a batch of 50 games where it basically plays alone. This is the optimal situation: no one is interfering with you, and no one is going to steal your cards or your actions on the board. If it's promising, I then run the test again with 250 or 500 games, to be sure it's not a fluke.
( results of test #1 for the current medium AI - IDLER_CONSUL is a dumb AI that always plays the same action: council palace for religion )
- If the results are good, I proceed with 50 games, 1 on 1 between the new AI and the current medium one. Again, if it's promising, I redo the test with a larger number of games.
- If that test passes too, I try 50 games, 1 on 1, with the new AI playing only against itself, to see how it handles an opponent that reasons the same way and potentially steals its moves; in other words, I want to see how the AI adapts to a continuous change of plans. The idea is that the AI should be able to change its strategy and adapt to the opponent's moves.
- The "last" test is with 4 players, first against 3 copies of the current medium AI and then against itself.
Having done all that, and being satisfied with the results, I do it all again with the leaders option activated, because leaders are such a hurdle that an AI that plays well without them doesn't necessarily play well with them. This is mainly because when we play without leaders, each round the AI sets cards or harvest/production as objectives, while with leaders it has to target specific amounts of resources. That can be harder, because you have way more options to get resources than to get a single card, and as I said before, it's harder to weigh the goodness of such plays. (For example, if I have to get 18 coins, I might want to go to the market, go to the council, and avoid spending coins; doing so, I'm usually not getting any cards, or getting cards that have no real value in terms of strategy. Situations like these are really hard to evaluate, considering that the advantage the leader could give me is even harder to evaluate.)
During these tests I sometimes get exceptionally good games or terribly bad ones; when that happens, I save the random seed of the match (which is the only thing I need to replay the exact same match) and add it to a pool of "critical matches". If the AI I'm working on is promising, somewhere between steps 1 and 3 I test it on these critical matches, to see whether it performs better or worse than its predecessor.
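Seed-based replay works because every random decision in the engine draws from one seeded generator; the toy below shows the principle, not our actual engine code:

```python
import random

# With a single seeded source of randomness, the same seed reproduces
# the exact same sequence of dice rolls, hence the exact same match.
def simulate_match(seed, rounds=6):
    rng = random.Random(seed)                  # the only source of randomness
    return [rng.randint(1, 6) for _ in range(rounds * 3)]  # 3 dice per round

assert simulate_match(1234) == simulate_match(1234)        # identical replay
```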
( a really bad match )
Ok, now that I've explained how I test an AI, back to the development process!
As I was saying, things went crazy, but I thought I was getting closer, until I got to the point where, in test 1, I could hardly push the median score over 100 points (as a matter of fact, the maximum median I've gotten so far is 105, while the current medium AI shipped with the game has a median of 99 in test 1).
Eventually I had to take a step back, re-evaluate what I was doing, and decide how to proceed.
As Elon Musk said a while back during a Starship update event: "If a design is taking too long, the design is wrong."
I didn't actually want to start from scratch (sorry Elon, of course I still love you), mostly because I was, and still am, convinced that MCTS is the way to go.
What I did was radically change the evaluation functions. Where Google's AlphaGo uses a specifically trained neural network to evaluate the goodness of a move, I decided to rely on a series of statements, rules if you will, more readable and simpler than a big bunch of weights.
What that means is that the basic approach is still the same, find an objective and build towards it, but instead of having countless generic parameters to decide the objective, I now have a dynamic set of logic rules. In a nutshell, I give every card a starting value, and then define a set of rules that modify that value as the game progresses. The idea is to arrive at a complex evaluation function by putting together simpler statements.
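Here's a sketch of the shape of that system; the base values are made up, the rule bonuses are taken from the examples shown further below, and the state fields are hypothetical:

```python
# Every card starts from a base value (numbers here are made up)...
base_values = {"Carpentry": 30, "Guild Of Painters": 20,
               "Farmer": 25, "Noble": 25}

# ...and simple declarative rules adjust those values as the game evolves.
# Each rule: (condition on the game state, affected card, bonus).
rules = [
    (lambda s: "Forest" in s.owned_cards, "Carpentry", 25),
    (lambda s: "Forest" in s.owned_cards, "Guild Of Painters", 40),
    (lambda s: s.territory_count >= 5,    "Farmer", 50),
    (lambda s: s.territory_count >= 4,    "Noble", 100),
]

def card_values(state):
    values = dict(base_values)
    for condition, card, bonus in rules:
        if condition(state):
            values[card] = values.get(card, 0) + bonus
    return values
```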
It might not look like a big change over the previous direction, but it actually is, for a few reasons:
- It simplifies the codebase a lot and improves its readability.
- Given the right tool, it makes it possible for non-programmers to create and test AIs.
- It's usually much clearer what move the AI is choosing, and why.
It's basically like Lego: I just need to code the basic bricks and then try to put them together in a meaningful structure. Obviously there is no guarantee that this way is the right way, but it is a way, and it's what I'm working on right now.
At the moment I already have a set of basic rules ready; a quick test with just a few rules in place resulted in a median of 76 points in test 1, and 63 playing against the current medium AI (which scored a median of 74).
( current approach test#1 )
( current approach test#2 )
Here are a few screenshots with examples of how the rules work:
( first we have a set of basic values for each card )
( then we have the actual rules - in this case, for example, we have a rule that says that if we own the Forest, then the value of Carpentry increases by 25 points; at the same time, another rule says that if we have the Forest, the Guild Of Painters increases its value by 40 points too )
( here we have a rule that says that if we have 5 territories then the value of the Farmer is increased by 50 points, while the following rule says that if we have at least 4 territories, the value of the Noble is increased by 100. We can also have more generic rules, like the last two, which increase the value of all territories by 50 if we already have five, or if we have 2 or fewer )
( another example of the rules available at the moment. The first one says that if at least two cards among the Monastery, the Stronghold, and the Gravel Quarry are in my possession (O) or available for purchase (A), then the value of those cards is increased by 35 points.
Then we have a rule that says that if the output of our harvest would produce 2 or more stones, then the value of the Guild Of Sculptors is increased by 50.
Last but not least, we have a rule that says that if we have 12 or more gold in our warehouse, then the value of all characters is increased by 70 points )
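A condition like the owned-or-available one could be expressed in the same declarative style; again, this is a sketch with hypothetical names:

```python
# Count how many cards of a group are owned (O) or available for purchase (A),
# and bump the whole group's value once the threshold is met.
GROUP = {"Monastery", "Stronghold", "Gravel Quarry"}

def group_bonus(state, values, threshold=2, bonus=35):
    hits = sum(1 for card in GROUP
               if card in state.owned_cards or card in state.available_cards)
    if hits >= threshold:
        for card in GROUP:
            values[card] = values.get(card, 0) + bonus
    return values
```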
So, this is it. This is where we are now. The problem is that, right now, the time I can dedicate to Lorenzo is extremely limited; we are a small studio, the human resources we have are limited, and we are constantly dancing between projects.
Luckily, though, I'm close to finishing the implementation of what I consider the minimum set of rules needed to build a reasonable AI, so hopefully, soon enough, I'll be able to give other people in the studio a tool to proceed with development and testing, and then I will know whether this change of direction is a bust or the real deal.
I guess that's it for now.

Cheers!
Cody